Associate Professor Hongyu Zhang

School of Electrical Engineering and Computing (Computer Science and Software Engineering)

Career Summary

Biography


Dr Hongyu Zhang is currently an Associate Professor at The University of Newcastle, Australia. Previously, he was a Lead Researcher at Microsoft Research Asia, an Associate Professor at Tsinghua University, China, and a Lecturer at RMIT University, Australia. He received his PhD from the National University of Singapore in 2003. His research is in the area of software engineering, in particular software analytics, testing, maintenance, metrics, and reuse. The main theme of his research is to improve software quality and productivity by mining and analyzing software data. He has published more than 100 research papers in international journals and conferences, including TSE, TOSEM, ICSE, FSE, ASE, ISSTA, POPL, AAAI, ICSM, ICDM, and USENIX ATC. He has received two ACM Distinguished Paper awards and has served as a program committee member for many software engineering conferences. More information about him can be found on his personal webpage (or GitHub page).

I am always looking for smart, self-motivated students! Please send me your CV if you are interested in working with me.

Information about UON Research Scholarships and International Scholarships.
For Chinese students: UON has an agreement with the China Scholarship Council (CSC) to enroll PhD candidates.


Qualifications

  • Doctor of Philosophy, National University of Singapore

Keywords

  • Software Engineering

Languages

  • Mandarin (Mother tongue)
  • English (Fluent)

Fields of Research

Code Description Percentage
080309 Software Engineering 100

Professional Experience

UON Appointment

Title Organisation / Department
Associate Professor University of Newcastle
School of Electrical Engineering and Computing
Australia

Publications


Journal article (22 outputs)

Year Citation Altmetrics Link
2017 Xuan J, Jiang H, Zhang H, Ren Z, 'Developer recommendation on bug commenting: a ranking approach for the developer crowd', Science China-Information Sciences, 60 072105-1-072105-18 (2017) [C1]
DOI 10.1007/s11432-015-0582-8
2016 Wu R, Xiao X, Cheung S-C, Zhang H, Zhang C, 'Casper: An Efficient Approach to Call Trace Collection', ACM SIGPLAN NOTICES, 51 678-690 (2016)
DOI 10.1145/2837614.2837619
Citations Scopus - 3
2016 Xia X, Gong L, Le T-DB, Lo D, Jiang L, Zhang H, 'Diversity maximization speedup for localizing faults in single-fault and multi-fault programs', AUTOMATED SOFTWARE ENGINEERING, 23 43-75 (2016)
DOI 10.1007/s10515-014-0165-z
Citations Scopus - 4, Web of Science - 3
2015 Li M, Zhang H, Lo D, Lucia, 'Improving Software Quality and Productivity Leveraging Mining Techniques', ACM SIGSOFT Software Engineering Notes, 40 1-2 (2015)
DOI 10.1145/2693208.2693219
2014 Gong L, Zhang H, Seo H, Kim S, 'Locating Crashing Faults based on Crash Stack Traces.', CoRR, abs/1404.4100 (2014)
2013 Peters F, Menzies T, Gong L, Zhang H, 'Balancing Privacy and Utility in Cross-Company Defect Prediction', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 39 1054-1068 (2013)
DOI 10.1109/TSE.2013.6
Citations Web of Science - 12
2013 Concas G, Lunesu MI, Marchesi M, Zhang H, 'Simulation of software maintenance process, with and without a work-in-process limit', JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 25 1225-1248 (2013)
DOI 10.1002/smr.1599
Citations Web of Science - 5
2012 Li M, Zhang H, Wu R, Zhou Z-H, 'Sample-based software defect prediction with active and semi-supervised learning', AUTOMATED SOFTWARE ENGINEERING, 19 201-230 (2012)
DOI 10.1007/s10515-011-0092-1
Citations Web of Science - 31
2011 Zhang H, Tan HBK, Zhang L, Lin X, Wang X, Zhang C, Mei H, 'Checking enforcement of integrity constraints in database applications based on code patterns', JOURNAL OF SYSTEMS AND SOFTWARE, 84 2253-2264 (2011)
DOI 10.1016/j.jss.2011.06.044
Citations Web of Science - 2
2010 Canfora G, Concas G, Marchesi M, Tempero E, Zhang H, '2010 ICSE workshop on emerging trends in software metrics', ACM SIGSOFT Software Engineering Notes, 35 51-51 (2010)
DOI 10.1145/1838687.1838700
2010 Zhang H, Li Y-F, Tan HBK, 'Measuring design complexity of semantic web ontologies', JOURNAL OF SYSTEMS AND SOFTWARE, 83 803-814 (2010)
DOI 10.1016/j.jss.2009.11.735
Citations Web of Science - 34
2010 Zhang H, Kim S, 'Monitoring Software Quality Evolution for Defects', IEEE SOFTWARE, 27 58-64 (2010)
DOI 10.1109/MS.2010.66
Citations Web of Science - 11
2010 Concas G, Cantone G, Tempero E, Zhang H, 'New Generation of Software Metrics', Advances in Software Engineering, 2010 1-2 (2010)
DOI 10.1155/2010/913892
2009 Zhang H, 'Discovering power laws in computer programs', INFORMATION PROCESSING & MANAGEMENT, 45 477-483 (2009)
DOI 10.1016/j.ipm.2009.02.001
Citations Web of Science - 5
2009 Tan HBK, Zhao Y, Zhang H, 'Conceptual Data Model-Based Software Size Estimation for Information Systems', ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 19 (2009)
DOI 10.1145/1571629.1571630
Citations Web of Science - 8
2009 Zhang H, Tan HBK, Marchesi M, 'The Distribution of Program Sizes and Its Implications: An Eclipse Case Study', CoRR, abs/0905.2288 (2009)
2008 Zhang H, 'On the distribution of software faults', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 34 301-302 (2008)
DOI 10.1109/TSE.2007.70771
Citations Web of Science - 18
2007 Zhang H, Zhang X, 'Comments on "data mining static code attributes to learn defect predictors"', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 33 635-636 (2007)
DOI 10.1109/TSE.2007.70706
Citations Scopus - 63, Web of Science - 34
2007 Wang HH, Li YF, Sun J, Zhang H, Pan J, 'Verifying feature models using OWL', JOURNAL OF WEB SEMANTICS, 5 117-129 (2007)
DOI 10.1016/j.websem.2006.11.006
Citations Scopus - 66, Web of Science - 36
2005 Zhang H, Jarzabek S, 'A Bayesian Network approach to rational architectural design', International Journal of Software Engineering and Knowledge Engineering, 15 695-717 (2005)

In software architecture design, we explore design alternatives and make decisions about adoption or rejection of a design from a web of complex and often uncertain information. Different architectural design decisions may lead to systems that satisfy the same set of functional requirements but differ in certain quality attributes. In this paper, we propose a Bayesian Network based approach to rational architectural design. Our Bayesian Network helps software architects record and make design decisions. We can perform both qualitative and quantitative analysis over the Bayesian Network to understand how the design decisions influence system quality attributes, and to reason about rational design decisions. We use the KWIC (Key Word In Context) example to illustrate the principles of our approach. © World Scientific Publishing Company.

DOI 10.1142/S0218194005002488
Citations Scopus - 7
2004 Zhang H, Jarzabek S, 'XVCL: A mechanism for handling variants in software product lines', Science of Computer Programming, 53 381-407 (2004)

Software reuse focused on product lines has emerged as one of the promising ways to increase software productivity and quality. XVCL (XML-based Variant Configuration Language) is a variability mechanism that we developed for handling variants in software product lines. We apply XVCL to develop product line assets (including the domain model, product line architecture and generic components) as a set of x-frames that are capable of accommodating both commonality and variability in a domain. Specific systems, members of a product line, can be constructed by adapting and composing x-frames. In this paper, we illustrate our approach using examples from our product line project on the Computer Aided Dispatch (CAD) domain. © 2004 Elsevier B.V. All rights reserved.

DOI 10.1016/j.scico.2003.04.007
Citations Scopus - 38
2003 Zhang H, Jarzabek S, Yang B, 'Quality prediction and assessment for product lines', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2681 681-695 (2003)

In recent years, software product lines have emerged as a promising approach to improve software development productivity in IT industry. In the product line approach, we identify both commonalities and variabilities in a domain, and build generic assets for an organization. Feature diagrams are often used to model common and variant product line requirements and can be considered part of the organizational assets. Despite their importance, quality attributes (or non-functional requirements, NFRs) such as performance and security have not been sufficiently addressed in product line development. A feature diagram alone does not tell us how to select a configuration of variants to achieve desired quality attributes of a product line member. There is a lack of an explicit model that can represent the impact of variants on quality attributes. In this paper, we propose a Bayesian Belief Network (BBN) based approach to quality prediction and assessment for a software product line. A BBN represents domain experts' knowledge and experiences accumulated from the development of similar projects. It helps us capture the impact of variants on quality attributes, and helps us predict and assess the quality of a product line member by performing quantitative analysis over it. For developing specific systems, members of a product line, we reuse the expertise captured by a BBN instead of working from scratch. We use examples from the Computer Aided Dispatch (CAD) product line project to illustrate our approach. © Springer-Verlag Berlin Heidelberg 2003.

Citations Scopus - 32

Conference (75 outputs)

Year Citation Altmetrics Link
2017 Chen J, Bai Y, Hao D, Xiong Y, Zhang H, Xie B, 'Learning to prioritize test programs for compiler testing', Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017 (2017)
2017 Shu C, Zhang H, 'Neural Programming by Example', Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. (2017)
2016 Zhang H, Jain A, Khandelwal G, Kaushik C, Ge S, Hu W, 'Bing Developer Assistant: Improving Developer Productivity by Recommending Sample Code', FSE'16: PROCEEDINGS OF THE 2016 24TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING (2016)
DOI 10.1145/2950290.2983955
2016 Gu X, Zhang H, Zhang D, Kim S, 'Deep API Learning', FSE'16: PROCEEDINGS OF THE 2016 24TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING (2016)
DOI 10.1145/2950290.2950334
2016 Chen J, Hu W, Hao D, Xiong Y, Zhang H, Zhang L, Xie B, 'An empirical comparison of compiler testing techniques', Proceedings of the 38th International Conference on Software Engineering (2016) [E1]
DOI 10.1145/2884781.2884878
Citations Scopus - 6
2016 Zhou M, Cheng X, Guo X, Gu M, Zhang H, Song X, 'Improving Failure Detection by Automatically Generating Test Cases Near the Boundaries', PROCEEDINGS 2016 IEEE 40TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE WORKSHOPS, VOL 1 (2016)
DOI 10.1109/COMPSAC.2016.137
2016 Chen J, Bai Y, Hao D, Xiong Y, Zhang H, Zhang L, Xie B, 'Test Case Prioritization for Compilers: A Text-Vector Based Approach', 2016 9TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION (ICST) (2016)
DOI 10.1109/ICST.2016.19
Citations Scopus - 4, Web of Science - 1
2016 Lin Q, Zhang H, Lou J-G, Zhang Y, Chen X, 'Log Clustering based Problem Identification for Online Service Systems' (2016)
2016 Lin Q, Lou JG, Zhang H, Zhang D, 'iDice: Problem identification for emerging issues', Proceedings - International Conference on Software Engineering (2016)

© 2016 ACM. One challenge for maintaining a large-scale software system, especially an online service system, is to quickly respond to customer issues. The issue reports typically have many categorical attributes that reflect the characteristics of the issues. For a commercial system, most of the time the volume of reported issues is relatively constant. Sometimes, there are emerging issues that lead to significant volume increase. It is important for support engineers to efficiently and effectively identify and resolve such emerging issues, since they have impacted a large number of customers. Currently, problem identification for an emerging issue is a tedious and error-prone process, because it requires support engineers to manually identify a particular attribute combination that characterizes the emerging issue among a large number of attribute combinations. We call such an attribute combination effective combination, which is important for issue isolation and diagnosis. In this paper, we propose iDice, an approach that can identify the effective combination for an emerging issue with high quality and performance. We evaluate the effectiveness and efficiency of iDice through experiments. We have also successfully applied iDice to several Microsoft online service systems in production. The results confirm that iDice can help identify emerging issues and reduce maintenance effort.

DOI 10.1145/2884781.2884795
Citations Scopus - 1
2015 Ding S, Tan HBK, Zhang H, 'ABOR: An Automatic Framework for Buffer Overflow Removal in C/C++ Programs', ENTERPRISE INFORMATION SYSTEMS, ICEIS 2014 (2015)
DOI 10.1007/978-3-319-22348-3_12
2015 Lv F, Zhang H, Lou J-G, Wang S, Zhang D, Zhao J, 'CodeHow: Effective Code Search based on API Understanding and Extended Boolean Model', 2015 30TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE) (2015)
DOI 10.1109/ASE.2015.42
Citations Scopus - 5, Web of Science - 4
2015 Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D, 'Learning to Log: Helping Developers Make Informed Logging Decisions', 2015 IEEE/ACM 37TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, VOL 1 (2015)
DOI 10.1109/ICSE.2015.60
Citations Scopus - 13, Web of Science - 4
2015 Zhou H, Lou J-G, Zhang H, Lin H, Lin H, Qin T, 'An Empirical Study on Quality Issues of Production Big Data Platform', 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol 2 (2015)
DOI 10.1109/ICSE.2015.130
Citations Scopus - 6
2015 Lim MH, Lou JG, Zhang H, Fu Q, Teoh ABJ, Lin Q, et al., 'Identifying Recurrent and Unknown Performance Issues', Proceedings - IEEE International Conference on Data Mining, ICDM (2015)

© 2014 IEEE. For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether this issue has occurred before. If there are past similar issues, a known remedy could be applied. Otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Due to the sophisticated nature of software systems, manual diagnosis of performance issues based on metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF) based approach to automatic identification of recurrent and unknown performance issues. We formulate the problem of issue identification as a HMRF-based clustering problem. Our approach incorporates the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can achieve accurate identification of recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms the related state-of-the-art approaches.

DOI 10.1109/ICDM.2014.96
Citations Scopus - 1
2015 Ding R, Zhou H, Lou J-G, Zhang H, Lin Q, Fu Q, Zhang D, Xie T, 'Log2: a cost-aware logging mechanism for performance diagnosis' (2015)
2014 'Proceedings of the 9th International Workshop on Advanced Modularization Techniques, AOAsia 2014, Hong Kong, China, November 16, 2014', AOAsia@SIGSOFT FSE (2014)
DOI 10.1145/2666358
2014 Liu K, Tan HBK, Zhang H, 'Mining key and referential constraints enforcement patterns.', SAC (2014)
DOI 10.1145/2554850.2554919
2014 'Proceedings of the 5th International Workshop on Emerging Trends in Software Metrics, WETSoM 2014, Hyderabad, India, June 3, 2014', WETSoM (2014)
2014 Ding S, Tan HBK, Zhang H, 'Automatic Removal of Buffer Overflow Vulnerabilities in C/C++ Programs.', ICEIS (2) (2014)
DOI 10.5220/0004888000490059
2014 Wong C-P, Xiong Y, Zhang H, Hao D, Zhang L, Mei H, 'Boosting Bug-Report-Oriented Fault Localization with Segmentation and Stack-Trace Analysis', 2014 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) (2014)
DOI 10.1109/ICSME.2014.40
Citations Scopus - 22, Web of Science - 11
2014 Ding S, Zhang H, Tan HBK, 'Detecting Infeasible Branches Based on Code Patterns', 2014 SOFTWARE EVOLUTION WEEK - IEEE CONFERENCE ON SOFTWARE MAINTENANCE, REENGINEERING, AND REVERSE ENGINEERING (CSMR-WCRE) (2014)
Citations Web of Science - 2
2014 Wu R, Zhang H, Cheung SC, Kim S, 'CrashLocator: Locating crashing faults based on crash stacks', 2014 International Symposium on Software Testing and Analysis, ISSTA 2014 - Proceedings (2014)

Copyright 2014 ACM. Software crash is common. When a crash occurs, software developers can receive a report upon user permission. A crash report typically includes a call stack at the time of crash. An important step of debugging a crash is to identify faulty functions, which is often a tedious and labor-intensive task. In this paper, we propose CrashLocator, a method to locate faulty functions using the crash stack information in crash reports. It deduces possible crash traces (the failing execution traces that lead to crash) by expanding the crash stack with functions in static call graph. It then calculates the suspiciousness of each function in the approximate crash traces. The functions are then ranked by their suspiciousness scores and are recommended to developers for further investigation. We evaluate our approach using real-world Mozilla crash data. The results show that our approach is effective: We can locate 50.6%, 63.7% and 67.5% of crashing faults by examining top 1, 5 and 10 functions recommended by CrashLocator, respectively. Our approach outperforms the conventional stack-only methods significantly.

Citations Scopus - 26
2014 Cao Y, Zhang H, Ding S, 'SymCrash: Selective recording for reproducing crashes', ASE 2014 - Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (2014)

© 2014 ACM. Software often crashes despite tremendous effort on software quality assurance. Once developers receive a crash report, they need to reproduce the crash in order to understand the problem and locate the fault. However, limited information from crash reports often makes crash reproduction difficult. Many "capture-and-replay" techniques have been proposed to automatically capture program execution data from the failing code, and help developers replay the crash scenarios based on the captured data. However, such techniques often suffer from heavy overhead and introduce privacy concerns. Recently, methods such as BugRedux were proposed to generate test input that leads to crash through symbolic execution. However, such methods have inherent limitations because they rely on conventional symbolic execution techniques. In this paper, we propose a dynamic symbolic execution method called SymCon, which addresses the limitation of conventional symbolic execution by selecting functions that are hard to be resolved by a constraint solver and using their concrete runtime values to replace the symbols. We then propose SymCrash, a selective recording approach that only instruments and monitors the hard-to-solve functions. SymCrash can generate test input for crashes through SymCon. We have applied our approach to successfully reproduce 13 failures of 6 real-world programs. Our results confirm that the proposed approach is suitable for reproducing crashes, in terms of effectiveness, overhead, and privacy. It also outperforms the related methods.

DOI 10.1145/2642937.2642993
Citations Scopus - 8
2014 Sun C, Zhang H, Lou JG, Zhang H, Wang Q, Zhang D, Khoo SC, 'Querying sequential software engineering data', Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering (2014)

Copyright 2014 ACM. We propose a pattern-based approach to effectively and efficiently analyzing sequential software engineering (SE) data. Different from other types of SE data, sequential SE data preserves unique temporal properties, which cannot be easily analyzed without much programming effort. In order to facilitate the analysis of sequential SE data, we design a sequential pattern query language (SPQL), which specifies the temporal properties based on regular expressions, and is enhanced with variables and statements to store and manipulate matching states. We also propose a query engine to effectively process the SPQL queries. We have applied our approach to analyze two types of SE data, namely bug report history and source code change history. We experiment with 181,213 Eclipse bug reports and 323,989 code revisions of Android. SPQL enables us to explore interesting temporal properties underneath these sequential data with a few lines of query code and low matching overhead. The analysis results can help better understand a software process and identify process violations.

DOI 10.1145/2635868.2635902
Citations Scopus - 3
2014 Hu H, Zhang H, Xuan J, Sun W, 'Effective bug triage based on historical bug-fix information', Proceedings - International Symposium on Software Reliability Engineering, ISSRE (2014)

© 2014 IEEE. For complex and popular software, project teams could receive a large number of bug reports. It is often tedious and costly to manually assign these bug reports to developers who have the expertise to fix the bugs. Many bug triage techniques have been proposed to automate this process. In this paper, we describe our study on applying conventional bug triage techniques to projects of different sizes. We find that the effectiveness of a bug triage technique largely depends on the size of a project team (measured in terms of the number of developers). The conventional bug triage methods become less effective when the number of developers increases. To further improve the effectiveness of bug triage for large projects, we propose a novel recommendation method called Bug Fixer, which recommends developers for a new bug report based on historical bug-fix information. Bug Fixer constructs a Developer-Component-Bug (DCB) network, which models the relationship between developers and source code components, as well as the relationship between the components and their associated bugs. A DCB network captures the knowledge of 'who fixed what, where'. For a new bug report, Bug Fixer uses a DCB network to recommend to triager a list of suitable developers who could fix this bug. We evaluate Bug Fixer on three large-scale open source projects and two smaller industrial projects. The experimental results show that the proposed method outperforms the existing methods for large projects and achieves comparable performance for small projects.

DOI 10.1109/ISSRE.2014.17
Citations Scopus - 13
2013 Gong J, Zhang H, 'BugMap: a topographic map of bugs.', ESEC/SIGSOFT FSE (2013)
DOI 10.1145/2491411.2494582
2013 Zhang H, Cheung SC, 'A cost-effectiveness criterion for applying software defect prediction models.', ESEC/SIGSOFT FSE (2013)
DOI 10.1145/2491411.2494581
2013 Hao D, Lan T, Zhang H, Guo C, Zhang L, 'Is This a Bug or an Obsolete Test?', ECOOP 2013 - OBJECT-ORIENTED PROGRAMMING (2013)
Citations Web of Science - 1
2013 Liu K, Tan HBK, Zhang H, 'Has This Bug Been Reported?', 2013 20TH WORKING CONFERENCE ON REVERSE ENGINEERING (WCRE) (2013)
Citations Web of Science - 1
2013 Zhang H, Gong L, Versteeg S, 'Predicting Bug-Fixing Time: An Empirical Study of Commercial Software Projects', PROCEEDINGS OF THE 35TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2013) (2013)
Citations Web of Science - 22
2013 Wang J, Dang Y, Zhang H, Chen K, Xie T, Zhang D, 'Mining succinct and high-coverage API usage patterns from source code', IEEE International Working Conference on Mining Software Repositories (2013)

During software development, a developer often needs to discover specific usage patterns of Application Programming Interface (API) methods. However, these usage patterns are often not well documented. To help developers to get such usage patterns, there are approaches proposed to mine client code of the API methods. However, they lack metrics to measure the quality of the mined usage patterns, and the API usage patterns mined by the existing approaches tend to be many and redundant, posing significant barriers for being practical adoption. To address these issues, in this paper, we propose two quality metrics (succinctness and coverage) for mined usage patterns, and further propose a novel approach called Usage Pattern Miner (UP-Miner) that mines succinct and high-coverage usage patterns of API methods from source code. We have evaluated our approach on a large-scale Microsoft codebase. The results show that our approach is effective and outperforms an existing representative approach MAPO. The user studies conducted with Microsoft developers confirm the usefulness of the proposed approach in practice. © 2013 IEEE.

DOI 10.1109/MSR.2013.6624045
Citations Scopus - 32
2012 Zhou J, Zhang H, 'Learning to rank duplicate bug reports.', CIKM (2012)
DOI 10.1145/2396761.2396869
2012 'Proceedings of the 3rd International Workshop on Emerging Trends in Software Metrics, WETSoM 2012, Zurich, Switzerland, June 3, 2012', WETSoM (2012)
2012 Ding S, Tan HBK, Liu K, Chandramohan M, Zhang H, 'Detection of Buffer Overflow Vulnerabilities in C/C++ with Pattern Based Limited Symbolic Evaluation.', COMPSAC Workshops (2012)
DOI 10.1109/COMPSACW.2012.103
2012 Anderson DJ, Concas G, Lunesu MI, Marchesi M, Zhang H, 'A Comparative Study of Scrum and Kanban Approaches on a Real Case Study Using Simulation', AGILE PROCESSES IN SOFTWARE ENGINEERING AND EXTREME PROGRAMMING, XP 2012 (2012)
Citations Web of Science - 2
2012 Zhou J, Zhang H, Lo D, 'Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports', 2012 34TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) (2012)
Citations Web of Science - 80
2012 Dang Y, Wu R, Zhang H, Zhang D, Nobel P, 'ReBucket: A Method for Clustering Duplicate Crash Reports Based on Call Stack Similarity', 2012 34TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) (2012)
Citations Scopus - 41, Web of Science - 23
2012 Gong L, Lo D, Jiang L, Zhang H, 'Diversity Maximization Speedup for Fault Localization', 2012 PROCEEDINGS OF THE 27TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE) (2012)
Citations Web of Science - 4
2012 Tran MH, Colman A, Han J, Zhang H, 'Modeling and Verification of Context-aware Systems', 2012 19TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC), VOL 1 (2012)
DOI 10.1109/APSEC.2012.50
Citations Web of Science - 1
2012 Wang J, Zhang H, 'Predicting Defect Numbers Based on Defect State Transition Models', PROCEEDINGS OF THE ACM-IEEE INTERNATIONAL SYMPOSIUM ON EMPIRICAL SOFTWARE ENGINEERING AND MEASUREMENT (ESEM'12) (2012)
Citations Web of Science - 3
2012 Gong L, Lo D, Jiang L, Zhang H, 'Interactive Fault Localization Leveraging Simple User Feedback', 2012 28TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (ICSM) (2012)
Citations Web of Science - 11
2011 Liu K, Tan HBK, Chen X, Zhang H, Padmanabhuni B, 'Automated Extraction of Data Lifecycle Support from Database Applications.', SEKE (2011)
2011 Wu R, Zhang H, Kim S, Cheung S-C, 'ReLink: recovering links between bugs and changes.', SIGSOFT FSE (2011)
DOI 10.1145/2025113.2025120
2011 Li Y-F, Zhang H, 'Integrating software engineering data using semantic web technologies.', MSR (2011)
DOI 10.1145/1985441.1985473
2011 'Proceedings of the 2nd International Workshop on Emerging Trends in Software Metrics, WETSoM 2011, Waikiki, Honolulu, HI, USA, May 24, 2011', WETSoM (2011)
2011 Jarzabek S, Pettersson U, Zhang H, 'University-Industry Collaboration Journey towards Product Lines.', ICSR (2011)
DOI 10.1007/978-3-642-21347-2_17
2011 Kim S, Zhang H, Wu R, Gong L, 'Dealing with Noise in Defect Prediction', 2011 33RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) (2011)
Citations Web of Science - 54
2011 Concas G, Di Penta M, Tempero E, Zhang H, 'Workshop on Emerging Trends in Software Metrics (WETSoM 2011)', 2011 33RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) (2011)
2010 Zhang H, Nelson A, Menzies T, 'On the value of learning from defect dense components for software defect prediction.', PROMISE (2010)
DOI 10.1145/1868328.1868350
2010 'Proceedings of the 2010 ICSE Workshop on Emerging Trends in Software Metrics, WETSoM 2010, Cape Town, South Africa, May 4, 2010', WETSoM (2010)
2010 Zhang H, Jarzabek S, 'A Hybrid Approach to Feature-Oriented Programming in XVCL', SOFTWARE PRODUCT LINES: GOING BEYOND (2010)
Citations Scopus - 4, Web of Science - 4
2010 Zhang H, Shi B, Zhang L, 'Automatic Checking of License Compliance', 2010 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (2010)
2010 Zhang H, Wu R, 'Sampling Program Quality', 2010 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (2010)
2010 Canfora G, Concas G, Marchesi M, Tempero ED, Zhang H, 'Workshop on Emerging Trends in Software Metrics (WETSoM 2010).', ICSE (2) (2010)
DOI 10.1145/1810295.1810428
2009 Liu L, Zhang H, Ma W, Shan Y, Xu J, Peng F, Burda T, 'Understanding Chinese Characteristics of Requirements Engineering', PROCEEDINGS OF THE 2009 17TH IEEE INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE (2009)
DOI 10.1109/RE.2009.14
Citations Web of Science - 2
2009 Jarzabek S, Xue Y, Zhang H, Lee Y, 'Avoiding Some Common Preprocessing Pitfalls with Feature Queries', APSEC 09: SIXTEENTH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, PROCEEDINGS (2009)
DOI 10.1109/APSEC.2009.61
Citations Scopus - 1
2009 Zhang H, 'An Investigation of the Relationships between Lines of Code and Defects', 2009 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE, CONFERENCE PROCEEDINGS (2009)
DOI 10.1109/ICSM.2009.5306304
Citations Web of Science - 23
2009 Jarzabek S, Zhang H, Lee Y, Xue Y, Shaikh N, 'Increasing Usability of Preprocessing for Feature Management in Product Lines with Queries', 2009 31ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, COMPANION VOLUME (2009)
DOI 10.1109/ICSE-COMPANION.2009.5070985
Citations Scopus - 3, Web of Science - 1
2008 Zhang H, 'Exploring Regularity in Source Code: Software Science and Zipf's Law', FIFTEENTH WORKING CONFERENCE ON REVERSE ENGINEERING, PROCEEDINGS (2008)
DOI 10.1109/WCRE.2008.37
Citations Web of Science - 4
2008 Zhang H, 'An initial study of the growth of eclipse defects.', MSR (2008)
DOI 10.1145/1370750.1370785
2008 Zhang H, 'The scale-free nature of semantic web ontology.', WWW (2008)
DOI 10.1145/1367497.1367649
2007 Zhang H, Zhang X, Gu M, 'Predicting defective software components from code complexity measures', 13TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS (2007)
DOI 10.1109/PRDC.2007.28
Citations Web of Science - 2
2007 Zhang H, Tan HBK, 'An empirical study of class sizes for large Java systems', 14TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, PROCEEDINGS (2007)
DOI 10.1109/ASPEC.2007.64
2007 Zhang H, Zhang X, Gu M, 'Predicting defective software components from code complexity measures', Proceedings - 13th Pacific Rim International Symposium on Dependable Computing, PRDC 2007 (2007)

The ability to predict defective modules can help us allocate limited quality assurance resources effectively and efficiently. In this paper, we propose a complexity-based method for predicting defect-prone components. Our method takes three code-level complexity measures as input, namely Lines of Code, McCabe's Cyclomatic Complexity and Halstead's Volume, and classifies components as either defective or non-defective. We perform an extensive study of twelve classification models using the public NASA datasets. Cross-validation results show that our method can achieve good prediction accuracy. This study confirms that static code complexity measures can be useful indicators of component quality. © 2007 IEEE.

DOI 10.1109/PRDC.2007.56
Citations Scopus - 26
2007 Peng D, Jarzabek S, Rajapakse DC, Zhang H, 'Reuse of database access layer components in JEE product lines: Limitations and a possible solution (Case Study)', 19th International Conference on Software Engineering and Knowledge Engineering, SEKE 2007 (2007)

We set up an experiment to evaluate JEE as a platform for product line development. While JEE provides many useful mechanisms for reuse of common services/components, still we found that systematic across-the-board reuse in application domain-specific areas was hard. The main difficulty was the lack of a mechanism to represent groups of similar components in a generic, adaptable form. Such similar components arise as the number of variant features of a product line grows, and we need to accommodate legal combinations of variant features in components of a product line architecture. Such uncontrolled growth of similar component versions hinders productivity of reuse-based development and raises maintenance costs. In the paper, we study the manifestation of this problem in the JEE database access layer. Interactive Development Environments such as NetBeans or JBuilder speed up the development process, but they do not address the source of the problem, which is the lack of mechanisms to design generic components capable of accommodating variant features in various combinations. We filled this gap with a "mixed strategy" solution based on generative programming technique of XVCL applied on top of JEE. In the paper, we highlight the nature of the problems we encountered and our solution. Copyright © (2007) by Knowledge Systems Institute (KSI).

Citations Scopus - 2
2006 Tan HBK, Zhao Y, Zhang H, 'Estimating LOC for information systems from their conceptual data models', Proceedings - International Conference on Software Engineering (2006)

Effort and cost estimation is crucial in software management. Estimation of software size plays a key role in the estimation. Line of Code (LOC) is still a commonly used software size measure. Despite the fact that software sizing is well recognized as an important problem for more than two decades, there is still much problem in existing methods. Conceptual data model is widely used in the requirements analysis for information systems. It is also not difficult to construct conceptual data models in the early stage of developing information systems. Much characteristic of an information system is actually reflected from its conceptual data model. We explore into the use of conceptual data model for estimating LOC. This paper proposes a novel method for estimating LOC for an information system from its conceptual data model through the use of multiple linear regression model. We have validated the method through collecting samples from both the industry and open-source systems. Copyright 2006 ACM.

Citations Scopus - 6
2006 Jarzabek S, Zhang H, Shen RU, Lam VT, Zhenxin S, 'Analysis of meta-programs: An example', International Journal of Software Engineering and Knowledge Engineering (2006)

Meta-programs are generic, incomplete, adaptable programs that are instantiated at construction time to meet specific requirements. Templates and generative techniques are examples of meta-programming techniques. Understanding of meta-programs is more difficult than understanding of concrete, executable programs. Static and dynamic analysis methods have been applied to ease understanding of programs - can similar methods be used for meta-programs? In our projects, we build meta-programs with a meta-programming technique called XVCL. Meta-programs in XVCL are organized into a hierarchy of meta-components from which the XVCL processor generates concrete, executable programs that meet specific requirements. We developed an automated system that analyzes XVCL meta-programs, and presents developers with information that helps them work with meta-programs more effectively. Our system conducts both static and dynamic analysis of a meta-program. An integral part of our solution is a query language, FQL, in which we formulate questions about meta-program properties. An FQL query processor automatically answers a class of queries. The analysis method described in the paper is specific to XVCL. However, the principle of our approach can be applied to other meta-programming systems. We believe readers interested in meta-programming in general will find some of the lessons from our experiment interesting and useful. © World Scientific Publishing Company.

DOI 10.1142/S0218194006002689
Citations Scopus - 1
2005 Sun J, Zhang H, Li YF, Wang H, 'Formal semantics and verification for feature modeling', Proceedings of the IEEE International Conference on Engineering of Complex Computer Systems, ICECCS (2005)

Research on features has received much attention in the domain engineering community. Feature modeling plays an important role in the design and implementation of complex software systems. However, the presentation and analysis of feature models are still largely informal. There is also an increasing need for methods and tools that can support automated feature model analysis. This paper presents a formal engineering approach to the specification and verification of feature models. A formal semantics for the feature modeling language is defined using first-order logic. It provides a precise and rigorous formal interpretation for the graphical notation. In addition, further validation of the semantics using the Z/EVES theorem prover is presented. Finally, we demonstrate that the consistency of a feature model and its configurations can be automatically verified by encoding the semantics into the Alloy Analyzer. A case study of the Key Word in Context (KWIC) index systems feature model is presented to illustrate the verification process. © 2005 IEEE.

Citations Scopus - 67
2003 Zhang H, Jarzabek S, 'An XVCL approach to handling variants: A KWIC product line example', Proceedings - Asia-Pacific Software Engineering Conference, APSEC (2003)

© 2003 IEEE. We developed XVCL (XML-based Variant Configuration Language), a method and tool for product lines, to facilitate handling variants in reusable software assets (such as architecture, code components or UML models). XVCL is a newer version of Bassett's frames [1], a technology that has achieved substantial productivity improvements in large data processing product lines written in COBOL. Despite its simplicity, XVCL can effectively manage a wide range of product line variants from a compact base of meta-components, structured for effective reuse. We applied XVCL in two medium-size product line projects and a number of smaller case studies. In this paper, we communicate XVCL's capabilities to support product lines by means of a simple, but still interesting, example of the KWIC system introduced by Parnas in 1970's. We show how we can handle functional variants, variant design decisions and implementation-level variants in a generic KWIC system.

DOI 10.1109/APSEC.2003.1254364
Citations Scopus - 6
2003 Jarzabek S, Ong WC, Zhang H, 'Handling variant requirements in domain modeling', Journal of Systems and Software (2003)

Domain models describe common and variant requirements for a family of similar systems. Although most of the notations, such as UML, are meant for modeling a single system, they can be extended to model variants. We have done that and applied such extended notations in our projects. We soon found that our models with variants were becoming overly complicated, undermining the major role of domain analysis which is understanding. One variant was often reflected in many models and any given model was affected by many variants. The number of possible variant combinations was growing rapidly and mutual dependencies among variants even further complicated the domain model. We realized that our purely descriptive domain model was only useful for small examples but it did not scale up. In this paper, we describe a modeling method and a Flexible Variant Configuration tool (FVC for short) that alleviate the above mentioned problems. In our approach, we start by modeling so-called domain defaults, i.e., requirements that characterize a typical system in a domain. Then, we describe variants as deltas in respect to domain defaults. The FVC interprets variants to produce customized domain model views for a system that meets specific requirements. We implemented the above concepts using commercial tools Netron Fusion and Rational Rose. In the paper, we illustrate our domain modeling method and tool with examples from the Facility Reservation System domain. © 2003 Elsevier Inc. All rights reserved.

DOI 10.1016/S0164-1212(03)00060-8
Citations Scopus - 10
2003 Jarzabek S, Bassett P, Zhang H, Zhang W, 'XVCL: XML-based variant configuration language', Proceedings - International Conference on Software Engineering (2003)

XML-based Variant Configuration Language (XVCL) is a meta-programming technique and tool that provides effective reuse mechanisms. It includes a methodology and a tool, the XVCL processor. The methodology shows how to discover the structure of the solution for the application domain and for the types of variants one wants to address. The XVCL processor automates the routine yet error-prone program construction tasks, allowing to focus on what is novel about the problem domains, requiring creativity.

Citations Scopus - 51
2002 Swe SM, Zhang H, Jarzabek S, 'XVCL: A tutorial', ACM International Conference Proceeding Series (2002)

XVCL (XML-based Variant Configuration Language) is a general-purpose mark-up language for configuring variants in programs and other types of documents. We can apply XVCL to configure variants in a variety of software assets such as software architecture, program code, test cases, technical and user-level program documentation or requirement specifications. The principles of the XVCL have been thoroughly tested in practice. XVCL is based on the same concepts as the frame technology [1]. Frame technology has been extensively applied in industry to manage variants and evolve multi-million-line, COBOL-based, information systems. An independent analysis showed that frame technology has reduced large software project costs by over 84% and their times-to-market by 70%, when compared to industry norms [1, 2]. At the same time, we found that the principles of XVCL are not easy to communicate. In this paper, we describe a subset of XVCL. We trust this subset of XVCL is easy to understand and still effectively communicates essential XVCL concepts. To illustrate the XVCL method, we further describe an XVCL solution to handling variants in a Notepad system. Copyright 2002 ACM.

DOI 10.1145/568760.568821
Citations Scopus - 12
2001 Wong TW, Jarzabek S, Swe SM, Shen R, Zhang H, 'XML implementation of frame processor', Proceedings of SSR'01 2001 Symposium on Software Reusability (2001)

A quantitative study has shown that frame technology [1] supported by Fusion toolset can lead to reduction in time-to-market (70%) and project costs (84%). Frame technology has been developed to handle large COBOL-based business software product families. We wished to investigate how the principle of frame approach can be applied to support product families in other application domains, in particular to build distributed component-based systems written in Object-Oriented languages. As Fusion is tightly coupled with COBOL, we implemented our own tools based on frame concepts using the XML technology. In our solution, a generic architecture for a product family is a hierarchy of XML documents. Each such document contains a reusable program fragment instrumented for change with XML tags. We use a tool built on top of XML parsing framework JAXP to process documents in order to produce a custom member of a product family. Our solution is cost-effective and extensible. In the paper, we describe our solution, illustrating its use with examples. We intend to make our solution available to public in order to encourage investigation of frame concepts in other application domains, implementation languages and platforms.

Citations Scopus - 16
2001 Zhang H, Jarzabek S, Swe SM, 'XVCL approach to separating concerns in product family assets', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2001)

© Springer-Verlag Berlin Heidelberg 2001. In this paper, we describe an XML-based language, called XVCL, for managing variants in component-based product families. Using XVCL, we can organize product family assets and instrument them to accommodate variants. A tool that interprets XVCL and provides semi-automatic support for asset customization is also introduced. In our projects, we applied XVCL to manage variants in UML domain models and in generic architectures for product families. We have achieved simple forms of separation of concerns (in both models and architectures) and we are investigating advanced forms in current work. We plan to compare XVCL to other emerging techniques that lead to separating of concerns in software models, documents, architectures and code.

DOI 10.1007/3-540-44800-4_4
Citations Scopus - 2
2001 Jarzabek S, Zhang H, 'XML-based method and tool for handling variant requirements in domain models', Proceedings of the IEEE International Conference on Requirements Engineering (2001)

A domain model describes common and variant requirements for a system family. UML notations used in requirements analysis and software modeling can be extended with "variation points" to cater for variant requirements. However, UML models for a large single system are already complicated enough. With variants - UML domain models soon become too complicated to be useful. The main reasons are the explosion of possible variant combinations, complex dependencies among variants and inability to trace variants from a domain model down to the requirements for a specific system, member of a family. We believe that the above mentioned problems cannot be solved at the domain model description level alone. In the paper, we propose a novel solution based on a tool that interprets and manipulates domain models to provide analysts with customized, simple domain views. We describe a variant configuration language that allows us to instrument domain models with variation points and record variant dependencies. An interpreter of this language produces customized views of a domain model, helping analysts understand and reuse software models. We describe the concept of our approach and its simple implementation based on XML and XMI technologies.

Citations Scopus - 34

Associate Professor Hongyu Zhang

Position

Associate Professor
School of Electrical Engineering and Computing
Faculty of Engineering and Built Environment

Focus area

Computer Science and Software Engineering

Contact Details

Email hongyu.zhang@newcastle.edu.au
Phone (02) 4921 7790

Office

Room ES233
Building ES
Location Callaghan
University Drive
Callaghan, NSW 2308
Australia