Professor Hongyu Zhang

Honorary Professor

School of Computer and Information Sciences (Computer Science and Software Engineering)

Email: hongyu.zhang@newcastle.edu.au
Phone: 0249217790

Intelligent software engineering

By mining a vast amount of software data, Associate Professor Hongyu Zhang is developing intelligent methods and tools that improve software quality and development productivity.

“Currently, software development is largely a manual, time-consuming, and error-prone process,” says Hongyu. “We can improve such a process by learning from software that was written before.”

“Over the years, a large number of software systems have been developed. These software systems are associated with a variety of data such as source code, bugs, logs, incident reports, metric data, etc. The availability of vast amounts of software data opens the opportunity for us to improve software quality and productivity.”

Together with his collaborators and students, Hongyu has proposed many methods based on data mining, machine learning (including deep learning), and information retrieval to extract knowledge from software data and solve software engineering problems. Some of his work is described below.

Intelligent programming

To help programmers program, Hongyu proposed many innovative methods that learn from a large amount of source code for effective code search, code summarization, code generation, and code pattern mining. For example, he proposed one of the first deep learning based methods for source code search and API recommendation (FSE’16, ICSE’18), which can help programmers write new programs by searching and reusing existing code. He also proposed neural programming by example (AAAI’17), which targets the challenging problem of automatically generating a program from input/output examples using a deep neural network.

Intelligent quality prediction

The quality of software is important. Hongyu proposed many machine learning based methods for predicting defect-prone software modules. He also worked on cloud failure prediction, which predicts future failures of a computing node or a hard disk in a large-scale cloud system based on historical system metric and failure data (FSE’18). Hongyu proposed DeepPerf (ICSE’19), which uses a deep feedforward neural network to predict the runtime performance of a highly configurable software system. It was the first time that a deep neural network was successfully applied to software performance prediction.

Intelligent fault detection and diagnosis

Software systems always contain faults (bugs). Hongyu proposed many innovative methods for log-based fault detection, crash-based fault localization, bug report analytics, and incident management. For example, he proposed BugLocator (ICSE’12), which automatically locates buggy source code files based on a bug report. He also works on data-driven methods for compiler testing, with the aim of improving the efficiency of compiler testing.

Making real impact

Apart from scholarly publications, Hongyu is also keen to see the impact of research on practice. When Hongyu was working at Microsoft, he worked closely with the Bing and Visual Studio teams on the Bing Developer Assistant (BDA) project. BDA is a Visual Studio extension that allows developers to search for reusable code snippets based on queries. The BDA tool received more than 450K downloads in 2016. Hongyu has been collaborating with Microsoft Research teams and has published many innovative techniques, which were also successfully deployed to real-world online service systems at Microsoft.

An independent 2019 Elsevier Bibliometric Assessment of Software Engineering Scholars ranks Hongyu among the world’s top 20 most prolific software engineering researchers of the past decade. He has been recognised in The Australian’s Top Researchers special edition publication (09/2020) as the leading researcher in the field of Software Systems.

Biography

My research is in the area of software engineering, in particular intelligent software engineering, software analytics, fault diagnosis, maintenance, and reuse. The main theme of my research is to improve software quality and productivity by mining and analyzing software data. I have published more than 200 research papers in international journals and conferences, including TSE, TOSEM, ICSE, FSE, ASE, ISSTA, POPL, AAAI, IJCAI, KDD, ICSME, ICDM, and USENIX ATC. I have received more than eight ACM Distinguished Paper and Best Paper awards. I have also served as a program committee member or track chair for many software engineering conferences. I am an associate editor of ACM Computing Surveys and Automated Software Engineering. I am a Senior Member of IEEE, a Distinguished Member of ACM, a Distinguished Member of CCF, and a Fellow of Engineers Australia (FIEAust).

More information about me can be found at my Personal Webpage. I can always be reached at hongyujohn@gmail.com.

Research Area:

My research area is software engineering, in particular:

software analytics, mining software repositories, data-driven software engineering
intelligent software and service engineering
software testing, debugging, fault diagnosis
software maintenance and reuse

The main theme of my research is to improve software quality and productivity by utilizing knowledge mined from software data. Over the years, a software organization can accumulate a large amount of data, including source code, bug reports, execution logs, changes, metrics, and documents. Data mining, machine learning, and information retrieval techniques can be applied to extract knowledge from this data and solve software engineering problems.

Recent Program Organizations:

Technical Briefings co-chair: The 45th International Conference on Software Engineering (ICSE 2023)
General co-chair: The 36th International Conference on Software Maintenance and Evolution (ICSME 2020)
Tool Demonstration co-chair: The International Symposium on Software Testing and Analysis (ISSTA 2019)
Program co-chair: The 18th IEEE International Conference on Software Quality, Reliability, and Security (QRS 2018)
Program co-chair: The 25th Asia-Pacific Software Engineering Conference (APSEC 2018)
Co-organizer: Dagstuhl Seminar 17502 on "Testing and Verification of Compilers", Dec 2017, Germany
Program co-chair: The 12th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE’16)
Steering Committee member: The International Conference on Predictive Models in Software Engineering (PROMISE), 2014-2018

Recent Program Committees:

The IEEE/ACM International Conference on Automated Software Engineering (ASE 2015, ASE 2018-2023).
The IEEE International Conference on Software Maintenance and Evolution (ICSME 2013-2019)
The Working Conference on Mining Software Repositories (MSR 2013-2017, MSR 2020-2022)
The International Conference on Software Engineering: ICSE 2021, ICSE 2023 (Technical Program Committee)
The International Conference on Software Analysis, Evolution and Reengineering (SANER 2015-2017, SANER 2018/2020/2021(industry), SANER 2023)

The International Conference on Machine Learning (ICML 2020, ICML 2021, ICML 2022)
The International Conference on Learning Representations (ICLR 2020, ICLR 2021, ICLR 2022)
The AAAI Conference on Artificial Intelligence (AAAI 2021, AAAI 2022, AAAI 2023)
ACM SIGSOFT Symposium on the Foundations of Software Engineering: FSE 2022 (Technical Program Committee)
The International Symposium on Empirical Software Engineering and Measurement (ESEM 2016 - 2022)

Qualifications

Doctor of Philosophy, National University of Singapore

Keywords

Artificial Intelligence
Data Mining
Software Engineering

Languages

Mandarin (Mother)
English (Fluent)

Fields of Research

Code	Description	Percentage
461201	Automated software engineering	70
461207	Software quality, processes and metrics	30

Professional Experience

UON Appointment

Title	Organisation / Department
Associate Professor	University of Newcastle School of Electrical Engineering and Computing Australia

Academic appointment

Dates	Title	Organisation / Department
1/2/2023 -	Honorary Professor	School of Information and Physical Sciences (SIPS), University of Newcastle Australia

Publications

For publications that are currently unpublished or in-press, details are shown in italics.

Chapter (2 outputs)

2016

Hou Z, Zhang H, Zhang H, Zhang D, 'Visual analytics for software engineering data', 77-80 (2016)

Many data analysis techniques require substantial knowledge and skills and are typically performed by "data scientists". Ordinary users may find it difficult to apply these techniques to quickly explore the data by themselves. We propose MetroEyes, a visual analytics tool for interactive data exploration. We have successfully transferred the main concepts and experiences of MetroEyes to Microsoft Power BI.

DOI	10.1016/B978-0-12-804206-9.00015-5
Citations	Scopus - 3

2016

Lin Q, Lou JG, Zhang H, Zhang D, 'How to tame your online services', 63-65 (2016)

Online service systems have become increasingly popular and important. Service incidents can lead to huge economic loss. We designed a set of incident management techniques based on the analysis of a huge amount of data collected at service runtime. Our tool is called Service Analysis Studio (SAS), which has been successfully applied to large-scale online services provided by Microsoft.

DOI	10.1016/B978-0-12-804206-9.00012-X
Citations	Scopus - 2

Conference (247 outputs)

2025

He X, Li D, Wen H, Zhu Y, Liu C, Yan M, Zhang H, 'CoSEFA: An LLM-Based Programming Assistant for Secure Code Generation via Supervised Co-Decoding', Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1198-1202 (2025) [E1]

DOI	10.1145/3696630.3728609

2025

Gui Y, Li Z, Zhang Z, Wang G, Lv T, Jiang G, Liu Y, Chen D, Wan Y, Zhang H, Jiang W, Shi X, Jin H, 'LaTCoder: Converting Webpage Design to Code with Layout-as-Thought', Proceedings of the 31st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2, 721-732 (2025) [E1]

DOI	10.1145/3711896.3737016
Citations	Scopus - 2

2025

Gao Y, Luo J, Lin H, Zhang H, Wu M, Yang M, 'dl2: Detecting Communication Deadlocks in Deep Learning Jobs', Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 27-38 (2025) [C1]

DOI	10.1145/3696630.3728529
Citations	Scopus - 2

2025

Lin H, Wang Y, Gao Y, Zhang H, Wu M, Yang M, 'Reduction Fusion for Optimized Distributed Data-Parallel Computations via Inverse Recomputation', Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 545-549 (2025) [E1]

DOI	10.1145/3696630.3728496

2025

Li F, Jiang J, Sun J, Zhang H, 'Evaluating the Generalizability of LLMs in Automated Program Repair', Proceedings International Conference on Software Engineering, 91-95 (2025)

DOI	10.1109/ICSE-NIER66352.2025.00024

2025

Gu W, Shi E, Wang Y, Du L, Han S, Zhang H, Zhang D, Lyu MR, 'SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing', Proceedings International Conference on Software Engineering, 2303-2315 (2025)

DOI	10.1109/ICSE55347.2025.00154

2025

Wen XC, Lin Z, Gao C, Zhang H, Wang Y, Liao Q, 'Repository-Level Graph Representation Learning for Enhanced Security Patch Detection', Proceedings International Conference on Software Engineering, 2600-2612 (2025)

DOI	10.1109/ICSE55347.2025.00121

2025

Shi Y, Zhang H, Wan C, Gu X, 'Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers', Proceedings International Conference on Software Engineering, 1628-1639 (2025)

DOI	10.1109/ICSE55347.2025.00005

2025

Luo C, Lyu S, Wu W, Zhang H, Chu D, Hu C, 'Towards High-Strength Combinatorial Interaction Testing for Highly Configurable Software Systems', Proceedings International Conference on Software Engineering, 1579-1591 (2025)

DOI	10.1109/ICSE55347.2025.00113

2025

Sun W, Miao Y, Li Y, Zhang H, Fang C, Liu Y, Deng G, Liu Y, Chen Z, 'Source Code Summarization in the Era of Large Language Models', Proceedings International Conference on Software Engineering, 1882-1894 (2025)

DOI	10.1109/ICSE55347.2025.00034

2025

Zheng D, Wang Y, Shi E, Zhang R, Ma Y, Zhang H, Zheng Z, 'HumanEvo: An Evolution-Aware Benchmark for More Realistic Evaluation of Repository-Level Code Generation', Proceedings International Conference on Software Engineering, 1372-1384 (2025)

DOI	10.1109/ICSE55347.2025.00228

2025

Bi Z, Wan Y, Chu Z, Hu Y, Zhang J, Zhang H, Xu G, Jin H, 'How to Select Pre-Trained Code Models for Reuse? A Learning Perspective', Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025, 627-638 (2025)

DOI	10.1109/SANER64311.2025.00065

2025

Gui Y, Li Z, Wan Y, Shi Y, Zhang H, Su Y, Chen B, Chen D, Wu S, Zhou X, Jiang W, Jin H, Zhang X, 'WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs', WWW 2025: Proceedings of the ACM Web Conference, 1833-1844 (2025) [E1]

DOI	10.1145/3696410.3714889

2025

Gui Y, Wan Y, Li Z, Zhang Z, Chen D, Zhang H, Su Y, Chen B, Zhou X, Jiang W, Zhang X, 'UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs', WWW 2025: Proceedings of the ACM Web Conference, 1846-1855 (2025) [E1]

DOI	10.1145/3696410.3714891
Citations	Scopus - 1

2025

Xie Y, Zhang H, Babar MA, 'Multivariate Time Series Anomaly Detection by Capturing Coarse-Grained Intra- and Inter-Variate Dependencies', WWW 2025: Proceedings of the ACM Web Conference, 697-705 (2025) [E1]

DOI	10.1145/3696410.3714941

2025

Pham L, Zhang H, Ha H, Salim F, Zhang X, 'RCAEval: A Benchmark for Root Cause Analysis of Microservice Systems with Telemetry Data', WWW Companion 2025: Companion Proceedings of the ACM Web Conference 2025, 777-780 (2025) [E1]

DOI	10.1145/3701716.3715290

2025

Le VH, Xiao Y, Zhang H, 'Unleashing the True Potential of Semantic-Based Log Parsing with Pre-Trained Language Models', Proceedings International Conference on Software Engineering, 975-987 (2025)

DOI	10.1109/ICSE55347.2025.00174

2025

Shi E, Wang Y, Zhang F, Chen B, Zhang H, Wang Y, Guo D, Du L, Han S, Zhang D, Sun H, 'SoTaNa: An Open-Source Software Engineering Instruction-Tuned Model', Proceedings - 2025 IEEE/ACM 2nd International Conference on AI Foundation Models and Software Engineering, FORGE 2025, 26-37 (2025)

DOI	10.1109/Forge66646.2025.00010

2025

Jiang J, Li F, Zhao Z, Ye Z, Liu M, Wang B, Zhang H, Chen J, 'Boosting Redundancy-Based Automated Program Repair by Fine-Grained Pattern Mining', Proceedings - 2025 IEEE International Conference on Software Maintenance and Evolution, ICSME 2025, 85-97 (2025)

DOI	10.1109/ICSME64153.2025.00018

2025

Pan R, Zhang H, Jiang Z, Hou R, 'AgentDroid: A Multi-Agent Tool for Detecting Fraudulent Android Applications', 4009-4012 (2025)

DOI	10.1109/ase63991.2025.00362

2025

Shi Y, Qian Y, Zhang H, Shen B, Gu X, 'LongCodeZip: Compress Long Context for Code Language Models', 141-153 (2025)

DOI	10.1109/ase63991.2025.00020

2025

Qi J, Luan Z, Huang S, Fung C, Wang Y, Wang A, Zhang H, Yang H, Qian D, 'LogMoE: Lightweight Expert Mixture for Cross-System Log Anomaly Detection', 330-341 (2025)

DOI	10.1109/ase63991.2025.00035

2024

Liu C, Zhang X, Zhang H, Wan Z, Huang Z, Yan M, 'An Empirical Study of Code Search in Intelligent Coding Assistant: Perceptions, Expectations, and Directions', Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, FSE Companion 2024, 283-293 (2024) [E1]

DOI	10.1145/3663529.3663848
Citations	Scopus - 5

2024

Liu Y, Zhang H, Le VH, Miao Y, Li Z, 'Local Search-based Approach for Cost-effective Job Assignment on Large Language Models', GECCO 2024 Companion: Proceedings of the 2024 Genetic and Evolutionary Computation Conference Companion, 719-722 (2024) [E1]

DOI	10.1145/3638530.3654104
Co-authors	Sky Miao

2024

Li X, Le VH, Zhang H, Chen P, 'LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data', Proceedings International Conference on Software Engineering (2024) [E1]

DOI	10.1145/3597503.3608129
Citations	Scopus - 6

2024

Qi B, Sun H, Zhang H, Zhao R, Gao X, 'Modularizing while Training: A New Paradigm for Modularizing DNN Models', Proceedings International Conference on Software Engineering (2024) [E1]

DOI	10.1145/3597503.3608135
Citations	Scopus - 2

2024

Xu Z, Qiang S, Song D, Zhou M, Wan H, Zhao X, Luo P, Zhang H, 'DSFM: Enhancing Functional Code Clone Detection with Deep Subtree Interactions', Proceedings International Conference on Software Engineering, 2733-2744 (2024) [E1]

DOI	10.1145/3597503.3639215
Citations	Scopus - 2

2024

Gao Y, He Y, Li X, Zhao B, Lin H, Liang Y, Zhong J, Zhang H, Wang J, Zeng Y, Gui K, Tong J, Yang M, 'An Empirical Study on Low GPU Utilization of Deep Learning Jobs', Proceedings International Conference on Software Engineering, 1171-1183 (2024) [E1]

DOI	10.1145/3597503.3639232
Citations	Scopus - 4

2024

Liu Y, Zhang H, Li Z, Miao Y, 'Optimizing the Utilization of Large Language Models via Schedule Optimization: An Exploratory Study', International Symposium on Empirical Software Engineering and Measurement, 84-95 (2024)

DOI	10.1145/3674805.3686671
Co-authors	Sky Miao

2024

Xiao Y, Le VH, Zhang H, 'Demonstration-Free: Towards More Practical Log Parsing with Large Language Models', Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024, 153-165 (2024) [E1]

DOI	10.1145/3691620.3694994
Citations	Scopus - 6

2024

Sun Z, Wan Y, Li J, Zhang H, Jin Z, Li G, Lyu C, 'Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates', Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024, 229-241 (2024) [E1]

DOI	10.1145/3691620.3695000
Citations	Scopus - 5

2024

Pham L, Ha H, Zhang H, 'Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?', Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024, 706-718 (2024) [E1]

DOI	10.1145/3691620.3695065
Citations	Scopus - 1

2024

Liu Y, Zhang H, Miao Y, Le VH, Li Z, 'OptLLM: Optimal Assignment of Queries to Large Language Models', Proceedings of the IEEE International Conference on Web Services, ICWS, 788-798 (2024) [E1]

DOI	10.1109/ICWS62655.2024.00098
Citations	Scopus - 2
Co-authors	Sky Miao

2024

Liao Y, Xu M, Lin Y, Teoh X, Xie X, Feng R, Liaw F, Zhang H, Dong JS, 'Detecting and Explaining Anomalies Caused by Web Tamper Attacks via Building Consistency-based Normality', Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024, 531-543 (2024) [E1]

DOI	10.1145/3691620.3695024
Citations	Scopus - 2

2024

Luo C, Lyu S, Zhao Q, Wu W, Zhang H, Hu C, 'Beyond Pairwise Testing: Advancing 3-wise Combinatorial Interaction Testing for Highly Configurable Systems', Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 641-653 (2024) [E1]

DOI	10.1145/3650212.3680309
Citations	Scopus - 7

2024

Guo L, Wang Y, Shi E, Zhong W, Zhang H, Chen J, Zhang R, Ma Y, Zheng Z, 'When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention', Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1073-1085 (2024) [E1]

DOI	10.1145/3650212.3680343
Citations	Scopus - 1

2024

Chen Y, Gao C, Yang Z, Zhang H, Liao Q, 'Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code', Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 274-286 (2024) [E1]

DOI	10.1145/3650212.3652127
Citations	Scopus - 2

2024

Chu Z, Wan Y, Li Q, Wu Y, Zhang H, Sui Y, Xu G, Jin H, 'Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation', Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 389-401 (2024) [E1]

DOI	10.1145/3650212.3652136
Citations	Scopus - 2

2024

Nguyen HT, Nguyen LV, Le VH, Zhang H, Le MT, 'Efficient Log-based Anomaly Detection with Knowledge Distillation', Proceedings of the IEEE International Conference on Web Services, ICWS, 578-589 (2024) [E1]

DOI	10.1109/ICWS62655.2024.00078
Citations	Scopus - 3

2024

Liu Y, Zhang H, Li Z, Miao Y, 'CPLS: Optimizing the Assignment of LLM Queries', Proceedings - 2024 IEEE International Conference on Software Maintenance and Evolution, ICSME 2024, 151-162 (2024) [E1]

DOI	10.1109/ICSME58944.2024.00024
Co-authors	Sky Miao

2024

Bi Z, Wan Y, Wang Z, Zhang H, Guan B, Lu F, Zhang Z, Sui Y, Jin H, Shi X, 'Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback', Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2336-2353 (2024)

DOI	10.18653/v1/2024.findings-acl.138

2023

Gao Y, Shi X, Lin H, Zhang H, Wu H, Li R, Yang M, 'An Empirical Study on Quality Issues of Deep Learning Platform', 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP, 455-466 (2023) [E1]

DOI	10.1109/ICSE-SEIP58684.2023.00052
Citations	Scopus - 9, Web of Science - 2

2023

Le V-H, Zhang H, 'Log Parsing: How Far Can ChatGPT Go?', 2023 38th IEEE/ACM International Conference on Automated Software Engineering, ASE, 1699-1704 (2023) [E1]

DOI	10.1109/ASE56229.2023.00206
Citations	Scopus - 3, Web of Science - 9

2023

Shi E, Wang Y, Zhang H, Du L, Han S, Zhang D, Sun H, 'Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond', Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023, 39-51 (2023)

DOI	10.1145/3597926.3598036
Citations	Scopus - 3, Web of Science - 6

2023

Liu J, He S, Chen Z, Li L, Kang Y, Zhang X, He P, Zhang H, Lin Q, Xu Z, Rajmohan S, Zhang D, Lyu MR, 'Incident-aware Duplicate Ticket Aggregation for Cloud Systems', 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE, 2299-2311 (2023) [E1]

DOI	10.1109/ICSE48619.2023.00193
Citations	Scopus - 8, Web of Science - 1

2023

Zhang J, Wang X, Zhang H, Sun H, Liu X, Hu C, Liu Y, 'Detecting Condition-Related Bugs with Control Flow Graph Neural Network', ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1370-1382 (2023)

DOI	10.1145/3597926.3598142

2023

Zeng Z, Zhang Y, Xu Y, Ma M, Qiao B, Zou W, Chen Q, Zhang M, Zhang X, Zhang H, Gao X, Fan H, Rajmohan S, Lin Q, Zhang D, 'TraceArk: Towards Actionable Performance Anomaly Alerting for Online Service Systems', 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP, 258-269 (2023)

DOI	10.1109/ICSE-SEIP58684.2023.00029
Citations	Scopus - 3

2023

Zhao Q, Luo C, Cai S, Wu W, Lin J, Zhang H, Hu C, 'CAmpactor: A Novel and Effective Local Search Algorithm for Optimizing Pairwise Covering Arrays', ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 81-93 (2023) [E1]

DOI	10.1145/3611643.3616284
Citations	Scopus - 8, Web of Science - 1

2023

Lin Q, Li T, Zhao P, Liu Y, Ma M, Zheng L, Chintalapati M, Liu B, Wang P, Zhang H, Dang Y, Rajmohan S, Zhang D, 'EDITS: An Easy-to-difficult Training Strategy for Cloud Failure Prediction', ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023, 371-375 (2023) [E1]

DOI	10.1145/3543873.3584630
Citations	Scopus - 1, Web of Science - 5

2023

Shi E, Wang Y, Gu W, Du L, Zhang H, Han S, Zhang D, Sun H, 'CoCoSoDa: Effective Contrastive Learning for Code Search', 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE, 2198-2210 (2023) [E1]

DOI	10.1109/ICSE48619.2023.00185
Citations	Scopus - 4, Web of Science - 12

2023

Le V-H, Zhang H, 'Log Parsing with Prompt-based Few-shot Learning', 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE, 2438-2449 (2023) [E1]

DOI	10.1109/ICSE48619.2023.00204
Citations	Scopus - 7, Web of Science - 9

2023

Li L, Zhang X, He S, Kang Y, Zhang H, Ma M, Dang Y, Xu Z, Rajmohan S, Lin Q, Zhang D, 'CONAN: Diagnosing Batch Failures for Cloud Systems', 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP, 138-149 (2023) [E1]

DOI	10.1109/ICSE-SEIP58684.2023.00018
Citations	Scopus - 1, Web of Science - 4

2023

Meng X, Wang X, Zhang H, Sun H, Liu X, Hu C, 'Template-based Neural Program Repair', Proceedings International Conference on Software Engineering, 1456-1468 (2023)

DOI	10.1109/ICSE48619.2023.00127

2023

Gao S, Zhang H, Gao C, Wang C, 'Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models', Proceedings International Conference on Software Engineering, 30-42 (2023)

DOI	10.1109/ICSE48619.2023.00015

2023

Wen XC, Chen Y, Gao C, Zhang H, Zhang JM, Liao Q, 'Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning', Proceedings International Conference on Software Engineering, 2275-2286 (2023)

DOI	10.1109/ICSE48619.2023.00191

2023

Qi B, Sun H, Gao X, Zhang H, Li Z, Liu X, 'Reusing Deep Neural Network Models through Model Re-engineering', Proceedings International Conference on Software Engineering, 983-994 (2023)

DOI	10.1109/ICSE48619.2023.00090

2023

Wang Y, Guo L, Shi E, Chen W, Chen J, Zhong W, Wang M, Li H, Zhang H, Lyu Z, Zheng Z, 'You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search', Proceedings - 2023 IEEE International Conference on Software Maintenance and Evolution, ICSME 2023, 14-25 (2023)

DOI	10.1109/ICSME58846.2023.00014

2023

Xu Z, Zhou M, Zhao X, Chen Y, Cheng X, Zhang H, 'xASTNN: Improved Code Representations for Industrial Practice', ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1727-1738 (2023) [E1]

DOI	10.1145/3611643.3613869
Citations	Scopus - 5, Web of Science - 1

2023

Gao Y, Gu X, Zhang H, Lin H, Yang M, 'Runtime Performance Prediction for Deep Learning Models with Graph Neural Network', 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP, 368-380 (2023) [E1]

DOI	10.1109/ICSE-SEIP58684.2023.00039
Citations	Scopus - 4, Web of Science - 9

2023

Wang Y, Wang J, Zhang H, Wang K, Wang Q, 'What are Pros and Cons? Stance Detection and Summarization on Feature Request', International Symposium on Empirical Software Engineering and Measurement (2023) [E1]

DOI	10.1109/ESEM56168.2023.10304865

2023

Feng Q, Sui Y, Zhang H, 'Uncovering Limitations in Text-to-Image Generation: A Contrastive Approach with Structured Semantic Alignment', Findings of the Association for Computational Linguistics: EMNLP 2023, 8876-8888 (2023) [E1]

Citations	Scopus - 1

2023

Qiao S, Zhou W, Wen J, Zhang H, Gao M, 'Bi-channel Multiple Sparse Graph Attention Networks for Session-based Recommendation', International Conference on Information and Knowledge Management, Proceedings, 2075-2084 (2023) [E1]

DOI	10.1145/3583780.3614791
Citations	Scopus - 2, Web of Science - 4

2023

Hu F, Wang Y, Du L, Li X, Zhang H, Han S, Zhang D, 'Revisiting Code Search in a Two-Stage Paradigm', WSDM 2023 - Proceedings of the 16th ACM International Conference on Web Search and Data Mining, 994-1002 (2023) [E1]

DOI	10.1145/3539597.3570383
Citations	Scopus - 2

2022

Song X, Yan J, Huang Y, Sun H, Zhang H, 'A Collaboration-Aware Approach to Profiling Developer Expertise with Cross-Community Data', 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security, QRS, 344-355 (2022) [E1]

DOI	10.1109/QRS57517.2022.00043
Citations	Scopus - 4, Web of Science - 1

2022

Liu Y, Zhang X, He S, Zhang H, Li L, Kang Y, Xu Y, Ma M, Lin Q, Dang Y, Rajmohan S, Zhang D, 'UniParser: A Unified Log Parser for Heterogeneous Log Data', WWW 2022: Proceedings of the ACM Web Conference 2022, 1893-1901 (2022) [E1]

DOI	10.1145/3485447.3511993
Citations	Scopus - 1, Web of Science - 3

2022

Wang X, Wu Q, Zhang H, Lyu C, Jiang X, Zheng Z, Lyu L, Hu S, 'HELoC: Hierarchical Contrastive Learning of Source Code Representation', IEEE International Conference on Program Comprehension, 2022-March, 354-365 (2022) [E1]

DOI	10.1145/3524610.3527896
Citations	Scopus - 2, Web of Science - 1

2022

Tang W, Wang Y, Zhang H, Han S, Luo P, Zhang D, 'LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries', Proceedings: 2022 Mining Software Repositories Conference (MSR 2022), 423-434 (2022) [E1]

DOI	10.1145/3524842.3528442
Citations	Scopus - 3, Web of Science - 1

2022

Gui Y, Wan Y, Zhang H, Huang H, Sui Y, Xu G, Shao Z, Jin H, 'Cross-Language Binary-Source Code Matching with Intermediate Representations', Proceedings: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022), 601-612 (2022) [E1]

DOI	10.1109/SANER53432.2022.00077
Citations	Scopus - 3, Web of Science - 8

2022

Chen Z, Liu J, Su Y, Zhang H, Ling X, Yang Y, Lyu MR, 'Adaptive Performance Anomaly Detection for Online Service Systems via Pattern Sketching', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 61-72 (2022) [E1]

DOI	10.1145/3510003.3510085
Citations	Scopus - 4, Web of Science - 14

2022

Chai Y, Zhang H, Shen B, Gu X, 'Cross-Domain Deep Code Search with Meta Learning', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 487-498 (2022) [E1]

Citations	Scopus - 3, Web of Science - 21

2022

Meng X, Wang X, Zhang H, Sun H, Liu X, 'Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 1169-1180 (2022) [E1]

DOI	10.1145/3510003.3510147
Citations	Scopus - 7, Web of Science - 22

2022

Le V-H, Zhang H, 'Log-based Anomaly Detection with Deep Learning: How Far Are We?', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 1356-1367 (2022) [E1]

DOI	10.1145/3510003.3510155
Citations	Scopus - 2, Web of Science - 59

2022

Shi E, Wang Y, Du L, Chen J, Han S, Zhang H, Zhang D, Sun H, 'On the Evaluation of Neural Code Summarization', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 1597-1608 (2022) [E1]

DOI	10.1145/3510003.3510060
Citations	Scopus - 6, Web of Science - 31

2022

Gao Y, Li Z, Lin H, Zhang H, Wu M, Yang M, 'REFTY: Refinement Types for Valid Deep Learning Models', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 1843-1855 (2022) [E1]

DOI	10.1145/3510003.3510077
Citations	Scopus - 5, Web of Science - 2

2022

Wan Y, Zhao W, Zhang H, Sui Y, Xu G, Jin H, 'What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 2377-2388 (2022) [E1]

DOI	10.1145/3510003.3510050
Citations	Scopus - 7, Web of Science - 36

2022

Wan Y, He Y, Bi Z, Zhang J, Sui Y, Zhang H, et al., 'NaturalCC: An Open-Source Toolkit for Code Intelligence', 2022 ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion 2022), Pittsburgh, PA (2022) [E1]

DOI	10.1145/3510454.3516863

2022

Gu W, Wang Y, Du L, Zhang H, Han S, Zhang D, Lyu MR, 'Accelerating Code Search with Deep Hashing and Code Classification', Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Volume 1: Long Papers, 2534-2544 (2022) [E1]

Citations	Scopus - 1, Web of Science - 6

2022

Xie Y, Zhang H, Babar MA, 'LogGD: Detecting Anomalies from System Logs with Graph Neural Networks', 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security, QRS, 299-310 (2022) [E1]

DOI	10.1109/QRS57517.2022.00039
Citations	Scopus - 3, Web of Science - 7

2022

Liu Y, Yang H, Zhao P, Ma M, Wen C, Zhang H, Luo C, Lin Q, Yi C, Wang J, Zhang C, Wang P, Dang Y, Rajmohan S, Zhang D, 'Multi-task Hierarchical Classification for Disk Failure Prediction in Online Service Systems', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 3438-3446 (2022) [E1]

DOI	10.1145/3534678.3539176
Citations	Scopus - 2, Web of Science - 6

2022

Wan Y, He Y, Bi Z, Zhang J, Sui Y, Zhang H, Hashimoto K, Jin H, Xu G, Xiong C, Yu PS, 'NaturalCC: An Open-Source Toolkit for Code Intelligence', 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 149-153 (2022) [E1]

DOI	10.1145/3510454.3516863
Citations	Scopus - 1, Web of Science - 4

2022

Ma M, Liu Y, Tong Y, Li H, Zhao P, Xu Y, Zhang H, He S, Wang L, Dang Y, Rajmohan S, Lin Q, 'An empirical investigation of missing data handling in cloud node failure prediction', ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1453-1464 (2022) [E1]

DOI	10.1145/3540250.3558946
Citations	Scopus - 2, Web of Science - 1

2022

Wang C, Yang Y, Gao C, Peng Y, Zhang H, Lyu MR, 'No more fine-tuning? an experimental evaluation of prompt tuning in code intelligence', ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 382-394 (2022) [E1]

DOI	10.1145/3540250.3549113
Citations	Scopus - 1, Web of Science - 2

2022

Wan Y, Zhang S, Zhang H, Sui Y, Xu G, Yao D, Jin H, Sun L, 'You see what I want you to see: poisoning vulnerabilities in neural code search', ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1233-1245 (2022) [E1]

DOI	10.1145/3540250.3549153
Citations	Scopus - 4, Web of Science - 1

2022

Zhang Z, Zhang H, Shen B, Gu X, 'Diet code is healthy: simplifying programs for pre-trained models of code', ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1073-1084 (2022) [E1]

DOI	10.1145/3540250.3549094
Citations	Scopus - 3, Web of Science - 1

2022

Luo C, Zhao Q, Cai S, Zhang H, Hu C, 'SamplingCA: effective and efficient sampling-based pairwise testing for highly configurable software systems', ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1185-1197 (2022) [E1]

DOI	10.1145/3540250.3549155
Citations	Scopus - 1, Web of Science - 3

2022

Wang X, Zhang X, Li L, He S, Zhang H, Liu Y, Zheng L, Kang Y, Lin Q, Dang Y, Rajmohan S, Zhang D, 'SPINE: a scalable log parser with feedback guidance', ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1198-1208 (2022) [E1]

DOI	10.1145/3540250.3549176
Citations	Scopus - 4, Web of Science - 1

2022

Li H, Miao C, Leung C, Huang Y, Huang Y, Zhang H, Wang Y, 'Exploring Representation-Level Augmentation for Code Search', Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, 4924-4936 (2022) [E1]

Citations	Scopus - 1

2022

Shi E, Wang Y, Tao W, Du L, Zhang H, Han S, Zhang D, Sun H, 'RACE: Retrieval-Augmented Commit Message Generation', Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, 5520-5530 (2022) [E1]

Citations	Scopus - 3

2022

Wang L, Zhao P, Du C, Luo C, Su M, Yang F, Liu Y, Lin Q, Wang M, Dang Y, Zhang H, Rajmohan S, Zhang D, 'NENYA: Cascade Reinforcement Learning for Cost-Aware Failure Mitigation at Microsoft 365', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 4032-4040 (2022) [E1]

DOI	10.1145/3534678.3539127
Citations	Scopus - 2, Web of Science - 1

2022

Wang Y, Wang J, Zhang H, Ming X, Shi L, Wang Q, 'Where is Your App Frustrating Users?', 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 2427-2439 (2022) [E1]

DOI	10.1145/3510003.3510189
Citations	Scopus - 2, Web of Science - 8

2022

Qi B, Sun H, Gao X, Zhang H, 'Patching Weak Convolutional Neural Network Models through Modularization and Composition', ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1-12 (2022) [E1]

DOI	10.1145/3551349.3561153
Citations	Scopus - 1, Web of Science - 3

2021

Xie Y, Zhang H, Zhang B, Babar MA, Lu S, 'LogDP: Combining Dependency and Proximity for Log-Based Anomaly Detection', Service-Oriented Computing 19th International Conference, ICSOC 2021 Virtual Event, November 22–25, 2021 Proceedings, 13121 LNCS, 708-716 (2021) [E1]

DOI	10.1007/978-3-030-91431-8_47
Citations	Scopus - 8, Web of Science - 6

2021

Luo C, Qiao B, Xing W, Chen X, Zhao P, Du C, Yao R, Zhang H, Wu W, Cai S, He B, Rajmohan S, Lin Q, 'Correlation-Aware Heuristic Search for Intelligent Virtual Machine Provisioning in Cloud Systems', Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence AAAI 2021, 12363-12372 (2021) [E1]

Citations	Scopus - 2, Web of Science - 1

2021

Gao Y, Zhu Y, Zhang H, Lin H, Yang M, 'Resource-Guided Configuration Space Reduction for Deep Learning Models', 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE 2021), 175-187 (2021) [E1]

DOI	10.1109/ICSE43902.2021.00028
Citations	Scopus - 1, Web of Science - 7

2021

Luo C, Zhao P, Chen C, Qiao B, Du C, Zhang H, Wu W, Cai S, He B, Rajmohan S, Lin Q, 'PULNS: Positive-Unlabeled Learning with Effective Negative Sample Selector', Proceedings of the AAAI Conference on Artificial Intelligence, 8784-8792 (2021) [E1]

Citations	Scopus - 4, Web of Science - 1

2021

Le VH, Zhang H, 'Log-based Anomaly Detection Without Log Parsing', Proceedings - 2021 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, 492-504 (2021) [E1]

DOI	10.1109/ASE51524.2021.9678773
Citations	Scopus - 2, Web of Science - 9

2021

Tao W, Wang Y, Shi E, Du L, Han S, Zhang H, Zhang D, Zhang W, 'On the Evaluation of Commit Message Generation Models: An Experimental Study', Proceedings - 2021 IEEE International Conference on Software Maintenance and Evolution, ICSME 2021, 126-136 (2021) [E1]

DOI	10.1109/ICSME52107.2021.00018
Citations	Scopus - 4, Web of Science - 2

2021

Zhang X, Du C, Li Y, Xu Y, Zhang H, Qin S, Li Z, Lin Q, Dang Y, Zhou A, Rajmohan S, Zhang D, 'HALO: Hierarchy-aware Fault Localization for Cloud Systems', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 3948-3958 (2021) [E1]

DOI	10.1145/3447548.3467190
Citations	Scopus - 2, Web of Science - 1

2021

Zhang X, Xu Y, Qin S, He S, Qiao B, Li Z, Zhang H, Li X, Dang Y, Lin Q, Chintalapati M, Rajmohan S, Zhang D, 'Onion: Identifying incident-indicating logs for cloud systems', ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1253-1263 (2021) [E1]

DOI	10.1145/3468264.3473919
Citations	Scopus - 4, Web of Science - 2

2021

Qiao B, Yang F, Luo C, Wang Y, Li J, Lin Q, Zhang H, Datta M, Zhou A, Moscibroda T, Rajmohan S, Zhang D, 'Intelligent container reallocation at Microsoft 365', ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1438-1443 (2021) [E1]

DOI	10.1145/3468264.3473936
Citations	Scopus - 8, Web of Science - 7

2021

Luo C, Sun B, Qiao B, Chen J, Zhang H, Lin J, Lin Q, Zhang D, 'LS-sampling: An effective local search based sampling approach for achieving high t-wise coverage', ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1081-1092 (2021) [E1]

DOI	10.1145/3468264.3468622
Citations	Scopus - 2, Web of Science - 1

2021

Dong H, Qin S, Xu Y, Qiao B, Zhou S, Yang X, Luo C, Zhao P, Lin Q, Zhang H, Abuduweili A, Ramanujan S, Subramanian K, Zhou A, Rajmohan S, Zhang D, Moscibroda T, 'Effective low capacity status prediction for cloud systems', ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1236-1241 (2021) [E1]

DOI	10.1145/3468264.3473917
Citations	Scopus - 2, Web of Science - 1

2021

Wu D, Jing XY, Zhang H, Zhou Y, Xu B, 'Leveraging Stack Overflow to Detect Relevant Tutorial Fragments of APIs', Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021, 119-130 (2021) [E1]

DOI	10.1109/SANER50967.2021.00020
Citations	Scopus - 7, Web of Science - 4

2021

Li L, Zhang X, Zhao X, Zhang H, Kang Y, Zhao P, Qiao B, He S, Lee P, Sun J, Gao F, Yang L, Lin Q, Rajmohan S, Xu Z, Zhang D, 'Fighting the Fog of War: Automated Incident Detection for Cloud Systems', Proceedings of the 2021 USENIX Annual Technical Conference, 489-502 (2021) [E1]

Citations	Scopus - 3, Web of Science - 19

2021

Gu X, Han YS, Kim S, Zhang H, 'Do bugs propagate? an empirical analysis of temporal correlations among software bugs', 35th European Conference on Object-Oriented Programming. Leibniz International Proceedings in Informatics, 194, 11:1-11:21 (2021) [E1]

DOI	10.4230/LIPIcs.ECOOP.2021.11
Citations	Scopus - 2

2021

Luo C, Qiao B, Chen X, Zhao P, Yao R, Zhang H, Wu W, Zhou A, Lin Q, 'Intelligent Virtual Machine Provisioning in Cloud Computing', Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 1495-1502 [E1]

Citations	Scopus - 3, Web of Science - 2

2021

Luo C, Zhao P, Qiao B, Wu Y, Zhang H, Wu W, Lu W, Dang Y, Rajmohan S, Lin Q, Zhang D, 'NTAM: Neighborhood-temporal attention model for disk failure prediction in cloud platforms', The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021, 1181-1191 (2021) [E1]

DOI	10.1145/3442381.3449867
Citations	Scopus - 3, Web of Science - 1

2021

Chen Z, Liu J, Su Y, Zhang H, Wen X, Ling X, Yang Y, Lyu MR, 'Graph-based Incident Aggregation for Large-Scale Online Service Systems', Proceedings - 2021 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, 430-442 (2021) [E1]

DOI	10.1109/ASE51524.2021.9678746
Citations	Scopus - 2, Web of Science - 1

2021

Wang W, Chen J, Yang L, Zhang H, Zhao P, Qiao B, Kang Y, Lin Q, Rajmohan S, Gao F, Xu Z, Dang Y, Zhang D, 'How Long Will it Take to Mitigate this Incident for Online Service Systems?', Proceedings - International Symposium on Software Reliability Engineering, ISSRE, 2021-October, 36-46 (2021) [E1]

DOI	10.1109/ISSRE52982.2021.00017
Citations	Scopus - 1, Web of Science - 9

2021

Wang Y, Li G, Wang Z, Kang Y, Zhou Y, Zhang H, Gao F, Sun J, Yang L, Lee P, Xu Z, Zhao P, Qiao B, Li L, Zhang X, Lin Q, 'Fast Outage Analysis of Large-Scale Production Clouds with Service Correlation Mining', 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 885-896 (2021) [E1]

DOI	10.1109/icse43902.2021.00085
Citations	Scopus - 2, Web of Science - 1

2021

Luo C, Lin J, Cai S, Chen X, He B, Qiao B, Zhao P, Lin Q, Zhang H, Wu W, Rajmohan S, Zhang D, 'AutoCCAG: An Automated Approach to Constrained Covering Array Generation', 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 201-212 (2021) [E1]

DOI	10.1109/icse43902.2021.00030
Citations	Scopus - 1, Web of Science - 1

2021

Chen J, Xu N, Chen P, Zhang H, 'Efficient Compiler Autotuning via Bayesian Optimization', 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 1198-1209 (2021) [E1]

DOI	10.1109/icse43902.2021.00110
Citations	Scopus - 6, Web of Science - 3

2021

Shi E, Wang Y, Du L, Zhang H, Han S, Zhang D, Sun H, 'CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees', EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, 4053-4062 (2021) [E1]

DOI	10.18653/v1/2021.emnlp-main.332
Citations	Scopus - 5, Web of Science - 2

2021

Kang Y, Wang Z, Zhang H, Chen J, You H, 'APIRecX: Cross-Library API Recommendation via Pre-Trained Language Model', EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, 3425-3436 (2021) [E1]

DOI	10.18653/v1/2021.emnlp-main.275
Citations	Scopus - 2, Web of Science - 1

2020

Mirjalili S, Zhang H, Mirjalili S, Chalup S, Noman N, 'A Novel U-Shaped Transfer Function for Binary Particle Swarm Optimisation', Soft Computing for Problem Solving 2019. Proceedings of SocProS 2019, 241-259 (2020) [E1]

DOI	10.1007/978-981-15-3290-0_19
Citations	Scopus - 6
Co-authors	Nasimul Noman, Stephan Chalup

2020

Zhou J, Li F, Dong J, Zhang H, Hao D, 'Cost-Effective Testing of a Deep Learning Model through Input Reduction', 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 289-300 (2020) [E1]

DOI	10.1109/ISSRE5003.2020.00035
Citations	Scopus - 2, Web of Science - 1

2020

Xu Y, Sui K, Yao R, Zhang H, Lin Q, Dang Y, Li P, Jiang K, Zhang W, Lou JG, Chintalapati M, Zhang D, 'Improving service availability of cloud systems by predicting disk error', Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018, 481-493 (2020)

High service availability is crucial for cloud systems. A typical cloud system uses a large number of physical hard disk drives, and disk errors are among the most important causes of service unavailability. Disk errors (such as sector errors and latency errors) can be seen as a form of gray failure: fairly subtle failures that are hard to detect, even when applications are afflicted by them. In this paper, we propose to predict disk errors proactively, before they cause more severe damage to the cloud system. The ability to predict faulty disks enables the live migration of existing virtual machines and the allocation of new virtual machines to healthy disks, thereby improving service availability. To build an accurate online prediction model, we utilize both disk-level sensor (SMART) data and system-level signals. We develop a cost-sensitive, ranking-based machine learning model that learns the characteristics of disks that were faulty in the past and ranks disks by their error-proneness in the near future. We evaluate our approach using real-world data collected from a production cloud system. The results confirm that the proposed approach is effective and outperforms related methods. Furthermore, we have successfully applied the proposed approach to improve the service availability of Microsoft Azure.

Citations	Scopus - 103

2020

Zhang B, Zhang H, Moscato P, Zhang A, 'Anomaly Detection via Mining Numerical Workflow Relations from Logs', 2020 International Symposium on Reliable Distributed Systems (SRDS), 195-204 (2020) [E1]

DOI	10.1109/SRDS51746.2020.00027
Citations	Scopus - 2, Web of Science - 1
Co-authors	Pablo Moscato

2020

Shu Y, Sui Y, Zhang H, Xu G, 'Perf-AL: Performance Prediction for Configurable Software through Adversarial Learning', Proceedings of the 14th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (2020) [E1]

DOI	10.1145/3382494.3410677
Citations	Scopus - 1

2020

Zhang J, Wang X, Zhang H, Sun H, Pu Y, Liu X, 'Learning to Handle Exceptions', 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 29-41 (2020) [E1]

Citations	Scopus - 2, Web of Science - 1

2020

Zhang R, Xiao W, Zhang H, Liu Y, Lin H, Yang M, 'An Empirical Study on Program Failures of Deep Learning Jobs', Proceedings of the 2020 ACM/IEEE 42nd International Conference on Software Engineering (ICSE), 1159-1170 (2020) [E1]

DOI	10.1145/3377811.3380362
Citations	Scopus - 9, Web of Science - 6

2020

Zhang J, Wang X, Zhang H, Sun H, Liu X, 'Retrieval-Based Neural Source Code Summarization', Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 1385-1397 (2020) [E1]

DOI	10.1145/3377811.3380383
Citations	Scopus - 2, Web of Science - 1

2020

Chen Y, Yang X, Dong H, He X, Zhang H, Lin Q, Chen J, Zhao P, Kang Y, Gao F, Xu Z, Zhang D, 'Identifying Linked Incidents in Large-Scale Online Service Systems', Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 304-314 (2020) [E1]

DOI	10.1145/3368089.3409768
Citations	Scopus - 4, Web of Science - 2

2020

Gao Y, Liu Y, Zhang H, Li Z, Zhu Y, Lin H, Yang M, 'Estimating GPU Memory Consumption of Deep Learning Models', Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1342-1352 (2020) [E1]

DOI	10.1145/3368089.3417050
Citations	Scopus - 1, Web of Science - 6

2020

Chen Z, Kang Y, Li L, Zhang X, Zhang H, Xu H, Zhou Y, Yang L, Sun J, Xu Z, Dang Y, Gao F, Zhao P, Qiao B, Lin Q, Zhang D, Lyu MR, 'Towards Intelligent Incident Management: Why We Need It and How We Make It', Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1487-1497 (2020) [E1]

DOI	10.1145/3368089.3417055
Citations	Scopus - 8, Web of Science - 4

2020

Jiang J, Lu W, Chen J, Lin Q, Zhao P, Kang Y, Zhang H, Xiong Y, Gao F, Xu Z, Dang Y, Zhang D, 'How to Mitigate the Incident? An Effective Troubleshooting Guide Recommendation Technique for Online Service Systems', Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1410-1420 (2020) [E1]

DOI	10.1145/3368089.3417054
Citations	Scopus - 5, Web of Science - 3

2020

Gu J, Luo C, Qin S, Qiao B, Lin Q, Zhang H, Li Z, Dang Y, Cai S, Wu W, Zhou Y, Chintalapati M, Zhang D, 'Efficient Incident Identification from Multi-Dimensional Issue Reports via Meta-Heuristic Search', Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 292-303 (2020) [E1]

DOI	10.1145/3368089.3409741
Citations	Scopus - 3, Web of Science - 1

2020

Chen J, Zhang S, He X, Lin Q, Zhang H, Hao D, Kang Y, Gao F, Xu Z, Dang Y, Zhang D, 'How Incidental are the Incidents? Characterizing and Prioritizing Incidents for Large-Scale Online Service Systems', ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 373-384 (2020) [E1]

Citations	Scopus - 5, Web of Science - 4

2019

Zhang X, Xu Y, Lin Q, Qiao B, Zhang H, Dang Y, Xie C, Yang X, Cheng Q, Li Z, Chen J, He X, Yao R, Lou J-G, Chintalapati M, Shen F, Zhang D, 'Robust Log-based Anomaly Detection on Unstable Log Data', Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 807-817 (2019) [E1]

DOI	10.1145/3338906.3338931
Citations	Scopus - 6, Web of Science - 3

2019

Chen J, He X, Lin Q, Zhang H, Hao D, Gao F, Xu Z, Dang Y, Zhang D, 'Continuous Incident Triage for Large-Scale Online Service Systems', 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 364-375 (2019) [E1]

DOI	10.1109/ASE.2019.00042
Citations	Scopus - 8, Web of Science - 6

2019

Gu X, Zhang H, Kim S, 'CodeKernel: A Graph Kernel Based Approach to the Selection of API Usage Examples', 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 590-601 (2019) [E1]

DOI	10.1109/ASE.2019.00061
Citations	Scopus - 2, Web of Science - 2

2019

Chen J, Wang G, Hao D, Xiong Y, Zhang H, Zhang L, 'History-guided configuration diversification for compiler test-program generation', Proceedings of the 34th International Conference on Automated Software Engineering, 305-316 (2019) [E1]

DOI	10.1109/ASE.2019.00037
Citations	Scopus - 5, Web of Science - 3

2019

Lin J, Cai S, Luo C, Lin Q, Zhang H, 'Towards More Efficient Meta-heuristic Algorithms for Combinatorial Test Generation', Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 212-222 (2019) [E1]

DOI	10.1145/3338906.3338914
Citations	Scopus - 2, Web of Science - 1

2019

Zhang X, Lin Q, Xu Y, Qin S, Zhang H, Qiao B, Dang Y, Yang X, Cheng Q, Chintalapati M, Wu Y, Hsieh K, Sui K, Meng X, Xu Y, Zhang W, Shen F, Zhang D, 'Cross-dataset Time Series Anomaly Detection for Cloud Systems', Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, 1063-1076 (2019) [E1]

Citations	Scopus - 8, Web of Science - 5

2019

Chen J, He X, Lin Q, Xu Y, Zhang H, Hao D, Gao F, Xu Z, Dang Y, Zhang D, 'An Empirical Investigation of Incident Triage for Online Service Systems', 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 111-120 (2019) [E1]

DOI	10.1109/ICSE-SEIP.2019.00020
Citations	Scopus - 1, Web of Science - 6

2019

Luo C, Hoos HH, Cai S, Lin Q, Zhang H, Zhang D, 'Local Search with Efficient Automatic Configuration for Minimum Vertex Cover', Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, 1297-1304 (2019) [E1]

DOI	10.24963/ijcai.2019/180
Citations	Scopus - 4, Web of Science - 3

2019

Zhang J, Wang X, Zhang H, Sun H, Wang K, Liu X, 'A Novel Neural Source Code Representation Based on Abstract Syntax Tree', Proceedings of the 41st International Conference on Software Engineering, 783-794 (2019) [E1]

DOI	10.1109/ICSE.2019.00086
Citations	Scopus - 6, Web of Science - 3

2019

Ha H, Zhang H, 'DeepPerf: performance prediction for configurable software with deep sparse neural network', Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, 1095-1106 (2019) [E1]

DOI	10.1109/ICSE.2019.00113
Citations	Scopus - 9, Web of Science - 6

2019

Chen X, Qiao B, Zhang W, Wu W, Chintalapati M, Zhang D, Lin Q, Luo C, Li X, Zhang H, Xu Y, Dang Y, Sui K, Zhang X, 'Neural feature search: A neural architecture for automated feature engineering', Proceedings - IEEE International Conference on Data Mining, ICDM, 2019-November, 71-80 (2019) [E1]

DOI	10.1109/ICDM.2019.00017
Citations	Scopus - 5, Web of Science - 2

2019

Zhang B, Zhang H, Chen J, Hao D, Moscato P, 'Automatic Discovery and Cleansing of Numerical Metamorphic Relations', 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019, 235-245 (2019) [E1]

DOI	10.1109/ICSME.2019.00035
Citations	Web of Science - 1
Co-authors	Pablo Moscato

2019

Li C, Zhou M, Gu Z, Gu M, Zhang H, 'Ares: Inferring error specifications through static analysis', Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, 1174-1177 (2019) [E1]

DOI	10.1109/ASE.2019.00130
Citations	Scopus - 6, Web of Science - 6

2019

Ha H, Zhang H, 'Performance-Influence Model for Highly Configurable Software with Fourier Learning and Lasso Regression', Proceedings - 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019, 470-480 (2019) [E1]

DOI	10.1109/ICSME.2019.00080
Citations	Scopus - 2, Web of Science - 2

2019

Chen Y, Zhang H, Yang X, Lin Q, Zhang D, Dong H, Xu Y, Li H, Kang Y, Gao F, Xu Z, Dang Y, 'Outage Prediction and Diagnosis for Cloud Service Systems', The Web Conference. Proceedings of The World Wide Web Conference WWW 2019, 2659-2665 (2019) [E1]

DOI	10.1145/3308558.3313501
Citations	Scopus - 7, Web of Science - 5

2019

Zhang B, Zhang H, Chen J, Hao D, Moscato P, 'AutoMR: Automatic Discovery and Cleansing of Numerical Metamorphic Relations', 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME 2019), 246-246 (2019)

DOI	10.1109/ICSME.2019.00036
Citations	Scopus - 3, Web of Science - 3
Co-authors	Pablo Moscato

2018

Lin Q, Hsieh K, Dang Y, Zhang H, Sui K, Xu Y, Lou JG, Li C, Wu Y, Yao R, Chintalapati M, Zhang D, 'Predicting node failure in cloud service systems', ESEC/FSE 2018 - Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 480-490 (2018) [E1]

DOI	10.1145/3236024.3236060
Citations	Scopus - 1, Web of Science - 7

2018

He S, Lin Q, Lou J-G, Zhang H, Lyu MR, Zhang D, 'Identifying impactful service system problems via log analysis', ESEC/FSE 2018 - Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 60-70 (2018) [E1]

DOI	10.1145/3236024.3236083
Citations	Scopus - 1, Web of Science - 1

2018

Jiang J, Xiong Y, Zhang H, Gao Q, Chen X, 'Shaping program repair space with existing patches and similar code', ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, 298-309 (2018) [E1]

DOI	10.1145/3213846.3213871
Citations	Scopus - 2, Web of Science - 1

2018

Tonelli R, Ducasse S, Fenu G, Bracciali A, Amaral V, Arcelli F, Bartoletti M, Bartosz W, Bistarelli S, Buglione L, Challet D, Counsell S, Destefanis G, Duedder B, Eskandari S, Gil Y, Heckel R, Hierons R, Matulevicius R, Morasca S, Orru M, Ricci L, Singh P, Swift S, ter Beek M, Vitaletti A, Zhang H, Zunino R, 'Message from the chairs', 2018 IEEE 1st International Workshop on Blockchain Oriented Software Engineering (IWBOSE 2018) Proceedings, 2018-January (2018)

DOI	10.1109/IWBOSE.2018.8327563
Citations	Scopus - 2

2018

Abreu R, Zhang H, 'Message from the QRS 2018 program chairs', Proceedings 2018 IEEE 18th International Conference on Software Quality, Reliability and Security, QRS 2018 (2018)

DOI	10.1109/QRS.2018.00007

2018

Abreu R, Zhang H, 'Message from the QRS 2018 Program Chairs', Proceedings 2018 IEEE 18th International Conference on Software Quality, Reliability and Security Companion, QRS-C 2018 (2018)

DOI	10.1109/QRS-C.2018.00007

2018

Galster M, Zhang H, 'Message from the ASWEC 2018: Short research paper program committee chairs', Proceedings 25th Australasian Software Engineering Conference ASWEC 2018 (2018)

DOI	10.1109/ASWEC.2018.00007

2018

Washizaki H, Zhang H, 'Message from the APSEC 2018 Program Co-Chairs', Proceedings Asia Pacific Software Engineering Conference APSEC, 2018-December, xvii-xviii (2018)

DOI	10.1109/APSEC.2018.00006

2018

Lin Q, Ke W, Lou JG, Zhang H, Sui K, Xu Y, Zhou Z, Qiao B, Zhang D, 'BigIN4: Instant, interactive insight identification for multi-dimensional big data', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 547-555 (2018) [E1]

DOI	10.1145/3219819.3219867
Citations	Scopus - 1, Web of Science - 1

2018

Xu Y, Sui K, Yao R, Zhang H, Lin Q, Dang Y, Li P, Jiang K, Zhang W, Lou J-G, Chintalapati M, Zhang D, 'Improving Service Availability of Cloud Systems by Predicting Disk Error', Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC 18), 481-494 (2018) [E1]

Citations	Scopus - 45, Web of Science - 8

2018

Barbar M, Sui Y, Zhang H, Chen S, Xue J, 'Live Path CFI Against Control Flow Hijacking Attacks', Information Security and Privacy: 23rd Australasian Conference, ACISP 2018, 10946, 768-779 (2018) [E1]

DOI	10.1007/978-3-319-93638-3_45
Citations	Scopus - 2, Web of Science - 1

2018

Barbar M, Sui Y, Zhang H, Chen S, Xue J, 'Poster: Live Path Control Flow Integrity', Proceedings 2018 IEEE/ACM 40th International Conference on Software Engineering - Companion (ICSE-Companion), 195-196 (2018)

DOI	10.1145/3183440.3195093
Citations	Scopus - 2, Web of Science - 2

2018

Wu R, Wen M, Cheung S-C, Zhang H, 'ChangeLocator: Locate Crash-Inducing Changes Based on Crash Reports', Proceedings 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), 536-536 (2018)

DOI	10.1145/3180155.3182516
Citations	Web of Science - 1

2018

Gu X, Zhang H, Kim S, 'Deep code search', ICSE '18 Proceedings of the 40th International Conference on Software Engineering, 933-944 (2018) [E1]

DOI	10.1145/3180155.3180167
Citations	Scopus - 5, Web of Science - 3

2017

Li Z, Jing X, Zhu X, Zhang H, 'Heterogeneous Defect Prediction Through Multiple Kernel Learning and Ensemble Learning', 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), 91-102 (2017) [E1]

DOI	10.1109/ICSME.2017.19
Citations	Scopus - 5, Web of Science - 4

2017

Gu X, Zhang H, Zhang D, Kim S, 'DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning', Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, August 19-25, 2017, 3675-3681 (2017) [E1]

DOI	10.24963/ijcai.2017/514
Citations	Scopus - 4, Web of Science - 3

2017

Chen J, Bai Y, Hao D, Xiong Y, Zhang H, Xie B, 'Learning to prioritize test programs for compiler testing', ICSE'17 Proceedings of the 39th International Conference on Software Engineering, 700-711 (2017) [E1]

DOI	10.1109/ICSE.2017.70
Citations	Scopus - 8, Web of Science - 5

2017

Shu C, Zhang H, 'Neural Programming by Example', Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, 1539-1545 (2017) [E1]

Citations	Scopus - 1, Web of Science - 1

2016

Chen J, Hu W, Hao D, Xiong Y, Zhang H, Zhang L, Xie B, 'An empirical comparison of compiler testing techniques', Proceedings of the 38th International Conference on Software Engineering, 14-22-May-2016, 180-190 (2016) [E1]

DOI	10.1145/2884781.2884878
Citations	Scopus - 9, Web of Science - 7

2016

Lv F, Zhang H, Lou JG, Wang S, Zhang D, Zhao J, 'CodeHow: Effective code search based on API understanding and extended boolean model', Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, 260-270 (2016) [E1]

DOI	10.1109/ASE.2015.42
Citations	Scopus - 2, Web of Science - 1

2016

Lin Q, Lou JG, Zhang H, Zhang D, 'IDice: Problem identification for emerging issues', Proceedings of the 2016 IEEE/ACM 38th IEEE International Conference on Software Engineering ICSE 2016, 14-22-May-2016, 214-224 (2016) [E1]

DOI	10.1145/2884781.2884795
Citations	Scopus - 6, Web of Science - 4

2016

Wu R, Xiao X, Cheung SC, Zhang H, Zhang C, 'Casper: An efficient approach to call trace collection', Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 20-22-January-2016, 678-690 (2016) [E1]

DOI	10.1145/2837614.2837619
Citations	Scopus - 1, Web of Science - 9

2016

Zhou M, Cheng X, Guo X, Gu M, Zhang H, Song X, 'Improving Failure Detection by Automatically Generating Test Cases Near the Boundaries', Proceedings of the 2016 IEEE 40th Annual Computer Software and Applications Conference Workshops COMPSACW 2016, 1, 164-173 (2016) [E1]

DOI	10.1109/COMPSAC.2016.137
Citations	Scopus - 3, Web of Science - 2

2016

Chen J, Bai Y, Hao D, Xiong Y, Zhang H, Zhang L, Xie B, 'Test Case Prioritization for Compilers: A Text-Vector Based Approach', Proceedings of the 2016 IEEE International Conference on Software Testing, Verification and Validation ICST 2016, 266-277 (2016) [E1]

DOI	10.1109/ICST.2016.19
Citations	Scopus - 5, Web of Science - 4

2016

Lin Q, Zhang H, Lou Y, Zhang Y, Chen X, 'Log Clustering based problem identification for online service systems', ICSE '16 Proceedings of the 38th International Conference on Software Engineering Companion, 102-111 (2016) [E1]

DOI	10.1145/2889160.2889232
Citations	Scopus - 4, Web of Science - 2

2016

Gu X, Zhang H, Zhang D, Kim S, 'Deep API Learning', FSE'16: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 631-642 (2016) [E1]

DOI	10.1145/2950290.2950334
Citations	Scopus - 4, Web of Science - 3

2016

Zhang H, Jain A, Khandelwal G, Kaushik C, Ge S, Hu W, 'Bing Developer Assistant: Improving Developer Productivity by Recommending Sample Code', FSE'16: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 956-961 (2016) [E1]

DOI	10.1145/2950290.2983955
Citations	Scopus - 5, Web of Science - 3

2015

Ding S, Tan HBK, Zhang H, 'ABOR: An Automatic Framework for Buffer Overflow Removal in C/C++ Programs', Enterprise Information Systems: 16th International Conference (ICEIS 2014), 227, 204-221 (2015) [E1]

DOI	10.1007/978-3-319-22348-3_12

2015

Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D, 'Learning to log: Helping developers make informed logging decisions', Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering ICSE 2015: Volume 2, 1, 415-425 (2015) [E1]

DOI	10.1109/ICSE.2015.60
Citations	Scopus - 2, Web of Science - 1

2015

Zhou H, Lou JG, Zhang H, Lin H, Lin H, Qin T, 'An Empirical Study on Quality Issues of Production Big Data Platform', Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering ICSE 2015: Volume 2, 2, 17-26 (2015) [E1]

DOI	10.1109/ICSE.2015.130
Citations	Scopus - 3, Web of Science - 2

2015

Ding R, Zhou H, Lou J-G, Zhang H, Lin Q, Fu Q, Zhang D, Xie T, 'Log2: a cost-aware logging mechanism for performance diagnosis', Proceedings of the 2015 USENIX Annual Technical Conference, 139-150 (2015) [E1]

Logging has been a common practice for monitoring and diagnosing performance issues. However, logging comes at a cost, especially for large-scale online service systems. First, the overhead incurred by intensive logging is non-negligible. Second, it is costly to diagnose a performance issue if there are a tremendous amount of redundant logs. Therefore, we believe that it is important to limit the overhead incurred by logging, without sacrificing the logging effectiveness. In this paper we propose Log2, a cost-aware logging mechanism. Given a "budget" (defined as the maximum volume of logs allowed to be output in a time interval), Log2 makes the "whether to log" decision through a two-phase filtering mechanism. In the first phase, a large number of irrelevant logs are discarded efficiently. In the second phase, useful logs are cached and output while complying with logging budget. In this way, Log2 keeps the useful logs and discards the less useful ones. We have implemented Log2 and evaluated it on an open source system as well as a real-world online service system from Microsoft. The experimental results show that Log2 can control logging overhead while preserving logging effectiveness.

Citations	Scopus - 9
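
The two-phase, budget-constrained filtering described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BudgetedLogger`, the per-record `score`, and the `min_score` cutoff are all hypothetical, and the abstract does not say how Log2 actually measures the usefulness of a log record.

```python
import heapq


class BudgetedLogger:
    """Illustrative two-phase log filter with a per-interval budget (hypothetical design)."""

    def __init__(self, budget, min_score):
        self.budget = budget        # maximum number of records emitted per interval
        self.min_score = min_score  # phase-1 cutoff; an assumption, not from the paper
        self._cache = []            # min-heap of (score, sequence number, message)
        self._seq = 0

    def offer(self, message, score):
        # Phase 1: cheaply discard records that are clearly not useful.
        if self.budget <= 0 or score < self.min_score:
            return
        # Phase 2: cache the record, evicting the lowest-scored one when over budget.
        self._seq += 1
        item = (score, self._seq, message)
        if len(self._cache) < self.budget:
            heapq.heappush(self._cache, item)
        elif score > self._cache[0][0]:
            heapq.heapreplace(self._cache, item)

    def flush(self):
        # End of interval: emit the cached records in arrival order and reset the cache.
        kept = sorted(self._cache, key=lambda it: it[1])
        self._cache = []
        return [message for _, _, message in kept]


logger = BudgetedLogger(budget=2, min_score=0.5)
for msg, score in [("a", 0.9), ("b", 0.1), ("c", 0.6), ("d", 0.8)]:
    logger.offer(msg, score)
emitted = logger.flush()
```

With a budget of two records, `b` is dropped in the first phase and `c` is evicted in the second, so `emitted` holds `a` and `d`.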

2015

Lim MH, Lou JG, Zhang H, Fu Q, Teoh ABJ, Lin Q, Ding R, Zhang D, 'Identifying Recurrent and Unknown Performance Issues', Proceedings of the 14th IEEE International Conference on Data Mining (ICDM 2014), 2015-January, 320-329 (2015) [E1]

For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether this issue has occurred before. If there are past similar issues, a known remedy could be applied. Otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Due to the sophisticated nature of software systems, manual diagnosis of performance issues based on metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF) based approach to automatic identification of recurrent and unknown performance issues. We formulate the problem of issue identification as a HMRF-based clustering problem. Our approach incorporates the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can achieve accurate identification of recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms the related state-of-the-art approaches.

DOI	10.1109/ICDM.2014.96
Citations	Scopus - 3, Web of Science - 2

2014

'Proceedings of the 9th International Workshop on Advanced Modularization Techniques, AOAsia 2014, Hong Kong, China, November 16, 2014', AOAsia@SIGSOFT FSE (2014)

2014

Liu K, Tan HBK, Zhang H, 'Mining key and referential constraints enforcement patterns', SAC, 850-854 (2014)

2014

'Proceedings of the 5th International Workshop on Emerging Trends in Software Metrics, WETSoM 2014, Hyderabad, India, June 3, 2014', WETSoM (2014)

2014

Ding S, Tan HBK, Zhang H, 'Automatic Removal of Buffer Overflow Vulnerabilities in C/C++ Programs', ICEIS (2), 49-59 (2014)

2014

Wong CP, Xiong Y, Zhang H, Hao D, Zhang L, Mei H, 'Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis', Proceedings of the 30th International Conference on Software Maintenance and Evolution (ICSME 2014), 181-190 (2014) [E1]

DOI	10.1109/ICSME.2014.40
Citations	Scopus - 1, Web of Science - 1

2014

Ding S, Zhang H, Tan HBK, 'Detecting Infeasible Branches Based on Code Patterns', 2014 Software Evolution Week: IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), 74-83 (2014) [E1]

Citations	Web of Science - 6

2014

Counsell S, Marchesi M, Venkatasubramanyam R, Visaggio A, Zhang H, 'Message from the Chairs', 5th International Workshop on Emerging Trends in Software Metrics (WETSoM 2014) Proceedings, iii-iv (2014)

2014

Wu R, Zhang H, Cheung SC, Kim S, 'CrashLocator: Locating crashing faults based on crash stacks', 2014 International Symposium on Software Testing and Analysis (ISSTA 2014 - Proceedings), 204-214 (2014) [E1]

Software crash is common. When a crash occurs, software developers can receive a report upon user permission. A crash report typically includes a call stack at the time of crash. An important step of debugging a crash is to identify faulty functions, which is often a tedious and labor-intensive task. In this paper, we propose CrashLocator, a method to locate faulty functions using the crash stack information in crash reports. It deduces possible crash traces (the failing execution traces that lead to crash) by expanding the crash stack with functions in static call graph. It then calculates the suspiciousness of each function in the approximate crash traces. The functions are then ranked by their suspiciousness scores and are recommended to developers for further investigation. We evaluate our approach using real-world Mozilla crash data. The results show that our approach is effective: We can locate 50.6%, 63.7% and 67.5% of crashing faults by examining top 1, 5 and 10 functions recommended by CrashLocator, respectively. Our approach outperforms the conventional stack-only methods significantly.

DOI	10.1145/2610384.2610386
Citations	Scopus - 1
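
The stack-expansion step mentioned in the abstract above (expanding the crash stack with functions from the static call graph) can be sketched as a bounded breadth-first search. This is only an illustration under stated assumptions: the function name `expand_crash_stack`, the dictionary-based call graph, and the `max_depth` limit are hypothetical, and the suspiciousness scoring that CrashLocator applies afterwards is not reproduced here because the abstract does not define it.

```python
from collections import deque


def expand_crash_stack(stack, call_graph, max_depth):
    """Map each candidate function to its minimum call depth from any stack frame.

    stack      -- function names appearing in the crash stack
    call_graph -- dict mapping a function name to the functions it may call
    max_depth  -- how many call-graph edges to follow from a stack frame
    """
    depth = {function: 0 for function in stack}
    queue = deque(stack)
    while queue:
        function = queue.popleft()
        if depth[function] == max_depth:
            continue  # do not expand beyond the depth limit
        for callee in call_graph.get(function, ()):
            if callee not in depth:
                depth[callee] = depth[function] + 1
                queue.append(callee)
    return depth


# Toy call graph with hypothetical function names.
call_graph = {
    "main": ["parse", "render"],
    "parse": ["read_token"],
    "read_token": ["next_char"],
    "render": [],
}
candidates = expand_crash_stack(["parse", "main"], call_graph, max_depth=1)
```

Functions that never appear in the crash stack, such as `read_token` and `render` in this toy graph, become candidates for ranking, which is the motivation the abstract gives for going beyond stack-only methods.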

2014

Cao Y, Zhang H, Ding S, 'SymCrash: Selective recording for reproducing crashes', ASE 2014 - Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, 791-802 (2014)

Software often crashes despite tremendous effort on software quality assurance. Once developers receive a crash report, they need to reproduce the crash in order to understand the problem and locate the fault. However, limited information from crash reports often makes crash reproduction difficult. Many "capture-and-replay" techniques have been proposed to automatically capture program execution data from the failing code, and help developers replay the crash scenarios based on the captured data. However, such techniques often suffer from heavy overhead and introduce privacy concerns. Recently, methods such as BugRedux were proposed to generate test input that leads to crash through symbolic execution. However, such methods have inherent limitations because they rely on conventional symbolic execution techniques. In this paper, we propose a dynamic symbolic execution method called SymCon, which addresses the limitation of conventional symbolic execution by selecting functions that are hard to be resolved by a constraint solver and using their concrete runtime values to replace the symbols. We then propose SymCrash, a selective recording approach that only instruments and monitors the hard-to-solve functions. SymCrash can generate test input for crashes through SymCon. We have applied our approach to successfully reproduce 13 failures of 6 real-world programs. Our results confirm that the proposed approach is suitable for reproducing crashes, in terms of effectiveness, overhead, and privacy. It also outperforms the related methods.

DOI	10.1145/2642937.2642993
Citations	Scopus - 2

2014

Sun C, Zhang H, Lou JG, Zhang H, Wang Q, Zhang D, Khoo SC, 'Querying sequential software engineering data', 22nd ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE 2014) Proceedings, 16-21-November-2014, 700-710 (2014) [E1]

We propose a pattern-based approach to effectively and efficiently analyzing sequential software engineering (SE) data. Different from other types of SE data, sequential SE data preserves unique temporal properties, which cannot be easily analyzed without much programming effort. In order to facilitate the analysis of sequential SE data, we design a sequential pattern query language (SPQL), which specifies the temporal properties based on regular expressions, and is enhanced with variables and statements to store and manipulate matching states. We also propose a query engine to effectively process the SPQL queries. We have applied our approach to analyze two types of SE data, namely bug report history and source code change history. We experiment with 181,213 Eclipse bug reports and 323,989 code revisions of Android. SPQL enables us to explore interesting temporal properties underneath these sequential data with a few lines of query code and low matching overhead. The analysis results can help better understand a software process and identify process violations.

DOI	10.1145/2635868.2635902
Citations	Scopus - 5, Web of Science - 4

2014

Hu H, Zhang H, Xuan J, Sun W, 'Effective bug triage based on historical bug-fix information', Proceedings of the IEEE 25th International Symposium on Software Reliability Engineering, 122-132 (2014) [E1]

For complex and popular software, project teams could receive a large number of bug reports. It is often tedious and costly to manually assign these bug reports to developers who have the expertise to fix the bugs. Many bug triage techniques have been proposed to automate this process. In this paper, we describe our study on applying conventional bug triage techniques to projects of different sizes. We find that the effectiveness of a bug triage technique largely depends on the size of a project team (measured in terms of the number of developers). The conventional bug triage methods become less effective when the number of developers increases. To further improve the effectiveness of bug triage for large projects, we propose a novel recommendation method called Bug Fixer, which recommends developers for a new bug report based on historical bug-fix information. Bug Fixer constructs a Developer-Component-Bug (DCB) network, which models the relationship between developers and source code components, as well as the relationship between the components and their associated bugs. A DCB network captures the knowledge of 'who fixed what, where'. For a new bug report, Bug Fixer uses a DCB network to recommend to triager a list of suitable developers who could fix this bug. We evaluate Bug Fixer on three large-scale open source projects and two smaller industrial projects. The experimental results show that the proposed method outperforms the existing methods for large projects and achieves comparable performance for small projects.

DOI	10.1109/ISSRE.2014.17
Citations	Scopus - 9, Web of Science - 7

2013

Hao D, Lan T, Zhang H, Guo C, Zhang L, 'Is This a Bug or an Obsolete Test?', ECOOP 2013 - Object-Oriented Programming, 7920, 602-628 (2013) [E1]

DOI	10.1007/978-3-642-39038-8_25
Citations	Web of Science - 1

2013

Liu K, Tan HBK, Zhang H, 'Has This Bug Been Reported?', Proceedings of the 20th Working Conference on Reverse Engineering (WCRE 2013), 82-91 (2013) [E1]

DOI	10.1109/WCRE.2013.6671283
Citations	Scopus - 1, Web of Science - 1

2013

Zhang H, Gong L, Versteeg S, 'Predicting Bug-Fixing Time: An Empirical Study of Commercial Software Projects', Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), 1042-1051 (2013) [E1]

DOI	10.1109/ICSE.2013.6606654
Citations	Scopus - 1, Web of Science - 1

2013

Zhang H, Cheung SC, 'A cost-effectiveness criterion for applying software defect prediction models', 2013 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2013) Proceedings, 643-646 (2013)

Ideally, software defect prediction models should help organize software quality assurance (SQA) resources and reduce cost of finding defects by allowing the modules most likely to contain defects to be inspected first. In this paper, we study the cost-effectiveness of applying defect prediction models in SQA and propose a basic cost-effectiveness criterion. The criterion implies that defect prediction models should be applied with caution. We also propose a new metric FN/(FN+TN) to measure the cost-effectiveness of a defect prediction model. Copyright 2013 ACM.

DOI	10.1145/2491411.2494581
Citations	Scopus - 15
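
The FN/(FN+TN) metric named in the abstract above can be computed directly from a model's predictions. The sketch below is illustrative only: the function names and the boolean-list input format are assumptions. By the standard definitions of false and true negatives, the ratio is the share of modules predicted defect-free that are in fact defective.

```python
def negative_counts(actual, predicted):
    """Count false negatives and true negatives from paired boolean labels.

    actual[i]    -- True if module i really contains a defect
    predicted[i] -- True if the model flags module i as defect-prone
    """
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    return fn, tn


def fn_ratio(fn, tn):
    """Return FN / (FN + TN) for the modules the model predicted as defect-free."""
    total = fn + tn
    if total == 0:
        raise ValueError("the model predicted no module as defect-free")
    return fn / total


# Hypothetical data: five modules, two of them defective.
actual = [True, False, False, True, False]
predicted = [True, False, False, False, False]
fn, tn = negative_counts(actual, predicted)
ratio = fn_ratio(fn, tn)
```

In this toy data one defective module is missed among the four predicted defect-free, so `ratio` is 0.25.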

2013

Gong J, Zhang H, 'BugMap: A topographic map of bugs', 2013 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2013) Proceedings, 647-650 (2013)

A large and complex software system could contain a large number of bugs. It is desirable for developers to understand how these bugs are distributed across the system, so they could have a better overview of software quality. In this paper, we describe BugMap, a tool we developed for visualizing large-scale bug location information. Taking source code and bug data as the input, BugMap can display bug localizations on a topographic map. By examining the topographic map, developers can understand how the components and files are affected by bugs. We apply this tool to visualize the distribution of Eclipse bugs across components/files. The results show that our tool is effective for understanding the overall quality status of a large-scale system and for identifying the problematic areas of the system. Copyright 2013 ACM.

DOI	10.1145/2491411.2494582
Citations	Scopus - 9

2013

Hao D, Lan T, Zhang H, Guo C, Zhang L, 'Is this a bug or an obsolete test?', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 7920 LNCS, 602-628 (2013)

In software evolution, developers typically need to identify whether the failure of a test is due to a bug in the source code under test or the obsoleteness of the test code when they execute a test suite. Only after finding the cause of a failure can developers determine whether to fix the bug or repair the obsolete test. Researchers have proposed several techniques to automate test repair. However, test-repair techniques typically assume that test failures are always due to obsolete tests. Thus, such techniques may not be applicable in real world software evolution when developers do not know whether the failure is due to a bug or an obsolete test. To know whether the cause of a test failure lies in the source code under test or in the test code, we view this problem as a classification problem and propose an automatic approach based on machine learning. Specifically, we target Java software using the JUnit testing framework and collect a set of features that may be related to failures of tests. Using this set of features, we adopt the Best-first Decision Tree Learning algorithm to train a classifier with some existing regression test failures as training instances. Then, we use the classifier to classify future failed tests. Furthermore, we evaluated our approach using two Java programs in three scenarios (within the same version, within different versions of a program, and between different programs), and found that our approach can effectively classify the causes of failed tests. © 2013 Springer-Verlag Berlin Heidelberg.

DOI	10.1007/978-3-642-39038-8_25
Citations	Scopus - 21

2013

Wang J, Dang Y, Zhang H, Chen K, Xie T, Zhang D, 'Mining succinct and high-coverage API usage patterns from source code', Proceedings of the 2013 10th Working Conference on Mining Software Repositories, 319-328 (2013) [E1]

During software development, a developer often needs to discover specific usage patterns of Application Programming Interface (API) methods. However, these usage patterns are often not well documented. To help developers to get such usage patterns, there are approaches proposed to mine client code of the API methods. However, they lack metrics to measure the quality of the mined usage patterns, and the API usage patterns mined by the existing approaches tend to be many and redundant, posing significant barriers for being practical adoption. To address these issues, in this paper, we propose two quality metrics (succinctness and coverage) for mined usage patterns, and further propose a novel approach called Usage Pattern Miner (UP-Miner) that mines succinct and high-coverage usage patterns of API methods from source code. We have evaluated our approach on a large-scale Microsoft codebase. The results show that our approach is effective and outperforms an existing representative approach MAPO. The user studies conducted with Microsoft developers confirm the usefulness of the proposed approach in practice. © 2013 IEEE.

DOI	10.1109/MSR.2013.6624045
Citations	Scopus - 1, Web of Science - 1

2012

Zhou J, Zhang H, 'Learning to rank duplicate bug reports', CIKM: Proceedings of the 21st ACM international conference on Information and knowledge management, 852-861 (2012) [E1]

For a large and complex software system, the project team could receive a large number of bug reports. Some bug reports could be duplicates as they essentially report the same problem. It is often tedious and costly to manually check if a newly reported bug is a duplicate of an already reported bug. In this paper, we propose BugSim, a method that can automatically retrieve duplicate bug reports given a new bug report. BugSim is based on learning to rank concepts. We identify textual and statistical features of bug reports and propose a similarity function for bug reports based on the features. We then construct a training set by assembling pairs of duplicate and non-duplicate bug reports. We train the weights of features by applying the stochastic gradient descent algorithm over the training set. For a new bug report, we retrieve candidate duplicate reports using the trained model. We evaluate BugSim using more than 45,100 real bug reports of twelve Eclipse projects. The evaluation results show that the proposed method is effective. On average, the recall rate for the top 10 retrieved reports is 76.11%. Furthermore, BugSim outperforms the previous state-of-art methods that are implemented using SVM and BM25F ext. © 2012 ACM.

DOI	10.1145/2396761.2396869
Citations	Scopus - 4

2012

'Proceedings of the 3rd International Workshop on Emerging Trends in Software Metrics, WETSoM 2012, Zurich, Switzerland, June 3, 2012', WETSoM (2012)

2012

Anderson DJ, Concas G, Lunesu MI, Marchesi M, Zhang H, 'A Comparative Study of Scrum and Kanban Approaches on a Real Case Study Using Simulation', Agile Processes in Software Engineering and Extreme Programming, 111, 123-137 (2012) [E1]

DOI	10.1007/978-3-642-30350-0
Citations	Web of Science - 2

2012

Zhou J, Zhang H, Lo D, 'Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports', 2012 34th International Conference on Software Engineering (ICSE), Proceedings, 14-24 (2012) [E1]

DOI	10.1109/ICSE.2012.6227210
Citations	Scopus - 6, Web of Science - 4

2012

Dang Y, Wu R, Zhang H, Zhang D, Nobel P, 'ReBucket: A method for clustering duplicate crash reports based on call stack similarity', Proceedings of the 34th International Conference on Software Engineering (ICSE), 1084-1093 (2012) [E1]

DOI	10.1109/ICSE.2012.6227111
Citations	Scopus - 1, Web of Science - 8

2012

Gong L, Lo D, Jiang L, Zhang H, 'Diversity Maximization Speedup for Fault Localization', 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, 30-39 (2012) [E1]

DOI	10.1145/2351676.2351682
Citations	Scopus - 2, Web of Science - 1

2012

Tran MH, Colman A, Han J, Zhang H, 'Modeling and Verification of Context-Aware Systems', APSECW 2012: Proceedings of the 19th Asia-Pacific Software Engineering Conference, 1, 79-84 (2012) [E1]

DOI	10.1109/APSEC.2012.50
Citations	Scopus - 8, Web of Science - 2

2012

Wang J, Zhang H, 'Predicting Defect Numbers Based on Defect State Transition Models', ESEM'12: Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement, 191-200 (2012) [E1]

DOI	10.1145/2372251.2372287
Citations	Scopus - 2, Web of Science - 1

2012

Gong L, Lo D, Jiang L, Zhang H, 'Interactive Fault Localization Leveraging Simple User Feedback', Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM), 67-76 (2012) [E1]

DOI	10.1109/ICSM.2012.6405255
Citations	Scopus - 4, Web of Science - 3

2012

Ding S, Tan HBK, Liu K, Chandramohan M, Zhang H, 'Detection of buffer overflow vulnerabilities in C/C++ with pattern based limited symbolic evaluation', Proceedings International Computer Software and Applications Conference, 559-564 (2012)

Buffer overflow vulnerability is one of the major security threats for applications written in C/C++. Among the existing approaches for detecting buffer overflow vulnerability, though flow sensitive based approaches offer higher precision but they are limited by heavy overhead and the fact that many constraints are unsolvable. We propose a novel method to efficiently detect vulnerable buffer overflows in any given control flow graph through recognizing two patterns. The proposed approach first uses syntax analysis to filter away those branches that cannot possibly comply with any of the two patterns before applying a limited symbolic evaluation for a precise matching against the patterns. The proposed approach only needs to evaluate a limited set of selected branch predicates according to the patterns and avoids the need to deal with a large number of general branch predicates. This significantly improves the scalability while not sacrificing the detection precision. Our experiments demonstrate the scalability and efficiency of the proposed method, which demonstrates its applicability. © 2012 IEEE.

DOI	10.1109/COMPSACW.2012.103
Citations	Scopus - 3

2012

Grieskamp W, Zhang H, 'Message from the QSIC 2012 Industry Track Chairs', Proceedings International Conference on Quality Software (2012)

DOI	10.1109/QSIC.2012.50

2012

Concas G, Canfora G, Tempero E, Zhang H, 'Welcome to 3rd International Workshop on Emerging Trends in Software Metrics (WETSoM 2012)', 2012 3rd International Workshop on Emerging Trends in Software Metrics (WETSoM 2012) Proceedings, 3-4 (2012)

Welcome to WETSoM2012, the 3rd International Workshop on Emerging Trends in Software Metrics. Since its start, WETSoM attracted a blend of academic and industrial researchers, creating a stimulating atmosphere to discuss the progresses of software metrics. A key motivation for this workshop is to help overcoming the low impact that software metrics has on current software development. This is pursued by critically examining the evidence for the effectiveness of existing metrics and identifying new directions for metrics. Evidence for existing metrics includes how the metrics have been used in practice and studies showing their effectiveness. Identifying new directions includes use of new theories, such as complex network theory, on which to base metrics. We are pleased that this year WETSoM features 12 technical papers and an exciting keynote on mining developers' communication to assess software quality by Massimiliano di Penta. The program of WETSoM2012 is the result of hard work by many dedicated people; we especially thank the authors of submitted papers and the members of the program committee. Above all, the greatest richness of this workshop is its participants, who shape the discussion and points into new directions for software metrics research and practice. We hope you will have a great time and an unforgettable experience at WETSoM2012. © 2012 IEEE.

DOI	10.1109/WETSoM.2012.6226985

2012

Anderson DJ, Concas G, Lunesu MI, Marchesi M, Zhang H, 'A comparative study of scrum and kanban approaches on a real case study using simulation', Lecture Notes in Business Information Processing, 111 LNBIP, 123-137 (2012)

We present the application of software process modeling and simulation using an agent-based approach to a real case study of software maintenance. The original process used PSP/TSP; it spent a large amount of time estimating in advance maintenance requests, and needed to be greatly improved. To this purpose, a Kanban system was successfully implemented, that demonstrated to be able to substantially improve the process without giving up PSP/TSP. We customized the simulator and, using input data with the same characteristics of the real ones, we were able to obtain results very similar to that of the processes of the case study, in particular of the original process. We also simulated, using the same input data, the possible application of the Scrum process to the same data, showing results comparable to the Kanban process. © 2012 Springer-Verlag Berlin Heidelberg.

DOI	10.1007/978-3-642-30350-0_9
Citations	Scopus - 27

2011

Li YF, Zhang H, 'Integrating software engineering data using semantic web technologies', Proceedings International Conference on Software Engineering, 211-214 (2011)

A plethora of software engineering data have been produced by different organizations and tools over time. These data may come from different sources, and are often disparate and distributed. The integration of these data may open up the possibility of conducting systemic, holistic study of software projects in ways previously unexplored. Semantic Web technologies have been used successfully in a wide array of domains such as health care and life sciences as a platform for information integration and knowledge management. The success is largely due to the open and extensible nature of ontology languages as well as growing tool support. We believe that Semantic Web technologies represent an ideal platform for the integration of software engineering data in a semantic repository. By querying and analyzing such a repository, researchers and practitioners can better understand and control software engineering activities and processes. In this paper, we describe how we apply Semantic Web techniques to integrate object-oriented software engineering data from different sources. We also show how the integrated data can help us answer complex queries about large-scale software projects through a case study on the Eclipse system. © 2011 ACM.

DOI	10.1145/1985441.1985473
Citations	Scopus - 9

2011

Wu R, Zhang H, Kim S, Cheung S-C, 'ReLink: recovering links between bugs and changes', Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, 15-25 (2011) [E1]

Software defect information, including links between bugs and committed changes, plays an important role in software maintenance such as measuring quality and predicting defects. Usually, the links are automatically mined from change logs and bug reports using heuristics such as searching for specific keywords and bug IDs in change logs. However, the accuracy of these heuristics depends on the quality of change logs. Bird et al. found that there are many missing links due to the absence of bug references in change logs. They also found that the missing links lead to biased defect information, and it affects defect prediction performance. We manually inspected the explicit links, which have explicit bug IDs in change logs and observed that the links exhibit certain features. Based on our observation, we developed an automatic link recovery algorithm, ReLink, which automatically learns criteria of features from explicit links to recover missing links. We applied ReLink to three open source projects. ReLink reliably identified links with 89% precision and 78% recall on average, while the traditional heuristics alone achieve 91% precision and 64% recall. We also evaluated the impact of recovered links on software maintainability measurement and defect prediction, and found the results of ReLink yields significantly better accuracy than those of traditional heuristics. © 2011 ACM.

DOI	10.1145/2025113.2025120
Citations	Scopus - 3

2011

'Proceedings of the 2nd International Workshop on Emerging Trends in Software Metrics, WETSoM 2011, Waikiki, Honolulu, HI, USA, May 24, 2011', WETSoM (2011)

2011

Kim S, Zhang H, Wu R, Gong L, 'Dealing with Noise in Defect Prediction', 33rd International Conference on Software Engineering: Proceedings of the Conference, 481-490 (2011) [E1]

DOI	10.1145/1985793.1985859
Citations	Scopus - 3, Web of Science - 2

2011

Concas G, Di Penta M, Tempero E, Zhang H, 'Workshop on Emerging Trends in Software Metrics (WETSoM 2011)', 2011 33RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 1224-+ (2011) [E3]

2011

'Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering' (2011)

DOI	10.1145/2025113

2011

Liu K, Tan HBK, Chen X, Zhang H, Padmanabhuni BM, 'Automated extraction of data lifecycle support from database applications', SEKE 2011: Proceedings of the 23rd International Conference on Software Engineering and Knowledge Engineering, 432-437 (2011)

Database applications are one of the most common types of systems. Grounded on the simple concept of the data lifecycle (any data in a database is created by insertion, used via selection and modification, and terminated at deletion), this paper proposes a novel approach to reverse engineer the data lifecycle automatically from the source code of database applications. The extracted information can be used for the selection of open-source database applications for adaptation. It can also be used for maintenance and verification of database applications. A tool has been developed to implement the proposed approach for PHP-based database applications. Case studies have also been conducted to evaluate the use of the proposed approach.

Citations	Scopus - 4

2011

Concas G, Tempero E, Zhang H, Di Penta M, 'Workshop on Emerging Trends in Software Metrics (WETSoM 2011)', Proceedings International Conference on Software Engineering (2011)

2011

Jarzabek S, Pettersson U, Zhang H, 'University-industry collaboration journey towards product lines', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 6727 LNCS, 223-237 (2011)

Product Lines for mission critical Command and Control systems was a starting point for a long lasting research collaboration between National University of Singapore (NUS) and ST Electronics (Info-Software Systems) Pte Ltd (STEE-InfoSoft). Collaboration was intensified by a joint research project, also involving University of Waterloo and Netron Inc. that led to development of reuse technology called XVCL. The contribution of this paper is twofold: First, we describe collaboration modes, factors that were critical to sustain collaboration, and benefits for university and industry gained over years. Among the main benefits, STEE-InfoSoft advanced its reuse practice by applying XVCL in several software Product Line projects, while NUS team received early feedback from STEE-InfoSoft which helped refine XVCL reuse methods and keep academic research in sync with industrial realities. Academic findings and industrial pilots have opened new unexpected research directions. Second, we draw lessons learned from many projects, to explain the general nature and significance of problems addressed with the XVCL approach. © 2011 Springer-Verlag.

DOI	10.1007/978-3-642-21347-2_17
Citations	Scopus - 4

2010

'Proceedings of the 2010 ICSE Workshop on Emerging Trends in Software Metrics, WETSoM 2010, Cape Town, South Africa, May 4, 2010', WETSoM (2010)

2010

Zhang H, Jarzabek S, 'A Hybrid Approach to Feature-Oriented Programming in XVCL', SOFTWARE PRODUCT LINES: GOING BEYOND, 6287, 440-+ (2010)

Citations	Scopus - 4, Web of Science - 4

2010

Zhang H, Shi B, Zhang L, 'Automatic Checking of License Compliance', 2010 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (2010)

Citations	Scopus - 8, Web of Science - 1

2010

Zhang H, Wu R, 'Sampling Program Quality', 2010 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (2010)

Citations	Scopus - 7

2010

Zhang H, Nelson A, Menzies T, 'On the value of learning from defect dense components for software defect prediction', ACM International Conference Proceeding Series (2010)

BACKGROUND: Defect predictors learned from static code measures can isolate code modules with a higher than usual probability of defects. AIMS: To improve those learners by focusing on the defect-rich portions of the training sets. METHOD: Defect data CM1, KC1, MC1, PC1, PC3 was separated into components. A subset of the projects (selected at random) were set aside for testing. Training sets were generated for a NaiveBayes classifier in two ways. In the dense treatment, the components with higher than the median number of defective modules were used for training. In the standard treatment, modules from any component were used for training. Both samples were run against the test set and evaluated using recall, probability of false alarm, and precision. In addition, under-sampling and over-sampling were performed on the defect data. Each method was repeated in a 10-by-10 cross-validation experiment. RESULTS: Prediction models learned from defect dense components outperformed the standard method, under-sampling, and over-sampling. In statistical rankings based on recall, probability of false alarm, and precision, models learned from dense components won 4-5 times more often than any other method, and also lost the fewest times. CONCLUSIONS: Given training data where most of the defects exist in small numbers of components, better defect predictors can be trained from the defect dense components.

DOI	10.1145/1868328.1868350
Citations	Scopus - 15

2010

Canfora G, Concas G, Marchesi M, Tempero E, Zhang H, 'Workshop on Emerging Trends in Software Metrics (WETSoM 2010)', Proceedings International Conference on Software Engineering, 2, 459-460 (2010)

The Workshop on Emerging Trends in Software Metrics aims at bringing together researchers and practitioners to discuss the progress of software metrics. The motivation for this workshop is the low impact that software metrics has on current software development. The goals of this workshop are to critically examine the evidence for the effectiveness of existing metrics and to identify new directions for development of software metrics. © 2010 ACM.

DOI	10.1145/1810295.1810428

2010

Canfora G, Concas G, Marchesi M, Tempero E, Zhang H, 'Proceedings - International Conference on Software Engineering: Foreword', Proceedings International Conference on Software Engineering (2010)

2009

Liu L, Zhang H, Ma W, Shan Y, Xu J, Peng F, Burda T, 'Understanding Chinese Characteristics of Requirements Engineering', PROCEEDINGS OF THE 2009 17TH IEEE INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE, 261-+ (2009)

DOI	10.1109/RE.2009.14
Citations	Scopus - 9, Web of Science - 4

2009

Jarzabek S, Xue Y, Zhang H, Lee Y, 'Avoiding Some Common Preprocessing Pitfalls with Feature Queries', APSEC 09: SIXTEENTH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, PROCEEDINGS, 283-+ (2009)

DOI	10.1109/APSEC.2009.61
Citations	Scopus - 1

2009

Zhang H, 'An Investigation of the Relationships between Lines of Code and Defects', 2009 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE, CONFERENCE PROCEEDINGS, 274-283 (2009)

DOI	10.1109/ICSM.2009.5306304
Citations	Scopus - 1, Web of Science - 87

2009

Jarzabek S, Zhang H, Lee Y, Xue Y, Shaikh N, 'Increasing Usability of Preprocessing for Feature Management in Product Lines with Queries', 2009 31ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, COMPANION VOLUME, 215-+ (2009)

DOI	10.1109/ICSE-COMPANION.2009.5070985
Citations	Scopus - 4, Web of Science - 1

2008

Zhang H, 'Exploring Regularity in Source Code: Software Science and Zipf's Law', FIFTEENTH WORKING CONFERENCE ON REVERSE ENGINEERING, PROCEEDINGS, 101-110 (2008)

DOI	10.1109/WCRE.2008.37
Citations	Scopus - 1, Web of Science - 11

2008

Zhang H, 'An initial study of the growth of Eclipse defects', Proceedings International Conference on Software Engineering, 141-143 (2008)

We analyze the Eclipse defect data from June 2004 to November 2007, and find that the growth of the number of defects can be well modeled by polynomial functions. Furthermore, we can predict the number of future Eclipse defects based on the nature of defect growth. Copyright 2008 ACM.

DOI	10.1145/1370750.1370785
Citations	Scopus - 10

2008

Zhang H, 'The scale-free nature of semantic web ontology', Proceedings of the 17th International Conference on World Wide Web (WWW '08), 1047-1048 (2008)

Semantic web ontology languages, such as OWL, have been widely used for knowledge representation. Through empirical analysis of real-world ontologies we discover that, like many natural and social phenomena, the semantic web ontology is also "scale-free".

DOI	10.1145/1367497.1367649
Citations	Scopus - 16

2007

Zhang H, Zhang X, Gu M, 'Predicting defective software components from code complexity measures', 13TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 93-96 (2007)

DOI	10.1109/PRDC.2007.28
Citations	Web of Science - 33

2007

Zhang H, Tan HBK, 'An empirical study of class sizes for large Java systems', 14TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, PROCEEDINGS, 230-+ (2007)

DOI	10.1109/ASPEC.2007.64
Citations	Web of Science - 16

2007

Zhang H, Tan HBK, 'An empirical study of class sizes for large java systems', Proceedings Asia Pacific Software Engineering Conference APSEC, 230-237 (2007)

We perform an empirical study of class sizes (in terms of Lines of Code) on a number of large Java software systems, and discover an interesting pattern - that many classes have only small sizes whereas a few classes have large size. We call this phenomenon the small class phenomenon. Further analysis shows that the class sizes follow the lognormal distribution. Having understood the distribution of class sizes, we then derive a general size estimation model, which reveals the relationship between the size of a large Java system and the number of classes the system has. In this paper, we also show that the adoption of object-orientation is a possible cause of the small class phenomenon. We believe our study reveals the regularity that emerges from large-scale object-oriented software construction, and hope our research can contribute to a deep understanding of computer programming. © 2007 IEEE.

DOI	10.1109/APSEC.2007.20
Citations	Scopus - 23

2007

Vieira M, Laranjeiro N, Madeira H, 'Benchmarking the robustness of web services', 13TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 322-329 (2007)

The ability to predict defective modules can help us allocate limited quality assurance resources effectively and efficiently. In this paper, we propose a complexity-based method for predicting defect-prone components. Our method takes three code-level complexity measures as input, namely Lines of Code, McCabe's Cyclomatic Complexity and Halstead's Volume, and classifies components as either defective or non-defective. We perform an extensive study of twelve classification models using the public NASA datasets. Cross-validation results show that our method can achieve good prediction accuracy. This study confirms that static code complexity measures can be useful indicators of component quality. © 2007 IEEE.

DOI	10.1109/PRDC.2007.56
Citations	Scopus - 4, Web of Science - 10

2007

Peng D, Jarzabek S, Rajapakse DC, Zhang H, 'Reuse of database access layer components in JEE product lines: Limitations and a possible solution (Case Study)', 19th International Conference on Software Engineering and Knowledge Engineering (SEKE 2007), 308-313 (2007)

We set up an experiment to evaluate JEE as a platform for product line development. While JEE provides many useful mechanisms for reuse of common services/components, still we found that systematic across-the-board reuse in application domain-specific areas was hard. The main difficulty was the lack of a mechanism to represent groups of similar components in a generic, adaptable form. Such similar components arise as the number of variant features of a product line grows, and we need to accommodate legal combinations of variant features in components of a product line architecture. Such uncontrolled growth of similar component versions hinders productivity of reuse-based development and raises maintenance costs. In the paper, we study the manifestation of this problem in the JEE database access layer. Interactive Development Environments such as NetBeans or JBuilder speed up the development process, but they do not address the source of the problem, which is the lack of mechanisms to design generic components capable of accommodating variant features in various combinations. We filled this gap with a "mixed strategy" solution based on the generative programming technique of XVCL applied on top of JEE. In the paper, we highlight the nature of the problems we encountered and our solution. Copyright © (2007) by Knowledge Systems Institute (KSI).

Citations	Scopus - 2

2006

Tan HBK, Zhao Y, Zhang H, 'Estimating LOC for information systems from their conceptual data models', Proceedings International Conference on Software Engineering, 2006, 321-330 (2006)

Effort and cost estimation is crucial in software management. Estimation of software size plays a key role in the estimation. Lines of Code (LOC) is still a commonly used software size measure. Although software sizing has been well recognized as an important problem for more than two decades, existing methods still have many problems. The conceptual data model is widely used in the requirements analysis for information systems. It is also not difficult to construct conceptual data models in the early stage of developing information systems. Many characteristics of an information system are actually reflected in its conceptual data model. We explore the use of the conceptual data model for estimating LOC. This paper proposes a novel method for estimating LOC for an information system from its conceptual data model through the use of a multiple linear regression model. We have validated the method by collecting samples from both industry and open-source systems. Copyright 2006 ACM.

DOI	10.1145/1134285.1134331
Citations	Scopus - 14

2006

Jarzabek S, Zhang HY, Ru S, Lam VT, Sun ZX, 'Analysis of meta-programs: An example', INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 16, 77-101 (2006)

Meta-programs are generic, incomplete, adaptable programs that are instantiated at construction time to meet specific requirements. Templates and generative techniques are examples of meta-programming techniques. Understanding of meta-programs is more difficult than understanding of concrete, executable programs. Static and dynamic analysis methods have been applied to ease understanding of programs - can similar methods be used for meta-programs? In our projects, we build meta-programs with a meta-programming technique called XVCL. Meta-programs in XVCL are organized into a hierarchy of meta-components from which the XVCL processor generates concrete, executable programs that meet specific requirements. We developed an automated system that analyzes XVCL meta-programs, and presents developers with information that helps them work with meta-programs more effectively. Our system conducts both static and dynamic analysis of a meta-program. An integral part of our solution is a query language, FQL, in which we formulate questions about meta-program properties. An FQL query processor automatically answers a class of queries. The analysis method described in the paper is specific to XVCL. However, the principle of our approach can be applied to other meta-programming systems. We believe readers interested in meta-programming in general will find some of the lessons from our experiment interesting and useful. © World Scientific Publishing Company.

DOI	10.1142/S0218194006002689
Citations	Scopus - 1

2005

Sun J, Zhang HY, Li YF, Wang H, 'Formal semantics and verification for feature modeling', ICECCS 2005: 10TH IEEE INTERNATIONAL CONFERENCE ON ENGINEERING OF COMPLEX COMPUTER SYSTEMS, PROCEEDINGS, 303-312 (2005)

Research on features has received much attention in the domain engineering community. Feature modeling plays an important role in the design and implementation of complex software systems. However, the presentation and analysis of feature models are still largely informal. There is also an increasing need for methods and tools that can support automated feature model analysis. This paper presents a formal engineering approach to the specification and verification of feature models. A formal semantics for the feature modeling language is defined using first-order logic. It provides a precise and rigorous formal interpretation for the graphical notation. In addition, further validation of the semantics using the Z/EVES theorem prover is presented. Finally, we demonstrate that the consistency of a feature model and its configurations can be automatically verified by encoding the semantics into the Alloy Analyzer. A case study of the Key Word in Context (KWIC) index systems feature model is presented to illustrate the verification process. © 2005 IEEE.

Citations	Scopus - 1, Web of Science - 43

2005

Zhang HY, Bradbury JS, Cordy JR, Dingel J, 'Implementation and verification of implicit-invocation systems using source transformation', FIFTH IEEE INTERNATIONAL WORKSHOP ON SOURCE CODE ANALYSIS AND MANIPULATION, PROCEEDINGS, 87-96 (2005)

DOI	10.1109/SCAM.2005.15
Citations	Web of Science - 3

2003

Zhang HY, Jarzabek S, 'An XVCL approach to handling variants: A KWIC product line example', ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, PROCEEDINGS, 116-125 (2003)

We developed XVCL (XML-based Variant Configuration Language), a method and tool for product lines, to facilitate handling variants in reusable software assets (such as architecture, code components or UML models). XVCL is a newer version of Bassett's frames [1], a technology that has achieved substantial productivity improvements in large data processing product lines written in COBOL. Despite its simplicity, XVCL can effectively manage a wide range of product line variants from a compact base of meta-components, structured for effective reuse. We applied XVCL in two medium-size product line projects and a number of smaller case studies. In this paper, we communicate XVCL's capabilities to support product lines by means of a simple, but still interesting, example of the KWIC system introduced by Parnas in the 1970s. We show how we can handle functional variants, variant design decisions and implementation-level variants in a generic KWIC system.

Citations	Scopus - 6, Web of Science - 3

2003

Jarzabek S, Ong WC, Zhang HY, 'Handling variant requirements in domain modeling', JOURNAL OF SYSTEMS AND SOFTWARE, 68, 171-182 (2003)

Domain models describe common and variant requirements for a family of similar systems. Although most of the notations, such as UML, are meant for modeling a single system, they can be extended to model variants. We have done that and applied such extended notations in our projects. We soon found that our models with variants were becoming overly complicated, undermining the major role of domain analysis which is understanding. One variant was often reflected in many models and any given model was affected by many variants. The number of possible variant combinations was growing rapidly and mutual dependencies among variants even further complicated the domain model. We realized that our purely descriptive domain model was only useful for small examples but it did not scale up. In this paper, we describe a modeling method and a Flexible Variant Configuration tool (FVC for short) that alleviate the above mentioned problems. In our approach, we start by modeling so-called domain defaults, i.e., requirements that characterize a typical system in a domain. Then, we describe variants as deltas in respect to domain defaults. The FVC interprets variants to produce customized domain model views for a system that meets specific requirements. We implemented the above concepts using commercial tools Netron Fusion and Rational Rose. In the paper, we illustrate our domain modeling method and tool with examples from the Facility Reservation System domain. © 2003 Elsevier Inc. All rights reserved.

DOI	10.1016/S0164-1212(03)00060-8
Citations	Scopus - 1, Web of Science - 9

2003

Jarzabek S, Bassett P, Zhang HY, Zhang WS, 'XVCL: XML-based variant configuration language', 25TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, PROCEEDINGS, 810-811 (2003)

XML-based Variant Configuration Language (XVCL) is a meta-programming technique and tool that provides effective reuse mechanisms. It includes a methodology and a tool, the XVCL processor. The methodology shows how to discover the structure of the solution for the application domain and for the types of variants one wants to address. The XVCL processor automates the routine yet error-prone program construction tasks, allowing developers to focus on what is novel about the problem domains and requires creativity.

DOI	10.1109/ICSE.2003.1201298
Citations	Scopus - 7, Web of Science - 31

2002

Swe SM, Zhang H, Jarzabek S, 'XVCL: A tutorial', ACM International Conference Proceeding Series, 27, 341-349 (2002)

XVCL (XML-based Variant Configuration Language) is a general-purpose mark-up language for configuring variants in programs and other types of documents. We can apply XVCL to configure variants in a variety of software assets such as software architecture, program code, test cases, technical and user-level program documentation or requirement specifications. The principles of the XVCL have been thoroughly tested in practice. XVCL is based on the same concepts as the frame technology [1]. Frame technology has been extensively applied in industry to manage variants and evolve multi-million-line, COBOL-based, information systems. An independent analysis showed that frame technology has reduced large software project costs by over 84% and their times-to-market by 70%, when compared to industry norms [1, 2]. At the same time, we found that the principles of XVCL are not easy to communicate. In this paper, we describe a subset of XVCL. We trust this subset of XVCL is easy to understand and still effectively communicates essential XVCL concepts. To illustrate the XVCL method, we further describe an XVCL solution to handling variants in a Notepad system. Copyright 2002 ACM.

DOI	10.1145/568760.568821
Citations	Scopus - 10

2001

Durrani TS, Leyman AR, 'Message from the chairmen', IEEE Workshop on Statistical Signal Processing Proceedings (2001)

2001

Wong TW, Jarzabek S, Swe SM, Shen R, Zhang H, 'XML implementation of frame processor', Proceedings of SSR '01: 2001 Symposium on Software Reusability, 164-172 (2001)

A quantitative study has shown that frame technology [1] supported by the Fusion toolset can lead to reduction in time-to-market (70%) and project costs (84%). Frame technology has been developed to handle large COBOL-based business software product families. We wished to investigate how the principle of the frame approach can be applied to support product families in other application domains, in particular to build distributed component-based systems written in Object-Oriented languages. As Fusion is tightly coupled with COBOL, we implemented our own tools based on frame concepts using the XML technology. In our solution, a generic architecture for a product family is a hierarchy of XML documents. Each such document contains a reusable program fragment instrumented for change with XML tags. We use a tool built on top of the XML parsing framework JAXP to process documents in order to produce a custom member of a product family. Our solution is cost-effective and extensible. In the paper, we describe our solution, illustrating its use with examples. We intend to make our solution available to the public in order to encourage investigation of frame concepts in other application domains, implementation languages and platforms.

DOI	10.1145/375212.375285
Citations	Scopus - 17

2001

Zhang H, Jarzabek S, Swe SM, 'XVCL approach to separating concerns in product family assets', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2186, 36-47 (2001)

In this paper, we describe an XML-based language, called XVCL, for managing variants in component-based product families. Using XVCL, we can organize product family assets and instrument them to accommodate variants. A tool that interprets XVCL and provides semi-automatic support for asset customization is also introduced. In our projects, we applied XVCL to manage variants in UML domain models and in generic architectures for product families. We have achieved simple forms of separation of concerns (in both models and architectures) and we are investigating advanced forms in current work. We plan to compare XVCL to other emerging techniques that lead to separating of concerns in software models, documents, architectures and code.

DOI	10.1007/3-540-44800-4_4
Citations	Scopus - 13

2001

Jarzabek S, Zhang HY, 'XML-based method and tool for handling variant requirements in domain models', FIFTH IEEE INTERNATIONAL SYMPOSIUM ON REQUIREMENTS ENGINEERING, PROCEEDINGS, 166-173 (2001)

A domain model describes common and variant requirements for a system family. UML notations used in requirements analysis and software modeling can be extended with "variation points" to cater for variant requirements. However, UML models for a large single system are already complicated enough. With variants - UML domain models soon become too complicated to be useful. The main reasons are the explosion of possible variant combinations, complex dependencies among variants and inability to trace variants from a domain model down to the requirements for a specific system, member of a family. We believe that the above mentioned problems cannot be solved at the domain model description level alone. In the paper, we propose a novel solution based on a tool that interprets and manipulates domain models to provide analysts with customized, simple domain views. We describe a variant configuration language that allows us to instrument domain models with variation points and record variant dependencies. An interpreter of this language produces customized views of a domain model, helping analysts understand and reuse software models. We describe the concept of our approach and its simple implementation based on XML and XMI technologies.

Citations	Scopus - 3, Web of Science - 15

Journal article (92 outputs)

Year

Citation

Altmetrics

Link

2026

Tian Z, Ma M, Hort M, Sarro F, Zhang H, Chen J, 'Enhanced Fairness Testing via Generating Effective Initial Individual Discriminatory Instances', ACM Transactions on Software Engineering and Methodology, 35, 1-23 (2026)

DOI	10.1145/3737697

2026

Yang B, Tian H, Ren J, Zhang H, Klein J, Bissyandé TF, Goues CL, Jin S, 'MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-Tuning', ACM Transactions on Software Engineering and Methodology, 35 (2026)

DOI	10.1145/3735129

2026

Wan Y, Wan G, Zhang S, Zhang H, Sui Y, Zhou P, Jin H, Sun L, 'Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach', ACM Transactions on Software Engineering and Methodology, 35, 1-24 (2026) [C1]

DOI	10.1145/3742785

2026

Huang L, Yan M, Yin T, Sun W, Liu Z, Zhang H, Lo D, 'Steer Your Model: Secure Code Generation with Contrastive Decoding', IEEE Transactions on Software Engineering (2026)

DOI	10.1109/TSE.2025.3650127

2026

Zhu Y, Liu C, He X, Ren X, Liu Z, Pan R, Zhang H, 'AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation', IEEE Transactions on Software Engineering, 52, 631-650 (2026)

DOI	10.1109/TSE.2025.3642621

2026

Wang Y, Wang Y, Wang S, Guo D, Chen J, Grundy J, Liu X, Ma Y, Mao M, Zhang H, Zheng Z, 'RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation', IEEE Transactions on Software Engineering, 52, 675-690 (2026)

DOI	10.1109/TSE.2025.3645056

2026

Liu H, Huang X, Zhou W, Luo F, Wen J, Zhang H, 'Representation-Enhanced Cascading Multi-Level Interest Learning for Multi-Behavior Recommendation', ACM Transactions on Information Systems, 44 (2026)

DOI	10.1145/3786341

2026

Zhang C, Li X, Li B, Deng C, Li M, Zhang S, Yu W, Zhang H, Wang Z, Yang Y, Zeng Y, 'Hypergraph-driven spatial multimodal fusion for precise domain delineation and tumor microenvironment decoding', Communications Biology, 9 (2026) [C1]

DOI	10.1038/s42003-025-09312-0

2025

Wu D, Zhang H, Feng Y, Dong Z, 'MITU: Locating relevant tutorial fragments of APIs with multi-source API knowledge', Journal of Systems and Software, 222 (2025) [C1]

API tutorials are vital resources as they can help developers learn how to use the APIs. An API tutorial is usually split into a number of consecutive units that describe the same topic, denoted as tutorial fragments. We treat a tutorial fragment explaining how to use an API as a relevant fragment of the API. Locating relevant tutorial fragments of an API can help developers understand and learn APIs. Existing approaches often train location models using API knowledge from a single resource (e.g., API tutorials). In practice, API knowledge from multiple resources such as API tutorials, Stack Overflow (SO) posts, and API specifications (denoted as multi-source API knowledge) is available to help locate relevant fragments of APIs. While leveraging multi-source API knowledge is intuitively more beneficial, it is a challenging task to use multi-source API knowledge due to diverse distribution and imbalanced distribution issues. Here, the diverse distribution denotes that the data in the same resource are close to each other in the feature space, while data in different resources are far away from each other. The imbalanced distribution denotes that the amount of relevant data is less than the amount of irrelevant data. In this paper, we propose a novel approach called MITU (using Multi-source API knowledge to locate relevant TUtorial fragments) to alleviate these two challenges. For the diverse distribution problem, MITU can project multi-source API knowledge to a correlated space where their distributions become similar. For the imbalanced distribution problem, MITU can minimize the misclassification cost when learning multi-source API knowledge. More specifically, we first collect multi-source API knowledge from API specifications, SO posts, and API tutorials, respectively. Then, we train a cost-sensitive subspace analysis based location model, which can make full use of multi-source API knowledge by addressing issues of diverse and imbalanced distributions. 
At last, relevant tutorial fragments of APIs can be located by consulting the trained model. We evaluate MITU on Java and Android multi-source API knowledge datasets containing a total of 44,064 samples. Experimental results show that MITU is effective and outperforms the existing approaches. Moreover, our user study confirms the effectiveness of MITU in practice.

DOI	10.1016/j.jss.2024.112296

2025

Li F, Jiang J, Sun J, Zhang H, 'Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis', ACM Transactions on Software Engineering and Methodology, 34 (2025) [C1]

DOI	10.1145/3715004

2025

Li H, Sun W, Yan M, Xu L, Li Q, Zhang X, Zhang H, 'Retrieval-Augmented Fine-Tuning for Improving Retrieve-and-Edit Based Assertion Generation', IEEE Transactions on Software Engineering, 51, 1591-1614 (2025) [C1]

DOI	10.1109/TSE.2025.3558403

2025

Li Z, Zhang H, Jing XY, Yu W, Liu Y, 'Unsupervised Software Defect Prediction Through Multiview Clustering', IEEE Transactions on Reliability, 74, 3356-3370 (2025) [C1]

The core goal of software defect prediction (SDP) is to identify modules with a high likelihood of defects, thereby enabling prioritization of quality assurance activities with low inspection effort. There are many supervised defect prediction models that are extensively studied. However, these methods require the need for labeling data to get enough training modules, which will cause a lot of waste of human resources. Cross-project defect prediction primarily reuses models trained on other projects with enough historical data. However, this strategy is often hindered by large distribution differences across different projects and privacy concerns of data. Unsupervised learning technique is an alternative solution to the unlabeled data, but it mainly focuses on single-view prediction by concatenating all the software metrics. This ignores the diversity and complementarity of different types of metrics. This study proposes a novel approach, namely, multiview unsupervised software defect prediction (MUSDP). It aims to collaboratively learn the diversity and complementarity of different views to build a robust and reliable defect prediction model. Extensive experiments on 28 releases from eight software projects indicate that MUSDP exhibits superior or comparable results regarding G-mean, AUC, Popt, and Recall@20% compared to competing supervised and unsupervised methods. For the interpretation of MUSDP, the number of added and deleted lines significantly influence its predictions.

DOI	10.1109/TR.2025.3548107

2025

Gu W, Chen Y, Wang Y, Zhang H, Gao C, Lyu MR, 'Weakly Supervised Vulnerability Localization via Multiple Instance Learning', ACM Transactions on Software Engineering and Methodology

DOI	10.1145/3768572

2025

Wang Y, Shi E, Du L, Yang X, Hu Y, Wang Y, Guo D, Han S, Zhang H, Zhang D, 'Context-aware code summarization with multi-relational graph neural network', Automated Software Engineering, 32 (2025)

DOI	10.1007/s10515-025-00490-z

2025

Wang Y, Wang J, Zhang H, Ming X, Wang Q, 'Better together: Automated app review analysis with deep multi-task learning', Information and Software Technology, 177 (2025) [C1]

Context: User reviews of mobile apps provide an important communication channel between developers and users. Existing approaches to automated app review analysis mainly focus on one task (e.g., bug classification task, information extraction task, etc.) at a time, and are often constrained by the manually defined patterns and the ignorance of the correlations among the tasks. Recently, multi-task learning (MTL) has been successfully applied in many scenarios, with the potential to address the limitations associated with app review mining tasks. Objective: In this paper, we propose MABLE, a deep MTL-based and semantic-aware approach, to improve app review analysis by exploiting task correlations. Methods: MABLE jointly identifies the types of involved bugs reported in the review and extracts the fine-grained features where bugs might occur. It consists of three main phases: (1) data preparation phase, which prepares data to allow data sharing beyond single task learning; (2) model construction phase, which employs a BERT model as the shared representation layer to capture the semantic meanings of reviews, and task-specific layers to model two tasks in parallel; (3) model training phase, which enables eavesdropping by shared loss function between the two related tasks. Results: Evaluation results on six apps show that MABLE outperforms ten commonly-used and state-of-the-art baselines, with the precision of 79.76% and the recall of 79.24% for classifying bugs, and the precision of 79.83% and the recall of 80.33% for extracting problematic app features. The MTL mechanism improves the F-measure of two tasks by 3.80% and 4.63%, respectively. Conclusion: The proposed approach provides a novel and effective way to jointly learn two related review analysis tasks, and sheds light on exploring other review mining tasks.

DOI	10.1016/j.infsof.2024.107597

2025

Gu X, Chen M, Lin Y, Hu Y, Zhang H, Wan C, Wei Z, Xu Y, Wang J, 'On the Effectiveness of Large Language Models in Domain-Specific Code Generation', ACM Transactions on Software Engineering and Methodology, 34 (2025) [C1]

DOI	10.1145/3697012
Citations	Scopus - 4

2025

Zeng Y, Xie J, Shangguan N, Wei Z, Li W, Su Y, Yang S, Zhang C, Zhang J, Fang N, Zhang H, Lu Y, Zhao H, Fan J, Yu W, Yang Y, 'CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells', Nature Communications, 16 (2025) [C1]

DOI	10.1038/s41467-025-59926-5

2025

Liu H, Li Z, Zhang H, Jing X-Y, Liu J, 'CFG2AT: Control Flow Graph and Graph Attention Network-Based Software Defect Prediction', IEEE Transactions on Reliability [C1]

DOI	10.1109/TR.2024.3503688
Citations	Scopus - 2

2025

Wu Y, Wan Y, Chu Z, Zhao W, Liu Y, Zhang H, Shi X, Jin H, Yu PS, 'Can Large Language Models Serve as Evaluators for Code Summarization?', IEEE Transactions on Software Engineering, 51, 3205-3217 (2025) [C1]

DOI	10.1109/TSE.2025.3595283

2025

Guo L, Tao W, Jiang R, Wang Y, Chen J, Liu X, Ma Y, Mao M, Zhang H, Zheng Z, 'OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution', Proceedings of the ACM on Software Engineering, 2, 24-46 (2025)

DOI	10.1145/3728871

2025

Li Z, Zhu W, Zhang H, Miao Y, Ren J, 'The impact of unsupervised feature selection techniques on the performance and interpretation of defect prediction models', Automated Software Engineering, 32 (2025) [C1]

The performance and interpretation of a defect prediction model depend on the software metrics utilized in its construction. Feature selection techniques can enhance model performance and interpretation by effectively removing redundant, correlated, and irrelevant metrics from defect datasets. Previous empirical studies have scrutinized the impact of feature selection techniques on the performance and interpretation of defect prediction models. However, most feature selection techniques examined in these studies are primarily supervised. In particular, the impact of unsupervised feature selection (UFS) techniques on defect prediction remains unknown and needs to be explored extensively. To address this gap, we systematically apply 21 UFS techniques to evaluate their impact on the performance and interpretation of unsupervised defect prediction models in binary classification and effort-aware ranking scenarios. Extensive experiments are conducted on the 28 versions from 8 projects using 4 unsupervised models. We observe that: (1) 10–100% of the selected metrics are inconsistent between each pair of UFS techniques. (2) 29–100% of the selected metrics are inconsistent among different software modules. (3) For unsupervised defect prediction models, some UFS techniques (e.g., AutoSpearman, LS, and FMIUFS) exhibit the ability to effectively reduce the number of metrics while maintaining or even improving model performance. (4) UFS techniques alter the ranking of the top 3 groups of metrics in defect models, affecting the interpretation of these models. Based on these findings, we recommend that software practitioners utilize UFS techniques for unsupervised defect prediction. However, caution should be exercised when deriving insights and interpretations from defect prediction models.

DOI	10.1007/s10515-025-00510-y
Co-authors	Sky Miao

2025

Dai Z, Song Y, Qi T, Zhang H, Zhao H, Wang Z, Yang Y, Zeng Y, 'Deciphering Cell Type Abundance in Proteomics Data Through Graph Neural Networks', Advanced Science, 12 (2025)

DOI	10.1002/advs.202502987

2025

Tantithamthavorn CK, Palomba F, Khomh F, Chua JJ, 'MLOps, LLMOps, FMOps, and Beyond', IEEE Software, 42, 26-32 (2025)

DOI	10.1109/MS.2024.3477014
Citations	Scopus - 6

2025

Gu W, Lyu Z, Wang Y, Zhang H, Gao C, Lyu MR, 'SPENCER: Self-Adaptive Model Distillation for Efficient Code Retrieval', ACM Transactions on Software Engineering and Methodology

DOI	10.1145/3765748

2024

Luo C, Song J, Zhao Q, Sun B, Chen J, Zhang H, Lin J, Hu C, 'Solving the t-wise Coverage Maximum Problem via Effective and Efficient Local Search-based Sampling', ACM Transactions on Software Engineering and Methodology (2024) [C1]

DOI	10.1145/3688836
Citations	Scopus - 2

2024

Li B, Sun Z, Huang T, Zhang H, Wan Y, Li G, Jin Z, Lyu C, 'IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion', Proceedings of the ACM on Software Engineering, 1, 182-203 (2024) [C1]

DOI	10.1145/3643735

2024

Wu D, Zhang H, Feng Y, Dong Z, Sun Y, 'The future of API analytics', Automated Software Engineering, 31 (2024) [C1]

Reusing APIs can greatly expedite the software development process and reduce programming effort. To learn how to use APIs, developers often rely on API learning resources (such as API references and tutorials) that contain rich and valuable API knowledge. In recent years, numerous API analytic approaches have been presented to help developers mine API knowledge from API learning resources. While these approaches have shown promising results in various tasks, there are many opportunities in this area. In this paper, we discuss several possible future works on API analytics.

DOI	10.1007/s10515-024-00442-z
Citations	Scopus - 1

2024

Li Z, Du Q, Zhang H, Jing XY, Wu F, 'An empirical study of data sampling techniques for just-in-time software defect prediction', Automated Software Engineering, 31 (2024) [C1]

Just-in-time software defect prediction (JIT-SDP) is a fine-grained, easy-to-trace, and practical method. Unfortunately, JIT-SDP usually suffers from the class imbalance problem, which affects the performance of the models. Data sampling is one of the commonly used class imbalance techniques to overcome this problem. However, there is a lack of comprehensive empirical studies to compare different data sampling techniques on the performance of JIT-SDP. In this paper, we consider both defect classification and defect ranking, two typical application scenarios. To this end, we performed an empirical comparison of 10 data sampling algorithms on the performance of JIT-SDP. Extensive experiments on 10 open-source projects with 12 performance measures show that the effectiveness of data sampling techniques can indeed vary relying on the specific evaluation measures in both defect classification and defect ranking scenarios. Specifically, the RUM algorithm has demonstrated superior performance overall in the context of defect classification, particularly in F-measure, AUC, and MCC. On the other hand, for defect ranking, the ENN algorithm has emerged as the most favorable option, exhibiting perfect results in Popt, Recall@20%, and F-measure@20%. However, data sampling techniques can lead to an increase in false alarms and require the inspection of a higher number of changes. These findings highlight the importance of carefully selecting the appropriate data sampling technique based on the specific evaluation measures for different scenarios.

DOI	10.1007/s10515-024-00455-8
Citations	Scopus - 4

2024

Zeng Y, Song Y, Zhang C, Li H, Zhao Y, Yu W, Zhang S, Zhang H, Dai Z, Yang Y, 'Imputing spatial transcriptomics through gene network constructed from protein language model', Communications Biology, 7 (2024) [C1]

DOI	10.1038/s42003-024-06964-2
Citations	Scopus - 5

2024

Liao J, Yang M, Zhou W, Zhang H, Wen J, 'Modeling item exposure and user satisfaction for debiased recommendation with causal inference', Information Sciences, 676 (2024) [C1]

Recommender systems (RSs) aim to provide suggestions for items that are most pertinent to a particular user. Typically, RSs are trained and evaluated directly on the observed items, raising concerns about exposure bias - many missing items are false negatives, which were not consumed due to lack of exposure rather than lack of affinity. In addition, user satisfaction is often ignored in previous RSs, where consumer behaviors may be influenced by advertisement or promotion instead of actual user interests. In this paper, we propose a novel model-agnostic causal inference method for debiased recommendation, which models item Exposure and user Satisfaction simultaneously with Causal Inference (ESCI). Specifically, we formulate a causal graph to describe the recommendation process, where the ranking score is influenced by item exposure, user satisfaction, and user-item matching. We investigate the change in the ranking score when item exposure is discarded. In addition, we propose an adversarial training strategy to improve the generalization and robustness of recommender systems. During testing, we perform causal inference to remove the effect of item exposure. The comprehensive experimental study on four benchmark datasets demonstrates that the proposed ESCI enhances recommendation performance for users with non-high interaction frequencies, thereby outperforming state-of-the-art baselines.

DOI	10.1016/j.ins.2024.120834
Citations	Scopus - 1

2024

Qi B, Sun H, Zhang H, Gao X, 'Reusing Convolutional Neural Network Models through Modularization and Composition', ACM Transactions on Software Engineering and Methodology, 33 (2024) [C1]

With the widespread success of deep learning technologies, many trained deep neural network (DNN) models are now publicly available. However, directly reusing the public DNN models for new tasks often fails due to mismatching functionality or performance. Inspired by the notion of modularization and composition in software reuse, we investigate the possibility of improving the reusability of DNN models in a more fine-grained manner. Specifically, we propose two modularization approaches named CNNSplitter and GradSplitter, which can decompose a trained convolutional neural network (CNN) model for N-class classification into N small reusable modules. Each module recognizes one of the N classes and contains a part of the convolution kernels of the trained CNN model. Then, the resulting modules can be reused to patch existing CNN models or build new CNN models through composition. The main difference between CNNSplitter and GradSplitter lies in their search methods: the former relies on a genetic algorithm to explore search space, while the latter utilizes a gradient-based search method. Our experiments with three representative CNNs on three widely used public datasets demonstrate the effectiveness of the proposed approaches. Compared with CNNSplitter, GradSplitter incurs less accuracy loss, produces much smaller modules (19.88% fewer kernels), and achieves better results on patching weak models. In particular, experiments on GradSplitter show that (1) by patching weak models, the average improvement in terms of precision, recall, and F1-score is 17.13%, 4.95%, and 11.47%, respectively, and (2) for a new task, compared with the models trained from scratch, reusing modules achieves similar accuracy (the average loss of accuracy is only 2.46%) without a costly training process. Our approaches provide a viable solution to the rapid development and improvement of CNN models.

DOI	10.1145/3632744
Citations	Scopus - 5, Web of Science - 2

2024

Carleton A, Falessi D, Zhang H, Xia X, 'Generative AI: Redefining the Future of Software Engineering', IEEE Software, 41, 34-37 (2024)

DOI	10.1109/MS.2024.3441889
Citations	Scopus - 9

2024

Wan Y, Bi Z, He Y, Zhang J, Zhang H, Sui Y, Xu G, Jin H, Yu P, 'Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit', ACM Computing Surveys, 56 (2024) [C1]

DOI	10.1145/3664597
Citations	Scopus - 2

2024

Dinh CT, Vu TT, Tran NH, Dao MN, Zhang H, 'A New Look and Convergence Rate of Federated Multitask Learning With Laplacian Regularization', IEEE Transactions on Neural Networks and Learning Systems, 35, 8075-8085 (2024) [C1]

DOI	10.1109/TNNLS.2022.3224252
Citations	Scopus - 2, Web of Science - 12

2024

Wang X, Yu H, Meng X, Cao H, Zhang H, Sun H, Liu X, Hu C, 'MTL-TRANSFER: Leveraging Multi-task Learning and Transferred Knowledge for Improving Fault Localization and Program Repair', ACM Transactions on Software Engineering and Methodology, 33 (2024) [C1]

Fault localization (FL) and automated program repair (APR) are two main tasks of automatic software debugging. Compared with traditional methods, deep learning-based approaches have been demonstrated to achieve better performance in FL and APR tasks. However, the existing deep learning-based FL methods ignore the deep semantic features or only consider simple code representations. And for APR tasks, existing template-based APR methods are weak in selecting the correct fix templates for more effective program repair, which are also not able to synthesize patches via the embedded end-to-end code modification knowledge obtained by training models on large-scale bug-fix code pairs. Moreover, in most of FL and APR methods, the model designs and training phases are performed separately, leading to ineffective sharing of updated parameters and extracted knowledge during the training process. This limitation hinders the further improvement in the performance of FL and APR tasks. To solve the above problems, we propose a novel approach called MTL-TRANSFER, which leverages a multi-task learning strategy to extract deep semantic features and transferred knowledge from different perspectives. First, we construct a large-scale open-source bug datasets and implement 11 multi-task learning models for bug detection and patch generation sub-tasks on 11 commonly used bug types, as well as one multi-classifier to learn the relevant semantics for the subsequent fix template selection task. Second, an MLP-based ranking model is leveraged to fuse spectrum-based, mutation-based and semantic-based features to generate a sorted list of suspicious statements. Third, we combine the patches generated by the neural patch generation sub-task from the multi-task learning strategy with the optimized fix template selecting order gained from the multi-classifier mentioned above. 
Finally, the more accurate FL results, the optimized fix template selecting order, and the expanded patch candidates are combined together to further enhance the overall performance of APR tasks. Our extensive experiments on widely-used benchmark Defects4J show that MTL-TRANSFER outperforms all baselines in FL and APR tasks, proving the effectiveness of our approach. Compared with our previously proposed FL method TRANSFER-FL (which is also the state-of-the-art statement-level FL method), MTL-TRANSFER increases the faults hit by 8/11/12 on Top-1/3/5 metrics (92/159/183 in total). And on APR tasks, the number of successfully repaired bugs of MTL-TRANSFER under the perfect localization setting reaches 75, which is 8 more than our previous APR method TRANSFER-PR. Furthermore, another experiment to simulate the actual repair scenarios shows that MTL-TRANSFER can successfully repair 15 and 9 more bugs (56 in total) compared with TBar and TRANSFER, which demonstrates the effectiveness of the combination of our optimized FL and APR components.

DOI	10.1145/3654441
Citations	Scopus - 5

2024

Le VH, Zhang H, 'PreLog: A Pre-trained Model for Log Analytics', Proceedings of the ACM on Management of Data, 2 (2024) [C1]

DOI	10.1145/3654966

2024

Wu Y, Wan Y, Zhang H, Sui Y, Wei W, Zhao W, Xu G, Jin H, 'Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study', Proceedings of the ACM on Management of Data, 2 (2024) [E1]

DOI	10.1145/3654992

2024

Wu D, Feng Y, Zhang H, Xu B, 'Automatic recognizing relevant fragments of APIs using API references', Automated Software Engineering, 31 (2024) [C1]

API tutorials are crucial resources as they often provide detailed explanations of how to utilize APIs. Typically, an API tutorial is segmented into a number of consecutive fragments. If a fragment explains API usage, we regard it as a relevant fragment of the API. Recognizing relevant fragments can aid developers in comprehending, learning, and using APIs. Recently, some studies have presented relevant fragments recognition approaches, which mainly focused on using API tutorials or Stack Overflow to train the recognition model. API references are also important API learning resources as they contain abundant API knowledge. Considering the similarity between API tutorials and API references (both provide API knowledge), we believe that using API knowledge from API references could help recognize relevant tutorial fragments of APIs effectively. However, it is non-trivial to leverage API references to build a supervised learning-based recognition model. Two major problems are the lack of labeled API references and the unavailability of engineered features of API references. We propose a supervised learning based approach named RRTR (which stands for Recognize Relevant Tutorial fragments using API References) to address the above problems. For the problem of lacking labeled API references, RRTR designs heuristic rules to automatically collect relevant and irrelevant API references for APIs. Regarding the unavailable engineered features issue, we adopt the pre-trained SBERT model (SBERT stands for Sentence-BERT) to automatically learn semantic features for API references. More specifically, we first automatically generate labeled (API, ARE) pairs (ARE stands for an API reference) via our heuristic rules of API references. We then use SBERT to automatically learn semantic features for the collected pairs and train a supervised learning based recognition model. Finally, we can recognize the relevant tutorial fragments of APIs based on the trained model.
To evaluate the effectiveness of RRTR, we collected Java and Android API reference datasets containing a total of 20,680 labeled (API, ARE) pairs. Experimental results demonstrate that RRTR outperforms state-of-the-art approaches in terms of F-Measure on two datasets. In addition, we conducted a user study to investigate the practicality of RRTR and the results further illustrate the effectiveness of RRTR in practice. The proposed RRTR approach can effectively recognize relevant fragments of APIs with API references by solving the problems of lacking labeled API references and engineered features of API references.

DOI	10.1007/s10515-023-00401-0
Citations	Scopus - 3

2024

Xie Y, Zhang H, Babar MA, 'LogSD: Detecting Anomalies from System Logs through Self-Supervised Learning and Frequency-Based Masking', Proceedings of the ACM on Software Engineering, 1, 2098-2120 (2024) [C1]

DOI	10.1145/3660800

2024

Pham L, Ha H, Zhang H, 'BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection', Proceedings of the ACM on Software Engineering, 1, 2214-2237 (2024) [E1]

DOI	10.1145/3660805

2024

Sun W, Guo Z, Yan M, Liu Z, Lei Y, Zhang H, 'Method-Level Test-to-Code Traceability Link Construction by Semantic Correlation Learning', IEEE Transactions on Software Engineering, 50, 2656-2676 (2024) [C1]

Test-to-code traceability links (TCTLs) establish links between test artifacts and code artifacts. These links enable developers and testers to quickly identify the specific pieces of code tested by particular test cases, thus facilitating more efficient debugging, regression testing, and maintenance activities. Various approaches, based on distinct concepts, have been proposed to establish method-level TCTLs, specifically linking unit tests to corresponding focal methods. Static methods, such as naming-convention-based methods, use heuristic- and similarity-based strategies. However, such methods face the following challenges: Developers, driven by specific scenarios and development requirements, may deviate from naming conventions, leading to TCTL identification failures. Static methods often overlook the rich semantics embedded within tests, leading to erroneous associations between tests and semantically unrelated code fragments. Although dynamic methods achieve promising results, they require the project to be compilable and the tests to be executable, limiting their usability. This limitation is significant for downstream tasks requiring massive test-code pairs, as not all projects can meet these requirements. To tackle the abovementioned limitations, we propose a novel static method-level TCTL approach, named TestLinker. For the first challenge of existing static approaches, TestLinker introduces a two-phase TCTL framework to accommodate different project types in a triage manner. As for the second challenge, we employ the semantic correlation learning, which learns and establishes the semantic correlations between tests and focal methods based on Pre-trained Code Models (PCMs). TestLinker further establishes mapping rules to accurately link the recommended function name to the concrete production function declaration. 
Empirical evaluation on a meticulously labeled dataset reveals that TestLinker significantly outperforms traditional static techniques, showing average F1-score improvements ranging from 73.48% to 202.00%. Moreover, compared to state-of-the-art dynamic methods, TestLinker, which only leverages static information, demonstrates comparable or even better performance, with an average F1-score increase of 37.40%.

DOI	10.1109/TSE.2024.3449917
Citations	Scopus - 1

2024

Tao W, Zhou Y, Wang Y, Zhang H, Wang H, Zhang W, 'KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation', ACM Transactions on Software Engineering and Methodology, 33 (2024) [C1]

Commit messages are natural language descriptions of code changes, which are important for software evolution such as code understanding and maintenance. However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not. On the basis of our empirical study, we discover that training on good-practice commits significantly contributes to the commit message generation. Motivated by this finding, we propose a novel knowledge-aware denoising learning method called KADEL. Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits. To achieve this, we propose a model that learns the commit knowledge by training on good-practice commits. This knowledge model enables supplementing more information for training samples that do not conform to good practice. However, since the supplementary information may contain noise or prediction errors, we propose a dynamic denoising training method. This method composes a distribution-aware confidence function and a dynamic distribution list, which enhances the effectiveness of the training process. Experimental results on the whole MCMD dataset demonstrate that our method overall achieves state-of-the-art performance compared with previous methods.

DOI	10.1145/3643675
Citations	Scopus - 1, Web of Science - 3

2024

Yang L, Chen J, Gao S, Gong Z, Zhang H, Kang Y, Li H, 'Try with Simpler-An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection', ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 33 (2024) [C1]

With the rapid development of deep learning (DL), the recent trend of log-based anomaly detection focuses on extracting semantic information from log events (i.e., templates of log messages) and designing more advanced DL models for anomaly detection. Indeed, the effectiveness of log-based anomaly detection can be improved, but these DL-based techniques further suffer from the limitations of heavier dependency on training data (such as data quality or data labels) and higher costs in time and resources due to the complexity and scale of DL models, which hinder their practical use. On the contrary, the techniques based on traditional machine learning or data mining algorithms are less dependent on training data and more efficient but produce worse effectiveness than DL-based techniques, which is mainly caused by the problem of unseen log events (some log events in incoming log messages are unseen in training data) confirmed by our motivating study. Intuitively, if we can improve the effectiveness of traditional techniques to be comparable with advanced DL-based techniques, then log-based anomaly detection can be more practical. Indeed, an existing study in another area (i.e., linking questions posted on Stack Overflow) has pointed out that traditional techniques with some optimizations can indeed achieve comparable effectiveness with the state-of-the-art DL-based technique, indicating the feasibility of enhancing traditional log-based anomaly detection techniques to some degree. Inspired by the idea of "try-with-simpler," we conducted the first empirical study to explore the potential of improving traditional techniques for more practical log-based anomaly detection.
In this work, we optimized the traditional unsupervised PCA (Principal Component Analysis) technique by incorporating a lightweight semantic-based log representation in it, called SemPCA, and conducted an extensive study to investigate the potential of SemPCA for more practical log-based anomaly detection. By comparing seven log-based anomaly detection techniques (including four DL-based techniques, two traditional techniques, and SemPCA) on both public and industrial datasets, our results show that SemPCA achieves comparable effectiveness as advanced supervised/semi-supervised DL-based techniques while being much more stable under insufficient training data and more efficient, demonstrating that the traditional technique can still excel after small but useful adaptation.

DOI	10.1145/3644386
Citations	Scopus - 1
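
The abstract above builds on the classical PCA detector for logs. The following is a minimal sketch of that classical idea only, not of SemPCA itself: plain event-count vectors stand in for the semantic log representation, which the abstract does not detail. The toy data, the single principal component, and the fixed `THRESHOLD` are all illustrative assumptions.

```python
import math

def fit_pca_detector(rows, iterations=100):
    """Learn column means and the first principal direction of the training vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Covariance matrix of the centered training data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalize to approach the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def spe(row, means, v):
    """Squared prediction error: squared length of the part of `row` outside the normal subspace."""
    y = [x - m for x, m in zip(row, means)]
    proj = sum(a * b for a, b in zip(y, v))
    residual = [a - proj * b for a, b in zip(y, v)]
    return sum(x * x for x in residual)

def is_anomalous(row, means, v, threshold):
    return spe(row, means, v) > threshold

# Hypothetical training sessions: counts of three log events, where event 2
# always occurs twice as often as event 1 and event 3 occurs once.
train = [[1, 2, 1], [2, 4, 1], [3, 6, 1], [4, 8, 1], [5, 10, 1]]
means, direction = fit_pca_detector(train)
THRESHOLD = 1.0  # illustrative cut-off, not a calibrated statistic

normal_session = [6, 12, 1]  # follows the learned pattern
odd_session = [5, 2, 1]      # breaks the 1:2 ratio; its SPE is about 12.8
```

A session is flagged when its count vector lies far from the subspace spanned by the training data's dominant direction. In practice the threshold would be derived from the training residuals rather than fixed by hand.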

2023

Alharbi F, Luo S, Zhang H, Shaukat K, Yang G, Wheeler CA, Chen Z, 'A Brief Review of Acoustic and Vibration Signal-Based Fault Detection for Belt Conveyor Idlers Using Machine Learning Models', SENSORS, 23 (2023) [C1]

DOI	10.3390/s23041902
Citations	Scopus - 7, Web of Science - 28
Co-authors	Craig Wheeler, Zhiyong Chen, Suhuai Luo

2023

Li Z, Zhang H, Jing X-Y, Xie J, Guo M, Ren J, 'DSSDPP: Data Selection and Sampling Based Domain Programming Predictor for Cross-Project Defect Prediction', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 49, 1941-1963 (2023) [C1]

Cross-project defect prediction (CPDP) refers to recognizing defective software modules in one project (i.e., target) using historical data collected from other projects (i.e., source), which can help developers find defects and prioritize their testing efforts. Unfortunately, there often exists large distribution difference between the source and target data. Most CPDP methods neglect to select the appropriate source data for a given target at the project level. More importantly, existing CPDP models are parametric methods, which usually require intensive parameter selection and tuning to achieve better prediction performance. This would hinder wide applicability of CPDP in practice. Moreover, most CPDP methods do not address the cross-project class imbalance problem. These limitations lead to suboptimal CPDP results. In this paper, we propose a novel data selection and sampling based domain programming predictor (DSSDPP) for CPDP, which addresses the above limitations. DSSDPP is a non-parametric CPDP method, which can perform knowledge transfer across projects without the need for parameter selection and tuning. By exploiting the structures of source and target data, DSSDPP can learn a discriminative transfer classifier for identifying defects of the target project. Extensive experiments on 22 projects from four datasets indicate that DSSDPP achieves better MCC and AUC results against a range of competing methods both in the single-source and multi-source scenarios. Since DSSDPP is easy, effective, extensible, and efficient, we suggest that future work can use it with the well-chosen source data to conduct CPDP especially for the projects with limited computational budget.

DOI	10.1109/TSE.2022.3204589
Citations	Scopus - 2, Web of Science - 8

2023

Zhang B, Zhang H, Le V-H, Moscato P, Zhang A, 'Semi-supervised and unsupervised anomaly detection by mining numerical workflow relations from system logs', AUTOMATED SOFTWARE ENGINEERING, 30 (2023) [C1]

Large-scale software-intensive systems often generate logs for troubleshooting purpose. The system logs are semi-structured text messages that record the internal status of a system at runtime. In this paper, we propose ADR (Anomaly Detection by workflow Relations), which can mine numerical relations from logs and then utilize the discovered relations to detect system anomalies. Firstly the raw log entries are parsed into sequences of log events and transformed to an extended event-count-matrix. The relations among the matrix columns represent the relations among the system events in workflows. Next, ADR evaluates the matrix's nullspace that corresponds to the linearly dependent relations of the columns. Anomalies can be detected by evaluating whether or not the logs violate the mined relations. We design two types of ADR: sADR (for semi-supervised learning) and uADR (for unsupervised learning). We have evaluated them on four public log datasets. The experimental results show that ADR can extract the workflow relations from log data, and is effective for log-based anomaly detection in both semi-supervised and unsupervised manners.

DOI	10.1007/s10515-022-00370-w
Citations	Scopus - 1, Web of Science - 6
Co-authors	Pablo Moscato
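
The nullspace step described in the ADR abstract above can be illustrated with a small sketch. This is a hedged toy version, not the authors' implementation: the event names, the count matrix, and the exact-arithmetic elimination are assumptions made for illustration, and the "extended" part of the event-count matrix is not modelled.

```python
from fractions import Fraction

def nullspace(matrix):
    """Return a basis of the right nullspace via exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c has no pivot, so it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    # Each free column yields one basis vector of the nullspace.
    basis = []
    for fc in (c for c in range(n_cols) if c not in pivot_cols):
        vec = [Fraction(0)] * n_cols
        vec[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivot_cols):
            vec[pc] = -rows[row_idx][fc]
        basis.append(vec)
    return basis

def violates(counts, relations):
    """A session is anomalous if its counts break any mined linear relation."""
    return any(sum(a * b for a, b in zip(counts, rel)) != 0 for rel in relations)

# Hypothetical normal sessions; columns are counts of (open, write, close) events.
train = [[1, 3, 1], [2, 0, 2], [1, 5, 1], [3, 2, 3]]
relations = nullspace(train)  # one relation: -open + close = 0
```

With these toy counts, the only mined relation says that every `open` is matched by a `close`. A new session such as `[2, 4, 1]` breaks that relation and is flagged, while `[4, 1, 4]` satisfies it. Real logs are noisy, so an exact zero test like this one would likely need a tolerance.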

2023

Wang W, Chen J, Yang L, Zhang H, Wang Z, 'Understanding and predicting incident mitigation time', INFORMATION AND SOFTWARE TECHNOLOGY, 155 (2023) [C1]

DOI	10.1016/j.infsof.2022.107119
Citations	Scopus - 9, Web of Science - 3

2023

Wu D, Jing X-Y, Zhang H, Zhou Y, Xu B, 'Leveraging Stack Overflow to detect relevant tutorial fragments of APIs', EMPIRICAL SOFTWARE ENGINEERING, 28 (2023) [C1]

DOI	10.1007/s10664-022-10235-1
Citations	Scopus - 1, Web of Science - 8

2023

Wu D, Jing X-Y, Zhang H, Feng Y, Chen H, Zhou Y, Xu B, 'Retrieving API Knowledge from Tutorials and Stack Overflow Based on Natural Language Queries', ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 32 (2023) [C1]

DOI	10.1145/3565799
Citations	Scopus - 1, Web of Science - 6

2023

Wang C, Yang Y, Gao C, Peng Y, Zhang H, Lyu MR, 'Prompt Tuning in Code Intelligence: An Experimental Evaluation', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 49, 4869-4885 (2023) [C1]

Pre-trained models have been shown effective in many code intelligence tasks, such as automatic code summarization and defect prediction. These models are pre-trained on large-scale unlabeled corpus and then fine-tuned in downstream tasks. However, as the inputs to pre-training and downstream tasks are in different forms, it is hard to fully explore the knowledge of pre-trained models. Besides, the performance of fine-tuning strongly relies on the amount of downstream task data, while in practice, the data scarcity scenarios are common. Recent studies in the natural language processing (NLP) field show that prompt tuning, a new paradigm for tuning, alleviates the above issues and achieves promising results in various NLP tasks. In prompt tuning, the prompts inserted during tuning provide task-specific knowledge, which is especially beneficial for tasks with relatively scarce data. In this article, we empirically evaluate the usage and effect of prompt tuning in code intelligence tasks. We conduct prompt tuning on popular pre-trained models CodeBERT and CodeT5 and experiment with four code intelligence tasks including defect prediction, code search, code summarization, and code translation. Our experimental results show that prompt tuning consistently outperforms fine-tuning in all four tasks. In addition, prompt tuning shows great potential in low-resource scenarios, e.g., improving the BLEU scores of fine-tuning by more than 26% on average for code summarization. Our results suggest that instead of fine-tuning, we could adapt prompt tuning for code intelligence tasks to achieve better performance, especially when lacking task-specific data. We also discuss the implications for adapting prompt tuning in code intelligence tasks.

DOI	10.1109/TSE.2023.3313881
Citations	Scopus - 3, Web of Science - 4

2023

Gao Y, Zhang H, Lyu C, 'EnCoSum: enhanced semantic features for multi-scale multi-modal source code summarization', Empirical Software Engineering, 28 (2023) [C1]

Code summarization aims to generate concise natural language descriptions for a piece of code, which can help developers comprehend the source code. Analysis of current work shows that the extraction of syntactic and semantic features of source code is crucial for generating high-quality summaries. To provide a more comprehensive feature representation of source code from different perspectives, we propose an approach named EnCoSum, which enhances semantic features for the multi-scale multi-modal code summarization method. This method complements our previously proposed M2TS approach (multi-scale multi-modal approach based on Transformer for source code summarization), which uses the multi-scale method to capture Abstract Syntax Trees (ASTs) structural information more completely and accurately at multiple local and global levels. In addition, we devise a new cross-modal fusion method to fuse source code and AST features, which can highlight key features in each modality that help generate summaries. To obtain richer semantic information, we improve M2TS. First, we add data flow and control flow to ASTs, and added-edge ASTs, called Enhanced-ASTs (E-ASTs). In addition, we introduce method name sequences extracted in the source code, which exist more knowledge about critical tokens in the corresponding summaries and can help the model generate higher-quality summaries. We conduct extensive experiments on processed Java and Python datasets and evaluate our approach via the four most commonly used machine translation metrics. The experimental results demonstrate that EnCoSum is effective and outperforms current state-of-the-art methods. Further, we perform ablation experiments on each of the model's key components, and the results show that they all contribute to the performance of EnCoSum.

DOI	10.1007/s10664-023-10384-x
Citations	Scopus - 3

2023

Zhang W, Guo S, Zhang H, Sui Y, Xue Y, Xu Y, 'Challenging Machine Learning-Based Clone Detectors via Semantic-Preserving Code Transformations', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 49, 3052-3070 (2023) [C1]

Software clone detection identifies similar or identical code snippets. It has been an active research topic that attracts extensive attention over the last two decades. In recent years, machine learning (ML) based detectors, especially deep learning-based ones, have demonstrated impressive capability on clone detection. It seems that this longstanding problem has already been tamed owing to the advances in ML techniques. In this work, we would like to challenge the robustness of the recent ML-based clone detectors through code semantic-preserving transformations. We first utilize fifteen simple code transformation operators combined with commonly-used heuristics (i.e., Random Search, Genetic Algorithm, and Markov Chain Monte Carlo) to perform equivalent program transformation. Furthermore, we propose a deep reinforcement learning-based sequence generation (DRLSG) strategy to effectively guide the search process of generating clones that could escape from the detection. We then evaluate the ML-based detectors with the pairs of original and generated clones. We realize our method in a framework named CloneGen (stands for Clone Generator). In evaluation, we challenge the three state-of-the-art ML-based detectors and four traditional detectors with the code clones after semantic-preserving transformations via the aid of CloneGen. Surprisingly, our experiments show that, despite the notable successes achieved by existing clone detectors, the ML models inside these detectors still cannot distinguish numerous clones produced by the code transformations in CloneGen. In addition, adversarial training of ML-based clone detectors using clones generated by CloneGen can improve their robustness and accuracy. Meanwhile, compared with the commonly-used heuristics, the DRLSG strategy has shown the best effectiveness in generating code clones to decrease the detection accuracy of the ML-based detectors.
Our investigation reveals an explicable but always ignored robustness issue of the latest ML-based detectors. Therefore, we call for more attention to the robustness of these new ML-based detectors.

DOI	10.1109/TSE.2023.3240118
Citations	Scopus - 3, Web of Science - 6

2023

Shi E, Wang Y, Du L, Zhang H, Han S, Zhang D, Sun H, 'CoCoAST: Representing Source Code via Hierarchical Splitting and Reconstruction of Abstract Syntax Trees', EMPIRICAL SOFTWARE ENGINEERING, 28 (2023) [C1]

DOI	10.1007/s10664-023-10378-9
Citations	Scopus - 8, Web of Science - 1

2022

Tao W, Wang Y, Shi E, Du L, Han S, Zhang H, Zhang D, Zhang W, 'A large-scale empirical study of commit message generation: models, datasets and evaluation', EMPIRICAL SOFTWARE ENGINEERING, 27 (2022) [C1]

DOI	10.1007/s10664-022-10219-1
Citations	Scopus - 1, Web of Science - 8

2022

Qi B, Sun H, Yuan W, Zhang H, Meng X, 'DreamLoc: A Deep Relevance Matching-Based Framework for bug Localization', IEEE TRANSACTIONS ON RELIABILITY, 71, 235-249 (2022) [C1]

To improve the software debugging efficiency, bug localization techniques have been developed to automatically locate buggy files based on bug reports. Traditional information retrieval-based bug localization cannot deal with the lexical mismatch, thus its performance is limited. In recent years, some deep learning models have been proposed to learn the semantics of bug reports and source files to bridge the lexical gap. However, their accuracy is still limited as building accurate semantic representations of bug reports and source files is very challenging. Recently, relevance matching was proposed to identify whether a document is relevant to a given query by considering both local matching and global matching. In this work, we propose a novel framework DreamLoc, which utilizes a relevance matching model to locate buggy files. Specifically, DreamLoc conducts the local matching by employing an attention-based mechanism to calculate the matching scores between bug report terms and code snippets. It also conducts the global matching by employing a gating mechanism to aggregate results of local matching and obtain the final matching score between a bug report and a source file. Since the local matching considers the relevance between each word and the global matching differentiates the importance of words, DreamLoc can effectively model the characteristics of bug reports and source files. Experimental results on five benchmark datasets show that DreamLoc outperforms five state-of-the-art models. For example, compared with DeepLoc, a recently proposed approach, the evaluation measures Accuracy@10, MAP, and MRR are improved by 6.4%, 7.4%, and 7.2%, respectively.

DOI	10.1109/TR.2021.3104728
Citations	Scopus - 2, Web of Science - 12

2021

Chen J, Wang G, Hao D, Xiong Y, Zhang H, Zhang L, Xie B, 'Coverage Prediction for Accelerating Compiler Testing', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 47, 261-278 (2021) [C1]

DOI	10.1109/TSE.2018.2889771
Citations	Scopus - 3, Web of Science - 21

2021

Wei T, Wang Y, Shi E, Lun D, Shi H, Zhang H, Dongmei Z, Wenqiang Z, 'On the Evaluation of Commit Message Generation Models: An Experimental Study'

DOI	10.26226/morressier.613b5419842293c031b5b634

2021

Gu W, Li Z, Gao C, Wang C, Zhang H, Xu Z, Lyu MR, 'CRaDLe: Deep code retrieval based on semantic Dependency Learning', NEURAL NETWORKS, 141, 385-394 (2021) [C1]

Code retrieval is a common practice for programmers to reuse existing code snippets in the open-source repositories. Given a user query (i.e., a natural language description), code retrieval aims at searching the most relevant ones from a set of code snippets. The main challenge of effective code retrieval lies in mitigating the semantic gap between natural language descriptions and code snippets. With the ever-increasing amount of available open-source code, recent studies resort to neural networks to learn the semantic matching relationships between the two sources. The statement-level dependency information, which highlights the dependency relations among the program statements during the execution, reflects the structural importance of one statement in the code, which is favorable for accurately capturing the code semantics but has never been explored for the code retrieval task. In this paper, we propose CRaDLe, a novel approach for Code Retrieval based on statement-level semantic Dependency Learning. Specifically, CRaDLe distills code representations through fusing both the dependency and semantic information at the statement level, and then learns a unified vector representation for each code and description pair for modeling the matching relationship. Comprehensive experiments and analysis on real-world datasets show that the proposed approach can accurately retrieve code snippets for a given query and significantly outperform the state-of-the-art approaches on the task.

DOI	10.1016/j.neunet.2021.04.019
Citations	Scopus - 4, Web of Science - 28

2021

Lyu C, Wang R, Zhang H, Zhang H, Hu S, 'Embedding API dependency graph for neural code generation', EMPIRICAL SOFTWARE ENGINEERING, 26 (2021) [C1]

The problem of code generation from textual program descriptions has long been viewed as a grand challenge in software engineering. In recent years, many deep learning based approaches have been proposed, which can generate a sequence of code from a sequence of textual program description. However, the existing approaches ignore the global relationships among API methods, which are important for understanding the usage of APIs. In this paper, we propose to model the dependencies among API methods as an API dependency graph (ADG) and incorporate the graph embedding into a sequence-to-sequence (Seq2Seq) model. In addition to the existing encoder-decoder structure, a new module named "embedder" is introduced. In this way, the decoder can utilize both global structural dependencies and textual program description to predict the target code. We conduct extensive code generation experiments on three public datasets and in two programming languages (Python and Java). Our proposed approach, called ADG-Seq2Seq, yields significant improvements over existing state-of-the-art methods and maintains its performance as the length of the target code increases. Extensive ablation tests show that the proposed ADG embedding is effective and outperforms the baselines.

DOI	10.1007/s10664-021-09968-2
Citations	Scopus - 1, Web of Science - 10

2021

Wu D, Jing X-Y, Zhang H, Li B, Xie Y, Xu B, 'Generating API tags for tutorial fragments from Stack Overflow', EMPIRICAL SOFTWARE ENGINEERING, 26 (2021) [C1]

API tutorials are important learning resources as they explain how to use certain APIs in a given programming context. An API tutorial can be split into a number of units. Consecutive units that describe the same topic are often called a tutorial fragment. We consider the API explained by a tutorial fragment as an API tag. Generating API tags for a tutorial fragment can help understand, navigate, and retrieve the fragment. Existing approaches often do not perform well on API tag generation due to high manual effort and low accuracy. Like API tutorials, Stack Overflow (SO) is also an important learning resource that provides the explanations of APIs. Thus, SO posts also contain API tags. Besides, API tags of SO posts are abundant and can be extracted easily. In this paper, we propose a novel approach ATTACK (stands for API Tag for Tutorial frAgments using Crowd Knowledge), which can automatically generate API tags for tutorial fragments from SO posts. ATTACK first constructs (Q&A pair, tag set) pairs by extracting API tags of SO posts. Then, it trains a deep neural network with the attention mechanism to learn the semantic relatedness between Q&A pairs and the associated API tags, taking into consideration both textual descriptions and code in a Q&A pair. Finally, the trained model is used to generate API tags for tutorial fragments. We evaluate ATTACK on public Java and Android datasets containing 43,132 (Q&A pair, tag set) pairs. Experimental results show that ATTACK is effective and outperforms the state-of-the-art approaches in terms of F-Measure. Our user study further confirms the effectiveness of ATTACK in generating API tags for tutorial fragments. We also apply ATTACK to document linking and the results confirm the usefulness of API tags generated by ATTACK.

DOI	10.1007/s10664-021-09962-8
Citations	Scopus - 1, Web of Science - 9

2020

Chen J, Patra J, Pradel M, Xiong Y, Zhang H, Hao D, Zhang L, 'A survey of compiler testing', ACM Computing Surveys, 53 (2020) [C1]

DOI	10.1145/3363562
Citations	Scopus - 1, Web of Science - 1

2020

Wu D, Jing X-Y, Zhang H, Kong X, Xie Y, Huang Z, 'Data-driven approach to application programming interface documentation mining: A review', WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 10 (2020) [C1]

DOI	10.1002/widm.1369
Citations	Scopus - 1, Web of Science - 10

2020

Zhang Z, Sun H, Zhang H, 'Developer recommendation for Topcoder through a meta-learning based policy model', Empirical Software Engineering, 25, 859-889 (2020) [C1]

DOI	10.1007/s10664-019-09755-0
Citations	Scopus - 2, Web of Science - 1

2019

Li Z, Jing X-Y, Zhu X, Zhang H, Xu B, Ying S, 'Heterogeneous defect prediction with two-stage ensemble learning', AUTOMATED SOFTWARE ENGINEERING, 26, 599-651 (2019) [C1]

DOI	10.1007/s10515-019-00259-1
Citations	Scopus - 6, Web of Science - 44

2019

Mirjalili SZ, Mirjalili S, Zhang H, Chalup S, Noman N, 'Improving the reliability of implicit averaging methods using new conditional operators for robust optimization', Swarm and Evolutionary Computation, 51 (2019) [C1]

DOI	10.1016/j.swevo.2019.100579
Citations	Scopus - 7, Web of Science - 5
Co-authors	Stephan Chalup, Nasimul Noman

2019

Chen J, Hu W, Hao D, Xiong Y, Zhang H, Zhang L, 'Static duplicate bug-report identification for compilers', SCIENTIA SINICA Informationis, 49, 1283-1298 (2019) [C1]

DOI	10.1360/N112019-00001

2019

Zhiqiang L, Xiao-Yuan J, Xiaoke Z, Zhang H, Baowen X, Shi Y, 'On the Multiple Sources and Privacy Preservation Issues for Heterogeneous Defect Prediction', IEEE Transactions on Software Engineering, 45, 391-411 (2019) [C1]

DOI	10.1109/TSE.2017.2780222
Citations	Scopus - 7, Web of Science - 7

2019

Gu Y, Xuan J, Zhang H, Zhang L, Fan Q, Xie X, Qian T, 'Does the fault reside in a stack trace? Assisting crash localization by predicting crashing fault residence', Journal of Systems and Software, 148, 88-104 (2019) [C1]

DOI	10.1016/j.jss.2018.11.004
Citations	Scopus - 3, Web of Science - 3

2018

Zhang H, Miranskyy A, Bener AB, 'Editorial: Special Section on Best Papers of PROMISE 2016', INFORMATION AND SOFTWARE TECHNOLOGY, 95, 295-295 (2018)

DOI	10.1016/j.infsof.2017.12.014
Citations	Web of Science - 4

2018

Wu R, Wen M, Cheung SC, Zhang H, 'ChangeLocator: locate crash-inducing changes based on crash reports', EMPIRICAL SOFTWARE ENGINEERING, 23, 2866-2900 (2018) [C1]

DOI	10.1007/s10664-017-9567-4
Citations	Scopus - 4, Web of Science - 3

2017

Xuan J, Jiang H, Zhang H, Ren Z, 'Developer recommendation on bug commenting: a ranking approach for the developer crowd', Science China-Information Sciences, 60, 072105-1-072105-18 (2017) [C1]

DOI	10.1007/s11432-015-0582-8
Citations	Scopus - 2, Web of Science - 1

2016

Xia X, Gong L, Le TDB, Lo D, Jiang L, Zhang H, 'Diversity maximization speedup for localizing faults in single-fault and multi-fault programs', Automated Software Engineering, 23, 43-75 (2016) [C1]

DOI	10.1007/s10515-014-0165-z
Citations	Scopus - 2, Web of Science - 1

2015

Li M, Zhang H, Lo D, Lucia, 'Improving Software Quality and Productivity Leveraging Mining Techniques', ACM SIGSOFT Software Engineering Notes, 40, 1-2 (2015)

DOI	10.1145/2693208.2693219

2014

Gong L, Zhang H, Seo H, Kim S, 'Locating Crashing Faults based on Crash Stack Traces.', CoRR, abs/1404.4100 (2014)

2013

Peters F, Menzies T, Gong L, Zhang H, 'Balancing Privacy and Utility in Cross-Company Defect Prediction', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 39, 1054-1068 (2013) [C1]

DOI	10.1109/TSE.2013.6
Citations	Scopus - 1, Web of Science - 9

2013

Concas G, Lunesu MI, Marchesi M, Zhang H, 'Simulation of software maintenance process, with and without a work-in-process limit', JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 25, 1225-1248 (2013)

DOI	10.1002/smr.1599
Citations	Scopus - 2, Web of Science - 18

2012

Li M, Zhang H, Wu R, Zhou Z-H, 'Sample-based software defect prediction with active and semi-supervised learning', Automated Software Engineering, 19, 201-230 (2012) [C1]

DOI	10.1007/s10515-011-0092-1
Citations	Scopus - 1, Web of Science - 1

2011

Zhang H, Tan HBK, Zhang L, Lin X, Wang X, Zhang C, Mei H, 'Checking enforcement of integrity constraints in database applications based on code patterns', JOURNAL OF SYSTEMS AND SOFTWARE, 84, 2253-2264 (2011) [C1]

DOI	10.1016/j.jss.2011.06.044
Citations	Scopus - 2, Web of Science - 1

2010

Canfora G, Concas G, Marchesi M, Tempero E, Zhang H, '2010 ICSE workshop on emerging trends in software metrics', ACM SIGSOFT Software Engineering Notes, 35, 51-53 (2010)

DOI	10.1145/1838687.1838700

2010

Zhang H, Li Y-F, Tan HBK, 'Measuring design complexity of semantic web ontologies', JOURNAL OF SYSTEMS AND SOFTWARE, 83, 803-814 (2010)

DOI	10.1016/j.jss.2009.11.735
Citations	Scopus - 1, Web of Science - 72

2010

Zhang H, Kim S, 'Monitoring Software Quality Evolution for Defects', IEEE SOFTWARE, 27, 58-64 (2010)

DOI	10.1109/MS.2010.66
Citations	Scopus - 2, Web of Science - 20

2010

Concas G, Cantone G, Tempero E, Zhang H, 'New Generation of Software Metrics', Advances in Software Engineering, 2010, 1-2 (2010)

DOI	10.1155/2010/913892

2009

Zhang H, 'Discovering power laws in computer programs', INFORMATION PROCESSING & MANAGEMENT, 45, 477-483 (2009)

DOI	10.1016/j.ipm.2009.02.001
Citations	Scopus - 1, Web of Science - 10

2009

Tan HBK, Zhao Y, Zhang H, 'Conceptual Data Model-Based Software Size Estimation for Information Systems', ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 19 (2009)

DOI	10.1145/1571629.1571630
Citations	Scopus - 3, Web of Science - 25

2009

Zhang H, Tan HBK, Marchesi M, 'The Distribution of Program Sizes and Its Implications: An Eclipse Case Study', CoRR, abs/0905.2288 (2009)

2008

Zhang H, 'On the distribution of software faults', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 34, 301-302 (2008)

DOI	10.1109/TSE.2007.70771
Citations	Scopus - 6, Web of Science - 41

2007

Zhang H, Zhang X, 'Comments on "data mining static code attributes to learn defect predictors"', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 33, 635-636 (2007)

DOI	10.1109/TSE.2007.70706
Citations	Scopus - 1, Web of Science - 70

2007

Wang HH, Li YF, Sun J, Zhang H, Pan J, 'Verifying feature models using OWL', JOURNAL OF WEB SEMANTICS, 5, 117-129 (2007)

DOI	10.1016/j.websem.2006.11.006
Citations	Scopus - 1, Web of Science - 68

2005

Zhang HY, Jarzabek S, 'A Bayesian network approach to rational architectural design', INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 15, 695-717 (2005)

In software architecture design, we explore design alternatives and make decisions about adoption or rejection of a design from a web of complex and often uncertain information. Different architectural design decisions may lead to systems that satisfy the same set of functional requirements but differ in certain quality attributes. In this paper, we propose a Bayesian Network based approach to rational architectural design. Our Bayesian Network helps software architects record and make design decisions. We can perform both qualitative and quantitative analysis over the Bayesian Network to understand how the design decisions influence system quality attributes, and to reason about rational design decisions. We use the KWIC (Key Word In Context) example to illustrate the principles of our approach. © World Scientific Publishing Company.

DOI	10.1142/S0218194005002488
Citations	Scopus - 1, Web of Science - 4

2004

Zhang HY, Jarzabek S, 'XVCL: a mechanism for handling variants in software product lines', SCIENCE OF COMPUTER PROGRAMMING, 53, 381-407 (2004)

Software reuse focused on product lines has emerged as one of the promising ways to increase software productivity and quality. XVCL (XML-based Variant Configuration Language) is a variability mechanism that we developed for handling variants in software product lines. We apply XVCL to develop product line assets (including the domain model, product line architecture and generic components) as a set of x-frames that are capable of accommodating both commonality and variability in a domain. Specific systems, members of a product line, can be constructed by adapting and composing x-frames. In this paper, we illustrate our approach using examples from our product line project on the Computer Aided Dispatch (CAD) domain. © 2004 Elsevier B.V. All rights reserved.

DOI	10.1016/j.scico.2003.04.007
Citations	Scopus - 4, Web of Science - 20

2003

Zhang HY, Stan Z, Yang B, 'Quality prediction and assessment for product lines', ADVANCED INFORMATION SYSTEMS ENGINEERING, PROCEEDINGS, 2681, 681-695 (2003)

In recent years, software product lines have emerged as a promising approach to improve software development productivity in the IT industry. In the product line approach, we identify both commonalities and variabilities in a domain, and build generic assets for an organization. Feature diagrams are often used to model common and variant product line requirements and can be considered part of the organizational assets. Despite their importance, quality attributes (or non-functional requirements, NFRs) such as performance and security have not been sufficiently addressed in product line development. A feature diagram alone does not tell us how to select a configuration of variants to achieve desired quality attributes of a product line member. There is a lack of an explicit model that can represent the impact of variants on quality attributes. In this paper, we propose a Bayesian Belief Network (BBN) based approach to quality prediction and assessment for a software product line. A BBN represents domain experts' knowledge and experiences accumulated from the development of similar projects. It helps us capture the impact of variants on quality attributes, and helps us predict and assess the quality of a product line member by performing quantitative analysis over it. For developing specific systems, members of a product line, we reuse the expertise captured by a BBN instead of working from scratch. We use examples from the Computer Aided Dispatch (CAD) product line project to illustrate our approach. © Springer-Verlag Berlin Heidelberg 2003.

Citations	Scopus - 4, Web of Science - 25

2001

Wong T, Jarzabek S, Swe SM, Shen R, Zhang H, 'XML implementation of frame processor', ACM SIGSOFT Software Engineering Notes, 26, 164-172 (2001)

DOI	10.1145/379377.375285

Preprint (43 outputs)

Year

Citation

2025

Gui Y, Li Z, Zhang Z, Wan Y, Chen D, Zhang H, Su Y, Chen B, Zhou X, Jiang W, Zhang X, 'UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs' (2025)

DOI	10.48550/arxiv.2505.09904

2025

Gui Y, Li Z, Wan Y, Shi Y, Zhang H, Su Y, Chen B, Chen D, Wu S, Zhou X, Jiang W, Jin H, Zhang X, 'WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs' (2025)

DOI	10.48550/arxiv.2404.06369

2025

Gu X, Chen M, Lin Y, Hu Y, Zhang H, Wan C, Wei Z, Xu Y, Wang J, 'On the Effectiveness of Large Language Models in Domain-Specific Code Generation' (2025)

DOI	10.48550/arxiv.2312.01639

2025

Gui Y, Li Z, Zhang Z, Wang G, Lv T, Jiang G, Liu Y, Chen D, Wan Y, Zhang H, Jiang W, Shi X, Jin H, 'LaTCoder: Converting Webpage Design to Code with Layout-as-Thought' (2025)

DOI	10.48550/arxiv.2508.03560

2024

Hu F, Wang Y, Du L, Li X, Zhang H, Han S, Zhang D, 'Revisiting Code Search in a Two-Stage Paradigm' (2024)

DOI	10.48550/arxiv.2208.11274

2024

Chai Y, Zhang H, Shen B, Gu X, 'Cross-Domain Deep Code Search with Meta Learning' (2024)

DOI	10.48550/arxiv.2201.00150

2024

Liu Y, Zhang H, Miao Y, Le V-H, Li Z, 'OptLLM: Optimal Assignment of Queries to Large Language Models' (2024)

DOI	10.48550/arxiv.2405.15130

2024

Tao W, Zhou Y, Wang Y, Zhang W, Zhang H, Cheng Y, 'MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution' (2024)

DOI	10.48550/arxiv.2403.17927

2024

Li B, Sun Z, Huang T, Zhang H, Wan Y, Li G, Jin Z, Lyu C, 'IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion' (2024)

DOI	10.48550/arxiv.2401.16637

2024

Wang Y, Huang Y, Guo D, Zhang H, Zheng Z, 'SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization' (2024)

DOI	10.48550/arxiv.2401.14727

2024

Tao W, Zhou Y, Wang Y, Zhang H, Wang H, Zhang W, 'KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation' (2024)

DOI	10.48550/arxiv.2401.08376

2024

Hu F, Wang Y, Du L, Zhang H, Han S, Zhang D, Li X, 'Tackling Long Code Search with Splitting, Encoding, and Aggregating' (2024)

DOI	10.48550/arxiv.2208.11271

2023

Wan Y, He Y, Bi Z, Zhang J, Zhang H, Sui Y, Xu G, Jin H, Yu PS, 'Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit' (2023)

DOI	10.48550/arxiv.2401.00288

2023

Qi B, Sun H, Zhang H, Gao X, 'Reusing Convolutional Neural Network Models through Modularization and Composition' (2023)

DOI	10.48550/arxiv.2311.04438

2023

Li X, Zhang H, Le V-H, Chen P, 'LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data' (2023)

DOI	10.48550/arxiv.2309.09479

2023

Shi E, Zhang F, Wang Y, Chen B, Du L, Zhang H, Han S, Zhang D, Sun H, 'SoTaNa: The Open-Source Software Development Assistant' (2023)

DOI	10.48550/arxiv.2308.13416

2023

Qi B, Sun H, Zhang H, Zhao R, Gao X, 'Modularizing while Training: A New Paradigm for Modularizing DNN Models' (2023)

DOI	10.48550/arxiv.2306.09376

2023

Le V-H, Zhang H, 'Log Parsing: How Far Can ChatGPT Go?' (2023)

DOI	10.48550/arxiv.2306.01590

2023

Shi E, Wang Y, Zhang H, Du L, Han S, Zhang D, Sun H, 'Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond' (2023)

DOI	10.48550/arxiv.2304.05216

2023

Qi B, Sun H, Gao X, Zhang H, Li Z, Liu X, 'Reusing Deep Neural Network Models through Model Re-engineering' (2023)

DOI	10.48550/arxiv.2304.00245

2023

Liu J, He S, Chen Z, Li L, Kang Y, Zhang X, He P, Zhang H, Lin Q, Xu Z, Rajmohan S, Zhang D, Lyu MR, 'Incident-aware Duplicate Ticket Aggregation for Cloud Systems' (2023)

DOI	10.48550/arxiv.2302.09520

2023

Le V-H, Zhang H, 'Log Parsing with Prompt-based Few-shot Learning' (2023)

DOI	10.48550/arxiv.2302.07435

2023

Shi E, Wang Y, Gu W, Du L, Zhang H, Han S, Zhang D, Sun H, 'CoCoSoDa: Effective Contrastive Learning for Code Search' (2023)

DOI	10.48550/arxiv.2204.03293

2023

Gao S, Wen X-C, Gao C, Wang W, Zhang H, Lyu MR, 'What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs?' (2023)

DOI	10.48550/arxiv.2304.07575

2023

Qi B, Sun H, Gao X, Zhang H, 'Patching Weak Convolutional Neural Network Models through Modularization and Composition' (2023)

DOI	10.48550/arxiv.2209.06116

2022

Wang C, Yang Y, Gao C, Peng Y, Zhang H, Lyu MR, 'No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence' (2022)

DOI	10.48550/arxiv.2207.11680

2022

Zhang Z, Zhang H, Shen B, Gu X, 'Diet Code Is Healthy: Simplifying Programs for Pre-trained Models of Code' (2022)

DOI	10.48550/arxiv.2206.14390

2022

Tang W, Wang Y, Zhang H, Han S, Luo P, Zhang D, 'LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries' (2022)

DOI	10.48550/arxiv.2204.10232

2022

Wang Y, Wang J, Zhang H, Ming X, Shi L, Wang Q, 'Where is Your App Frustrating Users?' (2022)

DOI	10.48550/arxiv.2204.09310

2022

Liu Y, Zhang X, He S, Zhang H, Li L, Kang Y, Xu Y, Ma M, Lin Q, Dang Y, Rajmohan S, Zhang D, 'UniParser: A Unified Log Parser for Heterogeneous Log Data' (2022)

DOI	10.48550/arxiv.2202.06569

2022

Le V-H, Zhang H, 'Log-based Anomaly Detection with Deep Learning: How Far Are We?' (2022)

DOI	10.48550/arxiv.2202.04301

2022

Chen Z, Liu J, Su Y, Zhang H, Ling X, Yang Y, Lyu MR, 'Adaptive Performance Anomaly Detection for Online Service Systems via Pattern Sketching' (2022)

DOI	10.48550/arxiv.2201.02944

2022

Li H, Miao C, Leung C, Huang Y, Huang Y, Zhang H, Wang Y, 'Exploring Representation-Level Augmentation for Code Search' (2022)

DOI	10.48550/arxiv.2210.12285

2022

Ma M, Tian Z, Hort M, Sarro F, Zhang H, Lin Q, Zhang D, 'Enhanced Fairness Testing via Generating Effective Initial Individual Discriminatory Instances' (2022)

DOI	10.48550/arxiv.2209.08321

2022

Gu W, Wang Y, Du L, Zhang H, Han S, Zhang D, Lyu MR, 'Accelerating Code Search with Deep Hashing and Code Classification' (2022)

DOI	10.48550/arxiv.2203.15287

2022

Shi E, Wang Y, Tao W, Du L, Zhang H, Han S, Zhang D, Sun H, 'RACE: Retrieval-Augmented Commit Message Generation' (2022)

DOI	10.48550/arxiv.2203.02700

2022

Wan Y, Zhao W, Zhang H, Sui Y, Xu G, Jin H, 'What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code' (2022)

DOI	10.48550/arxiv.2202.06840

2022

Gui Y, Wan Y, Zhang H, Huang H, Sui Y, Xu G, Shao Z, Jin H, 'Cross-Language Binary-Source Code Matching with Intermediate Representations' (2022)

DOI	10.48550/arxiv.2201.07420

2022

Shi E, Wang Y, Du L, Chen J, Han S, Zhang H, Zhang D, Sun H, 'On the Evaluation of Neural Code Summarization' (2022)

DOI	10.48550/arxiv.2107.07112

2022

Gu W, Li Z, Gao C, Wang C, Zhang H, Xu Z, Lyu MR, 'CRaDLe: Deep Code Retrieval Based on Semantic Dependency Learning' (2022)

DOI	10.48550/arxiv.2012.01028

2021

Shi E, Wang Y, Du L, Zhang H, Han S, Zhang D, Sun H, 'CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees' (2021)

DOI	10.48550/arxiv.2108.12987

2021

Zhang W, Guo S, Zhang H, Sui Y, Xue Y, Xu Y, 'Challenging Machine Learning-based Clone Detectors via Semantic-preserving Code Transformations' (2021)

DOI	10.48550/arxiv.2111.10793

2021

Lyu C, Wang R, Zhang H, Zhang H, Hu S, 'Embedding API Dependency Graph for Neural Code Generation' (2021)

DOI	10.48550/arxiv.2103.15361

Grants and Funding

Summary

Number of grants	5
Total funding	$710,646

2022: 1 grant / $287,435

Intelligent Incident Management for Software-Intensive Systems ($287,435)

Funding body	ARC (Australian Research Council)
Project Team	Prof Hongyu Zhang, Dr Huong Ha
Scheme	Discovery Projects
Role	Lead
Funding Start	2022
Funding Finish	2024
GNo	G2100087
Type Of Funding	C1200 - Aust Competitive - ARC
Category	1200
UON	Y

2020: 2 grants / $287,989

Data-driven Approach to Resilient Online Service Systems ($264,489)

Funding body	ARC (Australian Research Council)
Project Team	Prof Hongyu Zhang, Professor Michael Lyu
Scheme	Discovery Projects
Role	Lead
Funding Start	2020
Funding Finish	2022
GNo	G1900151
Type Of Funding	C1200 - Aust Competitive - ARC
Category	1200
UON	Y

Machine learning (ML), statistical methods and simulations for signal-sorting ($23,500)

Funding body	University of Melbourne
Project Team	Prof Stephan Chalup, Prof Hongyu Zhang, Mr Thomas Dowdell
Scheme	AMSI Australian Postgraduate Research Internships
Role	Investigator
Funding Start	2020
Funding Finish	2020
GNo	G2001206
Type Of Funding	Scheme excluded from IGS
Category	EXCL
UON	Y

2017: 2 grants / $135,222

Model Building based on Source Code for Problem Location ($71,372)

Funding body	Huawei Technologies Co.,Ltd.
Project Team	Prof Hongyu Zhang
Scheme	Huawei Research Innovation Program (HIRP)
Role	Lead
Funding Start	2017
Funding Finish	2017
GNo	G1701312
Type Of Funding	C3400 – International For Profit
Category	3400
UON	Y

The Exploration of Auto-Code-Generation Technologies and Possible Applications ($63,850)

Funding body	Huawei Technologies Co.,Ltd.
Project Team	Prof Hongyu Zhang
Scheme	Huawei Research Innovation Program (HIRP)
Role	Lead
Funding Start	2017
Funding Finish	2017
GNo	G1701333
Type Of Funding	C3400 – International For Profit
Category	3400
UON	Y

Research Supervision

Number of supervisions

Completed	9
Current	3

Current Supervision

Commenced	Level of Study	Research Title	Program	Supervisor Type
2024	PhD	Leveraging Large Language Models for Automated Software Quality Assurance	PhD (Computer Science), College of Engineering, Science and Environment, The University of Newcastle	Co-Supervisor
2024	PhD	Automatic Code Refactoring Leveraging Large Language Models	PhD (Computer Science), College of Engineering, Science and Environment, The University of Newcastle	Co-Supervisor
2022	PhD	Intelligent Fault Detection for Belt Conveyor Idlers Using Machine Learning	PhD (Information Technology), College of Engineering, Science and Environment, The University of Newcastle	Co-Supervisor

Past Supervision

Year	Level of Study	Research Title	Program	Supervisor Type
2025	PhD	Optimizing Large Language Model Utilization through Scheduling Strategies	PhD (Computer Science), College of Engineering, Science and Environment, The University of Newcastle	Co-Supervisor
2024	PhD	Semantic-aware Intelligent Log Analytics	PhD (Software Engineering), College of Engineering, Science and Environment, The University of Newcastle	Principal Supervisor
2022	PhD	Mining Numerical Invariants for Improving Software Reliability	PhD (Computer Science), College of Engineering, Science and Environment, The University of Newcastle	Principal Supervisor
2022	PhD	Exploring Factors that Influence the Acceptance of Clinical Decision Support Systems in Saudi Arabia	PhD (Information Systems), College of Engineering, Science and Environment, The University of Newcastle	Co-Supervisor
2021	PhD	A Framework for Functional Feature and Crosscutting Concern Modelling in Software Product Lines	PhD (Software Engineering), College of Engineering, Science and Environment, The University of Newcastle	Co-Supervisor
2013	Masters	Spectrum-based Fault Localization	Computer Science, Tsinghua University	Sole Supervisor
2013	Masters	Analysis and Prediction of Software Team's Bug Fixing Ability	Computer Science, Tsinghua University	Sole Supervisor
2012	Masters	Techniques for Duplicate Bug Report Detection and Bug Localization	Computer Science, Tsinghua University	Sole Supervisor
2012	Masters	Methods and Tools for Software Defect Prediction	Computer Science, Tsinghua University	Sole Supervisor

Research Collaborations

The map is a representation of a researcher's co-authorship with collaborators across the globe. The map displays the number of publications against a country, where there is at least one co-author based in that country. Data is sourced from the University of Newcastle research publication management system (NURO) and may not fully represent the author's complete body of work.

	Country	Count of Publications
	China	227
	Australia	151
	United States	90
	Singapore	41
	Hong Kong	27
	Italy	11
	New Zealand	8
	Canada	7
	Germany	7
	United Kingdom	4
	Korea, Republic of	4
	Portugal	4
	India	3
	France	2
	Denmark	1
	Estonia	1
	Israel	1
	Japan	1
	Macao	1
	Netherlands	1
	Pakistan	1
	Poland	1
	Saudi Arabia	1
	Sweden	1
	Viet Nam	1

News

News • 1 Oct 2020

Our researchers recognised in The Australian’s Research 2020 magazine

The Australian's Research 2020 magazine paid tribute to several University of Newcastle researchers for their track record of excellence and contribution to their fields.