Dr Zhenghao Chen

Lecturer in Data Science

School of Computer and Information Sciences

Email: zhenghao.chen@newcastle.edu.au

Career Summary

Biography

Dr. Zhenghao Chen is a Lecturer (Assistant Professor) at the University of Newcastle. He obtained his B.IT. (H1) and Ph.D. from the University of Sydney in 2017 and 2022, respectively. After completing his Ph.D., he worked as a Research Engineer at TikTok, a Postdoctoral Research Fellow at the University of Sydney, and a Visiting Research Scientist at Microsoft Research and Disney Research. In recognition of his academic merit, he has been awarded the Google Australia Prize, the Australian Government RTP International Fellowship, the ACM SIGMM Outstanding Doctoral Thesis Award, and the Microsoft Research StarTrack Fellowship.

Dr. Chen’s broad research interests lie in Generative AI (GenAI). He has published at flagship AI conferences including CVPR, ICCV, ICML, ICLR, ECCV, MM, and AAAI, as well as in top-tier engineering journals such as T-IP, T-CSVT, T-PAMI, T-MI, and PR. He serves on the Program Committees of CVPR, ECCV, ICCV, SIGGRAPH, AAAI, IJCAI, MICCAI, MM, and KDD, and organizes workshops at MM and ICCV. He also acts as a journal reviewer for T-IP, T-MI, T-CSVT, IJCV, and PR, and as a Guest Editor for MTAP, MDPI-Algorithms, and Frontier-in-AI.

Dr. Chen actively advances the translation of his research into global industrial impact and cross-disciplinary applications. Through his roles at Microsoft, TikTok, and Disney, his innovations have generated multiple patents and have been deployed in global enterprise GenAI systems. He also collaborates with scientists worldwide to apply GenAI to major scientific challenges, with interdisciplinary publications in venues including Nature Portfolio journals.

Qualifications

Doctor of Philosophy, University of Sydney
Bachelor of Information Technology, University of Sydney

Keywords

Computer Vision
Machine Learning
Multimedia
Natural Language Processing

Fields of Research

Code	Description	Percentage
460307	Multimodal analysis and synthesis	20
461103	Deep learning	30
460304	Computer vision	30
460208	Natural language processing	20

Professional Experience

UON Appointment

Title	Organisation / Department
Lecturer in Data Science	University of Newcastle, School of Computer and Information Sciences, Australia

Academic appointment

Dates	Title	Organisation / Department
1/9/2022 - 1/5/2024	Postdoctoral Research Fellow	University of Sydney, Australia

Professional appointment

Dates	Title	Organisation / Department
1/10/2025 - 12/1/2026	Visiting Research Scientist	Microsoft Research, China
1/5/2024 - 1/10/2024	Research Engineer	TikTok, Australia
1/10/2022 - 1/2/2023	Visiting Research Scientist	Disney Research, Switzerland

Awards

Honours

Year	Award
2025	Microsoft Research Asia StarTrack Fellowship, Microsoft Research
2024	ACM SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications, Association for Computing Machinery (ACM)

Scholarship

Year	Award
2019	Australian Government Research Training Program (RTP) Fellowship (International), Australian Government Department of Education
2017	Google Australia Prize for Excellence in Computer Science, Google

Teaching

Code	Course	Role	Duration
ELEC5304	Intelligent Visual Signal Understanding, The University of Sydney	Coordinator & Lecturer	1/3/2023 - 1/7/2023
ELEC5306	Video Intelligence and Compression, The University of Sydney	Lecturer	1/7/2022 - 1/11/2022
COMP1140	Database and Information Management, School of Information and Physical Sciences (SIPS), University of Newcastle	Coordinator & Lecturer	21/7/2025 - 3/11/2025
COMP1010	Computing Fundamentals, School of Information and Physical Sciences (SIPS), University of Newcastle	Coordinator & Lecturer	3/3/2025 - 7/7/2025

Publications

Chapter (2 outputs)

Year	Citation	Altmetrics	Link

2022

Wang Z, Huo X, Chen Z, Zhang J, Sheng L, Xu D, 'Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation', 13692, 175-191 (2022)

Point cloud registration aims at estimating the geometric transformation between two point cloud scans, in which point-wise correspondence estimation is the key to its success. In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence. However, it is not trivial to effectively fuse the geometric and visual information from these two distinctive modalities, especially for the registration problem. In this work, we propose a new Geometry-Aware Visual Feature Extractor (GAVE) that employs multi-scale local linear transformation to progressively fuse these two modalities, where the geometric features from the depth data act as the geometry-dependent convolution kernels to transform the visual features from the RGB data. The resultant visual-geometric features are in canonical feature spaces with alleviated visual dissimilarity caused by geometric changes, by which more reliable correspondence can be achieved. The proposed GAVE module can be readily plugged into recent RGB-D point cloud registration frameworks. Extensive experiments on 3DMatch and ScanNet demonstrate that our method outperforms the state-of-the-art point cloud registration methods even without correspondence or pose supervision.

DOI	10.1007/978-3-031-19824-3_11
Citations	Scopus - 2, Web of Science - 5

2020

Hu Z, Chen Z, Xu D, Lu G, Ouyang W, Gu S, 'Improving Deep Video Compression by Resolution-Adaptive Flow Coding', 12347 LNCS, 193-209 (2020)

In learning-based video compression approaches, it is an essential issue to compress pixel-level optical flow maps by developing new motion vector (MV) encoders. In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder. To handle complex or simple motion patterns globally, our frame-level scheme RaFC-frame automatically decides the optimal flow map resolution for each video frame. To cope with different types of motion patterns locally, our block-level scheme called RaFC-block can also select the optimal resolution for each local block of motion features. In addition, the rate-distortion criterion is applied to both RaFC-frame and RaFC-block to select the optimal motion coding mode for effective flow coding. Comprehensive experiments on four benchmark datasets HEVC, VTL, UVG and MCL-JCV clearly demonstrate the effectiveness of our overall RaFC framework after combining RaFC-frame and RaFC-block for video compression.

DOI	10.1007/978-3-030-58536-5_12
Citations	Scopus - 95

Conference (23 outputs)

Year	Citation	Altmetrics	Link

2025

Yang L, Wang Z, Chen Z, Liang X, Zhou L, 'MedXChat: A Unified Multimodal Large Language Model Framework Towards CXRs Understanding and Generation', 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI) (2025) [E1]

DOI	10.1109/ISBI60581.2025.10981215
Citations	Scopus - 2

2025

Yu S, Jin C, Wang H, Chen Z, Jin S, Zuo Z, Xu X, Sun Z, Bingni Z, Wu J, Hao Z, Sun Q, 'Frame-Voyager: Learning to Query Frames for Video Large Language Models', International Conference on Learning Representations (ICLR) 2025 (2025)

2025

Guo J, Chen Z, Ma Y, Ding Y, Liu X, Kim J, Ouyang W, Tao D, 'ECLR'25: 2nd Workshop on Efficient Computing Under Limited Resources: Visual Computing', Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2025), 3189-3192 (2025)

DOI	10.1109/ICCVW69036.2025.00333

2025

'Proceedings of the 3rd International Workshop on Rich Media With Generative AI' (2025)

DOI	10.1145/3746262

2025

Li Y, Zhou L, Ling N, Chen Z, Wang W, Jiang W, 'M³VIR: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation', Proceedings of the 3rd International Workshop on Rich Media With Generative AI, 20-29 (2025)

DOI	10.1145/3746262.3761976

2025

Jiang W, Chen Z, Xu D, '(RichMediaGAI'25) 3rd International Workshop on Rich Media with Generative AI', MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, 14296-14298 (2025)

DOI	10.1145/3746027.3762102

2025

Liu L, Chen Z, Xu D, '3D Gaussian Splatting Data Compression with Mixture of Priors', MM'2025 Proceedings of the 33rd ACM International Conference on Multimedia, 8341-8350 (2025) [E1]

DOI	10.1145/3746027.3755432

2025

Wu Y, Chen Z, Wang H, Xu D, 'Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models', MM'2025. Proceedings of the 33rd ACM International Conference on Multimedia, 9714-9723 (2025) [E1]

DOI	10.1145/3746027.3755081

2025

Wu Y, Wang H, Chen Z, Pang J, Xu D, 'On-Device Diffusion Transformer Policy for Efficient Robot Manipulation', Proceedings of the IEEE International Conference on Computer Vision, 14073-14083 (2025) [E1]

Citations	Scopus - 3

2024

Liu X, Chen Z, Luping Z, Xu D, Xi W, Bai G, Yihan Z, Zhao J, 'UFDA: Universal Federated Domain Adaptation with Practical Assumptions', Proceedings of the 38th AAAI Conference on Artificial Intelligence, 14026-14034 (2024) [E1]

DOI	10.1609/aaai.v38i12.29311
Citations	Scopus - 1

2024

Liu L, Hu Z, Chen Z, 'Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor', Generalizing from Limited Resources in the Open World Second International Workshop, GLOW 2024 Held in Conjunction with IJCAI 2024, 2160 CCIS, 3-17 (2024) [E1]

DOI	10.1007/978-981-97-6125-8_1
Citations	Scopus - 5

2024

Chen Z, Zhou L, Hu Z, Xu D, 'Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression', MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 11022-11031 (2024) [E1]

DOI	10.1145/3664647.3680943
Citations	Scopus - 1

2024

Guo J, Chen Z, Ma Y, Liu X, Kim J, Ouyang W, Tao D, 'EMCLR'24: 1st International Workshop on Efficient Multimedia Computing under Limited Resources', Proceedings of the 1st International Workshop on Efficient Multimedia Computing under Limited Resources, EMCLR 2024, 1-2 (2024)

DOI	10.1145/3688863.3696341

2023

Yang X, Lin G, Chen Z, Zhou L, 'Neural Vector Fields: Implicit Representation by Explicit Learning', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023-June, 16727-16738 (2023) [E1]

DOI	10.1109/CVPR52729.2023.01605
Citations	Scopus - 2

2023

Liu L, Hu Z, Chen Z, Xu D, 'ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision', Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 8047-8056 (2023) [E1]

Neural image compression has gained significant attention thanks to the remarkable success of deep neural networks. However, most existing neural image codecs focus solely on improving human vision perception. In this work, our objective is to enhance image compression methods for both human vision quality and machine vision tasks simultaneously. To achieve this, we introduce a novel approach to Partition, Transmit, Reconstruct, and Aggregate (PTRA) the latent representation of images to balance the optimizations for both aspects. By employing our method as a module in existing neural image codecs, we create a latent representation predictor that dynamically manages the bit-rate cost for machine vision tasks. To further improve the performance of auto-regressive-based coding techniques, we enhance our hyperprior network and predictor module with context modules, resulting in a reduction in bit-rate. The extensive experiments conducted on various machine vision benchmarks such as ILSVRC 2012, VOC 2007, VOC 2012, and COCO demonstrate the superiority of our newly proposed image compression framework. It outperforms existing neural image compression methods in multiple machine vision tasks including classification, segmentation, and detection, while maintaining high-quality image reconstruction for human vision.

DOI	10.1145/3581783.3612041
Citations	Scopus - 3, Web of Science - 6

2023

Chen Z, Relic L, Azevedo R, Zhang Y, Gross M, Xu D, Zhou L, Schroers C, 'Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers', Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 8543-8551 (2023) [E1]

DOI	10.1145/3581783.3611960
Citations	Scopus - 2, Web of Science - 2

2022

Chen Z, Lu G, Hu Z, Liu S, Jiang W, Xu D, 'LSVC: A Learning-based Stereo Video Compression Framework', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022-June, 6063-6072 (2022)

In this work, we propose the first end-to-end optimized framework for compressing automotive stereo videos (i.e., stereo videos from autonomous driving applications) from both left and right views. Specifically, when compressing the current frame from each view, our framework reduces temporal redundancy by performing motion compensation using the reconstructed intra-view adjacent frame and at the same time exploits binocular redundancy by conducting disparity compensation using the latest reconstructed cross-view frame. Moreover, to effectively compress the introduced motion and disparity offsets for better compensation, we further propose two novel schemes called motion residual compression and disparity residual compression to respectively generate the predicted motion offset and disparity offset from the previously compressed motion offset and disparity offset, such that we can more effectively compress residual offset information for better bit-rate saving. Overall, the entire framework is implemented by the fully-differentiable modules and can be optimized in an end-to-end manner. Our comprehensive experiments on three automotive stereo video benchmarks Cityscapes, KITTI 2012 and KITTI 2015 demonstrate that our proposed framework outperforms the learning-based single-view video codec and the traditional hand-crafted multi-view video codec.

DOI	10.1109/CVPR52688.2022.00598
Citations	Scopus - 28

2021

Chen Z, Gu S, Zhu F, Xu J, Zhao R, 'Improving Facial Attribute Recognition by Group and Graph Learning', Proceedings IEEE International Conference on Multimedia and Expo (2021)

Exploiting the relationships between attributes is a key challenge for improving multiple facial attribute recognition. In this work, we are concerned with two types of correlations that are spatial and non-spatial relationships. For the spatial correlation, we aggregate attributes with spatial similarity into a part-based group and then introduce a Group Attention Learning to generate the group attention and the part-based group feature. On the other hand, to discover the non-spatial relationship, we model a group-based Graph Correlation Learning to explore affinities of predefined part-based groups. We utilize such affinity information to control the communication between all groups and then refine the learned group features. Overall, we propose a unified network called Multi-scale Group and Graph Network. It incorporates these two newly proposed learning strategies and produces coarse-to-fine graph-based group features for improving facial attribute recognition. Comprehensive experiments demonstrate that our approach outperforms the state-of-the-art methods.

DOI	10.1109/ICME51207.2021.9428078
Citations	Scopus - 10

2020

Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W, 'Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition', 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 140-149 (2020)

DOI	10.1109/CVPR42600.2020.00022
Citations	Scopus - 1, Web of Science - 678

2017

Chen Z, Zhou J, Wang X, Swanson J, Chen F, Feng D, 'Neural net-based and safety-oriented visual analytics for time-spatial data', Proceedings of the International Joint Conference on Neural Networks, 2017-May, 1133-1140 (2017)

Safety-oriented visualization is one of the significant approaches to gaining insights from time-spatial data, while neural networks currently serve as a decent way to perform machine learning in the data mining industry. This paper proposes a visual analytics pipeline for trajectory data that enables a better understanding of people's movement patterns, using a neural network as the back-end and other visualization techniques as the front-end to gain information about preferences for attractions, similarities of groups, popularity of attractions and patterns of movement flow. Such understanding helps to address management issues by extracting outstanding features to detect abnormal patterns, such as detecting crime and predicting overall movements. Successfully dealing with those issues would significantly improve the overall management of public facilities such as parks and transportation.

DOI	10.1109/IJCNN.2017.7965979
Citations	Scopus - 5

2017

Zhi W, Yueng HWF, Chen Z, Zandavi SM, Lu Z, Chung YY, 'Using Transfer Learning with Convolutional Neural Networks to Diagnose Breast Cancer from Histopathological Images', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 10637 LNCS, 669-676 (2017)

Diagnosis from histopathological images is the gold standard in diagnosing breast cancer. This paper investigates using transfer learning with convolutional neural networks to automatically diagnose breast cancer from patches of histopathological images. We compare the performance of using transfer learning with an off-the-shelf deep convolutional neural network architecture, VGGNet, and a shallower custom architecture. Our proposed final ensemble model, which contains three custom convolutional neural network classifiers trained using transfer learning, achieves a significantly higher image classification accuracy on the large public benchmark dataset than the current best results, for all image resolution levels.

DOI	10.1007/978-3-319-70093-9_71
Citations	Scopus - 46

2017

Zhi W, Chen Z, Yueng HWF, Lu Z, Zandavi SM, Chung YY, 'Layer Removal for Transfer Learning with Deep Convolutional Neural Networks', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 10635 LNCS, 460-469 (2017)

It is usually difficult to find datasets of sufficient size to train Deep Convolutional Neural Networks (DCNNs) from scratch. In practice, a neural network is often pre-trained on a very large source dataset. Then, a target dataset is transferred onto the neural network. This approach is a form of transfer learning, and allows very deep networks to achieve outstanding performance even when only a small target dataset is available. It is thought that the bottom layers of the pre-trained network contain general information, which is applicable to different datasets and tasks, while the upper layers of the pre-trained network contain abstract information relevant to a specific dataset and task. While studies have been conducted on the fine-tuning of these layers, the removal of these layers has not yet been considered. This paper explores the effect of removing the upper convolutional layers of a pre-trained network. We empirically investigated whether removing upper layers of a deep pre-trained network can improve performance for transfer learning. We found that removing upper pre-trained layers gives a significant boost in performance, but the ideal number of layers to remove depends on the dataset. We suggest removing pre-trained convolutional layers when applying transfer learning on off-the-shelf pre-trained DCNNs. The ideal number of layers to remove will depend on the dataset, and remains a parameter to be tuned.

DOI	10.1007/978-3-319-70096-0_48
Citations	Scopus - 2

2016

Liu G, Chen Z, Yeung HWF, Chung YY, Yeh WC, 'A new weight adjusted particle swarm optimization for real-time multiple object tracking', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 9948 LNCS, 643-651 (2016)

This paper proposes a novel Weight Adjusted Particle Swarm Optimization (WAPSO) to overcome the occlusion problem and computational cost in multiple object tracking. To this end, a new update strategy of inertia weight of the particles in WAPSO is designed to maintain particle diversity and prevent premature convergence. Meanwhile, the implementation of a mechanism that enlarges the search space upon the detection of occlusion enhances WAPSO's robustness to non-linear target motion. In addition, the choice of Root Sum Squared Errors as the fitness function further increases the speed of the proposed approach. The experimental results have shown that in combination with the model feature that enables initialization of multiple independent swarms, the high-speed WAPSO algorithm can be applied to multiple non-linear object tracking for real-time applications.

DOI	10.1007/978-3-319-46672-9_72
Citations	Scopus - 6

Journal article (15 outputs)

Year	Citation	Altmetrics	Link

2026

Liu L, Chen Z, Hu Z, Xu D, 'An efficient adaptive compression method for human perception and machine vision tasks', Pattern Recognition, 180, 114421-114421 (2026)

DOI	10.1016/j.patcog.2026.114421

2026

Liu L, Wang C, Chen Z, Liu Z, Xu D, '4DGS-Craft: Consistent and Interactive 4D Gaussian Splatting Editing', IEEE Transactions on Circuits and Systems for Video Technology, 36, 10405-10417 (2026) [C1]

DOI	10.1109/TCSVT.2026.3674925

2026

Han T, Du D, Chen Z, Guo S, Ouyang W, Bai L, 'CRA5: a high-fidelity compressed reanalysis atmospheric dataset for weather and climate research', Scientific Data (2026) [C1]

DOI	10.1038/s41597-026-07381-2

2026

Jiang S, Chen Z, Han M, Gu S, 'Neural Stereo Video Compression with Hybrid Disparity Compensation', IEEE Transactions on Circuits and Systems for Video Technology (2026)

DOI	10.1109/TCSVT.2026.3695888

2026

Li Y, Zhou L, Ling N, Chen Z, Wang W, Jiang W, 'for benchmarking sparse-view novel view synthesis and 3D object removal', Multimedia Tools and Applications, 85 (2026)

DOI	10.1007/s11042-026-21779-5

2026

Huang W, Zhang J, Chen Z, Li G, Zhang L, Cao Y, Dong F, Ogawa T, Haseyama M, 'Otter: Mitigating Background Distractions of Wide-Angle Few-Shot Action Recognition with Enhanced RWKV', Proceedings of the AAAI Conference on Artificial Intelligence, 40, 5140-5148 (2026)

DOI	10.1609/aaai.v40i7.37428

2026

Fu Z, Cheng D, Zhang L, Huang W, Chen Z, Wu H, 'DeepSenseMoE: Harnessing Power of Time Series Foundation Models for Few-Shot Human Activity Recognition', Proceedings of the AAAI Conference on Artificial Intelligence, 40, 292-299 (2026)

DOI	10.1609/aaai.v40i1.36990

2026

Wang C, Chen Z, Yang JYH, Kim J, 'Improving lesion segmentation in medical images by global and regional feature compensation', Pattern Recognition, 172 (2026) [C1]

DOI	10.1016/j.patcog.2025.112461

2025

Yang L, Liang X, Wang Z, Diao Z, Huang X, Shen D, Tan X, Li H, Chen Z, Qiu S, Zhou L, 'Constructing a Unified Vision-Language Model for Chest Radiograph–based Diagnostics, Medical Education, and Data Augmentation', Radiology: Cardiothoracic Imaging, 7 (2025) [C1]

DOI	10.1148/ryct.250033

2025

Zhou L, Ruan C, Ling N, Chen Z, Wang W, Jiang W, 'TVC: tokenized video compression with ultra-low bit rate', Visual Intelligence, 3 (2025) [C1]

DOI	10.1007/s44267-025-00098-7

2025

Yang L, Chen Z, Wang K, Zhou L, 'Improving CXR Bone Suppression by Exploiting Domain-Level and Instance-Level Information', IEEE Transactions on Medical Imaging, 44, 4143-4155 (2025) [C1]

For chest X-ray image (CXR) analysis, effective bone structure suppression is essential for uncovering lung abnormalities and facilitating accurate clinical diagnoses. While recent deep generative models, to some extent, improve the reconstruction quality of bone-suppressed CXRs, they often fall short in delivering substantial improvements in downstream diagnosis tasks. This limitation is attributed to a narrow focus on instance-specific details, neglecting broader domain-level knowledge, which hampers bone-suppression effectiveness. In response to these challenges, our proposed framework adopts a novel approach that integrates both instance-level and domain-level information. To capture instance information, our model employs a hybrid approach using both cross-covariance attention blocks (CABs) to underscore relevant image information and a subsequent Vision Transformer (ViT) encoder for image feature embedding. To capture domain information, we introduce multi-head codebook attention (MCA), which leverages a codebook structure with a multi-head attention mechanism to capture global, domain-level information specific to the bone-suppressed CXR domain, thereby refining the synthesis process. During optimization, our two-stage training scheme involves an MCA learning stage that encapsulates the domain of bone-suppressed CXRs in MCA through a ViT-based GAN model, and a synthesis stage that employs the learned codebook to generate bone-suppressed CXRs from the original ones, enhancing instance synthesis through domain insights. Moreover, the incorporation of CABs further refines pixel-level instance information. Extensive experiments demonstrate the superior performance of our approach, improving PSNR by 8.36% and SSIM by 2.7% for bone suppression while boosting lung disease classification by 2.8% and 4.2% on two datasets and segmentation by 1.5%.

DOI	10.1109/TMI.2025.3564894

2025

Yang X, Lin G, Chen Z, Zhou L, 'Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization', IEEE Transactions on Pattern Analysis and Machine Intelligence, 47, 5818-5831 (2025) [C1]

Recent neural-network-based surface reconstruction can be roughly divided into two categories, one warping templates explicitly and the other representing 3D surfaces implicitly. To enjoy the advantages of both, we propose a novel 3D representation, Neural Vector Fields (NVF), which adopts the explicit learning process to manipulate meshes and implicit unsigned distance function (UDF) representation to break the barriers in resolution and topology. This is achieved by directly predicting the displacements from surface queries and modeling shapes as Vector Fields, rather than relying on network differentiation to obtain direction fields as most existing UDF-based methods do. In this way, our approach is capable of encoding both the distance and the direction fields so that the calculation of direction fields is differentiation-free, circumventing the non-trivial surface extraction step. Furthermore, building upon NVFs, we propose to incorporate two types of shape codebooks, i.e., NVFs (Lite or Ultra), to promote cross-category reconstruction through encoding cross-object priors. Moreover, we propose a new regularization based on analyzing the zero-curl property of NVFs, and implement this through the fully differentiable framework of our NVF (Ultra). We evaluate both NVFs on four surface reconstruction scenarios, including watertight vs non-watertight shapes, category-agnostic reconstruction vs category-unseen reconstruction, category-specific, and cross-domain reconstruction.

DOI	10.1109/TPAMI.2025.3552684

2025

Han T, Chen Z, Guo S, Xu W, Ouyang W, Bai L, 'Climate science data can be compressed efficiently by dual-stage extreme compression with a variational auto-encoder transformer', Communications Earth and Environment, 6 (2025) [C1]

DOI	10.1038/s43247-025-02903-z

2024

Guo J, Chen Z, Ma Y, Liu X, Kim J, Ouyang W, Tao D, 'EMCLR’24 Chairs’ Welcome', EMCLR 2024 - Proceedings of the 1st International Workshop on Efficient Multimedia Computing Under Limited Resources, Co-Located with MM 2024 (2024)

2022

Chen Z, Gu S, Lu G, Xu D, 'Exploiting Intra-Slice and Inter-Slice Redundancy for Learning-Based Lossless Volumetric Image Compression', IEEE Transactions on Image Processing, 31, 1697-1707 (2022) [C1]

DOI	10.1109/TIP.2022.3140608
Citations	Scopus - 6, Web of Science - 28

Patent (3 outputs)

Year	Citation	Altmetrics	Link
2025	Chen Z, Albuquerque ARGD, Schroers CR, Zhang Y, Relic L, 'Contextual video compression framework with spatial-temporal cross-covariance transformers' (2025)
2021	Chen Z, Xu J, Zhu F, Rui Z, 'Facial attribute recognition method and apparatus, and electronic device and storage medium' (2021)
2020	Chen Z, Xu J, Zhao R, 'Face recognition method and apparatus, electronic device, and storage medium' (2020)

Preprint (18 outputs)

Year	Citation	Altmetrics	Link

2026

Han T, Wen Z, Chen Z, Lin F, Gao J, Guo S, Bai L, 'Generative 3D Gaussian Splatting for Arbitrary-Resolution Atmospheric Downscaling and Forecasting' (2026)

DOI	10.2139/ssrn.6708185

2026

Han T, Wen Z, Chen Z, Du D, Guo S, Bai L, 'Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting' (2026)

DOI	10.48550/arxiv.2406.14399

2025

Wang C, Chen Z, Yang JYH, Kim J, 'Improving Lesion Segmentation in Medical Images by Global and Regional Feature Compensation' (2025)

DOI	10.48550/arxiv.2502.08675

2025

Liu L, Chen Z, Hu Z, Xu D, 'An Efficient Adaptive Compression Method for Human Perception and Machine Vision Tasks' (2025)

DOI	10.48550/arxiv.2501.04329

2025

Wu Y, Chen Z, Wang H, Xu D, 'Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models' (2025)

DOI	10.48550/arxiv.2411.18375

2024

Liu L, Hu Z, Chen Z, 'Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor' (2024)

DOI	10.48550/arxiv.2406.00791

2024

Wang Z, Chen Z, Wu Y, Zhao Z, Zhou L, Xu D, 'PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis' (2024)

DOI	10.48550/arxiv.2405.15463

2024

Chen Z, Zhou L, Hu Z, Xu D, 'Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression' (2024)

DOI	10.48550/arxiv.2405.04274

2024

Han T, Chen Z, Guo S, Xu W, Bai L, 'CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer' (2024)

DOI	10.48550/arxiv.2405.03376

2024

Yang L, Wang Z, Chen Z, Liang X, Zhou L, 'MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation' (2024)

DOI	10.48550/arxiv.2312.02233

2023

Liu X, Chen Z, Zhou L, Xu D, Xi W, Bai G, Zhao Y, Zhao J, 'UFDA: Universal Federated Domain Adaptation with Practical Assumptions' (2023)

DOI	10.48550/arxiv.2311.15570

2023

Yang X, Lin G, Chen Z, Zhou L, 'Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization' (2023)

DOI	10.48550/arxiv.2309.01512

2023

Yang X, Lin G, Chen Z, Zhou L, 'Neural Vector Fields: Implicit Representation by Explicit Learning' (2023)

DOI	10.48550/arxiv.2303.04341

2022

Wang Z, Huo X, Chen Z, Zhang J, Sheng L, Xu D, 'Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation' (2022)

DOI	10.48550/arxiv.2208.14893

2021

Chen Z, Gu S, Zhu F, Xu J, Zhao R, 'Improving Facial Attribute Recognition by Group and Graph Learning' (2021)

DOI	10.48550/arxiv.2105.13825

2020

Hu Z, Chen Z, Xu D, Lu G, Ouyang W, Gu S, 'Improving Deep Video Compression by Resolution-adaptive Flow Coding' (2020)

DOI	10.48550/arxiv.2009.05982

2020

Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W, 'Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition' (2020)

DOI	10.48550/arxiv.2003.14111

2017

Chen Z, Zhou J, Wang X, 'Visual Analytics of Movement Pattern Based on Time-Spatial Data: A Neural Net Approach' (2017)

DOI	10.48550/arxiv.1707.02554

Research Collaborations

The map is a representation of a researcher's co-authorship with collaborators across the globe. The map displays the number of publications against a country, where there is at least one co-author based in that country. Data is sourced from the University of Newcastle research publication management system (NURO) and may not fully represent the author's complete body of work.

	Country	Count of Publications
	Australia	20
	China	11
	Hong Kong	6
	Singapore	4
	New Zealand	2
	Switzerland	1
	Taiwan, Province of China	1
	United States	1

Position

Lecturer in Data Science
School of Computer and Information Sciences
College of Engineering, Science and Environment
