| 2025 |
Yang L, Wang Z, Chen Z, Liang X, Zhou L, 'MedXChat: A Unified Multimodal Large Language Model Framework Towards CXRs Understanding and Generation', 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI) (2025) [E1]
|
|
|
| 2025 |
Yu S, Jin C, Wang H, Chen Z, Jin S, Zuo Z, Xu X, Sun Z, Bingni Z, Wu J, Hao Z, Sun Q, 'Frame-Voyager: Learning to Query Frames for Video Large Language Models', International Conference on Learning Representations (ICLR) 2025 (2025)
|
|
|
| 2025 |
'Proceedings of the 3rd International Workshop on Rich Media With Generative AI' (2025)
|
|
|
| 2025 |
Li Y, Zhou L, Ling N, Chen Z, Wang W, Jiang W, 'M³VIR: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation', Proceedings of the 3rd International Workshop on Rich Media With Generative AI, 20-29 (2025)
|
|
|
| 2025 |
Jiang W, Chen Z, Xu D, '(RichMediaGAI'25) 3rd International Workshop on Rich Media with Generative AI', MM 2025: Proceedings of the 33rd ACM International Conference on Multimedia, co-located with MM 2025, 14296-14298 (2025)
|
|
|
| 2025 |
Liu L, Chen Z, Xu D, '3D Gaussian Splatting Data Compression with Mixture of Priors', MM'2025 Proceedings of the 33rd ACM International Conference on Multimedia, 8341-8350 (2025) [E1]
|
|
|
| 2025 |
Wu Y, Chen Z, Wang H, Xu D, 'Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models', MM'2025. Proceedings of the 33rd ACM International Conference on Multimedia, 9714-9723 (2025) [E1]
|
|
|
| 2025 |
Wu Y, Wang H, Chen Z, Pang J, Xu D, 'On-Device Diffusion Transformer Policy for Efficient Robot Manipulation', Proceedings of the IEEE International Conference on Computer Vision, 14073-14083 (2025) [E1]
|
|
|
| 2024 |
Liu X, Chen Z, Luping Z, Xu D, Xi W, Bai G, Yihan Z, Zhao J, 'UFDA: Universal Federated Domain Adaptation with Practical Assumptions', Proceedings of the 38th AAAI Conference on Artificial Intelligence, 14026-14034 (2024) [E1]
|
|
|
| 2024 |
Liu L, Hu Z, Chen Z, 'Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor', Generalizing from Limited Resources in the Open World Second International Workshop, GLOW 2024 Held in Conjunction with IJCAI 2024, 2160 CCIS, 3-17 (2024) [E1]
|
|
|
| 2024 |
Chen Z, Zhou L, Hu Z, Xu D, 'Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression', MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 11022-11031 (2024) [E1]
|
|
|
| 2024 |
Guo J, Chen Z, Ma Y, Liu X, Kim J, Ouyang W, Tao D, 'EMCLR'24: 1st International Workshop on Efficient Multimedia Computing under Limited Resources', Proceedings of the 1st International Workshop on Efficient Multimedia Computing under Limited Resources, EMCLR 2024, 1-2 (2024)
|
|
|
| 2023 |
Yang X, Lin G, Chen Z, Zhou L, 'Neural Vector Fields: Implicit Representation by Explicit Learning', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023-June, 16727-16738 (2023) [E1]
|
|
|
| 2023 |
Liu L, Hu Z, Chen Z, Xu D, 'ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision', Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 8047-8056 (2023) [E1]
Neural image compression has gained significant attention thanks to the remarkable success of deep neural networks. However, most existing neural image codecs focus solely on improving human vision perception. In this work, our objective is to enhance image compression methods for both human vision quality and machine vision tasks simultaneously. To achieve this, we introduce a novel approach to Partition, Transmit, Reconstruct, and Aggregate (PTRA) the latent representation of images to balance the optimizations for both aspects. By employing our method as a module in existing neural image codecs, we create a latent representation predictor that dynamically manages the bit-rate cost for machine vision tasks. To further improve the performance of auto-regressive-based coding techniques, we enhance our hyperprior network and predictor module with context modules, resulting in a reduction in bit-rate. The extensive experiments conducted on various machine vision benchmarks such as ILSVRC 2012, VOC 2007, VOC 2012, and COCO demonstrate the superiority of our newly proposed image compression framework. It outperforms existing neural image compression methods in multiple machine vision tasks including classification, segmentation, and detection, while maintaining high-quality image reconstruction for human vision.
|
|
|
| 2023 |
Chen Z, Relic L, Azevedo R, Zhang Y, Gross M, Xu D, Zhou L, Schroers C, 'Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers', Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 8543-8551 (2023) [E1]
|
|
|
| 2022 |
Chen Z, Lu G, Hu Z, Liu S, Jiang W, Xu D, 'LSVC: A Learning-based Stereo Video Compression Framework', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022-June, 6063-6072 (2022)
In this work, we propose the first end-to-end optimized framework for compressing automotive stereo videos (i.e., stereo videos from autonomous driving applications) from both left and right views. Specifically, when compressing the current frame from each view, our framework reduces temporal redundancy by performing motion compensation using the reconstructed intra-view adjacent frame and at the same time exploits binocular redundancy by conducting disparity compensation using the latest reconstructed cross-view frame. Moreover, to effectively compress the introduced motion and disparity offsets for better compensation, we further propose two novel schemes, called motion residual compression and disparity residual compression, which respectively generate the predicted motion offset and disparity offset from the previously compressed motion offset and disparity offset, so that the residual offset information can be compressed more effectively for better bit-rate saving. Overall, the entire framework is implemented with fully differentiable modules and can be optimized in an end-to-end manner. Our comprehensive experiments on three automotive stereo video benchmarks (Cityscapes, KITTI 2012, and KITTI 2015) demonstrate that our proposed framework outperforms the learning-based single-view video codec and the traditional hand-crafted multi-view video codec.
|
|
|
| 2021 |
Chen Z, Gu S, Zhu F, Xu J, Zhao R, 'Improving Facial Attribute Recognition by Group and Graph Learning', Proceedings IEEE International Conference on Multimedia and Expo (2021)
Exploiting the relationships between attributes is a key challenge for improving multiple facial attribute recognition. In this work, we are concerned with two types of correlations: spatial and non-spatial relationships. For the spatial correlation, we aggregate attributes with spatial similarity into a part-based group and then introduce Group Attention Learning to generate the group attention and the part-based group feature. On the other hand, to discover the non-spatial relationship, we model a group-based Graph Correlation Learning to explore affinities of predefined part-based groups. We utilize such affinity information to control the communication between all groups and then refine the learned group features. Overall, we propose a unified network called Multi-scale Group and Graph Network. It incorporates these two newly proposed learning strategies and produces coarse-to-fine graph-based group features for improving facial attribute recognition. Comprehensive experiments demonstrate that our approach outperforms the state-of-the-art methods.
|
|
|
| 2020 |
Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W, 'Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition', 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 140-149 (2020)
|
|
|
| 2017 |
Chen Z, Zhou J, Wang X, Swanson J, Chen F, Feng D, 'Neural net-based and safety-oriented visual analytics for time-spatial data', Proceedings of the International Joint Conference on Neural Networks, 2017-May, 1133-1140 (2017)
Safety-oriented visualization is one of the significant approaches to gaining insights from time-spatial data, while neural networks currently serve as a decent way to perform machine learning in the data mining industry. This paper proposes a visual analytics pipeline for trajectory data that enables a better understanding of people's movement patterns, using a neural network as the back-end and other visualization techniques as the front-end to gain information on preferences for attractions, similarities of groups, popularity of attractions, and patterns of movement flow. Such understanding helps to address management issues by extracting outstanding features to detect abnormal patterns, for example detecting crime and predicting overall movements. Successfully dealing with those issues would significantly improve the overall management of public facilities such as parks and transportation.
|
|
|
| 2017 |
Zhi W, Yueng HWF, Chen Z, Zandavi SM, Lu Z, Chung YY, 'Using Transfer Learning with Convolutional Neural Networks to Diagnose Breast Cancer from Histopathological Images', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 10637 LNCS, 669-676 (2017)
Diagnosis from histopathological images is the gold standard in diagnosing breast cancer. This paper investigates using transfer learning with convolutional neural networks to automatically diagnose breast cancer from patches of histopathological images. We compare the performance of using transfer learning with an off-the-shelf deep convolutional neural network architecture, VGGNet, and a shallower custom architecture. Our proposed final ensemble model, which contains three custom convolutional neural network classifiers trained using transfer learning, achieves a significantly higher image classification accuracy on the large public benchmark dataset than the current best results, for all image resolution levels.
|
|
|
| 2017 |
Zhi W, Chen Z, Yueng HWF, Lu Z, Zandavi SM, Chung YY, 'Layer Removal for Transfer Learning with Deep Convolutional Neural Networks', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 10635 LNCS, 460-469 (2017)
It is usually difficult to find datasets of sufficient size to train Deep Convolutional Neural Networks (DCNNs) from scratch. In practice, a neural network is often pre-trained on a very large source dataset. Then, a target dataset is transferred onto the neural network. This approach is a form of transfer learning, and allows very deep networks to achieve outstanding performance even when only a small target dataset is available. It is thought that the bottom layers of the pre-trained network contain general information, which is applicable to different datasets and tasks, while the upper layers of the pre-trained network contain abstract information relevant to a specific dataset and task. While studies have been conducted on the fine-tuning of these layers, the removal of these layers has not yet been considered. This paper explores the effect of removing the upper convolutional layers of a pre-trained network. We empirically investigated whether removing upper layers of a deep pre-trained network can improve performance for transfer learning. We found that removing upper pre-trained layers gives a significant boost in performance, but the ideal number of layers to remove depends on the dataset. We suggest removing pre-trained convolutional layers when applying transfer learning on off-the-shelf pre-trained DCNNs. The ideal number of layers to remove will depend on the dataset, and remains a parameter to be tuned.
|
|
|
| 2016 |
Liu G, Chen Z, Yeung HWF, Chung YY, Yeh WC, 'A new weight adjusted particle swarm optimization for real-time multiple object tracking', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 9948 LNCS, 643-651 (2016)
This paper proposes a novel Weight Adjusted Particle Swarm Optimization (WAPSO) to overcome the occlusion problem and computational cost in multiple object tracking. To this end, a new update strategy for the inertia weight of the particles in WAPSO is designed to maintain particle diversity and prevent premature convergence. Meanwhile, the implementation of a mechanism that enlarges the search space upon the detection of occlusion enhances WAPSO's robustness to non-linear target motion. In addition, the choice of Root Sum Squared Errors as the fitness function further increases the speed of the proposed approach. The experimental results have shown that, in combination with the model feature that enables initialization of multiple independent swarms, the high-speed WAPSO algorithm can be applied to multiple non-linear object tracking for real-time applications.
|
|
|