🧑‍🎓 I am a fifth-year direct-doctoral (bachelor-straight-to-Ph.D.) student at the Academy for Engineering and Technology, Fudan University. As part of the Cognition and Intelligent Technology Laboratory (CIT Lab), I am advised by Prof. Lihua Zhang (National Thousand Talents Program). Prior to this, I received a B.E. degree in Communication Engineering through the joint training program of Yunnan University and the Chinese People's Armed Police (PAP), Kunming, China, in 2020.

Now, I co-supervise five postgraduate students in the CIT Lab, serving as co-first author or corresponding author of their research works. :)

📖 Research Interests

  • Large Language/Visual Models: Pre-trained models, Tailored model construction, Hallucination mitigation, Model robustness, Agents
  • Multimodal Learning: Multimodal representation learning, Multimodal fusion, Multimodal perception debiasing
  • Intention Understanding: Context-aware emotion recognition, Sentiment analysis, Modality missingness
  • Autonomous Driving: Assistive driving perception, Multi-agent collaborative perception
  • I have published 20+ papers as the first author in reputable journals and at top international conferences, such as IEEE TPAMI, TCSVT, NeurIPS, CVPR, ICCV, ECCV, AAAI, ACM MM, KBS, and IEEE SPL. :)

🎖 Honors and Scholarships

  • National Scholarship (Doctoral student, Top 1%), 2024
  • Pacemaker to Excellent Student (20 Winners per year), 2024
  • Academic Star Award (30 Winners per year), 2024
  • National Scholarship (Doctoral student, Top 1%), 2023
  • Huatai Securities Technology Scholarship (Top 1%), 2023
  • Finalist, CICAI Best Student Paper Award, 2023
  • Excellent Student Cadre, 2023
  • Outstanding League Cadre, 2023
  • National Scholarship (Postgraduate student, Top 1%), 2022
  • Excellent Student, 2022

🔈 Academic Service

  • PC Member: ICLR, NeurIPS, AAAI, ICCV, CVPR, ECCV, ACM MM, ACL, IROS.
  • Journal Reviewer: IEEE TPAMI, TIP, TNNLS, TMM, RAL, and TCSVT.

🔥 News

    • 2024.09:  🎉🎉 2 papers accepted to NeurIPS 2024.
    • 2024.08:  🎉🎉 1 paper accepted to ACML 2024.
    • 2024.07:  🎉🎉 4 papers accepted to ECCV 2024, 1 paper accepted to TCSVT, and 1 paper accepted to TPAMI.
    • 2024.06:  🎉🎉 We propose MedAide, an omni multi-agent collaboration framework for healthcare applications.
    • 2024.06:  🎉🎉 We release PediatricsGPT, the first Chinese medical large language model for pediatric applications.
    • 2024.02:  🎉🎉 3 papers accepted to CVPR 2024.
    • 2023.12:  🎉🎉 1 paper accepted to AAAI 2024.
    • 2023.09:  🎉🎉 1 paper accepted to NeurIPS 2023.
    • 2023.07:  🎉🎉 2 papers accepted to ICCV 2023.

📝 Selected Publications


    Large Language/Visual Models

    Submitted to ACL 2025

    MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration

    Jinjie Wei, Dingkang Yang, Yanshu Li, ..., Lihua Zhang

    • We are the first to propose an omni multi-agent collaboration framework for real-world scenarios with composite healthcare intents, which shows potential for advancing interactive systems for personalized healthcare.
    Submitted to AAAI 2025

    Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

    Dingkang Yang, Dongling Xiao, Jinjie Wei, ..., Ke Li, Lihua Zhang

    • We propose an efficient Comparator-driven Decoding-Time (CDT) framework to improve response factuality. The core philosophy is to equip target LLMs with comparators that model different generative attributes separately during decoding, integrating their logit distributions to steer next-token prediction in factuality-robust directions (a generic sketch of this style of logit integration is given below).
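    To make the decoding-time idea concrete, here is a minimal, generic sketch of comparator-guided logit integration in the spirit of contrastive decoding. It is not the paper's exact CDT formulation; the function name comparator_guided_logits and the alpha/beta weights are hypothetical choices for illustration.

```python
# A minimal, generic sketch of comparator-guided decoding-time logit integration,
# in the spirit of contrastive decoding. NOT the exact CDT formulation; the
# function name and the alpha/beta weights are hypothetical illustrations.
import torch

def comparator_guided_logits(target_logits: torch.Tensor,
                             truthful_logits: torch.Tensor,
                             hallucinatory_logits: torch.Tensor,
                             alpha: float = 0.5,
                             beta: float = 0.5) -> torch.Tensor:
    """Combine next-token logits from the target LLM and two comparators.

    All tensors share the shape (vocab_size,). alpha and beta control how
    strongly the truthful/hallucinatory comparators steer the prediction.
    """
    # Move toward tokens favored by the truthful comparator and away from
    # tokens favored by the hallucinatory comparator, relative to the target.
    return (target_logits
            + alpha * (truthful_logits - target_logits)
            - beta * (hallucinatory_logits - target_logits))

# Toy usage: greedy next-token selection from the adjusted distribution.
vocab_size = 32000
adjusted = comparator_guided_logits(torch.randn(vocab_size),
                                    torch.randn(vocab_size),
                                    torch.randn(vocab_size))
next_token_id = int(torch.argmax(torch.softmax(adjusted, dim=-1)))
```

    In this decoding style the target model remains the primary predictor; the comparators only shift the logits toward or away from specific generative attributes before sampling.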
    NeurIPS 2024

    PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

    Dingkang Yang, Jinjie Wei, Dongling Xiao, ..., Peng Zhai, Lihua Zhang

    • This paper builds PedCorpus, a high-quality dataset of over 300,000 multi-task instructions derived from pediatric textbooks, guidelines, and knowledge-graph resources to fulfil diverse diagnostic demands. Building on PedCorpus, we propose PediatricsGPT, the first Chinese pediatric LLM assistant, developed through a systematic and robust training pipeline.
    ACML 2024

    Large Vision-Language Models as Emotion Recognizers in Context Awareness

    Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, ..., Lihua Zhang

    • We systematically explore the potential of leveraging Large Vision-Language Models (LVLMs) to empower the CAER task from three paradigms: 1) We fine-tune LVLMs on CAER datasets, which is the most common way to transfer large models to downstream tasks. 2) We design a training-free framework to exploit the In-Context Learning (ICL) capabilities of LVLMs. 3) To leverage the rich knowledge base of LVLMs, we incorporate Chain-of-Thought (CoT) into our framework to enhance the reasoning ability and provide interpretable results.
    Submitted to ICLR 2025

    Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

    Jiawei Chen, Dingkang Yang, Tong Wu, ..., Lihua Zhang

    • We introduce the first benchmark dedicated to hallucination detection in the medical domain, Med-HallMark, and provide baselines for various LVLMs. We propose the first hallucination detection model, MediHallDetector, and demonstrate its superiority through extensive experiments. We present a new hallucination evaluation metric, MediHall Score, and show its effectiveness relative to traditional metrics through qualitative and quantitative analysis.
    ACM MM 2024

    Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

    Jiawei Chen, Dingkang Yang, Yue Jiang, ..., Lihua Zhang

    • We are the first to centre on fine-tuning a small subset of a Med-VLM's inherent parameters to adapt to downstream tasks. We conduct a comprehensive series of experiments fine-tuning foundational components of Med-VLMs, including systematic comparisons with existing PEFT methods centred on tuning extrinsic components.
    MICCAI 2024

    Can LLMs' Tuning Methods Work in Medical Multimodal Domain?

    Jiawei Chen, Yue Jiang, Dingkang Yang (Project advising), ..., Lihua Zhang

    • We delve into the fine-tuning methods of LLMs and conduct extensive experiments to investigate the impact of fine-tuning methods for large models on existing multimodal models in the medical domain from the training data level and the model structure level.
    Preprint 2024

    MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More

    Yue Jiang, Jiawei Chen, Dingkang Yang, ..., Lihua Zhang

    • We introduce MedThink, a novel medical data-construction method that effectively mitigates hallucinations in LVLMs within the medical domain.

    Multimodal Learning/Intention Understanding

    ECCV 2024

    Towards Multimodal Sentiment Analysis Debiasing via Bias Purification

    Dingkang Yang, Mingcheng Li, Dongling Xiao, ..., Lihua Zhang

    • Current multimodal learning tasks invariably suffer from unplanned dataset biases, particularly multimodal utterance-level label bias and word-level context bias. These harmful biases potentially mislead models to focus on statistical shortcuts and spurious correlations, causing severe performance bottlenecks. To alleviate these issues, we present a multimodal counterfactual inference analysis framework based on causality rather than conventional likelihood.
    IEEE TCSVT 2024

    Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

    Dingkang Yang, Mingcheng Li, Linhao Qu, ..., Lihua Zhang

    • We propose a multimodal fusion approach for learning Modality-Exclusive and modality-Agnostic representations (MEA) to refine multimodal features and leverage the complementarity across distinct modalities. With these tailored components, MEA overcomes the temporal asynchrony dilemma by capturing intra- and inter-modal element dependencies in the exclusive and agnostic subspaces.
    IEEE TPAMI 2024

    Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training

    Dingkang Yang, Kun Yang, Haopeng Kuang, ..., Lihua Zhang

    • We embrace causal inference to disentangle the models from the impact of the context bias, and formulate the causalities among variables in the computer vision task via a customized causal graph. Subsequently, we present a causal intervention module to de-confound the confounder, built upon backdoor adjustment theory (recalled below) to facilitate seeking approximate causal effects during model training.
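    For context, the standard backdoor-adjustment formula from causal inference (stated here generically, not as this paper's specific causal graph) replaces the biased conditional with an intervention that stratifies over a confounder Z:

    P(Y \mid do(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)

    Intuitively, instead of letting the confounder jointly influence both the input and the prediction, the intervention averages the prediction over the confounder's prior, which is the quantity the causal intervention module approximates during training.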
    CVPR 2024

    Robust Emotion Recognition in Context Debiasing

    Dingkang Yang, Kun Yang, Mingcheng Li, ..., Lihua Zhang

    • We devise CLEF, a model-agnostic CAER debiasing framework that facilitates existing methods to capture valuable causal relationships and mitigate the harmful bias in context semantics through counterfactual inference. CLEF can be readily adapted to state-of-the-art (SOTA) methods with different structures, bringing consistent and significant performance gains.
    Preprint 2024

    Towards Multimodal Human Intention Understanding Debiasing via Subject-Deconfounding

    Dingkang Yang, Dongling Xiao, Ke Li, ..., Lihua Zhang

    • Multimodal intention understanding (MIU) is an indispensable component of human expression analysis from heterogeneous modalities, including visual postures, linguistic contents, and acoustic behaviors. Unfortunately, existing works all suffer from the subject variation problem due to data distribution discrepancies among subjects. Here, we propose SuCI, a simple yet effective causal intervention module that disentangles the impact of subjects acting as unobserved confounders and enables model training on the true causal effect.
    CVPR 2024

    De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

    Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, ..., Lihua Zhang

    • We propose a KDCI framework to restrain the detrimental effect caused by the confounder and attempt to achieve the de-confounded distillation process. KDCI can be easily and flexibly combined with existing generation-based or sampling-based DFKD paradigms.
    CVPR 2024

    Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

    Mingcheng Li, Dingkang Yang, Xiao Zhao, ..., Lihua Zhang

    • We propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the multimodal sentiment analysis task under uncertain missing modalities.
    IEEE SPL 2024

    Towards Asynchronous Multimodal Signal Interaction and Fusion via Tailored Transformers

    Dingkang Yang, Haopeng Kuang, Kun Yang, Mingcheng Li, Lihua Zhang

    • We present a Transformer-driven Signal Interaction and Fusion (TSIF) approach to effectively model asynchronous multimodal signal sequences. TSIF consists of linear and cross-modal transformer modules with different duties.
    AAAI 2024

    A Unified Self-Distillation Framework for Multimodal Sentiment Analysis with Uncertain Missing Modalities

    Mingcheng Li, Dingkang Yang, Yuxuan Lei, ..., Lihua Zhang

    • We propose a unified multimodal missing-modality self-distillation framework (UMDF) to tackle the missing-modality dilemma in the MSA task. UMDF yields robust joint multimodal representations through distillation-based distribution supervision and attention-based multi-grained interactions.
    CVPR 2023

    Context De-confounded Emotion Recognition

    Dingkang Yang, Zhaoyu Chen, Yuzheng Wang, ..., Lihua Zhang

    Project | ArXiv

    • We are the first to investigate the adverse context bias of the datasets in the context-aware emotion recognition task from the causal inference perspective and identify that such bias is a confounder, which misleads the models to learn the spurious correlation. In this case, we propose a contextual causal intervention module based on the backdoor adjustment to de-confound the confounder and exploit the true causal effect for model training.
    KBS 2023

    Target and Source Modality Co-reinforcement for Emotion Understanding from Asynchronous Multimodal Sequences

    Dingkang Yang, Yang Liu, Can Huang, ..., Peng Zhai, Lihua Zhang

    • Inspired by the human perception paradigm, we propose a target and source modality co-reinforcement approach to achieve sufficient cross-modal interaction and fusion at different granularities.
    ECCV 2022

    Emotion Recognition for Multiple Context Awareness

    Dingkang Yang, Shuai Huang, Shunli Wang, ..., Lihua Zhang

    Project | Supplementary | Data

    • We present a context-aware emotion recognition framework that combines four complementary contexts.
    ACM MM 2022

    Disentangled Representation Learning for Multimodal Emotion Recognition

    Dingkang Yang, Shuai Huang, Haopeng Kuang, Yangtao Du, Lihua Zhang

    • We propose a feature-disentangled multimodal emotion recognition method, which learns the common and private feature representations for each modality.
    ACM MM 2022

    Learning Modality-specific and -agnostic Representations for Asynchronous Multimodal Language Sequences

    Dingkang Yang, Haopeng Kuang, Shuai Huang, Lihua Zhang

    • We propose a multimodal fusion approach for learning modality-specific and modality-agnostic representations to refine multimodal representations and leverage the complementarity across different modalities.
    IEEE SPL 2022

    Contextual and Cross-modal Interaction for Multi-modal Speech Emotion Recognition

    Dingkang Yang, Shuai Huang, Yang Liu, Lihua Zhang

    • We propose a multimodal speech emotion recognition method based on interaction awareness.

    Driving/Collaborative Perception in Autonomous Driving

    NeurIPS 2023

    How2comm: Communication-Efficient and Collaboration-Pragmatic Multi-Agent Perception

    Dingkang Yang, Kun Yang, Yuzheng Wang, ..., Peng Zhai, Lihua Zhang

    Project

    • Multi-agent collaborative perception has recently received widespread attention as an emerging application in driving scenarios. We propose How2comm, a collaborative perception framework that seeks a trade-off between perception performance and communication bandwidth.
    ICCV 2023

    AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception

    Dingkang Yang, Shuai Huang, Zhi Xu, ..., Lihua Zhang

    Project | ArXiv

    • We propose an AssIstive Driving pErception dataset (AIDE) to facilitate further research on vision-driven driver monitoring systems. AIDE captures rich information inside and outside the vehicle from several drivers in realistic driving conditions.
    ACM MM 2023

    What2comm: Towards Communication-efficient Collaborative Perception via Feature Decoupling

    Kun Yang, Dingkang Yang, Jingyu Zhang, ...

    • We propose What2comm, a communication-efficient multi-agent collaborative perception framework. Our framework outperforms previous approaches on real-world and simulated datasets by addressing various collaboration interferences, including communication noise, transmission delay, and localization errors, in an end-to-end manner.
    ICCV 2023

    Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception

    Kun Yang, Dingkang Yang, Jingyu Zhang, ...

    Project | ArXiv

    • Multi-agent collaborative perception, as a potential application of vehicle-to-everything communication, could significantly improve the perception performance of autonomous vehicles over single-agent perception. We propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner.