Personal profile
I am an Associate Professor in the Department of Intelligent Science at Xi’an Jiaotong-Liverpool University. I hold a Ph.D. in Acoustics from the Chinese Academy of Sciences and have previously worked at Brigham Young University, the University of Surrey, and Qualcomm. My research focuses on machine learning and signal processing for audio, speech, and acoustics. Since 2020, I have served as an Associate Editor for the Noise Control Engineering Journal and regularly review for top journals in the field.
Research interests
My research interests include machine learning for audio and speech, spatial audio modeling, intelligent sound event detection and localization, active noise control, and data-driven acoustic signal processing. I am particularly interested in developing deep learning approaches, Audio and Speech Large Language Models (LLMs), and generative models to enhance sound environment analysis and control.
Experience
I completed my Ph.D. in Acoustics at the Chinese Academy of Sciences in 2013. Since then, I have held research and engineering positions at Brigham Young University, the Institute of Acoustics (CAS), the University of Surrey, and Qualcomm. I am currently based in the Department of Intelligent Science at Xi’an Jiaotong-Liverpool University and am also a Visiting Scholar at the University of Surrey.
Teaching
INT402, Data Mining and Big Data Analytics
INT403, Spoken Language Processing
Awards and honours
I have received several awards for my research contributions in sound event localization and detection, including:
- First place in the L3DAS22 Challenge Task 2 (“3D Sound Event Localization and Detection”)
- Third place in the DCASE 2023 Challenge Task 3B (“Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes”)
- Second place in the DCASE 2022 Challenge Task 3 (“Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes”)
- First place in the DCASE 2020 Challenge Task 5 (“Urban Sound Tagging with Spatiotemporal Context”)
- Second place in the DCASE 2019 Challenge Task 3 (“Sound Event Localization and Detection”)
I have also received the “Judges’ Award” and “Reproducible System Award” at the DCASE 2019 and 2020 workshops, recognizing the impact and reproducibility of my research.
In addition, during my doctoral research, I was awarded the National Scholarship and the Zhu-Li-Yue-Hua Outstanding Doctoral Scholarship for academic excellence.
Education/Academic qualification
PhD, Ph.D. in Signal Processing and Acoustics, University of Chinese Academy of Sciences (UCAS)
2008 → 2013
Award Date: 1 Jun 2013
Bachelor, BSc in Electrical and Electronics Engineering, Nanjing University
2004 → 2008
Award Date: 1 Jun 2008
External positions
Visiting Scholar, University of Surrey
Jun 2025 → …
Associate Editor for Noise Control Engineering Journal
Jan 2020 → …
Research areas
- Machine Learning
- Audio Signal Processing
- Speech Processing
- Spatial Audio Modeling
- Sound Event Detection and Localization (SELD)
- Generative Audio Models
- Audio and Speech Large Language Models (LLMs)
Keywords
- QC Physics
- Acoustics
- QA75 Electronic computers. Computer science
- Machine learning
Person Types
- Staff
Collaborations and top research areas from the last five years
- Development of Sound Event Recognition Algorithms
  2/06/25 → 30/06/28
  Project: Collaborative Research Project
- Implementation of Sound Source Detection and Localization Algorithm
  15/12/24 → 14/12/25
  Project: Collaborative Research Project
- Methods Study on Multi-Task Learning for 3D Computational Environmental Audio Analysis
  1/01/23 → 31/12/25
  Project: Internal Research Project
- Development of sound detection simulation software
  1/06/24 → 30/09/24
  Project: Collaborative Research Project
- Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation
  Kang, F. & Cao, Y., May 2025.
  Research output: Contribution to conference › Paper › peer-review
- WavJourney: Compositional Audio Creation with Large Language Models
  Liu, X. & Cao, Y., May 2025, In: IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  Research output: Contribution to journal › Article › peer-review
- EDTC: enhance depth of text comprehension in automated audio captioning
  Tan, L. & Cao, Y., Feb 2024.
  Research output: Contribution to conference › Paper
- POWER CUE ENHANCED NETWORK AND AUDIO-VISUAL FUSION FOR SOUND EVENT LOCALIZATION AND DETECTION OF DCASE2024 CHALLENGE
  Guan, X. & Cao, Y., 2024, POWER CUE ENHANCED NETWORK AND AUDIO-VISUAL FUSION FOR SOUND EVENT LOCALIZATION AND DETECTION OF DCASE2024 CHALLENGE.
  Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review
- Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection
  Hu, J. & Cao, Y., Aug 2024, In: IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  Research output: Contribution to journal › Article › peer-review
Activities
- Language-queried audio source separation
  Yin Cao (Supervisor)
  2024
  Activity: Supervision › Completed SURF Project
- Deep source separation for speech and music
  Yin Cao (Supervisor)
  2023
  Activity: Supervision › Completed SURF Project
- Multimodal Sound Event Localization and Detection
  Yin Cao (Supervisor)
  2023 → 2024
  Activity: Supervision › Master Dissertation Supervision
- Audio Deepfake Detection
  Yin Cao (Supervisor)
  2022 → 2023
  Activity: Supervision › Master Dissertation Supervision
- Noise Control Engineering Journal (Journal)
  Yin Cao (Reviewer)
  2021 → …
  Activity: Peer-review and editorial work of publications › Editorial work