Personal profile
I am an Associate Professor in the Department of Intelligent Science at Xi’an Jiaotong-Liverpool University. I hold a Ph.D. in Acoustics from the Chinese Academy of Sciences and have previously worked at Brigham Young University, the University of Surrey, and Qualcomm. My research focuses on machine learning and signal processing for audio, speech, and acoustics. Since 2020, I have served as an Associate Editor for the Noise Control Engineering Journal and regularly review for top journals in the field.
Research interests
My research interests include machine learning for audio and speech, spatial audio modeling, intelligent sound event detection and localization, active noise control, and data-driven acoustic signal processing. I am particularly interested in developing deep learning approaches, Audio and Speech Large Language Models (LLMs), and generative models to enhance sound environment analysis and control.
Experience
I completed my Ph.D. in Acoustics at the Chinese Academy of Sciences in 2013. Since then, I have held research and engineering positions at Brigham Young University, the Institute of Acoustics (CAS), the University of Surrey, and Qualcomm. I am currently an Associate Professor in the Department of Intelligent Science at Xi’an Jiaotong-Liverpool University and a Visiting Scholar at the University of Surrey.
Teaching
INT402, Data Mining and Big Data Analytics
INT403, Spoken Language Processing
Awards and honours
I have received several awards for my research contributions in sound event localization and detection, including:
- First place in the L3DAS22 Challenge Task 2 (“3D Sound Event Localization and Detection”)
- Third place in the DCASE 2023 Challenge Task 3B (“Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes”)
- Second place in the DCASE 2022 Challenge Task 3 (“Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes”)
- First place in the DCASE 2020 Challenge Task 5 (“Urban Sound Tagging with Spatiotemporal Context”)
- Second place in the DCASE 2019 Challenge Task 3 (“Sound Event Localization and Detection”)
I have also received the “Judges’ Award” and “Reproducible System Award” at the DCASE 2019 and 2020 workshops, recognizing the impact and reproducibility of my research.
In addition, during my doctoral research, I was awarded the National Scholarship and the Zhu-Li-Yue-Hua Outstanding Doctoral Scholarship for academic excellence.
Education/Academic qualification
PhD, Ph.D. in Signal Processing and Acoustics, University of Chinese Academy of Sciences (UCAS)
2008 → 2013
Award Date: 1 Jun 2013
Bachelor, BSc in Electrical and Electronics Engineering, Nanjing University
2004 → 2008
Award Date: 1 Jun 2008
External positions
Visiting Scholar, University of Surrey
Jun 2025 → …
Associate Editor for Noise Control Engineering Journal
Jan 2020 → …
Research areas
- Machine Learning
- Audio Signal Processing
- Speech Processing
- Spatial Audio Modeling
- Sound Event Detection and Localization (SELD)
- Generative Audio Models
- Audio and Speech Large Language Models (LLMs)
Keywords
- QC Physics
- Acoustics
- QA75 Electronic computers. Computer science
- Machine learning
Person Types
- Staff
Projects
- Development of Sound Event Recognition Algorithms
  Cao, Y. (PI)
  2/06/25 → 30/06/28
  Project: Collaborative Research Project
- Development of Sound Event Recognition Algorithms
  Cao, Y. (PI) & Liang, S. (Team member)
  12/05/25 → 30/06/28
  Project: Collaborative Research Project
- Integrating spatial information for speaker diarization
  Liang, S. (PI), Cao, Y. (Team member) & Wang, Q. (Team member)
  1/05/25 → 30/04/28
  Project: Collaborative Research Project
- Implementation of Sound Source Detection and Localization Algorithm
  Cao, Y. (PI) & Liang, S. (Team member)
  15/12/24 → 14/12/25
  Project: Collaborative Research Project
- Methods Study on Multi-Task Learning for 3D Computational Environmental Audio Analysis
  Cao, Y. (PI)
  1/01/23 → 31/12/25
  Project: Internal Research Project
Research output
- DiffStereo: End-to-End Mono-to-Stereo Audio Generation with Diffusion Transformer
  Zhang, S. & Cao, Y., 2025.
  Research output: Contribution to conference › Paper › peer-review
- Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation
  Kang, F. & Cao, Y., May 2025.
  Research output: Contribution to conference › Paper › peer-review
- PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
  Hu, J. & Cao, Y., 2025, In: IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  Research output: Contribution to journal › Article › peer-review
- SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing
  Hu, J. & Cao, Y., 2025.
  Research output: Contribution to conference › Paper › peer-review
- WavJourney: Compositional Audio Creation with Large Language Models
  Liu, X. & Cao, Y., May 2025, In: IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  Research output: Contribution to journal › Article › peer-review
Activities
- Language-queried audio source separation
  Yin Cao (Supervisor)
  2024
  Activity: Supervision › Completed SURF Project
- Deep source separation for speech and music
  Yin Cao (Supervisor)
  2023
  Activity: Supervision › Completed SURF Project
- Multimodal Sound Event Localization and Detection
  Yin Cao (Supervisor)
  2023 → 2024
  Activity: Supervision › Master Dissertation Supervision
- Audio Deepfake Detection
  Yin Cao (Supervisor)
  2022 → 2023
  Activity: Supervision › Master Dissertation Supervision
- Noise Control Engineering Journal (Journal)
  Yin Cao (Reviewer)
  2021 → …
  Activity: Peer-review and editorial work of publications › Editorial work