TY - JOUR
T1 - Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement
AU - Tan, Yizhou
AU - Ai, Haojun
AU - Li, Shengchen
AU - Plumbley, Mark D.
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2024/1/12
Y1 - 2024/1/12
N2 - Acoustic Scene Classification (ASC) is a task that classifies a scene according to environmental acoustic signals. Audio recordings collected from different cities and devices often exhibit biases in feature distributions, which may degrade ASC performance. Taking the city and the device of audio collection as two types of data domain, this paper attempts to disentangle the audio features of each domain to remove the related feature biases. A dual-alignment framework is proposed to generalize the ASC system to new devices or cities by aligning boundaries across domains and decision boundaries within each domain. During the alignment, maximum classifier discrepancy and a gradient reversal layer are used for the feature disentanglement of scene, city, and device, while four candidate domain classifiers are proposed to explore the optimal solution of feature disentanglement. To evaluate the dual-alignment framework, three biased ASC tasks are designed: 1) cross-city ASC in new cities; 2) cross-device ASC on new devices; 3) cross-city-device ASC in new cities and on new devices. Results demonstrate the superiority of the proposed framework, with classification accuracy improvements of 0.9%, 19.8%, and 10.7%, respectively. The effectiveness of the proposed feature disentanglement approach is further evaluated on both biased and unbiased ASC problems, and the results show that better-disentangled audio features lead to a more robust ASC system across different devices and cities. This paper advocates the integration of feature disentanglement into ASC systems to achieve more reliable performance.
AB - Acoustic Scene Classification (ASC) is a task that classifies a scene according to environmental acoustic signals. Audio recordings collected from different cities and devices often exhibit biases in feature distributions, which may degrade ASC performance. Taking the city and the device of audio collection as two types of data domain, this paper attempts to disentangle the audio features of each domain to remove the related feature biases. A dual-alignment framework is proposed to generalize the ASC system to new devices or cities by aligning boundaries across domains and decision boundaries within each domain. During the alignment, maximum classifier discrepancy and a gradient reversal layer are used for the feature disentanglement of scene, city, and device, while four candidate domain classifiers are proposed to explore the optimal solution of feature disentanglement. To evaluate the dual-alignment framework, three biased ASC tasks are designed: 1) cross-city ASC in new cities; 2) cross-device ASC on new devices; 3) cross-city-device ASC in new cities and on new devices. Results demonstrate the superiority of the proposed framework, with classification accuracy improvements of 0.9%, 19.8%, and 10.7%, respectively. The effectiveness of the proposed feature disentanglement approach is further evaluated on both biased and unbiased ASC problems, and the results show that better-disentangled audio features lead to a more robust ASC system across different devices and cities. This paper advocates the integration of feature disentanglement into ASC systems to achieve more reliable performance.
KW - Acoustic scene classification
KW - domain adaptation
KW - feature disentanglement
UR - http://www.scopus.com/inward/record.url?scp=85182928431&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2024.3353578
DO - 10.1109/TASLP.2024.3353578
M3 - Article
AN - SCOPUS:85182928431
SN - 2329-9290
VL - 32
SP - 1286
EP - 1297
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -