TY - GEN
T1 - Joint Optimization of Recurrent Networks Exploiting Source Auto-regression for Source Separation
AU - Nie, Shuai
AU - Xue, Wei
AU - Liang, Shan
AU - Zhang, Xueliang
AU - Liu, Wenju
AU - Qiao, Liwei
AU - Li, Jianping
N1 - Publisher Copyright: Copyright © 2015 ISCA.
PY - 2015
Y1 - 2015
N2 - Source separation is very difficult under music interference. In this paper, we propose a novel recurrent network exploiting the auto-regressions of speech and music interference for source separation. An auto-regression can capture the short-term temporal dependencies in the data to help the source separation. For the separation, we independently separate the magnitude spectra of speech and interference from the mixture spectra by including an extra masking layer in the recurrent network. Compared to directly estimating the ideal mask, the extra masking layer relaxes the assumption of independence between speech and interference, which makes it more suitable for real-world environments. Using the separated spectra of speech and interference, we further explore a discriminative training objective and a joint optimization framework for the proposed network, which incorporate the correlations and spectral dependencies of speech and interference into the separation. Systematic experiments show that the proposed model is competitive with the state-of-the-art method in singing-voice separation.
AB - Source separation is very difficult under music interference. In this paper, we propose a novel recurrent network exploiting the auto-regressions of speech and music interference for source separation. An auto-regression can capture the short-term temporal dependencies in the data to help the source separation. For the separation, we independently separate the magnitude spectra of speech and interference from the mixture spectra by including an extra masking layer in the recurrent network. Compared to directly estimating the ideal mask, the extra masking layer relaxes the assumption of independence between speech and interference, which makes it more suitable for real-world environments. Using the separated spectra of speech and interference, we further explore a discriminative training objective and a joint optimization framework for the proposed network, which incorporate the correlations and spectral dependencies of speech and interference into the separation. Systematic experiments show that the proposed model is competitive with the state-of-the-art method in singing-voice separation.
KW - Autoregressive models
KW - Deep recurrent neural networks
KW - Discriminative training objective
KW - Source separation
UR - http://www.scopus.com/inward/record.url?scp=84959099213&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2015-666
DO - 10.21437/Interspeech.2015-666
M3 - Conference Proceeding
AN - SCOPUS:84959099213
VL - 2015-January
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 3307
EP - 3311
BT - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Y2 - 6 September 2015 through 10 September 2015
ER -