Abstract
Supervised speech separation uses a supervised learning algorithm to learn a mapping from a noisy input signal to a target output signal. In recent years, driven by advances in deep learning, supervised separation has become the most important research direction in speech separation, and the choice of training target has a significant impact on separation performance. The ideal ratio mask (IRM) is a commonly used training target that improves both the intelligibility and the quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask (ORM) as the training target and a deep neural network (DNN) as the separation model. Experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions, and the results show that the optimal ratio mask generally outperforms other training targets.
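
The difference between the two training targets can be illustrated for a single time-frequency bin. The sketch below is not taken from the paper: the abstract gives no formulas, so it assumes the definitions commonly used in the literature, namely an energy-ratio IRM (without the square-root compression that is often applied) and an ORM that adds the speech-noise cross term Re(S·N*) to both numerator and denominator. The function names, the `eps` guard, and the per-bin (unsmoothed) formulation are illustrative assumptions.

```python
# Per-bin sketch of two ratio masks, under assumed textbook definitions.
# s and n are the complex STFT coefficients of clean speech and noise
# in one time-frequency bin; the mixture is assumed to be y = s + n.

EPS = 1e-12  # hypothetical guard against division by zero


def ideal_ratio_mask(s: complex, n: complex, eps: float = EPS) -> float:
    """IRM (assumed form): speech energy over speech-plus-noise energy.

    The cross term between speech and noise is ignored, which amounts
    to treating the two as uncorrelated.
    """
    s_pow = abs(s) ** 2
    n_pow = abs(n) ** 2
    return s_pow / (s_pow + n_pow + eps)


def optimal_ratio_mask(s: complex, n: complex, eps: float = EPS) -> float:
    """ORM (assumed form): the IRM with the cross term Re(s * conj(n)) added.

    The denominator equals |s + n|^2, the energy of the mixture itself.
    """
    s_pow = abs(s) ** 2
    n_pow = abs(n) ** 2
    cross = (s * n.conjugate()).real
    return (s_pow + cross) / (s_pow + n_pow + 2.0 * cross + eps)


# Speech and noise in quadrature: the cross term vanishes, the masks agree.
irm_orth = ideal_ratio_mask(1 + 0j, 0 + 1j)
orm_orth = optimal_ratio_mask(1 + 0j, 0 + 1j)

# Speech and noise in phase: the cross term is positive, the masks differ.
irm_corr = ideal_ratio_mask(2 + 0j, 1 + 0j)
orm_corr = optimal_ratio_mask(2 + 0j, 1 + 0j)
```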
Translated title of the contribution | Supervised Speech Separation Using Optimal Ratio Mask |
---|---|
Original language | Chinese (Traditional) |
Pages (from-to) | 1876-1887 |
Number of pages | 12 |
Journal | Zidonghua Xuebao/Acta Automatica Sinica |
Volume | 44 |
Issue number | 10 |
Publication status | Published - Oct 2018 |
Externally published | Yes |
Keywords
- Deep neural network (DNN)
- Speech separation
- Supervised learning
- Training targets