A generalization algorithm from binary time-frequency masking to ratio masking based on noise-tracking

Shan Liang*, Wenju Liu, Wei Jiang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Although a ratio mask can achieve better speech separation than a binary mask, current speech separation systems usually adopt the Ideal Binary Mask (IBM) as the computational goal because the Ideal Ratio Mask (IRM) is very difficult to estimate directly. This paper proposes an algorithm that generalizes a binary mask to a ratio mask. Since the key issue in IRM estimation is noise tracking, we first model the noise power with an exponential distribution conditioned on the binary mask and the mixture power. We then use a Gaussian Markov Random Field (GMRF) to model the correlation of noise estimates between adjacent time-frequency units. Finally, we apply a Markov Chain Monte Carlo method to compute the minimum mean square error estimates of the noise power and the ratio mask. Systematic experiments show that the proposed algorithm outperforms a common binary-masking-based method in terms of SNR gain and PESQ scores.
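To make the distinction between the two masks concrete, the following minimal sketch contrasts a binary mask with a ratio mask derived from a noise-power estimate. It is illustrative only and is not the paper's method: the function names, the 0 dB local criterion, the spectral-subtraction-style ratio formula, and the toy values are all assumptions, and the mixture power is assumed to equal speech power plus noise power. The exponential noise model, the GMRF, and the MCMC estimation described in the abstract are not reproduced here.

```python
# Illustrative sketch only: these are common textbook mask definitions,
# not necessarily the exact forms used in the paper.

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1 for time-frequency units whose local SNR exceeds lc_db, else 0."""
    threshold = 10.0 ** (lc_db / 10.0)  # convert the dB criterion to a power ratio
    return [
        [1 if s > threshold * n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def ratio_mask_from_noise(mixture_power, noise_estimate, floor=0.0):
    """Ratio mask from a noise-power estimate: max((mix - noise) / mix, floor)."""
    mask = []
    for m_row, n_row in zip(mixture_power, noise_estimate):
        row = []
        for m, n in zip(m_row, n_row):
            if m <= 0.0:
                row.append(floor)  # no energy in this unit, nothing to keep
            else:
                row.append(max((m - n) / m, floor))
        mask.append(row)
    return mask


# Toy 2x2 time-frequency grid of power values (hypothetical numbers).
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 1.0], [2.0, 1.0]]
# Assumption: speech and noise are uncorrelated, so their powers add.
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(speech, noise)]

ibm = ideal_binary_mask(speech, noise)
irm = ratio_mask_from_noise(mixture, noise)
```

In this toy case the binary mask keeps or discards each unit outright, while the ratio mask assigns a graded weight that depends directly on the noise-power estimate, which is why the quality of noise tracking matters for ratio masking.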

Original language: English
Pages (from-to): 632-637
Number of pages: 6
Journal: Shengxue Xuebao/Acta Acustica
Volume: 38
Issue number: 5
Publication status: Published - May 2013
Externally published: Yes
