A generalization algorithm from binary time-frequency masking to ratio masking based on noise-tracking

Shan Liang^*, Wenju Liu, Wei Jiang

^*Corresponding author for this work

CAS - Institute of Automation

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Although a ratio mask may achieve better speech separation than a binary mask, current speech separation systems usually adopt the Ideal Binary Mask (IBM) as the computational goal because the Ideal Ratio Mask (IRM) is very difficult to estimate directly. This paper proposes an algorithm that generalizes a binary mask to a ratio mask. Since the key issue in IRM estimation is noise tracking, we first model the noise power with an exponential distribution conditioned on the binary mask and the mixture power. We then use a Gaussian Markov Random Field (GMRF) to model the correlation of the noise estimates between adjacent time-frequency units. Finally, we apply a Markov Chain Monte Carlo method to compute the minimum mean square error estimates of the noise power and the ratio mask. Systematic experiments show that the proposed algorithm outperforms a common binary-masking-based method in terms of SNR gain and PESQ scores.
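To make the distinction between the two masks concrete, the following is a minimal Python sketch. It is not the authors' method: the abstract gives no formulas, so the sketch assumes commonly used definitions (IBM = 1 where the local SNR exceeds a threshold, IRM = speech power divided by speech-plus-noise power) and assumes that the mixture power is the sum of speech and noise power. All function names and the `lc_db` parameter are hypothetical. The last function illustrates why noise tracking is the key step: under the additivity assumption, a noise power estimate and the mixture power are enough to form a ratio mask.

```python
def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Assumed IBM definition: 1 where local SNR exceeds lc_db (in dB), else 0."""
    threshold = 10.0 ** (lc_db / 10.0)  # dB threshold as a linear power ratio
    return [[1 if s > n * threshold else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_power, noise_power)]


def ideal_ratio_mask(speech_power, noise_power):
    """Assumed IRM definition: speech power / (speech power + noise power)."""
    return [[s / (s + n) if (s + n) > 0 else 0.0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_power, noise_power)]


def ratio_mask_from_noise(mixture_power, noise_estimate):
    """Ratio mask from a noise power estimate, assuming mixture = speech + noise.

    The implied speech power (mixture - noise) is floored at zero so that an
    overestimated noise level cannot produce a negative mask value.
    """
    return [[max(m - n, 0.0) / m if m > 0 else 0.0 for m, n in zip(m_row, n_row)]
            for m_row, n_row in zip(mixture_power, noise_estimate)]


# Toy 2x2 grid of time-frequency units (rows: frequency channels, columns: frames).
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 4.0], [2.0, 1.0]]
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(speech, noise)]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
estimated = ratio_mask_from_noise(mixture, noise)  # oracle noise stands in for a tracker
```

With a perfect noise estimate the last function reproduces the IRM on this toy grid; in practice the noise power has to be estimated, which is the problem the abstract's exponential model, GMRF, and MCMC steps address.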

Original language	English
Pages (from-to)	632-637
Number of pages	6
Journal	Shengxue Xuebao/Acta Acustica
Volume	38
Issue number	5
Publication status	Published - May 2013
Externally published	Yes

Cite this

@article{219f0ba581e64c28ae99f8b2b58e7c4f,

title = "A generalization algorithm from binary time-frequency masking to ratio masking based on noise-tracking",

abstract = "Although ratio mask may achieve better speech separation results than that by binary mask, present speech separation systems usually set Ideal Binary Mask (IBM) as the computational goal due to the fact that it's very difficult to estimate Ideal Ratio Mask (IRM) directly. In this paper, a generalization algorithm from the binary mask to ratio mask is proposed. Since the key issue in IRM estimation is the noise tracking, we firstly use exponential distribution to model the noise power with binary mask and mixture power as conditions. Then, we use a Gaussian Markov Random Field (GMRF) to model the correlation of noise estimation between adjacent units. Finally, we apply Markov Chain Monte Carlo method to compute the minimum mean square error estimation of noise power and ratio mask. Systematic experiments show that the proposed algorithm outperforms a common binary masking based method in terms of SNR gain and PESQ scores.",

author = "Shan Liang and Wenju Liu and Wei Jiang",

year = "2013",

month = may,

language = "English",

volume = "38",

pages = "632--637",

journal = "Shengxue Xuebao/Acta Acustica",

issn = "0371-0025",

number = "5",

}

TY - JOUR

T1 - A generalization algorithm from binary time-frequency masking to ratio masking based on noise-tracking

AU - Liang, Shan

AU - Liu, Wenju

AU - Jiang, Wei

PY - 2013/5

Y1 - 2013/5

N2 - Although ratio mask may achieve better speech separation results than that by binary mask, present speech separation systems usually set Ideal Binary Mask (IBM) as the computational goal due to the fact that it's very difficult to estimate Ideal Ratio Mask (IRM) directly. In this paper, a generalization algorithm from the binary mask to ratio mask is proposed. Since the key issue in IRM estimation is the noise tracking, we firstly use exponential distribution to model the noise power with binary mask and mixture power as conditions. Then, we use a Gaussian Markov Random Field (GMRF) to model the correlation of noise estimation between adjacent units. Finally, we apply Markov Chain Monte Carlo method to compute the minimum mean square error estimation of noise power and ratio mask. Systematic experiments show that the proposed algorithm outperforms a common binary masking based method in terms of SNR gain and PESQ scores.

AB - Although ratio mask may achieve better speech separation results than that by binary mask, present speech separation systems usually set Ideal Binary Mask (IBM) as the computational goal due to the fact that it's very difficult to estimate Ideal Ratio Mask (IRM) directly. In this paper, a generalization algorithm from the binary mask to ratio mask is proposed. Since the key issue in IRM estimation is the noise tracking, we firstly use exponential distribution to model the noise power with binary mask and mixture power as conditions. Then, we use a Gaussian Markov Random Field (GMRF) to model the correlation of noise estimation between adjacent units. Finally, we apply Markov Chain Monte Carlo method to compute the minimum mean square error estimation of noise power and ratio mask. Systematic experiments show that the proposed algorithm outperforms a common binary masking based method in terms of SNR gain and PESQ scores.

UR - http://www.scopus.com/inward/record.url?scp=84883826951&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84883826951

SN - 0371-0025

VL - 38

SP - 632

EP - 637

JO - Shengxue Xuebao/Acta Acustica

JF - Shengxue Xuebao/Acta Acustica

IS - 5

ER -
