A new noise-tracking algorithm for generalizing binary time-frequency (T-F) masking to ratio masking

Shan Liang, Wei Jiang, Wenju Liu

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

In this paper, we attempt to generalize ideal binary mask (IBM) estimation to ideal ratio mask (IRM) estimation. Under binary masking, errors in IBM estimation can severely distort the original speech spectrum. The main purpose of this paper is to use a ratio mask to mitigate this negative impact. Since the key issue is noise tracking, we first use exponential distributions to model the distribution of noise power, conditioned on the binary mask and the mixture power. Then, we use a Gaussian distribution to model the correlation of noise estimates between adjacent T-F units. Because the IBM can be estimated correctly for the majority of units, the correlation model can reduce the impact of errors in IBM estimation. Systematic experiments show that our algorithm outperforms a common binary-masking-based method in terms of SNR gain and PESQ scores.
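To make the distinction between the two masks concrete, the following minimal sketch computes both from known speech and noise power in each T-F unit. It uses the commonly cited textbook definitions (IBM: 1 where the local SNR exceeds a local criterion, here 0 dB; IRM: speech power divided by speech-plus-noise power), which are assumed here and are not necessarily the exact formulation used in the paper. The function names, the `lc_db` and `eps` parameters, and the toy values are all hypothetical, and the sketch does not implement the paper's noise-tracking algorithm.

```python
def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1 where local SNR exceeds lc_db, else 0, per T-F unit.

    Assumed definition: unit is speech-dominant if S > N * 10^(LC/10).
    """
    threshold = 10 ** (lc_db / 10.0)
    return [
        [1 if s > n * threshold else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Return S / (S + N) per T-F unit (assumed definition, values in [0, 1])."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


# Toy 2x2 power spectrograms (rows: frequency channels, columns: frames).
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 1.0], [2.0, 1.0]]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
```

In this toy case, a unit with a local SNR near 0 dB is either kept or discarded entirely by the binary mask, whereas the ratio mask assigns it a gain of about 0.5, so a wrong decision on such a unit changes the output less abruptly.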

Original language: English
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 950-953
Number of pages: 4
Publication status: Published - 2012
Externally published: Yes
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: 9 Sept 2012 to 13 Sept 2012

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 2

Conference

Conference: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory: United States
City: Portland, OR
Period: 9/09/12 to 13/09/12

Keywords

  • Bayesian rule
  • Ideal binary mask
  • Ideal ratio mask
  • Markov chain Monte Carlo
