基于优化浮值掩蔽的监督性语音分离

Sha Sha Xia; Xue Liang Zhang; Shan Liang

doi:10.16383/j.aas.2017.c160748


Translated title of the contribution: Supervised Speech Separation Using Optimal Ratio Mask

Sha Sha Xia, Xue Liang Zhang*, Shan Liang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Supervised speech separation uses a supervised learning algorithm to learn a mapping from a noisy input signal to a target output signal. In recent years, driven by the development of deep learning, supervised separation has become the most important research direction in speech separation, and the training target has a significant impact on the performance of a speech separation algorithm. The ideal ratio mask (IRM) is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between the noise and the clean speech. In this paper, we use an optimal ratio mask (ORM) as the training target and a deep neural network (DNN) as the separation model. Experiments carried out under various noise environments and signal-to-noise ratio (SNR) conditions show that the optimal ratio mask generally outperforms the other training targets.
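The abstract contrasts the IRM with an optimal ratio mask that accounts for speech-noise correlation. The sketch below makes that difference concrete for a single time-frequency bin. It is not the paper's implementation, and it rests on two assumptions. First, the IRM is taken in its common form (|S|^2 / (|S|^2 + |N|^2))^β with a default of β = 0.5. Second, the ORM is taken to be the real-valued mask M that minimizes |S - M(S + N)|^2, which brings in the cross term Re(S N*). The function names, the per-bin pure-Python formulation and the example values are hypothetical illustrations.

```python
def ideal_ratio_mask(s: complex, n: complex, beta: float = 0.5) -> float:
    """IRM for one time-frequency bin: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    Only the powers of speech and noise enter, so any speech/noise
    correlation (the cross term) is ignored.
    """
    ps, pn = abs(s) ** 2, abs(n) ** 2
    total = ps + pn
    return (ps / total) ** beta if total > 0 else 0.0


def optimal_ratio_mask(s: complex, n: complex, eps: float = 1e-12) -> float:
    """Assumed ORM for one bin: the real mask M minimizing |S - M * (S + N)|^2.

    Setting the derivative to zero gives M = Re(S * conj(Y)) / |Y|^2 with
    Y = S + N, so the cross term Re(S * conj(N)) appears in both the
    numerator and the denominator. The result is NOT bounded to [0, 1].
    """
    cross = (s * n.conjugate()).real              # speech/noise correlation term
    num = abs(s) ** 2 + cross                     # Re(S * conj(S + N))
    den = abs(s) ** 2 + abs(n) ** 2 + 2 * cross   # |S + N|^2
    return num / den if den > eps else 0.0


def masked_error(s: complex, n: complex, mask: float) -> float:
    """Squared error between clean speech and the masked mixture in one bin."""
    return abs(s - mask * (s + n)) ** 2


if __name__ == "__main__":
    # Hypothetical single bin where the noise is exactly in phase with the speech.
    s, n = 1 + 0j, 0.5 + 0j
    irm = ideal_ratio_mask(s, n, beta=1.0)
    orm = optimal_ratio_mask(s, n)
    print(f"IRM={irm:.3f} error={masked_error(s, n, irm):.3f}")
    print(f"ORM={orm:.3f} error={masked_error(s, n, orm):.3f}")
```

The example bin uses in-phase speech and noise. Here the assumed ORM is 2/3, which scales the mixture value 1.5 back exactly to the clean value 1. The IRM with β = 1 is 0.8 and overshoots to 1.2.

When speech and noise are orthogonal (for example s = 1, n = 1j), the cross term vanishes and both masks equal 0.5 at β = 1. The sketched ORM is also unbounded: it returns 2.0 for s = 1, n = -0.5. A trained model would therefore likely need some compression or clipping of this target, and this record does not say how the paper handles it.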

Translated title of the contribution	Supervised Speech Separation Using Optimal Ratio Mask
Original language	Chinese (Simplified)
Pages (from-to)	1876-1887
Number of pages	12
Journal	Zidonghua Xuebao/Acta Automatica Sinica
Volume	44
Issue number	10
DOIs	https://doi.org/10.16383/j.aas.2017.c160748
Publication status	Published - Oct 2018
Externally published	Yes

Keywords

Deep neural network (DNN)
Speech separation
Supervised learning
Training targets

Access to Document

10.16383/j.aas.2017.c160748

Cite this

@article{4b628546f345439498346e666e0d34cb,
  title = "基于优化浮值掩蔽的监督性语音分离",
  abstract = "Supervised speech separation uses a supervised learning algorithm to learn a mapping from a noisy input signal to a target output signal. In recent years, driven by the development of deep learning, supervised separation has become the most important research direction in speech separation, and the training target has a significant impact on the performance of a speech separation algorithm. The ideal ratio mask (IRM) is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between the noise and the clean speech. In this paper, we use an optimal ratio mask (ORM) as the training target and a deep neural network (DNN) as the separation model. Experiments carried out under various noise environments and signal-to-noise ratio (SNR) conditions show that the optimal ratio mask generally outperforms the other training targets.",
  keywords = "Deep neural network (DNN), Speech separation, Supervised learning, Training targets",
  author = "Xia, {Sha Sha} and Zhang, {Xue Liang} and Liang, Shan",
  year = "2018",
  month = oct,
  doi = "10.16383/j.aas.2017.c160748",
  language = "Chinese (Simplified)",
  volume = "44",
  pages = "1876--1887",
  journal = "Zidonghua Xuebao/Acta Automatica Sinica",
  issn = "0254-4156",
  number = "10",
}

TY  - JOUR
T1  - 基于优化浮值掩蔽的监督性语音分离
AU  - Xia, Sha Sha
AU  - Zhang, Xue Liang
AU  - Liang, Shan
PY  - 2018/10
Y1  - 2018/10
N2  - Supervised speech separation uses a supervised learning algorithm to learn a mapping from a noisy input signal to a target output signal. In recent years, driven by the development of deep learning, supervised separation has become the most important research direction in speech separation, and the training target has a significant impact on the performance of a speech separation algorithm. The ideal ratio mask (IRM) is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between the noise and the clean speech. In this paper, we use an optimal ratio mask (ORM) as the training target and a deep neural network (DNN) as the separation model. Experiments carried out under various noise environments and signal-to-noise ratio (SNR) conditions show that the optimal ratio mask generally outperforms the other training targets.
AB  - Supervised speech separation uses a supervised learning algorithm to learn a mapping from a noisy input signal to a target output signal. In recent years, driven by the development of deep learning, supervised separation has become the most important research direction in speech separation, and the training target has a significant impact on the performance of a speech separation algorithm. The ideal ratio mask (IRM) is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between the noise and the clean speech. In this paper, we use an optimal ratio mask (ORM) as the training target and a deep neural network (DNN) as the separation model. Experiments carried out under various noise environments and signal-to-noise ratio (SNR) conditions show that the optimal ratio mask generally outperforms the other training targets.
KW  - Deep neural network (DNN)
KW  - Speech separation
KW  - Supervised learning
KW  - Training targets
UR  - http://www.scopus.com/inward/record.url?scp=85058893206&partnerID=8YFLogxK
U2  - 10.16383/j.aas.2017.c160748
DO  - 10.16383/j.aas.2017.c160748
M3  - Article
AN  - SCOPUS:85058893206
SN  - 0254-4156
VL  - 44
SP  - 1876
EP  - 1887
JO  - Zidonghua Xuebao/Acta Automatica Sinica
JF  - Zidonghua Xuebao/Acta Automatica Sinica
IS  - 10
ER  - 
