
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

Zifeng Zhao¹, Rongzhi Gu¹, Dongchao Yang¹, Jinchuan Tian¹, Yuexian Zou¹,²
¹ Peking University  ² Peng Cheng Laboratory

Introduction

This is a demo page for our paper Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction. Below, we compare the performance of the supervised training baseline with that of the proposed weakly supervised training (SAMoM for short).

*(Figure: Block diagram of the proposed SAMoM training.)*
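As context for the block diagram, here is a minimal PyTorch-style sketch of the core weak-supervision idea, not the authors' implementation: two real mixtures are summed into a mixture of mixtures (MoM), each speaker of mixture A is extracted from the MoM using a speaker-embedding cue, and the extracted signals are trained to add back up to mixture A itself, so no clean reference sources are needed. The names `extractor`, `si_sdr_loss`, and `mom_training_step` are hypothetical placeholders, and SAMoM-specific details (e.g., how speaker cues are obtained) are omitted.

```python
import torch


def si_sdr_loss(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SDR (batch-averaged), a common waveform loss."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to discard scale differences.
    proj = (torch.sum(est * ref, dim=-1, keepdim=True)
            / (torch.sum(ref ** 2, dim=-1, keepdim=True) + eps)) * ref
    noise = est - proj
    si_sdr = 10 * torch.log10((proj.pow(2).sum(-1) + eps) / (noise.pow(2).sum(-1) + eps))
    return -si_sdr.mean()


def mom_training_step(extractor, mix_a, mix_b, spk_embs_a):
    """One weakly supervised step on a mixture of mixtures.

    extractor  : network mapping (mixture, speaker embedding) -> waveform (hypothetical)
    mix_a/b    : (batch, samples) waveforms of two real speech mixtures
    spk_embs_a : (batch, n_spk, emb_dim) embeddings of mixture A's speakers
    """
    mom = mix_a + mix_b  # mixture of mixtures: no clean sources involved
    # Extract each of mixture A's speakers from the MoM, cued by its embedding.
    estimates = [extractor(mom, spk_embs_a[:, i]) for i in range(spk_embs_a.shape[1])]
    # Weak supervision: the extracted speakers should reconstruct mixture A itself.
    recon_a = torch.stack(estimates, dim=0).sum(dim=0)
    return si_sdr_loss(recon_a, mix_a)
```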

Demo 1: Performance on Libri2Mix [2]

| Sample | Mixture | Baseline: Supervised Training | Ours: Weakly Supervised Training |
| --- | --- | --- | --- |
| 1: (Female + Male) => Female | (audio) | (audio) | (audio) |
| 2: (Female + Male) => Male | (audio) | (audio) | (audio) |
| 3: (Male + Male) => Male | (audio) | (audio) | (audio) |
| 4: (Female + Female) => Female | (audio) | (audio) | (audio) |

Demo 2: Cross-domain Evaluation [3]

| Sample | Mixture | Baseline: w/o Domain Adaptation | Ours: w/ Domain Adaptation |
| --- | --- | --- | --- |
| 1: (Female + Male) => Female | (audio) | (audio) | (audio) |
| 2: (Female + Male) => Male | (audio) | (audio) | (audio) |
| 3: (Male + Male) => Male | (audio) | (audio) | (audio) |
| 4: (Female + Female) => Female | (audio) | (audio) | (audio) |

Demo 3: Noisy Scenario [2][4]

| Sample | Mixture | Baseline: Supervised Training | Ours: Weakly Supervised Training |
| --- | --- | --- | --- |
| 1: (Female + Male + Noise) => Female | (audio) | (audio) | (audio) |
| 2: (Female + Male + Noise) => Male | (audio) | (audio) | (audio) |
| 3: (Male + Male + Noise) => Male | (audio) | (audio) | (audio) |
| 4: (Female + Female + Noise) => Female | (audio) | (audio) | (audio) |

Links

[Paper] [BibTeX] [Demo GitHub]

News

  • 2022-06-15 Paper accepted by INTERSPEECH 2022
  • 2022-04-15 Paper available on arXiv

References

[1] M. Delcroix, T. Ochiai, K. Zmolikova, K. Kinoshita, N. Tawara, T. Nakatani, and S. Araki, “Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 691–695.
[2] J. Cosentino, M. Pariente, S. Cornell, A. Deleforge, and E. Vincent, “LibriMix: An open-source dataset for generalizable speech separation,” arXiv preprint arXiv:2005.11262, 2020.
[3] H. Bu, J. Du, X. Na, B. Wu, and H. Zhang, “AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline,” in Proc. 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), 2017, pp. 1–5.
[4] G. Wichern, J. Antognini, M. Flynn, L. R. Zhu, E. McQuinn, D. Crow, E. Manilow, and J. Le Roux, “WHAM!: Extending speech separation to noisy environments,” in Proc. Interspeech, 2019, pp. 1368–1372.