Source Separation using Sparse NMF and Graph Regularization on Vietnamese Dataset

Tuan Pham

doi:10.31130/ict-ud.2020.98

Tuan Pham, The University of Danang, University of Technology and Education, Vietnam

Abstract

Source separation is a widely studied problem, but English datasets are used by default. Source separation, or speech enhancement, is also an important pre-processing step for downstream tasks such as automatic speech recognition, automatic answering machines, and hearing aids. However, experiments on source separation with Vietnamese data remain limited, and standard Vietnamese datasets for source separation are lacking. To address these issues, we build a Vietnamese dataset for source separation by collecting utterances of broadcasters from VTV’s official website. Moreover, we propose a novel method that uses sparse non-negative matrix factorization and graph regularization. Experiments show that the proposed method outperforms the baseline.
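The abstract names the ingredients of the method (sparse non-negative matrix factorization and graph regularization) but does not state the objective function or update rules. As a rough illustration only, the sketch below shows one common way these two penalties are combined: a Euclidean cost with an L1 penalty on the activations and a graph-Laplacian smoothness term, minimized with multiplicative updates. The function names, the chain-shaped frame graph, the parameter values, and the column-normalization heuristic are all assumptions made for this example, not the paper's formulation.

```python
import numpy as np


def chain_affinity(n_frames):
    """Hypothetical frame graph: each frame is linked to its immediate neighbours."""
    A = np.zeros((n_frames, n_frames))
    idx = np.arange(n_frames - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A


def sparse_graph_nmf(V, A, rank, sparsity=0.1, graph_weight=0.1,
                     n_iter=200, seed=0, eps=1e-9):
    """Approximately minimise
        0.5 * ||V - W H||_F^2 + sparsity * sum(H) + 0.5 * graph_weight * tr(H L H^T)
    with W, H >= 0 and L = D - A (A must be symmetric and non-negative).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    D = np.diag(A.sum(axis=1))  # degree matrix of the frame graph

    for _ in range(n_iter):
        # Activation update: negative gradient terms go in the numerator,
        # positive gradient terms (including both penalties) in the denominator.
        num = W.T @ V + graph_weight * (H @ A)
        den = W.T @ W @ H + sparsity + graph_weight * (H @ D) + eps
        H *= num / den

        # Dictionary update: standard Euclidean multiplicative rule.
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)

        # Heuristic: keep dictionary columns at unit norm so the L1 penalty
        # cannot be dodged by scaling W up and H down; W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]

    return W, H


# Toy usage on a synthetic non-negative "spectrogram" (20 bins x 40 frames).
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 40))
A = chain_affinity(40)
W, H = sparse_graph_nmf(V, A, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In a separation setting, a dictionary of this kind would typically be learned per source and the mixture then decomposed over the concatenated dictionaries; whether the paper follows that recipe is not stated in the abstract.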
