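The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the item cannot be found, or its holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be accessed.)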
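System ID: U0026-2208201121195700
Title (Chinese): 具群組稀疏性之貝氏非負矩陣分解應用於音樂訊號分離
Title (English): Bayesian NMF with Group Sparsity and Its Application for Music Source Separation
University: National Cheng Kung University
Department (Chinese): 資訊工程學系碩博士班
Department (English): Institute of Computer Science and Information Engineering
Academic year: 99 (ROC calendar)
Semester: 2
Year of publication: 100 (ROC calendar, i.e. 2011)
Student (Chinese): 林宗翰
Student (English): Tsung-Han Lin
Student ID: p76981243
Degree: Master's
Language: Chinese
Pages: 152
Oral defense committee: Committee member: 許聞廉
Committee member: 廖弘源
Committee member: 陳瑞彬
Advisor: 簡仁宗
Committee member: 孫永年
Keywords (Chinese): 貝氏  非負矩陣分解  群組稀疏性  音樂訊號分離
Keywords (English): Bayesian  NMF  Group Sparsity  Music Source Separation
Subject classification: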
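Abstract (Chinese): Non-negative matrix factorization (NMF) algorithms have been widely developed and applied in many practical multimedia systems. These algorithms represent or decompose data through a parts representation or through added sparsity constraints. However, conventional NMF lacks statistical model analysis and an interpretation of model sparsity, which limits its extensibility and its model regularization and highlights the importance of controlling the degree of sparsity in matrix factorization. This study proposes a group-sparse NMF based on Bayesian theory and applies it to music signal separation, decomposing a signal into rhythmic and harmonic audio sources. Our approach transforms the music signal into a logarithmic magnitude spectrum to form a non-negative matrix. On the basis of Bayesian theory, a sparse prior distribution built on the Laplacian scale mixture distribution is used to model the relevance among the basis vectors of the factorization, achieving a sparse representation and addressing the model over-estimation problem. The thesis partitions the basis matrix into two groups: all music signals are jointly represented by one set of shared basis vectors (shared basis) and another set of individual basis vectors (individual basis). The shared basis models the statistical properties common to different signals, and the individual basis vectors compensate for the residual information left outside the shared data representation. Highly relevant shared and individual bases together give a complete representation of the music signal and complete the Bayesian NMF with group sparsity (GS-BNMF); the result is then transformed back to the time domain to obtain the separated time signals. In this study, we develop a Gibbs sampling algorithm that, through approximate inference and the model posterior, iteratively samples the converged model parameters and thereby implements GS-BNMF. Finally, we apply the proposed method to single-channel signal separation of rhythmic music, separating the rhythmic drum sound from the other harmonic sources; comparisons across different experimental analyses verify the effectiveness of the method.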
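The structure described in this abstract can be written down as a hedged sketch. The notation is assumed here and is not taken from the thesis: $X_l$ is the non-negative log-magnitude spectrogram of signal $l$, $B^{s}$ is the shared basis, $B^{i}_l$ is the individual basis, and $W^{s}_l$, $W^{i}_l$ are the corresponding weights. The Laplacian scale mixture prior is given in its generic form (a Laplacian whose inverse scale $\lambda$ has a Gamma prior with shape $\alpha$ and rate $\beta$); the abstract does not say which parameters carry this prior or how it is restricted to non-negative values.

```latex
\begin{align}
X_l &\approx B^{s} W^{s}_l + B^{i}_l W^{i}_l,
  \qquad X_l,\; B^{s},\; B^{i}_l,\; W^{s}_l,\; W^{i}_l \ge 0, \\
p(w \mid \lambda) &= \frac{\lambda}{2}\, e^{-\lambda |w|},
  \qquad \lambda \sim \mathrm{Gamma}(\alpha, \beta), \\
p(w) &= \int_0^{\infty} p(w \mid \lambda)\, p(\lambda)\, d\lambda
  = \frac{\alpha\, \beta^{\alpha}}{2\,(\beta + |w|)^{\alpha + 1}}.
\end{align}
```

The marginal in the last line has heavier tails than a single Laplacian, which is why a prior of this kind tends to drive irrelevant coefficients toward zero while leaving relevant ones comparatively unpenalized.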
Abstract (English): Non-negative matrix factorization (NMF) has been well developed and applied in many practical multimedia systems. In general, NMF is a parts-based representation that factorizes a data matrix into the product of a basis matrix and a weight matrix. NMF is often solved under a sparseness constraint so that the observed signals are robustly represented by a set of basis vectors and their corresponding sensing weights. However, conventional NMF lacks statistical modeling and interpretation, which limits its extension to model regularization and makes the degree of sparseness difficult to control. Controlling the sparseness of the data representation has therefore become a crucial research topic. In this thesis, we propose a Bayesian NMF with group sparsity (GS-BNMF) and apply it to music source separation, specifically the separation of a single-channel music signal into a rhythmic source signal and a harmonic source signal. We first transform the music signals within a time segment into the corresponding log-magnitude spectra to form a non-negative data matrix. Following Bayesian theory, we introduce a Laplacian scale mixture distribution as a sparse prior to construct the GS-BNMF procedure. NMF is carried out by investigating the relevance of the basis vectors for data representation and by tackling the over-estimation problem via sparse coding. We build two groups of basis vectors to represent music signals: the shared basis and the individual basis. The shared basis vectors are estimated to capture the statistics shared across music signals, and the individual basis vectors are calculated to compensate for the residual information that the shared basis vectors cannot characterize. Owing to the sparse prior, GS-BNMF identifies the relevant shared and individual basis vectors for data representation; irrelevant basis vectors are not used.
After the matrices are factorized, the demixed signals in the log-magnitude spectral domain are converted to the corresponding time signals via the inverse Fourier transform. In this study, we develop an approximate inference procedure based on Gibbs sampling and apply it to iteratively sample the model parameters from their posterior distributions; GS-BNMF is implemented accordingly. Experiments on single-channel separation of music signals into drum (rhythmic) and harmonic signals show the effectiveness of the proposed method.
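For orientation, conventional NMF as described in the abstract (a non-negative data matrix approximated by the product of a basis matrix and a weight matrix) can be sketched with the well-known multiplicative updates of Lee and Seung. This is a plain-Python illustration of the baseline only, not the Gibbs-sampling GS-BNMF procedure of the thesis. The function names `matmul`, `transpose`, `frobenius_error`, and `nmf`, the toy matrix, the rank, and the iteration count are all assumptions made for this sketch.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, basis, weights):
    """Frobenius norm of the residual x - basis @ weights."""
    approx = matmul(basis, weights)
    return sum(
        (xv - av) ** 2 for xr, ar in zip(x, approx) for xv, av in zip(xr, ar)
    ) ** 0.5


def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative x (n x m) into basis (n x rank) and weights (rank x m)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    # Strictly positive random start so the multiplicative updates can move.
    basis = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    weights = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iters):
        # weights <- weights * (B^T X) / (B^T B W), elementwise
        bt = transpose(basis)
        num = matmul(bt, x)
        den = matmul(matmul(bt, basis), weights)
        weights = [
            [weights[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
            for k in range(rank)
        ]
        # basis <- basis * (X W^T) / (B W W^T), elementwise
        wt = transpose(weights)
        num = matmul(x, wt)
        den = matmul(basis, matmul(weights, wt))
        basis = [
            [basis[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
            for i in range(n_rows)
        ]
    return basis, weights


# Toy non-negative "spectrogram": 4 frequency bins x 3 time frames (made up).
x = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 5.0], [1.0, 1.0, 3.0]]
basis, weights = nmf(x, rank=2)
print("reconstruction error:", frobenius_error(x, basis, weights))
```

Because every update multiplies a non-negative entry by a non-negative ratio, both factors stay non-negative without an explicit projection step. A Bayesian treatment such as the one the thesis describes would instead place priors on the factors and sample them from their posterior distributions.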
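Table of contents: Abstract (Chinese)
Abstract (English)
Acknowledgments
Chapter contents
List of tables
Chapter 1 Introduction
1.1 Preface
1.2 Research motivation and objectives
1.3 Overview of the research method
1.4 Chapter outline
Chapter 2 Blind source separation and independent component analysis
2.1 Blind source separation
2.1.1 The cocktail party problem
2.2 Independent component analysis
2.2.1 Basic theory
2.2.2 Central limit theorem
2.2.3 Non-Gaussianity and kurtosis
2.2.4 Centering and whitening
Chapter 3 Non-negative matrix factorization and group sparse representation
3.1 Introduction to non-negative matrix factorization
3.2 Non-negative matrix factorization algorithms
3.3 Non-negative matrix factorization with sparsity
3.3.1 Sparsity
3.3.2 The importance of sparsity for non-negative matrix factorization
3.3.3 Non-negative matrix factorization applied to blind source separation
3.4 Task-related signal separation
3.4.1 Non-negative matrix partial co-factorization
3.4.2 Group non-negative matrix factorization
3.5 Group sparse coding
3.5.1 Sparse coding with a Laplacian scale mixture prior
3.5.2 Group sparsity
Chapter 4 Group-sparse Bayesian non-negative matrix factorization
4.1 Blind source separation of rhythmic music
4.2 Model structure
4.2.1 Basis matrix model
4.2.2 Coefficient matrix model
4.3 Comparison of variational Bayes and Gibbs sampling for model inference
4.4 Gibbs sampling and the Metropolis-Hastings algorithm
4.5 Comparison with related methods in the literature
4.6 Parameter inference procedure
Chapter 5 Experiments
5.1 Experimental setup
5.2 Music signal separation
5.3 Harmonic audio
5.4 Experimental results
Chapter 6 Conclusions and future research directions
6.1 Conclusions
6.2 Future research directions
References
Appendix 1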
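Full-text usage rights
  • On-campus browsing and printing of the electronic full text is authorized; public from 2021-08-01.
  • Off-campus browsing and printing of the electronic full text is authorized; public from 2021-08-01.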

