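System ID U0026-0812200913574029
Title (Chinese) 新穎獨立成份分析應用於雜訊語音辨識
Title (English) A Novel Independent Component Analysis for Noisy Speech Recognition
University National Cheng Kung University
Department (Chinese) 資訊工程學系碩博士班
Department (English) Institute of Computer Science and Information Engineering
Academic year 95 (ROC calendar)
Semester 2
Year of publication 96 (ROC calendar)
Author (Chinese) 趙昶凱
Author (English) Chang-Kai Chao
Student ID p7694160
Degree Master's
Language Chinese
Pages 105
Oral defense committee Committee member: 吳宗憲
Committee member: 王新民
Committee member: 王小川
Committee member: 陳信希
Committee member: 李同益
Advisor: 簡仁宗
Keywords (Chinese) 交互資訊 (mutual information)  語音辨識 (speech recognition)  獨立成份分析 (independent component analysis)
Keywords (English) speech recognition  mutual information  entropy  ICA
Subject classification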
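Abstract (Chinese) Independent component analysis (ICA) is widely applied to blind source separation problems. This study proposes a novel ICA algorithm and applies it to speech recognition in noisy environments. The idea behind applying ICA to HMM clustering or speaker clustering is to demix the feature vectors of the observed data through ICA and obtain the corresponding independent components. These independent components represent information such as gender, accent, or environmental noise that is latent in a given speaker's speech signal or speech feature vectors. The focus of this study is an unsupervised clustering algorithm built on a novel ICA objective function; we also apply the ICA algorithm to blind source separation and to noisy speech recognition.
In ICA, we define an objective function that measures the independence among feature vectors and use it to obtain the optimal demixing matrix, the one that minimizes dependence or mutual information. ICA presupposes, however, that the source signals are mutually independent and have non-Gaussian distributions. In this study, we use Jensen's inequality to develop a new independence measure and implement both parametric and nonparametric ICA. In the parametric part, we use the generalized Gaussian model to model the non-Gaussian characteristics of speech feature vectors; in the nonparametric part, we use the Parzen window method to build a nonparametric probability density. A novel ICA algorithm is thereby developed to obtain the demixing matrix. We also compare the demixing matrix estimated with the proposed ICA objective function against those of other methods in the literature, including maximum likelihood, minimum mutual information, and maximum negentropy.
In the experiments, we study blind source separation, apply ICA to HMM clustering, and evaluate speech recognition performance under the noisy conditions of the Aurora2 corpus. Preliminary results show that the proposed method achieves better convergence in parameter estimation and a higher speech recognition rate.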
Abstract (English) Independent component analysis (ICA) is a widely accepted approach to the blind source separation (BSS) problem. In this study, we develop a new ICA approach for unsupervised learning and apply it to hidden Markov model (HMM) clustering and noisy speech recognition. The underlying concept of the proposed ICA algorithm is to demix the HMM mean vectors and identify the corresponding mixture sources prior to HMM clustering. These independent sources represent the specific noise conditions embedded in speech signals or features. We focus on presenting a general unsupervised learning algorithm based on a new ICA objective function, and we apply this algorithm to BSS and to several problems in speech recognition.
In the ICA procedure, we adopt a predefined objective function that measures the dependence among feature vectors and derive an optimal demixing matrix that minimizes this dependence, or mutual information. The basic assumptions of ICA are that the source signals are mutually independent and have non-Gaussian distributions. We use Jensen's inequality to derive a new measure of dependence, and we develop both parametric and nonparametric ICA approaches. The parametric ICA uses the generalized Gaussian distribution to characterize the non-Gaussianity of an acoustic random vector; the nonparametric ICA uses a Parzen window based distribution. We evaluate the efficiency and effectiveness of the proposed objective function in finding the ICA demixing matrix against existing objective functions, including maximum likelihood, minimum mutual information, and maximum negentropy. In the experiments, we investigate the performance of BSS and noisy speech recognition, using this ICA method for HMM clustering and evaluating speech recognition performance on the AURORA2 noisy speech database. The preliminary results show that the proposed ICA achieves faster convergence and a higher recognition rate.
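As an illustrative sketch only, and not the thesis's implementation, the two density models named in the abstract can be written in a few lines. The function names, the parameter names `mu`, `alpha`, `beta`, and `h`, and the choice of a Gaussian kernel for the Parzen window are assumptions made for this example; the abstract does not specify them.

```python
import math


def generalized_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density (assumed standard parameterization):
    beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta).
    alpha is a scale parameter and beta controls the shape (tail weight).
    """
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def parzen_window_pdf(x, samples, h):
    """Parzen window density estimate with a Gaussian kernel (an assumption):
    the average of Gaussian bumps of width h centered on each sample.
    """
    total = 0.0
    for xi in samples:
        u = (x - xi) / h
        total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / (len(samples) * h)
```

With `beta = 2` and `alpha = sqrt(2) * sigma`, the first function reduces to a Gaussian density with standard deviation `sigma`; with `beta = 1` it becomes a Laplacian density, which is one way such a model can capture heavier-than-Gaussian tails.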
Table of contents Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Overview of Research Methods
1.4 Chapter Outline
Chapter 2 Introduction to Information Theory
2.1 Preface
2.2 Entropy
2.2.1 Marginal Entropy
2.2.2 Joint Entropy
2.2.3 Conditional Entropy
2.3 Mutual Information
2.4 Other Measures of Mutual Information
2.4.1 Comparison of Different Mutual Information Measures
2.4.2 Relationship between Different Mutual Information Measures and Independence
Chapter 3 Literature Review
3.1 Preface
3.2 Basic Theory of Independent Component Analysis
3.2.1 Central Limit Theorem
3.2.2 Measuring Non-Gaussianity and Kurtosis
3.2.3 Centering and Whitening
3.3 Optimization Algorithms
3.4 ICA Measurement Criteria
3.4.1 ICA Based on Negentropy
3.4.2 ICA Based on Maximum Likelihood
3.4.3 ICA Based on Mutual Information
3.4.4 Relationship between Mutual Information and Non-Gaussianity
3.4.5 Mutual Information and the Likelihood Function
Chapter 4 Independence Measures and ICA Algorithms for HMM Clustering
4.1 Preface
4.2 Novel ICA Measurement Criterion
4.2.1 Jensen's Inequality
4.2.2 Jensen's Inequality and Arithmetic-Geometric Mean Inequality
4.2.3 Measuring the Independence of Random Variables via Jensen's Inequality
4.3 Probability Density Estimation of Independent Components
4.3.1 Parametric ICA
4.3.2 Nonparametric ICA
4.4 Novel ICA Algorithm
4.5 Relationship between ICA and Mutual Information
4.6 Comparison of Convergence Speed across Measures
4.7 Hidden Markov Models
4.8 HMM Clustering
Chapter 5 Experiments
5.1 Experimental Setup
5.2 Blind Source Separation Experiments
5.2.1 Simulated Data Experiments
5.2.2 Real Speech Signal Experiments
5.3 Speech Recognition in Noisy Environments
5.3.1 Comparison across Different Spaces and Numbers of Clusters
5.4 Discussion of Experiments
5.5 System Demonstration
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix
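Full-text access rights
  • Authorized for on-campus browsing/printing of the electronic full text, publicly available from 2008-08-30.
  • Authorized for off-campus browsing/printing of the electronic full text, publicly available from 2010-08-30.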