The electronic full text has not yet been authorized for public access; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holdings status shows "closed stacks, not public", it is not in the stacks and cannot be accessed.)
System ID U0026-1607201923462600
Title (Chinese) 基於隱藏式馬可夫模型之語音情緒辨識的初始模型探討
Title (English) Initial Model Study of Speech Emotion Recognition Using Hidden Markov Model Based System
Institution National Cheng Kung University
Department (Chinese) 電機工程學系
Department (English) Department of Electrical Engineering
Academic Year 107
Semester 2
Publication Year 108 (2019)
Author (Chinese) 黃俊修
Author (English) Jyun-Siou Huang
Email rock19910423@gmail.com
Student ID N26061393
Degree Master's
Language English
Pages 96
Examination Committee Advisor - 邱瀝毅
Co-advisor - 雷曉方
Committee Member - 郭致宏
Committee Member - 姚書農
Keywords (Chinese) 情緒辨識, 梅爾倒頻譜, 高斯混和模型, 隱藏式馬可夫模型
Keywords (English) Emotion Recognition, Mel-frequency Cepstral Coefficient (MFCC), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM)
Subject Classification
Abstract (Chinese) Speech emotion recognition is one of the most important topics in human-machine interaction, with applications such as chat bots, psychological assessment, and safety alerts. In recent years, many different features and classifiers have been tried, for example pitch, formant, and Mel-frequency cepstral coefficient (MFCC) features, together with classifiers such as the support vector machine, Gaussian mixture model, hidden Markov model, and artificial neural network.
Emotional speech can be viewed as a flow of prosodic states. Based on this view, this work presents a complete comparison of the recognition accuracy obtained with the Gaussian mixture model (GMM), the discrete hidden Markov model (HMM), and the continuous HMM as classifiers, all using MFCC features. It concludes that the HMM, which exploits state-transition probabilities, outperforms the GMM, which classifies using only statistical information, and that the continuous HMM, which uses multivariate probability density functions, outperforms the discrete HMM, which uses discrete probabilities.
The underflow and singularity problems that arise in these systems are also discussed, and solutions are proposed. In addition, the choice of the initial model for the HMM is examined.
Finally, the highest recognition accuracies with the GMM, discrete HMM, and continuous HMM classifiers are 58.07%, 65.67%, and 89.20% respectively, and the highest average accuracies are 51.12%, 53.20%, and 70.15% respectively. These results show that the continuous HMM is the best classifier of the three.
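As a rough illustration of the front end described above, the following is a minimal MFCC extraction sketch in Python. It assumes the librosa package, an illustrative file name, and illustrative frame settings; none of these are taken from the thesis.

import librosa  # assumed third-party package, not part of the thesis

# Load a mono speech clip, resampled to 16 kHz (illustrative rate).
y, sr = librosa.load("speech.wav", sr=16000)

# 13 MFCCs per frame over 25 ms windows with 10 ms hops -- common
# speech settings, used here only as an example configuration.
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)
print(mfcc.shape)  # (13, number_of_frames)

Each column of mfcc would then serve as one observation vector for the GMM or HMM classifiers compared in the thesis.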
Abstract (English) Emotion recognition from the speech signal is one of the most important topics in human-machine interaction; it is used in chat bots, mental examination, safety warnings, etc. Over the past few years, several speech features and classifiers have been tried, e.g. pitch, formant, and Mel-frequency Cepstral Coefficient (MFCC) features, and Support Vector Machine (SVM), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), and Artificial Neural Network (ANN) classifiers.
Emotional speech can be regarded as a prosody state flow. In this work, the recognition accuracies of the GMM, discrete HMM, and continuous HMM classifiers, all using MFCC speech features, are compared. The underflow and singularity problems that arise in these systems are also discussed and overcome. Moreover, the initial model hypothesis used to initialize the HMM classifiers is discussed in this thesis.
Finally, the highest recognition accuracies are 58.07%, 65.67%, and 89.20% for the GMM, discrete HMM, and continuous HMM classifiers respectively, and the highest average accuracies are 51.12%, 53.20%, and 70.15% respectively. The results show that the continuous HMM is the best classifier among the three. They also support the claims that the HMM, which considers the state flow, outperforms the GMM, which considers only statistical information, and that the continuous HMM, which uses multivariate probability densities, outperforms the discrete HMM, which uses discrete probabilities.
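The underflow and singularity problems named in both abstracts have standard remedies: evaluate likelihoods in the log domain (log-sum-exp) and impose a floor on component variances. The sketch below illustrates those two remedies for a diagonal-covariance GMM; it is a generic illustration under those assumptions, not the thesis's exact solution.

import numpy as np
from scipy.special import logsumexp

def gmm_log_likelihood(x, weights, means, variances, var_floor=1e-3):
    # x: (D,) feature vector; weights: (K,); means, variances: (K, D).
    # Variance floor: keeps a component from collapsing onto a single
    # training sample (the singularity problem).
    variances = np.maximum(variances, var_floor)
    # Per-component log-densities of a diagonal-covariance Gaussian.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_quad = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    # log-sum-exp instead of summing raw probabilities, so products of
    # many small terms do not underflow to zero.
    return logsumexp(np.log(weights) + log_norm + log_quad)

In an HMM, the same idea appears as the scaled (or log-domain) forward algorithm, where per-frame scaling factors keep the forward probabilities in a representable range.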
Table of Contents Abstract (Chinese) I
Abstract III
Acknowledgements V
List of Figures VIII
List of Tables XIII

Chapter 1 INTRODUCTION 1
1.1 BACKGROUND 1
1.1.1 Human-Machine Interaction (HMI) 1
1.1.2 Human Speech Emotions 2
1.1.3 Recognition Systems 3
1.2 Related Works 5
1.3 Motivation 5
1.4 Thesis Organization 6

Chapter 2 REVIEW 8
2.1 Feature Extraction 8
2.1.1 Cochlea Model 8
2.1.2 Mel-frequency Cepstral Coefficient 10
2.2 Classifiers 14
2.2.1 Gaussian Mixture Model [23] 14
2.2.2 Discrete Hidden Markov Model 18
2.2.3 Continuous Hidden Markov Model 29
2.2.4 Classifier Comparison 30
2.3 Hidden Markov Model Topology 32
2.4 Hidden Markov Model Issues 33
2.4.1 Underflow 33
2.4.2 Singularity 33

Chapter 3 PROPOSED METHODOLOGY 35
3.1 Database 35
3.2 Feature Extraction 36
3.3 Initial Model Construction 39
3.3.1 Initial State Hypothesis 40
3.3.2 Initial GMM Constraint 45
3.3.3 Initial Model for Different Classifier 45
3.4 Issues Solution 46
3.4.1 Solution for Underflow 46
3.4.2 Solution for Singularity 48
3.5 System Architecture 51
3.6 Accuracy Estimation 54

Chapter 4 EXPERIMENT RESULTS AND COMPARISONS 55
4.1 Experiment Settings 55
4.2 Comparisons for GMM Classifier System 56
4.2.1 Comparisons for Number of Components 56
4.2.2 Comparisons for GMM Constraint 60
4.2.3 Comparisons for Feature Type 65
4.3 Comparisons for Discrete HMM Classifier System 67
4.3.1 Comparisons for Number of States 67
4.3.2 Comparisons for Vector Quantization Levels 72
4.4 Comparisons for Continuous HMM Classifier System 75
4.4.1 Comparisons for Number of States 76
4.4.2 Comparisons for Number of Components 79
4.4.3 Comparisons for GMM Constraint 82
4.4.4 Comparisons for Initial State Hypothesis 85
4.5 Comparisons for Different Systems with Good Settings 89

Chapter 5 CONCLUSIONS AND FUTURE WORK 91
REFERENCES 93

References [1] Burkert, Peter, et al., "DeXpression: Deep convolutional neural network for expression recognition," arXiv preprint arXiv:1509.05371, 2015.
[2] Yu, Zhiding, and Cha Zhang, "Image based static facial expression recognition with multiple deep network learning," Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 435-442, 2015.
[3] Chatzikyriakidis, Stergios, et al., "An overview of Natural Language Inference data collection: The way forward?," Proceedings of the Computing Natural Language Inference Workshop, 2017.
[4] Koolagudi, Shashidhar G., Y. V. Srinivasa Murthy, and Siva P. Bhaskar, "Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition," International Journal of Speech Technology, vol. 21, no. 1, pp. 167-183, 2018.
[5] Darwin, Charles, and Phillip Prodger, The Expression of the Emotions in Man and Animals, USA: Oxford University Press, 1998.
[6] Russell, James A., "A circumplex model of affect," Journal of Personality and Social Psychology, vol. 39, no. 6, p. 1161, 1980.
[7] Vayrynen, Eero, Emotion Recognition from Speech Using Prosodic Features, Oulu: University of Oulu, 2014.
[8] Dellaert, Frank, Thomas Polzin, and Alex Waibel, "Recognizing emotion in speech," Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), vol. 3, pp. 1970-1973, 1996.
[9] El Ayadi, Moataz, Mohamed S. Kamel, and Fakhri Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognition, vol. 44, no. 3, pp. 572-587, 2011.
[10] Nwe, Tin Lay, Say Wei Foo, and Liyanage C. De Silva, "Speech emotion recognition using hidden Markov models," Speech Communication, vol. 41, no. 4, pp. 603-623, 2003.
[11] Kishore, K. V. Krishna, and P. Krishna Satish, "Emotion recognition in speech using MFCC and wavelet features," in 2013 3rd IEEE International Advance Computing Conference (IACC), Ghaziabad, India, 2013.
[12] Nogueiras, Albino, et al., "Speech emotion recognition using hidden Markov models," in Seventh European Conference on Speech Communication and Technology, Aalborg, Denmark, 2001.
[13] Sato, Nobuo, and Yasunari Obuchi, "Emotion recognition using mel-frequency cepstral coefficients," Information and Media Technologies, vol. 2, no. 3, pp. 835-848, 2007.
[14] Bitouk, Dmitri, Ragini Verma, and Ani Nenkova, "Class-level spectral features for emotion recognition," Speech Communication, vol. 52, no. 7-8, pp. 613-625, 2010.
[15] Schuller, Björn, Gerhard Rigoll, and Manfred Lang, "Hidden Markov model-based speech emotion recognition," in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, China, 2003.
[16] Wagner, Johannes, Thurid Vogt, and Elisabeth André, "A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech," in International Conference on Affective Computing and Intelligent Interaction, Berlin/Heidelberg: Springer, 2007.
[17] Rabiner, Lawrence R., Biing-Hwang Juang, and Janet C. Rutledge, Fundamentals of Speech Recognition, Upper Saddle River, NJ: Pearson Education, 1993.
[18] Yu, Dong, and Li Deng, Automatic Speech Recognition, London: Springer, 2016.
[19] Kwon, Oh-Wook, et al., "Emotion recognition by speech signals," in Eighth European Conference on Speech Communication and Technology, Geneva, Switzerland, 2003.
[20] "Auris Medical Cochlear Therapies - The Inner Ear," [Online]. Available: http://www.aurismedical.com/seiten_e/01_about.htm.
[21] "Cochlear Implant HELP - Electrodes and Channels," [Online]. Available: https://cochlearimplanthelp.com/journey/choosing-a-cochlear-implant/electrodes-and-channels/.
[22] Singh, Satyanand, and E. G. Rajan, "Vector quantization approach for speaker recognition using MFCC and inverted MFCC," International Journal of Computer Applications, vol. 17, no. 1, 2011.
[23] Gupta, Maya R., and Yihua Chen, "Theory and use of the EM algorithm," Foundations and Trends® in Signal Processing, vol. 4, no. 3, pp. 223-296, 2011.
[24] Rabiner, Lawrence R., and Biing-Hwang Juang, "An introduction to hidden Markov models," IEEE ASSP Magazine, vol. 3, no. 1, pp. 4-16, 1986.
[25] Rabiner, Lawrence R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[26] Juang, Biing-Hwang, Stephen Levinson, and M. Sondhi, "Maximum likelihood estimation for multivariate mixture observations of Markov chains (Corresp.)," IEEE Transactions on Information Theory, vol. 32, no. 2, pp. 307-309, 1986.
[27] Liporace, L., "Maximum likelihood estimation for multivariate observations of Markov sources," IEEE Transactions on Information Theory, vol. 28, no. 5, pp. 729-734, 1982.
[28] "Surrey Audio-Visual Expressed Emotion (SAVEE) Database," [Online]. Available: http://kahlan.eps.surrey.ac.uk/savee/.
[29] Rabiner, Lawrence R., et al., "Some properties of continuous hidden Markov model representations," AT&T Technical Journal, vol. 64, no. 6, pp. 1251-1270, 1985.
Full-Text Usage Rights
  • Authorization granted for the on-campus electronic full-text browsing/printing service, publicly available from 2024-07-30.
  • Authorization granted for the off-campus electronic full-text browsing/printing service, publicly available from 2024-07-30.

