System ID  U0026-0812200912080844
Title (Chinese)  用於語音情緒辨識的混和類神經網路模型之發展
Title (English)  Development of a Hybrid Neural Network Model for Emotion Recognition from Speech
University  National Cheng Kung University
Department (Chinese)  資訊工程學系碩博士班
Department (English)  Institute of Computer Science and Information Engineering
Academic Year  94 (2005-2006)
Semester  2
Publication Year  95 (2006)
Author (Chinese)  廖惇利
Author (English)  Duan-Li Liao
Student ID  P7693433
Degree  Master's
Language  English
Pages  79
Advisors  郭耀煌, 郭淑美
Committee Members  李健興, 曾新穆, 洪盟峰
Keywords (Chinese)  遺傳演算法, 混合模型, 情緒辨識, 類神經網路
Keywords (English)  Emotion Recognition, Artificial Neural Network, Genetic Algorithms, Hybrid Model
Subject Classification
Abstract (Chinese, translated)  Emotion recognition has a wide range of applications in human-computer interfaces, and recognizing emotion accurately has always been the goal of our research. In this thesis, we take a model-combination approach and propose a novel hybrid model to improve the recognition rate. In our method, we normalize the information from each single model and blend that information to obtain a complementary output; from the output of this hybrid model we obtain a more accurate recognition rate. For the single models, we use artificial neural networks to construct each recognition model. To handle discrete-type problems and to speed up model training, we adopt real-coded genetic algorithms as the learning algorithm. For emotion recognition from speech, we validate the model on the Berlin database of emotional speech and the Danish Emotional Speech database, recognizing seven and five emotion categories respectively. We select pitch, energy, speech rate, Mel-frequency cepstrum coefficients, and gender as the feature vector. The experiments show that, for single models built from the same kind of information, the proposed architecture consistently achieves better performance.
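As an illustration of the feature set named in the abstract, the sketch below extracts pitch, energy, MFCCs, and a speech-rate proxy from one utterance and appends a gender flag. It is a minimal example under stated assumptions: the librosa library, the YIN pitch estimator, the mean/standard-deviation summary statistics, and the energy-based speech-rate heuristic are all choices made here, not the thesis's exact procedure.

    import numpy as np
    import librosa

    def extract_features(wav_path, gender, n_mfcc=13):
        # Load the utterance at its native sampling rate.
        y, sr = librosa.load(wav_path, sr=None)

        # Pitch contour via the YIN fundamental-frequency estimator.
        f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)

        # Short-time energy: root-mean-square value per frame.
        rms = librosa.feature.rms(y=y)[0]

        # Mel-Frequency Cepstrum Coefficients, one vector per frame.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

        # Crude speech-rate proxy (an assumption, not the thesis's
        # definition): high-energy frames per second of audio.
        voiced_frames = int((rms > 0.1 * rms.max()).sum())
        speech_rate = voiced_frames / (len(y) / sr)

        # Summarize the contours by mean and standard deviation and
        # encode gender as a binary flag.
        feats = [f0.mean(), f0.std(), rms.mean(), rms.std(), speech_rate]
        feats += list(mfcc.mean(axis=1)) + list(mfcc.std(axis=1))
        feats.append(1.0 if gender == "female" else 0.0)
        return np.array(feats)

A vector of this form would be fed to each single neural-network model, with the Berlin and DES corpora supplying the emotion labels.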

Abstract (English)  The application of emotion recognition in Human-Computer Interaction (HCI) is quite widespread. Recognizing emotion accurately is the goal of our investigation. In this thesis, we adopt a model-combination approach and propose a novel hybrid model to improve the recognition rate. In our approach, we normalize the information of each single model and combine the single models to obtain a complementary result. Finally, we obtain a more accurate recognition rate from the output of the hybrid model. We use artificial neural networks to construct the single models. To handle discrete-type problems and to speed up the learning process, we adopt real-coded genetic algorithms as the learning algorithm of the hybrid model. For emotion recognition from speech, we use the Berlin database of emotional speech and the Danish Emotional Speech (DES) database to validate the hybrid model, classifying seven and five emotions respectively. We select the pitch, the energy, the speed of speech, the Mel-Frequency Cepstrum Coefficients (MFCCs), and the gender as the feature vector. The experimental results show that combining several single models built from the same source of information achieves better performance.
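To make the combination scheme concrete, the sketch below normalizes each single model's output scores, weights them in an authority layer, sums them in a summarization layer, and learns the authority weights with a small real-coded genetic algorithm. This is a minimal reading of the abstract under stated assumptions: the min-max normalization, binary tournament selection, arithmetic (blend) crossover, Gaussian mutation, and recognition-rate fitness are choices made for illustration, not the thesis's exact design.

    import numpy as np

    rng = np.random.default_rng(0)

    def normalize(scores):
        # Min-max normalize each sample's score vector so that models
        # with different output scales become comparable (assumed form).
        lo = scores.min(axis=-1, keepdims=True)
        hi = scores.max(axis=-1, keepdims=True)
        return (scores - lo) / (hi - lo + 1e-12)

    def hybrid_predict(model_outputs, weights):
        # Authority layer: weight each normalized single-model output.
        # Summarization layer: sum the weighted outputs, pick the top class.
        combined = sum(w * normalize(o) for w, o in zip(weights, model_outputs))
        return combined.argmax(axis=-1)

    def fitness(weights, model_outputs, labels):
        # Fitness = recognition rate of the hybrid model.
        return float((hybrid_predict(model_outputs, weights) == labels).mean())

    def real_coded_ga(model_outputs, labels, n_models,
                      pop_size=30, generations=100):
        # Each chromosome is a real-valued vector of authority weights.
        pop = rng.random((pop_size, n_models))
        best_w, best_f = pop[0].copy(), -1.0
        for _ in range(generations):
            fit = np.array([fitness(w, model_outputs, labels) for w in pop])
            if fit.max() > best_f:
                best_f, best_w = float(fit.max()), pop[fit.argmax()].copy()
            # Binary tournament selection of parents.
            a, b = rng.integers(0, pop_size, (2, pop_size))
            parents = pop[np.where(fit[a] > fit[b], a, b)]
            # Arithmetic (blend) crossover between consecutive parents.
            alpha = rng.random((pop_size, 1))
            pop = alpha * parents + (1 - alpha) * np.roll(parents, 1, axis=0)
            # Gaussian mutation, clipped to keep weights non-negative.
            pop = np.clip(pop + rng.normal(0.0, 0.05, pop.shape), 0.0, None)
            pop[0] = best_w  # elitism: carry the best individual forward
        return best_w, best_f

    # Usage with synthetic stand-ins for three trained single models
    # scoring 100 utterances over seven emotion classes.
    outputs = [rng.random((100, 7)) for _ in range(3)]
    labels = rng.integers(0, 7, 100)
    weights, rate = real_coded_ga(outputs, labels, n_models=3)

Real coding keeps the weights as continuous genes, which lets the GA search the authority layer directly instead of through a binary encoding.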

Table of Contents

CHAPTER 1 INTRODUCTION 1
CHAPTER 2 LITERATURE 4
2.1 EMOTIONAL SPEECH CORPUS 4
2.2 EMOTION RECOGNITION APPROACHES 5
2.3 RESEARCH ON EMOTIONAL FEATURE ANALYSIS 7
CHAPTER 3 SPEECH SIGNAL PROCESSING 11
3.1 PREPROCESSING 11
3.2 FEATURE EXTRACTION 12
 3.2.1 Prosodic feature 12
  3.2.1.1 Pitch 12
  3.2.1.2 Energy 15
  3.2.1.3 The Speed of Speech 16
 3.2.2 Frequency feature 17
  3.2.2.1 Mel Frequency Cepstrum Coefficients 17
CHAPTER 4 METHODOLOGY 22
4.1 NEURAL NETWORK – BUILDING A SINGLE MODEL 22
 4.1.1 Back-propagation Network (BPN) 24
 4.1.2 Extended-Neuron Network (ENN) 27
 4.1.3 Fuzzy-Neuron Network (FNN) 29
4.2 ARCHITECTURE OF HYBRID MODEL 31
 4.2.1 Authority layer 32
 4.2.2 Summarization layer 36
4.3 LEARNING ALGORITHM OF HYBRID MODEL – GENETIC ALGORITHMS 38
CHAPTER 5 EXPERIMENT 46
5.1 INTRODUCTION OF THE DATABASE 46
5.2 FEATURE EXTRACTION 49
5.3 TRAINING SINGLE MODEL 51
5.4 COMBINING THE MULTIPLE MODELS 59
5.5 SUMMARY 63
CHAPTER 6 CONCLUSION AND FUTURE WORK 65
6.1 CONCLUSION 65
6.2 FUTURE WORK 66
REFERENCES 68
APPENDIX A 71

References
[1] Aishah, A.R., Komiya, R., “A Preliminary Study of Emotion Extraction from Voice,” National Conference on Computer Graphics and Multimedia, Malacca, 2002.

[2] Aishah Abdul Razak, Mohamad Izani Zainal Abidin, and Ryoichi Komiya, “A preliminary speech analysis for recognizing emotion,” Student Conference on Research and Development, pp. 49-54, 2003.

[3] Aishah Abdul Razak, Ryoichi Komiya, and Mohamad Izani Zainal Abidin, “Comparison Between Fuzzy and NN Method for Speech Emotion Recognition,” Proceedings of the Third International Conference on Information Technology and Applications, vol. 1, pp. 297-302, 2005.

[4] Bishop, C.M., “Neural Networks for Pattern Recognition,” Oxford, England: Clarendon Press, 1995.

[5] Bou-Ghazale, S. E., Hansen, J. H. L., “A Comparative Study of Traditional and Newly Proposed Features for Recognition of Speech Under Stress,” IEEE Trans. Speech & Audio Proc., vol. 8, pp. 429-442, July 2000.

[6] Dimitrios Ververidis, Constantine Kotropoulos, and Ioannis Pitas, “Automatic emotional speech classification,” pp. 593-596, 2004.

[7] Dimitrios Ververidis and Constantine Kotropoulos, “Emotional speech classification using Gaussian mixture models and the Sequential Floating Forward Selection Algorithm,” IEEE International Conference on Multimedia & Expo , Amsterdam, The Netherlands, July 2005.

[8] Donn Morrison, Ruili Wang, “Real-Time Spoken Affect Classification and Its Application in Call-Centres,” Proceedings of the Third International Conference on Information Technology and Applications (ICITA'05) vol. 2, pp. 483-487, 2005.

[9] F. Dellaert, T. Polzin, and A. Waibel, “Recognizing emotion in speech,” in Proc. Fourth International Conference on Spoken Language Processing (ICSLP 96), vol. 3, pp. 1970-1973, Philadelphia, PA, USA, 1996.

[10] F. Yu, E. Chang, Y.Q. Xu, and H.Y. Shum, “Emotion Detection from Speech to Enrich Multimedia Content,” IEEE Pacific Rim Conference on Multimedia, Beijing, China, 2001.

[11] Felix Burkhardt, Astrid Paeschke, Miriam Rolfes, Walter Sendlmeier, and Benjamin Weiss, “A Database of German Emotional Speech,” in Proc. Interspeech, 2005.

[12] Hsiao-Chuan Wang, “Speech Signal Processing (in Chinese),” Chuan Hwa Science & Technology Book Co. Taiwan, 2004.

[13] I. S. Engberg, and A. V. Hansen, “Documentation of the Danish Emotional Speech Database (DES),” Internal AAU report, Center for Person Kommunikation, Denmark, 1996.

[14] I-Cheng Yeh, “Modeling Chaotic Two-Dimensional Mapping with Fuzzy-Neuron Networks,” Fuzzy Sets and Systems, vol. 105, no. 3, pp. 421-427, 1999.

[15] I-Cheng Yeh, “Classification Using Extended-Neuron Networks,” Journal of Computers, vol. 14, no. 2, pp. 16-22, 2002.

[16] J. H. Holland, “Adaptation in Natural and Artificial Systems,” University of Michigan Press, Ann Arbor, 1975.

[17] J. Nicholson, K. Takahashi, and R. Nakatsu, “Emotion recognition in speech using neural networks,” Neural Computing and Applications, vol. 9, pp. 290-296, 2000.

[18] J.F. Kaiser, “Discrete-Time Speech Signal Processing,” Prentice Hall PTR, 2002.

[19] R. Nakatsu, A. Solomides, and N. Tosa, “Emotion recognition and its application to computer agents with spontaneous interactive capabilities,” in Proc. IEEE Int. Conf. Multimedia Computing and Systems, vol. 2, pp. 804-808, Florence, Italy, July 1999.

[20] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J.G. Taylor, “Emotion recognition in human-computer interaction,” IEEE Signal Processing magazine, vol. 18, no. 1, pp. 32–80, Jan. 2001.

[21] S. Morishima and H. Harashima, “A Media Conversion from Speech to Facial Image for Intelligent Man-Machine Interface,” IEEE Jou. Selected Areas in Communications, vol. 9, no. 4, pp. 594-600, May 1991.

[22] S. Hoch, F. Althoff, G. McGlaun, and G. Rigoll, “Bimodal Fusion of Emotional Data in an Automotive Environment,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[23] Thao Nguyen, Iris Bass, Mingkun Li, and Ishwar K. Sethi, “Investigation of Combining SVM and Decision Tree for Emotion Classification,” Proceedings of the Seventh IEEE International Symposium on Multimedia, 2005.

[24] Yi-Lin Lin and Gang Wei, “Speech Emotion Recognition Based on HMM and SVM,” in Proc. Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.

[25] Yongjin Wang, Ling Guan, “Recognizing Human Emotion from Audiovisual Information,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[26] Zbigniew Michalewicz, “Genetic Algorithms + Data Structures = Evolution Programs,” third ed., Springer, Berlin, 1996.

[27] 黃冠傑, “Design of Neural Network Windows Software (in Chinese),” Master's thesis, Department of Civil Engineering, Chung Hua University, Taiwan, 2003.
Full-Text Access Rights
  • The author agrees to release the electronic full text for on-campus browsing/printing, available from 2009-08-22.
  • The author agrees to release the electronic full text for off-campus browsing/printing, available from 2009-08-22.

