System ID: U0026-2408201614023700
Title (Chinese): 應用降噪自動編碼器及長短期記憶模型於真實心情之偵測
Title (English): Real Mood Detection Using Denoising Autoencoder and LSTM
University: National Cheng Kung University (成功大學)
Department (Chinese): 資訊工程學系
Department (English): Institute of Computer Science and Information Engineering
Academic Year: 104
Semester: 2
Year of Publication: 105 (2016)
Author (Chinese): 傅翔祺
Author (English): Hsiang-Chi Fu
Student ID: P76031276
Degree: Master
Language: English
Pages: 52
Committee: Advisor: 吳宗憲
Committee Member: 王駿發
Committee Member: 戴顯權
Committee Member: 楊家輝
Committee Member: 王家慶
Keywords (Chinese): 語音情緒辨識; 長期情緒追蹤; 心情偵測; 降噪自動編碼器; 長短期神經網路
Keywords (English): speech emotion recognition; long-term emotion tracking; mood detection; denoising autoencoder; long short-term memory
Subject Classification: (not specified)
Abstract (Chinese): In a rapidly changing social environment, the emotional issues people face are increasingly complex. People sometimes do not even realize that negative emotions have arisen, so negative emotions gradually accumulate and can develop into a mental illness. Building an objective emotion tracking system that helps users detect their mood and achieve better emotion management is therefore a topic worth studying.
Although it is commonly believed that the emotion perceived by others is close to the emotion the user expresses, studies have pointed out that a mismatch actually exists between perceived and expressed emotions, and that expressed emotions also vary with personality traits. This thesis therefore collects an emotion corpus with the users' personality traits and self-annotated emotion labels to build an emotion conversion model, so as to obtain the real emotion closer to what the user actually expresses. Considering that mood is a long-term accumulation of emotions, this thesis also builds a long-term emotion corpus: long-term emotional speech and the user's daily mood annotations are collected to establish the structural relationship between emotion and mood.
Considering that a person may display several emotions at the same time, this thesis builds an emotion profile prediction model based on support vector machines (SVM) to represent the distribution over emotion classes. Next, using the self-annotated corpus, a Gaussian distribution is built to capture the difference between the emotion perceived by others and the emotion expressed by the user; noise drawn from this distribution is used to generate additional input data, with the self-expressed emotion as the target, to train a denoising autoencoder (DAE) as the emotion conversion model. Finally, considering the relationship between emotion and time, a long short-term memory (LSTM) model with memory cells is used to construct the historical trajectory of emotions for mood detection.
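A minimal sketch of the emotion profile idea, assuming scikit-learn and utterance-level acoustic features already extracted (e.g., openSMILE functionals); the emotion classes, feature dimension, and random data below are illustrative placeholders, not the thesis setup:

    # Hypothetical sketch: per-class SVM probabilities as an "emotion profile".
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    EMOTIONS = ["angry", "happy", "neutral", "sad"]   # illustrative label set

    # X: (n_utterances, n_features) acoustic features; y: emotion labels.
    # Random data stands in for real extracted features.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 384))
    y = rng.integers(0, len(EMOTIONS), size=200)

    # probability=True makes the SVM emit per-class probabilities
    # (Platt scaling) instead of a single hard decision.
    profile_model = make_pipeline(StandardScaler(), SVC(probability=True))
    profile_model.fit(X, y)

    emotion_profile = profile_model.predict_proba(X[:1])[0]   # shape (4,)
    print(dict(zip(EMOTIONS, emotion_profile.round(3))))

The per-class probabilities, rather than a single hard label, form the emotion profile that the later conversion and tracking stages operate on.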
The experiments collected long-term emotional speech from 10 participants, with 104 recordings annotated as positive mood and 96 as negative, and adopted leave-one-speaker-out evaluation. Experimental results show that the proposed mood detection model, which tracks the real expressed emotions with personality traits, achieves an accuracy of 64.5%, about 5% higher than HMM-based emotion tracking. In the future, tracking dialog text content and social media posts could be considered to build a better mood detection system.
Abstract (English): In a rapidly changing social environment, emotions are increasingly difficult for people to handle. Sometimes people do not even realize that they have negative emotions; as a result, the accumulation of negative emotions can develop into a mental illness. Thus, it is essential to develop an emotion tracking system to help users manage their emotions. In current practice, extended subjective self-report methods are generally used to measure emotions.
Even though it is commonly accepted that the emotion perceived by the listener is close to the intended emotion conveyed by the speaker, several studies have indicated that a mismatch remains between them. In addition, individuals with different personalities generally express emotions differently. Based on these findings, this thesis proposes an emotion conversion model that characterizes the relationship between the perceived emotion and the expressed emotion of a user with a specific personality. Emotion conversion from perceived to expressed emotions is applied based on the personality traits of the user. This thesis regards mood as a long-term accumulation of emotions. A database containing the users' long-term speech data and mood annotations is collected and used to construct the temporal relationship between emotion and mood.
To reflect people's real mood, an SVM-based emotion model is developed to generate multiple probabilistic class labels. Moreover, since there is a difference between expressed and perceived emotions, a Gaussian distribution is built to model this difference and generate noisy data. For denoising autoencoder (DAE) training, the input is the expressed emotion value contaminated by the generated noise, and the target is the clean expressed emotion. Finally, to model the temporal fluctuation of emotions, a long short-term memory (LSTM)-based mood model is constructed for mood detection.
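The DAE conversion step could be sketched as follows, assuming Keras; the noise standard deviation, layer sizes, and toy profiles are placeholders, whereas in the thesis the noise distribution is estimated from annotated perceived-versus-expressed differences:

    # Hypothetical sketch: train a DAE to map noise-corrupted emotion
    # profiles back to the clean expressed-emotion profiles.
    import numpy as np
    from tensorflow import keras

    n_emotions = 4
    rng = np.random.default_rng(0)
    expressed = rng.dirichlet(np.ones(n_emotions), size=500)  # toy targets

    # Gaussian corruption; sigma here is a guess, while the thesis estimates
    # the noise distribution from perceived-vs-expressed annotations.
    sigma = 0.1
    noisy = expressed + rng.normal(0.0, sigma, expressed.shape)

    dae = keras.Sequential([
        keras.layers.Input(shape=(n_emotions,)),
        keras.layers.Dense(16, activation="relu"),              # encoder
        keras.layers.Dense(n_emotions, activation="softmax"),   # decoder
    ])
    dae.compile(optimizer="adam", loss="mse")
    dae.fit(noisy, expressed, epochs=20, batch_size=32, verbose=0)

    converted = dae.predict(noisy[:1])   # estimate of the expressed emotion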
In the mood detection experiments, the mood database was provided by 10 participants and contained 104 positive and 96 negative mood labels. Leave-one-speaker-out cross validation was employed for evaluation. Experimental results show that the proposed method achieved a detection accuracy of 64.5%, about 5% higher than that of the HMM-based method. In the future, tracking the users' dialog content and blog posts could be applied to obtain better performance.
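A hedged sketch of the LSTM mood detector under leave-one-speaker-out evaluation, assuming Keras and that each day yields a fixed-length sequence of converted emotion profiles with a binary mood label; all shapes and the synthetic data are illustrative assumptions:

    # Hypothetical sketch: LSTM mood detector with leave-one-speaker-out
    # evaluation over daily sequences of emotion profiles.
    import numpy as np
    from tensorflow import keras

    n_speakers, days, seq_len, n_emotions = 10, 20, 30, 4
    rng = np.random.default_rng(0)
    X = rng.random((n_speakers, days, seq_len, n_emotions))   # toy profiles
    y = rng.integers(0, 2, size=(n_speakers, days))           # daily mood

    accuracies = []
    for test_spk in range(n_speakers):
        train = [s for s in range(n_speakers) if s != test_spk]
        X_tr = X[train].reshape(-1, seq_len, n_emotions)
        y_tr = y[train].reshape(-1)

        model = keras.Sequential([
            keras.layers.Input(shape=(seq_len, n_emotions)),
            keras.layers.LSTM(32),                        # emotion history
            keras.layers.Dense(1, activation="sigmoid"),  # pos. vs. neg. mood
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.fit(X_tr, y_tr, epochs=5, verbose=0)

        _, acc = model.evaluate(X[test_spk], y[test_spk], verbose=0)
        accuracies.append(acc)

    print("LOSO accuracy:", float(np.mean(accuracies)))

Holding out one speaker per fold, as here, matches the evaluation protocol reported above, where the detector is never trained on the test speaker's data.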
Table of Contents:
Chinese Abstract I
Abstract III
Acknowledgements V
Table of Contents VI
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Background 2
1.3 Literature Review 3
1.3.1 Emotional Speech Databases 3
1.3.2 Emotion Perception and Emotion Expression 3
1.3.3 Emotion and Personality Trait 5
1.3.4 Long-term Tracking 7
1.4 Problem and Goal 8
1.5 The Organization of this Thesis 9
Chapter 2 Emotional Database Design and Collection 10
2.1 Emotion with Personality Database (EP-DB) 10
2.1.1 Data Collection 11
2.1.2 Emotional Video Selection 14
2.1.3 Environment 15
2.1.4 Data Annotation 16
2.2 Long-Term Emotion Database (LT-DB) 17
2.2.1 Data Collection 18
2.2.2 Environment 19
2.2.3 Data Annotation 20
2.3 MHMC Emotion Database 21
Chapter 3 Proposed Method 22
3.1 Speech Preprocessing 23
3.2 Emotion Profile Prediction 25
3.3 Emotion Conversion with Personality 26
3.3.1 Training Data Construction 26
3.3.2 Denoising Autoencoder 29
3.4 Long-term tracking and Mood Detection 33
Chapter 4 Experimental Results and Discussion 38
4.1 Database Analysis 38
4.2 System Performance 40
4.2.1 Emotion Profile Prediction 40
4.2.2 Emotion Conversion 41
4.2.3 Mood Detection 45
4.3 Performance Comparison 47
Chapter 5 Conclusions and Future Work 49
References 50
Full-Text Access Rights:
  • Electronic full text authorized for on-campus browsing/printing, available to the public from 2018-08-31.
  • Electronic full text authorized for off-campus browsing/printing, available to the public from 2018-08-31.

