System ID: U0026-1007201817263700
Thesis Title (Chinese): 基於深度神經網路之情緒辨識系統及其於人形機器人之應用
Thesis Title (English): Deep Neural Network Based Emotion Recognition System for Humanoid Robot
University: National Cheng Kung University (成功大學)
Department (Chinese): 電機工程學系
Department (English): Department of Electrical Engineering
Academic Year: 106
Semester: 2
Year of Publication: 107 (2018)
Author (Chinese): 蔡定男
Author (English): Ting-Nan Tsai
Student ID: N26054142
Degree: Master's
Language: English
Number of Pages: 66
Committee: Advisor - 李祖聖
Committee Member - 白能勝
Committee Member - 呂虹慶
Committee Member - 邱俊賢
Committee Member - 黃國勝
Keywords (Chinese): 卷積神經網路, 情緒辨識, 長短期記憶神經網路, 遷移學習
Keywords (English): Convolutional Neural Network, Emotion Recognition, Long Short-Term Memory, Transfer Learning
Subject Classification:
Abstract (Chinese, translated): How a robot recognizes the emotions of the person it interacts with is a crucial problem in human-robot interaction. This thesis proposes an emotion recognition system for a humanoid robot: through a webcam, the robot captures images of its interaction partner, recognizes that person's emotion, and gives an appropriate response. The proposed system is based on deep neural networks and learns to recognize six basic emotions: happiness, sadness, surprise, fear, disgust, and anger. The architecture consists of four parts. First, a convolutional neural network is trained on a large database of static facial-expression images and serves as the extractor of spatial features. Next, a long short-term memory network is trained on a database of image sequences to learn how the temporal dynamics of the images relate to the six emotions, i.e., to extract the temporal features of image sequences. The proposed network architecture combines the strengths of both, considering the spatial features of facial expressions together with their variation over time, so as to recognize facial emotions from image sequences. Transfer learning is then applied to further improve the performance of the emotion recognition system significantly. Finally, leave-one-out cross-validation is used to compare the recognition rates of the different methods, and real-time emotion recognition is realized on a humanoid robot.
Abstract (English): It is crucial for robots to recognize human emotions during human-robot interaction. This thesis therefore proposes an emotion recognition system for a humanoid robot. The robot is equipped with a camera to capture images of the user's face, and the goal is for the robot to respond appropriately to the emotion recognized by the system. The emotion recognition system, based on a deep neural network, learns the six basic emotions: happiness, anger, disgust, fear, sadness, and surprise. The system is built in four steps. The first step uses a convolutional neural network (CNN) to extract visual features by training on a large number of static images. The second step uses a long short-term memory (LSTM) recurrent neural network to learn the relationship between the evolution of facial expressions in image sequences and the six basic emotions. The third step combines the advantages of both by integrating the CNN and the LSTM into a single model. The last step improves the performance of the system through transfer learning, which transfers knowledge from a related but different problem. Finally, the performance of the proposed system is verified by leave-one-out cross-validation and compared with other models, and the system is applied to human-robot interaction to demonstrate its practicability.
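The third and fourth steps of the abstract can be illustrated with a minimal PyTorch sketch: a CNN trained on static images produces one feature vector per frame, an LSTM consumes the per-frame feature sequence, and the transferred CNN is frozen in the spirit of layer transferring. The layer sizes, the 48x48 grayscale input, the 16-frame clips, and the checkpoint file name are illustrative assumptions, not the thesis's actual configuration; only the six-class output and the CNN-then-LSTM structure follow the abstract.

```python
# Minimal sketch (not the thesis's exact architecture) of combining a CNN
# feature extractor with an LSTM over image sequences.
import torch
import torch.nn as nn

class CnnFeatureExtractor(nn.Module):
    """Spatial feature extractor, trained beforehand on static face images."""
    def __init__(self, feature_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                       # 48x48 -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),               # global average pooling
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, x):                          # x: (batch, 1, 48, 48)
        return self.fc(self.conv(x).flatten(1))    # (batch, feature_dim)

class CnnLstmEmotionModel(nn.Module):
    """Applies the CNN to every frame, then an LSTM over the feature sequence."""
    def __init__(self, cnn, feature_dim=128, hidden_dim=64, num_classes=6):
        super().__init__()
        self.cnn = cnn
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):                     # frames: (batch, T, 1, 48, 48)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))     # (batch*T, feature_dim)
        _, (h_n, _) = self.lstm(feats.view(b, t, -1))
        return self.classifier(h_n[-1])            # logits over the six emotions

# Transfer step: reuse the CNN trained on the large static-image database and
# freeze it, so only the LSTM and classifier are fitted to the (smaller)
# image-sequence database. The checkpoint path below is hypothetical.
cnn = CnnFeatureExtractor()
# cnn.load_state_dict(torch.load("static_expression_cnn.pt"))
for p in cnn.parameters():
    p.requires_grad = False

model = CnnLstmEmotionModel(cnn)
logits = model(torch.randn(2, 16, 1, 48, 48))      # two clips of 16 frames each
```

Freezing the transferred CNN is one common way to reuse knowledge learned from a large static-image database while fitting only the temporal parts to the smaller sequence database; Sections 2.4.4 and 3.4 of the thesis cover the transfer variants actually used.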
Table of Contents: CHINESE ABSTRACT I
ABSTRACT II
ACKNOWLEDGEMENTS III
CONTENTS IV
LIST OF FIGURES VI
LIST OF TABLES VIII
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Related Work 4
1.3 Thesis Organization 5
Chapter 2 Deep Neural Network and Transfer Learning 7
2.1 Introduction 7
2.2 Convolutional Neural Network 8
2.2.1 Introduction to Convolutional Neural Network 8
2.2.2 Convolution Layer 11
2.2.3 Pooling Layer 13
2.2.4 Activation Layer 13
2.2.5 Fully Connected and Global Average Pooling Layer 14
2.2.6 Softmax 15
2.2.7 Residual Block 16
2.3 Long Short-Term Memory Networks (LSTMs) 18
2.3.1 Introduction to Traditional Recurrent Neural Network 18
2.3.2 Long Short-Term Memory 19
2.4 Transfer Learning 22
2.4.1 Introduction to Transfer Learning 22
2.4.2 Categories of Transfer Learning 23
2.4.3 Inductive Transfer Learning 23
2.4.4 Layer Transferring and Layer Sharing 24
Chapter 3 The Proposed Models by Combining CNN and LSTMs 26
3.1 Introduction 26
3.2 CNN Model 27
3.3 The Proposed Models by Combining CNN and LSTMs 30
3.3.1 The LSTM Network 30
3.3.2 CNN Feature Extractor 31
3.3.3 Combination of CNN and LSTM 32
3.4 Transferring Parameters of the CNN 33
3.5 Enhanced Model 34
Chapter 4 Simulations and Experimental Results 36
4.1 Introduction 36
4.2 Simulations 37
4.2.1 Databases 37
4.2.2 Data Preprocessing 39
4.2.3 Simulation Platform 39
4.2.4 Leave-One-Out Cross-Validation 40
4.3 Experimental Setup 50
4.3.1 Robot Harley 50
4.3.2 Camera 51
4.3.3 Computer 52
4.4 Experiment I 55
4.5 Experiment II 56
4.5.1 Experimental Environment 56
4.5.2 Scenario 57
4.6 Summary 59
Chapter 5 Conclusion and Future Work 61
5.1 Conclusion 61
5.2 Future Work 62
References 64
References: [1] K. Dautenhahn, “Methodology and themes of human-robot interaction: a growing research field,” International Journal of Advanced Robotic Systems, vol. 4, no. 1, p. 15, 2007.
[2] L. Parker, F. E. Schneider, and A. C. Schultz, Multi-robot systems: from swarms to intelligent automata, Dordrecht: Springer, 2005.
[3] K. R. Scherer, “What are emotions? and how can they be measured?,” Social Science Information, vol. 44, no. 4, pp. 695-729, Dec. 2005.
[4] V. Mayya, R. M. Pai, and M. M. Manohara Pai, “Automatic facial expression recognition using DCNN,” Procedia Computer Science, vol. 93, pp. 453-461, 2016.
[5] K. Zhang, Y. Huang, Y. Du, and L. Wang, “Facial expression recognition based on deep evolutional spatial-temporal networks,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4193-4203, Sep. 2017.
[6] Y. Byeon and K. Kwak, “Facial expression recognition using 3D convolutional neural network,” International Journal of Advanced Computer Science and Applications, vol. 5, no. 12, 2014.
[7] W. Zhang, Y. Zhang, L. Ma, J. Guan, and S. Gong, “Multimodal learning for facial expression recognition,” Pattern Recognit., vol. 48, no. 10, pp. 3191-3202, 2015.
[8] X. Fan and T. Tjahjadi, “A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences,” Pattern Recognit., vol. 48, no. 11, pp. 3407-3416, 2015.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the Annual Conference on Neural Information Processing Systems, 2012, pp. 1097-1105.
[10] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[11] M. Sundermeyer, H. Ney, and R. Schlüter, “From feedforward to recurrent LSTM neural networks for language modeling,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 23, no. 3, pp. 517-529, 2015.
[12] S. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
[13] A. T. Lopes, E. De Aguiar, A. F. De Souza, and T. Oliveira-Santos, “Facial expression recognition with convolutional neural networks: Coping with few data and the training sample order,” Pattern Recognit., vol. 61, pp. 610-628, Jan. 2017.
[14] “Six basic emotions,” Managementmania.com, 2018. [Online]. Available: https://managementmania.com/en/six-basic-emotions.
[15] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554-2558, 1982.
[16] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.
[17] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv:1502.03167, 2015.
[18] X. Glorot, A. Bordes, and Y. Bengio, “Deep Sparse Rectifier Neural Networks,” Proc. Conf. Artificial Intelligence and Statistics, 2011.
[19] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.
[20] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2018.
[21] Z. Yu and C. Zhang, “Image based static facial expression recognition with multiple deep network learning,” in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 435–442, ACM, 2015.
[22] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Proc. Int. Conf. Learn. Representations, 2015.
[23] A. Graves, A.-R. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), pp. 6645–6649, May 2013.
[24] M. Sundermeyer, R. Schlüter, and H. Ney, “LSTM neural networks for language modeling,” in Proc. Interspeech, pp. 194-197, 2012.
[25] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for accurate object detection and segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142-158, 2016.
[26] N. Jean, M. Burke, M. Xie, W. Davis, D. Lobell, and S. Ermon, “Combining satellite imagery and machine learning to predict poverty,” Science, vol. 353, no. 6301, pp. 790-794, 2016.
[27] D. Hubel and T. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” The Journal of Physiology, vol. 195, no. 1, pp. 215-243, 1968.
[28] R. Hahnloser, R. Sarpeshkar, M. Mahowald, R. Douglas, and H. Seung, “Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit,” Nature, vol. 405, no. 6789, pp. 947-951, Jun. 2000.
[29] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv:1312.4400, 2013.
[30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 770-778, 2016.
[31] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” in Proc. 9th Int. Conf. Artif. Neural Netw. (ICANN), vol. 2, pp. 850-855, Sep. 1999.
[32] S. Ruder, “An overview of multi-task learning in deep neural networks,” arXiv:1706.05098, 2017.
[33] A. Mollahosseini, B. Hasani, and M. H. Mahoor, “AffectNet: A database for facial expression, valence, and arousal computing in the wild,” IEEE Transactions on Affective Computing, 2017.
[34] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), pp. 94-101, Jun. 2010.
[35] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Representations (ICLR), 2015.
[36] Logitech C920 webcam. [Online]. Available:
http://www.logitech.com/zh-tw/product/hd-pro-webcam-c920?crid=34
[37] Axiomtek PICO880 industrial computer. [Online]. Available:
http://www.axiomtek.com.tw/
[38] P. Viola and M. J. Jones, “Robust real-time object detection,” in Proc. IEEE ICCV Workshop on Statistical and Computational Theories of Vision, 2001.
[39] G. Bradski, “The OpenCV library,” Dr. Dobb's Journal of Software Tools, 2000.
Full-Text Usage Permissions
  • On-campus browsing/printing of the electronic full text is authorized, publicly available from 2023-07-01.
  • Off-campus browsing/printing of the electronic full text is authorized, publicly available from 2023-07-01.