Sentence Attention-based Continuous Dialog State Tracking and Reinforcement Learning for Interview Coaching
Institute of Computer Science and Information Engineering
The research topic of this thesis is dialog management. A dialog system requires a dialog manager to decide the dialog flow, which involves both dialog state tracking and dialog policy. Traditional dialog state tracking requires manually defined semantic slots to be tracked, whereas this thesis uses topic probability distributions as the semantic representation of sentences for dialog tracking. When a dialog turn consists of several sentences, some of them may be irrelevant. This thesis therefore combines a Convolutional Neural Tensor Network (CNTN) with the topic probability distribution to perform sentence attention over multi-sentence turns, assigning an importance weight to each sentence. An LSTM-based autoencoder then models the transition and accumulation relations between sentences and between dialog turns to obtain the dialog state. Finally, this thesis designs a reward function for the interview process and applies Double Q-learning, a reinforcement learning method, to model the relation between observed states and system actions.
Admission interviews are one of the most frequently used methods of student selection. Even though people know the importance of such interviews, very few practice their interview skills effectively by seeking professional help. Many students thus lack interview experience and are likely to be nervous during an interview. There are many ways to improve students' interview skills; one of them is to hire a professional interview coach. This is the most direct way to practice interview skills, but it is also rather expensive.
The main purpose of this thesis is thus to develop a dialog manager for an interview coaching system. In a dialog system, Dialog State Tracking (DST) and Dialog Policy are both important tasks. Traditional approaches define semantic slots manually for dialog state representation and tracking. This thesis instead adopts the topic profiles of sentences as the representation of a dialog state. When an input turn consists of several sentences, the summary vector is likely to contain noisy information from irrelevant feature vectors. This thesis therefore applies a sentence attention mechanism that combines the Convolutional Neural Tensor Network (CNTN) and the topic profile for dialog state tracking. An LSTM-based autoencoder is used as the dialog state tracker to model the transition and accumulation of dialog states. Finally, by applying Reinforcement Learning (RL) with the designed reward functions, the agent learns its behavior from interactions with the environment to make action decisions.
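The attention step described above can be sketched in a few lines: each sentence in a turn has a topic profile and a relevance score, the scores are normalized into attention weights, and the turn is summarized as the weighted sum of the topic profiles. This is only a minimal illustration; in the thesis the relevance scores come from the CNTN, while here they are given numbers, and the topic dimensionality is an arbitrary assumption.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def turn_representation(sentence_topics, relevance_scores):
    """Summarize a multi-sentence turn as one vector.

    sentence_topics:  (n_sentences, n_topics) topic profiles, each a
                      probability distribution over topics.
    relevance_scores: one relevance score per sentence (in the thesis
                      these would be produced by the CNTN scorer).
    """
    weights = softmax(np.asarray(relevance_scores, dtype=float))
    topics = np.asarray(sentence_topics, dtype=float)
    return weights @ topics  # attention-weighted sum, shape (n_topics,)

# Three sentences over 4 topics; the second sentence is judged irrelevant
# (low score), so it contributes little to the turn representation.
topics = [[0.70, 0.10, 0.10, 0.10],
          [0.25, 0.25, 0.25, 0.25],
          [0.60, 0.20, 0.10, 0.10]]
scores = [2.0, -1.0, 1.5]
state_vec = turn_representation(topics, scores)
```

Because the attention weights sum to one and each topic profile is itself a distribution, the resulting turn vector is again a probability distribution over topics, which is what the LSTM-based tracker consumes at each turn.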
This study collected 260 interview dialogs containing 3,016 dialog turns. A five-fold cross-validation scheme was employed for evaluation. The results show that the proposed method outperformed the semantic slot-based baseline in terms of the number of normal actions taken, the number of follow-up actions taken, and the accumulated reward obtained by the dialog policy on the collected corpus.
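The policy component evaluated above is trained with Double Q-learning, whose update rule can be sketched in tabular form. The state and action space sizes below are toy assumptions purely for illustration (the thesis actually uses continuous dialog states with a learned value function); the point of the sketch is the decoupling of action selection from action evaluation across two estimates, which reduces the maximization bias of plain Q-learning.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3      # toy sizes, not the thesis's actual spaces
Q_a = np.zeros((n_states, n_actions))
Q_b = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9         # learning rate and discount factor

def double_q_update(s, a, r, s_next):
    """One Double Q-learning step.

    With probability 0.5, update Q_a using Q_a to *select* the greedy
    next action but Q_b to *evaluate* it (and vice versa). Using a
    separate table for evaluation avoids the upward bias of taking a
    max over noisy estimates.
    """
    if rng.random() < 0.5:
        best = np.argmax(Q_a[s_next])
        Q_a[s, a] += alpha * (r + gamma * Q_b[s_next, best] - Q_a[s, a])
    else:
        best = np.argmax(Q_b[s_next])
        Q_b[s, a] += alpha * (r + gamma * Q_a[s_next, best] - Q_b[s, a])

# One illustrative transition: in state 0 the agent takes action 1,
# receives reward 1.0 (e.g. from the designed reward function), and
# moves to state 2.
double_q_update(0, 1, 1.0, 2)
```

At decision time the agent would act greedily with respect to the sum (or average) of the two estimates.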
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Literature Review
1.3.1 Interview Coaching System
1.3.2 Dialog State Tracking
1.3.3 Attention Mechanisms
1.3.4 Dialog Policy
1.4 Problems and Proposed Methods
1.5 Research Framework
Chapter 2 MHMC Interview Database
2.1 Data Collection
2.2 Corpus Introduction
Chapter 3 Proposed Methods
3.1 Establishment of Topic Model
3.2 Sentence Attention Mechanism
3.2.1 Convolutional Neural Tensor Network (CNTN)
3.2.2 Sentence Attention – CNTN
3.2.3 Sentence Attention – Topic Profile
3.3 Dialog State Tracking
3.3.1 Long Short-Term Memory
3.3.2 LSTM-based Autoencoder
3.3.3 Establishment and Training of the DST
3.4 Reinforcement Learning in Dialog Policy
3.4.1 Agent of Reinforcement Learning
3.4.2 Reward Function
3.4.3 Policy Model Training
Chapter 4 Experimental Results and Discussion
4.1 Relevance Classification Performance
4.2 Evaluation of the LSTM-based Autoencoder
4.3 Evaluation of System Performance
4.3.1 Comparison of Topic Profile and Semantic Slot
4.3.2 Evaluation of Sentence Representation with Attention Mechanism
4.3.3 Discussion
Chapter 5 Conclusion and Future Work