Dialog Action Decision Using Deep Reinforcement Learning for Question Generation in an Interview Coaching System
Institute of Computer Science and Information Engineering
Interview Coaching System
Deep Reinforcement Learning
Long-Short Term Memory
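For dialog action decision, this thesis adopts deep reinforcement learning, chosen to suit the characteristics of the corpus. Deep reinforcement learning decides the action from the states labeled in the corpus. In addition, this thesis considers the dialog state, the action, and the relevance between questions and answers when generating interview questions. For interview question generation, this thesis uses a tree-based decision procedure: given the action and the slots that have not yet been asked, a relevance model selects a suitable question template.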
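The tree-based selection described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the template table, the slot and action names, and the word-overlap score that stands in for the learned relevance model are hypothetical and are not taken from the thesis.

```python
# Hypothetical templates keyed by (action, slot); not from the thesis corpus.
TEMPLATES = {
    ("ask", "motivation"): [
        "Why did you apply to this program?",
        "What motivates you to study this field?",
    ],
    ("ask", "experience"): ["What projects have you worked on?"],
    ("confirm", "motivation"): ["So your main motivation is research, is that right?"],
}


def overlap_relevance(answer: str, question: str) -> float:
    """Toy stand-in for a learned relevance model: Jaccard word overlap."""
    a = set(answer.lower().split())
    q = set(question.lower().rstrip("?").split())
    union = a | q
    return len(a & q) / len(union) if union else 0.0


def choose_question(action, asked_slots, last_answer, relevance=overlap_relevance):
    """Walk the decision tree: action -> unasked slot -> most relevant template."""
    # Levels 1 and 2: keep templates for the chosen action whose slot is unasked.
    candidates = [
        (slot, template)
        for (act, slot), templates in TEMPLATES.items()
        if act == action and slot not in asked_slots
        for template in templates
    ]
    if not candidates:
        return None  # nothing left to ask for this action
    # Level 3: rank the remaining templates by relevance to the previous answer.
    return max(candidates, key=lambda c: relevance(last_answer, c[1]))


if __name__ == "__main__":
    answer = "I want to study this field because of research"
    print(choose_question("ask", {"experience"}, answer))
```

Passing the relevance function as a parameter keeps the tree logic separate from the scorer, so a trained model could replace the overlap score without changing the selection code.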
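The corpus was recorded by 12 participants and contains 75 dialogs (540 question-answer pairs). For dialog management and interview question generation, it was annotated with dialog states, actions, and question-answer relevance. To observe the learning effect of deep reinforcement learning, 1000 simulated dialogs were tested under different architectures, measuring the average number of turns and the average number of slots filled, with the aim of having the user fill the most slots in the fewest turns. The proposed architecture performed better in every comparison. For question generation, the relevance model was evaluated with 5-fold validation, which showed that the long short-term memory (LSTM) model performed best with 128 hidden-layer nodes. For the Question Generation Model as a whole, this thesis modified the criteria proposed by Traum and tested 10 dialogs to score the appropriateness of the generated questions. Five subjects were also invited to rate Naturalness and Utility; the scores show that the proposed method achieved encouraging and acceptable results.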
Interviews are a common step when pursuing higher education or applying for a job. The best way to prepare for an interview is to review the types of questions likely to be asked and to practice answering them. An interview coaching system simulates an interviewer to provide mock interview practice sessions. Previous interview coaching systems gave users feedback on their performance in the simulated interview, including facial preference, head nodding and shaking, response time, and volume. However, most of these systems need a large amount of dialog data and generally offer only pre-designed interview questions. In this thesis, we propose an approach to dialog action detection based on deep reinforcement learning for an interview coaching system.
In dialog action detection, deep reinforcement learning is adopted to learn the relation between dialog states and dialog actions from the states and actions labeled in a collected corpus. In interview question generation, a tree-based decision is used to choose a proper question template based on the obtained dialog state and action.
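As a rough illustration of learning a mapping from dialog states to actions by reinforcement, the sketch below uses tabular Q-learning on a toy slot-filling simulator. It is a simplified stand-in, not the deep Q-network of the thesis: the three slots, the simulated user who answers with probability 0.8, the reward values, and the hyperparameters are all assumptions chosen for the example.

```python
import random

N_SLOTS = 3                              # hypothetical number of interview slots
ALL_FILLED = (1 << N_SLOTS) - 1          # bitmask state with every slot filled
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2    # illustrative hyperparameters


def step(state, action, rng):
    """Simulated user: answers a not-yet-filled slot with probability 0.8."""
    reward = -0.2                        # every turn carries a small cost
    bit = 1 << action
    if not state & bit and rng.random() < 0.8:
        state |= bit
        reward += 1.0                    # bonus for a newly filled slot
    return state, reward


def train(episodes=5000, max_turns=10, seed=0):
    """Learn Q[state][action], where action i means 'ask about slot i'."""
    rng = random.Random(seed)
    q = [[0.0] * N_SLOTS for _ in range(ALL_FILLED + 1)]
    for _ in range(episodes):
        state = 0
        for _turn in range(max_turns):
            if state == ALL_FILLED:
                break
            if rng.random() < EPSILON:   # explore
                action = rng.randrange(N_SLOTS)
            else:                        # exploit the current estimate
                action = max(range(N_SLOTS), key=lambda a: q[state][a])
            nxt, reward = step(state, action, rng)
            future = 0.0 if nxt == ALL_FILLED else GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (reward + future - q[state][action])
            state = nxt
    return q


def greedy_action(q, state):
    """Action the learned policy takes in a given state."""
    return max(range(N_SLOTS), key=lambda a: q[state][a])


if __name__ == "__main__":
    q = train()
    for s in range(ALL_FILLED):
        print(f"filled={s:03b} -> ask slot {greedy_action(q, s)}")
```

In this toy setting the learned policy ends up asking only about slots that are still empty, because repeating a filled slot costs a turn without any reward. A deep Q-network replaces the table with a neural network so that larger state spaces can be handled.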
For training and evaluation, twelve participants were invited to provide the interview corpus. In total, the corpus contained 75 dialogs consisting of 540 question-answer pairs. To evaluate the ability of deep reinforcement learning, we tested 1000 simulated dialogs. The average number of completed slots and the average number of turns were used for evaluation. The assumption behind this evaluation is that the fewer dialog turns needed to complete the interview, the better the coaching system. Based on this criterion, the proposed method outperformed the other approaches. In question generation, we validated the relevance model, which chooses the best interview question from the decision tree, using 5-fold validation. We observed that the proposed system achieved good accuracy and relevance scores when 128 nodes were used in the hidden layers of the LSTM. To evaluate the whole interview system, we modified the evaluation criteria provided by Traum and used a questionnaire based on 10 test dialogs. Five subjects were invited to score Naturalness and Utility. According to these scores, our model achieved an encouraging and acceptable performance.
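The evaluation criterion above (average turns and average completed slots over simulated dialogs) can be sketched as follows. The simulator, the slot count, the turn limit, and the two policies are hypothetical placeholders; they only show how the two averages might be computed and compared, and they do not reproduce the thesis experiments.

```python
import random
from statistics import mean

N_SLOTS = 5      # hypothetical number of slots per interview
MAX_TURNS = 8    # hypothetical turn limit per dialog


def simulate(policy, rng):
    """Run one simulated dialog; return (turns used, slots completed)."""
    filled = set()
    turns = 0
    while turns < MAX_TURNS and len(filled) < N_SLOTS:
        slot = policy(filled, rng)
        turns += 1
        # Assumed user model: a new slot is answered with probability 0.8.
        if slot not in filled and rng.random() < 0.8:
            filled.add(slot)
    return turns, len(filled)


def evaluate(policy, n_dialogs=1000, seed=0):
    """Average number of turns and completed slots over n_dialogs dialogs."""
    rng = random.Random(seed)
    results = [simulate(policy, rng) for _ in range(n_dialogs)]
    avg_turns = mean(t for t, _ in results)
    avg_slots = mean(s for _, s in results)
    return avg_turns, avg_slots


def random_policy(filled, rng):
    """Baseline: ask about any slot, even one already answered."""
    return rng.randrange(N_SLOTS)


def unasked_first_policy(filled, rng):
    """Ask about the lowest-numbered slot that is still empty."""
    return min(s for s in range(N_SLOTS) if s not in filled)


if __name__ == "__main__":
    for name, policy in [("random", random_policy), ("unasked-first", unasked_first_policy)]:
        turns, slots = evaluate(policy)
        print(f"{name}: avg turns={turns:.2f}, avg slots={slots:.2f}")
```

Under this criterion a policy is preferred when it completes more slots in fewer turns, which is how the two averages are read together.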
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Background
1.2 Motivation
1.3 Literature Review
1.3.1 Interview Coaching System
1.3.2 Dialogue Management
1.3.3 Natural Language Generation
1.4 Problems
1.5 Proposed Idea
1.6 Thesis Architecture
Chapter 2. System Framework and Related Work
2.1 System Overview
2.2 Word Embedding
2.3 Dialog State Tracking
2.4 Action Decision
2.5 Question Generation
Chapter 3. Proposed Method
3.1 Corpus Collection and Annotation
3.1.1 Corpus Collection and Definition
3.1.2 Corpus Annotation
3.2 Action Decision Model
3.2.1 Deep Q-network
3.2.2 Training Action Decision Model
3.3 Question Generation Model
3.3.1 Action Decision and State Decision
3.3.2 Relevance Decision and Relevance Model
Chapter 4. Experiment and Discussion
4.1 Evaluation and Comparison on Action Decision Model
4.1.1 Evaluation
4.1.2 Comparison
4.2 Evaluation on Relevance Model
4.3 Evaluation on Question Generation Model
Chapter 5. Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
 M. E. Hoque, M. Courgeon, J.-C. Martin, B. Mutlu, and R. W. Picard, "Mach: My automated conversation coach," in Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing, 2013, pp. 697-706.
 H. Jones and N. Sabouret, "TARDIS-A simulation platform with an affective virtual recruiter for job interviews," IDGEI (Intelligent Digital Games for Empowerment and Inclusion), 2013.
 E. Levin, R. Pieraccini, and W. Eckert, "Using Markov decision process for learning dialogue strategies," in Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, 1998, pp. 201-204.
 S. Young, M. Gašić, B. Thomson, and J. D. Williams, "POMDP-based statistical spoken dialog systems: A review," Proceedings of the IEEE, vol. 101, pp. 1160-1179, 2013.
 H. Cuayáhuitl, "SimpleDS: A Simple Deep Reinforcement Learning Dialogue System," arXiv preprint arXiv:1601.04574, 2016.
 E. Ferreira and F. Lefevre, "Reinforcement-learning based dialogue system for human–robot interactions with socially-inspired rewards," Computer Speech & Language, vol. 34, pp. 256-274, 2015.
 S. P. Singh, M. J. Kearns, D. J. Litman, and M. A. Walker, "Reinforcement Learning for Spoken Dialogue Systems," in NIPS, 1999, pp. 956-962.
 P.-H. Su, D. Vandyke, M. Gasic, D. Kim, N. Mrksic, T.-H. Wen, et al., "Learning from real users: Rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems," arXiv preprint arXiv:1508.03386, 2015.
 G. Tesauro, "Temporal difference learning and TD-Gammon," Communications of the ACM, vol. 38, pp. 58-68, 1995.
 C. J. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
 M. Riedmiller, "Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method," in European Conference on Machine Learning, 2005, pp. 317-328.
 V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
 M. Agarwal, R. Shah, and P. Mannem, "Automatic question generation using discourse cues," in Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications, 2011, pp. 1-9.
 W. Chen, G. Aist, and J. Mostow, "Generating questions automatically from informational text," in Proceedings of the 2nd Workshop on Question Generation, held at the Conference on AI in Education, 2009, pp. 17-24.
 Y. Chali and S. A. Hasan, "Towards Automatic Topical Question Generation."
 S. S. Pradhan, W. Ward, K. Hacioglu, J. H. Martin, and D. Jurafsky, "Shallow Semantic Parsing using Support Vector Machines," in HLT-NAACL, 2004, pp. 233-240.
 L. Ratinov and D. Roth, "Design challenges and misconceptions in named entity recognition," in Proceedings of the Thirteenth Conference on Computational Natural Language Learning, 2009, pp. 147-155.
 A. H. Oh and A. I. Rudnicky, "Stochastic language generation for spoken dialogue systems," in Proceedings of the 2000 ANLP/NAACL Workshop on Conversational systems-Volume 3, 2000, pp. 27-32.
 F. Mairesse and S. Young, "Stochastic language generation in dialogue using factored language models," Computational Linguistics, 2014.
 T.-H. Wen, M. Gasic, D. Kim, N. Mrksic, P.-H. Su, D. Vandyke, et al., "Stochastic language generation in dialogue using recurrent neural networks with convolutional sentence reranking," arXiv preprint arXiv:1508.01755, 2015.
 T.-H. Wen, M. Gasic, N. Mrksic, P.-H. Su, D. Vandyke, and S. Young, "Semantically conditioned LSTM-based natural language generation for spoken dialogue systems," arXiv preprint arXiv:1508.01745, 2015.
 Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
 T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur, "Recurrent neural network based language model," in Interspeech, 2010, p. 3.
 T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
 L. Žilka and F. Jurčíček, "LecTrack: Incremental Dialog State Tracking with Long Short-Term Memory Networks," in International Conference on Text, Speech, and Dialogue, 2015, pp. 174-182.
 L.-J. Lin, "Reinforcement learning for robots using neural networks," DTIC Document, 1993.
 S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735-1780, 1997.
 D. R. Traum, S. Robinson, and J. Stephan, "Evaluation of Multi-party Virtual Reality Dialogue Interaction," in LREC, 2004.