Dialogue State Tracking and Response Relevance Scoring Using Two-Level LSTM for Interview Coaching
Institute of Computer Science and Information Engineering
Keywords: long short-term memory (LSTM), dialogue state tracking
Many researchers have proposed systems for practicing interview or social skills, but the feedback these systems give users is mostly limited to nonverbal behavior, and the questions they ask are relatively scripted, unlike conversation between people. This thesis therefore uses dialogue state tracking (DST) to infer the semantics of the user's answer and generates the next question based on those semantics, so that the interview system more closely resembles a real interview. The proposed DST method first converts the user's answer into word-embedding features, then uses a long short-term memory network (LSTM) to extract a semantic vector for each sentence of the interviewee's response and a semantic vector for the whole dialogue-turn response, and finally uses an artificial neural network (ANN) classifier to decide the value of each slot in that turn's dialogue state, which determines the final dialogue state. In addition, so that users can know whether they actually answered the questions, after the interview ends the system uses an LSTM to detect the relationship between questions and answers and gives the interviewee a relevance score between each response and the corresponding interview question.
College graduates often have opportunities to attend interviews when pursuing further studies or seeking jobs, but they have few chances to practice interviewing while in school. To give students more opportunities to practice their interview skills, this thesis constructs a system that allows them to practice interviews flexibly and at any time.
In the past, several coaching systems have been built to help users practice their interview or social skills. They mostly provide nonverbal behavior feedback, and the questions they ask are generally fixed. Because fixed questions are unlike the dialogues between people, this thesis detects the semantics of the user's answer through dialogue state tracking (DST) and generates follow-up questions accordingly, bringing the coaching system closer to a real interview. The thesis presents a DST approach for interview conversations that combines a long short-term memory network (LSTM) with an artificial neural network (ANN). First, word embedding techniques are employed to obtain distributed representations of the words. Each input sentence is then mapped to a sentence hidden vector by an LSTM-based sentence model, and the sentence hidden vectors are fed to an LSTM-based answer model that maps the interviewee's whole answer to an answer hidden vector. Finally, an ANN-based dialogue state detection model takes the answer hidden vector and outputs a value for each slot; the slot values together determine the final dialogue state. Furthermore, after the user completes the mock interview, the coaching system provides a relevance score: an LSTM-based model detects the relationship between each question and answer and assigns the user a relevance score.
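The two-level pipeline described above (word embeddings → sentence-level LSTM → answer-level LSTM → ANN slot classifier) can be sketched as follows. This is a minimal illustrative sketch only: the dimensions, the random initialization, and the `LSTM`/`detect_slot` names are assumptions, not the thesis's actual configuration, and the weights here are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTM:
    """Single-layer LSTM mapping a sequence of vectors to its final
    hidden state (the 'hidden vector' used in the two-level model)."""
    def __init__(self, in_dim, hid_dim):
        self.hid_dim = hid_dim
        # one stacked weight matrix for the input/forget/output gates
        # and the cell candidate
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def last_hidden(self, xs):
        h = np.zeros(self.hid_dim)
        c = np.zeros(self.hid_dim)
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
            c = f * c + i * g
            h = o * np.tanh(c)
        return h

# illustrative sizes: embedding dim, hidden dims, slot values per slot
EMB, SENT_H, ANS_H, N_SLOT_VALUES = 50, 32, 32, 5
sent_lstm = LSTM(EMB, SENT_H)    # word embeddings -> sentence hidden vector
ans_lstm = LSTM(SENT_H, ANS_H)   # sentence vectors -> answer hidden vector
W_slot = rng.normal(0.0, 0.1, (N_SLOT_VALUES, ANS_H))  # ANN output layer

def detect_slot(answer_sentences):
    """answer_sentences: list of sentences, each a list of word vectors."""
    sent_vecs = [sent_lstm.last_hidden(s) for s in answer_sentences]
    ans_vec = ans_lstm.last_hidden(sent_vecs)
    logits = W_slot @ ans_vec
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax over slot values
    return int(np.argmax(probs)), probs

# toy answer: two sentences of 3 and 4 (random) word embeddings
answer = [[rng.normal(size=EMB) for _ in range(3)],
          [rng.normal(size=EMB) for _ in range(4)]]
slot_value, probs = detect_slot(answer)
```

In practice one such output layer would exist per slot, and all weights would be trained jointly on the annotated interview corpus.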
For evaluation, twelve participants were invited to provide the interview corpus, which in total contained 75 dialogues consisting of 540 question-answer pairs. Five-fold cross validation was adopted, and the F-measure was used to evaluate performance. The proposed method achieved an F-measure of 0.298, the highest performance compared to the baseline system, which was built on a one-level LSTM with one-hot word vectors and achieved an F-measure of only 0.1671. For the relevance score, accuracy was used as the metric, and the result was 84.2%.
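For reference, the F-measure reported above is the harmonic mean of precision and recall. A minimal computation looks like this; the counts passed in are toy values, not the thesis's actual confusion-matrix counts:

```python
def f_measure(tp, fp, fn):
    """F1 score from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# toy counts: precision = recall = 0.5, so F1 = 0.5
score = f_measure(tp=1, fp=1, fn=1)
```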
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Motivation
1.2 Literature Review
1.2.1 Social Skill Training and Interview Coaching Systems
1.2.2 Relevance Score
1.2.3 Dialogue State Tracking
1.2.4 Interview Nonverbal Behavior Performance
1.3 Problem and Goal
1.4 Research Framework
Chapter 2. Interview Dialogue Corpus
2.1 Dialogue State Tracking
2.2 Relevance Score
Chapter 3. Proposed Method
3.1 Real-Time Feedback
3.1.1 Smiling
3.1.2 Nodding and Eye Contact
3.1.3 Volume
3.2 Summary Feedback: Answer Relevance
3.2.1 Chinese Word Segmentation
3.2.2 Word Embedding
3.2.3 Long Short-Term Memory
3.3 Dialogue State Tracking
3.3.1 LSTM-Based Sentence Model
3.3.2 LSTM-Based Answer Model
3.3.3 ANN
3.4 System Interface
Chapter 4. Experimental Results
4.1 Relevance Score
4.1.1 The Accuracy of Relevance Detection
4.1.2 Relevance Score
4.2 Dialogue State Tracking
4.2.1 Corpus
4.2.2 Baseline
4.2.3 Performance Evaluation
4.2.4 Discussion
Chapter 5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work