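System ID: U0026-0812200911384832
Title (Chinese): 基於自然語言處理技術之網路文件問答系統
Title (English): NLP-based Question Answering System with application on WEB documents
University: National Cheng Kung University (成功大學)
Department (Chinese): 資訊工程學系碩博士班
Department (English): Institute of Computer Science and Information Engineering
Academic year: 93 (ROC calendar)
Semester: 2
Year of publication: 94 (ROC calendar, i.e. 2005)
Student (Chinese): 江柏勳
Student (English): Bo-Xun Jiang
E-mail: jiangbs@cad.csie.ncku.edu.tw
Student ID: p7692164
Degree: Master's
Language: Chinese
Number of pages: 53
Oral defense committee: Committee member: 吳宗憲
Committee member: 高成炎
Committee member: 蔡志忠
Committee member: 蔡正發
Advisor: 蔣榮先
Keywords (translated from Chinese): Search Engine, Query Formulation, Natural Language, Question Answering System
Keywords (English): Search Engine, Query Formulator, Question Answering System, Natural Language Processing, NLP
Subject classification: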
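Abstract (translated from Chinese):   To improve answer accuracy, a natural language question answering system must analyze the intention behind a question (intention analysis), that is, understand what the question is asking. This thesis uses natural language processing techniques to analyze the lexical structure and semantics of a question and thereby help determine its intent. Because the interrogative word of an English question usually carries semantic information about the expected answer type (for example, questions beginning with "when" ask for a time or date, while "where" asks for a place or location), this thesis defines the intention of a question as its expected answer type and classifies questions accordingly.
  
  The main purpose of classifying questions is to build the best processing strategy for each answer type in the later stages of the system. Because the system uses a web search engine as its document retrieval tool, it raises the probability that the retrieved documents contain the answer by selecting, according to the question's answer type, the corresponding template from the system's query template library and converting the question into the best query for the search engine. In addition, answers of different types appear at different positions in the lexical structure of document sentences, so a separate answer extraction procedure is applied to each answer type to increase answer accuracy.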
Abstract (English):   To raise the accuracy of a natural language question answering system, it is essential to perform intention analysis on the questions, that is, to understand what each question asks. In this thesis, we use natural language processing techniques to analyze both the syntactic structure and the semantic interpretation of a question in order to determine its intention. In English, an interrogative sentence usually contains semantic information about the type of answer expected. For instance, sentences that begin with the word “when” expect date or time information as the answer, while sentences that start with the word “where” expect place or location information. Accordingly, we categorize questions by their corresponding answer types.
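The classification idea in this paragraph can be illustrated with a minimal Python sketch. It is not the thesis's classifier: the function name, the label strings, and the first-word lookup are assumptions made for illustration, and only the "when" and "where" pairings come from the abstract.

```python
# Hypothetical mapping from a leading interrogative word to an answer type.
# Only these two pairings are stated in the abstract; the label names are
# illustrative assumptions.
ANSWER_TYPES = {
    "when": "DATE_TIME",
    "where": "LOCATION",
}


def classify_answer_type(question: str) -> str:
    """Return the expected answer type based on the first word of the question."""
    words = question.strip().lower().split()
    if not words:
        return "UNKNOWN"
    # Questions whose first word is not in the table fall back to UNKNOWN.
    return ANSWER_TYPES.get(words[0], "UNKNOWN")
```

A real classifier would also need the syntactic and semantic analysis the abstract mentions, since many questions do not reveal their answer type in the first word alone.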
  
  The main purpose of categorizing questions is to support the construction of processing templates for the various answer types in the later stages of the system. When a question is submitted to our system, it first identifies the answer type of the question and then rephrases the question into a form that raises the probability that documents retrieved from public search engines contain the expected answer. Furthermore, since an answer can appear in different places in a sentence for different answer types, answer extraction is carried out separately for each answer type in order to increase precision.
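The rephrasing step described above might look roughly like the following Python sketch. This record does not give the thesis's actual templates, so the template strings, the stop-word list, and the function name are hypothetical; the sketch only shows the general pattern of picking a query template by answer type and filling it with the remaining question words.

```python
import re

# Hypothetical query templates keyed by answer type (assumed, not from the thesis).
QUERY_TEMPLATES = {
    "DATE_TIME": "{focus} date",
    "LOCATION": '"{focus}" "located in"',
}

# Hypothetical list of words dropped before the question is turned into a query.
STOP_WORDS = {"when", "where", "is", "was", "are", "were", "did", "does", "do", "the"}


def formulate_query(question: str, answer_type: str) -> str:
    """Rewrite a question as a search-engine query using the template for its answer type."""
    tokens = re.findall(r"[A-Za-z0-9']+", question)
    focus = " ".join(t for t in tokens if t.lower() not in STOP_WORDS)
    # Answer types without a template fall back to the bare focus words.
    template = QUERY_TEMPLATES.get(answer_type, "{focus}")
    return template.format(focus=focus)
```

For example, formulate_query("Where is the Eiffel Tower?", "LOCATION") yields the query "Eiffel Tower" "located in", which favors pages that state a location near the entity name.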
Table of contents: Chapter 1 Introduction
1.1 Research Motivation
1.2 Problem Description
1.3 Solution Approach
1.4 Chapter Overview
Chapter 2 Related Work
2.1 Information Retrieval and Presentation in Search Engines
2.2 Introduction to Question Answering Systems
2.2.1 TREC QA
2.2.2 WEB QA
2.3 Processing Architecture of Question Answering Systems
2.4 Question Classification Mechanisms
2.5 Text and Information Processing Techniques
2.6 Natural Language Processing Techniques
Chapter 3 The CKQA Natural Language Question Answering System
3.1 Question Analysis (Question Analyzer)
3.1.1 Answer Type Classification
3.1.2 Question Information Parsing
3.2 Query Formulation (Query Formulator)
3.2.1 Query Transformation Based on the Google Search Engine
3.2.2 Transforming Questions into Queries
3.3 Web Document Retrieval (WEB Document Retrieval)
3.3.1 Using the Google Web APIs
3.3.2 Query Limits of the Google Web APIs
3.4 Answer Extraction (Answer Extractor)
3.4.1 Converting Web Documents to Plain Text (Convert WEB documents to plain texts)
3.4.2 Extracting Candidate Answers (Extract candidate answers)
3.5 Answer Ranking
Chapter 4 Experimental Results and Analysis
4.1 Experimental Design
4.1.1 Experimental Data
4.1.2 Experimental Procedure
4.2 Experimental Results and Analysis
4.2.1 Analysis of the Results of Experiment 1
4.2.2 Analysis of the Results of Experiment 2
4.2.2.1 Comparison with the START Question Answering System
4.2.2.2 Comparison with the NSIR Question Answering System
4.2.2.3 Comparison with the AnswerBus Question Answering System
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix 1 List of Part-of-Speech Tags
Appendix 2 Demonstration of the CKQA Natural Language Question Answering System

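Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, publicly available from 2006-08-08.
  • Authorized for off-campus browsing and printing of the electronic full text, publicly available from 2006-08-08.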

