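System ID U0026-2408201101275100
Title (Chinese) 一個以搜尋引擎為基礎的生醫字詞語意相關度量測之互生方法
Title (English) A Search Engine-based Mutually Reinforcing Approach on Measuring Semantics Relatedness of Biomedical Terms
Institution National Cheng Kung University
Department (Chinese) 醫學資訊研究所
Department (English) Institute of Medical Informatics
Academic year 99 (ROC calendar)
Semester 2
Year of publication 100 (ROC calendar)
Student (Chinese) 陳弘宇
Student (English) Hung-Yu Chen
Student ID q56981070
Degree Master's
Language English
Pages 59
Oral defense committee Committee member: 陳淑慧
Committee member: 王惠嘉
Committee member: 蔡美玲
Advisor: 高宏宇
Keywords (Chinese) 語意關係  詞彙樣式  HITS演算法  搜尋引擎
Keywords (English) Semantic relatedness  Lexical pattern  HITS Algorithm  Search engine
Subject classification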
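Abstract (translated from Chinese) Assessing the degree of semantic relatedness between two biomedical terms is an important task for information retrieval, natural language processing, and literature mining in the biomedical domain. When two terms appear in the same sentence, the sentence usually describes a relation between them, and these relations reveal how closely the terms are related. In most previous studies, researchers used information provided by search engines to observe the forms in which biomedical terms are combined in sentences and built lexical patterns from those forms. Lexical patterns do capture characteristics of the relation between two terms, but the relation itself does not indicate how strong the relatedness is. In this study we therefore propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm, which learns influential lexical patterns from highly related biomedical synonym pairs and uses those patterns to assess the relatedness of biomedical terms. The idea behind ReLPR is to determine the influence of a lexical pattern from the relationship between lexical patterns and the containers that hold them: the lexical patterns in a container determine whether that container provides important information about the relatedness of terms, so the patterns in a container carry influence accordingly. Experiments on two biomedical datasets show correlation coefficients of 0.803 to 0.838, a clear improvement over previous methods for assessing the semantic relatedness of biomedical terms.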
Abstract (English) Identifying the semantic relatedness of two biomedical terms is an important task for information retrieval, natural language processing, and text mining in the biomedical field. When two terms co-occur in a sentence, the sentence usually expresses one or more semantic relations between them, and these relations show how the two terms are associated. Previous studies have made extensive use of search engine results to analyze how biomedical terms co-occur and to turn those co-occurrences into lexical patterns. Lexical patterns capture characteristics of the relation between two terms, but they do not by themselves estimate how strongly the terms are related. In this work we therefore propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm, which learns lexical patterns from synonym pairs in the biomedical field. ReLPR uses the lexical patterns and their pattern containers to assess the influence of each pattern found in search engine results, and the lexical patterns in a container determine how informative that container is about semantic relatedness. The correlation coefficients of the ReLPR algorithm averaged 0.82 across the evaluation datasets, which shows that ReLPR performed significantly better than previous methods.
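The mutual reinforcement between lexical patterns and their containers described in the abstract can be illustrated with a HITS-style iteration (HITS is listed among the keywords). The following is only a minimal sketch under stated assumptions, not the ReLPR algorithm itself: this record does not specify what a container is or how scores are updated, so the function name rank_patterns, the container ids, and the example patterns are all hypothetical, and a container is taken to be any named set of patterns.

```python
from math import sqrt


def _normalize(scores):
    """Scale a score dict in place to unit L2 norm (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm > 0:
        for key in scores:
            scores[key] /= norm


def rank_patterns(containers, iterations=50):
    """HITS-style mutual reinforcement (illustrative, not the thesis's ReLPR).

    containers: dict mapping a container id to the set of patterns it holds.
    Returns (pattern_scores, container_scores), each with unit L2 norm.
    """
    patterns = {p for held in containers.values() for p in held}
    pattern_scores = {p: 1.0 for p in patterns}
    container_scores = {c: 1.0 for c in containers}

    for _ in range(iterations):
        # A container is informative if it holds influential patterns.
        container_scores = {
            c: sum(pattern_scores[p] for p in held)
            for c, held in containers.items()
        }
        _normalize(container_scores)

        # A pattern is influential if it occurs in informative containers.
        pattern_scores = {p: 0.0 for p in patterns}
        for c, held in containers.items():
            for p in held:
                pattern_scores[p] += container_scores[c]
        _normalize(pattern_scores)

    return pattern_scores, container_scores


# Hypothetical containers, e.g. snippets retrieved for synonym pairs (X, Y).
containers = {
    "snippet_1": {"X also known as Y", "X or Y"},
    "snippet_2": {"X also known as Y", "X ( Y )"},
    "snippet_3": {"X and Y"},
}
pattern_scores, container_scores = rank_patterns(containers)
for pattern, score in sorted(pattern_scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {pattern}")
```

In this toy input the pattern shared by two containers ends up with the highest score, while the pattern that appears only in an isolated container decays toward zero. That is the usual behaviour of HITS on a disconnected graph, and a real system would need to decide whether it is desirable.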
Table of contents
Chinese abstract
ABSTRACT
Acknowledgements
FIGURE LISTING
TABLE LISTING
1. INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Our approach
1.4 Paper structure
2. RELATED WORK
2.1 Related search
2.1.1 Ontology-based approach
2.1.2 Corpus-based approach
2.1.3 Search engine-based approach
2.2 Knowledge resources
2.2.1 Yahoo! search BOSS
2.2.2 MedicineNet.com
2.2.3 Synonyms.net
3. METHOD
3.1 Acquisition of synonym pairs
3.2 Crawl concept pair from search engine
3.3 Extracting Lexical Pattern from Snippets
3.4 ReLPR: Mutually Reinforcing Lexical Pattern Ranking algorithm
3.5 Measuring Semantic Relatedness
4. EXPERIMENTS
4.1 Dataset
4.2 Evaluation criterions
4.3 Description of comparing baseline method
4.4 Comparison of results of rank correlation coefficient
4.4.1 Analysis of training set
4.4.2 Compare with baseline method
4.4.3 Compare with other method
5. CONCLUSIONS
6. REFERENCES
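Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, available from 2012-08-30.
  • Authorized for off-campus browsing and printing of the electronic full text, available from 2012-08-30.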

