

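   The electronic thesis is not yet authorized for public release; for the print copy, please check the library catalog.
(※ If the thesis cannot be found, or its holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be accessed.)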
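System ID U0026-0109201922563400
Thesis title (Chinese) 利用跨句強化關聯嵌入與文件注意力之關係擷取
Thesis title (English) Inter-sentence enhanced Relation Extraction with dependency embedding and Document-level Attention
Institution National Cheng Kung University
Department (Chinese) 醫學資訊研究所
Department (English) Institute of Medical Informatics
Academic year 107 (ROC calendar; 2018-2019)
Semester 2
Publication year 108 (ROC calendar; 2019)
Author (Chinese) 許湘琪
Author (English) Hsiang-Chi Hsu
Student ID Q56064022
Degree Master's
Language English
Number of pages 44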
Oral examination committee Advisor-高宏宇
Committee member-蔡宗翰
Committee member-王惠嘉
Committee member-謝孫源
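Keywords (Chinese) relation extraction  convolutional neural network  attention mechanism  biomedical literature text mining
Keywords (English) Relation extraction  Attention mechanism  Convolutional Neural Network  Biomedical text mining
Subject classification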
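Chinese abstract (translated) Extracting relations from biomedical literature is an important task in biomedical research: it can advance biomedical studies and expand existing knowledge bases. Relation extraction can be divided into intra-sentence and inter-sentence (cross-sentence) extraction. Most studies focus only on intra-sentence relation extraction and ignore inter-sentence relation extraction. Therefore, in addition to intra-sentence relation extraction, we also aim to improve the performance of inter-sentence relation extraction. We identify several linguistic features of inter-sentence relations and use them to build a bridge between the two sentences, thereby breaking the sentence boundary. These features include coreference resolution and identical or similar words that appear in both sentences. At the same time, we apply a convolutional neural network to the two sentences to capture useful local features in long biomedical sentences. We also propose a new perspective on relation extraction. Usually, when a model judges the relation of an entity pair, it considers only the sentences containing that pair. In practice, however, when we ourselves try to understand the relation of an entity pair, we also consider other sentences, such as the surrounding context or the full text, and check several times before deciding on the relation. Applying this idea to our method, we use an attention mechanism to add extra sentences that help our model improve its performance. For evaluation, we chose the Chemical-induced Disease (CID) task on the CDR dataset from BioCreative V (2015). On the CID task, our system achieves the best F-score, 62.1%. In addition, we split the CID task into two levels of relation extraction (intra-sentence and inter-sentence) and applied our system separately to each level; at the inter-sentence level it outperforms the current state-of-the-art systems, with an F-score of 27.79%.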
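The abstracts in this record describe bridging two sentences through words they share and words that are semantically similar. The sketch below illustrates only those two features, under stated assumptions: a simple regex tokenizer, a small hypothetical stopword list, toy word vectors, and a cosine-similarity threshold of 0.8 chosen purely for illustration. Coreference resolution needs an external resolver and is left out. The names `tokenize`, `cosine`, and `bridge_words` are hypothetical and are not taken from the thesis.

```python
import math
import re

# Hypothetical stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "was", "were", "with", "to", "by", "is"}


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens minus stopwords; a crude stand-in for a biomedical tokenizer."""
    return [t for t in re.findall(r"[a-z0-9-]+", sentence.lower()) if t not in STOPWORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bridge_words(sent_a: str, sent_b: str, embeddings: dict, threshold: float = 0.8):
    """Return (shared, similar) word links between two sentences.

    shared:  sorted words that occur in both sentences.
    similar: (word_a, word_b) pairs of non-shared words whose embedding
             cosine similarity is at least `threshold` (assumed value).
    """
    tokens_a, tokens_b = set(tokenize(sent_a)), set(tokenize(sent_b))
    shared = sorted(tokens_a & tokens_b)
    similar = []
    for wa in sorted(tokens_a - set(shared)):
        for wb in sorted(tokens_b - set(shared)):
            if wa in embeddings and wb in embeddings:
                if cosine(embeddings[wa], embeddings[wb]) >= threshold:
                    similar.append((wa, wb))
    return shared, similar


if __name__ == "__main__":
    # Toy 3-d vectors (made up for illustration, not real embeddings).
    toy = {"liver": [0.8, 0.3, 0.1], "hepatotoxicity": [0.9, 0.1, 0.0], "patients": [0.0, 0.2, 0.9]}
    shared, similar = bridge_words(
        "Patients received isoniazid and developed liver injury.",
        "This hepatotoxicity resolved in all patients.",
        toy,
    )
    print(shared, similar)  # ['patients'] [('liver', 'hepatotoxicity')]
```

In this toy run, "patients" links the two sentences as a shared word, and "liver"/"hepatotoxicity" link them as similar words; such links are the kind of evidence that lets a chemical in one sentence be connected to a disease in another.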
English abstract Extracting relations from biomedical literature is an important task in biomedical research: it can advance biomedical studies and extend existing knowledge bases. Relation extraction (RE) can be divided into intra-sentence and inter-sentence RE. Most research focuses only on intra-sentence RE and ignores inter-sentence RE. Therefore, in addition to intra-sentence RE, we aim to improve the performance of inter-sentence RE. We identify linguistic features of inter-sentence relations and use them to bridge the two sentences, breaking the sentence boundary. These features are coreference resolution, words that appear in both sentences, and similar words across the two sentences. We also apply a convolutional neural network (CNN) to the two sentences to capture useful local features in long biomedical sentences. In addition, we propose a new perspective on RE. A model usually judges the relation of an entity pair by considering only the sentences in which the pair appears. However, when people try to understand the relation of an entity pair themselves, they also consider other sentences and check multiple times. Our method therefore adds extra sentences through an attention mechanism to improve performance. For evaluation, we used the CDR corpus from the BioCreative V Chemical-induced Disease (CID) task. On the CDR corpus, our approach achieves the best recall (67.3%) and F-score (62.1%) among the compared systems. At the inter-sentence level, our model outperforms the state-of-the-art system on the CDR corpus with an F-score of 27.79%.
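The document-level attention idea (weighing the other sentences of an abstract when classifying an entity pair) can be illustrated with a minimal sketch. It assumes that the entity-pair instance and every other sentence have already been encoded as fixed-length vectors (for example by a CNN), scores each sentence with a plain dot product, and concatenates the attention-weighted context with the instance vector as input to a classifier. The dot-product scoring, the vector sizes, and the names `document_attention` and `combine` are assumptions for illustration, not the thesis's exact formulation.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def document_attention(query: list[float], sentence_vectors: list[list[float]]):
    """Attend from the entity-pair vector `query` over other sentence vectors.

    Scores are dot products (an assumed scoring function). Returns
    (weights, context), where context is the weighted sum of sentence vectors.
    """
    if not sentence_vectors:
        return [], [0.0] * len(query)
    scores = [sum(q * s for q, s in zip(query, vec)) for vec in sentence_vectors]
    weights = softmax(scores)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(len(query))
    ]
    return weights, context


def combine(query: list[float], sentence_vectors: list[list[float]]) -> list[float]:
    """Concatenate the instance vector with its document context
    (hypothetical input to a downstream softmax classifier)."""
    _, context = document_attention(query, sentence_vectors)
    return query + context


if __name__ == "__main__":
    query = [1.0, 0.0]                    # toy entity-pair representation
    others = [[1.0, 0.0], [0.0, 1.0]]     # toy vectors for two other sentences
    weights, _ = document_attention(query, others)
    print([round(w, 3) for w in weights])                # [0.731, 0.269]
    print([round(x, 3) for x in combine(query, others)])  # [1.0, 0.0, 0.731, 0.269]
```

The sentence most aligned with the entity-pair vector receives the larger weight, so the context vector leans toward supporting sentences while still mixing in the rest of the document.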
Table of contents
Chinese abstract II
ABSTRACT III
Acknowledgements IV
LIST OF TABLES VII
LIST OF FIGURES VIII
1. INTRODUCTION 1
1.1 Background 1
1.2 Related works 3
1.2.1 Intra-sentence relation extraction 3
1.2.2 Inter-sentence relation extraction 8
1.3 Motivation 9
1.4 Our approach 11
1.5 Paper structure 13
2. METHOD 14
2.1 Pre-processing 16
2.1.1 Build inter-sentence level instances 16
2.1.2 Build intra-sentence level instances 16
2.1.3 Linguistic features groups 17
2.2 CNN 19
2.2.1 Inter-sentence features 19
2.2.2 Sentence-CNN model 20
2.2.3 Near words-CNN model 24
2.3 Document-level attention mechanism 25
2.4 Concatenation layer and softmax layer 26
3. EXPERIMENTS AND RESULTS 28
3.1 Dataset description 28
3.1.1 Chemical Disease Relation Corpus 28
3.1.2 Bacteria Biotopes Task 30
3.2 Evaluation metrics 32
3.3 Evaluation of inter-sentence feature 32
3.4 Results 33
3.4.1 The result in CDR Corpus 33
3.4.2 The result in BB Task 34
4. DISCUSSION 36
4.1 Importance of document-level attention mechanism 36
4.2 Different sentences affect the performance 37
4.3 Compare the different attention mechanism on inter-sentence level relation 38
4.4 Different pre-trained word embedding effectiveness 40
5. CONCLUSIONS 41
REFERENCES 42
References
[1] Gu, Jinghang, et al. "Chemical-induced disease relation extraction via convolutional neural network." Database 2017 (2017).
[2] Zhou, Huiwei, et al. "Chemical-induced disease relation extraction with dependency information and prior knowledge." Journal of Biomedical Informatics 84 (2018): 171-178.
[3] Gupta, Pankaj, et al. "Neural relation extraction within and across sentence boundaries." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
[4] Xu, Jun, et al. "CD-REST: a system for extracting chemical-induced disease relation in literature." Database 2016 (2016).
[5] Zhou, Huiwei, et al. "Exploiting syntactic and semantics information for chemical–disease relation extraction." Database 2016 (2016).
[6] Li, Haodi, et al. "Chemical-induced disease extraction via recurrent piecewise convolutional neural networks." BMC Medical Informatics and Decision Making 18.2 (2018): 60.
[7] Verga, Patrick, Emma Strubell, and Andrew McCallum. "Simultaneously self-attending to all mentions for full-abstract biological relation extraction." arXiv preprint arXiv:1802.10569 (2018).
[8] Sahu, Sunil Kumar, et al. "Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network." arXiv preprint arXiv:1906.04684 (2019).
[9] Wei, Chih-Hsuan, et al. "Overview of the BioCreative V chemical disease relation (CDR) task." Proceedings of the Fifth BioCreative Challenge Evaluation Workshop. Vol. 14. 2015.
[10] Zheng, Guineng, et al. "Opentag: Open attribute value extraction from product profiles." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018.
[11] Sorokin, Daniil, and Iryna Gurevych. "Context-aware representations for knowledge base relation extraction." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
[12] Li, Fei, et al. "A neural joint model for entity and relation extraction from biomedical text." BMC Bioinformatics 18.1 (2017): 198.
[13] Shi, Weiwei, and Sheng Gao. "Relation extraction via position-enhanced convolutional neural network." 2017 International Conference on Intelligent Environments (IE). IEEE, 2017.
[14] Peng, Yifan, Chih-Hsuan Wei, and Zhiyong Lu. "Improving chemical disease relation extraction with rich features and weakly labeled data." Journal of Cheminformatics 8.1 (2016): 53.
[15] Pons, Ewoud, et al. "Extraction of chemical-induced diseases using prior knowledge and textual information." Database 2016 (2016).
[16] Grishman, Ralph, David Westbrook, and Adam Meyers. "NYU’s English ACE 2005 system description." ACE 5 (2005).
[17] Girju, Roxana, et al. "Semeval-2007 task 04: Classification of semantic relations between nominals." Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics, 2007.
[18] Zeng, Daojian, et al. "Relation classification via convolutional deep neural network." (2014).
[19] Turian, Joseph, Lev Ratinov, and Yoshua Bengio. "Word representations: a simple and general method for semi-supervised learning." Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2010.
Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized; publicly available from 2020-10-01.


  • If you have any questions, please contact the library.
    Phone: (06)2757575 ext. 65773
    Email: etds@email.ncku.edu.tw