

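   The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the search returns nothing, or the holdings status shows "closed stacks, not public", the thesis is not in the stacks and cannot be retrieved.)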
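System ID U0026-3107201917160500
Thesis title (Chinese) 利用醫學實體識別對應醫學術語自動生成病人自訴摘要
Thesis title (English) Summary Generation for Chinese Patient Complaint based on Medical Entity Recognition and Medical Terminology Mapping
University National Cheng Kung University
Department (Chinese) 醫學資訊研究所
Department (English) Institute of Medical Informatics
Academic year 107 (ROC calendar; 2018-2019)
Semester 2
Year of publication 108 (ROC calendar; 2019)
Graduate student (Chinese) 蔡佳純
Graduate student (English) Jia-Chun Cai
Email u96424u@gmail.com
Student ID Q56064030
Degree Master's
Language English
Number of pages 62
Oral defense committee Advisor: 盧文祥
Committee member: 楊中平
Committee member: 梁勝富
Committee member: 黃明源
Chinese keywords 醫學實體識別  醫學術語對應  自動生成摘要  自然語言生成 
English keywords Medical Entity Recognition  Medical Terminology Mapping  Automatically Generate Summary  Natural Language Generation 
Subject classification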
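Abstract (translated from the Chinese) In recent years, as health awareness has grown and personal Internet usage has steadily risen, more and more users seek help from medical professionals through online medical consultation platforms, health-related forums, and social networking sites. What general users write in online consultations is longer and less structured than what medical personnel write, so providing structured medical entity recognition can help medical personnel with diagnosis and consultation replies.
To address this problem, this study proposes a medical entity recognition model that identifies ten entity types in patient chief complaints: symptom, disease, examination, organ, treatment, medication, health information, time, department, and abbreviation. We also provide a semi-automatic medical entity recognition system that lets users add entities that were not extracted. Experiments show that the system reduces medical text annotation time by 40%, and that the annotation time required keeps decreasing as the number of labels grows.
This study also provides a medical terminology mapping method that maps lay terms to medical terms, which can reduce the "vocabulary gap" between the language familiar to laypersons and the terminology used in medical practice and research. Finally, the medical summary generation step produces a medical summary from the extracted medical entities. In the experiments, the generated summaries contain fewer words than the original medical text, and the original meaning is largely preserved.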
Abstract (English) In recent years, with rising health awareness and the gradual increase in personal Internet access, online health consultation websites have multiplied, and more and more users seek professional help through online medical consultation platforms, health-related forums, and social networking websites. What general users write in online consultations is more verbose and less structured than text written by medical personnel. Providing structured medical entity recognition can therefore assist medical personnel in diagnosis and in replying to consultations.
To address this problem, this study proposes a medical entity recognition model that identifies ten entity types: Symptoms, Disease, Health Information, Department, Treatment, Examination, Medication, Organs, Time, and Abbreviation. For entities that are not extracted, users can add entities themselves. Experiments show that the system reduces medical text annotation time by 40%, and that the annotation time required keeps decreasing as the number of labels grows. We map layperson terms to medical terms through a medical terminology mapping method, which can reduce the "vocabulary gap" between the language that laypersons are familiar with and the terminology used in medical practice and research.
In the medical summary generation step, a medical summary is generated from the extracted medical entities. In the experiments, the generated summaries contain fewer words than the original medical text, and the original meaning is largely preserved.
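The three stages named in the abstract (entity recognition, lay-to-medical term mapping, entity-based summary generation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's method: this record does not describe the model, so a greedy longest-match dictionary lookup stands in for the recognizer. The lexicon entries, the mapping table, the function names, and the summary format are all hypothetical, and English text is used for readability although the thesis targets Chinese patient complaints.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> entity type.
# The types used here are three of the ten listed in the abstract.
LEXICON: Dict[str, str] = {
    "stomach ache": "Symptom",
    "runny nose": "Symptom",
    "stomach": "Organ",
    "three days": "Time",
}

# Hypothetical lay-term -> medical-term table (illustrative entries only).
LAY_TO_MEDICAL: Dict[str, str] = {
    "stomach ache": "abdominal pain",
    "runny nose": "rhinorrhea",
}


def recognize_entities(text: str) -> List[Tuple[str, str]]:
    """Greedy longest-match dictionary lookup, scanning left to right.

    Matching is on raw substrings (no word boundaries), a simplification
    assumed for this sketch.
    """
    lowered = text.lower()
    terms = sorted(LEXICON, key=len, reverse=True)  # try longer terms first
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(lowered):
        for term in terms:
            if lowered.startswith(term, i):
                found.append((term, LEXICON[term]))
                i += len(term)
                break
        else:
            i += 1  # no lexicon term starts here; advance one character
    return found


def map_to_medical(term: str) -> str:
    """Replace a lay term with its medical term; unknown terms pass through."""
    return LAY_TO_MEDICAL.get(term, term)


def summarize(text: str) -> str:
    """Group mapped entities by type and emit one 'Type: a, b' field per type."""
    grouped: Dict[str, List[str]] = {}
    for term, entity_type in recognize_entities(text):
        mapped = map_to_medical(term)
        bucket = grouped.setdefault(entity_type, [])
        if mapped not in bucket:  # skip repeated mentions
            bucket.append(mapped)
    return "; ".join(
        f"{entity_type}: {', '.join(terms)}" for entity_type, terms in grouped.items()
    )


complaint = "I have had a stomach ache and a runny nose for three days."
print(summarize(complaint))
```

The output is much shorter than the complaint while keeping its medical content, which is the property the abstract reports for the generated summaries. A real system would need a trained recognizer and a curated terminology resource in place of the two toy dictionaries.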
Table of contents ABSTRACT (CHINESE) III
ABSTRACT V
ACKNOWLEDGMENTS VII
TABLE OF CONTENTS VIII
LIST OF TABLES X
LIST OF FIGURES XII
CHAPTER 1 INTRODUCTION 1
1.1 BACKGROUND 1
1.2 MOTIVATION 4
1.3 METHOD 5
1.4 CONTRIBUTION 7
1.5 ORGANIZATION OF THIS DISSERTATION 7
CHAPTER 2 RELATED WORK 8
2.1 MEDICAL ENTITY RECOGNITION 8
2.2 STUDY ON MEDICAL TERMINOLOGY MAPPING 9
2.3 STUDY ON SUMMARY GENERATE 10
CHAPTER 3 METHOD 11
3.1 SYSTEM FRAMEWORK 11
3.2 PREPARATION 12
3.2.1 Medical Subject Heading Term 12
3.2.2 Tools: CKIP Chinese Word Segmentation 13
3.2.3 Datasets and Preprocessing Step 14
3.3 MEDICAL ENTITY RECOGNITION 23
3.3.1 Candidate Entity Generation 24
3.3.2 Medical Entity Decision 36
3.3.3 Semi-Automatic Medical Entity Extraction 37
3.4 MEDICAL TERMINOLOGY MAPPING 38
3.5 MEDICAL SUMMARY GENERATION 42
CHAPTER 4 EXPERIMENTS 44
4.1 EXPERIMENT SETUP 44
4.1.1 Dataset 44
4.1.2 Evaluation Metrics 47
4.2 EXPERIMENT ON SEMI-AUTOMATIC MEDICAL ENTITY EXTRACTION 48
4.3 EXPERIMENT ON MEDICAL ENTITY RECOGNITION MODEL 50
4.4 EXPERIMENT ON MEDICAL ENTITY TERMINOLOGY MAPPING 55
CHAPTER 5 CONCLUSIONS 58
5.1 CONCLUSIONS 58
5.2 FUTURE WORK 59
ACKNOWLEDGMENT 59
REFERENCE 60
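Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized, available from 2021-08-02.
  • Off-campus browsing and printing of the electronic full text is authorized, available from 2021-08-02.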

