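The electronic full text has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holdings status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)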
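System ID: U0026-1707201919544900
Title (Chinese): 利用上下文引用辨識與推薦
Title (English): Context-aware Citation Recognition and Recommendation
University: National Cheng Kung University
Department (Chinese): 資訊管理研究所
Department (English): Institute of Information Management
Academic year: 107 (ROC calendar)
Semester: 2
Publication year: 108 (ROC calendar; 2019)
Student (Chinese): 鄭人瑋
Student (English): Jen-Wei Cheng
Student ID: R76064077
Degree: Master's
Language: Chinese
Pages: 67
Oral defense committee: Advisor: 王惠嘉
Committee member: 高宏宇
Committee member: 劉任修
Committee member: 李偉柏
Keywords (Chinese): 引用推薦; 引用句偵測; 空間向量模型
Keywords (English): Citation recommendation; citation contexts detection; vector space model
Subject classification: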
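Chinese abstract: Citations serve as a bridge for research communication: through the citation relationships among papers, researchers can trace how knowledge in a field has evolved. A citation carries many latent meanings, including its motivation and its comparative purpose, so researchers must read the citation contexts in a paper very carefully to understand them. As the volume of published papers grows rapidly, however, it is hard for today's researchers to review all relevant literature in limited time and to find the most appropriate citations while writing. Some researchers have therefore tried to turn the search for references into an automated procedure, focusing on the research area of citation recommendation.
Citation recommendation aims to automatically recommend appropriate references for a sentence that needs a citation. It is of great help to researchers when writing papers or reviewing literature, and it is an important milestone toward automatic paper generation. However, current citation recommendation models all assume that the input is itself a sentence that needs a citation, whereas in practice an author who has finished a manuscript can hardly tell which positions in it need citations. In addition, in the matching step, previous studies usually treat a paper as a single unit and compare it directly with the sentence, yet the content of a citation context usually represents only a specific part of the cited paper rather than a summary of the whole paper. This study therefore proposes a new citation recommendation model (CiRec). It includes a citation contexts detection process that finds the sentences in the input manuscript that need citations. In the matching process, it compares vector similarity separately against each paper's abstract, its body text, and its citation contexts (in-link contexts), to find the part of the cited document most similar to the citing sentence. Experimental results show that the proposed matching approach reflects how researchers look for references, and that it outperforms current whole-paper matching methods on four metrics over the top 10 recommendations: Recall, MAP, MRR, and NDCG.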
English abstract: As the volume of published papers grows, it is difficult for today's researchers to review all relevant literature in limited time and to find the most appropriate references when writing a paper. Some researchers have therefore begun to automate the process of finding citations, focusing on the research area of citation recommendation.
The purpose of citation recommendation is to automatically recommend appropriate references for a sentence that needs a citation, which is of great help to researchers when writing a thesis or reviewing the literature. However, previous studies assume that the input is already a sentence that needs a citation. In addition, previous studies usually compute the similarity score simply from the full text of a paper, which is inaccurate: the content of a citing sentence usually represents only a specific part of the cited paper rather than a summary of the whole paper.
This study proposes a citation recommendation framework (CiRec). It contains a citation contexts detection process that finds the sentences in the input manuscript that need citations. In the comparison process, each sentence is compared separately against the abstracts, full text, and in-link contexts of the papers, in order to find the part of the cited document most similar to the sentence. The experimental results show that the proposed comparison method reflects how researchers search for citations and performs better than current methods.
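The per-part comparison described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the term-count vectors, the cosine measure, the rule of taking the maximum over the three parts, and all names (`score_paper`, `recommend`, the dictionary keys) are assumptions made for illustration only. The abstract states only that the sentence is compared against the abstract, the full text, and the in-link contexts to find the most similar part.

```python
import math
import re
from collections import Counter

# Hypothetical field names for the three parts of a candidate paper.
PARTS = ("abstract", "full_text", "in_link_contexts")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_paper(sentence, paper):
    """Compare the sentence with each part of the paper separately.

    Returns (best_score, best_part). Taking the maximum over parts is an
    assumed aggregation rule, not one confirmed by the source.
    """
    query = Counter(tokenize(sentence))
    per_part = {
        part: cosine(query, Counter(tokenize(paper.get(part, ""))))
        for part in PARTS
    }
    best_part = max(per_part, key=per_part.get)
    return per_part[best_part], best_part


def recommend(sentence, papers, top_k=10):
    """Rank candidate papers by their best-matching part; return top_k ids."""
    scored = [(score_paper(sentence, p)[0], pid) for pid, p in papers.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:top_k]]
```

Here `papers` is assumed to map a paper id to a dictionary holding the three text parts. A paper whose abstract shares no terms with the sentence can still rank first if one of its in-link contexts matches closely, which is the behavior the abstract attributes to comparing parts separately.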
Table of contents: Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Limitations
1.4 Research Process
1.5 Thesis Outline
Chapter 2 Literature Review
2.1 Citation Context Analysis
2.1.1 Research on and Definition of Citation Contexts
2.1.2 Citation Contexts Detection
2.1.3 Citation Purpose Classification
2.2 Vector Space Model
2.2.1 Word Representation
2.2.2 Document Representation
2.3 Citation Recommendation
2.4 Summary
Chapter 3 Research Method
3.1 Research Framework and Parameters
3.2 Corpus Preprocessing
3.3 Citation Contexts Detection
3.4 Comparison & Recommendation
3.4.1 Computational Complexity
3.4.2 Worked Example
3.5 Summary
Chapter 4 System Implementation and Validation
4.1 System Implementation
4.2 Dataset Selection
4.3 Data Preprocessing
4.3.1 Document Preprocessing
4.3.2 Citation Context Preprocessing
4.4 Training the Citation Contexts Detection Model
4.5 Evaluation Metrics
4.5.1 Recall
4.5.2 Reciprocal Rank (RR)
4.5.3 Average Precision (AP)
4.5.4 Normalized Discounted Cumulative Gain (NDCG)
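4.6 Cohen's kappa coefficient (Cohen, 1960)
4.6.1 Worked Example
4.7 Parameter Tuning
4.8 Comparison Methods
4.9 Experimental Validation
4.10 Discussion of Experimental Results
4.10.1 Experiment 1: Comparison of Citation Recommendation Methods
4.10.2 Experiment 2: Recommendation Performance for Sentences with Different Numbers of Citations
4.10.3 Experiment 3: Appropriateness of Recommended Citations
4.10.4 Experiment 4: Accuracy of Citation Contexts Detection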
Chapter 5 Conclusion
5.1 Research Results
5.2 Future Work
References
References: Abu-Jbara, A., Ezra, J., & Radev, D. (2013, June). Purpose and Polarity of Citation: Towards NLP-based Bibliometrics. Paper presented at the Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia.
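Bertin, M., & Atanassova, I. (2014). A Study of Lexical Distribution in Citation Contexts Through the IMRaD Standard. Paper presented at the Bibliometric-enhanced Information Retrieval Workshop at the European Conference on Information Retrieval, Amsterdam, The Netherlands.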
Bertin, M., Atanassova, I., Gingras, Y., & Larivière, V. (2016). The Invariant Distribution of References in Scientific Articles. Journal of the Association for Information Science and Technology, 67(1), 164-177.
Bertin, M., Atanassova, I., Sugimoto, C., & Larivière, V. (2016). The Linguistic Patterns and Rhetorical Structure of Citation Context: An Approach Using N-grams. Scientometrics, 109(3), 1417-1434.
Bhagavatula, C., Feldman, S., Power, R., & Ammar, W. (2018, June). Content-Based Citation Recommendation. Paper presented at the Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana.
Cao, Q., Duan, W., & Gan, Q. (2011). Exploring Determinants of Voting for the “Helpfulness” of Online User Reviews: A Text Mining Approach. Decision Support Systems, 50(2), 511-521.
Caragea, C., Bulgarov, F., Godea, A., & Gollapalli, S. (2014, October). Citation-enhanced Keyphrase Extraction from Research Papers: A Supervised Approach. Paper presented at the Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1), 37-46.
Duma, D., & Klein, E. (2014). Citation Resolution: A Method for Evaluating Context-based Citation Recommendation Systems. Paper presented at the Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland.
Duma, D., Liakata, M., Clare, A., Ravenscroft, J., & Klein, E. (2016, May). Applying Core Scientific Concepts to Context-Based Citation Recommendation. Paper presented at the LREC.
Duma, D., Sutton, C., & Klein, E. (2016, June). Context Matters: Towards Extracting a Citation's Context Using Linguistic Features. Paper presented at the Digital Libraries (JCDL), 2016 IEEE/ACM Joint Conference on, Newark, NJ, USA.
Ebesu, T., & Fang, Y. (2017, August). Neural Citation Network for Context-aware Citation Recommendation. Paper presented at the Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan.
Färber, M., Thiemann, A., & Jatowt, A. (2018a, May). A High-Quality Gold Standard for Citation-based Tasks. Paper presented at the LREC, Miyazaki, Japan.
Färber, M., Thiemann, A., & Jatowt, A. (2018b, March). To Cite, or Not to Cite? Detecting Citation Contexts in Text. Paper presented at the European Conference on Information Retrieval, Grenoble, France.
Hamedani, M., Kim, S., & Kim, D. (2016). SimCC: A Novel Method to Consider both Content and Citations for Computing Similarity of Scientific Papers. Information Sciences, 334, 273-292.
Han, J., Song, Y., Zhao, W., Shi, S., & Zhang, H. (2018). hyperdoc2vec: Distributed Representations of Hypertext Documents. arXiv preprint arXiv:1805.03793.
Hargens, L. (2000). Using the Literature: Reference Networks, Reference Contexts, and the Social Structure of Scholarship. American sociological review, 846-865.
He, H., & Garcia, E. (2008). Learning from Imbalanced Data. IEEE Transactions on Knowledge & Data Engineering(9), 1263-1284.
He, Q., Kifer, D., Pei, J., Mitra, P., & Giles, C. (2011, February). Citation Recommendation without Author Supervision. Paper presented at the Proceedings of the fourth ACM international conference on Web search and data mining, Hong Kong, China.
He, Q., Pei, J., Kifer, D., Mitra, P., & Giles, L. (2010, April). Context-aware Citation Recommendation. Paper presented at the Proceedings of the 19th international conference on World wide web, Raleigh, North Carolina, USA.
Huang, W., Wu, Z., Chen, L., Mitra, P., & Giles, C. (2015, January). A Neural Probabilistic Model for Context Based Citation Recommendation. Paper presented at the AAAI, Austin, Texas.
Jensen, S., Liu, X., Yu, Y., & Milojevic, S. (2016). Generation of Topic Evolution Trees from Heterogeneous Bibliographic Networks. Journal of informetrics, 10(2), 606-621.
Jinha, A. (2010). Article 50 Million: An Estimate of the Number of Scholarly Articles in Existence. Learned Publishing, 23(3), 258-263.
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of Tricks for Efficient Text Classification. arXiv preprint arXiv:1607.01759.
Kataria, S., Mitra, P., & Bhatia, S. (2010, July). Utilizing Context in Generative Bayesian Models for Linked Corpus. Paper presented at the AAAI, Georgia, USA.
Khalid, A., Khan, F., Imran, M., Alharbi, M., Khan, M., Ahmad, A., & Jeon, G. (2018). Reference Terms Identification of Cited Articles as Topics from Citation Contexts. Computers & Electrical Engineering.
Kobayashi, Y., Shimbo, M., & Matsumoto, Y. (2018, June). Citation Recommendation Using Distributed Representation of Discourse Facets in Scientific Articles. Paper presented at the Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, Fort Worth, Texas, USA.
Kusner, M., Sun, Y., Kolkin, N., & Weinberger, K. (2015, July). From Word Embeddings to Document Distances. Paper presented at the International Conference on Machine Learning, Lille, France.
Landis, J., & Koch, G. (1977). The Measurement of Observer Agreement for Categorical Data. Biometrics, 33(1), 159-174.
Lauscher, A., Glavaš, G., Ponzetto, S., & Eckert, K. (2017, December). Investigating Convolutional Networks and Domain-specific Embeddings for Semantic Classification of Citations. Paper presented at the Proceedings of the 6th International Workshop on Mining Scientific Publications, New York, NY, USA.
Le, Q., & Mikolov, T. (2014, June). Distributed Representations of Sentences and Documents. Paper presented at the International Conference on Machine Learning, Beijing, China.
Bornmann, L., & Mutz, R. (2015). Growth Rates of Modern Science: A Bibliometric Analysis Based on the Number of Publications and Cited References. Journal of the Association for Information Science and Technology, 66(11), 2215-2222.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013, December). Distributed Representations of Words and Phrases and Their Compositionality. Paper presented at the Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, USA.
Peirsman, Y. (2018). Comparing Sentence Similarity Methods. Retrieved from http://nlp.town/blog/sentence-similarity/
Pennington, J., Socher, R., & Manning, C. (2014, October). Glove: Global Vectors for Word Representation. Paper presented at the Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar.
Rubner, Y., Tomasi, C., & Guibas, L. (2000). The Earth Mover's Distance as a Metric for Image Retrieval. International journal of computer vision, 40(2), 99-121.
Teufel, S., Siddharthan, A., & Tidhar, D. (2006, July). Automatic Classification of Citation Function. Paper presented at the Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, Australia.
Son, J., & Kim, S. (2018). Academic Paper Recommender System Using Multilevel Simultaneous Citation Networks. Decision Support Systems, 105, 24-33.
Sugiyama, K., Kumar, T., Kan, M., & Tripathi, R. (2010, March). Identifying Citing Sentences in Research Papers Using Supervised Learning. Paper presented at the 2010 International Conference on Information Retrieval & Knowledge Management (CAMP), Shah Alam, Selangor, Malaysia.
Tan, J., Wan, X., & Xiao, J. (2016, October). A Neural Network Approach to Quote Recommendation in Writings. Paper presented at the Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, Indiana, USA.
Tang, J., & Zhang, J. (2009, April). A Discriminative Approach to Topic-based Citation Recommendation. Paper presented at the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Bangkok, Thailand.
Wang, P., Xu, B., Xu, J., Tian, G., Liu, C., & Hao, H. (2016). Semantic Expansion Using Word Embedding Clustering and Convolutional Neural Network for Improving Short Text Classification. Neurocomputing, 174, 806-814.
Yousif, A., Niu, Z., & Nyamawe, A. (2018, August). Citation Classification Using Multitask Convolutional Neural Network Model. Paper presented at the International Conference on Knowledge Science, Engineering and Management, Changchun, China.
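Full-text usage rights
  • Authorized for on-campus viewing/printing of the electronic full text, publicly available from 2024-07-26.
  • Authorized for off-campus viewing/printing of the electronic full text, publicly available from 2024-07-26.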

