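System ID U0026-0812200915040080
Title (Chinese) 以PageRank演算法分析閱讀行為以達成數位文章摘要
Title (English) PageRank based Reading Pattern Analysis for eDocument Summarization
Institution National Cheng Kung University
Department (Chinese) 工程科學系碩博士班
Department (English) Department of Engineering Science
Academic year 97 (ROC calendar; 2008)
Semester 1
Year of publication 98 (ROC calendar; 2009)
Author (Chinese) 郭彥宏
Author (English) Yen-Hung Kuo
Student ID n9894134
Degree Doctoral
Language English
Pages 73
Oral defense committee Committee member: 黃國禎
Advisor: 黃悅民
Committee member: 楊鎮華
Committee member: 陳年興
Committee member: 陳俊良
Committee member: 楊叔卿
Committee member: 游寶達
Committee member: 張國恩
Keywords (Chinese) 閱讀式樣圖  PageRank  低連結性  數位文章摘要  閱讀行為
Keywords (English) Reading Pattern Graph (RPG)  Reading behavior  PageRank  Low connectivity  eDocument summarization
Subject classification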
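Abstract (translated from Chinese) Compared with twenty years ago, people today read far more selectively. To obtain more information from reading, they usually spend more time on important content and skim the rest. This change in behavior inspired the author to develop a method that extracts the important segments of an eDocument by analyzing people’s reading activities. To analyze reading activities efficiently, the author catalogs a set of reading behaviors and uses them to represent a reading activity as a reading pattern graph (RPG), in which vertices are segments of the eDocument and edges are reading paths. In the proposed method, the PageRank algorithm computes the importance of each vertex in every RPG; the importance of each segment is then averaged over all RPGs to obtain a synthetic ranking, and the top-ranked entries of that ranking represent the important segments of the eDocument. However, the PageRank algorithm may overrate the importance of some segments and thereby reduce summarization accuracy. The study first identifies two likely causes of the overrating problem: (1) some eDocuments are unlikely to cause memory difficulty or selective reading, and (2) a reading activity can be viewed as a thread running back and forth over an eDocument. These two causes produce low connectivity among the vertices of an RPG, and this low connectivity is the main cause of overrating. Based on these two causes, three treatments are proposed and tested: (1) including the forward checking and reverse backtracking behaviors in the RPG, (2) using the Site-Ranking algorithm, and (3) using the AggregateRPG algorithm. The experimental results indicate that no single treatment substantially improves summarization under the experimental settings, but the overrating problem can be resolved by applying all three treatments together. In addition, the AggregateRPG-based summarization process greatly reduces the time needed for RPG-based eDocument summarization. Finally, the author recommends using all the treatments together with RPG-based eDocument summarization for future summarization tasks.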
Abstract (English) People’s reading behavior has become more selective than it was two decades ago. To acquire more information from eDocuments, people usually concentrate on significant content while skimming the remainder. This behavioral change motivated the author to develop an RPG-based eDocument summarization approach that extracts significant segments from an eDocument by analyzing people’s reading activities. To analyze reading activity efficiently, a set of reading behaviors is categorized and used in this study to model a reading activity as a reading pattern graph (RPG), in which vertices are segments of an eDocument and edges are reading paths. For each RPG, the PageRank algorithm is applied to rank its vertices, and all ranking results are then aggregated into a synthetic ranking result. Consequently, the significant segments of an eDocument can be found in the top portion of the synthetic ranking result. However, some segments of an eDocument may be overrated by the PageRank analysis. In this work, two potential causes of the overrating problem are identified: (1) some eDocuments are unlikely to cause memory difficulty or selective reading, and (2) a reading activity can be treated as a thread traversing an eDocument. Both causes give an RPG’s vertices low connectivity, and this low connectivity is the main cause of the overrating problem. Based on these two causes, three treatments are introduced and tested: (1) adding the forward checking (FC) and reverse backtracking (RBT) behaviors to an RPG, (2) using the Site-Ranking algorithm, and (3) using the AggregateRPG algorithm. The experimental results indicate that no single treatment yields a substantial improvement under the testing conditions. Nevertheless, the overrating problem can be properly dealt with by using all three treatments simultaneously. In addition, the AggregateRPG-based process can greatly reduce the time required for RPG-based eDocument summarization. Finally, the author recommends using all the treatments together in future RPG-based eDocument summarization tasks.
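The basic pipeline described in the abstract (rank each RPG with PageRank, average the per-segment scores, take the top of the combined ranking) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the edge-list representation of an RPG, the damping factor of 0.85, the uniform handling of segments without outgoing paths, and the names `pagerank` and `summarize` are all assumptions made for illustration, and none of the three treatments (FC/RBT edges, Site-Ranking, AggregateRPG) is modeled.

```python
def pagerank(num_segments, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over one reading pattern graph (RPG).

    edges is a list of (src, dst) reading paths between segment indices.
    Assumption: a segment with no outgoing path spreads its rank uniformly.
    """
    out_links = {v: [] for v in range(num_segments)}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = [1.0 / num_segments] * num_segments
    for _ in range(iterations):
        # Rank held by segments without outgoing paths is shared by all.
        dangling = sum(rank[v] for v, targets in out_links.items() if not targets)
        base = (1.0 - damping + damping * dangling) / num_segments
        new_rank = [base] * num_segments
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def summarize(num_segments, rpgs, top_k):
    """Average PageRank scores over all RPGs and return the top_k segments."""
    scores = [pagerank(num_segments, edges) for edges in rpgs]
    mean = [sum(s[v] for s in scores) / len(scores) for v in range(num_segments)]
    ranked = sorted(range(num_segments), key=lambda v: mean[v], reverse=True)
    return ranked[:top_k], mean


# Hypothetical reading activities of two readers over a 5-segment eDocument.
reader_a = [(0, 1), (1, 2), (2, 3), (3, 2), (2, 4)]
reader_b = [(0, 2), (2, 1), (1, 2), (2, 4), (4, 2)]
top_segments, mean_scores = summarize(5, [reader_a, reader_b], top_k=2)
print(top_segments)
```

In this toy data both readers keep returning to segment 2, so it receives the highest averaged score. A reading activity that only moves straight through the document would give a chain-shaped graph, which is the low-connectivity case the abstract identifies as the source of overrating.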
Table of contents Chapter 1 Introduction
Chapter 2 Research Background
2.1. The PageRank analysis
2.2. Reading behaviors
Chapter 3 Construction of Reading Pattern Graph
3.1. Notation definitions
3.2. Mapping reading behaviors to the RPG
Chapter 4 Summarizing eDocument by PageRank based Analysis
4.1. The RPG analysis by the PageRank algorithm
4.2. Aggregating of ranking results
4.3. Evaluation
4.3.1. Experimental process
4.3.2. Measurements
4.3.3. Results
4.4. Discussions
Chapter 5 Improving the Effect and Efficiency of the RPG based eDocument Summarization
5.1. The Site-Ranking algorithm
5.2. Aggregating of RPGs
5.3. Evaluations
5.3.1. Experiment I – Evaluating effectiveness of proposed treatments
5.3.2. Experiment II – Evaluation of eDocument summarization efficiency
5.4. Discussions
Chapter 6 Conclusions and Future Works
References
Appendix A Questionnaire for Surveying Backgrounds of Examinee
Appendix B Questionnaire for Surveying Experiences of Examinee
Vita
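Full-text access rights
  • Authorized for on-campus browsing/printing of the electronic full text, open from 2011-01-06.
  • Authorized for off-campus browsing/printing of the electronic full text, open from 2014-01-06.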

