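   The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(※ If the search returns nothing, or the holdings status shows 「閉架不公開」 (closed stacks, not public), the thesis is not in the stacks and cannot be retrieved.)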
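System ID U0026-2904201317581100
Title (Chinese) 數位資源之脈絡資訊探索及選用決策模式
Title (English) Contextual Information Exploration and Decision Model of Digital Document Resources
Institution National Cheng Kung University
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic year 101 (ROC calendar)
Semester 2
Year of publication 102 (ROC calendar, i.e. 2013)
Author (Chinese) 郭俊良
Author (English) Jiunn-Liang Guo
Student ID R78951018
Degree Doctoral
Language English
Pages 71
Oral examination committee Advisor: 王惠嘉
Committee member: 李健興
Committee member: 陳偉凡
Committee member: 高宏宇
Committee member: 林懿貞
Committee member: 劉任修
Keywords (translated from Chinese) contextual information  semantic analysis  discourse analysis  weighted PageRank
Keywords (English) Contextual information  semantics  discourse analysis  weighted PageRank
Subject classification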
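Abstract (translated from Chinese) As humanity has entered the information age, the digitization of documents has gradually changed how information is retrieved and has made knowledge easier to obtain. As online resources keep accumulating, however, they have grown into massive volumes of digital data, which indirectly creates many resource management problems, such as difficulty in finding key information, harder automated document processing, and declining resource management efficiency. In recent years, many researchers have therefore entered this field, hoping to apply techniques such as Natural Language Processing, Text Mining, and Information Retrieval to analyze digital document resources from different angles and to propose more efficient ways of using and managing them.
Research on digital document resources covers a wide scope, within which document content analysis and the assessment of document resource importance have been major topics in recent years. For content analysis, most scholars statistically analyze the frequency and characteristics of the words appearing in documents. However, for any document resource, both the text content and the selection behavior have many facets. Examining importance through wording alone cannot probe a document's meaning in depth, and it also overlooks important characteristics implied by the coherence of meaning across the text structure or by the contextual relations between preceding and following passages, which limits future applications of the analysis results. Meanwhile, the assessment of document resource importance has also received wide attention. In decision assessment for selecting local academic resources (such as electronic journals), the hyperlinks provided by web-based online systems indirectly guide researchers to important information when they consult related extended material. This characteristic, together with the decision process by which research papers cite references, implies many important contextual relations that deserve attention.
Research on contextual information in information retrieval can be broadly divided into two directions. The first focuses on contextual features within the document text, while the second examines how digital resources are used. This dissertation proposes an exploratory research design for both. The method proposed in the first part uses semantics-aware discourse analysis to examine the coherence of the text's context and its semantic shifts, thereby determining the discourse segments of the full text. An improved feature selection method then selects the important features implied in those segments, the discourse subtopics, to form a feature set, and the method's effectiveness is verified through automated document classification experiments. The second part examines usage patterns of academic document resources and builds an evaluation decision model for core journals. It uses the proposed weighted PageRank method to examine the contextual relations in the access behavior of digital document resources (such as electronic journals) and combines them with researchers' citation information to construct a local electronic journal evaluation indicator, the Local Impact Factor (LIF). The indicator is intended as a decision reference for resource users citing academic material and for library and information managers purchasing electronic journal databases.
The experimental results of this research show that document content and the selection behavior for document resources do contain important contextual information. The proposed methods demonstrate that contextual information can be extracted and then applied to improve automated document classification and to inform decisions when evaluating important digital resources.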
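This record does not give the thesis's actual segmentation procedure, so the following is only a minimal sketch of the general idea of finding discourse segments from breaks in textual continuity. It assumes a TextTiling-style comparison of adjacent sentence blocks using cosine similarity over raw term counts, with no semantic weighting. The function names, the `block_size` and `threshold` parameters, and the sample sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, block_size=2):
    """Similarity between the sentence blocks on each side of every gap."""
    bags = [Counter(tokenize(s)) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        sims.append(cosine(left, right))
    return sims


def segment(sentences, block_size=2, threshold=0.1):
    """Start a new segment wherever block similarity falls below threshold."""
    if not sentences:
        return []
    sims = gap_similarities(sentences, block_size)
    segments, current = [], [sentences[0]]
    for sentence, sim in zip(sentences[1:], sims):
        if sim < threshold:  # low continuity: treat the gap as a boundary
            segments.append(current)
            current = []
        current.append(sentence)
    segments.append(current)
    return segments


if __name__ == "__main__":
    # Hypothetical sample text with one obvious topic shift.
    sample = [
        "The cat sat on the mat.",
        "The cat chased the mouse.",
        "Stock markets fell sharply today.",
        "Investors sold stock as markets fell.",
    ]
    for seg in segment(sample):
        print(seg)
```

A fixed threshold is the simplest possible boundary rule; a real system would more likely look at relative dips in the similarity curve and use semantically informed term weights, as the abstract suggests.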
Abstract (English) Digital document resources carry implicit contextual information, which raises many research challenges in information retrieval. Such information resides either in the discourse context of a document or in the access patterns of web-based resources, and its presence calls for a deeper investigation of how contextual attributes can benefit information processing applications. Within document content, contextual information is believed to reside in the discourse segments of the text, which have long been difficult to identify because document structures are so diverse. Contextual information arising from access to web resources is even harder to explore, because it involves unpredictable human behavior and varied background knowledge. Monitoring the user's decision-making process is further complicated because resource usage cannot be traced. Nevertheless, contextual information has long been regarded as an important pattern and as a critical factor in improving the performance of information processing.
This work proposes two novel approaches to exploring contextual information, one at the textual level and one in web access: the first adopts discourse structure analysis, and the second designs a core decision model. For textual resources, the study designs a framework that detects context by analyzing discourse structure, addressing the shifts and continuity of coherent subtopics and also exploiting syntactic attributes, which can enhance text classification performance. To validate it, the first model is applied to an e-book classification task to test the contribution of the extracted contextual information. For web access, the second study focuses on local access to a digital library and proposes a novel system, the Local Impact Factor (LIF), to evaluate and rank the importance of digital resources. The system examines the requirements of the local user community by incorporating both the access rate of adopted journals and the weighted impact factor technique, capturing the contextual information that links resource usage to thesis citations. By measuring citation information from local users' articles, it helps reveal the relationship between resource downloads and actual citation decisions.
Both studies are fully implemented and tested on two real-world datasets through a series of integrated experiments. The evaluations demonstrate the vital role of contextual information in both textual and web resources and show a significant improvement in performance. The proposed methods are also shown to be feasible and beneficial for future information processing applications.
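The abstract names weighted PageRank as the basis for ranking journals by access behavior but does not give the formulation, so the following is a minimal sketch of a generic edge-weighted PageRank computed by power iteration. It is not necessarily the variant used in the thesis, and it does not attempt to compute the LIF itself. The assumption that edge weights are counts of users moving from one journal to another, and the journal names and counts in the example, are hypothetical.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration on a weighted directed graph.

    `weights` maps (source, target) pairs to non-negative edge weights.
    A node passes its rank to its targets in proportion to edge weight.
    """
    nodes = sorted({node for edge in weights for node in edge})
    out_total = {node: 0.0 for node in nodes}
    for (src, _dst), w in weights.items():
        out_total[src] += w

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_total[node] == 0)
        base = (1 - damping) / n + damping * dangling / n if n else 0.0
        new_rank = {node: base for node in nodes}
        for (src, dst), w in weights.items():
            if out_total[src] > 0:
                new_rank[dst] += damping * rank[src] * w / out_total[src]
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical counts of users moving from one journal's pages to another's.
    transitions = {
        ("Journal A", "Journal B"): 3,
        ("Journal A", "Journal C"): 1,
        ("Journal B", "Journal A"): 2,
        ("Journal C", "Journal A"): 1,
        ("Journal C", "Journal B"): 1,
    }
    scores = weighted_pagerank(transitions)
    for journal, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{journal}: {score:.4f}")
```

Under the thesis's description, scores like these would still need to be combined with citation counts from local users' publications before they could serve as a local impact indicator.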
Table of contents Chinese abstract
Abstract
Acknowledgements
Chapter 1 Introduction
1.1 Research background and motivation
1.2 Research purposes
1.3 Research procedure
Chapter 2 Literature Review
2.1 Discourse analysis for text contextual information
2.2 Subtopics in discourse context
2.3 Literature on evaluating digital journal resources
2.4 Contextual analysis of web-based resources
Chapter 3 Research Methodology and Framework
3.1 Research framework for study 1
3.1.1 Detection of discourse contextual information
3.1.2 Block similarity measurement in text continuity
3.1.3 Discourse segment identification
3.1.4 Subtopic feature selection from discourse segments
3.1.5 Classification of digital document resources
3.2 Research framework for study 2
3.2.1 Citation analysis module
3.2.2 Local impact factor evaluation module
3.2.3 Academic journal recommendation module
Chapter 4 Results of Evaluations and Discussions
4.1 Evaluation of study 1 – contextual information in text
4.1.1 Dataset collection
4.1.2 Evaluation metrics
4.1.3 Experiments of study 1
4.2 Evaluation of study 2 – context in access of digital resources
4.2.1 Experimental environment and data collection
4.2.2 Experiments of study 2
4.2.3 Findings and discussions
Chapter 5 Conclusions
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, available from 2018-05-06.

