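The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(If the thesis cannot be found, or its holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be accessed.)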
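System ID: U0026-2401201300015300
Thesis title (Chinese): 結合階層式知識結構之文本分析 (Text Analysis Incorporating Hierarchical Knowledge Structures)
Thesis title (English): Conceptual Text Mining with Hierarchical Knowledge Structures
University: National Cheng Kung University
Department (Chinese): 資訊管理研究所
Department (English): Institute of Information Management
Academic year: 101 (ROC calendar, i.e., 2012-2013)
Semester: 1
Publication year: 102 (ROC calendar, i.e., 2013)
Student (Chinese name): 蔡馥璟
Student (English name): Fu-Ching Tsai
Student ID: r78951042
Degree: Ph.D.
Language: English
Pages: 55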
Oral defense committee:
Committee member: 王朝煌
Advisor: 李昇暾
Committee member: 林清河
Committee member: 耿伯文
Committee member: 陳靜枝
Committee member: 魏志平
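Keywords (Chinese, translated): conceptual analysis, fuzzy formal concept analysis, knowledge structure, text mining, information retrieval, document classification
Keywords (English): Formal concept analysis, Fuzzy formal concept analysis, Knowledge structure, Text mining, Information retrieval, Document classification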
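Abstract (Chinese, in English translation): With the rise of the Internet, digital documents have become the primary medium for storing knowledge, and text mining is now widely used to analyze large document collections quickly and automatically. However, text is inherently full of ambiguous words (for example, polysemous words and antonyms) and is easily shaped by an author's subjective writing style. Moreover, much of today's Web 2.0 content is user-generated, so its writing quality is inconsistent and it is less structured, which makes document analysis harder. To address this problem, many studies have shown that knowledge structures can organize large collections of knowledge documents systematically and can reveal hidden relations among keywords from the way the knowledge is composed, thereby improving the accuracy of document retrieval and classification. This study therefore proposes a conceptual text mining framework built on two hierarchical knowledge structures, lattices and trees. The framework analyzes the hierarchical relations among keywords, extracts the context of documents from them, and combines the result with current text mining techniques to address the problems that text analysis currently faces.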
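The proposed lattice-based concept analysis method uses formal concepts to support document classification. Formal concepts abstract the relations between keywords and categories and provide better noise control. On the Reuters-21578 dataset, the proposed method achieved superior classification accuracy. On the Web 2.0 datasets, because the method suppresses noisy terms, the experimental data confirm that this document abstraction yields better classification accuracy for user-generated Web 2.0 content. A parameter analysis further shows that the lattice-based method has a specific parameter combination suited to each type of dataset, so it is also suitable for cross-domain document analysis.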
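A toy illustration may help make "formal concepts" concrete. The Python sketch below is not the thesis's implementation; it assumes a fuzzy document-term context whose memberships are term weights in [0, 1] (for instance, normalized TF-IDF), turns it into a crisp context with an alpha-cut (one common way of handling fuzzy contexts), and then enumerates every formal concept (A, B), where A is a set of documents and B is exactly the set of terms they all share. The `alpha` value, the sample documents, and all function names are hypothetical.

```python
from itertools import combinations

def alpha_cut(fuzzy_context, alpha):
    """Crisp context: a document keeps a term only if its membership >= alpha."""
    return {doc: {t for t, mu in terms.items() if mu >= alpha}
            for doc, terms in fuzzy_context.items()}

def common_terms(context, docs):
    """Derivation A': terms shared by every document in `docs`."""
    shared = set().union(*context.values())  # start from all terms in the context
    for d in docs:
        shared &= context[d]
    return frozenset(shared)

def docs_with(context, terms):
    """Derivation B': documents that contain every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)

def formal_concepts(context):
    """All pairs (A, B) with A' = B and B' = A, found by closing every
    subset of documents (exponential in the number of documents)."""
    concepts = set()
    docs = sorted(context)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            intent = common_terms(context, subset)
            concepts.add((docs_with(context, intent), intent))
    return concepts

if __name__ == "__main__":
    # Hypothetical term memberships, e.g. normalized TF-IDF weights.
    fuzzy = {
        "d1": {"oil": 0.9, "price": 0.7, "crude": 0.2},
        "d2": {"oil": 0.8, "export": 0.6},
        "d3": {"price": 0.9, "export": 0.4},
    }
    crisp = alpha_cut(fuzzy, alpha=0.5)
    for extent, intent in sorted(formal_concepts(crisp),
                                 key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(extent), sorted(intent))
```

Closing every subset of documents is only meant to make the definitions concrete; the number of concepts in a lattice can itself grow exponentially with the size of the context, which is the kind of cost that makes lattice-based methods hard to scale.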
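Because the lattice knowledge structure has high computational complexity, the method is less suitable for large document collections. This study therefore also proposes a new tree-based knowledge structure. Its hierarchy and single inheritance help reduce complexity and improve users' perception and understanding of domain knowledge. Experimental results show that the proposed tree structure can build high-quality hierarchical relations from unstructured knowledge documents, and validation against the document classification scheme of the ACM Computing Classification System (ACM CCS) shows a high degree of consistency between the two structures. The context of documents can also be extracted through the tree-based knowledge structure, yielding accurate degrees of association and similarity between keywords; combining these with document classification techniques achieves higher classification accuracy.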
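This record does not describe how the tree-based knowledge structure is actually built, so the following Python sketch swaps in a simple co-occurrence subsumption heuristic purely to show what a single-inheritance term hierarchy looks like. A term x may become the parent of a term y when P(x | y) is high and x occurs in more documents than y; each term then keeps exactly one parent. The 0.8 threshold, the tie-breaking rule, the `<root>` node, the sample corpus, and the function names are all assumptions, not the thesis's TKS construction algorithm.

```python
from collections import defaultdict

ROOT = "<root>"  # hypothetical artificial root node

def build_term_tree(docs, threshold=0.8):
    """Return {term: parent} for a single-inheritance term tree.

    Term x may parent term y when P(x | y) >= threshold and x occurs in
    strictly more documents than y (so no cycles can form). Each y keeps
    one parent: the candidate with the highest P(x | y), ties going to the
    more specific (less frequent) term. Terms with no candidate go under ROOT.
    """
    df = defaultdict(int)   # document frequency per term
    co = defaultdict(int)   # number of documents containing both terms
    for terms in docs:
        for x in terms:
            df[x] += 1
            for y in terms:
                if x != y:
                    co[(x, y)] += 1

    parent = {}
    for y in df:
        best, best_key = ROOT, None
        for x in df:
            if df[x] <= df[y]:
                continue              # parent must be more general; also skips x == y
            p = co[(x, y)] / df[y]    # P(x | y)
            key = (p, -df[x], x)      # prefer high P(x | y), then more specific x
            if p >= threshold and (best_key is None or key > best_key):
                best, best_key = x, key
        parent[y] = best
    return parent

def children(parent):
    """Invert {child: parent} into {parent: sorted list of children}."""
    tree = defaultdict(list)
    for child, par in parent.items():
        tree[par].append(child)
    return {par: sorted(kids) for par, kids in tree.items()}

if __name__ == "__main__":
    corpus = [  # hypothetical tokenized documents
        {"computing", "software", "compiler"},
        {"computing", "software", "testing"},
        {"computing", "hardware"},
        {"computing", "hardware", "circuit"},
        {"computing"},
    ]
    print(children(build_term_tree(corpus)))
```

Requiring a parent to occur in strictly more documents than its child makes every parent more general than its children and rules out cycles, so the result stays a tree rather than a lattice.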
Abstract (English): Text mining is a critical technique for managing huge collections of documents. However, most existing text mining algorithms are easily affected by ambiguous terms, so a classifier's ability to disambiguate is as important as its ability to classify accurately. Knowledge structures (KS) have proven effective in discovering the hidden structural relations and implications of knowledge, yielding significant reasoning patterns that enhance the efficiency of text analysis. In this research, we propose a conceptual text mining framework based on two hierarchical KS models, lattice and tree, to examine the effectiveness of incorporating hierarchical KS for retrieving context from a corpus in text mining tasks.
The first model uses fuzzy formal concept analysis to conceptualize documents into a more abstract form, namely concepts, and uses these concepts as training examples to alleviate the arbitrary outcomes caused by ambiguous terms. The model is evaluated on a benchmark testbed (Reuters-21578) and two opinion polarity datasets, and the experimental results show superior performance on all of them. Applying concept analysis to opinion polarity classification is a pioneering effort in disambiguating Web 2.0 content, and the approach offers significant improvements over current methods. The results also show that the model is less sensitive to noise and adapts well to cross-domain applications.

However, the lattice-based model suffers from high computational complexity, which limits its ability to handle big data. To address this issue, we propose a new approach that constructs a tree-based KS from a corpus; the tree reveals the significant relations among knowledge objects and provides concise entity relations that avoid computational overload. The effectiveness of this second model is demonstrated on two representative public datasets. The evaluation results show that the method achieves remarkable consistency with the domain-specific knowledge structure and reflects appropriate similarities among knowledge objects, along with their hierarchical implications, in the document classification task.
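How a tree can "reflect appropriate similarities among knowledge objects" can be illustrated with a standard depth-based measure. The Python sketch below uses Wu-Palmer similarity, which scores two nodes by twice the depth of their lowest common ancestor divided by the sum of their own depths. It is a swapped-in, well-known measure rather than necessarily the one used in the thesis, and the sample tree and function names are hypothetical.

```python
def path_to_root(node, parent):
    """Return [node, parent(node), ..., root] from a {child: parent} map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def depth(node, parent):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node, parent))

def wu_palmer(a, b, parent):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b, parent))
    # Lowest common ancestor: first node on a's root path that is also on b's.
    lcs = next(n for n in path_to_root(a, parent) if n in ancestors_b)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))

if __name__ == "__main__":
    # Hypothetical tree given as {child: parent}; "<root>" is the single root.
    tree = {
        "computing": "<root>",
        "software": "computing", "hardware": "computing",
        "compiler": "software", "testing": "software",
        "circuit": "hardware",
    }
    print(wu_palmer("compiler", "testing", tree))  # → 0.75 (siblings under "software")
    print(wu_palmer("compiler", "circuit", tree))  # → 0.5 (only share "computing")
```

Because every node has exactly one parent, the lowest common ancestor is found by walking a single path to the root; the sketch assumes both nodes lie in the same tree, otherwise `next` raises `StopIteration`.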
Table of contents:
CHINESE ABSTRACT
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
1.1 RESEARCH BACKGROUND AND MOTIVATION
1.2 RESEARCH OBJECTIVES
1.3 RESEARCH PROCESSES
1.4 ORGANIZATION OF DISSERTATION
CHAPTER 2. LITERATURE REVIEW
2.1 KNOWLEDGE REPRESENTATION
2.2 HIERARCHICAL KNOWLEDGE STRUCTURES
2.3 LATTICE-BASED KS
2.3.1 Formal Concept Analysis
2.3.2 Fuzzy Formal Concept Analysis
2.3.3 FCA Applications in Information Retrieval
2.4 TREE-BASED KS
CHAPTER 3. FUZZY FCA-BASED CONCEPTUALIZATION MODEL
3.1 RESEARCH ARCHITECTURE
3.1.1 Data Preprocessing Phase
3.1.2 Concept Reasoning Phase
3.1.3 Collaborative Recommendation Phase
3.2 EXPERIMENTS AND ANALYSIS
3.2.1 Data Collection
3.2.2 Evaluation of Accuracy
3.3 DISCUSSION
3.3.1 Analysis of Conceptualization
3.3.2 Analysis of Cross-Domain Applications
CHAPTER 4. TREE-BASED CONCEPTUALIZATION MODEL
4.1 FUNDAMENTAL DEFINITIONS OF TKS
4.1.1 Hierarchical Features
4.1.2 Sibling Independence
4.2 CONSTRUCTION OF TREE-BASED KNOWLEDGE STRUCTURES
4.2.1 Knowledge Codification from Text Corpus
4.2.2 Similarity Refinement
4.2.3 TKS Construction Algorithm
4.3 EXPERIMENTAL DESIGN AND ANALYSIS
4.3.1 Evaluation of the Gold Standards
4.3.2 Evaluation of Document Classification
CHAPTER 5. CONCLUSIONS AND FUTURE WORK
5.1 SUMMARY
5.2 LIMITATIONS
5.3 FUTURE WORK
REFERENCES
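Full-text access rights:
  • On-campus browsing and printing of the electronic full text is authorized; the full text is publicly available from 2023-12-18.

  • For questions, please contact the library.
    Phone: (06)2757575#65773
    Email: etds@email.ncku.edu.tw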