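The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holding status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)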
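System ID U0026-0507201220230000
Title (Chinese) 混合式學術會議資訊分類法
Title (English) A Hybrid Classification Method for Conference Information
University National Cheng Kung University
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic year 100 (ROC calendar)
Semester 2
Year of publication 101 (ROC calendar; 2012)
Author (Chinese) 林柏安
Author (English) Po-An Lin
Student ID R76004069
Degree Master's
Language Chinese
Pages 74
Oral defense committee Advisor: 王惠嘉
Committee member: 李昇暾
Committee member: 盧文祥
Committee member: 高宏宇
Keywords (Chinese) 文字探勘  特徵選取  SVM  混合式分類
Keywords (English) text mining  feature selection  SVM  hybrid classification
Subject classification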
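Abstract (Chinese) With the rapid development of information technology, many researchers search online for academic conference information and choose suitable conferences to attend, in order to learn about new research topics and exchange ideas with other researchers. Although a few websites provide some conference information, these sites rely on passively submitted entries, so many conferences cannot be found on them and the information a user actually needs is hard to retrieve. Filtering the results manually is also time-consuming and laborious. Helping researchers pick out suitable conferences from a large volume of conference information is therefore an important problem.
To let researchers find suitable conference information quickly, this study uses text mining techniques to filter and classify conference information, aiming to collect complete information and, through classification, make it easy for users to find conferences that match their interests. Traditional classification algorithms in the literature, such as Support Vector Machine, Decision Tree, and Naïve Bayes Classifier, were not designed specifically for academic conference information, and classifying with conventional text-based methods alone may produce misclassifications. The goal of this study is therefore to design a classification algorithm for academic conference information.
Academic conference information often contains domain-specific terminology, and most such terms consist of two words. This is taken into account when analyzing term importance, and a feature selection method suited to academic conference data is proposed. In addition, different classification algorithms each have their own strengths and weaknesses, so this study adopts a hybrid classification approach, expecting that integrating traditional classification algorithms will yield better classification performance. The proposed conference classification algorithm is intended to let researchers find suitable conferences quickly and accurately.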
Abstract (English) Many researchers want to follow the latest research topics and exchange information with others. They search the Internet for academic conference information and choose some conferences to attend. Some websites provide part of this conference information, but most of them cannot help users find the information they really want; moreover, filtering the retrieved conference information by hand is hard work. Hence, helping researchers find suitable conferences to attend within a huge dataset is an important issue.
To find suitable conference information efficiently for researchers, this study classifies conferences using text mining. Traditional classification algorithms in the literature, such as Decision Tree, Naïve Bayes Classifier, and Support Vector Machine, were not designed to classify conference information documents, so applying them to these academic documents may produce incorrect results. Therefore, the goal of this study is to design a classification algorithm for conference information.
Conference information contains many technical terms and phrases that consist of two words, so this should be taken into consideration when analyzing the importance of terms. Moreover, existing classification algorithms each have pros and cons, so hybrid classification is adopted to integrate the traditional algorithms. We expect the new method designed for conference information to help researchers find suitable conferences efficiently and accurately.
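The two ideas in the abstract's last paragraph can be illustrated with a minimal Python sketch. This record does not describe the thesis's actual feature weighting or its rule for combining classifiers, so everything below is an assumption made for illustration: the function names are hypothetical, two-word terms are approximated by adjacent word pairs, terms are kept by plain document frequency, and the "hybrid" step is shown as simple majority voting over labels that separate classifiers might return.

```python
from collections import Counter


def terms_with_bigrams(text):
    """Return single words plus adjacent two-word phrases from a text."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def select_by_document_frequency(docs, min_df=2):
    """Keep terms that appear in at least `min_df` documents."""
    df = Counter()
    for doc in docs:
        # A set counts each term once per document, not once per occurrence.
        df.update(set(terms_with_bigrams(doc)))
    return {term for term, count in df.items() if count >= min_df}


def majority_vote(predictions):
    """Combine labels from several classifiers; ties go to the earliest label."""
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


# Toy conference announcements (invented for this example).
docs = [
    "call for papers on text mining",
    "text mining and feature selection workshop",
    "database systems conference",
]

features = select_by_document_frequency(docs)
print(sorted(features))

# Labels that three hypothetical classifiers (e.g. SVM, Naive Bayes,
# Decision Tree) might assign to one announcement.
print(majority_vote(["data mining", "databases", "data mining"]))
```

In this toy run the two-word phrase "text mining" survives selection alongside its component words, which is the kind of domain term the abstract says single-word features would miss.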
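Table of contents
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Scope and Limitations
1.4 Research Process
1.5 Thesis Outline
Chapter 2 Literature Review
2.1 Information Retrieval
2.2 Natural Language Processing
2.2.1 Part-of-Speech Tagging
2.2.2 Stemming
2.3 Feature Selection
2.3.1 Document Frequency
2.4 Document Classification Methods
2.4.1 Naïve Bayes Classifier
2.4.2 Support Vector Machine (SVM)
2.4.3 Decision Tree
2.5 Conference Information Websites
2.5.1 All Conference
2.5.2 Conference Alert
2.5.3 DBWorld
2.6 Hybrid Classification
2.7 Summary
Chapter 3 Research Method
3.1 Research Framework
3.2 Data Collection and Preprocessing
3.3 Feature Selection on Training Data
3.4 Hybrid Classification Module
3.4.1 Test Data Filtering
3.4.2 Hybrid Classification
3.5 Summary
Chapter 4 System Implementation and Validation
4.1 System Environment
4.2 Experimental Design
4.2.1 Experimental Data Sources
4.2.2 Preprocessing Stage
4.2.3 Feature Selection
4.2.4 Classifier
4.2.5 Evaluation Metrics
4.3 Analysis of Experimental Results
4.3.1 Experiment 1: Comparison of Feature Selection Methods
4.3.2 Experiment 2: Comparison of the Hybrid Classification Method with Traditional Classification Methods
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Research Directions
References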
Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized, available from 2022-12-31.