System ID U0026-2207201515150500
Title (Chinese) 不同分類器的混合型離散化方法之一致性分析
Title (English) Consistency Analysis of Hybrid Discretization Method among Classification Algorithms
University National Cheng Kung University
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic year 103
Semester 2
Year of publication 104 (2015)
Author (Chinese) 黃柏翰
Author (English) Bo-Han Huang
Student ID R76024035
Degree Master
Language Chinese
Number of pages 50
Committee Advisor - 翁慈宗
Committee member - 王維聰
Committee member - 劉任修
Committee member - 陳榮泰
Keywords (Chinese) 混合型離散化方法, 一致性, 一致性測度, 分類器
Keywords (English) Classifier, consistency, consistency measure, hybrid discretization method
Subject classification
Abstract (Chinese) Classification is a data mining technique that assigns each instance to a class based on the values of its attributes. Most data sets contain continuous attributes, and classifiers designed for discrete attributes generally require continuous attributes to be discretized first, so the choice of discretization method can affect a classifier's predictions. Hybrid discretization discretizes each continuous attribute individually to find the most suitable discretization method for that attribute, and it can achieve higher classification accuracy than applying a single, unified discretization method to all attributes of a data set. A previous study on hybrid discretization developed a method applicable to several classifiers that process discrete attributes, in which all discretization is completed during data preprocessing. On decision trees, however, that hybrid discretization method performed no better than unified discretization. The objective of this study is therefore to investigate the consistency of the best hybrid discretization methods among classifiers; knowing the degree of consistency can suggest how hybrid discretization methods should be revised to suit different classifiers. This study measures the consistency of the best hybrid discretization methods of different classifiers in two ways: by cross-applying the best hybrid discretization method of one classifier to the others, and by a newly proposed consistency measure. Classification experiments on 30 data sets with decision trees, the naïve Bayesian classifier, and a rule-based classifier show that, compared with cross-applied best hybrid discretization methods, a classifier's own best hybrid discretization method already achieves good accuracy, although a few cross-applied results are still better. Moreover, the values of the consistency measure are clearly low, indicating that the best hybrid discretization methods of different classifiers are inconsistent. Hence, to obtain a best hybrid discretization combination that suits different classifiers, the characteristics of the individual classifiers may need to be reconsidered and incorporated into the computation, so that a best hybrid discretization method applicable to different classifiers can be determined in the preprocessing stage.
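Both abstracts describe assigning each continuous attribute its own discretization method. The Python sketch below only illustrates that per-attribute idea; it is not the algorithm developed in this thesis, which fixes the methods during data preprocessing. The candidate set (equal-width and equal-frequency only), the bin count k = 4, and the wrapper-style selection by cross-validated accuracy are assumptions made for illustration, as are names such as `hybrid_discretize`.

```python
# Minimal sketch of per-attribute (hybrid) discretization, under the
# assumptions stated above; not the thesis's algorithm.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB

def equal_width(x, k=4):
    """Split the value range of x into k equal-width intervals."""
    edges = np.linspace(x.min(), x.max(), k + 1)[1:-1]
    return np.digitize(x, edges)

def equal_frequency(x, k=4):
    """Split x into k intervals holding roughly equal numbers of values."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1))[1:-1]
    return np.digitize(x, edges)

CANDIDATES = {"equal_width": equal_width, "equal_frequency": equal_frequency}

def hybrid_discretize(X, y, clf):
    """Choose a discretization method independently for every attribute,
    keeping the candidate with the best cross-validated accuracy."""
    X_disc = np.column_stack([equal_width(X[:, j]) for j in range(X.shape[1])])
    chosen = {}
    for j in range(X.shape[1]):              # one attribute at a time
        scores = {}
        for name, disc in CANDIDATES.items():
            trial = X_disc.copy()
            trial[:, j] = disc(X[:, j])
            scores[name] = cross_val_score(clf, trial, y, cv=5).mean()
        chosen[j] = max(scores, key=scores.get)
        X_disc[:, j] = CANDIDATES[chosen[j]](X[:, j])
    return X_disc, chosen

X, y = load_iris(return_X_y=True)
# min_categories=4 keeps CategoricalNB well defined even if a bin happens
# to be absent from some training fold.
_, chosen = hybrid_discretize(X, y, CategoricalNB(min_categories=4))
print(chosen)    # e.g. {0: 'equal_frequency', 1: 'equal_width', ...}
```

The thesis itself considers four discretization methods (equal-width, equal-frequency, proportional, and minimum-entropy, Sections 3.3.1 to 3.3.4) and determines the per-attribute choices in the preprocessing step rather than by repeatedly training the classifier.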
Abstract (English) Discretization is one of the major approaches for handling continuous attributes in classification. However, the accuracies obtained for a data set discretized by different discretization methods may differ greatly. The hybrid discretization method was proposed recently, and it generally achieves better performance for the naïve Bayesian classifier than unified discretization. A previous study developed a hybrid discretization method applicable to various classifiers, which determines the discretization method for each attribute in the data preprocessing step. However, the results of that study showed that it could not improve the performance of decision trees. The objective of this study is therefore to investigate the consistency of hybrid discretization results among classification algorithms. This study proposes two approaches for the consistency analysis. The first approach identifies whether the best hybrid discretization results for one classification algorithm can improve the performance of the others. A new measure is also proposed to evaluate the consistency of the best hybrid discretization results of two classification algorithms. The classification tools for testing our methods are decision trees, naïve Bayesian classifiers, and rule-based classifiers. The experimental results on 30 data sets show that the best hybrid discretization results for one algorithm seldom improve the performance of the others. Moreover, most values of the consistency measure are low. These results suggest that the characteristics of a classification algorithm should be considered when designing a hybrid discretization method for data preprocessing.
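The abstract outlines two consistency analyses: cross-applying the best hybrid discretization results of one algorithm to the others, and a new consistency measure for a pair of algorithms. The measure actually proposed in the thesis is not reproduced here; `method_agreement` below is a hypothetical stand-in (the share of attributes for which two classifiers were assigned the same discretization method), shown only to make the kind of comparison concrete.

```python
# A stand-in consistency measure, assumed for illustration only.
from typing import Dict

def method_agreement(best_a: Dict[int, str], best_b: Dict[int, str]) -> float:
    """Fraction of attributes whose chosen discretization methods coincide."""
    attrs = best_a.keys() & best_b.keys()
    if not attrs:
        return 0.0
    return sum(best_a[j] == best_b[j] for j in attrs) / len(attrs)

# Hypothetical per-attribute choices for two classifiers on one data set.
best_for_tree  = {0: "equal_width", 1: "equal_frequency", 2: "minimum_entropy"}
best_for_bayes = {0: "equal_width", 1: "proportional",    2: "minimum_entropy"}

print(method_agreement(best_for_tree, best_for_bayes))   # 0.666...
```

Cross-application, the first analysis, would then reuse the attribute-wise choices found for one classifier to discretize the data before training another, and compare the resulting accuracy with that classifier's own best result.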
Table of contents Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Framework
Chapter 2 Literature Review
2.1 Classifiers
2.2 Discretization Methods
2.3 Relationships among Attributes
2.4 Dynamic Programming
2.5 Summary
Chapter 3 Measuring the Consistency of Hybrid Discretization Methods
3.1 Research Procedure
3.2 Ordering Continuous Attributes
3.3 Discretization Methods
3.3.1 Equal-Width Discretization
3.3.2 Equal-Frequency Discretization
3.3.3 Proportional Discretization
3.3.4 Minimum-Entropy Discretization
3.4 Hybrid Discretization Method
3.5 Classifiers
3.5.1 Decision Trees
3.5.2 Naïve Bayesian Classifier
3.5.3 Rule-Based Classifier
3.6 Performance Evaluation
3.6.1 K-Fold Cross-Validation
3.6.2 Cross-Applying the Best Hybrid Discretization Methods
3.6.3 Consistency Measure
Chapter 4 Empirical Study
4.1 Characteristics of the Data Sets
4.2 Data Preprocessing
4.2.1 Ranking Attributes by Relevance to the Class
4.2.2 Number of Possible Values after Discretization
4.3 Results of Unified and Best Hybrid Discretization Methods
4.4 Performance Evaluation
4.5 Summary
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
References 伍碧那 (2014). A hybrid discretization method for different classifiers (適用於不同分類器的混合型離散化方法). Master's thesis, Institute of Information Management, National Cheng Kung University.
Ahmed, P. (2014). A hybrid-based feature selection approach for IDS. In Networks and Communications (NetCom2013) (pp. 195-211). Springer.
Bache, K., & Lichman, M. (2013). UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
Bellman, R. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
Cannas, L. M., Dessi, N., & Pes, B. (2013). Assessing similarity of feature selection techniques in high-dimensional domains. Pattern Recognition Letters, 34(12), 1446-1453.
Cao, F., Ge, Y., & Wang, J. F. (2014). Spatial data discretization methods for geocomputation. International Journal of Applied Earth Observation and Geoinformation, 26, 432-440.
Engle, K. M., & Gangopadhyay, A. (2010). An Efficient Method for Discretizing Continuous Attributes. International Journal of Data Warehousing and Mining, 6(2), 1-21.
Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), 1022-1029.
Garcia, S., Luengo, J., Saez, J. A., Lopez, V., & Herrera, F. (2013). A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning. IEEE Transactions on Knowledge and Data Engineering, 25(4), 734-750.
Hu, Q., Pedrycz, W., Yu, D., & Lang, J. (2010). Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(1), 137-150.
Jiang, S.-y., Li, X., Zheng, Q., & Wang, L.-x. (2009). Approximate equal frequency discretization method. Proceedings of the WRI Global Congress on Intelligent Systems (GCIS '09), 3, 514-518.
Jung, Y.-G., Kim, K. M., & Kwon, Y. M. (2012). Using weighted hybrid discretization method to analyze climate changes. In Computer Applications for Graphics, Grid Computing, and Industrial Environment (Communications in Computer and Information Science, 351, pp. 189-195). Springer Berlin Heidelberg.
Li, M., Deng, S. B., Feng, S. Z., & Fan, J. P. (2011). An effective discretization based on Class-Attribute Coherence Maximization. Pattern Recognition Letters, 32(15), 1962-1973.
Liu, H. W., Sun, J., Liu, L., & Zhang, H. J. (2009). Feature selection with dynamic mutual information. Pattern Recognition, 42(7), 1330-1339.
Lustgarten, J. L., Visweswaran, S., Gopalakrishnan, V., & Cooper, G. F. (2011). Application of an efficient Bayesian discretization method to biomedical data. BMC Bioinformatics, 12, 15.
Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.
Nelwamondo, F. V., Golding, D., & Marwala, T. (2013). A dynamic programming approach to missing data estimation using neural networks. Information Sciences, 237, 49-58.
Park, C. H., & Lee, M. (2009). A SVM-based discretization method with application to associative classification. Expert Systems with Applications, 36(3), 4784-4787.
Pisica, I., Taylor, G., & Lipan, L. (2013). Feature selection filter for classification of power system operating states. Computers & Mathematics with Applications, 66(10), 1795-1807.
Sakar, C. O., Kursun, O., & Gurgen, F. (2012). A feature selection method based on kernel canonical correlation analysis and the minimum Redundancy-Maximum Relevance filter method. Expert Systems with Applications, 39(3), 3432-3437.
Sang, Y., Jin, Y. W., Li, K. Q., & Qi, H. (2013). UniDis: a universal discretization technique. Journal of Intelligent Information Systems, 40(2), 327-348.
Shen, C. C., & Chen, Y. L. (2008). A dynamic-programming algorithm for hierarchical discretization of continuous attributes. European Journal of Operational Research, 184(2), 636-651.
Wong, D. F., Chao, L. S., & Zeng, X. D. (2014). A Supportive Attribute-Assisted Discretization Model for Medical Classification. Bio-Medical Materials and Engineering, 24(1), 289-295.
Wong, T. T. (2012). A hybrid discretization method for naive Bayesian classifiers. Pattern Recognition, 45(6), 2321-2325.
Yan, D. Q., Liu, D. S., & Sang, Y. (2014). A new approach for discretizing continuous attributes in learning systems. Neurocomputing, 133, 507-511.
Yang, Y., & Webb, G. I. (2009). Discretization for naive-Bayes learning: managing discretization bias and variance. Machine Learning, 74(1), 39-74.
Yu, L., & Liu, H. (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution. Proceedings of the 20th International Conference on Machine Learning (ICML), 856-863.
Zhao, J., Han, C. Z., Wei, B., & Han, D. Q. (2012). A UMDA-based discretization method for continuous attributes. Advanced Materials Research, 403-408, 1834-1838.
Full-text availability
  • Electronic full text authorized for on-campus viewing/printing, available from 2020-07-27.
  • Electronic full text authorized for off-campus viewing/printing, available from 2020-07-27.

