System ID: U0026-0807201416020400
Title (Chinese): 適用於不同分類器的混合型離散化方法
Title (English): A hybrid discretization method for classification algorithms
University: National Cheng Kung University
Department (Chinese): 資訊管理研究所
Department (English): Institute of Information Management
Academic year: 102 (2013–14)
Semester: 2
Year of publication: 103 (2014)
Author (Chinese): 伍碧那
Author (English): Bi-Na Wu
Student ID: R76014072
Degree: Master's
Language: Chinese
Pages: 49
Advisor: Tzu-Tsung Wong (翁慈宗)
Oral examination committee: 蔡青志, 謝佩璇, 劉任修
Keywords (Chinese): 混合型離散化, 網路最佳化模型, 動態規劃, 分類器
Keywords (English): classifier, dynamic programming, hybrid discretization method, network optimization model
Subject classification: (none given)
Abstract (Chinese): Classification is a data mining approach that computes a class prediction for each record based on its attributes. Most data sets contain continuous attributes, and classifiers designed for discrete attributes generally require these attributes to be discretized first, so the choice of discretization method can affect a classifier's predictive performance. Hybrid discretization discretizes each continuous attribute individually, searching for the method best suited to that attribute; compared with applying a single discretization method to every attribute in a data set, it can further improve classification accuracy. Previous work on hybrid discretization has focused mainly on the naïve Bayesian classifier and determines the most suitable method from classification results, so discretization cannot be completed entirely in the data preprocessing step. The aim of this study is therefore to develop a hybrid discretization method that applies to other classifiers handling discrete attributes and that completes all discretization during preprocessing. Drawing on network optimization from operations research, this study transforms the hybrid discretization problem into a network optimization model, takes the associations among attributes and between attributes and the class as the evaluation measure, and applies dynamic programming to find an optimal path through the network; this path represents the most suitable hybrid discretization combination. The method was validated on 20 data sets with decision trees, naïve Bayesian classifiers, and rule-based classifiers. Compared with unified discretization, the hybrid method improved classification accuracy on most data sets for naïve Bayesian and rule-based classifiers, while for decision trees the two approaches performed about equally. The proposed method is therefore a feasible way to select hybrid discretization combinations.
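As a concrete illustration of the discretization step the abstract describes, the sketch below implements equal-width and equal-frequency binning, two of the four candidate methods the thesis considers. The function names and the default of four bins are illustrative assumptions, not taken from the thesis.

```python
def equal_width(values, k=4):
    """Split the value range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    cuts = [lo + width * i for i in range(1, k)]  # k-1 interior cut points
    return [sum(v > c for c in cuts) for v in values]  # bin index per value

def equal_frequency(values, k=4):
    """Assign values to k bins holding (roughly) equal numbers of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // len(values), k - 1)
    return bins

data = [1.0, 2.0, 2.5, 3.0, 10.0, 11.0, 12.0, 20.0]
print(equal_width(data))      # skewed data crowds into the low bins
print(equal_frequency(data))  # each bin receives exactly two values
```

Equal-width binning is sensitive to skew and outliers while equal-frequency binning ignores gaps in the value range, which is one reason no single method suits every attribute and per-attribute selection can help.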
Abstract (English): Discretization is one of the major approaches for processing continuous attributes for classifiers. Hybrid discretization sets the discretization method for each continuous attribute individually. A previous study found that hybrid discretization improves the performance of the naïve Bayesian classifier more than unified discretization does, but that approach determines the method for each attribute based on whether accuracy improves. The objective of this study is to develop a hybrid discretization method applicable to other classifiers, one that determines the method for each attribute in the data preprocessing step instead of relying on accuracy. This study first builds a network optimization model based on the associations among attributes and the class. Dynamic programming is then employed to find the optimal solution for the network, and this solution indicates the discretization method for each continuous attribute. The classification tools for testing the method are decision trees, naïve Bayesian classifiers, and rule-based classifiers. Experimental results on 20 data sets show that the computational cost of the method is low and that, in general, hybrid discretization performs better with naïve Bayesian and rule-based classifiers, but not with decision trees.
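The network formulation in the abstract can be read as a layered graph: one layer per continuous attribute, one node per candidate discretization method, and edge weights reflecting association between consecutive choices; dynamic programming then selects one method per attribute along the best path. The sketch below is a simplified reading of that idea: the layer ordering, the `score` signature, and the toy score are assumptions, not the thesis's actual association measure.

```python
METHODS = ["equal_width", "equal_freq", "proportional", "min_entropy"]

def best_methods(attrs, score):
    """Pick one discretization method per attribute via a best-path DP.

    score(prev_attr, prev_method, attr, method) -> float, higher is better.
    """
    best = {m: 0.0 for m in METHODS}  # best cumulative score ending in m
    back = []                         # back-pointers, one dict per layer
    for i in range(1, len(attrs)):
        new_best, new_back = {}, {}
        for m in METHODS:
            prev = max(METHODS,
                       key=lambda p: best[p] + score(attrs[i-1], p, attrs[i], m))
            new_best[m] = best[prev] + score(attrs[i-1], prev, attrs[i], m)
            new_back[m] = prev
        back.append(new_back)
        best = new_best
    # Walk the back-pointers from the best final node to recover the path.
    path = [max(METHODS, key=lambda m: best[m])]
    for layer in reversed(back):
        path.append(layer[path[-1]])
    return list(reversed(path))

# Toy score that always favors min_entropy for the entering attribute.
toy = lambda a1, m1, a2, m2: 1.0 if m2 == "min_entropy" else 0.5
print(best_methods(["age", "income", "balance"], toy))
```

Note that this toy score rewards only the entering endpoint of each edge, so the first attribute's method falls to tie-breaking; a realistic association measure would constrain both endpoints.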
論文目次 摘要 I
誌謝 V
第一章 緒論 1
1.1 研究背景與動機 1
1.2 研究目的 2
1.3 研究架構 2
第二章 文獻回顧 3
2.1離散化方法 3
2.2屬性與類別值之關係 6
2.3 動態規劃 9
第三章 研究方法 11
3.1 研究流程 11
3.2 連續型屬性排序 12
3.3 離散化方法 14
3.3.1 等寬度離散化方法 15
3.3.2 等頻率離散化方法 15
3.3.3 比例式離散化方法 16
3.3.4 最小熵值離散化方法 16
3.4 網路最佳化模型 17
3.4.1建構網路模型圖 17
3.4.2相關性衡量 18
3.5 動態規劃 21
3.6 分類器 25
3.6.1 決策樹 25
3.6.2 簡易貝氏分類器 25
3.6.3 基於規則的分類器 26
3.6.4 K等分交叉驗證法 27
第四章 實證研究 28
4.1 資料檔介紹 28
4.2 動態規劃求解之結果 29
4.3 分類結果之驗證 31
4.3.1 決策樹 32
4.3.2 簡易貝氏分類器 34
4.3.3 基於規則分類器 36
4.3.4 分類正確率驗證小結 38
4.4 統一離散化方法之分類驗證 39
4.4.1決策樹 39
4.4.2 簡易貝氏分類器 40
4.4.3 基於規則分類器 41
4.5 小結 43
第五章 結論與未來發展 45
5.1 結論 45
5.2 未來發展 46
參考文獻 47
References
Ballesteros, A. J. T., Martínez, C. H., Riquelme, J. C., and Ruiz, R. (2013). Feature selection to enhance a two-stage evolutionary algorithm in product unit neural networks for complex classification problems. Neurocomputing, 114, 107–117.
Bellman, R. (1957). Dynamic Programming. Princeton, Princeton University Press.
Cannas, L. M., Dessi, N., and Pes, B. (2013). Assessing similarity of feature selection techniques in high-dimensional domains. Pattern Recognition Letters, 34, 1446–1453.
Concepción, M. Á. Á. D. L., Abril, L. G., Morillo, L. M. S., and Ramírez, J. A. O. (2013). An adaptive methodology to discretize and select features. Artificial Intelligence Research, 2(2), 77–86.
Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. The 13th International Joint Conference on Artificial Intelligence (IJCAI), 1022–1029.
García, S., Luengo, J., Sáez, J. A., López, V., and Herrera, F. (2013). A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, 25(4), 734–750.
Golding, D., Nelwamondo, F. V., and Marwala, T. (2013). A dynamic programming approach to missing data estimation using neural networks. Information Sciences, 237, 49–58.
Gu, Q., Li, Z., and Han, J. (2012). Generalized Fisher score for feature selection. The 27th Conference on Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, arXiv preprint arXiv:1202.3725.
Hu, Q., Pedrycz, W., Yu, D., and Lang, J. (2010). Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, 40(1), 137-150.
Jiang, S. Y., Li, X., Zheng, Q., and Wang, L. X. (2009). Approximate equal frequency discretization method. 2009 WRI Global Congress on Intelligent Systems (GCIS '09), 3, 514–518.
Jung, Y. G., Kim, K. M., and Kwon, Y. M. (2012). Using weighted hybrid discretization method to analyze climate changes. Computer Applications for Graphics, Grid Computing, and Industrial Environment. Communications in Computer and Information Science, 351, Springer Berlin Heidelberg, 189–195.
Li, M., Deng, S. B., Feng, S., and Fan, J. (2011). An effective discretization based on Class-Attribute Coherence Maximization. Pattern Recognition Letters, 32, 1962–1973.
Liu, H., Sun, J., Liu, L., and Zhang, H. (2009). Feature selection with dynamic mutual information. Pattern Recognition, 42, 1330–1339.
Lustgarten, J. L., Visweswaran, S., Gopalakrishnan, V., and Cooper, G. F. (2011). Application of an efficient Bayesian discretization method to biomedical data. BMC Bioinformatics, 12, 309.
Park, C. E. and Lee, M. (2009). A SVM-based discretization method with application to associative classification. Expert Systems with Applications, 36, 4784–4787.
Pisica, I., Taylor, G., and Lipan, L. (2013). Feature selection filter for classification of power system operating states. Computers and Mathematics with Applications, 66, 1795–1807.
Sakar, C. O., Kursun, O., and Gurgen, F. (2012). A feature selection method based on kernel canonical correlation analysis and the minimum Redundancy–Maximum Relevance filter method. Expert Systems with Applications, 39, 3432–3437.
Sang, Y., Jin, Y., Li, K., and Qi, H. (2013). UniDis: a universal discretization technique. Journal of Intelligent Information Systems, 40, 327–348.
Shen, C. C. and Chen, Y. L. (2008). A dynamic-programming algorithm for hierarchical discretization of continuous attributes. European Journal of Operational Research, 184, 636–651.
Tian, D., Zeng, X. J., and Keane, J. (2011). Core-generating approximate minimum entropy discretization for rough set feature selection in pattern classification. International Journal of Approximate Reasoning, 52, 863–880.
Wong, T. T. (2012). A hybrid discretization method for naive Bayesian classifiers. Pattern Recognition, 45, 2321–2325.
Yu, L. and Liu, H. (2003). Feature selection for high-dimensional data: a fast correlation-based filter solution. Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, 856–863.
Zhao, J., Han, C. Z., Wei, B., and Han, D. Q. (2012). A UMDA-based discretization method for continuous attributes. Advanced Materials Research, 403–408, 1834–1838.
Zou, L., Yan, D., Karimi, H. R., and Shi, P. (2013). An algorithm for discretization of real value attributes based on interval similarity. Journal of Applied Mathematics, 1–8.
Full-Text Access Rights
  • On-campus browsing/printing of the electronic full text authorized, open from 2014-07-18.
  • Off-campus browsing/printing of the electronic full text authorized, open from 2016-07-18.

