System ID U0026-2707201712231200
Title (Chinese) 以資料二元分割方式為基礎的混合分類方法
Title (English) A Hybrid Classification Method Based on Binary Partition of Instances
University National Cheng Kung University
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic Year 105
Semester 2
Year of Publication 106
Author (Chinese) 陳國鴻
Author (English) Guo-Hong Chen
Email ro60507@gmail.com
Student ID r76041176
Degree Master
Language Chinese
Pages 51
Committee Advisor: 翁慈宗
Committee member: 劉任修
Committee member: 王維聰
Committee member: 陳榮泰
Keywords (Chinese) 資料前處理, 決策樹, 混合分類方法, 簡易貝氏分類方法
Keywords (English) Data preprocessing, decision tree learning, hybrid classification, naïve Bayesian classifier
Subject Classification
Chinese Abstract (translated) Classification is a highly important task in data mining. Because the volume of information on today's networks is vast and heterogeneous, the quality of collected data is often unstable; applying data-preprocessing techniques to filter the data appropriately can therefore improve the quality of classification results. Commonly used preprocessing techniques fall into two categories: filtering on the attributes of a data set, or filtering on the training instances. On the attribute side, beyond feature-selection methods, many researchers have combined different classification or clustering methods, repeatedly testing many attribute subsets to obtain the best classification result. On the training-instance side, some researchers construct many prediction models by randomly resampling the training instances, while others propose instance-reduction methods that directly remove training instances misclassified by the nearest-neighbor method. This study proposes a hybrid classification method that operates on the training instances: a classification method is used to repartition the full training set into groups, while the full training set, which carries the most classification information, is also retained; in the prediction stage, the model applied to a test instance is chosen by comparing the instance's similarity to each group of training instances. Twenty data sets are used to compare the proposed method with the base classification methods and with hybrid classification methods proposed by other researchers in terms of classification accuracy and computation time. The empirical results show that although the proposed hybrid method requires more computation time than the base methods, it significantly improves the accuracy of the base methods on nearly half of the data sets, and its accuracy is equal to or higher than that of the base methods on all data sets; in the overall comparison of statistical tests on accuracy, it also achieves significantly better results than the other hybrid methods. In addition, by performing K-fold cross-validation on the training instances and adopting a single-prediction-model design, the proposed method keeps computation time moderate while retaining relatively good interpretability.
English Abstract Classification is an essential task in data mining. Preprocessing techniques are generally used to improve data quality and thereby enhance the performance of class prediction. Data-preprocessing techniques can be categorized as operating either on attributes or on instances. When a classification algorithm is trained on data that have been processed by another algorithm, the approach is called hybrid classification. This study presents a hybrid classification algorithm that first divides a training set into two subsets using a classification algorithm. A model is then learned by another algorithm not only from each of the two subsets but also from the whole training set. Every test instance is classified by one of the three models. The proposed hybrid classification algorithm is tested on 20 data sets to analyze its prediction accuracy and computational efficiency. The experimental results show that the hybrid algorithm significantly outperforms the naïve Bayesian classifier and decision tree learning on most data sets, although it needs more time to learn the models. Compared with two hybrid classification algorithms proposed in other studies, the proposed algorithm achieves not only significantly higher accuracy but also a relatively lower computational cost.
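The partition-and-route scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's exact design: the choice of partitioning classifier (a tiny categorical naïve Bayes here, used as both learners), the resubstitution-based split (the thesis applies K-fold cross-validation on the training set), the attribute-overlap similarity, and the `margin` threshold are all illustrative.

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """Train a categorical naive Bayes: class counts plus
    per-(attribute, class) value counts."""
    classes = Counter(y)
    cond = defaultdict(Counter)          # (attr_index, class) -> value counts
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            cond[(j, yi)][v] += 1
    return classes, cond

def predict_nb(model, x):
    """Return the class maximizing prior * smoothed likelihoods."""
    classes, cond = model
    n = sum(classes.values())
    best, best_p = None, -1.0
    for c, nc in classes.items():
        p = nc / n
        for j, v in enumerate(x):
            p *= (cond[(j, c)][v] + 1) / (nc + 2)   # Laplace smoothing (illustrative constant)
        if p > best_p:
            best, best_p = c, p
    return best

def overlap_similarity(x, group):
    """Mean fraction of matching attribute values between x and a group."""
    if not group:
        return 0.0
    return sum(sum(a == b for a, b in zip(x, g)) / len(x) for g in group) / len(group)

def hybrid_fit(X, y):
    """Split the training set into correctly / incorrectly classified
    instances, then train a model on each subset and on the whole set."""
    first = train_nb(X, y)               # stand-in for the partitioning classifier
    right = [(xi, yi) for xi, yi in zip(X, y) if predict_nb(first, xi) == yi]
    wrong = [(xi, yi) for xi, yi in zip(X, y) if predict_nb(first, xi) != yi]
    models = {
        "right": train_nb([x for x, _ in right], [c for _, c in right]) if right else None,
        "wrong": train_nb([x for x, _ in wrong], [c for _, c in wrong]) if wrong else None,
        "whole": train_nb(X, y),
    }
    return models, [x for x, _ in right], [x for x, _ in wrong]

def hybrid_predict(models, right_X, wrong_X, x, margin=0.05):
    """Route the test instance to the model of the more similar group;
    fall back to the whole-set model when the gap is below the margin."""
    sr = overlap_similarity(x, right_X)
    sw = overlap_similarity(x, wrong_X)
    if models["right"] is None or models["wrong"] is None or abs(sr - sw) < margin:
        return predict_nb(models["whole"], x)
    return predict_nb(models["right" if sr > sw else "wrong"], x)
```

The threshold `margin` plays the role of the thesis's threshold setting (Section 4.2.1): only when a test instance is clearly closer to one group is that group's model preferred over the single whole-set model.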
Table of Contents Abstract I
Acknowledgments VI
Table of Contents VII
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 2
1.3 Research Organization 3
Chapter 2 Literature Review 4
2.1 Classification Methods 4
2.1.1 Naïve Bayesian Classifier 5
2.1.2 Decision Trees 6
2.1.3 Nearest Neighbor Method 7
2.1.4 Artificial Neural Networks 8
2.1.5 Support Vector Machines 10
2.2 Hybrid Classification Methods for Attributes 12
2.3 Hybrid Classification Methods for Training Instances 14
2.4 Summary 17
Chapter 3 Research Methods 18
3.1 Research Process 18
3.2 Partition Preprocessing 19
3.3 Hybrid Prediction Models 21
3.4 Selecting the Prediction Model 22
3.5 Evaluation Method 26
Chapter 4 Empirical Study 27
4.1 Description of the Data Sets 28
4.2 Comparison of the Base Classification Methods and the Proposed Hybrid Method 29
4.2.1 Necessity of the Threshold Setting 29
4.2.2 Accuracy Comparison of the Base Methods and the Proposed Method 30
4.3 Classification Results of the Optimistic and Pessimistic Prediction Models 33
4.4 Comparison of Hybrid Classification Methods 39
4.5 Summary 44
Chapter 5 Conclusions and Future Work 48
5.1 Conclusions 48
5.2 Future Work 49
References 50
References 陳育生 (2015). 多個資料檔下比較兩分類方法表現之有母數統計方法 (A parametric statistical method for comparing the performance of two classification methods over multiple data sets). Master's thesis, Institute of Information Management, National Cheng Kung University.
Ahmed, A.-I. & Hasan, M. M. (2014). "A hybrid approach for decision making to detect breast cancer using data mining and autonomous agent based on human agent teamwork." Proceedings of the 17th International Conference on Computer and Information Technology Dhaka, Bangladesh, 320-325.
Bahety, A. (2014). "Extension and Evaluation of ID3–Decision Tree Algorithm." Entropy (S) 2(1).
Bermejo, P., Gámez, J. A., & Puerta, J. M. (2014). "Speeding up incremental wrapper feature subset selection with Naive Bayes classifier." Knowledge-Based Systems 55: 140-147.
Breiman, L. (1996). "Bagging predictors." Machine Learning 24(2): 123-140.
Breiman, L. (2001). "Random forests." Machine Learning 45(1): 5-32.
Cao, F., Liu, B., & Park, D. S. (2013). "Image classification based on effective extreme learning machine." Neurocomputing 102: 90-97.
Chen, S. F. & Goodman, J. (1999). "An empirical study of smoothing techniques for language modeling." Computer Speech & Language 13(4): 359-394.
Cortes, C. & Vapnik, V. (1995). "Support-vector networks." Machine Learning 20(3): 273-297.
Dietterich, T. G. (2000). "Ensemble methods in machine learning." Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, 1857: 1-15.
Dougherty, J., Kohavi, R., & Sahami, M. (1995). "Supervised and unsupervised discretization of continuous features." Proceedings of the 12th International Conference on Machine Learning, 194-202.
Farid, D. M., Zhang, L., Rahman, C. M., Hossain, M. A., & Strachan, R. (2014). "Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks." Expert Systems with Applications 41(4): 1937-1946.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). "From data mining to knowledge discovery in databases." AI Magazine 17(3): 37.
Jamjoom, M. & El Hindi, K. (2016). "Partial instance reduction for noise elimination." Pattern Recognition Letters 74: 30-37.
Kuo, B.-C., Ho, H.-H., Li, C.-H., Hung, C.-C., & Taur, J.-S. (2014). "A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7(1): 317-326.
Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (2013). Machine Learning: An Artificial Intelligence Approach: Springer Science & Business Media.
Quinlan, J. R. (1986). "Induction of decision trees." Machine Learning 1(1): 81-106.
Ralescu, A., Díaz, I., & Rodríguez-Muñiz, L. J. (2015). "A classification algorithm based on geometric and statistical information." Journal of Computational and Applied Mathematics 275: 335-344.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). "Learning representations by back-propagating errors." Cognitive Modeling 5(3): 1.
Tan, P.-N. (2006). Introduction to Data Mining: Pearson Education India.
Taravat, A., Del Frate, F., Cornaro, C., & Vergari, S. (2015). "Neural networks and support vector machine algorithms for automatic cloud classification of whole-sky ground-based images." IEEE Geoscience and Remote Sensing Letters 12(3): 666-670.
Wilson, D. R. & Martinez, T. R. (2000). "Reduction techniques for instance-based learning algorithms." Machine Learning 38(3): 257-286.
Zaidi, N. A., Cerquides, J., Carman, M. J., & Webb, G. I. (2013). "Alleviating naive Bayes attribute independence assumption by attribute weighting." Journal of Machine Learning Research 14(1): 1947-1988.
Zaremotlagh, S. & Hezarkhani, A. (2016). "A geochemical modeling to predict the different concentrations of REE and their hidden patterns using several supervised learning methods: Choghart iron deposit, Bafq, Iran." Journal of Geochemical Exploration 165: 35-48.
Zheng, B., Yoon, S. W., & Lam, S. S. (2014). "Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms." Expert Systems with Applications 41(4): 1476-1482.
Full-Text Access Rights
  • On-campus browse/print access to the electronic full text is granted, available from 2017-08-01.
  • Off-campus browse/print access to the electronic full text is granted, available from 2017-08-01.