

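   The electronic thesis is not yet authorized for public release; for the print copy, please check the library catalog.
(If the thesis cannot be found, or its holding status shows "closed stacks, not open to the public", the thesis is not in the stacks and cannot be accessed.)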
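System ID: U0026-2406201421260700
Title (Chinese): 藉由盒鬚圖產生之虛擬樣本提升拔靴集成法的分類正確率
Title (English): Improving the Accuracies of Bootstrap Aggregating with Virtual Samples Generated by Box Plot
University: National Cheng Kung University
Department (Chinese): 工業與資訊管理學系
Department (English): Department of Industrial and Information Management
Academic year: 102 (ROC calendar, i.e., 2013-2014)
Semester: 2
Year of publication: 103 (ROC calendar, i.e., 2014)
Student (Chinese): 陳誌瑋
Student (English): Jhih-Wei Chen
Student ID: R36004061
Degree: Master's
Language: Chinese
Number of pages: 77
Oral defense committee: Advisor: 利德江
Committee member: 李賢得
Committee member: 吳植森
Keywords (Chinese): 虛擬樣本  盒鬚圖  拔靴集成法
Keywords (English): Virtual Sample  Box-Whisker Plot  Bagging
Subject classification: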
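Abstract (Chinese, translated): Statistical theory has long played an important role in turning data into meaningful information, but its underlying assumptions prevent it from handling the many different kinds of data found in the real world. As a result, machine learning methods such as neural networks and data mining have developed rapidly over the past two decades. For classification problems, ensemble methods such as bagging and boosting reduce overfitting more effectively than a single classifier: they use the bootstrap to generate multiple training subsets, build a sub-classifier on each subset, and combine the results. Although this improves on the accuracy of a single classifier, the gain is limited, because every sub-classifier repeatedly learns from a subset whose attribute values are exactly those of the original training samples. To let the sub-classifiers learn attribute values beyond the bootstrap samples, this study uses box plots to estimate the value range of the training samples and generates virtual samples within that range to enrich each training subset. The method is tested on data sets from the public UCI repository, and the experiments show that it effectively improves both the classification accuracy and the stability of bagging.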
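The abstract only outlines the box-plot step, and the table of contents names its stages (range estimation, fuzzy triangle construction, virtual value generation, sample formation). The following is a minimal sketch of how such a pipeline could look, not the thesis's actual procedure. It assumes Tukey's 1.5×IQR fences as the estimated range, the median as the peak of a triangular membership function, acceptance-rejection sampling for the virtual values, and attributes generated independently within each class. NumPy and every function name here (`boxplot_range`, `triangular_membership`, `virtual_values`, `virtual_samples`) are illustrative assumptions.

```python
import numpy as np


def boxplot_range(x, k=1.5):
    """Return (lower fence, median, upper fence) of a 1-D sample (Tukey's rule)."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    return q1 - k * iqr, median, q3 + k * iqr


def triangular_membership(v, lower, peak, upper):
    """Membership degree of v in the triangular fuzzy number (lower, peak, upper)."""
    v = np.asarray(v, dtype=float)
    mu = np.zeros_like(v)
    left = (v >= lower) & (v <= peak)
    right = (v > peak) & (v <= upper)
    # Rising edge from lower to peak (a degenerate edge gets full membership).
    mu[left] = (v[left] - lower) / (peak - lower) if peak > lower else 1.0
    # Falling edge from peak to upper.
    if upper > peak:
        mu[right] = (upper - v[right]) / (upper - peak)
    return mu


def virtual_values(x, n, rng, k=1.5):
    """Draw n virtual values whose density is proportional to the membership."""
    lower, peak, upper = boxplot_range(x, k)
    if upper == lower:  # zero spread: every virtual value equals the constant
        return np.full(n, lower)
    accepted = []
    while len(accepted) < n:
        cand = rng.uniform(lower, upper, size=n)
        # Acceptance-rejection: keep a candidate with probability = membership.
        keep = rng.uniform(size=n) <= triangular_membership(cand, lower, peak, upper)
        accepted.extend(cand[keep].tolist())
    return np.array(accepted[:n])


def virtual_samples(X, y, n_per_class, rng, k=1.5):
    """Generate n_per_class virtual samples per class, one attribute at a time."""
    Xv, yv = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        cols = [virtual_values(Xc[:, j], n_per_class, rng, k) for j in range(X.shape[1])]
        Xv.append(np.column_stack(cols))
        yv.append(np.full(n_per_class, label))
    return np.vstack(Xv), np.concatenate(yv)


# Usage: 30 training samples, 3 attributes, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.repeat([0, 1], 15)
Xv, yv = virtual_samples(X, y, n_per_class=20, rng=rng)
print(Xv.shape)  # (40, 3)
```

In a bagging setting, the virtual rows would be appended to each bootstrap subset before training its sub-classifier; how many to add, and whether attributes should be sampled jointly rather than independently, are design choices this sketch does not settle.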
Abstract (English): Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data. When applied to classification, it is prone to errors such as overfitting and underfitting. Ensemble methods such as Bagging (Bootstrap Aggregating) and Boosting manipulate the training set to reduce the occurrence of overfitting. Bagging does not focus on any particular instance of the training data and is therefore less susceptible to model overfitting when applied to noisy data. Bagging repeatedly draws bootstrap samples, but it does not generate samples outside the underlying space of the training samples. Because sampling is done with replacement, some instances may appear several times in the same training set, while others may be omitted from it. This study uses box-whisker plots to generate virtual samples that supplement the bootstrap training sets. Experiments on datasets from the public UCI repository show that the approach improves the accuracy of Bagging.
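The point about sampling with replacement can be checked numerically: a bootstrap replicate of size n contains on average 1 - (1 - 1/n)^n ≈ 1 - 1/e ≈ 63.2% of the distinct training instances, so roughly a third are omitted while others repeat. The sketch below shows plain bagging (bootstrap replicates plus majority vote). It assumes NumPy, and the nearest-centroid base learner is a hypothetical stand-in for the back-propagation neural network and support vector machine listed in the table of contents; all function and class names are illustrative.

```python
import numpy as np


def bootstrap_indices(n, rng):
    """Draw n indices with replacement: one bootstrap replicate."""
    return rng.integers(0, n, size=n)


class NearestCentroid:
    """Hypothetical stand-in base learner: assign each point to the closest class mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every point to every centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def bagging_fit(X, y, n_estimators, rng):
    """Train one base learner per bootstrap replicate."""
    models = []
    for _ in range(n_estimators):
        idx = bootstrap_indices(len(X), rng)
        models.append(NearestCentroid().fit(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    """Combine the base learners by majority vote."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_estimators, n_samples)
    labels = np.unique(votes)
    counts = np.stack([(votes == c).sum(axis=0) for c in labels])
    return labels[counts.argmax(axis=0)]


rng = np.random.default_rng(42)

# Fraction of distinct instances in one replicate (close to 1 - 1/e).
idx = bootstrap_indices(1000, rng)
unique_frac = len(np.unique(idx)) / 1000
print(round(unique_frac, 3))

# Bagging on two well-separated Gaussian classes.
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.repeat([0, 1], 50)
models = bagging_fit(X, y, n_estimators=15, rng=rng)
acc = (bagging_predict(models, X) == y).mean()
print(acc)
```

Because every replicate only reuses observed rows, no base learner ever sees an attribute value outside the original training set; this is the limitation the virtual samples are meant to address.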
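Table of Contents
Abstract (Chinese) I
Abstract (English) II
Acknowledgements VI
Contents VII
List of Figures X
List of Tables XII
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 4
1.3 Research Objectives 6
1.4 Research Process 7
Chapter 2 Literature Review 9
2.1 Virtual Sample Learning Methods 9
2.1.1 Information Diffusion 10
2.1.2 Other Virtual Sample Generation Methods 12
2.2 Bagging and the Box Plot 14
2.2.1 Bagging 14
2.2.2 Introduction to the Box Plot 18
2.3 Classification Tools 19
2.3.1 Back-Propagation Neural Network 19
2.3.2 Support Vector Machine 24
Chapter 3 Methodology 29
3.1 Notation 29
3.2 Virtual Sample Generation by Box-Plot Estimation 30
3.2.1 Range Estimation 31
3.2.2 Fuzzy Triangle Construction 31
3.2.3 Virtual Value Generation 32
3.2.4 Sample Formation 33
3.3 Classification Models 34
3.3.1 Back-Propagation Neural Network 34
3.3.2 Support Vector Machine 37
3.4 Procedure of the Proposed Method 39
Chapter 4 Empirical Validation 42
4.1 Experimental Environment 42
4.1.1 Software for Building Classification Models 42
4.1.2 Experimental Design and Evaluation Metrics 43
4.1.3 Hypothesis Testing of Experimental Results 44
4.2 Description of Experimental Data 44
4.3 Experimental Results 46
4.4 Findings and Discussion 67
Chapter 5 Conclusions and Suggestions 69
5.1 Conclusions 69
5.2 Future Research Directions 70
References 72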
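Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text; publicly available from 2024-12-31.

  • For questions, please contact the library.
    Phone: (06)2757575#65773
    Email: etds@email.ncku.edu.tw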