   The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.
(Note: if no record is found, or the holding status shows "closed stacks, not open to the public", the thesis is not in the stacks and cannot be accessed.)
System ID: U0026-1706201418073900
Title (Chinese): 以虛擬樣本產生法為基礎的隨機林預測模式
Title (English): Accomplishing Random Forest on the Basis of Virtual Sample Construction
Institution: National Cheng Kung University
Department (Chinese): 資訊管理研究所
Department (English): Institute of Information Management
Academic year: 102 (2013-2014)
Semester: 2
Year of publication: 103 (2014)
Author (Chinese): 陳弘基
Author (English): Hong-Ji Chen
Student ID: R76011016
Degree: Master's
Language: Chinese
Pages: 72
Committee: Advisor - 利德江
Committee member - 李賢得
Committee member - 吳植森
Keywords (Chinese): virtual samples; bootstrap; mega-trend-diffusion technique; ensemble classification
Keywords (English): ensemble; bootstrap process; virtual samples; mega-trend-diffusion (MTD)
Subject classification:
Abstract (Chinese): Converting data into meaningful information that business decision-makers can use as a reference has substantial practical value. For classification learning problems, ensemble procedures such as bagging, boosting, and the random forest were developed to improve the classification accuracy of a single model. However, when these methods construct their sub-models, the required training subsets are generated by the bootstrap process, so the sub-models can only be trained repeatedly on similar data and produce learning results that differ only slightly from one another; although this improves the classification accuracy of a single model, the gain remains limited. To allow the sub-models to learn over sample regions beyond the original training data, this study replaces the bootstrap process with virtual samples, generating the training subsets with the mega-trend-diffusion (MTD) technique. Research over the past decade has confirmed that this virtual-sample generation method effectively improves the training stability and prediction accuracy of learning tools on small samples. For data, this study uses data sets obtained from the public UCI repository and tests the random forest ensemble with MTD substituted for the bootstrap process, aiming to improve the random forest's classification accuracy on test samples. The experimental results show that the proposed method effectively improves classification accuracy.
Abstract (English): In order to improve the classification accuracy of a single model, ensemble approaches such as bagging, boosting, and the random forest are built on the bootstrap process, which generates the training subsets used in their learning procedures. Nevertheless, because these subsets are all drawn from the same data by sampling with replacement, they differ only slightly from one another. Although integrating the sub-models built on these subsets does improve the classification accuracy of a single model, further gains are possible if the training subsets are generated so that their sample values differ from one another. Therefore, this study employs an alternative sample-generation approach, the mega-trend-diffusion (MTD) technique, as a substitute for the bootstrap process in the learning procedure of the random forest; over the past decade, this kind of sample-generation approach has been demonstrated to enhance the robustness and precision of learning tools when sample sizes are small. The experimental results show that the classification accuracy of the random forest improves significantly when the training subsets are created by MTD rather than by the bootstrap process.
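The substitution the abstracts describe — estimating an attribute's plausible domain with MTD and drawing each training subset from that domain instead of bootstrap-resampling the original observations — can be sketched as follows. This is an illustrative reconstruction based on the published MTD formulation (Li et al., 2007), not code from the thesis; the function names and the `eps` membership cutoff are assumptions, and the sketch handles a single numeric attribute with at least two distinct values.

```python
import math
import random
import statistics

def mtd_bounds(values, eps=1e-20):
    """Estimate the mega-trend-diffusion domain [L, U] for one attribute.

    Follows the general MTD formulation: a set center, counts of samples
    on each side of it (to capture skewness), and a variance-driven
    diffusion spread on each side. Assumes len(values) >= 2 with at
    least two distinct values.
    """
    lo, hi = min(values), max(values)
    cl = (lo + hi) / 2.0                     # set center (reference point)
    n_l = sum(1 for v in values if v < cl)   # samples below the center
    n_u = sum(1 for v in values if v > cl)   # samples above the center
    skew_l = n_l / (n_l + n_u)               # left-skewness weight
    skew_u = n_u / (n_l + n_u)               # right-skewness weight
    var = statistics.variance(values)
    # Diffusion spread on each side; ln(eps) sets how far the
    # membership function is allowed to decay before it is cut off.
    spread_l = math.sqrt(-2.0 * (var / n_l) * math.log(eps)) if n_l else 0.0
    spread_u = math.sqrt(-2.0 * (var / n_u) * math.log(eps)) if n_u else 0.0
    L = cl - skew_l * spread_l
    U = cl + skew_u * spread_u
    # The estimated domain should at least cover the observed range.
    return min(L, lo), max(U, hi)

def virtual_samples(values, k, rng=random):
    """Draw k virtual samples from the MTD domain, in place of a
    bootstrap resample of the original values."""
    L, U = mtd_bounds(values)
    return [rng.uniform(L, U) for _ in range(k)]
```

Each sub-model of the ensemble would then be trained on a subset produced by `virtual_samples` rather than by sampling with replacement, so the subsets cover attribute values the original small sample never contained — which is the source of the extra diversity the thesis exploits.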
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Framework and Process
Chapter 2 Literature Review
2.1 Decision-Tree Classification Models
2.2 Ensemble Classification Methods
2.2.1 Principles of Ensemble Classification
2.2.2 Out-of-Bag Error Estimation
2.2.3 Common Ensemble Methods
2.3 Sample Generation Methods
2.3.1 Bootstrap Sampling
2.3.2 Information Diffusion Techniques
2.4 Summary
Chapter 3 Research Method
3.1 The Mega-Trend-Diffusion Technique
3.1.1 Reference Point and Diffusion-Coefficient Modification
3.1.2 Skewness Setting
3.1.3 Determining Membership Function Values
3.2 Sample Generation Mechanism
3.3 Constructing the Prediction Model
3.4 Procedure and Steps of the Proposed Method
Chapter 4 Empirical Validation
4.1 Experimental Environment
4.1.1 Model-Building Software
4.1.2 Experimental Design and Evaluation Metrics
4.2 Description of the Experimental Data
4.3 Experimental Results
4.4 Findings and Discussion
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Suggestions for Future Research
References
References
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Byon, E., Shrivastava, A. K., & Ding, Y. (2010). A classification procedure for highly imbalanced class sizes. IIE Transactions, 42(4), 288-303.
Díaz-Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC bioinformatics, 7(1), 3.
Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap (Vol. 57). CRC press.
Friedman, M. (1980). Free to choose (1st ed.). New York: Harcourt Brace Jovanovich.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
Huang, C. F. (1997). Principle of information diffusion. Fuzzy Sets and Systems, 91(1), 69-90.
Huang, C. F., & Moraga, C. (2004). A diffusion-neural-network for learning from small samples. International Journal of Approximate Reasoning, 35(2), 137-161.
Huang, C. J., Wang, H. F., Chiu, H. J., Lan, T. H., Hu, T. M., & Loh, E. W. (2010). Prediction of the Period of Psychotic Episode in Individual Schizophrenics by Simulation-Data Construction Approach. Journal of Medical Systems, 34(5), 799-808.
Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 119-127.
Kearns, M. (1988). Thoughts on hypothesis boosting. Unpublished manuscript, December.
Li, D. C., Wu, C. S., & Chang, F. M. (2005). Using data-fuzzification technology in small data set learning to improve FMS scheduling accuracy. International Journal of Advanced Manufacturing Technology, 27(3-4), 321-328.
Li, D. C., Wu, C. S., Tsai, T. I., & Chang, F. M. M. (2006). Using mega-fuzzification and data trend estimation in small data set learning for early FMS scheduling knowledge. Computers & Operations Research, 33(6), 1857-1869.
Li, D. C., Wu, C. S., Tsai, T. I., & Lin, Y. S. (2007). Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge. Computers & Operations Research, 34(4), 966-982.
Li, D. C., & Yeh, C. W. (2008). A non-parametric learning algorithm for small manufacturing data sets. Expert Systems with Applications, 34(1), 391-398.
Li, D. C., Tsai, T. I., & Shi, S. (2009b). A prediction of the dielectric constant of multi-layer ceramic capacitors using the mega-trend-diffusion technique in powder pilot runs: case study. International Journal of Production Research, 47(1), 51-69.
Li, D. C., Chen, C. C., Chen, W. C., & Chang, C. J. (2012). Employing dependent virtual samples to obtain more manufacturing information in pilot runs. International Journal of Production Research, 50(23), 6886-6903.
Li, D. C., Chang, C. C., & Liu, C. W. (2012). Using structure-based data transformation method to improve prediction accuracies for small data sets. Decision Support Systems, 52(3), 748-756.
Li, D. C., Chen, C. C., Chang, C. J., & Chen, W. C. (2012). Employing Box-and-Whisker plots for learning more knowledge in TFT-LCD pilot runs. International Journal of Production Research, 50(6), 1539-1553.
Li, D. C., & Liu, C. W. (2012). Extending Attribute Information for Small Data Set Classification. IEEE Transactions on Knowledge and Data Engineering, 24(3), 452-464.
Lin, Y. S., & Li, D. C. (2010). The Generalized-Trend-Diffusion modeling algorithm for small data sets in the early stages of manufacturing systems. European Journal of Operational Research, 207(1), 121-130.
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1(1), 81-106.
Quinlan, J. R. (1993). C4.5: programs for machine learning (Vol. 1). Morgan Kaufmann.
Schapire, R. E. (1999, January). Theoretical views of boosting and applications. In Algorithmic Learning Theory (pp. 13-25). Springer Berlin Heidelberg.
Schwenk, H., & Bengio, Y. (2000). Boosting neural networks. Neural Computation, 12(8), 1869-1887.
Tsai, T. I., & Li, D. C. (2008). Utilize bootstrap in small data set learning for pilot run modeling of manufacturing systems. Expert Systems with Applications, 35(3), 1293-1300.
Liu, Z. R., & Huang, C. F. (1990). Information distribution method relevant in fuzzy information analysis. Fuzzy Sets and Systems, 36(1), 67-76.
Full-Text Use Authorization
  • The author agrees to authorize on-campus browsing/printing of the electronic full text, publicly available from 2016-06-27.


  • If you have any questions, please contact the library.
    Tel: (06)2757575 ext. 65773
    E-mail: etds@email.ncku.edu.tw