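System ID: U0026-0301201822423800
Title (Chinese): 重建小樣本資料之樣本分配
Title (English): Rebuilding Sample Distributions for Small Dataset Learning
Institution: National Cheng Kung University (成功大學)
Department (Chinese): 工業與資訊管理學系
Department (English): Department of Industrial and Information Management
Academic year: 106 (ROC calendar)
Semester: 1
Publication year: 106 (ROC calendar; 2017 CE)
Student (Chinese): 林武國
Student (English): Wu-Kuo Lin
Student ID: R38001077
Degree: Doctoral
Language: English
Pages: 46
Oral defense committee: Advisor - 利德江
Convener - 吳植森
Committee member - 蔡長鈞
Committee member - 黃信豪
Committee member - 王維聰
Keywords (Chinese): 小樣本學習 (small-sample learning); 虛擬樣本 (virtual samples); 資料前處理 (data preprocessing)
Keywords (English): small data; virtual sample; data preprocessing
Subject classification: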
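Abstract (translated from Chinese): Over the past several decades, many learning methods have been developed to extract knowledge from data, but most of them assume that the training samples fully represent the characteristics of the population. When the training samples cannot fully express those characteristics, the knowledge learned by these methods may be insufficient, or even biased, for decision makers. To address the small-sample learning problem, this study proposes a method based on fuzzy theory that rebuilds the possible sample distribution of a small dataset and generates new training samples from it, so that learning algorithms can be trained adequately. The method consists of a new set of domain estimation functions and a sample generation procedure. To validate the method, the study obtained two real cases from a leading company in the thin film transistor liquid crystal display (TFT-LCD) industry and built models with two learning algorithms, a back-propagation neural network and support vector regression. Two sample generation methods, Bagging (bootstrap aggregating) and SMOTE (synthetic minority over-sampling technique), were used for comparison. The experimental results show that, for both learning algorithms, models built on the training samples generated by the proposed method have statistically significantly lower prediction errors on the test samples than models built on training samples generated by Bagging or SMOTE.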
Abstract (English): Over the past few decades, numerous learning algorithms have been proposed to extract knowledge from data. Most of these algorithms assume that the training set represents its population. When a training set captures only a few properties of its population, the algorithms may extract insufficient or biased knowledge for decision makers. This study develops a systematic procedure based on fuzzy theory that creates new training sets by rebuilding the possible sample distributions. The procedure consists of new functions that estimate attribute domains and a sample generation method. Two real cases from a leading company in the thin film transistor liquid crystal display (TFT-LCD) industry are examined. Two learning algorithms, a back-propagation neural network and support vector regression, are employed for modeling, and two sample generation approaches, bootstrap aggregating (bagging) and the synthetic minority over-sampling technique (SMOTE), serve as baselines for comparing model accuracy. The results indicate that models trained on samples generated by the proposed method achieve statistically significantly lower prediction errors on the test samples than models trained on samples generated by bagging or SMOTE.
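As a rough illustration of the two baseline sample generators named in the abstract, the sketch below resamples a small dataset with replacement (the resampling step of bagging) and creates synthetic points by interpolating between a sample and one of its nearest neighbours (the core step of SMOTE). It does not implement the thesis's fuzzy-theory method, whose domain estimation functions are not given in this record. The function names, the toy data, and the choice of `k` are assumptions made for illustration, and the SMOTE step is simplified to unlabelled numeric vectors.

```python
import random


def bootstrap_resample(samples, rng):
    """Draw len(samples) items with replacement (resampling step of bagging)."""
    return [rng.choice(samples) for _ in samples]


def smote_like(samples, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a randomly
    chosen sample and one of its k nearest neighbours (SMOTE's core step).
    Simplified: no class labels, squared Euclidean distance."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Candidate neighbours: every sample except the chosen base point.
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: sq_dist(base, s))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical toy data: five two-attribute observations.
rng = random.Random(0)
small_set = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6), (2.2, 3.1), (3.0, 3.3)]

boot = bootstrap_resample(small_set, rng)
synth = smote_like(small_set, n_new=10, k=2, rng=rng)
```

Both generators only reuse or interpolate the observed values, so every generated point stays inside the range of the original sample. The abstract's mention of functions that estimate domains suggests that the proposed method addresses this limitation, although the record itself gives no further detail.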
Table of contents: Abstract (in Chinese)
Abstract
Acknowledgements
CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 Introduction
1.1 Backgrounds
1.2 Motivation and Objective
1.3 Organization
2 Literature Review
2.1 Related studies
2.2 The bootstrapping procedure and bagging
2.3 The SMOTE
3 Methodology
3.1 Definitions of notations
3.2 Estimating sample distributions
3.2.1 Determining location centers
3.2.2 Deriving domain bounds
3.2.3 Building sample distributions
3.3 Generating samples
3.3.1 Generating corresponding attribute values
3.3.2 Generating given attribute values
3.3.3 Generating high-dimensional data
4 Two examples and computational results
4.1 Experimental designs
4.2 Case description
4.3 Experimental results and discoveries
5 Conclusions
References
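Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized; publicly available from 2018-01-12.
  • Off-campus browsing and printing of the electronic full text is authorized; publicly available from 2018-01-12.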

