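   The electronic thesis has not yet been authorized for public access; for the print copy, please check the library catalog.
(※ If the thesis cannot be found, or its holding status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)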
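System ID U0026-0707201602593000
Title (Chinese) 基於主成份分析修正小樣本學習之虛擬樣本產生過程
Title (English) Improving Virtual Sample Generation Process of Small Sample Learning Based on Principal Component Analysis
University National Cheng Kung University (成功大學)
Department (Chinese) 工業與資訊管理學系
Department (English) Department of Industrial and Information Management
Academic year 104 (ROC calendar; 2015-2016)
Semester 2
Year of publication 105 (ROC calendar; 2016)
Author (Chinese) 潘致維
Author (English) Chih-Wei Pan
Student ID R36034074
Degree Master's
Language Chinese
Pages 66
Committee Advisor: 利德江
Committee member: 李賢得
Committee member: 王清正
Committee member: 林耀三
Keywords (Chinese) 小樣本學習 (small sample learning), 主成份分析 (principal component analysis), 虛擬樣本 (virtual sample), 盒鬚圖 (box-and-whisker plot)
Keywords (English) small sample learning, principal component analysis, virtual sample
Subject classification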
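Chinese abstract (translated) Rapid technological development has shortened the life cycles of many products, so companies invest more resources in product research, development, and design, and often must bring a new product to market every few quarters. How to mass-produce a new product with existing technology within a short time has therefore become a problem that companies cannot ignore. The sample data obtained during pilot production are often insufficient, and how to use a small amount of sample data effectively to solve production problems is the small sample learning problem. Most previous research on small sample learning generates virtual samples to improve learning efficiency, and many studies have shown that adding virtual samples to the training data does improve classification accuracy or prediction accuracy. When virtual samples are generated, the mega-trend-diffusion technique or the box-and-whisker plot method is used to estimate the value range of each attribute's population. To make this estimation more reasonable, this study uses principal component analysis, which rotates the coordinate axes, to transform the data attributes so that the research assumption of mutually independent attributes is satisfied. When the attributes are dependent, virtual values cannot be generated for each attribute separately, and the relationships among attributes must also be taken into account. If this restriction is ignored and the mega-trend-diffusion technique is still used to generate virtual values, the estimated value range may be biased. This study therefore seeks to keep the data attributes independent, generates virtual samples on that basis, and adds them to the training samples to improve the stability of classification performance.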
English abstract Owing to the rapid development of science and technology, the life cycles of modern products have become shorter, so companies must make decisions quickly before new products are launched. However, it is difficult to collect enough samples in such a short period of time. It is therefore important to extract more information from the limited data collected, which gives rise to small sample learning problems. In past studies, researchers usually generated extra virtual samples to increase the sample size when facing small dataset learning problems. These virtual samples are typically generated with the mega-trend-diffusion technique, which has been shown to enhance learning effectiveness and improve classification accuracy. This research aims to correct the error that arises in the mega-trend-diffusion technique when the attributes in the dataset are not fully independent. The core idea of the proposed method is therefore to apply principal component analysis to the dataset in advance. To verify the proposed method, we compare its performance with that of the raw data and with that of the original mega-trend-diffusion technique used in past studies. The results show that the proposed method improves the average classification accuracy with a lower standard deviation, which indicates more reliable performance.
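The idea described in the abstracts (rotate the attributes onto principal axes, estimate a value range for each rotated attribute, generate virtual values, and add them to the training set) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's procedure: the record does not give the formulas, so the whisker rule (quartiles plus or minus 1.5 times the interquartile range), the use of triangular sampling as a stand-in for the triangular fuzzy membership function named in the table of contents, and all function names (`pca_rotate`, `box_plot_bounds`, `generate_virtual_samples`) are assumptions made here. The mega-trend-diffusion range estimate is not reproduced.

```python
import numpy as np


def pca_rotate(X):
    """Center X and rotate it onto its principal axes (no dimension reduction).

    Assumes X has shape (n_samples, n_attributes) with at least 2 attributes.
    """
    mean = X.mean(axis=0)
    centered = X - mean
    # Eigen-decomposition of the sample covariance matrix.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Order the axes by decreasing variance.
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]
    return centered @ axes, mean, axes


def box_plot_bounds(scores, whisker=1.5):
    """Assumed range estimate per column: [Q1 - k*IQR, Q3 + k*IQR], plus the median."""
    q1, median, q3 = np.percentile(scores, [25, 50, 75], axis=0)
    iqr = q3 - q1
    return q1 - whisker * iqr, median, q3 + whisker * iqr


def generate_virtual_samples(X, n_virtual, seed=0):
    """Draw virtual samples column by column in the rotated space, then rotate back.

    Each rotated attribute is sampled from a triangular distribution peaked at
    its median. This fails if a column has zero interquartile range.
    """
    rng = np.random.default_rng(seed)
    scores, mean, axes = pca_rotate(X)
    lower, mode, upper = box_plot_bounds(scores)
    virtual_scores = rng.triangular(lower, mode, upper,
                                    size=(n_virtual, X.shape[1]))
    # Undo the rotation and the centering to return to the original attributes.
    return virtual_scores @ axes.T + mean
```

Because the rotated attributes are uncorrelated in the sample, each column can be sampled on its own. Rotating back restores the linear relationships among the original attributes, which is the motivation the abstract gives for applying principal component analysis first. The virtual samples returned by `generate_virtual_samples` would then be appended to the training data before fitting a classifier.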
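Table of contents Abstract
Contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Research background
1.2 Research motivation
1.3 Research objectives
1.4 Research assumptions
1.5 Research process and structure
Chapter 2 Literature review
2.1 Evolution of small sample learning methods
2.2 Principal component analysis
2.2.1 Reducing data dimensionality
2.3 Classification models
2.3.1 Back-propagation neural network
2.3.2 Support vector machine
Chapter 3 Methodology
3.1 Basic notation
3.2 Transforming the coordinate axes
3.3 Generating virtual samples with the box-and-whisker plot method
3.3.1 Estimating the value range of virtual samples
3.3.2 Constructing the triangular fuzzy membership function
3.3.3 Generating and evaluating virtual samples
3.3.4 Virtual sample generation
3.4 Classification model
3.5 Procedure of the proposed method
Chapter 4 Experimental validation
4.1 Experimental environment
4.2 Experimental cases
4.3 Experimental results
Chapter 5 Conclusions and suggestions
References
Appendix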

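Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized, with public access from 2020-07-01.
  • Off-campus browsing and printing of the electronic full text is authorized, with public access from 2021-07-01.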

