

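   The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holding status shows "closed stacks, not available", the copy is not in the stacks and cannot be accessed.)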
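System ID U0026-0208201623440500
Title (Chinese) 建立一個新的虛擬樣本產生技術學習小樣本資料
Title (English) Constructing a new virtual sample generation technique for small dataset learning
Institution National Cheng Kung University
Department (Chinese) 工業與資訊管理學系碩士在職專班
Department (English) Department of Industrial and Information Management (on the job class)
Academic year 104 (ROC calendar)
Semester 2
Year of publication 105 (ROC calendar; 2016)
Student (Chinese) 凌偉珊
Student (English) Wei-Shan Ling
Student ID r37031265
Degree Master's
Language Chinese
Pages 52
Oral examination committee Advisor: 利德江
Committee member: 林耀三
Committee member: 葉俊吾
Committee member: 林良憲
Keywords (Chinese) 小樣本資料  虛擬樣本產生法  軟性DBSCAN  整體趨勢擴散法
Keywords (English) small data  virtual sample generation  soft DBSCAN  Mega-trend diffusion
Subject classification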
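Abstract (translated from Chinese) With the rise of the Internet generation, information spreads faster and in more diverse forms, and big data has become the most widely discussed topic of recent years, with many scholars studying it from different perspectives. Besides big-data problems, small-sample problems caused by a lack of data also arise frequently in everyday settings, for example the engineering stage of new product introduction, setting process parameters for new machines and new processes, epidemics of infectious disease, catastrophic disasters, and climate change forecasting. These problems share common characteristics: the data are hard to obtain or too costly to obtain, which makes it difficult for experts to carry out further analysis and prediction. How to extract more meaningful information from hard-to-obtain data when data are scarce has therefore become another research topic in recent years.
Virtual sample generation has been shown to be an effective way to address small-sample problems. Its main technique is Mega-trend diffusion (MTD), which assumes that the data follow a unimodal distribution and takes skewness into account. The true population distribution, however, may be multimodal, and data distributions are not always simple. To address these issues, this study proposes a nonparametric multimodal virtual sample generation method. Soft DBSCAN clustering is first used to preprocess the small-sample data and extract as much useful preprocessing information as possible. The MTD algorithm is then used to estimate the range of each cluster, and virtual samples are generated from those ranges for use in subsequent prediction.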
Abstract (English) With the rise of the Internet generation, big data has become one of the most discussed topics, and small-data problems have recently drawn attention as well. When data are hard or costly to obtain, further analysis and prediction become difficult. Virtual sample generation has been shown to be an effective way to address the small-data problem. Its main technique is Mega-trend diffusion (MTD), which assumes that the data follow a unimodal distribution and takes skewness into account. This study proposes a non-parametric multi-modal virtual sample generation method for multi-modal populations. The small dataset is first preprocessed with the soft DBSCAN clustering method to extract as much useful information as possible. The MTD algorithm then estimates the data range of each cluster, and virtual samples are generated within those ranges for use in prediction.
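The pipeline described in the abstract (group the small sample into clusters, estimate a value range for each cluster with MTD, then draw virtual samples inside that range) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: cluster labels are assumed to be given already (soft DBSCAN itself is not reproduced), the range rule follows the commonly published MTD form with a 1e-20 diffusion constant and a triangular membership function, only a single attribute is handled, and all function names are hypothetical.

```python
import math
import random
import statistics


def mtd_range(values):
    """Estimate (lower, center, upper) for one attribute, MTD style (assumed form)."""
    lo, hi = min(values), max(values)
    cl = (lo + hi) / 2.0                       # center of the observed range
    if lo == hi:                               # no spread: nothing to diffuse
        return lo, cl, hi
    n_l = sum(1 for v in values if v < cl)     # observations below the center
    n_u = sum(1 for v in values if v > cl)     # observations above the center
    var = statistics.variance(values)          # sample variance
    spread = -2.0 * math.log(1e-20)            # diffusion constant (assumption)
    skew_l = n_l / (n_l + n_u)                 # skewness weights for each side
    skew_u = n_u / (n_l + n_u)
    lower = cl - skew_l * math.sqrt(spread * var / n_l)
    upper = cl + skew_u * math.sqrt(spread * var / n_u)
    # The estimated range must never be narrower than the observed one.
    return min(lower, lo), cl, max(upper, hi)


def membership(x, lower, cl, upper):
    """Triangular membership: 1 at the center, falling to 0 at both bounds."""
    if x < lower or x > upper:
        return 0.0
    if x <= cl:
        return 1.0 if cl == lower else (x - lower) / (cl - lower)
    return (upper - x) / (upper - cl)


def generate_virtual(values, n, rng):
    """Draw n virtual values by accept/reject against the membership function."""
    lower, cl, upper = mtd_range(values)
    if lower == upper:
        return [cl] * n
    out = []
    while len(out) < n:
        candidate = rng.uniform(lower, upper)
        if membership(candidate, lower, cl, upper) > rng.random():
            out.append(candidate)
    return out


def generate_per_cluster(values, labels, n_per_cluster, seed=0):
    """Apply the MTD sketch separately to each (pre-computed) cluster."""
    rng = random.Random(seed)
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    return {label: generate_virtual(vals, n_per_cluster, rng)
            for label, vals in clusters.items()}


if __name__ == "__main__":
    data = [1.0, 1.2, 1.1, 0.9, 5.0, 5.3, 4.8, 5.1]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]          # hypothetical cluster assignment
    print(generate_per_cluster(data, labels, n_per_cluster=3))
```

Estimating the range per cluster rather than over the whole sample is what lets the sketch handle a two-peaked dataset: a single MTD range over all eight values above would place most virtual samples in the empty gap between the two groups.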
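Table of contents Abstract I
Extended Abstract II
Acknowledgements VI
Contents VII
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
1.1 Research background 1
1.2 Research motivation and objectives 4
1.3 Research process 7
Chapter 2 Literature review 9
2.1 Information diffusion sample generation methods 9
2.2 Virtual sample generation methods 15
2.3 Cluster analysis 17
2.3.1 K-means algorithm 20
2.3.2 DBSCAN algorithm 21
2.4 Prediction models 23
2.4.1 Back propagation network (BPN) 23
Chapter 3 Research method 26
3.1 Notation 26
3.2 Soft DBSCAN 27
3.3 Virtual sample generation method 29
3.3.1 MTD range estimation 29
3.3.2 MTD sample generation procedure 30
3.4 Procedure of the proposed method 32
Chapter 4 Case validation 34
4.1 Experimental environment 34
4.2 Case description 35
4.2.1 Property prediction for multilayer ceramic capacitors (MLCC), a passive component 35
4.3 Experimental procedure 38
4.3.1 Experimental procedure 38
4.3.2 Summary 46
Chapter 5 Conclusions and recommendations 47
5.1 Conclusions 47
5.2 Research recommendations 48
References 49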

Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, available from 2021-08-01.

