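System ID: U0026-1607201416244400
Title (Chinese): 使用基於屬性趨勢相似度生成之虛擬樣本建構液晶面板廠之高維度資料製造模式
Title (English): Employing Virtual Samples Created based on the Trend Similarity between Attributes to Build High-dimensional Manufacturing Models in TFT-LCD Plants
University: National Cheng Kung University
Department (Chinese): 工業與資訊管理學系
Department (English): Department of Industrial and Information Management
Academic year: 102 (ROC calendar; 2013-2014)
Semester: 2
Publication year: 103 (ROC calendar; 2014)
Author (Chinese): 黃文定
Author (English): Wen-Ting Huang
Student ID: R38991086
Degree: Doctorate
Language: Chinese
Pages: 79
Oral defense committee: Advisor: 利德江
Keywords (Chinese): 小樣本學習, 屬性趨勢相似度, 虛擬樣本
Keywords (English): small dataset learning, trend similarities between attributes, virtual samples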
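Abstract (translated from Chinese): Small dataset learning has grown in importance because intensifying global competition has made product life cycles extremely short. Earlier statistical and machine learning algorithms were developed on the premise that the samples adequately represent the properties of the population, so they cannot extract meaningful information from a small sample that reveals only part of those properties. Among small dataset learning methods, virtual sample generation, a data preprocessing technique, has been shown to be effective. Earlier virtual sample generation methods, however, did not properly account for the relationships between sample attributes. This research therefore proposes a systematic sample generation procedure based on an extension of the correlation coefficient concept: it uses a non-parametric approach to capture the trend similarities between attributes, then uses each attribute's triangular fuzzy membership function to estimate, attribute by attribute, the interval in which each attribute value is likely to fall, and derives virtual samples from those intervals. In the experimental validation, three case datasets obtained from industry are examined. The results show that the prediction accuracy of the M5' model tree, multiple linear regression, support vector regression, and the back-propagation neural network on small datasets improves significantly, and that the proposed procedure performs better than other sample generation methods.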
Abstract (English): Small dataset learning has become increasingly important in recent decades because growing global competition has shortened product life cycles. Although statistical approaches and machine learning algorithms are widely applied to extract information from data, they were developed on the assumption that the training samples adequately represent the properties of the whole population. Consequently, when the training samples capture only limited properties, the knowledge the algorithms can extract is correspondingly limited. Virtual sample generation approaches, a kind of data preprocessing method, have proved effective for handling small datasets. Unlike earlier approaches, this research considers the relationships between attributes during value generation: it proposes a non-parametric process that learns the trend similarities between attributes and uses them to estimate the range in which an attribute's value may fall when the values of other attributes are given. Using triangular membership functions that represent the attribute sample distributions, the procedure estimates these ranges attribute by attribute and generates values within them to form virtual samples. In the experiment, three real cases taken from related works are examined with four modeling tools: the M5' model tree, multiple linear regression, support vector regression, and the back-propagation neural network. The results show that the forecasting accuracy of all four tools improves when the training sets contain virtual samples. In addition, the proposed procedure yields significantly lower predictive errors than other approaches.
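As a rough illustration of the kind of procedure the abstract describes, the following Python sketch generates two-attribute virtual samples. It is not the thesis's algorithm: this record does not give the formulas, so every rule here is an assumption made for illustration. The trend similarity is taken to be a pairwise concordance score in [-1, 1], the triangular membership function is fitted from the sample minimum, median, and maximum, and the range for the second attribute narrows linearly as the absolute similarity grows. All function names are hypothetical.

```python
import random
import statistics
from itertools import combinations


def trend_similarity(xs, ys):
    """Assumed measure: share of sample pairs in which both attributes move
    the same way minus the share in which they move opposite ways."""
    score, pairs = 0, 0
    for i, j in combinations(range(len(xs)), 2):
        dx, dy = xs[i] - xs[j], ys[i] - ys[j]
        if dx == 0 or dy == 0:
            continue  # ties carry no trend information
        pairs += 1
        score += 1 if dx * dy > 0 else -1
    return score / pairs if pairs else 0.0


def fit_triangle(values):
    """Assumed fit: (low, peak, high) = (min, median, max) of the sample."""
    return min(values), statistics.median(values), max(values)


def triangular_membership(value, low, peak, high):
    """Membership degree of `value` under the triangle (low, peak, high)."""
    if value == peak:
        return 1.0
    if value <= low or value >= high:
        return 0.0
    if value < peak:
        return (value - low) / (peak - low)
    return (high - value) / (high - peak)


def estimate_range(x_value, x_tri, y_tri, similarity):
    """Assumed rule: map the relative position of x onto y (mirrored when
    the similarity is negative) and shrink the interval as |similarity| grows."""
    x_low, _, x_high = x_tri
    y_low, _, y_high = y_tri
    position = (x_value - x_low) / (x_high - x_low) if x_high > x_low else 0.5
    if similarity < 0:
        position = 1.0 - position
    center = y_low + position * (y_high - y_low)
    half_width = (1.0 - abs(similarity)) * (y_high - y_low) / 2
    return max(y_low, center - half_width), min(y_high, center + half_width)


def generate_virtual_samples(xs, ys, count, seed=0):
    """Draw x from its triangular distribution, then y from the range
    estimated for that x."""
    rng = random.Random(seed)
    x_tri, y_tri = fit_triangle(xs), fit_triangle(ys)
    similarity = trend_similarity(xs, ys)
    samples = []
    for _ in range(count):
        x_value = rng.triangular(x_tri[0], x_tri[2], x_tri[1])
        low, high = estimate_range(x_value, x_tri, y_tri, similarity)
        samples.append((x_value, rng.uniform(low, high)))
    return samples
```

Calling `generate_virtual_samples(xs, ys, count=20)` on a handful of observed pairs returns 20 synthetic pairs that could be appended to a small training set before fitting a regression model. Extending the sketch to more than two attributes would require choosing an order in which to estimate the attributes, which the abstract mentions but this record does not specify.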
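Table of contents: Abstract (Chinese)
Abstract
Acknowledgements
List of Tables
List of Figures
Chapter 1 Introduction
 1.1 Research background
 1.2 Research motivation
 1.3 Research objectives
 1.4 Research framework and process
Chapter 2 Literature review
 2.1 Small dataset learning methods
 2.2 Virtual sample learning methods
  2.2.1 Information diffusion techniques
  2.2.2 Other virtual sample algorithms
 2.3 Prediction models
  2.3.1 M5' model tree
  2.3.2 Multiple linear regression
  2.3.3 Support vector regression
  2.3.4 Back-propagation neural network
Chapter 3 Methodology
 3.1 Notation
 3.2 Theoretical basis
 3.3 Distribution estimation
  3.3.1 Definition of the box-and-whisker plot
  3.3.2 Estimation of the distribution domain
  3.3.3 Distribution construction
 3.4 Trend similarity between attributes
 3.5 Sample generation
  3.5.1 Estimation of the value range
  3.5.2 Generation of virtual attribute values
  3.5.3 Learning from multidimensional data
Chapter 4 Case validation
 4.1 Experimental environment
  4.1.1 Experimental procedure
  4.1.2 Prediction error measures
  4.1.3 Hypothesis testing method
  4.1.4 Modeling software
 4.2 Case descriptions
  4.2.1 Case I: Position deviation problem in the Cell process
  4.2.2 Case II: Photosensitive column spacer height problem in the CF process
  4.2.3 Case III: Property prediction problem for MLCC passive components
 4.3 Experimental results
  4.3.1 Case I: Experimental results
  4.3.2 Case II: Experimental results
  4.3.3 Case III: Experimental results
Chapter 5 Conclusions and suggestions
 5.1 Conclusions
 5.2 Suggestions
References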
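  • Authorized for on-campus browsing and printing of the electronic full text; publicly available from 2017-07-28.
  • Authorized for off-campus browsing and printing of the electronic full text; publicly available from 2017-07-28.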
