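The electronic thesis has not yet been authorized for public access; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holding status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)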
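System ID: U0026-0707201412315300
Title (Chinese): 生成多峰性虛擬樣本以評估小資料集之產品壽命性能
Title (English): Generating Multi-modal Virtual Samples to Assess Product Lifetime Performance for Small Data Sets
Institution: National Cheng Kung University
Department (Chinese): 工業與資訊管理學系
Department (English): Department of Industrial and Information Management
Academic year: 102 (ROC calendar, 2013-2014)
Semester: 2
Year of publication: 103 (ROC calendar, 2014)
Author (Chinese): 林良憲
Author (English): Liang-Sian Lin
Student ID: r38991052
Degree: Doctoral
Language: English
Pages: 66
Oral examination committee:
Advisor: 利德江
Convener: 吳植森
Committee member: 蔡長鈞
Committee member: 黃信豪
Committee member: 王維聰
Keywords (Chinese): 最大P值, 多峰態屬性, 小資料集, 虛擬樣本生成, 虛擬樣本數
Keywords (English): Maximal P-Value, Multi-modality attribute, Small data set, Virtual sample generation, Virtual sample size
Subject classification: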
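Abstract (translated from the Chinese): In many studies, virtual sample generation methods are used to improve learning performance on small data sets. Appropriate estimation of the data distribution plays an important role in the generation process. When the data do have a simple distribution, assuming a simple distribution yields better performance, but the data may instead follow a complex distribution. Mixed data sets usually have a multi-modal distribution; that is, the distribution is not a simple, uni-modal one. To address this problem, this study assumes that the data come from a two-parameter Weibull distribution and proposes the Maximal P-Value method to estimate the two parameters, so as to construct a nonlinear, asymmetric distribution for small data sets. Furthermore, this study proposes a new method to detect multi-modal data sets, to avoid inappropriately assuming a uni-modal distribution. The common k-means clustering method is used to find possible clusters, and the estimated Weibull variate for the samples in each cluster is used to generate multi-modal virtual samples. The proposed method includes a criterion for deciding the virtual sample size: the degree of variation in the error of the Weibull skewness between the original samples and the virtual samples. Simulated data sets and two real examples are provided to verify that, with small sample sizes, the Maximal P-Value method is a more appropriate technique for improving the accuracy of distribution estimation. In addition, six data sets are used to verify the performance of the proposed method, comparing classification accuracy under different training data sizes. Finally, a non-parametric test on the experimental results shows that the proposed method has better classification performance than the Mega-Trend-Diffusion method.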
Abstract (English): Virtual sample generation approaches have been used in a number of studies to enhance learning performance with small data sets. Appropriate estimation of the data distribution plays an important role in this process, and the resulting performance is usually better for data sets that have a simple distribution than for those with a complex one. However, mixed-type data sets often have a multi-modal distribution rather than a simple, uni-modal one. To address this problem, this study assumes that a data set follows a two-parameter Weibull distribution and proposes the Maximal P-Value method to estimate the two Weibull parameters, so as to construct a nonlinear and asymmetrical distribution for small data sets. Further, this study proposes a new approach to detect multi-modality in data sets, to avoid inappropriately assuming a uni-modal distribution. The common k-means clustering method is used to detect possible clusters, and a Weibull variate is estimated for each clustered sample set to produce multi-modal virtual data. The degree of error variation in the Weibull skewness between the original and virtual data is measured and used as the criterion for determining the virtual sample size. Simulated data sets and two practical examples demonstrate that the Maximal P-Value method is a more appropriate technique for increasing the accuracy of distribution estimation with small sample sizes. In addition, six data sets with different training data sizes are used to check the performance of the proposed method, with comparisons based on classification accuracy. Finally, non-parametric testing of the experimental results shows that the proposed method has better classification performance than the Mega-Trend-Diffusion method.
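Illustrative note: the abstract describes estimating a two-parameter Weibull variate for each cluster and drawing virtual samples from it, and the table of contents lists an inversion method (Section 3.5.2). The sketch below shows only the generic inverse-transform step for a two-parameter Weibull distribution, whose quantile function is scale * (-ln(1 - u))^(1/shape). It is not the thesis's implementation: the function names, the per-cluster (shape, scale) values, and the sample sizes are all hypothetical, and the Maximal P-Value estimation, the modality test, and the skewness-based sample-size criterion are not reproduced here.

```python
import math
import random


def weibull_inverse_cdf(u, shape, scale):
    """Quantile function of the two-parameter Weibull distribution.

    Solves u = 1 - exp(-(x / scale) ** shape) for x.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def generate_virtual_samples(cluster_params, sizes, seed=0):
    """Draw virtual samples cluster by cluster via inverse-transform sampling.

    cluster_params: list of (shape, scale) pairs, one per cluster.
                    These are assumed to be estimated elsewhere
                    (hypothetical values in the usage example below).
    sizes:          number of virtual samples to draw for each cluster.
    """
    if len(cluster_params) != len(sizes):
        raise ValueError("need exactly one size per cluster")
    rng = random.Random(seed)
    samples = []
    for (shape, scale), n in zip(cluster_params, sizes):
        for _ in range(n):
            # rng.random() is uniform on [0, 1), so 1 - u is never zero.
            samples.append(weibull_inverse_cdf(rng.random(), shape, scale))
    return samples


if __name__ == "__main__":
    # Two hypothetical clusters with different scales give a bimodal pool.
    virtual = generate_virtual_samples([(2.0, 1.0), (5.0, 10.0)], [5, 5], seed=42)
    print(sorted(round(v, 3) for v in virtual))
```

Python's standard library also provides random.weibullvariate(alpha, beta), where alpha is the scale and beta is the shape; the explicit quantile function is written out here only to make the inversion step visible.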
Table of contents:
Abstract (Chinese) I
ABSTRACT II
Acknowledgements III
CONTENTS IV
LIST OF TABLES VI
LIST OF FIGURES VII
1. INTRODUCTION 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Purposes 4
1.4 Research Structure 5
2. LITERATURE REVIEW 6
2.1 Related Studies 6
2.1.1 Virtual Sample Generation 6
2.1.2 The Mega-Trend-Diffusion Method 7
2.1.3 Least-squares Estimation for a Weibull Distribution 8
2.1.4 The Lifetime Performance Testing Procedure 8
2.2 Modality Tests 13
2.2.1 The Dip Test 13
2.2.2 The Excess Mass Test 15
2.3 Related Techniques for Clustering and Classification 17
2.3.1 K-means Clustering 17
2.3.2 Linear Discriminant Analysis 18
2.3.3 K-nearest Neighbors 19
2.3.4 Support Vector Machine 20
3. METHODOLOGY 23
3.1 The Scheme for Virtual Sample Generation 23
3.2 The Maximal P-Value Method 25
3.3 The Proposed Modality Test 26
3.3.1 The Relationship between PDF and CDF 26
3.3.2 The Procedure of Modality Test 28
3.4 The Decision of Virtual Sample Size 30
3.5 Multi-modal Virtual Sample Generation 31
3.5.1 Virtual Sample Generation 32
3.5.2 The Inversion Method 32
3.5.3 K-modality Selection for Attributes 33
3.6 The Detailed Steps of the Proposed Method 34
4. EXPERIMENTS 36
4.1 The Performance of Maximal P-Value Method 36
4.1.1 Simulated Data Sets 36
4.1.2 Two Types of Real Numerical Data 43
4.1.3 Experimental Results 46
4.2 The Six Data Sets 46
4.3 An Example of the Proposed Method 48
4.4 The Experiment Design 50
4.5 The Results for the Selection of Classifiers 51
4.6 The Results of the Experiment to Compare Methods 54
4.7 Summary 58
5. CONCLUSIONS AND SUGGESTIONS 59
5.1 Conclusions 59
5.2 Suggestions 60
REFERENCES 61
References:
Abernethy, R.B. (2004), The New Weibull Handbook (5th ed.), 536 Oyster Road, North Palm Beach, Florida: Robert B Abernethy.
Amari, S.-i. & Wu, S. (1999), “Improving support vector machine classifiers by modifying kernel functions.” Neural Networks, 12 (6), pp. 783-789.
Asuncion, A. & Newman, D.J. (2007). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/]
Aydin, I., Karakose, M. & Akin, E. (2011), “A multi-objective artificial immune algorithm for parameter optimization in support vector machine.” Applied Soft Computing, 11 (1), pp. 120-129.
Benard, A. & Bos-Levenbach, E.C. (1953), “The plotting of observations on probability paper.” Statistica, 7, pp. 163-173.
Bowman, K.O. & Shenton, L.R. (2001), “Weibull distributions when the shape parameter is defined.” Computational Statistics & Data Analysis, 36 (3), pp. 299-310.
Chan, Y.-b. & Hall, P. (2010), “Using evidence of mixed populations to select variables for clustering very high-dimensional data.” Journal of the American Statistical Association, 105 (490), pp. 798-809.
Chang, C.C. & Lin, C.J. (2011), “LIBSVM: A library for support vector machines.” ACM Transactions on Intelligent Systems and Technology, 2 (3), pp. 1-27.
Chang, Y. & Wu, C.W. (2008), “Assessing process capability based on the lower confidence bound of Cpk for asymmetric tolerances.” European Journal of Operational Research, 190 (1), pp. 205-227.
Chen, J.P. & Chen, K. (2004), “Comparing the capability of two processes using Cpm.” Journal of Quality Technology, 36 (3), pp. 329-335.
Cheng, M.Y. & Hall, P. (1999), “Mode testing in difficult cases.” The Annals of Statistics, 27 (4), pp. 1294-1315.
Cho, S., Jang, M. & Chang, S. (1997), “Virtual sample generation using a population of networks.” Neural Processing Letters, 5 (2), pp. 21-27.
Cortes, C. & Vapnik, V. (1995), “Support-vector networks.” Machine learning, 20 (3), pp. 273-297.
Das, K. & Nenadic, Z. (2009), “An efficient discriminant-based solution for small sample size problem.” Pattern Recognition, 42 (5), pp. 857-866.
Davies, P.L. & Kovac, A. (2004), “Densities, spectral densities and modality.” Annals of Statistics, 32 (3), pp. 1093-1136.
Demšar, J. (2006), “Statistical comparisons of classifiers over multiple data sets.” The Journal of Machine Learning Research, 7, pp. 1-30.
Denoeux, T. (1995), “A k-nearest neighbor classification rule based on Dempster-Shafer theory.” IEEE Transactions on Systems, Man and Cybernetics, 25 (5), pp. 804-813.
Dodson, B. (2006), The Weibull Analysis Handbook (2nd ed.), Milwaukee: American Society for Quality, Quality Press.
Durbin, J., Knott, M. & Taylor, C. (1975), “Components of Cramer-von Mises statistics. II.” Journal of the Royal Statistical Society. Series B (Methodological), 37 (2), pp. 216-237.
Estabrooks, A., Jo, T. & Japkowicz, N. (2004), “A multiple resampling method for learning from imbalanced data sets.” Computational Intelligence, 20 (1), pp. 18-36.
Gail, M.H. & Gastwirth, J.L. (1978), “A scale-free goodness-of-fit test for the exponential distribution based on the Gini statistic.” Journal of the Royal Statistical Society. Series B (Methodological), 40 (3), pp. 350-357.
Good, I. & Gaskins, R. (1980), “Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data.” Journal of the American Statistical Association, 75 (369), pp. 42-56.
Hartigan, J.A. & Hartigan, P. (1985), “The dip test of unimodality.” The Annals of Statistics, 13 (1), pp. 70-84.
Iman, R.L. & Davenport, J.M. (1980), “Approximations of the critical region of the Friedman statistic.” Communications in Statistics-Theory and Methods, 9 (6), pp. 571-595.
Kapur, K.C. & Lamberson, L.R. (1977), Reliability in Engineering Design, New York: John Wiley and Sons, Inc.
Knott, M. (1974), “The distribution of the Cramér-von Mises statistic for small sample sizes.” Journal of the Royal Statistical Society. Series B (Methodological), 36 (3), pp. 430-438.
Lehmann, E.L. & Scheffé, H. (1950), “Completeness, similar regions, and unbiased estimation: Part I.” Sankhyā: The Indian Journal of Statistics (1933-1960), 10 (4), pp. 305-340.
Li, D.C., Chang, C.C. & Liu, C.W. (2012), “Using structure-based data transformation method to improve prediction accuracies for small data sets.” Decision Support Systems, 52 (3), pp. 748-756.
Li, D.C., Chen, L.S. & Lin, Y.S. (2003), “Using functional virtual population as assistance to learn scheduling knowledge in dynamic manufacturing environments.” International Journal of Production Research, 41 (17), pp. 4011-4024.
Li, D.C., Fang, Y.H. & Fang, Y.M.F. (2010), “The data complexity index to construct an efficient cross-validation method.” Decision Support Systems, 50 (1), pp. 93-102.
Li, D.C. & Lin, L.S. (2013), “A new approach to assess product lifetime performance for small data sets.” European Journal of Operational Research, 230 (2), pp. 290-298.
Li, D.C., Lin, L.S. & Peng, L.J. (2014), “Improving learning accuracy by using synthetic samples for small datasets with non-linear attribute dependency.” Decision Support Systems, 59, pp. 286-295.
Li, D.C. & Lin, Y.S. (2006), “Using virtual sample generation to build up management knowledge in the early manufacturing stages.” European Journal of Operational Research, 175 (1), pp. 413-434.
Li, D.C. & Liu, C.W. (2012), “Extending attribute information for small data set classification.” IEEE Transactions on Knowledge and Data Engineering, 24 (3), pp. 452-464.
Li, D.C., Liu, C.W. & Hu, S.C. (2010), “A learning method for the class imbalance problem with medical data sets.” Computers in Biology and Medicine, 40 (5), pp. 509-518.
Li, D.C., Wu, C.S., Tsai, T.I. & Lin, Y.S. (2007), “Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge.” Computers & Operations Research, 34 (4), pp. 966-982.
Lin, Y.S. & Li, D.C. (2010), “The Generalized-Trend-Diffusion modeling algorithm for small data sets in the early stages of manufacturing systems.” European Journal of Operational Research, 207 (1), pp. 121-130.
Little, S.N. (1983), “Weibull diameter distributions for mixed stands of western conifers.” Canadian Journal of Forest Research, 13 (1), pp. 85-88.
Liu, P.H. & Chen, F.L. (2006), “Process capability analysis of non-normal process data using the Burr XII distribution.” The International Journal of Advanced Manufacturing Technology, 27 (9), pp. 975-984.
Müller, D.W. & Sawitzki, G. (1991), “Excess mass estimates and tests for multimodality.” Journal of the American Statistical Association, 86 (415), pp. 738-746.
Mannino, M., Yang, Y. & Ryu, Y. (2009), “Classification algorithm sensitivity to training data with non representative attribute noise.” Decision Support Systems, 46 (3), pp. 743-751.
Montgomery, D.C. (1985), Introduction to Statistical Quality Control, New York: John Wiley & Sons Inc.
Niyogi, P., Girosi, F. & Poggio, T. (1998), “Incorporating prior information in machine learning by creating virtual examples.” Proceedings of the IEEE, 86 (11), pp. 2196-2209.
Pearn, W.L., Hung, H. & Cheng, Y.C. (2009), “Supplier selection for one-sided processes with unequal sample sizes.” European Journal of Operational Research, 195 (2), pp. 381-393.
Poggio, T. & Vetter, T. (1992). Recognition and structure from one (2D) model view: observations on prototypes, object classes, and symmetries. In AIM-1347 (Ed.). Massachusetts Institute of Technology: Artificial Intelligence Laboratory.
Polonik, W. & Wang, Z. (2005), “Estimation of regression contour clusters—an application of the excess mass approach to regression.” Journal of Multivariate Analysis, 94 (2), pp. 227-249.
Proschan, F. (1963), “Theoretical explanation of observed decreasing failure rate.” Technometrics, 5 (3), pp. 375-383.
Qi, Z., Tian, Y. & Shi, Y. (2013), “Robust twin support vector machine for pattern classification.” Pattern Recognition, 46 (1), pp. 305-316.
Silverman, B.W. (1981), “Using kernel density estimates to investigate multimodality.” Journal of the Royal Statistical Society. Series B (Methodological), 43 (1), pp. 97-99.
Tong, L.I., Chen, K. & Chen, H. (2002), “Statistical testing for assessing the performance of lifetime index of electronic components with exponential distribution.” International Journal of Quality & Reliability Management, 19 (7), pp. 812-824.
Wahed, A.S., Luong, T.M. & Jeong, J.H. (2009), “A new generalization of Weibull distribution with application to a breast cancer data set.” Statistics in Medicine, 28 (16), pp. 2077-2094.
Wu, C.W. & Pearn, W.L. (2008), “A variables sampling plan based on Cpmk for product acceptance determination.” European Journal of Operational Research, 184 (2), pp. 549-560.
Wu, C.W., Pearn, W.L. & Kotz, S. (2009), “An overview of theory and practice on process capability indices for quality assurance.” International Journal of Production Economics, 117 (2), pp. 338-359.
Xu, P., Brock, G.N. & Parrish, R.S. (2009), “Modified linear discriminant analysis approaches for classification of high-dimensional microarray data.” Computational Statistics & Data Analysis, 53 (5), pp. 1674-1687.
Yang, J., Yu, X., Xie, Z.Q. & Zhang, J.P. (2011), “A novel virtual sample generation method based on Gaussian distribution.” Knowledge-Based Systems, 24 (6), pp. 740-748.
Zhang, L.F., Xie, M. & Tang, L.C. (2007), “A study of two estimation approaches for parameters of Weibull distribution based on WPP.” Reliability Engineering & System Safety, 92 (3), pp. 360-368.
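Full-text access rights:
  • On-campus browsing and printing of the electronic full text is authorized; available from 2024-07-07.
  • Off-campus browsing and printing of the electronic full text is authorized; available from 2024-07-07.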

