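The electronic full text has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holdings status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)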
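System ID U0026-2507201713530800
Title (Chinese) 套索迴歸變數挑選於資料包絡分析法及凸性無母數最小平方法
Title (English) LASSO Variable Selection in Data Envelopment Analysis and Convex Nonparametric Least Squares
Institution National Cheng Kung University (成功大學)
Department (Chinese) 製造資訊與系統研究所
Department (English) Institute of Manufacturing Information and Systems
Academic year 105 (ROC calendar)
Semester 2
Year of publication 106 (ROC calendar)
Author (Chinese) 蔡佳盈
Author (English) Jia-Ying Cai
Student ID P96041071
Degree Master's
Language English
Pages 57
Oral defense committee Advisor: 李家岩
Committee member: 孔令傑
Committee member: 王逸琳
Committee member: 楊大和
Keywords (Chinese) 資料包絡分析法  套索迴歸變數挑選  效率估算  凸性無母數最小平方法  維度縮減
Keywords (English) data envelopment analysis  LASSO variable selection  efficiency estimation  convex nonparametric least squares  dimension reduction
Subject classification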
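Abstract (Chinese, translated) In data envelopment analysis, the number of input and output variables has a significant effect on production function estimation; that is, when a high-dimensional production function is estimated from few observations, we face the curse of dimensionality.

This study constructs a data generation process (DGP) to show that the typical rule of thumb in data envelopment analysis (for example, that the number of observations should be at least twice the number of input plus output variables) is ambiguous and may cause the estimated production function to deviate substantially from the true one. Variable selection is therefore needed to reduce this deviation.

This study has two parts: Chapter 3 examines the single-output, multiple-input case, and Chapter 4 studies variable selection in the multiple-output, multiple-input case.

The Least Absolute Shrinkage and Selection Operator (LASSO) is a variable selection technique commonly used in data mining to extract important variables (factors). This study applies LASSO to Data Envelopment Analysis (DEA) and to sign-constrained convex nonparametric least squares (SCNLS) in order to select important variables. In Chapter 3, the study recommends a model that combines LASSO with SCNLS (LASSO-SCNLS), and the results show that this method helps variable selection in DEA. In Chapter 4, the study recommends combining Principal Component Analysis (PCA) with the group LASSO idea within convex nonparametric least squares (PCA Group-LASSO SCNLS), and the results show that this model also helps variable selection in DEA.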
Abstract (English) The number of input and output factors has a significant impact on the production function estimated by data envelopment analysis (DEA). That is, the “curse of dimensionality” becomes an issue when a small number of observations is used to estimate a high-dimensional frontier. This study constructs a data generating process (DGP) to argue that the typical “rule of thumb” used in DEA, e.g. that the number of observations should be at least twice the number of inputs and outputs, is ambiguous and may lead to large deviations in technical efficiency estimation. Hence, this study proposes a variable selection technique to address this issue.
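The rule of thumb quoted above can be written as a one-line check. The sketch below is only an illustration: the function name and the `factor` parameter are hypothetical, and passing the check is exactly the heuristic the abstract calls ambiguous, so it does not guarantee an accurate frontier estimate.

```python
def meets_rule_of_thumb(n_obs: int, n_inputs: int, n_outputs: int, factor: int = 2) -> bool:
    """Return True when n_obs >= factor * (n_inputs + n_outputs).

    With factor=2 this mirrors the heuristic quoted in the abstract
    (at least twice the number of inputs and outputs).
    """
    if min(n_obs, n_inputs, n_outputs) < 0:
        raise ValueError("counts must be non-negative")
    return n_obs >= factor * (n_inputs + n_outputs)


# Example: 5 inputs and 5 outputs need at least 20 observations under factor=2.
print(meets_rule_of_thumb(20, 5, 5))
print(meets_rule_of_thumb(19, 5, 5))
```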

This study is divided into two parts: the single-output, multiple-input scenario (Chapter 3) and the multiple-output, multiple-input scenario (Chapter 4).

In Chapter 3, we apply the Least Absolute Shrinkage and Selection Operator (LASSO), a variable selection technique commonly used in data mining to extract significant factors, to the sign-constrained convex nonparametric least squares (SCNLS) formulation, which can be regarded as DEA. The results show that the proposed LASSO-SCNLS method provides useful guidelines for dimension reduction in DEA. In Chapter 4, we suggest a Principal Component Analysis (PCA) Group-LASSO SCNLS method for variable selection, and the results show that it performs well for dimension reduction.
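To illustrate how an L1 penalty selects variables, here is a minimal sketch of ordinary linear LASSO solved by cyclic coordinate descent. It is not the thesis's LASSO-SCNLS or PCA Group-LASSO SCNLS model, which add convexity and sign constraints that this record does not spell out. The function names, the synthetic data, and the penalty value `lam=0.2` are all assumptions made for illustration.

```python
import random


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product (scaled by 1/n) of column j with the partial
            # residual that excludes column j's own contribution.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Synthetic data (hypothetical): 4 candidate inputs, only the first two drive y.
rng = random.Random(0)
n_obs = 200
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n_obs)]
y = [2.0 * row[0] - 1.5 * row[1] + 0.1 * rng.gauss(0.0, 1.0) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if abs(b) > 1e-8]
print("coefficients:", [round(b, 3) for b in beta])
print("selected inputs:", selected)
```

Inputs whose coefficients are driven exactly to zero are the ones the penalty drops. A larger `lam` drops more inputs, at the cost of shrinking the retained coefficients further.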
Table of contents Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Terminology and Notations
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Problem Statement and Research Purpose
1.3 Research Overview
Chapter 2 Literature Review
2.1 Data Envelopment Analysis (DEA)
2.1.1 Rule of Thumb about DEA
2.1.2 DEA as Sign-Constrained CNLS (SCNLS)
2.2 Variable Selection Techniques
2.2.1 Least Absolute Shrinkage and Selection Operator (LASSO)
2.2.2 Group LASSO
2.2.3 Principal Component Analysis (PCA)
2.3 Summary and Discussion
Chapter 3 Variable Selection Techniques in Single Output and Multiple Inputs
3.1 Research Framework
3.2 Research Methods
3.2.1 Data Generation Process (DGP)
3.2.2 A Validation of Insufficient DMUs
3.2.3 Models for Variable Selection
3.3 MSE Comparison
3.4 Statistical Test
3.5 Summary and Discussion
Chapter 4 Variable Selection Techniques in Multiple Outputs and Multiple Inputs
4.1 Research Methods
4.1.1 Data Generation Process (DGP)
4.1.2 Models for Variable Selection
4.2 MSE Comparison
4.3 Statistical Test
4.4 Summary and Discussion
Chapter 5 Conclusion and Future Research
5.1 Conclusion
5.2 Future Study
References
Appendix
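Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized; available from 2022-01-01.
  • Off-campus browsing and printing of the electronic full text is authorized; available from 2022-01-01.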

