System ID  U0026-0708201919062600
Thesis Title (Chinese)  在稀少事件下邏輯式迴歸於三種懲罰項的變數篩選能力之初步探討
Thesis Title (English)  A preliminary study of variable selection in penalized logistic regression with rare events data
University  National Cheng Kung University
Department (Chinese)  統計學系
Department (English)  Department of Statistics
Academic Year  107 (2018-2019)
Semester  2
Year of Publication  108 (2019)
Graduate Student (Chinese)  林鼎晃
Graduate Student (English)  Ding-Huang Lin
Student ID  R26051016
Degree  Master's
Language  Chinese
Number of Pages  33
Committee  Advisor - 嵇允嬋
Committee Member - 溫敏杰
Committee Member - 張升懋
Keywords (Chinese)  penalized logistic regression, least absolute shrinkage and selection operator, smoothly clipped absolute deviation, adaptive LASSO, penalized maximum likelihood estimation
Keywords (English)  LASSO, SCAD, Adaptive LASSO
Subject Classification
Abstract (Chinese)  When analyzing gene expression data, the number of genes used as explanatory variables typically far exceeds the sample size, so the regression coefficients of the model cannot be estimated. Variable selection is therefore a necessary step before model building. Researchers have proposed penalized linear regression for this purpose. Common variable selection methods include the Least Absolute Shrinkage and Selection Operator (LASSO) of Tibshirani (1996), the Smoothly Clipped Absolute Deviation (SCAD) penalty of Fan and Li (2001), the elastic net of Zou and Hastie (2005), and the Adaptive LASSO of Zou (2006). These methods have since been extended from continuous to binary response variables, yielding penalized logistic regression.
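For orientation, the penalized logistic regression criterion and the three penalties compared in the thesis are typically written as follows (a sketch of the standard definitions in our own notation, not formulas quoted from the thesis):

\hat{\beta} = \arg\min_{\beta} \left\{ -\sum_{i=1}^{n} \left[ y_i x_i^{\top}\beta - \log\left(1 + e^{x_i^{\top}\beta}\right) \right] + n \sum_{j=1}^{p} p_{\lambda}(|\beta_j|) \right\}

LASSO:  p_{\lambda}(|\beta_j|) = \lambda |\beta_j|
Adaptive LASSO:  p_{\lambda}(|\beta_j|) = \lambda \hat{w}_j |\beta_j|, \quad \hat{w}_j = 1/|\tilde{\beta}_j|^{\gamma}, \ \gamma > 0, with \tilde{\beta} an initial estimate
SCAD (defined through its derivative):  p_{\lambda}'(t) = \lambda \left\{ I(t \le \lambda) + \frac{(a\lambda - t)_{+}}{(a - 1)\lambda}\, I(t > \lambda) \right\}, \ t \ge 0, \ a > 2 (commonly a = 3.7)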
For rare events, King and Zeng (2001) showed that the bias of the maximum likelihood estimator (MLE) of the model parameters becomes larger, and they provided a method to correct this bias. In recent years, many researchers have instead adopted the penalized maximum likelihood method proposed by Firth (1993) to reduce the bias of the MLE. A simulation study by Leitgöb (2013) compared the two approaches and recommended the latter, Firth's penalized maximum likelihood method, for reducing the bias of the MLE.
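For context, Firth's method maximizes a penalized log-likelihood (a standard formula stated here for reference, not quoted from the thesis)

\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \left| I(\beta) \right|,

where \ell(\beta) is the ordinary logistic log-likelihood and I(\beta) is the Fisher information matrix; the penalty corresponds to a Jeffreys prior and removes the O(n^{-1}) term of the bias of the MLE.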
This thesis uses simulation to examine, for high-dimensional data in which the event probability is very small, how well logistic regression models built with the LASSO, SCAD, and Adaptive LASSO penalties select explanatory variables. The simulations show that the predictions of LASSO are worse than those of the other two methods. We therefore recommend that researchers use Adaptive LASSO when applying penalized logistic regression for variable selection and predictive modeling with rare events.
Abstract (English)  It is well known that the accuracy of the MLE of the regression coefficients in a logistic regression model is seriously affected by rare events. Less attention has been given to the performance of variable selection in logistic regression with rare events. Therefore, this thesis studies the performance of three variable selection methods, LASSO (Least Absolute Shrinkage and Selection Operator), SCAD (Smoothly Clipped Absolute Deviation), and Adaptive LASSO, when the event rate is low and the number of explanatory variables is much larger than the sample size.
A simulation study is conducted to compare how accurately the methods select the important explanatory variables of a logistic regression model. Under the limited simulation scenarios considered, with an event rate as low as 0.05, the results favor Adaptive LASSO for selecting the important explanatory variables. Consequently, Adaptive LASSO is recommended for variable selection and prediction with rare events data.
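To illustrate the kind of comparison described above, the following is a minimal, hypothetical Python sketch of LASSO versus Adaptive LASSO variable selection on simulated rare-event data using scikit-learn. It is not the code used in the thesis, SCAD is omitted because scikit-learn does not implement that penalty, and the data-generating settings and tuning choices are illustrative assumptions only.

# Hypothetical sketch (not the thesis's code): LASSO vs. Adaptive LASSO
# variable selection in logistic regression with rare events.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n, p = 500, 200                       # illustrative sizes; the thesis considers p >> n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.5                   # five truly important explanatory variables
eta = X @ beta_true - 7.0             # large negative intercept -> event rate of a few percent
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# LASSO: L1-penalized logistic regression, penalty strength chosen by cross-validation
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=5,
                             scoring="neg_log_loss", max_iter=5000).fit(X, y)
lasso_selected = np.flatnonzero(lasso.coef_.ravel())

# Adaptive LASSO via the usual reweighting trick: weight each column by an
# initial (ridge) coefficient magnitude, run LASSO, then rescale back.
ridge = LogisticRegressionCV(penalty="l2", Cs=20, cv=5, max_iter=5000).fit(X, y)
w = np.abs(ridge.coef_.ravel()) + 1e-8          # adaptive weights (gamma = 1)
ada = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=5,
                           scoring="neg_log_loss", max_iter=5000).fit(X * w, y)
ada_selected = np.flatnonzero(ada.coef_.ravel() * w)

print("observed event rate:", y.mean())
print("variables selected by LASSO:", lasso_selected)
print("variables selected by Adaptive LASSO:", ada_selected)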
Table of Contents  Chapter 1  Introduction ... 1
Chapter 2  Literature Review ... 4
  Section 1  Logistic Regression Model ... 4
  Section 2  Introduction to the Penalties ... 6
  Section 3  Firth Logistic Regression Model ... 8
Chapter 3  Simulation Study ... 9
  Section 1  Simulation Design ... 9
  Section 2  Simulation Results ... 11
Chapter 4  Empirical Analysis ... 17
  Section 1  Description of the Data Set ... 17
  Section 2  Predictive Ability of Genes Selected from the 1020 Genes ... 20
  Section 3  Predictive Ability of Genes Selected from the Top 100 Genes ... 23
  Section 4  Firth Logistic Regression Model ... 25
Chapter 5  Conclusions and Suggestions ... 31
References ... 32
References  1. Austin, E., Pan, W. and Shen, X. (2013). "Penalized regression and risk prediction in genome-wide association studies." Statistical Analysis and Data Mining, Vol. 6, pp. 315-328.
2. Fan, J. and Li, R. (2001). "Variable selection via nonconcave penalized likelihood and its oracle properties." Journal of the American Statistical Association, Vol. 96, No. 456, pp. 1348-1360.
3. Firth, D. (1993). "Bias reduction of maximum likelihood estimates." Biometrika, Vol. 80, No. 1, pp. 27-38.
4. Geeleher, P., Cox, N. and Huang, R. (2014). "Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines." Genome Biology, pp. 1-12.
5. Heinze, G., Wallisch, C. and Dunkler, D. (2018). "Variable selection - A review and recommendations for the practicing statistician." Biometrical Journal, Vol. 60, pp. 431-449.
6. Holland, P. and Welsch, R. (1977). "Robust regression using iteratively reweighted least-squares." Communications in Statistics - Theory and Methods, Vol. 6, pp. 813-827.
7. Kim, S. and Halabi, S. (2016). "High dimensional variable selection with error control." BioMed Research International, Vol. 2016, pp. 1-11.
8. King, G. and Zeng, L. (2001). "Logistic regression in rare events data." Political Analysis, Vol. 9, No. 2, pp. 137-163.
9. Kyung, M., Gill, J., Ghosh, M. and Casella, G. (2010). "Penalized regression, standard errors, and Bayesian lassos." Bayesian Analysis, Vol. 5, No. 2, pp. 369-411.
10. Leitgöb, H. (2013). "The problem of modeling rare events in ML-based logistic regression." European Survey Research, pp. 1-19.
11. Pavlou, M., Ambler, G., Seaman, S., Guttmann, O., Elliott, P., King, M. and Omar, R. (2015). "How to develop a more accurate risk prediction model when there are few events." Research Methods & Reporting, pp. 1-5.
12. Shieh, G., Lok, M. and Chang, J. (2018). "Prediction of cancer drug response." The 27th South Taiwan Statistics Conference and 2018 Chinese Institute of Probability and Statistics Annual Meeting and Chung-hwa Data Mining Society Annual Meeting, pp. 1-36.
13. Tibshirani, R. (1996). "Regression shrinkage and selection via the Lasso." Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267-288.
14. Zou, H. and Hastie, T. (2005). "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society, Series B, Vol. 67, Part 2, pp. 301-320.
15. Zou, H. (2006). "The adaptive Lasso and its oracle properties." Journal of the American Statistical Association, Vol. 101, pp. 1418-1429.
Full Text Use Authorization
  • On-campus browsing/printing of the electronic full text is authorized and has been publicly available since 2019-08-20.
  • Off-campus browsing/printing of the electronic full text is authorized and has been publicly available since 2019-08-20.

