

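   The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.
(Note: if the thesis cannot be found, or the holdings status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)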
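System ID U0026-0907201123545900
Title (Chinese) 使用線性一致估計於連續性狀基因組關聯研究
Title (English) Linear Consistent Estimator for Continuous Trait Genome-wide Association Studies
University National Cheng Kung University (成功大學)
Department (Chinese) 統計學系碩博士班
Department (English) Department of Statistics
Academic year 99 (ROC calendar, 2010-2011)
Semester 2
Publication year 100 (ROC calendar, 2011)
Student (Chinese) 洪啓豪
Student (English) Chi-Hao Hong
Student ID R2697102
Degree Master's
Language Chinese
Pages 45
Committee Advisor: 張升懋
Committee member: 馬瀰嘉
Committee member: 杜宜軒
Keywords (Chinese) 線性一致估計量  Adaptive Lasso  Local False Discovery Rate  Generalized cross validation
Keywords (English) Linear Consistent Estimator  Adaptive Lasso  Local False Discovery Rate  Generalized cross validation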
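Abstract (translated from Chinese) This study aims to develop a procedure for identifying genes that may influence a disease. Linear regression is commonly used to explain the relationship between a response variable and explanatory variables. According to the literature, simple linear regression is prone to false positives caused by correlation with other variables, whereas multiple regression is less susceptible to this problem. With a large number of explanatory variables, however, the limited sample size may make it impossible to obtain a best linear unbiased estimator (BLUE). We therefore seek to estimate the coefficients of the explanatory variables through an approximate relationship between simple linear regression and multiple regression. One component of this approximation is the inverse of the correlation matrix of the explanatory variables. When the sample size exceeds the number of explanatory variables, the correlation matrix is an invertible positive definite matrix; when the sample size is smaller than the number of explanatory variables, it is not invertible, which limits the use of this transformation. Besides sample size, another common problem in multiple regression is variable selection: although many criteria have been developed to judge whether a selection is reasonable, their results can change substantially with changes in the data.
The procedure consists of two parts. The first proposes a linear consistent estimator to resolve the non-invertibility of the correlation matrix: Adaptive Lasso is used to estimate a sparse correlation matrix among the genes that is also invertible. The second identifies, from the estimated multiple regression coefficients, the genes that may influence the disease under a specified Local False Discovery Rate. Two unknown parameters arise along the way, the Adaptive Lasso tuning parameter λ and the Local False Discovery Rate threshold q; both are chosen by Generalized cross validation (GCV). The procedure is evaluated by simulation, examining the effects of the sample size, of the R-square of the multiple regression, and of the positions of the truly significant variables.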
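As a minimal illustration of the relationship described above (not the thesis's own code), the sketch below uses simulated data. It assumes standardized explanatory variables and a centered response; under that assumption the vector b of simple-regression slopes and the multiple-regression coefficients β satisfy β = R⁻¹b, where R is the sample correlation matrix. The data, the dimensions, and helper names such as `marginal_slopes` are hypothetical. The last part shows that R loses full rank once the sample size falls below the number of variables.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(a):
    """Center each column and scale it to unit sample variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def marginal_slopes(X, y):
    """Slopes of the p separate simple regressions of y on each column of X.

    Assumes the columns of X are standardized and y is centered, so each
    slope reduces to x_j' y / n.
    """
    return X.T @ y / X.shape[0]

# Case 1: n > p, so the sample correlation matrix is invertible.
n, p = 200, 5
shared = rng.normal(size=(n, 1))              # common factor -> correlated columns
X = standardize(rng.normal(size=(n, p)) + shared)
beta_true = np.array([1.0, 0.0, 0.0, -0.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)
y = y - y.mean()

R = np.corrcoef(X, rowvar=False)              # p x p sample correlation matrix
b = marginal_slopes(X, y)                     # simple-regression slopes
beta_moment = np.linalg.solve(R, b)           # moment-matching estimate R^{-1} b
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("simple-regression slopes:", np.round(b, 3))
print("R^{-1} b                :", np.round(beta_moment, 3))
print("multiple regression     :", np.round(beta_ols, 3))

# Case 2: n < p, so the sample correlation matrix cannot have full rank.
n_small, p_small = 20, 50
X_small = standardize(rng.normal(size=(n_small, p_small)))
R_small = np.corrcoef(X_small, rowvar=False)
rank_small = np.linalg.matrix_rank(R_small)
print("rank of R with n < p:", rank_small, "out of", p_small)
```

In the first case the moment-matching estimate coincides with ordinary least squares, while the simple-regression slopes pick up the correlation between columns. In the second case `np.linalg.solve` is no longer applicable, which is the gap a sparse, invertible correlation estimate is meant to fill.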
Abstract (English) In this thesis, a novel procedure is proposed to identify disease-causing genes. Simple linear regressions are widely used to examine the relationship between a dependent variable and each independent variable. They are a good way to find correlation, but not causality, when the underlying (linear) model consists of several independent variables. Multiple regression avoids this problem. We use the population-level relationship between the coefficients of the simple linear regressions and those of the corresponding multiple regression to estimate the parameters by matching moments. The inverse of the independent variables' sample correlation matrix plays the key role in this moment estimator. A problem arises when the sample size is smaller than the number of independent variables: the sample correlation matrix is then no longer invertible. Another technical problem is variable selection. Although many variable selection schemes have been developed from various points of view, this work treats it as a multiple testing problem.
The proposed procedure consists of two parts. First, a linear consistent estimator of the regression coefficients is provided: the singular sample correlation matrix among thousands of genes is replaced by the adaptive Lasso correlation estimate, which is sparse and nonsingular. Second, under a pre-specified local false discovery rate, the disease-causing genes are identified from the multiple regression coefficients. Generalized cross validation is used to choose two unknown quantities: the tuning parameter of the Adaptive Lasso, λ, and the local false discovery rate threshold, q. Finally, the proposed procedure is examined by simulation. The factors considered are the sample size, the noise level of the regression as measured by the coefficient of determination, and the location of the affecting genes.
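The selection step in the second part can be sketched as follows. This is a hedged illustration under stated assumptions, not the procedure used in the thesis: it assumes the coefficient estimates have already been converted to z-scores, takes the theoretical N(0, 1) density as the null, sets the null proportion `pi0` to 1 (a conservative choice), and estimates the mixture density with a plain Gaussian kernel density estimate. The simulated z-scores, the bandwidth rule, and the fixed threshold `q = 0.2` are all hypothetical; in the thesis q is chosen by generalized cross validation.

```python
import numpy as np

def normal_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def kde(points, data, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at `points`."""
    u = (points[:, None] - data[None, :]) / bandwidth
    return normal_pdf(u).mean(axis=1) / bandwidth

def local_fdr(z, pi0=1.0, bandwidth=None):
    """Estimated local false discovery rate pi0 * f0(z) / f(z), clipped to [0, 1]."""
    z = np.asarray(z, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption of this sketch.
        bandwidth = 1.06 * z.std() * z.size ** (-0.2)
    f_hat = kde(z, z, bandwidth)              # estimated mixture density f(z)
    return np.clip(pi0 * normal_pdf(z) / f_hat, 0.0, 1.0)

# Hypothetical z-scores: 950 null variables and 50 with a shifted mean.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(size=950), rng.normal(loc=4.0, size=50)])

fdr = local_fdr(z)
q = 0.2                                        # hypothetical threshold
selected = np.flatnonzero(fdr <= q)            # indices declared non-null
print("number selected:", selected.size)
```

Variables whose estimated local false discovery rate falls at or below q are declared non-null; lowering q makes the selection stricter.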
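Table of contents Chapter 1 Introduction
Chapter 2 Literature Review
Section 1 Least Squares
Section 2 Bridge Regression
Section 3 Special Cases of Bridge Regression: Least Squares, Ridge Regression, and Lasso
Section 4 Adaptive Lasso
Chapter 3 Methodology
Section 1 Linear Consistent Estimation
Section 2 Sparse Correlation Matrix Estimation
Section 3 Choice of the Tuning Parameter
Section 4 Variable Selection
3.4.1 False Discovery Rate
3.4.2 Local False Discovery Rate
Chapter 4 Simulation Study
Section 1 Simulation Discussion
Section 2 Simulation Design
Section 3 Expected Results
Section 4 Simulation Results
Chapter 5 Conclusions and Suggestions
References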
Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, available from 2014-07-18.

