System ID: U0026-3107201718483700
Title (Chinese): 利用核密度和廣義估計方程式估計致病基因的個數
Title (English): Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes
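University: National Cheng Kung University
Department: Department of Statistics
Academic year: 105 (ROC calendar, i.e., 2016-2017)
Semester: 2
Year of publication: 106 (ROC calendar, i.e., 2017)
Author: Fang-Yu Wu (吳方渝)
Student ID: R26041053
Degree: Master's
Language: Chinese
Number of pages: 38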
Oral defense committee: 蘇佩芳, 林億雄
Advisor: 馬瀰嘉
Keywords: EM algorithm; kernel density estimation; generalized estimating equation
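Abstract (Chinese, translated): Identifying disease-causing genes is a central problem in medical research. Biologists sequence tumor cells and normal cells taken from the same patient; after the read counts are normalized by RPKM (reads per kilobase of exon model per million mapped reads), the paired differences can be tested with a paired-sample t test to identify disease genes. When tens of thousands of genes are tested simultaneously and the significance level of each individual test is not adjusted, the overall type I error is inflated. The main remedies are to control the FDR (false discovery rate) or the FWER (familywise error rate). When many null hypotheses are false, however, FWER-controlling methods have lower power and tend to be conservative. Whether the FDR or the FWER is controlled, the first step is to estimate the number of true null hypotheses accurately.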
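The normalization and per-gene test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the function names `rpkm` and `paired_t_statistic` are hypothetical, RPKM is computed from its standard definition C·10^9 / (L·N) (read count C, exon-model length L in bases, total mapped reads N), and only the t statistic is returned. Its p-value would come from a t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics
from typing import Sequence


def rpkm(read_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = C / ((L / 1e3) * (N / 1e6)) = C * 1e9 / (L * N)
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def paired_t_statistic(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for per-patient differences d = tumor - normal."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must have one value per patient")
    diffs = [t - n for t, n in zip(tumor, normal)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # One hypothetical gene (2,000 bp exon model) in three patients; made-up counts.
    tumor = [rpkm(c, 2000, 10_000_000) for c in (500, 620, 710)]
    normal = [rpkm(c, 2000, 10_000_000) for c in (400, 410, 420)]
    print(paired_t_statistic(tumor, normal))
```

Repeating `paired_t_statistic` over every gene produces the tens of thousands of simultaneous tests whose error rates the rest of the abstract is concerned with.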

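This study extends the EM algorithm proposed by 鄭暘諭 (2016) for estimating the number of true null hypotheses from the univariate to the multivariate setting. The gene data are assumed to follow a mixture of multivariate normal distributions. The estimation has two parts: the first proposes two estimators, one based on the EM algorithm and one based on kernel density estimation; the second uses generalized estimating equations (GEE) to estimate the proportion of true null hypotheses and the per-test significance level α. Finally, simulations are carried out with gene expression values at low, moderate, and high correlation, for data that do and do not follow a multivariate normal distribution, to compare the strengths and weaknesses of the three proposed methods.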
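As a rough univariate illustration of the EM idea, and explicitly not the multivariate estimator developed in the thesis or the exact algorithm of 鄭暘諭 (2016), one can model per-gene z statistics as a two-component normal mixture whose null component is fixed at N(0, 1) and estimate the null proportion π0 by EM. The function name `em_pi0`, the single normal alternative component, and the starting values are hypothetical choices made for this sketch.

```python
import math
import random
from statistics import NormalDist


def em_pi0(z, pi0=0.5, mu1=2.0, sigma1=1.0, max_iter=500, tol=1e-8):
    """Estimate pi0 in f(z) = pi0 * N(0, 1) + (1 - pi0) * N(mu1, sigma1^2) by EM.

    The null component stays fixed at N(0, 1); only pi0, mu1 and sigma1 are updated.
    """
    null = NormalDist(0.0, 1.0)
    n = len(z)
    for _ in range(max_iter):
        alt = NormalDist(mu1, sigma1)
        # E-step: posterior probability that each statistic comes from the null component
        w0 = []
        for x in z:
            a = pi0 * null.pdf(x)
            b = (1.0 - pi0) * alt.pdf(x)
            w0.append(a / (a + b))
        # M-step: closed-form updates of the alternative mean and standard deviation
        w1 = [1.0 - w for w in w0]
        s1 = sum(w1)
        mu1 = sum(w * x for w, x in zip(w1, z)) / s1
        var1 = sum(w * (x - mu1) ** 2 for w, x in zip(w1, z)) / s1
        sigma1 = max(math.sqrt(var1), 1e-6)  # guard against a degenerate component
        # M-step: the mixing weight is the average null responsibility
        new_pi0 = sum(w0) / n
        if abs(new_pi0 - pi0) < tol:
            return new_pi0, mu1, sigma1
        pi0 = new_pi0
    return pi0, mu1, sigma1


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0, 1) for _ in range(800)] + [rng.gauss(3, 1) for _ in range(200)]
    pi0_hat, _, _ = em_pi0(z)
    print(round(pi0_hat, 3), round(pi0_hat * len(z)))  # estimated pi0 and m0
```

Multiplying the estimated π0 by the number of genes gives an estimate of the number of true null hypotheses; a two-sided setting would need an alternative component on each side of zero.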

Abstract (English): In medical research, identifying disease genes is a very important issue. After gene sequencing, the read counts of each cell are normalized by RPKM. We use the difference in normalized read counts between the same patient's tumor cells and normal cells to perform a paired-sample t test and identify disease genes. When a large number of genes are tested and each individual test is still performed at significance level α, the overall type I error rate is inflated. The main solutions are to control the FDR (false discovery rate) or the FWER (familywise error rate). When many null hypotheses are false, FWER-controlling methods have lower power and tend to be conservative. Whichever of the two is controlled, it is important to estimate the number of true null hypotheses accurately.
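To make the inflation concrete, and to show where an estimate of the number of true nulls m0 enters FDR control, here is a sketch of the Benjamini-Hochberg step-up procedure with an optional plug-in m0, in the spirit of the adaptive procedure of Benjamini and Hochberg (2000). The names `familywise_error` and `adaptive_bh` are hypothetical, and `familywise_error` assumes the tests are independent.

```python
def familywise_error(alpha, m):
    """P(at least one false rejection) among m independent true-null tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def adaptive_bh(pvals, q=0.05, m0=None):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    With m0=None the thresholds are rank * q / m (ordinary BH); passing an estimate
    m0 <= m of the number of true nulls uses rank * q / m0 instead (adaptive BH).
    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvals)
    m0 = m if m0 is None else m0
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose ordered p-value is at or below its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m0:
            k = rank
    rejected = [False] * m
    for i in order[:k]:  # reject the k smallest p-values
        rejected[i] = True
    return rejected
```

For m = 100 independent true nulls at α = 0.05, `familywise_error(0.05, 100)` is about 0.994. Because rank·q/m0 ≥ rank·q/m whenever m0 ≤ m, a smaller m0 loosens every threshold and can only add rejections, which is why an accurate estimate of m0 matters.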

This paper extends the EM algorithm method proposed by Zheng (2016) for estimating the number of true null hypotheses from one dimension to multiple dimensions. We assume that the gene data follow a mixture of multivariate normal distributions. The estimation method has two parts: the first extends the EM algorithm method and proposes a kernel density estimation method; the second estimates the proportion of true null hypotheses and the FDR by the generalized estimating equation method.
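One simple way a kernel density estimate can yield π0 is shown below as an assumed illustration; it is not necessarily the kernel estimator proposed in the thesis. Under the mixture f(z) = π0·φ(z) + (1 - π0)·f1(z) we have f(z) ≥ π0·φ(z) for every z, so f̂(0)/φ(0) bounds π0 from above when the alternatives put little mass near zero. The Gaussian kernel, the rule-of-thumb bandwidth, and the names `gaussian_kernel_density` and `kde_pi0` are hypothetical choices for this sketch.

```python
import math
import random
import statistics


def gaussian_kernel_density(data, bandwidth=None):
    """Return a callable Gaussian kernel density estimate f_hat(x)."""
    n = len(data)
    if bandwidth is None:
        # Rule-of-thumb bandwidth h = 1.06 * sd * n^(-1/5)
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-0.2)
    const = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return const * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return f_hat


def kde_pi0(z, at=0.0, bandwidth=None):
    """Bound-based estimate pi0_hat = min(1, f_hat(at) / phi(at)) for N(0, 1) null statistics."""
    f_hat = gaussian_kernel_density(z, bandwidth)
    phi = math.exp(-0.5 * at * at) / math.sqrt(2.0 * math.pi)
    return min(1.0, f_hat(at) / phi)


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0, 1) for _ in range(800)] + [rng.gauss(3, 1) for _ in range(200)]
    print(round(kde_pi0(z), 3))
```

Smoothing works against the upper-bound argument: for a N(0, 1) null, a Gaussian kernel with bandwidth h lowers the expected height at zero by a factor of about 1/sqrt(1 + h²), so the choice of h directly shifts the estimate.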

Finally, a simulation study explores and compares the three proposed methods under different distributions, with the normalized read counts of genes simulated at low, moderate, and high correlations.
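The low, moderate, and high correlation settings could be simulated along these lines. This sketch assumes an exchangeable correlation ρ between genes induced by a shared per-patient factor, normal errors only (the thesis also considers non-normal data), and a common effect size δ for every non-null gene. The names `simulate_differences` and `gene_t_statistics` are hypothetical, and the ρ values in the demo are placeholders, not the values used in the thesis.

```python
import math
import random
import statistics


def simulate_differences(n_patients, m_genes, pi0, delta, rho, seed=0):
    """Simulate tumor-minus-normal differences d[i][j] for patient i and gene j.

    A shared per-patient factor u_i gives every pair of genes correlation rho:
    d_ij = mu_j + sqrt(rho) * u_i + sqrt(1 - rho) * e_ij, with u_i, e_ij ~ N(0, 1).
    """
    rng = random.Random(seed)
    m0 = round(pi0 * m_genes)
    mu = [0.0] * m0 + [delta] * (m_genes - m0)  # first m0 genes are true nulls
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    data = []
    for _ in range(n_patients):
        u = rng.gauss(0.0, 1.0)
        data.append([mu_j + a * u + b * rng.gauss(0.0, 1.0) for mu_j in mu])
    return data, m0


def gene_t_statistics(data):
    """Paired t statistic for each gene (column) of the simulated differences."""
    n = len(data)
    t_stats = []
    for j in range(len(data[0])):
        d = [row[j] for row in data]
        t_stats.append(statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)))
    return t_stats


if __name__ == "__main__":
    for rho in (0.2, 0.5, 0.8):  # placeholder low / moderate / high values
        data, m0 = simulate_differences(20, 1000, pi0=0.9, delta=1.0, rho=rho)
        t = gene_t_statistics(data)
        print(rho, m0, round(max(t), 2))
```

Because every gene's statistic is computed from the same patients, the shared factor also makes the t statistics correlated across genes, which is the dependence the estimators of π0 have to cope with.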

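Table of contents:
Chapter 1 Introduction
Chapter 2 Literature Review
2.1 Empirical Bayes Method (EM Algorithm)
2.2 Generalized Estimating Equations (GEE)
2.3 RPKM
Chapter 3 Methodology
Chapter 4 Simulation Study
Section 1 Real Data Analysis
Section 2 Simulation Procedure
Section 3 Simulation Results
Chapter 5 Conclusions and Suggestions
References
Appendix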
References:
1. Benjamini, Y., & Hochberg, Y. (1995). "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing". Journal of the Royal Statistical Society, Series B, 57, pp. 289-300.
2. Benjamini, Y., & Hochberg, Y. (2000). "On the Adaptive Control of the False Discovery Rate in Multiple Testing with Independent Statistics". Journal of Educational and Behavioral Statistics, 25, pp. 60-83.
3. Højsgaard, S., Halekoh, U., & Yan, J. (2006). "The R Package geepack for Generalized Estimating Equations". Journal of Statistical Software, 15, pp. 1-11.
4. Liang, K. Y., & Zeger, S. L. (1986). "Longitudinal Data Analysis Using Generalized Linear Models". Biometrika, 73, pp. 13-22.
5. Ma, M. C., & Chao, W. C. (2011). "A Nonparametric Approach of Estimating the Number of True Null Hypotheses in Multiple Testing". International Statistical Institute, August, Ireland, pp. 4669-4674.
6. Ma, M. C., & Tsai, C. Y. (2011). "A Nonparametric Approach to Estimate the Number of True Null Hypotheses in Multiple Testing under Dependency". Master's thesis, Department of Statistics, National Cheng Kung University.
7. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., & Wold, B. (2008). "Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq". Nature Methods, 5, pp. 621-628.
8. Wedderburn, R. W. M. (1974). "Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss-Newton Method". Biometrika, 61, pp. 439-447.
9. 許乾柚 (2008). "利用混合模型估計多重比較中真實虛無假設個數" (Estimating the number of true null hypotheses in multiple comparisons using a mixture model). Master's thesis, Department of Statistics, National Taipei University.
10. 鄭暘諭 (2016). "利用經驗貝氏方法估計錯誤發現率" (Estimating the false discovery rate using an empirical Bayes method). Master's thesis, Department of Statistics, National Cheng Kung University.
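Full-text access:
  • On-campus browsing and printing of the electronic full text authorized; public from 2017-08-09.
  • Off-campus browsing and printing of the electronic full text authorized; public from 2017-08-09.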