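   The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(※ If the item cannot be found, or its holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be retrieved.)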
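System ID U0026-0407201110074000
Title (Chinese) 兩條基因序列間差異度衡量指標之研究
Title (English) Statistical Evaluation of DNA Sequence Alignment Based on BLAST and Dissimilarity Measures
University National Cheng Kung University (成功大學)
Department (Chinese) 統計學系碩博士班
Department (English) Department of Statistics
Academic year 99 (ROC calendar)
Semester 2
Publication year 100 (ROC calendar, i.e. 2011)
Author (Chinese) 戴吟芳
Author (English) Yin-Fang Dai
Student ID R26984073
Degree Master's
Language English
Pages 65
Oral defense committee Advisor: 馬瀰嘉
Committee member: 鄭順林
Committee member: 洪宗乾
Committee member: 劉宗霖
Keywords (Chinese) 基因序列比對  BLAST  精確度  差異度衡量指標 
Keywords (English) gene sequence alignment  BLAST  accuracy  dissimilarity measures 
Subject classification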
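Abstract (Chinese) In bioinformatics, the sequence alignment tools in common use today are NCBI BLAST and various measures of dissimilarity between gene sequences. However, the accuracy of these measures and suitable thresholds for them have not yet been studied. This thesis proposes a measure of dissimilarity between gene sequences and then uses the area under the ROC curve to evaluate the accuracy of the different measures. Suitable thresholds for the measures are sought using the hit rate, sensitivity, and specificity. The simulation results show that sequence alignment based on the symmetric K-L distance (Symmetric Kullback-Leibler discrepancy) is more accurate than alignment based on BLAST, and that the measure proposed in this thesis is likewise more accurate than BLAST and performs on par with the symmetric K-L distance.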
Abstract (English) In bioinformatics, DNA sequence alignment currently relies on either NCBI BLAST or dissimilarity measures. However, the cutoff values of these measures have not been studied thoroughly. In this study, a new dissimilarity measure is proposed. Moreover, the area under the ROC curve is used to assess the accuracy of gene sequence alignment based on BLAST and on dissimilarity measures. The hit rate, sensitivity, and specificity are used to find the cutoff values. A simulation study was conducted to empirically investigate the accuracy of the proposed procedure. The simulation results show that gene sequence alignment based on the Symmetric Kullback-Leibler discrepancy is more accurate than alignment based on BLAST. Likewise, alignment based on the proposed method is more accurate than alignment based on BLAST.
Table of contents Chapter 1 Introduction
Chapter 2 Literature Review
2.1 Sequence alignment
2.2 Sequence alignment software: BLAST
2.3 Sequence dissimilarity based on BLAST
2.4 Sequence dissimilarity based on SK-LD
Chapter 3 Proposed Methods
3.1 Proposed dissimilarity measures
3.2 Apply ROC curve to compare the accuracy
3.3 Search cut-off value of sequence dissimilarity based on SK-LD
Chapter 4 Simulation Study
4.1 Simulation process
4.2 Simulation result
4.2.1 Accuracy of ROC curve
4.2.2 The cut-off value based on SK-LD
Chapter 5 Conclusions and Further Research
References
Appendix

References 1.Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990), “Basic local alignment search tool”. Journal of Molecular Biology, 215, 403-410.
2.Altschul, S.F., and Gish, W. (1996), “Local alignment statistics”. Methods Enzymol., 266, 460–480.
3.Bamber, D. (1975), “The area above the ordinal dominance graph and the area below the receiver operating characteristic graph”. Journal of Mathematical Psychology, 12, 387-415.
4.Dembo, A., and Karlin, S. (1991), “Strong limit theorems of empirical functionals for large exceedances of partial sums of i.i.d. variables”. Ann. Prob., 19, 1737–1755.
5.Dembo, A., Karlin, S., and Zeitouni, O. (1994a), “Critical phenomena for sequence matching with scoring”. Ann. Prob., 22, 1993–2021.
6.Dembo, A., Karlin, S., and Zeitouni, O. (1994b), “Limit distribution of maximal non-aligned two-sequence segmental score”. Ann. Prob., 22, 2022–2039.
7.Frith, M.C., Hansen, U., Spouge, J.L., and Weng, Z. (2004), “Finding functional sequence elements by multiple local alignment”. Nucleic Acids Research, 32, 189-200.
8.Holt, R.A. and Jones, S.J. (2008), “The new paradigm of flow cell sequencing”. Genome Research, 18(6): 839.
9.Karlin, S., and Altschul, S.F. (1990), “Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes”. Proceedings of the National Academy of Science, U.S.A., 87, 2264–2268.
10.Lee, S.Y. and Chuang, Y.K. (2010), “The Evolution and Development of DNA Sequence Technology”. J Biomed Lab Sci, 22, 2.
11.Needleman, S. B. and Wunsch, C. D. (1970), “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. Journal of Molecular Biology, 48, 443–453.
12.Pearson, W.R. and Lipman, D.J. (1988), “Improved tools for biological sequence comparison”. Proceedings of the National Academy of Science, U.S.A., 85, 2444–2448.
13.Pearson, W.R. (1990), “Rapid and sensitive sequence comparison with FASTA and FASTP”. Methods Enzymol., 183, 63–98.
14.Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, C.A., Hutchison, C.A., Slocombe, P.M. and Smith, M. (1977), “Nucleotide sequence of bacteriophage phi X174 DNA”. Nature, 265(5596): 687–95.
15.Smith, T. F., Waterman, M. S. and Fitch, W. M. (1981), “Comparative biosequence metrics”. Journal of Molecular Evolution, 18, 38–46.
16.Smith, T.F., Waterman, M.S., and Burks, C. (1985), “The statistical distribution of nucleic acid similarities”. Nucleic Acids Research, 13, 645–656.
17.Tucker, T., Marra, M. and Friedman, J.M. (2009), “Massively parallel sequencing: the next big thing in genetic medicine”. The American Journal of Human Genetics, 85(2): 142–154.
18.Waterman, M.S., and Vingron, M. (1994), “Rapid and accurate estimates of statistical significance for sequence database searches”. Proc. Natl. Acad. Sci USA, 91, 4625–4628.
19.Wu, T. J., Hsieh, Y. C. and Li, L. A. (2001), “Statistical Measures of DNA Sequences Dissimilarity under Markov Chain Models of Base Composition”. Biometrics, 57, 441-448.
20.Wu, T.J., Huang, Y.H. and Li, L.A. (2005), “Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences”. Bioinformatics, 21(22): 4125–4132.
21.Zhang, Z., Schwartz, S., Wagner, L. and Miller, W. (2000), “A Greedy Algorithm for Aligning DNA Sequences”. Journal of Computational Biology, 7(1-2): 203-214.
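Full-text usage rights
  • On-campus browsing/printing of the electronic full text is authorized; available from 2021-12-31.
  • Off-campus browsing/printing of the electronic full text is authorized; available from 2021-12-31.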


  • If you have any questions, please contact the library.
    Phone: (06)2757575#65773
    E-mail: etds@email.ncku.edu.tw