進階搜尋


   電子論文尚未授權公開,紙本請查館藏目錄
(※如查詢不到或館藏狀況顯示「閉架不公開」,表示該本論文不在書庫,無法取用。)
系統識別號 U0026-1608201614534600
論文名稱(中文) 基於硬體運算的生物序列比對之研究
論文名稱(英文) Investigation of Sequence Alignment by Using Hardware Operation
校院名稱 成功大學
系所名稱(中) 工程科學系
系所名稱(英) Department of Engineering Science
學年度 104
學期 2
出版年 105
研究生(中文) 余政修
研究生(英文) Zheng-Xiu Yu
學號 N96031415
學位類別 碩士
語文別 中文
論文頁數 51頁
口試委員 指導教授-黃吉川
口試委員-林振森
口試委員-廖德祿
中文關鍵字 序列比對  布隆過濾器  硬體運算  動態規劃演算法 
英文關鍵字 sequence alignment  bloom filter  hardware operation  dynamic programming 
學科別分類
中文摘要 由於生物技術的快速進步,DNA序列定序所需時間不斷縮短以及各種生物基因體計畫的進行,越來越多的生物序列資料被建構成資料庫,DNA序列資料庫中含有的序列數量已高達數百億對,如何在此龐大的序列資料庫進行正確而快速的搜尋是重要課題之一。
序列比對是生物資訊研究中最重要的研究工具,它可以比較及分析出兩條或多條序列之間的相似度,在序列比對中,兩條序列通常長度都是相當長,因此所需的處理時間相當耗時。許多演算法被研究出來降低處理時間。
本論文旨在提出基於硬體運算模擬實現了一套序列比對的資料處理流程。首先在參考序列上,利用hash函式建立布隆過濾器儲存鹼基,利用候選位置產生演算法找出最佳候選位置,減少查找過程中的比對,在使用動態規劃演算法的遞回函式,計算相似度分數與尋找最高對齊方式,完成序列比對。
英文摘要 Due to the rapid advances in biotechnology, more and more biological sequence is built in database, in which there are tens of billions pairs of DNA sequence in the library. Therefore it is an important issue to properly and quickly search this much sequence in database. In bioinformatics study sequence alignment is the most important research tool, which can compare and analyze the similarity between two or more sequences in the sequence alignment. The lengths of the two sequences are usually quite long, so the processing takes much time. Many algorithms have been found out to reduce processing time. In the thesis, we focus on constructing data processing by performing the hardware-based computing simulation. First, the reference sequence uses the hash function to establish Bloom filter for the storage of bases. Second, candidate position is used to generate algorithm to find the best candidate for the position, which can reduce the process. Finally, dynamic programming algorithm is used to compute the similarity scores and find the highest alignment to complete sequence alignment.
論文目次 中文摘要 I
Abstract II
誌謝 VII
目錄 IX
圖目錄 XII
第一章 緒論 1
1-1 研究背景 1
1-2 研究動機 4
1-3 研究目的 7
1-4 本文架構 10
第二章 序列比對相關研究 11
2-1 編輯距離 11
2-2 Sequence Alignment 13
2-3 演算法種類 14
2-4 布洛斯-惠勒轉換演算法 16
2-4-1 後綴陣列 16
2-4-2 布洛斯-惠勒轉換編解碼 19
第三章 最佳路徑分析與候選位置對應產生 22
3-1 求解最佳路徑 22
3-1-1 動態規劃演算法 22
3-1-2 最佳路徑分析 26
3-2 索引表及候選位置對應演算法設計 29
3-2-1 標準的布隆過濾器 29
3-2-2 最佳化hash函式個數 31
3-2-3 布隆過濾器建立 33
3-2-4 候選位置對應演算法 36
第四章 實驗結果 39
4-1 模擬流程 39
4-2 模擬之結果 40
第五章 結論與未來展望 46
5-1 結論 46
5-2 未來展望 46
參考文獻 48
參考文獻 參考文獻
[1] F. Baluška, D. Volkmann, and P. W. Barlow, "Cell bodies in a cage," Nature, vol. 428, pp. 371-371, 2004.
[2] D. Keilin, "The Leeuwenhoek Lecture: The problem of anabiosis or latent life: History and current concept," Proceedings of the Royal Society of London. Series B, Biological Sciences, vol. 150, pp. 149-191, 1959.
[3] I. K. Vasil, "A history of plant biotechnology: from the cell theory of Schleiden and Schwann to biotech crops," Plant cell reports, vol. 27, pp. 1423-1440, 2008.
[4] F. H. Crick, "The biological replication of macromolecules," in Symp. Soc. Exp. Biol, 1958, pp. 138-163.
[5] 张春霆, "生物信息学的现状与展望," 世界科技研究与发展, vol. 22, pp. 17-20, 2000.
[6] F. Sanger, S. Nicklen, and A. R. Coulson, "DNA sequencing with chain-terminating inhibitors," Proceedings of the National Academy of Sciences, vol. 74, pp. 5463-5467, 1977.
[7] A. M. Maxam and W. Gilbert, "A new method for sequencing DNA," Proceedings of the National Academy of Sciences, vol. 74, pp. 560-564, 1977.
[8] 解增言, 林俊华, 谭军, and 舒坤贤, "DNA 测序技术的发展历史与最新进展," 生物技术通报, vol. 8, pp. 64-70, 2010.
[9] E. R. Mardis, "The impact of next-generation sequencing technology on genetics," Trends in genetics, vol. 24, pp. 133-141, 2008.
[10] 权威 and 王亚东, "基于新一代测序数据的比对算法的研究," 智能计算机与应用, vol. 2, pp. 39-41, 2012.
[11] A. J. Gibbs and G. A. McIntyre, "The diagram, a method for comparing sequences," European Journal of Biochemistry, vol. 16, pp. 1-11, 1970.
[12] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of molecular biology, vol. 215, pp. 403-410, 1990.
[13] D. J. Lipman and W. R. Pearson, "Rapid and sensitive protein similarity searches," Science, vol. 227, pp. 1435-1441, 1985.
[14] A. Apostolico and Z. Galil, "Pattern matching algorithms," 1997.
[15] G. A. Stephen, String searching algorithms vol. 3: World Scientific, 1994.
[16] P. R. Gray, P. Hurst, R. G. Meyer, and S. Lewis, Analysis and design of analog integrated circuits: Wiley, 2001.
[17] C. Mead and L. Conway, Introduction to VLSI systems vol. 1080: Addison-Wesley Reading, MA, 1980.
[18] W. J. Masek and M. S. Paterson, "A faster algorithm computing string edit distances," Journal of Computer and System sciences, vol. 20, pp. 18-31, 1980.
[19] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," in Soviet physics doklady, 1966, pp. 707-710.
[20] J. Setubal and J. Meidanis, "Sequence comparison and database search," Introduction to computational molecular biology, 1997.
[21] T. F. Smith and M. S. Waterman, "Comparison of biosequences," Advances in applied mathematics, vol. 2, pp. 482-489, 1981.
[22] X. Huang and W. Miller, "A time-efficient, linear-space local similarity algorithm," Advances in Applied Mathematics, vol. 12, pp. 337-357, 1991.
[23] S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic acids research, vol. 25, pp. 3389-3402, 1997.
[24] S. B. Needleman and C. D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," Journal of molecular biology, vol. 48, pp. 443-453, 1970.
[25] O. Gotoh, "An improved algorithm for matching biological sequences," Journal of molecular biology, vol. 162, pp. 705-708, 1982.
[26] P. Klus, S. Lam, D. Lyberg, M. S. Cheung, G. Pullan, I. McFarlane, et al., "BarraCUDA-a fast short read sequence aligner using graphics processing units," BMC research notes, vol. 5, p. 27, 2012.
[27] H. Li and R. Durbin, "Fast and accurate short read alignment with Burrows–Wheeler transform," Bioinformatics, vol. 25, pp. 1754-1760, 2009.
[28] H. Li and R. Durbin, "Fast and accurate long-read alignment with Burrows–Wheeler transform," Bioinformatics, vol. 26, pp. 589-595, 2010.
[29] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome," Genome biol, vol. 10, p. R25, 2009.
[30] R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, et al., "SOAP2: an improved ultrafast tool for short read alignment," Bioinformatics, vol. 25, pp. 1966-1967, 2009.
[31] H. Li and N. Homer, "A survey of sequence alignment algorithms for next-generation sequencing," Briefings in bioinformatics, vol. 11, pp. 473-483, 2010.
[32] W. R. Pearson and W. Miller, "[27] Dynamic programming algorithms for biological sequence comparison," Methods in enzymology, vol. 210, pp. 575-601, 1992.
[33] E. Vermij, "Genetic sequence alignment on a supercomputing platform," TU Delft, Delft University of Technology, 2011.
[34] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, pp. 422-426, 1970.
[35] Y. Huayun and G. Jihong, "Bloom Filter 研究进展," 电信科学, vol. 26, pp. 31-36, 2010.
[36] A. Rajaraman, J. D. Ullman, and 王斌, "大数据: 互联网大规模数据挖掘与分布式处理," ed: 北京: 人民邮电出版社, 2012.
[37] M. Mitzenmacher, "Compressed bloom filters," IEEE/ACM Transactions on Networking (TON), vol. 10, pp. 604-612, 2002.
[38] 張柏毅, "以短基因片段重組基因組之演算法設計與硬體實作," 臺灣大學電子工程學研究所學位論文, pp. 1-93, 2012.
論文全文使用權限
  • 同意授權校內瀏覽/列印電子全文服務,於2020-11-08起公開。


  • 如您有疑問,請聯絡圖書館
    聯絡電話:(06)2757575#65773
    聯絡E-mail:etds@email.ncku.edu.tw