||Fast Sequence Alignment Methods Using CUDA-enabled GPU
||Institute of Computer Science and Information Engineering
prefix max scan
prefix max scan
在本篇論文中，我們使用具有CUDA技術的GPU提出了兩個快速序列比對的方法。第一個提出的方法稱為Parallel Smith-Waterman (PSW)，是SW演算法的平行版本。在PSW中，SW演算法的遞迴公式被重新定義以至於矩陣的一列可以被平行的計算，接著prefix max scan的方法被使用來減少一個cell的計算複雜度。除此之外，共享記憶體被使用來減少記憶體存取的時間。第二個提出的方法稱為Common Substring Alignment (CSA)，是一個快速計算全域分數和藉由同時加入gap執行比對的全域比對演算法。然而，我們專注於計算全域比對。兩條序列有許多共同子字串可能使序列相似程度很高。所以基於觀察NW演算法的traceback路徑，CSA對齊共同子字串並且在已對齊的共同子字串間加入間隙，最後全域比對分數可以被簡單的取得藉由匹配兩條序列的每個字母。我們的實驗對於序列長度128~8192展示出下列結果。PSW 44~62倍快於基於CPU的SW演算法製作，1.1~7.1倍快於基於CUDA的SW演算法製作CUDASW++。CSA也44~64倍快於基於CPU的NW演算法製作和1.1~6倍快於基於CUDA的NW演算法。
Sequence alignment is a task that calculates the degree of similarity between two sequences. Finding sequence form database which are the most similar to a given query sequence by sequence alignment is the first step in bioinformatics research. The Needleman-Wunsch (NW) algorithm was first proposed to compute the optimal global alignment by using dynamic programming. NW first computes the alignment scores which are then used to perform the so-called traceback process to complete the alignment by adding gaps. Afterward, Smith-Waterman (SW) algorithm was proposed to compute an optimal local alignment. However, the number of sequences in database increases exponentially every year and the cost of these two algorithms is expensive in terms of running time and memory space to find similar database sequences to a query sequence. Thus, accelerating the sequence alignment algorithms has become the trend in recent years.In this thesis, we propose two fast sequence alignment methods using CUDA-enabled GPUs. The first proposed method called Parallel Smith-Waterman (PSW) is a parallel version of SW algorithm. In PSW, the recursive formula of SW algorithm is redefined so that one row of the matrix can be calculated in parallel. Then, the prefix max scan method is used to reduce the cell computation complexity. In addition, on-chip memory is used for reducing the penalty of memory accesses. The second proposed method called Common Substring Alignment (CSA) is a rapid global alignment algorithm that computes the global score and performs the alignment by adding gaps at the same time. However, we focus on computing the global alignment. Two sequences sharing many common substrings may have a high degree of similarity between them. Therefore, based on the observation of the traceback path in NW algorithm, CSA aligns the common substrings and inserts the gaps between the aligned common substrings. Finally, the global alignment score can be easily obtained by matching all the letters of both sequences. Our experiments for sequences of lengths 128 to 8,192 show the following results. PSW is 44~62 times faster than CPU-based implementation of SW algorithm and 1.1~7.1 times faster than CUDASW++, the most popular CUDA-based implementation of SW algorithm, depending on the length difference between query and database sequences. In term of run times for computing the global alignment score, CSA performs 44~64 times faster than CPU-based implementation of NW algorithm and 1.1~6 times faster than the CUDA-based implementation of NW algorithm.
Chapter 1 Introduction 1
Chapter 2 Related work 6
2.1 Needleman-Wunsch algorithm 7
2.1.1 Computation phase 8
2.1.2 Traceback phase 9
2.1.3 Performance metrics 10
2.2 Smith-Waterman algorithm 12
2.2.1 Computation phase 13
2.2.2 Traceback phase 15
2.3 Accelerated computation 16
2.3.1 The anti-diagonal parallelism mechanism for SW 16
2.3.2 The Vector based parallelism mechanism for SW 17
2.3.3 The combined parallelism mechanism for SW 19
2.4 FASTA algorithm 19
2.5 BLAST algorithm 21
2.5.1 Seed generation 22
2.5.2 Ungapped Extension 24
2.5.3 Gapped extension 25
Chapter 3 CUDA 26
3.1 CUDA programming model 26
3.2 CUDA hardware model 30
Chapter 4 Proposed methods 31
4.1 Parallel Smith-Waterman (PSW) 32
4.1.1 Optimization 34
4.1.2 CUDA implementation 35
4.1.3 Pseudo code 37
4.2 Common Substring Alignment (CSA) 41
4.2.1 Find the common substrings 42
4.2.2 Common substring selection strategy 44
4.2.3 Realignment procedure 49
4.2.4 CUDA implementation 50
4.2.5 Pseudo code 51
Chapter 5 Experimental results 54
5.1 PSW performance 55
5.2 CSA performance 57
Chapter 6 Conclusion 60
Chapter 7 References 61
CUDA zone, http://developer.nvidia.com/category/zone/cuda-zone.
Steven Henikoff and Jorja G. Henikoff, “Amino acid substitution matrices from protein blocks,” Proc. Natl. Acad. Sci. Vol. 89, pp. 10915-10919, 1992.
D.S Hirschberg, “A linear space algorithm for computing maximal common subsequence,” Communication of the ACM, 18(6):341-343, 1975.
S. F. Itschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic Local Alignment Search Tool,” J. Molecular Biology, vol. 215, pp. 403-410, 1990.
Isaac TS Li, Warren Shum and Kevin Truong, “160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA),” BMC Bioinformatics 2007, 8:I85.
Cheng Ling and Khaled Benkrid, “Design and Implementation of a CUDA-Compatible GPU-based Core for Gapped BLAST Algorithm,” ICCS 2010.
D. J. Lipman and W. R. Pearson, “Rapid and Sensitive Protein Similarity Searches,” Science, vol. 227, pp. 1435-1441, 1985.
Yongchao Liu, Bertil Schmidt and Douglas L Maskell, “CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions,” BMC Research Notes 2010, 3:93.
Weiguo Liu, Bertil Schmidt, Gerrit Voss and Wolfgang Muller-Wittig, “Streaming algorithms for biological sequence alignment on GPUs,” IEEE Transactions on Parallel and Distributed Systems 2007, 18(9):1270–1281.
Svetlin A Manavski and Giorgio Valle, “CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment,” BMC Bioinfor- matics 2008, 9(Suppl 2):S10.
Eugene W. Myers and Webb Miller, “Optimal alignments in linear space,” Computer Applications in the Biosciences, vol. 4, no. 1, pp. 11-17, 1988.
S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” J. Molecular Biology, vol. 48, pp. 443-453, 1970.
Timothy F. Oliver, Bertil Schmidt, and Douglas L. Maskell, “Reconfigurable architectures for bio-sequence database scanning on FPGAs,” IEEE Trans. Circuit Syst. II 2005, 52:851–855.
Torbjorn Rognes and Erling Seeberg, “Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors,” BIOINF: Bioinformatics, 16, 2000.
Friman Sanchez, Esther Salami, Alex Ramirez, Mateo Valero, “Parallel processing in biological sequence comparison using general purpose processors,” IEEE International Symposium on Workload Characterization(IISWC), 2005.
S. R. Sathe and D. D. Shrimankar, “Parallelization of DNA Sequence Alignment using Open MP,” ICCCS’11, February 12–14, 2011.
Seq-Gen web site, http://tree.bio.ed.ac.uk/software/seqgen/.
T. F. Smith and M. S. Waterman, “Identification of common molecular subsequence,” J. Molecular Biology, vol. 147, pp. 195-197, 1981.
David J. States, Warren Gish, and Stephen F. Altschul, “Improved Sensitivity of Nucleic Acid Database Searches Using Application-Specific Scoring Matrices,” METHODS: A Companion to Methods in Enzymology Vol. 3, No. 1, August, pp. 66-70, 1991.
Adam Szalkowski, Christian Ledergerber, Philipp Krähenbühl and Christophe Dessimoz, “SWPS3–fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2,” BMC Research Notes 2008, I:107.