System ID U0026-0408201415224900
Title (Chinese) 在分支因子和巢型因子的高斯過程模型下研究次序代基因組裝參數調整
Title (English) Performance Tuning of Next-generation Sequencing Assembly via Gaussian Process Model with Branching and Nested Factors
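Institution National Cheng Kung University
Department (Chinese) 統計學系
Department (English) Department of Statistics
Academic year 102 (ROC calendar)
Semester 2
Year of publication 103 (ROC calendar; 2014)
Author (Chinese) 陳新杰
Author (English) Xin-Jie Chen
Student ID R26013018
Degree Master's
Language English
Pages 54
Oral defense committee Advisor-陳瑞彬
Committee member-鄭順林
Committee member-張源俊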
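Keywords (Chinese) Gaussian process model  Branching Latin hypercube design  Expected improvement function  Next-generation sequencing assembly  Computer experiment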
Keywords (English) Gaussian Process model  Branching Latin hypercube design  Next-generation sequencing  De novo assembly  Optimization
Subject classification
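Abstract (Chinese) In de novo assembly of next-generation sequencing data, the choice of assembly tool and its parameters strongly affects assembly quality. In this study, three assembly tools (Velvet, SOAPdenovo, and ABySS) are treated as the levels of the branching factor, the parameters specific to each tool as nested factors, and the parameters common to all tools as shared factors. Because the assembly is carried out by computer, the selection problem becomes an optimization problem over these factors in a computer experiment.

We propose a two-stage procedure for this optimization problem. In the first stage, a branching Latin hypercube design is used to explore the general shape of the response surface. In the second stage, a Gaussian process model is fitted to the response surface, and the next experimental point is chosen by maximizing the expected improvement criterion until the stopping condition is met. In simulations, the two-stage procedure performs well. When applied to real data, it quickly identifies a region with larger response values, and the final selected point attains a better response than those found by other methods.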
Abstract (English) For de novo assembly of next-generation sequencing data, the choice of assembly tool and its parameters has a large effect on assembly quality. The tool is treated as the branching factor, with three tools (Velvet, SOAPdenovo, and ABySS) as its levels. The parameters specific to each tool are treated as nested factors, and the parameters shared by all tools are treated as shared factors. In this study, we want to choose the tool and the corresponding parameters that optimize assembly quality under limited resources.

Because the de novo assembly is carried out on a computer, the selection becomes an optimization problem for a computer experiment with respect to these factors. We propose a sequential procedure that chooses the assembly tool and its optimal parameters simultaneously. First, we apply a branching Latin hypercube design to explore the response surface. Second, a Gaussian process model is fitted to the response surface, and the next experimental point is selected by maximizing the expected improvement function until the stopping criterion is met. The procedure performs well in numerical simulations. On real data, it quickly moves into the region with larger response values and reaches a better experimental point than other methods.
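
The selection step in the second stage of the abstract can be illustrated with a small sketch. It assumes the standard closed-form expected improvement for a maximization problem, computed from a Gaussian process predictive mean and standard deviation. The function name, the candidate list, and all numeric values are hypothetical and are not taken from the thesis.

```python
import math


def expected_improvement(mean, std, best):
    """Closed-form expected improvement for maximization.

    mean, std: assumed GP predictive mean and standard deviation at a candidate.
    best: largest response observed so far.
    """
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


# Hypothetical candidates: (tool, predictive mean, predictive std).
candidates = [
    ("Velvet", 0.78, 0.05),
    ("SOAPdenovo", 0.82, 0.01),
    ("ABySS", 0.75, 0.15),
]
best_observed = 0.80  # hypothetical best response so far

# The next experimental point is the candidate with the largest EI.
next_point = max(
    candidates,
    key=lambda c: expected_improvement(c[1], c[2], best_observed),
)
print(next_point[0])
```

With these made-up numbers, the candidate with the lowest predicted mean is still selected because its predictive uncertainty is large, which shows how the criterion trades off exploitation against exploration.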
Table of contents 1 Introduction
1.1 Background and Motivation
1.2 Data Description
1.3 Literature Reviews
1.4 Overview
2 Branching Latin Hypercube Design
2.1 Branching Latin Hypercube Design
2.2 Maximin Branching Latin Hypercube Design
3 Gaussian Process Model and Expected Improvement Function
3.1 Positive Definite Correlation Matrix
3.2 Parameter Estimation and Prediction
3.3 Expected Improvement Criterion
3.4 Other Models
3.4.1 Independent Model
3.4.2 BQ Model
3.4.2.1 Singularity Problem of BQ Model
3.4.3 QQ Model
4 Simulation
4.1 Case 1 with Same Marginal Effect
4.1.1 Sequential Procedure for Case 1
4.1.2 Comparison with Other Correlation Structure
4.2 Case 2
4.3 Case 3
4.4 Case 4
4.5 Case 5
4.6 Case 6
4.6.1 Sequential Procedure for Case 6
4.6.2 Comparison with Other Methods
5 De novo Assembly
6 Conclusion and Future Work
6.1 Future Work
Bibliography
References A.M. Maxam and W. Gilbert. A new method for sequencing DNA. Proceedings of the National Academy of Sciences, 74(2):560–564, Feb. 1977.

D.R. Zerbino and E. Birney. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18:821–829, 2008.

D.R. Jones, M. Schonlau, and W.J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492, 1998.

A. Forrester, A. Sóbester, and A. Keane. Engineering Design via Surrogate Modelling. John Wiley and Sons Ltd., 51–59, 2008.

Y. Hung, V.R. Joseph, and S.N. Melkote. Design and analysis of computer experiments with branching and nested factors. Technometrics, 51(4):354–365, 2009.

S.-L. Jeng, Y.-H. Wu, and T.-L. Liu. Improving de novo assembly by preprocessing the next-generation sequencing data. Journal of the Chinese Statistical Association, 51:352–372, 2013.

R.Q. Li, H.M. Zhu, J. Ruan, W.B. Qian, X.D. Fang, ZH.B. Shi, Y.R. Li, SH.T. Li, G. Shan, K. Kristiansen, S.G. Li, H.M. Yang, J. Wang, and J. Wang. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 20:265–272, 2009.

Y. Lin, J. Li, H. Shen, L. Zhang, C.J. Papasian, and H.W. Deng. Comparative studies of de novo assembly tools for next-generation sequencing technologies. Bioinformatics, 27:2031–2037, June 2011.

M.D. Morris and T.J. Mitchell. Exploratory design for computational experiments. Journal of Statistical Planning and Inference, 43(3):381–402, Feb. 1995.

Z.G. Qian, H.Q. Wu, and C.F. Wu. Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics, 50(3):383–396, 2008.

J. Sacks, W.J. Welch, T.J. Mitchell, and H.P. Wynn. Design and analysis of computer experiments. Statistical Science, 4(4):409–423, Nov. 1989.

J.T. Simpson, K. Wong, S.D. Jackman, J.E. Schein, S.J.M. Jones, and I. Birol. ABySS: A parallel assembler for short read sequence data. Genome Research, 19:1117–1123, 2009.

G. Taguchi. System of Experimental Design. White Plains, New York, 279–310, 1987.

W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, and M.D. Morris. Screening, predicting, and computer experiments. Technometrics, 34(1):15–25, 1992.

R. Wu. Nucleotide sequence analysis I. Partial sequence of the cohesive ends of bacteriophage λ and 186 DNA. Journal of Molecular Biology, 41(3):501–521, Aug. 1970.
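Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, publicly available from 2019-08-13.
  • Authorized for off-campus browsing and printing of the electronic full text, publicly available from 2019-08-13.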