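System ID U0026-0205201911321900
Title (Chinese) 使用卷積神經網路預測小分子核糖核酸目標基因於非結合位點序列
Title (English) Using Convolutional Neural Network to Predict MicroRNA Target without Prior Site Filtering
Institution National Cheng Kung University
Department (Chinese) 電機工程學系
Department (English) Department of Electrical Engineering
Academic year 107 (ROC calendar)
Semester 1
Publication year 108 (ROC calendar; 2019)
Author (Chinese) 林易瑩
Author (English) Yi-Ying Lin
Student ID N26060151
Degree Master's
Language Chinese
Pages 45
Oral defense committee Advisor - 張天豪
Committee member - 李家岩
Committee member - 吳謂勝
Committee member - 劉宗霖
Committee member - 解巽評
Keywords (Chinese) convolutional neural network  deep learning  microRNA  messenger RNA 
Keywords (English) convolutional neural network  deep learning  microRNA  mRNA 
Subject classification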
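Abstract (Chinese) MicroRNAs are small non-coding RNAs found in living organisms. They repress the translation of messenger RNAs (mRNAs) by binding to them, and most of these binding sites lie in the 3’ untranslated region (3’ UTR) of the mRNA. Bioinformatics methods such as machine learning play an important role in microRNA target gene prediction. Most earlier machine learning predictors build their features around the seed region, but such features are easily constrained by the current human understanding of the biological mechanism. With the recent development of deep learning, more and more researchers apply it to bioinformatics and let neural network models learn the binding mechanism directly from the sequences.
This study proposes a deep learning model based on a convolutional neural network to predict microRNA target genes. The model takes microRNA and mRNA sequences as input. It uses no biological features as input, and neither the microRNA nor the mRNA sequences undergo site filtering, which reduces human interference with the model. We compare the model against several existing studies, including studies that use machine learning with biological features and studies that use deep learning on site-filtered input sequences; the proposed method achieves the best area under the curve and F-measure. To examine how site filtering of sequences affects neural network models, we extract site information from the proposed model and compare it with the sites proposed by other methods. The proposed method still performs better, demonstrating that convolutional neural networks extract features from biological sequences better than manual filtering does.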
Abstract (English) MicroRNAs are small non-coding RNAs that regulate post-transcriptional gene expression by base-pairing with mRNA sequences. Most currently known binding sites lie within the 3’ UTR. Machine learning and other bioinformatics methods have played an important role in microRNA target prediction. Most previous machine learning methods rely on features derived from seed-region characteristics or other biological rules, but such hand-crafted features may be limited by human understanding of microRNA binding mechanisms. In recent years, a growing number of studies have adopted deep learning methods because they can extract meaningful features from raw sequences automatically.
This work proposes a convolutional neural network model to predict microRNA binding targets. To minimize human interference, the proposed model takes raw microRNA and mRNA sequences as inputs, with no hand-crafted features or biological filtering involved. Compared with other machine learning and deep learning methods, the proposed method shows a significant improvement in area under the ROC curve and F score. Furthermore, we investigated the effect of different site filtering methods on model performance by retrieving potential sites from the proposed model. In particular, this work shows that (i) the neural network model works better on raw gene sequences than on filtered sequence sites and (ii) the proposed method outperforms other state-of-the-art machine learning and deep learning microRNA target prediction methods.
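As an illustration of what "raw sequences in, binding score out" can look like, the following is a minimal sketch and not the thesis's model. It assumes one-hot encoding over A/C/G/U, one convolutional layer per input, global average pooling, and a single logistic output; this record does not state the actual encoding, layer sizes, or weights. All names (`one_hot`, `conv1d_relu`, `predict`, `params`) are hypothetical, the weights are random and untrained, and the sequences are toy inputs.

```python
import math
import random

ALPHABET = "ACGU"  # assumed encoding alphabet (not specified in this record)


def one_hot(seq):
    """Encode an RNA string as a list of 4-element one-hot rows.
    DNA-style T is mapped to U; unknown symbols become all-zero rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper().replace("T", "U")]


def conv1d_relu(x, kernels):
    """Valid 1-D convolution along the sequence, followed by ReLU.
    x: rows of 4 channels; kernels: [filter][offset][channel].
    Returns one row per window position, one column per filter."""
    width = len(kernels[0])
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for kernel in kernels:
            s = sum(kernel[k][c] * x[t + k][c]
                    for k in range(width) for c in range(len(ALPHABET)))
            row.append(max(s, 0.0))
        out.append(row)
    return out


def global_average_pool(activations):
    """Average each filter's activations over all positions."""
    n = len(activations)
    return [sum(col) / n for col in zip(*activations)]


def random_kernels(rng, n_filters, width):
    """Untrained placeholder filters, for illustration only."""
    return [[[rng.uniform(-1, 1) for _ in ALPHABET] for _ in range(width)]
            for _ in range(n_filters)]


def predict(mirna, mrna, params):
    """Score a (microRNA, mRNA) pair with a value in (0, 1)."""
    features = []
    for seq, kernels in ((mirna, params["mirna_kernels"]),
                         (mrna, params["mrna_kernels"])):
        features += global_average_pool(conv1d_relu(one_hot(seq), kernels))
    logit = sum(w * f for w, f in zip(params["weights"], features)) + params["bias"]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
params = {
    "mirna_kernels": random_kernels(rng, n_filters=3, width=4),
    "mrna_kernels": random_kernels(rng, n_filters=3, width=6),
    "weights": [rng.uniform(-1, 1) for _ in range(6)],
    "bias": 0.0,
}
score = predict("UGAGGUAGUAGGUUGUAUAGUU", "ACUAUACAACCUACUACCUCAAAGGC", params)
print(f"untrained score: {score:.3f}")
```

Because global average pooling collapses the position axis, sequences of any length at least as long as the kernel width can be scored without cropping them to a candidate site first, which is the property the abstract emphasizes. In practice the filters and output weights would be learned from labeled positive and negative pairs rather than drawn at random.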
Table of contents Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 MicroRNA
2.2 MicroRNA target prediction studies
2.2.1 mirMark
2.2.2 Mita
2.2.3 deepTarget
2.2.4 miRAW
2.3 Convolutional Neural Network (CNN)
2.3.1 Convolutional Layer
2.3.2 Max Pooling Layer
2.3.3 Global Average Pooling Layer
2.3.4 Fully Connected Layer (FC)
Chapter 3 Methods
3.1 Datasets
3.1.1 Positive dataset
3.1.2 Negative dataset
3.2 Data preprocessing
3.3 Data encoding
3.4 Model training and validation procedure
3.5 Neural network model
Chapter 4 Results
4.1 Evaluation criteria
4.2 Comparison with other methods
4.3 Effect of binding-site filtering and model architecture on performance
4.4 Effect of kernel size, pooling size, and extraction parameters on performance
4.5 Visualization of miRNA binding sites
Chapter 5 Conclusion
5.1 Conclusion
5.2 Future work
References
Appendix
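Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, public from 2019-06-18.
  • Authorized for off-campus browsing and printing of the electronic full text, public from 2019-06-18.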

