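The electronic full text has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holdings status shows "closed stacks, not public," the copy is not in the stacks and cannot be retrieved.)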
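System ID U0026-1207201717494400
Title (Chinese) 基於機器學習方法之微型核糖核酸目標基因預測
Title (English) Machine Learning Based MicroRNA Target Prediction
University National Cheng Kung University
Department (Chinese) 電機工程學系
Department (English) Department of Electrical Engineering
Academic year 105 (ROC calendar)
Semester 2
Year of publication 106 (ROC calendar)
Author (Chinese) 朱柏勳
Author (English) Po-Hsun Chu
Student ID N26040290
Degree Master's
Language Chinese
Pages 36
Committee Advisor: 張天豪
Committee member: 劉宗霖
Committee member: 黃仁暐
Committee member: 解巽評
Keywords (Chinese) 機器學習  微型核糖核酸  信息核糖核酸
Keywords (English) machine learning  microRNA  mRNA
Subject classification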
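Abstract (translated from the Chinese) Identifying the target genes that microRNAs (miRNAs) bind is fundamental to the study of gene repression. Many predictors exist today, and machine-learning-based predictors have substantially improved prediction performance. For such predictors, however, the negative dataset remains a difficult issue: because no system is dedicated to identifying non-target genes, most current machine-learning-based predictors train on negative datasets that they generate themselves, and different generation methods affect the learning algorithms differently. Another key aspect of machine learning is the choice of features. This thesis uses features that are empirically known to be useful, such as the seed matching region and the thermodynamic stability of the duplex, and adds de novo features, namely sequence-pattern features: dinucleotide (bigram) features and the trinucleotide (trigram) features proposed in this study. Because models built by machine learning algorithms are usually complex, humans typically cannot directly interpret what a model has learned. We therefore use a rule extraction algorithm to extract, from the predictor, rules based on both the empirical features and the de novo features.
In this study we compare our method against several existing predictors and obtain a high ROC AUC score. We analyze the effect of negative datasets produced by different methods and, for different situations, propose a way to prepare the negative dataset. In the learning architecture, to make the binding determination more stringent, we combine several machine learning algorithms of different natures and average their outputs with the harmonic mean to obtain more robust predictions.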
Abstract (English) Identifying microRNA binding targets is important for studying gene regulation. Many target prediction tools exist, and predictors based on machine learning algorithms have improved performance considerably. One issue for machine-learning-based predictors is the negative dataset. Because there is no systematic method for collecting a negative dataset (non-binding miRNA-mRNA pairs), each work produces its own. Different methods of generating the negative dataset affect the machine learning algorithm differently. Another important aspect of machine learning is feature engineering. This work uses empirical features from previous works, such as seed matching type, thermodynamic stability of the duplex, accessibility, site location, and multiplicity of binding sites, together with the de novo features (unigram, bigram, trigram) proposed in this work. The last issue is that machine learning models are too complicated for humans to interpret what knowledge the machine has learned. Thus, we applied a rule induction algorithm to extract rules based on de novo features and empirical features from our model.
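As a rough illustration of the unigram, bigram, and trigram features mentioned above, the following minimal sketch counts overlapping length-n words in an RNA sequence and normalizes them to frequencies. The function name `ngram_features`, the ACGU alphabet, the T-to-U conversion, and the normalization by window count are all assumptions made for this example; the abstract does not state how the thesis encodes these features.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; not specified in the abstract


def ngram_features(seq, n):
    """Return the frequency of every length-n word over ACGU in seq.

    Overlapping windows are counted. Windows containing other symbols
    (for example N) still count toward the total but have no feature key.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input (assumption)
    keys = ["".join(p) for p in product(ALPHABET, repeat=n)]
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero on short input
    return {k: counts[k] / total for k in keys}


# Hypothetical sequence used only to show the call pattern.
example = "UGAGGUAGUAGGUUGUAUAGUU"
bigrams = ngram_features(example, 2)    # 16 features
trigrams = ngram_features(example, 3)   # 64 features
```

The feature vector length grows as 4^n (4, 16, 64 for n = 1, 2, 3), which is one reason a feature selection step is usually paired with such features.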
In this work, we propose the harmonic model, which achieves a higher ROC AUC than other tools. To make the determination of miRNA-mRNA binding more stringent, the harmonic model aggregates three algorithms using the harmonic mean. Through many experiments, we provide a guideline on how to prepare the negative dataset in different situations.
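The aggregation step described above can be sketched as follows. This is only an illustration of combining three scores with a harmonic mean: the model names are taken from the table of contents (support vector machine, random forest, RVKDE), while the score values, the assumption that each model outputs a probability in [0, 1], and the `eps` guard are hypothetical and not taken from the thesis.

```python
def harmonic_mean(scores, eps=1e-12):
    """Harmonic mean of scores in [0, 1].

    eps is an assumed guard that keeps a score of exactly 0 from
    causing a division by zero.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    return len(scores) / sum(1.0 / max(s, eps) for s in scores)


# Hypothetical binding probabilities for one miRNA-mRNA pair.
model_scores = {"svm": 0.9, "random_forest": 0.9, "rvkde": 0.1}

combined = harmonic_mean(list(model_scores.values()))
arithmetic = sum(model_scores.values()) / len(model_scores)
```

Because the harmonic mean is dominated by the smallest input, a single low score pulls the combined score well below the arithmetic mean. A pair therefore scores highly only when all models agree, which is consistent with the stated goal of a more stringent binding determination.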
Table of contents Chapter 1 Introduction
Chapter 2 Related Work
2.1 MicroRNA
2.2 Interaction between MicroRNA and Messenger RNA
2.2.1 Seed Region Complementarity
2.2.2 Duplex Thermodynamic Stability
2.2.3 Site Accessibility
2.2.4 Site Location
2.2.5 Multiplicity of Binding Sites
2.3 Existing Related Work
2.3.1 mirMark
2.3.2 TargetMiner
2.3.3 miRmap
2.3.4 miRanda
Chapter 3 Methods
3.1 Dataset
3.1.1 Positive Dataset
3.1.2 Negative Dataset
3.2 Feature Generation
3.2.1 Target Recognition
3.2.2 Feature Extraction
3.3 Feature Selection
3.3.1 Information Gain
3.3.2 Decision Tree Forward Selection
3.3.3 Variance
3.3.4 Feature Selection Procedure
3.4 Building the Prediction Model
3.4.1 Support Vector Machine
3.4.2 Random Forest
3.4.3 Relaxed Variable Kernel Density Estimation (RVKDE)
3.4.4 Harmonic Mean and Hybrid Model
3.5 Rule Extraction from the Model
Chapter 4 Results
4.1 Feature Selection Results
4.2 Comparison with and Discussion of Existing Prediction Tools
4.3 Rule Extraction
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
Appendix
References
References 1. Bandyopadhyay S, Mitra R (2009) TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples. Bioinformatics 25: 2625-2631.

2. Lekprasert P, Mayhew M, Ohler U (2011) Assessing the utility of thermodynamic features for microRNA target prediction under relaxed seed and no conservation requirements. PLoS One 6: e20622.

3. Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E (2007) The role of site accessibility in microRNA target recognition. Nature genetics 39: 1278-1284.

4. Miranda KC, Huynh T, Tay Y, Ang Y-S, Tam W-L, et al. (2006) A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126: 1203-1217.

5. Grimson A, Farh KK-H, Johnston WK, Garrett-Engele P, Lim LP, et al. (2007) MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Molecular cell 27: 91-105.

6. Doench JG, Sharp PA (2004) Specificity of microRNA target selection in translational repression. Genes & development 18: 504-511.

7. John B, Enright AJ, Aravin A, Tuschl T, Sander C, et al. (2004) Human microRNA targets. PLoS Biol 2: e363.

8. Menor M, Ching T, Zhu X, Garmire D, Garmire LX (2014) mirMark: a site-level and UTR-level classifier for miRNA target prediction. Genome biology 15: 1.

9. Vejnar CE, Zdobnov EM (2012) MiRmap: comprehensive prediction of microRNA target repression strength. Nucleic acids research 40: 11673-11683.

10. Vejnar CE, Blum M, Zdobnov EM (2013) miRmap web: comprehensive microRNA target prediction online. Nucleic acids research 41: W165-W168.

11. Hsu S-D, Lin F-M, Wu W-Y, Liang C, Huang W-C, et al. (2010) miRTarBase: a database curates experimentally validated microRNA–target interactions. Nucleic acids research: gkq1107.

12. Xiao F, Zuo Z, Cai G, Kang S, Gao X, et al. (2009) miRecords: an integrated resource for microRNA–target interactions. Nucleic acids research 37: D105-D110.

13. Vlachos IS, Paraskevopoulou MD, Karagkouni D, Georgakilas G, Vergoulis T, et al. (2015) DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic acids research 43: D153-D159.

14. Kung DM (2011) A Study of RNA Features for MicroRNA Target Prediction. MS thesis.

15. Chapelle O, Vapnik V, Bousquet O, Mukherjee S (2002) Choosing multiple parameters for support vector machines. Machine learning 46: 131-159.

16. Vapnik V (2013) The nature of statistical learning theory: Springer Science & Business Media.

17. Oyang Y-J, Hwang S-C, Ou Y-Y, Chen C-Y, Chen Z-W (2005) Data classification with radial basis function networks based on a novel kernel density estimation algorithm. IEEE transactions on neural networks 16: 225-236.

18. Artin E (1964) The Gamma Function. New York: Holt, Rinehart and Winston.

19. Zięba M, Tomczak JM, Lubicz M, Świątek J (2014) Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients. Applied soft computing 14: 99-108.

20. Cohen WW. Fast effective rule induction; 1995. pp. 115-123.

21. Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise; 1996. pp. 226-231.

22. Maaten Lvd, Hinton G (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9: 2579-2605.

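Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized, available from 2017-07-21.
  • Off-campus browsing and printing of the electronic full text is authorized, available from 2022-01-01.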

