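System ID U0026-0812200911153787
Title (Chinese) Gene Ontology架構下基因表現值的多階層關聯規則探勘
Title (English) Mining Multilevel Association Rules from Gene Expression Profiles and Gene Ontology
Institution National Cheng Kung University
Department (Chinese) 資訊工程學系碩博士班
Department (English) Institute of Computer Science and Information Engineering
Academic year 92 (ROC calendar)
Semester 2
Publication year 93 (ROC calendar; 2004)
Author (Chinese) 楊世強
Author (English) Shih-Chiang Yang
E-mail fox726@giga.net.tw
Student ID p7691161
Degree Master's
Language Chinese
Pages 57
Oral defense committee Advisor: 曾新穆
Committee member: 蔣榮先
Committee member: 李強
Committee member: 李建億
Committee member: 楊永正
Keywords (Chinese) 微陣列 (microarray)  關聯規則 (association rules)  多階層探勘 (multilevel mining)  基因表現分析 (gene expression analysis)  資料探勘 (data mining)
Keywords (English) Data Mining  Microarray  Gene Expression Analysis  Association Rules Mining  Multi-Level Mining  Gene Ontology
Subject classification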
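Abstract (translated from Chinese)   In this study we build on association rules, a data mining method, and propose a multilevel association rule mining method that integrates microarray gene expression values with the concept hierarchy of Gene Ontology (GO). GO provides three rigorously defined network structures whose term nodes describe three aspects of gene products: Molecular Function, Biological Process, and Cellular Component. We use this information from GO to strengthen our association rules. Recent studies have shown that association rules can uncover hidden interactions between genes, as well as gene expression patterns that traditional cluster analysis cannot reveal. Our new method applies multilevel association rule mining, grouping genes according to the GO concept hierarchy, to mine relationships between GO terms. For example, when mining the Biological Process branch, we can find rules of the form Process A↑ => Process B↑, which means that when the genes covered by Process A are activated, the genes covered by Process B are very likely to be activated as well. Experimental analysis confirms that the proposed method efficiently discovers functional expression relationships among gene groups under the GO concept hierarchy. We also propose a constrained mining method based on rule templates, which effectively selects the rule sets that interest the user.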

Abstract (English)   Recent studies have shown that association rules can reveal interactions between genes, exposing patterns that traditional clustering methods might miss. We propose a new data mining technique for discovering multilevel association rules from gene expression profiles and the concept hierarchy of Gene Ontology (GO). GO provides three structured networks of defined terms that describe the molecular function, biological process, and cellular component of gene products. Our multilevel association rule mining method finds relations between GO terms by summarizing genes according to the GO hierarchy. For example, in the biological process branch of GO, rules such as Process A (up) → Process B (up) can be discovered, indicating that Process B is likely to be up-regulated whenever Process A is up-regulated. Empirical evaluation shows that our method discovers these hidden multilevel association rules efficiently. We also propose a constrained mining method for discovering the rules that users are actually interested in.
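
To make the rule form Process A (up) → Process B (up) concrete, the following minimal Python sketch rolls gene-level expression states up to GO-term-level items and computes the support and confidence of one such rule. Everything in it is an assumption for illustration: the gene names, the gene-to-term annotation map, the discretized data, and the roll-up convention (a term counts as "up" in an experiment if any gene annotated to it is up). It is not the algorithm from the thesis.

```python
# Hypothetical annotation: each gene maps to one GO term (illustrative only).
GENE_TO_TERM = {
    "g1": "Process A",
    "g2": "Process A",
    "g3": "Process B",
    "g4": "Process B",
}

# Hypothetical discretized microarray data: one dict per experiment,
# mapping each gene to "up" or "down".
EXPERIMENTS = [
    {"g1": "up", "g2": "up", "g3": "up", "g4": "down"},
    {"g1": "up", "g2": "down", "g3": "up", "g4": "up"},
    {"g1": "down", "g2": "down", "g3": "up", "g4": "down"},
    {"g1": "up", "g2": "up", "g3": "down", "g4": "down"},
]


def term_items(experiment):
    """Generalize gene-level states to (GO term, state) items.

    Assumed convention: a term carries a state if ANY annotated gene has it,
    so a term can appear as both "up" and "down" in the same experiment.
    """
    return {
        (GENE_TO_TERM[gene], state)
        for gene, state in experiment.items()
        if gene in GENE_TO_TERM
    }


def rule_stats(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


# One transaction (set of term-level items) per experiment.
transactions = [term_items(e) for e in EXPERIMENTS]

support, confidence = rule_stats(
    ("Process A", "up"), ("Process B", "up"), transactions
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule would be reported only if both values clear user-chosen minimum support and minimum confidence thresholds. A multilevel method repeats this kind of counting at successive levels of the hierarchy.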

Table of contents English Abstract
Chinese Abstract
Acknowledgments
Contents
List of Tables
List of Figures

Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Problem Description
1.4 Research Method
1.5 Contributions
1.6 Thesis Organization

Chapter 2 Literature Review
2.1 Association Rules
2.1.1 Definition of Association Rules
2.1.2 Purpose of Association Rules
2.1.3 Association Rule Mining Methods
2.1.4 The Apriori Algorithm
2.1.5 Description of the Apriori Functions
2.2 Applications of Association Rules in Bioinformatics
2.2.1 Mining Microarray Expression Data
2.2.2 Variant Applications of Association Rules
2.3 Gene Ontology
2.3.1 Basic Structure of Gene Ontology
2.3.2 Gene Annotation in Gene Ontology
2.3.3 Uses of Molecular Function
2.3.4 Uses of Biological Process
2.3.5 Uses of Cellular Component
2.4 Multilevel Association Rules
2.4.1 Multilevel Concept Hierarchy
2.4.2 Multilevel Association Rule Mining
2.4.3 The ML_T1LA Algorithm
2.4.4 ML_T1LA Example
2.4.5 Multidimensional Multilevel Association Rules

Chapter 3 Multilevel Association Rule Mining with Gene Ontology
3.1 Data Preprocessing
3.2 Encoding of Classification Information
3.2.1 Classification Encoding of Microarray Data: General Case
3.2.2 Classification Encoding of Microarray Data: Special Case
3.2.3 Encoding Table
3.3 Multilevel Association Rule Mining under the Gene Ontology Hierarchy
3.3.1 The MAGO Algorithm
3.3.2 Dynamically Adjusting the Minimum Support at Each Level
3.3.3 Cross-Level Association Rules
3.4 Constrained Multilevel Association Rule Mining: CMAGO
3.4.1 Form of Rule Templates
3.4.2 Parsing Rule Templates
3.4.3 Worked Example
3.4.4 The CMAGO Algorithm
3.5 Summary of the Method

Chapter 4 Experimental Analysis
4.1 Experimental Datasets
4.1.1 Characteristics of the Experimental Data
4.1.2 Item Characteristics of 1-itemsets
4.2 Starting-Level Experiment
4.3 Minimum Support and Execution Time Experiment
4.4 Rule Template Experiments
4.4.1 Minimum Support Experiment
4.4.2 Template Experiment
4.5 Rule Validation
4.6 Summary of Experiments

Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Applications
5.3 Future Work

References

Author's Biography
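Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized; public from 2004-08-16.
  • Off-campus browsing and printing of the electronic full text is authorized; public from 2004-08-16.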

