進階搜尋


下載電子全文  
系統識別號 U0026-1208201418502400
論文名稱(中文) 在蛋白質的特定結構上利用圖形挖掘方法來做分類
論文名稱(英文) A Novel Algorithm for Classifying Protein Specific Residue Structures Using the Graph Mining Approach
校院名稱 成功大學
系所名稱(中) 醫學資訊研究所
系所名稱(英) Institute of Medical Informatics
學年度 102
學期 2
出版年 103
研究生(中文) 楊宗穎
研究生(英文) Zong-Ying Yang
學號 Q56001066
學位類別 碩士
語文別 英文
論文頁數 51頁
口試委員 指導教授-謝孫源
口試委員-郭淑美
口試委員-莊坤達
口試委員-楊士德
口試委員-鄭憲宗
中文關鍵字 蛋白質結構基序  蛋白質三維結構  頻繁子圖搜尋  原子位移參數 
英文關鍵字 Frequent subgraph mining  amino acid  protein three-dimensional(3D) structure  B-factor 
學科別分類
中文摘要 蛋白質分類問題在生醫資訊中是一個相當熱門的研究議題。在這份論文裡,我們利用簡單圖形來展示三維蛋白質結構,圖形裡的每一個點用來表示二十種蛋白質的氨基酸。我們利用原子移位參數有效降低圖形裡的點和邊的複雜程度並加快程式執行速度。鎖定特定蛋白質家族中特定之子圖型樣,並應用搜尋頻繁子圖中的策略去找尋測試蛋白質是否亦有相同之子圖型樣,進一步分類屬於特定型樣的蛋白質家族。在實驗的部分,使用實際的蛋白質資料庫作為實驗數據,我們的分類方法在準確度表現上優於早先已知的方法,而且我們的方法是一個簡單、和容易被實施的方法。
英文摘要 Protein structural classification is an important problem in
bioinformatics. In this paper, we utilize a simple and connected graph to represent a protein three-dimensional structure in which each node represents an amino acid and each edge represents a contact distance between two amino acids. We then use the B-factor (atomic displacement parameters) to significantly reduce the number of nodes and edges in each graph representation. We apply a graph mining approach to find critical subgraphs among these graphs, which can be applied to classify protein structural families. We have conducted an experimental study in which marking characteristic substructural patterns were identified in several protein families in the SCOP database.
論文目次 1 Introduction 1
1.1 Motivation 1
1.2 Result 2
1.3 Organization 3
2 Bioinoformatics Background 6
2.1 Proteins are Chains of Amino Acids 6
2.2 Protein Structure 8
2.3 Protein Structure is Described in Four Levels 9
2.4 B-factors and coordinate error 13
2.5 NP-hard and NP-complete 14
2.6 Basics of Graph Theory 16
3 Related Works 20
3.1 Search strategy (BFS) 21
3.2 Search strategy (DFS) 21
4 Preliminaries 23
4.1 The Proposed Algorithm 25
4.1.1 Building protein graphs 25
4.1.2 Building Adjacency Matrix of Protein Graphs 27
4.1.3 Canonical Adjacency Matrix (CAM) Tree 28
4.1.4 Classification of protein family 30
5 Experimental Results 37
5.1 Implementation and Test Platform 37
5.2 The protein data bank 37
5.3 Protein Representation as a Simple Graph 40
5.4 Accuracy of classification each protein family 40
6 Conclusion 46
Bibliography 47
參考文獻 [1] R. Agrawal, R. Srikant, “Fast Algorithms for Mining Association Rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 487–499, September 12-15, 1994.
[2] S. F. Altschul, et al. ”Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” Nucleic acids research, vol. 25, no. 17, pp.3389–3402, 1997.
[3] P. Aloy, E. Querol, F. X. Aviles, and M. J. E. Sternberg, “Automates Structurebased Prediction of Functional Sites in Proteins: Applications to Assessing the
Validity of Inheriting Protein Function from Homology in Genome Annotation and to Protein Docking,” Journal of Molecular Biology, vol. 311, no. 2, pp. 395–408, 2001.
[4] Z. Aung and K. L. Tan, “Automatic Protein Structure Classification through Structural Fingerprinting,” in Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering (ICBBE), pp. 508–515, March 19-21, 2004.
[5] D. Banyopadhyay, J. Huan, J. Liu, J. Prins, J. Snoeyink, A. Tropsha, and W. Wang, “Using Fast Subgraph Isomorphism Checking for Protein Functional Annotation Using SCOP and Gene Ontology,” The University of North Carolina at Chapel Hill Department of Computer Science Technical Report, 2005.
[6] D. Bandyopadhyay, J. Huan, J. Liu, J. Prins, J. Snoeyink, W. Wang, and A. Tropsha, “Functional Neighbors: Inferring Relationships between Nonhomologous Protein Families Using Family-Specific Packing Motifs,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 5, pp. 1137–1143, 2010.
[7] H. M. Berman, “The Protein Data Bank: a historical perspective,” Acta Crys-tallographica Section A: Foundations of Crystallography, vol. 64, no. 1, pp. 88–95,2008.
[8] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov and P. E. Bourne, “The Protein Data Bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000.
[9] S. Chakravarthy and S. Pradhan, “DB-FSG: An SQL-Based Approach for Frequent Subgraph Mining,” in Proceedings of the 10th Database and Expert Systems Applications (DEXA), pp. 684–692, September 1-5, 2008.
[10] D. Fischer, H. Wolfson and R. Nussinov, “ Three-dimensional, sequence orderindependent structural comparison of a serine protease against the crystallographic database reveals active site similarities: Potential implications to evolution and to protein folding,” Protein Science vol. 3, no. 5, pp. 769–778, 1994.
[11] S. Henikoff, J. G. Henikoff, and S. Pietrokovski, “Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations, ” Bioin-formatics, vol. 15, no. 6, pp.471–479, 1999.
[12] L. B. Holder, D. J. Cook, and S. Djoko, “Substructure Discovery in the SUBDUE System,” in Proceedings of the Association for the Advancement of Arti cial Intelligence Workshop on Knowledge Discovery in Database (AAAI), pp. 169–180,July 31, 1994.
[13] J. Huan, D. Bandyopadhyay, W. Wang, J. Snoeyink, J. Prins, and A. Tropsha, “Comparing Graph Representations of Protein Structure for Mining Family-Specific Residue-Based Packing Motifs,” Journal of Computational Biology, vol. 12, no. 6, pp. 657–671, 2005.
[14] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha, “Mining Protein Family Specific Residue Packing Patterns From Protein Structure Graphs,” in Proceedings of the 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 308–315, March 27-31, 2004.
[15] J. Huan, W. Wang, and J. Prins, “Efficient Mining of Frequent Subgraph in the Presence of Isomorphism,” in Proceedings of the 3th IEEE International Conference on Data Mining (ICDM), pp. 549–552, November 19-22, 2003.
[16] A. Inokuchi, T. Washio, and H. Motoda, “Derivation of the topology structure from massive graph data,” in Proceedings of the 2th Discovery Science International Conference (DS), pp. 330–332, 1999.
[17] A. Inokuchi, T. Washio, and H. Motoda, “An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data,” in Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Data Mining (PKDD), pp. 13–23, 2000.
[18] W. J. Kent “BLATXthe BLAST-like Alignment Tool,” Genome Research, vol. 12, no. 4, pp. 656-664, 2000.
[19] V. Krishna, N. N. R. R. Suri, and G. Athithan, “A comparative survey of algorithms for frequent subgraph discovery,” Current Science, vol. 100, no. 25, pp. 190-198, 2011.
[20] M. Kuramochi and G. Karypis, “Frequent Subgraph Discovery,” in Proceedings of the 1th IEEE Conference on Data Mining (ICDM), pp. 313–320, November 29 - December 2, 2001.
[21] M. Laberge and T. Yonetani “Common Dynamics of Globin Family Proteins,” International Union of Biochemistry and Molecular Biology, vol. 59, no.8, pp.528–534, 2007.
[22] D. Landsman “Common Sequence and Structural Features in the Heat-Shock Factor and Ets Families of DNA-Binding Domains,” Trends in Biochem Science, vol. 20, no. 6, pp. 225–226, 1995.
[23] W. W. M. Lam, and K. C. C. Chan, “A Graph Mining Algorithm for Classifying Chemical Compounds,” in Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 321–324, November 3 - 5, 2008.
[24] J. A. Lees, et al. “The Retinoblastoma Protein Binds to a Family of E2F Transcription Factors,” Molecular and cellular biology, vol. 13, no. 12, pp. 7813–7825, 1993.
[25] S. Nijssen and J. N. Kok, “A Quickstart in Frequent Structure Mining can make a Difference,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 647–652, August 22 - 25, 2004.
[26] S. Nijssen and J. N. Kok, “Frequent Graph Mining and its Application to Molecular Databases,” in Proceedings of the IEEE International Conference on Systems,Man and Cybernetics (SMC), vol. 5, pp. 4571–4577, October 10-13, 2004.
[27] S. Nijssen and J. N. Kok, “The Gaston Tool for Frequent Subgraph Mining,” Electronic Notes in Theoretical Computer Science, vol. 127, no. 1, pp. 77–87, 2005.
[28] S. Parthasarathy and M. R. N. Murthy, “Protein Thermal Stability: Insights from Atomic Displacement Parameters (B-values),” Protein Engineering, vol. 13, no. 1, pp. 9–13, 2000.
[29] A. M. Petros, E. T. Olejniczak and S. W. Fesik, “Structural Biology of the Bcl-2 Family of Proteins,” Biochimica et Biophysica Acta (BBA)-Molecular Cell Research, vol. 1644, no. 2, pp. 83-94, 2004.
[30] E. Remold-O’Donnell, “The Ovalbumin Family of Serpin Proteins,” Federation of European Biochemical Societeies Letters, vol. 315, no, 2, pp. 105-108, 1993.
[31] A. Smalter, J. Huan, Y. Jia, and G. Lushington, “GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 197–207, 2010.
[32] S. Vishveshwara, K. V. Brinda, and N. Kannan, “Protein Structure: Insights From Graph Theory,” Journal of Theoretical and Computational Chemistry, vol. 1, no. 1, pp. 187–211, 2002.
[33] B. Wackersreuther, P. Wackersreuther, and A. Oswald, “Frequent Subgraph Discovery in Dynamic Networks,” in Proceedings of the 8th Workshop on Mining and Learning with Graphs (MLG), pp. 155–162, August 5, 2010.
[34] N. Weskamp, D. Kuhn, E. Hllermeier, and G. Klebe, “Efficient similarity search in protein structure databases by k - clique hashing,” Bioinformatics, vol. 20, no. 10, pp. 1522–1526, 2005.
[35] D. W. Williams, J. Huan, and W. Wang, “Graph Database Indexing Using Structured Graph Decomposition,” in Proceedings of the 23th International Conference on Data Engineering (ICDE), pp. 976–975, April 15-20, 2007.
[36] X. Yan, and J. Han, “gSpan: Graph-Based Substructure Pattern Mining,” in Proceedings of the 3th IEEE International Conference on Data Mining (ICDM), pp. 721-724, November 19-22, 2002.
[37] X. Yan and J. Han, “CloseGraph: Mining Close Frequent Graph Patterns,” in Proceeding of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp. 286–295, August 24-27, 2003.
[38] Z. Yuan, T. L. Bailey, and R. D. Teasdale, “Prediction of protein B-factor profiles,” Proteins, vol. 58, no. 4, pp. 905–912, 2005.
[39] B-factors and coordinate error. http://www.proteopedia.org/wiki/index.php/Temperature value
[40] Protein data obtained from Protein Data Bank.
http://www.rcsb.org/pdb/home/home.do
論文全文使用權限
  • 同意授權校內瀏覽/列印電子全文服務,於2019-08-27起公開。
  • 同意授權校外瀏覽/列印電子全文服務,於2019-08-27起公開。


  • 如您有疑問,請聯絡圖書館
    聯絡電話:(06)2757575#65773
    聯絡E-mail:etds@email.ncku.edu.tw