YSGM: an integrative repository of diverse biological features for mining similar genes in yeast
Department of Electrical Engineering
yeast gene similarity
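Studying gene similarity helps biologists understand the relationships between genes in biological processes. Gene similarity can be used to improve knowledge of biological behavior and of the biological processes in which similar genes participate, and in turn to predict the functions of unknown genes. Many studies have examined gene similarity across species, using known genes to predict the functions of genes with similar biological behavior in another species. However, although yeast gene information is more complete than that of other species, relatively few studies have examined similarity among yeast genes, and those studies tend to analyze similarity using a single yeast feature rather than more complete yeast gene information. In this thesis, we collect more complete yeast gene information and build a database to help biologists analyze yeast gene similarity and understand the relationships among yeast genes. First, we collected six types of biological information on yeast genes: protein-protein interactions, transcription factors, gene expression profiles, mutant phenotypes, gene functions, and literature evidence. Next, we used different algorithms to compute pairwise gene similarity scores for each biological feature. We used the hypergeometric test to compute the significance of the overlap for each pair of yeast genes under protein-protein interactions, transcription factors, mutant phenotypes, gene functions, and literature evidence, and took that significance as the pair's similarity score; for gene expression profiles, we adopted the results of Hibbs et al. on similar yeast gene expression profiles as the similarity score. Finally, the user or biologist selects the yeast gene and the biological features to analyze, and the similarity scores between each other yeast gene and the target gene under those features are summed to give their overall similarity score. These scores are then sorted to find the yeast genes most similar to the target gene under the selected features. Compared with the existing database STRING, we have more complete yeast gene information with which to find similar yeast genes more accurately. We believe this work can help biologists understand the degree of similarity and the relationships among yeast genes under different biological features, and support further yeast research.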
Investigating gene similarity helps clarify the relations among genes in biological processes. Identifying similar genes by calculating the similarity between them can substantially improve knowledge of biological behavior, help interpret the biological processes in which similar genes participate, and in turn help predict similar mechanisms in other species. Previous studies of similar genes have concentrated primarily on associations of biological function across species. However, few studies have thoroughly investigated the resemblance between genes within a single species, such as yeast. Although several databases and web tools exist, they have not collected adequate data on yeast or considered the various conditions of yeast mechanisms. We therefore developed a web tool, YSGM (Yeast Similar Genes Miner), which aims to represent similarity between yeast genes more fully and draws on more distinctive data to find similar yeast genes. We integrated databases covering protein-protein interactions, genetic interactions, transcription factor (TF)-gene binding pairs, TF-gene regulatory pairs, expression profiles, mutant phenotypes, functional descriptions, and literature evidence. We then used the hypergeometric test to calculate gene similarity scores for seven types of biological data: protein-protein interactions, genetic interactions, TF-gene binding pairs, TF-gene regulatory pairs, mutant phenotypes, functional descriptions, and literature evidence. For expression profiles, we retrieved and scaled the correlation scores from the results of Hibbs et al., who identified similar gene expression profiles between yeast genes and developed the Serial Pattern of Expression Levels Locator (SPELL). Finally, we combined these per-feature similarity scores into an overall similarity score.
Given a yeast gene of interest and a user-selected set of gene similarity features, YSGM measures yeast gene similarity and returns the genes most similar to the specified gene under those features. In this study, we present the YSGM database, which provides unique biological information for searching for similar yeast genes. Investigating yeast gene similarity with YSGM can yield more accurate and more complete predictions of similar yeast genes, because the biological information in YSGM covers many aspects of biological behavior and evidence, such as TF-gene binding, TF-gene regulation, and mutant phenotypes, and is more distinctive than that of other databases. We believe that the diversity of biological information accumulated in YSGM will greatly enhance the usefulness of these data for yeast biologists studying similarity between yeast genes.
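The scoring and ranking procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the YSGM implementation. It assumes that each gene is represented, per feature, by a set of annotations (for example, its interaction partners), that the hypergeometric test is the upper-tail probability of the observed overlap between two such sets, and that the per-feature score is -log10 of that p-value; the abstract does not specify the exact transform. The function names, gene names, and toy data are hypothetical.

```python
from math import comb, log10


def hypergeom_pvalue(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of shared annotations when gene B's `size_b` annotations
    are drawn without replacement from `universe` possible annotations, of
    which `size_a` belong to gene A.
    """
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, size_b)


def similarity_score(p_value: float) -> float:
    """Assumed transform (not stated in the abstract): -log10(p), floored."""
    return -log10(max(p_value, 1e-300))


def rank_similar(target, selected_features, data):
    """Sum per-feature scores against `target` and sort in descending order.

    `data` maps feature name -> (universe size, {gene: set of annotations}).
    Genes sharing nothing with the target in any selected feature are omitted.
    """
    totals = {}
    for feature in selected_features:
        universe, annotations = data[feature]
        target_set = annotations.get(target, set())
        for gene, gene_set in annotations.items():
            if gene == target:
                continue
            overlap = len(target_set & gene_set)
            if overlap == 0:
                continue
            p = hypergeom_pvalue(universe, len(target_set), len(gene_set), overlap)
            totals[gene] = totals.get(gene, 0.0) + similarity_score(p)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two features, four genes.
TOY_DATA = {
    "ppi": (20, {
        "geneA": {1, 2, 3, 4},
        "geneB": {1, 2, 3, 5},
        "geneC": {4, 9},
        "geneD": {10, 11},
    }),
    "phenotype": (10, {
        "geneA": {"p1", "p2"},
        "geneB": {"p1", "p2"},
        "geneC": {"p3"},
    }),
}

if __name__ == "__main__":
    for gene, score in rank_similar("geneA", ["ppi", "phenotype"], TOY_DATA):
        print(gene, round(score, 3))
```

Summing the per-feature scores mirrors the combination step in the abstract; a real deployment would likely need to scale the scores so that no single feature dominates, as the abstract notes was done for the expression correlation scores.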
Abstract (Chinese)
Abstract (English)
Acknowledgement
List of Tables
List of Figures
List of Abbreviations
Chapter 1 Introduction
1.1 Motivation and literature review
1.1.1 Previously established genome browsing
1.1.2 Biological features in yeast
1.2 Contributions
Chapter 2 Construction of Yeast Similar Genes Miner Database
2.1 YSGM construction
2.2 Data collection
2.3 Calculating similar scores
2.4 Implementation of the web service of YSGM
Chapter 3 Utility and Discussion
3.1 Database interface
3.2 Case study
3.3 Comparison with related databases
Chapter 4 Conclusion
4.1 Conclusion
References
 D. J. Allocco, I. S. Kohane, and A. J. Butte, “Quantifying the relationship between co-expression, co-regulation and gene function,” BMC bioinformatics, vol. 5, no. 1, p. 18, 2004.
 P. D. Andrews and M. Stark, “Dynamic, Rho1p-dependent localization of Pkc1p to sites of polarized growth,” Journal of cell science, vol. 113, no. 15, pp. 2685–2693, 2000.
 M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, et al., “Gene Ontology: tool for the unification of biology,” Nature genetics, vol. 25, no. 1, pp. 25–29, 2000.
 G. D. Bader and C. W. Hogue, “Analyzing yeast protein–protein interaction data obtained from different sources,” Nature biotechnology, vol. 20, no. 10, pp. 991–997, 2002.
 J. I. F. Bass, A. Diallo, J. Nelson, J. M. Soto, C. L. Myers, and A. J. Walhout, “Using networks to measure similarity between genes: association index selection,” Nature methods, vol. 10, no. 12, pp. 1169–1176, 2013.
 M. P. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares, and D. Haussler, “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences, vol. 97, no. 1, pp. 262–267, 2000.
 C. Brun, F. Chevenet, D. Martin, J. Wojcik, A. Guénoche, and B. Jacq, “Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network,” Genome biology, vol. 5, no. 1, p. R6, 2003.
 K. P. Byrne and K. H. Wolfe, “The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species,” Genome research, vol. 15, no. 10, pp. 1456–1461, 2005.
 A. Clare and R. D. King, “Predicting gene function in Saccharomyces cerevisiae,” Bioinformatics, vol. 19, no. suppl 2, pp. ii42–ii49, 2003.
 M. S. Cyert, “Calcineurin signaling in Saccharomyces cerevisiae: how yeast go crazy in response to stress,” Biochemical and biophysical research communications, vol. 311, no. 4, pp. 1143–1150, 2003.
 C. M. Douglas, F. Foor, J. A. Marrinan, N. Morin, J. B. Nielsen, A. M. Dahl, P. Mazur, W. Baginsky, W. Li, and M. El-Sherbeini, “The Saccharomyces cerevisiae FKS1 (ETG1) gene encodes an integral membrane protein which is a subunit of 1, 3-beta-D-glucan synthase,” Proceedings of the National Academy of Sciences, vol. 91, no. 26, pp. 12907–12911, 1994.
 A. Franceschini, D. Szklarczyk, S. Frankild, M. Kuhn, M. Simonovic, A. Roth, J. Lin, P. Minguez, P. Bork, C. von Mering, et al., “STRING v9.1: protein-protein interaction networks, with increased coverage and integration,” Nucleic acids research, vol. 41, no. D1, pp. D808–D815, 2013.
 S. B. Inoue, N. Takewaki, T. Takasuka, T. Mio, M. Adachi, Y. Fujii, C. Miyamoto, M. Arisawa, Y. Furuichi, and T. Watanabe, “Characterization and Gene Cloning of 1, 3-beta-D-Glucan Synthase from Saccharomyces Cerevisiae,” European journal of biochemistry, vol. 231, no. 3, pp. 845–854, 1995.
 J. Jungmann, J. C. Rayner, and S. Munro, “The Saccharomyces cerevisiae protein Mnn10p/Bed1p is a subunit of a Golgi mannosyltransferase complex,” Journal of biological chemistry, vol. 274, no. 10, pp. 6579–6585, 1999.
 M. R. Koch and L. Pillus, “The glucanosyltransferase Gas1 functions in transcriptional silencing,” Proceedings of the National Academy of Sciences, vol. 106, no. 27, pp. 11224–11229, 2009.
 T. Kurita, Y. Noda, T. Takagi, M. Osumi, and K. Yoda, “Kre6 protein essential for yeast cell wall beta-1, 6-glucan synthesis accumulates at sites of polarized growth,” Journal of biological chemistry, vol. 286, no. 9, pp. 7429–7438, 2011.
 S. Letovsky and S. Kasif, “Predicting protein function from protein/protein interaction data: a probabilistic approach,” Bioinformatics, vol. 19, no. suppl 1, pp. i197–i204, 2003.
 C. T. Lopes, M. Franz, F. Kazi, S. L. Donaldson, Q. Morris, and G. D. Bader, “Cytoscape Web: an interactive web-based network browser,” Bioinformatics, vol. 26, no. 18, pp. 2347–2348, 2010.
 P. W. Lord, R. D. Stevens, A. Brass, and C. A. Goble, “Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation,” Bioinformatics, vol. 19, no. 10, pp. 1275–1283, 2003.
 H. Martin, A. Dagkessamanskaia, G. Satchanska, N. Dallies, and J. François, “KNR4, a suppressor of Saccharomyces cerevisiae cwh mutants, is involved in the transcriptional control of chitin synthase genes,” Microbiology, vol. 145, no. 1, pp. 249–258, 1999.
 H. Martin-Yken, A. Dagkessamanskaia, F. Basmaji, A. Lagorce, and J. François, “The interaction of Slt2 MAP kinase with Knr4 is necessary for signalling through the cell wall integrity pathway in Saccharomyces cerevisiae,” Molecular microbiology, vol. 49, no. 1, pp. 23–35, 2003.
 P. Mazur, N. Morin, W. Baginsky, M. El-Sherbeini, J. A. Clemas, J. B. Nielsen, and F. Foor, “Differential expression and function of two homologous subunits of yeast 1, 3-beta-D-glucan synthase,” Molecular and Cellular Biology, vol. 15, no. 10, pp. 5671–5681, 1995.
 H. Qadota, C. P. Python, S. B. Inoue, M. Arisawa, Y. Anraku, Y. Zheng, T. Watanabe, D. E. Levin, and Y. Ohya, “Identification of yeast Rho1p GTPase as a regulatory subunit of 1, 3-beta-glucan synthase,” Science, vol. 272, no. 5259, pp. 279–281, 1996.
 P. Resnik, “Using information content to evaluate semantic similarity in a taxonomy,” arXiv preprint cmp-lg/9511007, 1995.
 A. Ruepp, A. Zollner, D. Maier, K. Albermann, J. Hani, M. Mokrejs, I. Tetko, U. Güldener, G. Mannhaupt, M. Münsterkötter, et al., “The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes,” Nucleic acids research, vol. 32, no. 18, pp. 5539–5545, 2004.
 F. Rusnak, P. Mertz, et al., “Calcineurin: form and function,” Physiological reviews, vol. 80, no. 4, pp. 1483–1522, 2000.
 M. Sekiya-Kawasaki, M. Abe, A. Saka, D. Watanabe, K. Kono, M. Minemura-Asakawa, S. Ishihara, T. Watanabe, and Y. Ohya, “Dissection of upstream regulatory components of the Rho1p effector, 1, 3-beta-glucan synthase, in Saccharomyces cerevisiae,” Genetics, vol. 162, no. 2, pp. 663–676, 2002.
 C. Stark, B.-J. Breitkreutz, A. Chatr-Aryamontri, L. Boucher, R. Oughtred, M. S. Livstone, J. Nixon, K. Van Auken, X. Wang, X. Shi, et al., “The BioGRID interaction database: 2011 update,” Nucleic acids research, vol. 39, no. suppl 1, pp. D698–D704, 2011.
 D. Stojanova, M. Ceci, D. Malerba, et al., “Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction,” BMC bioinformatics, vol. 14, no. 1, p. 285, 2013.
 M. C. Teixeira, P. T. Monteiro, J. F. Guerreiro, J. P. Gonçalves, N. P. Mira, S. C. dos Santos, T. R. Cabrito, M. Palma, C. Costa, A. P. Francisco, et al., “The YEASTRACT database: an upgraded information system for the analysis of gene and genomic transcription regulation in Saccharomyces cerevisiae,” Nucleic acids research, vol. 42, no. D1, pp. D161–D166, 2014.
 H. Terashima, K. Hamada, and K. Kitada, “The localization change of Ybr078w/Ecm33, a yeast GPI-associated protein, from the plasma membrane to the cell wall, affecting the cellular function,” FEMS microbiology letters, vol. 218, no. 1, pp. 175–180, 2003.
 C. L. Tucker, J. F. Gera, and P. Uetz, “Towards an understanding of complex protein networks,” Trends in cell biology, vol. 11, no. 3, pp. 102–106, 2001.
 T. Utsugi, M. Minemura, A. Hirata, M. Abe, D. Watanabe, and Y. Ohya, “Movement of yeast 1, 3-beta-glucan synthase is essential for uniform cell wall synthesis,” Genes to Cells, vol. 7, no. 1, pp. 1–9, 2002.
 M. Vidal, “A biological atlas of functional maps,” Cell, vol. 104, no. 3, pp. 333–339, 2001.
 W. Yamochi, K. Tanaka, H. Nonaka, A. Maeda, T. Musha, and Y. Takai, “Growth site localization of Rho1 small GTP-binding protein and its involvement in bud formation in Saccharomyces cerevisiae,” The Journal of cell biology, vol. 125, no. 5, pp. 1077–1093, 1994.
 E. Zamir and B. Geiger, “Molecular complexity and dynamics of cell-matrix adhesions,” Journal of cell science, vol. 114, no. 20, pp. 3583–3590, 2001.
 C. Zhu, K. J. Byers, R. P. McCord, Z. Shi, M. F. Berger, D. E. Newburger, K. Saulrieta, Z. Smith, M. V. Shah, M. Radhakrishnan, et al., “High-resolution DNA-binding specificity analysis of yeast transcription factors,” Genome research, vol. 19, no. 4, pp. 556–566, 2009.