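System ID U0026-0812200913510490
Thesis title (Chinese) 質譜資料前處理中基底線修正與波峰校準之新方法
Thesis title (English) Novel Baseline Correction and Peak Alignment Methods for Mass Spectrometry Data Preprocessing
Institution National Cheng Kung University (成功大學)
Department (Chinese) 資訊工程學系碩博士班
Department (English) Institute of Computer Science and Information Engineering
Academic year 95 (ROC calendar; 2006-2007)
Semester 2
Year of publication 96 (ROC calendar; 2007)
Student (Chinese) 劉其瑋
Student (English) Chi-Wei Liu
Email p7694417@mail.ncku.edu.tw
Student ID p7694417
Degree Master's
Language Chinese
Number of pages 59
Oral defense committee Committee member: 廖寶琦
Committee member: 蔣榮先
Advisor: 曾新穆
Committee member: 唐傳義
Keywords (Chinese) 波峰校準 (peak alignment)  基底線修正 (baseline correction)  質譜資料前處理 (mass spectrometry data preprocessing)  資料探勘 (data mining)  蛋白質譜 (protein mass spectrum)
Keywords (English) baseline correction  MS data preprocessing  data mining  mass spectrometry  peak alignment
Subject classification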
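Abstract (translated from Chinese) Mass spectrometry is one of the key techniques in proteomics research, and within mass spectrometry data preprocessing, baseline correction and peak alignment are the steps that most strongly affect the quality of the final analysis. Existing methods tend to distort the signal too much during baseline correction and to be too sensitive to noise during peak alignment. In this study we propose an improvement for each step. For baseline correction, we combine the strengths of the convex hull algorithm and LOESS regression to find a more accurate spectrum baseline, which improves the quality of the mass spectrometry signal. For peak alignment, existing methods cannot locate noise, so their alignment results are easily affected by it. We therefore propose a new peak alignment algorithm, TPC (Two-Phases Clustering), which can effectively screen potential noise out of a noisy peak set and thereby improve the correctness of alignment across mass spectrometry peak data. In the experiments, we evaluate performance on both real and synthetic data. On real data, our method is evaluated as performing better than previous methods. On synthetic data, it identifies the planted potential noise more precisely, with high recall, precision, and F-measure. The experimental results indicate that the proposed methods are indeed more accurate than existing analysis methods.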
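The synthetic-data evaluation mentioned in the abstract above reports recall, precision, and F-measure for finding planted noise peaks. This record does not give the thesis's exact evaluation protocol, so the following is only a minimal sketch using the standard definitions of those three measures. The function name `noise_detection_scores` and the use of integer peak identifiers are assumptions made for illustration.

```python
def noise_detection_scores(true_noise, flagged):
    """Score a set of flagged peaks against the planted noise peaks.

    true_noise: identifiers of peaks planted as noise (hypothetical IDs).
    flagged:    identifiers of peaks the method reported as potential noise.
    Returns (precision, recall, f_measure) using the standard definitions.
    """
    true_noise, flagged = set(true_noise), set(flagged)
    true_positives = len(true_noise & flagged)

    # Precision: fraction of flagged peaks that really are planted noise.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: fraction of planted noise peaks that were flagged.
    recall = true_positives / len(true_noise) if true_noise else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Four planted noise peaks; the method flags three peaks, two of them correct.
    p, r, f = noise_detection_scores({1, 2, 3, 4}, {3, 4, 5})
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Treating the peaks as sets keeps the sketch independent of how a real pipeline would match detected peaks to planted ones, which the record does not describe.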
Abstract (English) In most proteomic studies, mass spectrometry (MS) data analysis has become an important protein identification technique. Baseline correction and peak alignment are the key steps in the MS data preprocessing stage that precedes further analysis. However, existing baseline correction methods may distort the original peak signals, and existing peak alignment methods may be sensitive to noise peaks across MS samples. In this study, we propose a novel algorithm for each of these two steps. We combine the convex hull algorithm with the LOESS regression method to find a better baseline for MS data. It successfully corrects each MS peak profile, and the result is closer to the original profile than the results of existing methods. No previous peak alignment study has tried to point out the inconsistent peaks across MS samples. We also propose a new TPC (Two-Phases Clustering) algorithm that aligns multiple MS samples while indexing the potential noise peaks. In our experiments, we use real MS datasets and generated synthetic datasets to evaluate the accuracy of the peak alignment method. The results show that our method performs better than previous methods.
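To make the convex hull idea in the abstract concrete, the sketch below estimates a baseline as the lower convex hull of a spectrum and subtracts it. It is not the thesis's method: the record does not say how the convex hull and LOESS steps are combined, and the LOESS smoothing step is left out here. The function name `lower_hull_baseline`, the assumption that `mz` is strictly increasing, and the synthetic spectrum are all illustrative choices.

```python
import numpy as np


def lower_hull_baseline(mz, intensity):
    """Estimate a baseline as the lower convex hull of (mz, intensity).

    Assumes mz is strictly increasing. Returns the hull interpolated
    linearly at every mz value, so it can be subtracted from intensity.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    hull = []  # indices of the lower-hull vertices, built left to right
    for i in range(len(mz)):
        # Andrew's monotone chain: drop the last vertex while it lies on
        # or above the segment from the vertex before it to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((mz[b] - mz[a]) * (intensity[i] - intensity[a])
                     - (intensity[b] - intensity[a]) * (mz[i] - mz[a]))
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    # Linear interpolation between hull vertices gives the baseline.
    return np.interp(mz, mz[hull], intensity[hull])


if __name__ == "__main__":
    # Synthetic spectrum: a sloping baseline plus one Gaussian-shaped peak.
    mz = np.linspace(0.0, 10.0, 101)
    intensity = (5.0 - 0.3 * mz) + 10.0 * np.exp(-((mz - 5.0) ** 2) / 0.1)
    corrected = intensity - lower_hull_baseline(mz, intensity)
    print("peak height after correction:", corrected.max())
```

A hull-only baseline is piecewise linear and never rises above the signal, which is one plausible reason to pair it with a smoother such as LOESS, as the abstract describes.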
Table of contents Chinese abstract
English abstract
Acknowledgments
Table of contents
List of tables
List of figures
Chapter 1 Introduction
1.1 Background
1.2 Research motivation
1.3 Problem definition
1.4 Research approach
1.5 Contributions
1.6 Thesis organization
Chapter 2 Literature review
2.1 Related research in bioinformatics
2.2 Baseline Correction
2.3 Intensity Normalization
2.4 Peak Detection
2.5 Peak Alignment
Chapter 3 Methods
3.1 Basic definitions
3.2 Baseline Correction
3.3 Peak Alignment
3.3.1 Intensity Clustering Phase
3.3.2 Build Potential Noise List
3.3.3 M/Z Clustering Phase
Chapter 4 Experimental analysis
4.1 Experimental data and environment
4.2 Results on real data
4.3 Synthetic data generator
4.3.1 Mass-to-charge ratio simulation and biological variability
4.3.2 Intensity simulation
4.3.3 Noise point generation
4.4 Evaluation method for synthetic data
4.4.1 Results on synthetic data
4.5 Summary of experiments
Chapter 5 Conclusions and future research directions
5.1 Conclusions
5.2 Future work
Chapter 6 References
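Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized; available from 2010-08-01.
  • Off-campus browsing and printing of the electronic full text is authorized; available from 2012-08-01.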

