Self-Learning Search Algorithms for High-Res Mass Spectra
Disciplines
Biology (25%); Computer Sciences (75%)
Keywords
Bioinformatics, Tandem Mass Spectrometry, Machine Learning, High-Resolution, Identification Algorithms
Mass spectrometry (MS) is the method most often used to identify proteins in biological samples: proteins are digested into peptides, which are subsequently analyzed. Within the last decade, a new generation of mass spectrometers has been developed that can acquire mass spectra with high resolution and high mass accuracy. This has significantly changed the characteristics of mass spectra; however, peptide identification algorithms have not kept pace and do not fully exploit the available information. In this bioinformatics research project, we therefore propose to develop a set of novel identification algorithms that are specifically designed for the analysis of modern mass spectra and that incorporate multiple sources of information.

Preliminary research results are promising. The project consortium, consisting of the Proteomics Group at IMP Vienna and the Bioinformatics Research Group at FH OÖ (Campus Hagenberg), has already conducted successful joint research on the analysis of MS data: using a first version of a scoring function designed by the consortium, identification rates comparable or even superior to those of Mascot, the current gold standard, have been achieved. Encouraged by these results, we are convinced that considering additional sources of information will further improve identification rates of mass spectra. This project is therefore dedicated to research on a combination of the following novel approaches. We plan to use machine learning techniques to analyze peptide elution times, fragmentation patterns, and instrument-specific mass accuracy characteristics. In addition, observed m/z values will be recalibrated based on the mass error of highly reliable identifications, and the remaining mass error, relative to the learned error distribution, will be incorporated into the scoring function.
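The recalibration idea described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the systematic error is a constant offset in parts per million (ppm), estimated as the median error of the highly reliable identifications. The function names and this error model are hypothetical and are not taken from the project's software.

```python
from statistics import median, pstdev


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def estimate_offset_ppm(confident_pairs):
    """Median ppm error over (observed, theoretical) m/z pairs.

    Assumption: the pairs come from highly reliable identifications.
    """
    return median([ppm_error(obs, theo) for obs, theo in confident_pairs])


def recalibrate(observed_mz, offset_ppm):
    """Remove a constant ppm offset from an observed m/z value."""
    return observed_mz / (1.0 + offset_ppm * 1e-6)


def residual_spread_ppm(confident_pairs, offset_ppm):
    """Population standard deviation of the ppm error left after recalibration."""
    residuals = [
        ppm_error(recalibrate(obs, offset_ppm), theo)
        for obs, theo in confident_pairs
    ]
    return pstdev(residuals)
```

Under these assumptions, the residual spread could serve as the scale against which the remaining mass error of a candidate match is judged inside a scoring function.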
Sophisticated peak picking strategies will also be designed using machine learning. These improvements will help increase identification rates in challenging situations such as hybrid spectra and exhaustive searches for a wide range of post-translational modifications. Such exhaustive searches lead to exponentially growing search spaces and an accompanying drop in spectrum identification rates, because the information in the mass spectra alone is not sufficient to cope with the increased search space. Instead of applying brute-force methods, we plan to solve this problem using construction heuristics, i.e., evolutionary algorithms that realize intelligent search strategies for large numbers of unknown post-translational modifications based on a combination of database search and de novo identification. All research results achieved in this project will be published and made freely available to the bioinformatics and proteomics communities. Improving identification rates of peptides in general, and of unknown modifications in particular, will permit deeper insight into the proteome; computer science shall thus form a new basis for finding answers to important medical and biological questions.
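To make the growth of the search space concrete, the following Python sketch counts how many modified forms of a single peptide a search would have to consider if every residue may independently stay unmodified or carry one of its allowed modifications. The peptides and the modification table are hypothetical examples, and the counting rule is a simplifying assumption (at most one modification per residue, no terminal modifications).

```python
from math import prod


def count_modified_forms(peptide, allowed_mods):
    """Count candidate forms of a peptide.

    Each residue contributes a factor of 1 (unmodified) plus the number
    of modifications allowed on that residue.
    """
    return prod(1 + len(allowed_mods.get(residue, ())) for residue in peptide)


# Hypothetical modification table: residue -> modifications considered in the search.
ALLOWED_MODS = {
    "S": ("Phospho",),
    "T": ("Phospho",),
    "Y": ("Phospho",),
    "K": ("Acetyl", "Methyl"),
    "M": ("Oxidation",),
}

for peptide in ("SAMPLER", "STYKSTYK"):
    print(peptide, count_modified_forms(peptide, ALLOWED_MODS))
```

Because every modifiable residue multiplies the count, the number of candidates grows exponentially with the number of modifiable sites, and it grows further with every modification type added to the table.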
Proteins in biological samples are typically characterised by mass spectrometry (MS). For this purpose, proteins are digested into peptides, which are subsequently analysed by MS. Technological development over the last decades has paved the way for new generations of MS instruments that provide higher resolution and mass accuracy. Within this multidisciplinary bioinformatics research project, specialized algorithms were developed that are specifically tailored to the analysis of these spectra and that also take auxiliary information into account. This leads to improved utilization of the available data and provides more reliable protein information for biological research.

One of the highlights of this project is MS Amanda, an algorithm developed specifically for the analysis of high-resolution mass spectra. MS Amanda is capable of reliably identifying more peptides and proteins than gold-standard algorithms. It was published in the Journal of Proteome Research and, in collaboration with Thermo Fisher Scientific, integrated into Proteome Discoverer, which serves as a default data analysis software for Thermo instruments. As a result, MS Amanda is used in hundreds of research groups worldwide, which is also reflected in numerous citations.

Furthermore, artificial intelligence methods were used to characterize peptide elution times, fragmentation patterns, and instrument-specific mass accuracy characteristics so that this information can be taken into account for peptide identification. To this end, we developed Elutator, an algorithm that validates identification results based on a predictive elution time model. This model can also be trained and adapted to specific laboratory conditions. Together, MS Amanda and Elutator achieve an identification rate more than 60% higher than that of conventional search strategies.

In this project, researchers at IMP Vienna and the Bioinformatics Research Group at FH OÖ, Hagenberg, have developed and published algorithms that are openly accessible to the research community and enable higher identification rates for peptides as well as for unknown modifications. This allows deeper insight into biological samples. In this context, computer science forms the basis for finding answers to key biological questions.
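As an illustration of validation based on predicted elution times, the Python sketch below fits a straight line from a single peptide feature to the observed retention time and flags identifications whose retention time deviates too far from the prediction. This is not Elutator's actual model: the single hydrophobicity-like feature, the linear fit, and the fixed tolerance are assumptions chosen to keep the example short.

```python
def fit_line(features, retention_times):
    """Ordinary least-squares fit of retention_time = slope * feature + intercept."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(retention_times) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(features, retention_times)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def is_plausible(feature, observed_rt, slope, intercept, tolerance_min):
    """True if the observed retention time lies within the tolerance of the prediction."""
    predicted_rt = slope * feature + intercept
    return abs(observed_rt - predicted_rt) <= tolerance_min


# Hypothetical training data from confident identifications of one laboratory setup:
# a hydrophobicity-like feature per peptide and its observed retention time in minutes.
train_features = [1.0, 2.0, 3.0, 4.0]
train_rts = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(train_features, train_rts)

# A candidate identification is kept only if its retention time fits the model.
print(is_plausible(5.0, 20.5, slope, intercept, tolerance_min=1.0))
print(is_plausible(5.0, 30.0, slope, intercept, tolerance_min=1.0))
```

Refitting the line on data from a different chromatographic setup corresponds, in this simplified picture, to adapting the model to specific laboratory conditions.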
- Stephan M. Winkler, FH Oberösterreich, associated research partner
Research Output
- 1805 Citations
- 30 Publications
- Massari L (2018). Complete resolution of sister chromatid intertwines requires the Polo-like kinase Cdc5 and the phosphatase Cdc14 in budding yeast. Other. DOI: 10.13130/massari-lucia-francesca_phd2018-03-26
- Asaoka T (2016). Linear ubiquitination by LUBEL has a role in Drosophila heat stress response. The EMBO Reports, pages 1624-1640. Journal Article. DOI: 10.15252/embr.201642378
- Camurdanoglu B (2016). Erratum: Corrigendum: MuSK Kinase Activity is Modulated By A Serine Phosphorylation Site in The Kinase Loop. Scientific Reports, pages 38271. Journal Article. DOI: 10.1038/srep38271
- Camurdanoglu B (2016). MuSK Kinase Activity is Modulated By A Serine Phosphorylation Site in The Kinase Loop. Scientific Reports, pages 33583. Journal Article. DOI: 10.1038/srep33583
- Rampler E (2015). Comprehensive Cross-Linking Mass Spectrometry Reveals Parallel Orientation and Flexible Conformations of Plant HOP2–MND1. Journal of Proteome Research, pages 5048-5062. Journal Article. DOI: 10.1021/acs.jproteome.5b00903
- Iacovella M (2015). Rio1 promotes rDNA stability and downregulates RNA polymerase I to ensure rDNA segregation. Nature Communications, pages 6643. Journal Article. DOI: 10.1038/ncomms7643
- Kiermaier E (2015). Polysialylation controls dendritic cell trafficking by regulating chemokine recognition. Science, pages 186-190. Journal Article. DOI: 10.1126/science.aad0512
- Orbán-Németh Z (2018). Structural prediction of protein models using distance restraints derived from cross-linking mass spectrometry data. Nature Protocols, pages 478-494. Journal Article. DOI: 10.1038/nprot.2017.146
- Dorl S (2017). PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search. Journal of Proteome Research, pages 290-295. Journal Article. DOI: 10.1021/acs.jproteome.7b00563
- Cummings R (2017). The Haystack Is Full of Needles: Technology Rescues Sugars! Molecular Cell, pages 827-829. Journal Article. DOI: 10.1016/j.molcel.2017.11.024
- Stadlmann J (2017). Comparative glycoproteomics of stem cells identifies new players in ricin toxicity. Nature, pages 538-542. Journal Article. DOI: 10.1038/nature24015
- Orbán-Németh Z (2018). Author Correction: Structural prediction of protein models using distance restraints derived from cross-linking mass spectrometry data. Nature Protocols, pages 1724-1724. Journal Article. DOI: 10.1038/s41596-018-0024-7
- Dorfer V (2018). CharmeRT: Boosting Peptide Identifications by Chimeric Spectra Identification and Retention Time Prediction. Journal of Proteome Research, pages 2581-2589. Journal Article. DOI: 10.1021/acs.jproteome.7b00836
- Stadlmann J (2018). Analysis of PNGase F-Resistant N-Glycopeptides Using SugarQb for Proteome Discoverer 2.1 Reveals Cryptic Substrate Specificities. PROTEOMICS, pages 1700436. Journal Article. DOI: 10.1002/pmic.201700436
- Stieger C (2018). Optimized fragmentation improves the identification of peptides cross-linked using MS-cleavable reagents. Pages 476051. Preprint. DOI: 10.1101/476051
- Zess E (2018). N-terminal β-strand underpins biochemical specialization of an ATG8 isoform. Pages 453563. Preprint. DOI: 10.1101/453563
- Doblmann J (2018). apQuant: Accurate Label-Free Quantification by Quality Filtering. Journal of Proteome Research, pages 535-541. Journal Article. DOI: 10.1021/acs.jproteome.8b00113
- Rodriguez E (2020). Autophagy mediates temporary reprogramming and dedifferentiation in plant somatic cells. The EMBO Journal. Journal Article. DOI: 10.15252/embj.2019103315
- Pinto P (2020). ANGEL2 is a member of the CCR4 family of deadenylases with 2',3'-cyclic phosphatase activity. Science, pages 524-530. Journal Article. DOI: 10.1126/science.aba9763
- Stieger C (2019). Optimized Fragmentation Improves the Identification of Peptides Cross-Linked by MS-Cleavable Reagents. Journal of Proteome Research, pages 1363-1370. Journal Article. DOI: 10.1021/acs.jproteome.8b00947
- Rodriguez E (2019). Autophagy mediates temporary reprogramming and dedifferentiation in plant somatic cells. Pages 747410. Preprint. DOI: 10.1101/747410
- Zess E (2019). N-terminal β-strand underpins biochemical specialization of an ATG8 isoform. PLOS Biology. Journal Article. DOI: 10.1371/journal.pbio.3000373
- Hofbauer H (2014). Regulation of Gene Expression through a Transcriptional Repressor that Senses Acyl-Chain Length in Membrane Phospholipids. Developmental Cell, pages 729-739. Journal Article. DOI: 10.1016/j.devcel.2014.04.025
- Wirnsberger G (2014). Jagunal homolog 1 is a critical regulator of neutrophil function in fungal host defense. Nature Genetics, pages 1028-1033. Journal Article. DOI: 10.1038/ng.3070
- Moczulska K (2014). Deep and Precise Quantification of the Mouse Synaptosomal Proteome Reveals Substantial Remodeling during Postnatal Maturation. Journal of Proteome Research, pages 4310-4324. Journal Article. DOI: 10.1021/pr500456t
- Dorfer V (2014). MS Amanda, a Universal Identification Algorithm Optimized for High Accuracy Tandem Mass Spectra. Journal of Proteome Research, pages 3679-3684. Journal Article. DOI: 10.1021/pr500202e
- Roitinger E (2015). Quantitative Phosphoproteomics of the Ataxia Telangiectasia-Mutated (ATM) and Ataxia Telangiectasia-Mutated and Rad3-related (ATR) Dependent DNA Damage Response in Arabidopsis thaliana. Molecular & Cellular Proteomics, pages 556-571. Journal Article. DOI: 10.1074/mcp.m114.040352
- Nishiyama T (2013). Aurora B and Cdk1 mediate Wapl activation and release of acetylated cohesin from chromosomes by phosphorylating Sororin. Proceedings of the National Academy of Sciences, pages 13404-13409. Journal Article. DOI: 10.1073/pnas.1305020110
- Moruz L (2013). Optimized Nonlinear Gradients for Reversed-Phase Liquid Chromatography in Shotgun Proteomics. Analytical Chemistry, pages 7777-7785. Journal Article. DOI: 10.1021/ac401145q
- Dorfer V (2015). A Symbolic Regression Based Scoring System Improving Peptide Identifications for MS Amanda. Pages 1335-1341. Conference Proceeding Abstract. DOI: 10.1145/2739482.2768509