FWF — Austrian Science Fund

Evolution and Function of the Environmental Protein Sequence Universe

Thomas Rattei (ORCID: 0000-0002-0592-7791)
  • Grant DOI 10.55776/P27703
  • Funding program Principal Investigator Projects
  • Status ended
  • Start April 1, 2015
  • End March 31, 2020
  • Funding amount € 309,960

Disciplines

Computer Sciences (100%)

Keywords

    Bioinformatics, Clustering, Computational biology, Network analysis, Protein sequencing analysis, PVC superphylum

Abstract

DNA sequencing generates protein sequences in large quantities, and these sequences are among the most important reservoirs of molecular biological data. A protein sequence is a blueprint for the structure and function of the encoded protein and, together with its evolutionary relationships, points to the molecular function and biological role of the gene product. During the last decade, the sequencing of metagenomes directly from environmental samples, without cultivation, has significantly expanded the known protein sequence universe. Hundreds of metagenomes have been deeply sequenced and now account for the majority of protein sequences stored in databases, yet the environmental protein universe is still largely unstructured and awaits specific utilization in computational biology.

The central aim of this proposal is to investigate the fundamental evolutionary structures behind the environmental protein sequences obtained so far. We will cluster the entire protein sequence universe, including metagenomes, into evolutionarily related families. Building on established concepts such as orthology and protein domains, the project will develop novel clustering methods for large protein networks. Using this large-scale evolutionary reconstruction, we will investigate the function of protein families in the environmental protein sequence universe by comprehensively determining their relative abundances in different environments. We expect to discover many associations that not only link known protein families to specific habitat types but also connect families of unknown function to the environment. The resulting abundance matrix of protein families across environments will then be used to assess how well environmental co-occurrence profiles predict functional interactions between protein families. We expect to develop a novel method that significantly extends current principles for the prediction of protein interactions.

In a case study, we will use the structured environmental protein sequence universe to investigate the phylogenetic and ecological diversity of the monophyletic PVC superphylum (Planctomycetes, Verrucomicrobia, Chlamydiae, Lentisphaerae, etc.), a bacterial clade with exceptional physiologies and major medical, ecological and biotechnological importance. Although this proposal focuses mainly on fundamental biological questions, it also covers broader aspects, such as developing novel and universal methods and resources in computational biology and improving our knowledge of biotechnologically and medically important bacteria.
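The co-occurrence idea in the abstract can be illustrated with a small sketch. It assumes that Pearson correlation between the relative-abundance profiles of two families is used as the co-occurrence score; the abstract does not say which measure the project used, and the family names and counts below are hypothetical toy data.

```python
import numpy as np

def cooccurrence_scores(abundance):
    """Pearson correlation between rows of a families x environments count matrix."""
    abundance = np.asarray(abundance, dtype=float)
    # Convert raw counts to relative abundances within each environment (column).
    rel = abundance / abundance.sum(axis=0, keepdims=True)
    return np.corrcoef(rel)

def top_pairs(scores, names, n=3):
    """Return the n highest-scoring pairs of distinct families."""
    rows, cols = np.triu_indices(len(names), k=1)
    order = np.argsort(scores[rows, cols])[::-1][:n]
    return [(names[rows[i]], names[cols[i]], float(scores[rows[i], cols[i]]))
            for i in order]

# Hypothetical toy counts: 4 protein families (rows) in 5 environments (columns).
families = ["famA", "famB", "famC", "famD"]
counts = np.array([
    [10, 20, 30, 40, 50],
    [12, 19, 33, 41, 48],
    [50, 40, 30, 20, 10],
    [5, 5, 5, 5, 5],
])

scores = cooccurrence_scores(counts)
pairs = top_pairs(scores, families, n=2)
for a, b, r in pairs:
    print(f"{a} - {b}: r = {r:.2f}")
```

Families whose profiles rise and fall together across environments receive a high score and become candidates for a functional link. A real analysis would also need to handle the compositional nature of relative abundances and correct for multiple testing, which this sketch ignores.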

Final report

This project investigated the architecture of the protein sequence universe. Proteins are essential biomolecules for the structure and function of all cellular organisms as well as viruses. DNA sequencing generates a massive stream of molecular biological data. Protein sequences are inferred from these data and determine the structure and function of their gene products. In combination with their evolutionary relationships, these sequences point to the molecular functions and biological roles of proteins. The entirety of proteins is referred to as the protein universe. Massive efforts in metagenomic projects, which sequence DNA directly from environmental samples without cultivation steps, have markedly expanded the known protein sequence universe. While data from metagenomic studies now dominate protein databases, they are still largely unstructured and not used efficiently.

The main goal of this project was to investigate the fundamental evolutionary structures of the environmental protein sequence universe. Building upon established concepts such as orthology and sequence similarity, new methods for organizing the protein sequence universe were analyzed. A particular focus was on general phenomena in similarity networks of high-dimensional data, which occur not only in protein sequence networks but also in natural language processing and automatic music recommendation systems. A large variety of concepts and methods from computational biology and machine learning were refined and applied. This resulted in general insights into the structure of the protein sequence universe and in a repertoire of algorithms, methods, and tools for the efficient utilization of high-dimensional data. Their use is not limited to biology, because the underlying concepts apply to all domains dealing with high-dimensional spaces. The developed methods are therefore relevant to multiple scientific domains and technical disciplines.
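The project's publication list names hubness as one such phenomenon of high-dimensional similarity networks. As a minimal sketch, hubness is commonly quantified by the skewness of the k-occurrence distribution, that is, how often each point appears among the k nearest neighbors of other points. The data here are random Gaussian vectors rather than protein data, and the function names, Euclidean distance, and k = 5 are illustrative assumptions.

```python
import numpy as np

def k_occurrence(X, k=5):
    """Count how often each point is among the k nearest neighbors of other points."""
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    knn = np.argpartition(d2, k, axis=1)[:, :k]       # k nearest neighbors per row
    return np.bincount(knn.ravel(), minlength=len(X))

def skewness(counts):
    """Sample skewness (third standardized moment) of the k-occurrence counts."""
    c = np.asarray(counts, dtype=float)
    centred = c - c.mean()
    return float((centred ** 3).mean() / (centred ** 2).mean() ** 1.5)

rng = np.random.default_rng(0)
n, k = 500, 5
low_dim = rng.standard_normal((n, 3))     # 3-dimensional toy data
high_dim = rng.standard_normal((n, 100))  # 100-dimensional toy data

occ_low = k_occurrence(low_dim, k)
occ_high = k_occurrence(high_dim, k)
skew_low = skewness(occ_low)
skew_high = skewness(occ_high)
print(f"k-occurrence skewness, d=3:   {skew_low:.2f}")
print(f"k-occurrence skewness, d=100: {skew_high:.2f}")
```

A strongly positive skewness means that a few points ("hubs") show up in very many neighbor lists while many others appear in almost none, which distorts clustering and retrieval built on nearest neighbors.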
The project also delivered results that suggest further studies on deep learning for protein sequence vector representations. Combined with approximate neighbor search algorithms, such representations could resolve the computational bottleneck caused by expensive similarity searches in ever-growing sequence databases.
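To make the idea of approximate neighbor search over vector representations concrete, here is a minimal sketch using random-hyperplane locality-sensitive hashing, one well-known family of approximate methods. The report does not say which algorithm the project had in mind, the `HyperplaneLSH` class is hypothetical, and random vectors stand in for learned protein embeddings.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Toy cosine-similarity index: vectors sharing a sign pattern share a bucket."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))  # random hyperplane normals
        self.buckets = defaultdict(list)
        self.vectors = []

    def _key(self, v):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple((self.planes @ v > 0).tolist())

    def add(self, v):
        v = np.asarray(v, dtype=float)
        self.buckets[self._key(v)].append(len(self.vectors))
        self.vectors.append(v)

    def query(self, q, k=1):
        """Return up to k indices from the query's bucket, ranked by cosine similarity."""
        q = np.asarray(q, dtype=float)
        candidates = self.buckets.get(self._key(q), [])

        def cosine(i):
            v = self.vectors[i]
            return float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))

        return sorted(candidates, key=cosine, reverse=True)[:k]

# Hypothetical stand-in for protein embeddings: 200 random 32-dimensional vectors.
rng = np.random.default_rng(1)
embeddings = rng.standard_normal((200, 32))

index = HyperplaneLSH(dim=32, n_bits=6)
for vec in embeddings:
    index.add(vec)

# Querying with a stored vector returns that vector's own index first.
hits = index.query(embeddings[7], k=3)
print(hits)
```

Only the vectors in the query's bucket are compared exactly, so the cost per query drops sharply as the database grows, at the price of occasionally missing a true neighbor that fell into a different bucket. Production systems typically use several hash tables or graph-based indexes to recover that lost recall.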

Research institution(s)
  • ÖFAI - Österreichisches Forschungsinstitut für Artificial Intelligence - 34%
  • Universität Wien - 66%
Project participants
  • Arthur Flexer, ÖFAI - Österreichisches Forschungsinstitut für Artificial Intelligence, associated research partner
International project participants
  • Christian von Mering, University of Zurich - Switzerland

Research Output

  • 30061 Citations
  • 13 Publications
  • 1 Software
  • 1 Fundings
Publications
  • 2020
    Title scikit-hubness: Hubness Reduction and Approximate Neighbor Search
    DOI 10.21105/joss.01957
    Type Journal Article
    Author Feldbauer R
    Journal Journal of Open Source Software
    Pages 1957
  • 2020
    Title SciPy 1.0: fundamental algorithms for scientific computing in Python
    DOI 10.1038/s41592-019-0686-2
    Type Journal Article
    Author Virtanen P
    Journal Nature Methods
    Pages 261-272
  • 2019
    Title Deep learning for extremely fast protein similarity search
    Type Conference Proceeding Abstract
    Author Feldbauer R
    Conference Austrian High Performance Computing Meeting 2019
  • 2019
    Title scikit-hubness: Hubness Reduction and Approximate Neighbor Search
    DOI 10.48550/arxiv.1912.00706
    Type Preprint
    Author Feldbauer R
  • 2020
    Title DeepNOG: fast and accurate protein orthologous group assignment
    DOI 10.1093/bioinformatics/btaa1051
    Type Journal Article
    Author Feldbauer R
    Journal Bioinformatics
    Pages 5304-5312
  • 2016
    Title ConsPred: a rule-based (re-)annotation framework for prokaryotic genomes
    DOI 10.1093/bioinformatics/btw393
    Type Journal Article
    Author Weinmaier T
    Journal Bioinformatics
    Pages 3327-3329
  • 2016
    Title An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection
    DOI 10.1109/icdmw.2016.0106
    Type Conference Proceeding Abstract
    Author Flexer A
    Pages 716-723
  • 2016
    Title Centering Versus Scaling for Hubness Reduction
    DOI 10.1007/978-3-319-44778-0_21
    Type Book Chapter
    Author Feldbauer R
    Publisher Springer Nature
    Pages 175-183
  • 2018
    Title Fast Approximate Hubness Reduction for Large High-Dimensional Data
    DOI 10.1109/icbk.2018.00055
    Type Conference Proceeding Abstract
    Author Feldbauer R
    Pages 358-367
  • 2018
    Title Protein vector representations for fast similarity search
    Type Conference Proceeding Abstract
    Author Feldbauer R
    Conference German Conference on Bioinformatics 2018
  • 2015
    Title The Unbalancing Effect of Hubs on K-Medoids Clustering in High-Dimensional Spaces
    DOI 10.1109/ijcnn.2015.7280303
    Type Conference Proceeding Abstract
    Author Schnitzer D
    Pages 1-8
  • 2018
    Title A comprehensive empirical comparison of hubness reduction in high-dimensional spaces
    DOI 10.1007/s10115-018-1205-y
    Type Journal Article
    Author Feldbauer R
    Journal Knowledge and Information Systems
    Pages 137-166
  • 2015
    Title EffectiveDB—updates and novel features for a better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems
    DOI 10.1093/nar/gkv1269
    Type Journal Article
    Author Eichinger V
    Journal Nucleic Acids Research
Software
  • 2020
    Title SciPy 1.0
Fundings
  • 2018
    Title NVIDIA GPU Grant Program
    Type Capital/infrastructure (including equipment)
    Start of Funding 2018

Contact

Austrian Science Fund (FWF)
Georg-Coch-Platz 2
(Entrance Wiesingerstraße 4)
1010 Vienna

office(at)fwf.ac.at
+43 1 505 67 40
