FWF — Austrian Science Fund

C-Perform: Methods and Tools for Collocation Extraction and Performance-Oriented Parsing

Harald Trost
  • Grant DOI 10.55776/P12920
  • Funding program Principal Investigator Projects
  • Status ended
  • Start December 1, 1998
  • End May 31, 2003
  • Funding amount € 180,047

Disciplines

Computer Sciences (75%); Mathematics (10%); Linguistics and Literature (15%)

Keywords

    COMPUTATIONAL LINGUISTICS, CORPUS-BASED NATURAL LANGUAGE PROCESSING, COLLOCATIONS, LEXICALIZATION, NATURAL LANGUAGE PROCESSING, PARSING

Abstract

The aim of this project is to lay the foundations for a new generation of systems that enable fast, efficient and robust natural language processing while remaining sufficiently general. Based on the assumption that particular aspects of performance are grammaticalized, we pursue a novel approach to grammar in which performance and competence aspects are interleaved within the grammar model itself. In particular, we aim at modeling the interaction of generativity, which is the distinctive feature of competence, and lexicalization, which is a feature of language usage. To achieve this goal, the influence of lexicalization on generativity is studied within the phenomenon of collocations. The interaction of lexical and structural information is modeled by means of corpus-based statistical techniques.
Due to the impact of generative grammar on linguistics, collocations have been regarded as a phenomenon outside the grammar. In general, the reduction of grammar to competence aspects has led to grammar models that account for the dichotomy of syntactically correct versus incorrect utterances, but ignore the fact that some of the correct analyses are more adequate than others. This emphasis on competence information leads to ambiguity, a severe problem for processing because the search space becomes large, and thus to fairly slow systems. Control and compilation strategies have been developed in computational linguistics to reduce ambiguity and thus gain processing efficiency. These approaches are useful means to mimic performance, but do not tackle the fundamental problem.
Concurrently, we have witnessed a renaissance of statistics within natural language processing. Performance aspects influence the stochastic language models because they are reflected in the language data (corpora). Likelihood replaces the true-false dichotomy, which enables the processing of unrestricted text. But statistical models are linguistically poor, which makes them reliable only for very restricted domains. This is where the results of this project shall bring improvement. In order to come up with efficient and sufficiently general systems, we need to combine statistical models with elaborate linguistic knowledge. One possibility to achieve this goal is to provide corpora with linguistically elaborate annotation schemes. Grammatical competence can also alleviate another inherent problem of statistical models: since the number of model parameters is limited by the size of the training corpus, a linguistically guided pre-selection of appropriate candidate parameters is crucial.
Within the project, stochastic grammars with different degrees of lexicalization will be induced from a German newspaper corpus. Parametrization of the grammar models is guided by insights gained from corpus-based retrieval of collocations. The initial model will be trained on annotated portions of the corpus. The parameters will be systematically varied and tested in a number of parsing experiments. With parsing, an additional aspect of performance comes into play. With respect to collocation extraction, corpus pre-processing tools will be adapted in order to automatically enrich raw text with the structural information required for collocation extraction.
As a theoretical result, the project will provide insights into the interaction of generativity and lexicalization within collocations, and as a consequence insights into the interaction of competence and performance aspects of natural language. As a practical outcome, the project provides methods and tools for automatic high-precision extraction of collocations from raw text, methods and tools to induce a highly lexicalized stochastic grammar model from arbitrary corpora, and a CKY-type stochastic parser parametrizable with respect to the grammar.
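
The abstract names a CKY-type stochastic parser but gives no implementation details. The following is a minimal sketch of the general technique (Viterbi-style probabilistic CKY over a grammar in Chomsky normal form), not the project's parser. The toy grammar, its probabilities, the example words, and the function name `cky_best_parse` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustrative only).
# Binary rules: (parent, left child, right child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
    ("NP", "DET", "N"): 1.0,
}
# Lexical rules: (tag, word) -> probability
LEXICAL = {
    ("DET", "der"): 0.5,
    ("DET", "den"): 0.5,
    ("N", "hund"): 0.5,
    ("N", "mann"): 0.5,
    ("V", "sieht"): 1.0,
}


def cky_best_parse(words, start="S"):
    """Return (probability, tree) of the most likely parse, or None."""
    n = len(words)
    # chart[(i, j)][label] = (log probability, backpointer)
    chart = defaultdict(dict)

    # Fill the diagonal with lexical rules.
    for i, w in enumerate(words):
        for (tag, word), p in LEXICAL.items():
            if word == w:
                chart[(i, i + 1)][tag] = (math.log(p), w)

    # Combine adjacent spans bottom-up, keeping the best score per label.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = (math.log(p)
                                 + chart[(i, k)][left][0]
                                 + chart[(k, j)][right][0])
                        best = chart[(i, j)].get(parent)
                        if best is None or score > best[0]:
                            chart[(i, j)][parent] = (score, (k, left, right))

    if start not in chart[(0, n)]:
        return None

    def build(i, j, label):
        # Follow backpointers to reconstruct the tree as nested tuples.
        _, back = chart[(i, j)][label]
        if isinstance(back, str):
            return (label, back)
        k, left, right = back
        return (label, build(i, k, left), build(k, j, right))

    return math.exp(chart[(0, n)][start][0]), build(0, n, start)
```

Calling `cky_best_parse("der hund sieht den mann".split())` returns the probability of the best analysis together with its tree. Swapping in a different rule table changes the parser's behavior without touching the algorithm, which is one way to read "parametrizable with respect to the grammar".
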
Both the grammar model and the parser are designed specifically for the requirements of robust and efficient processing of real-world German text, and thus overcome the disadvantages of existing stochastic parsers for German, which have largely been developed on the basis of English, a language which in contrast to German has little inflection, rigid word order and a fairly restricted range of non-local phenomena. Interest in performance-oriented grammar models is not restricted to computational linguistics; it is also a topic of research in theoretical linguistics and psycholinguistics. Thus the work within the project can benefit from a broader range of research, and results achieved in the project are expected to influence research in the other fields.
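
The abstract speaks of corpus-based statistical retrieval of collocations without naming a specific association measure. As a rough illustration only, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a common measure chosen here as an assumption and not taken from the project. The function name `pmi_bigrams`, the `min_count` threshold, and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts.
    Pairs seen fewer than min_count times are skipped, because PMI is
    unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring candidate pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical mini-corpus; a real run would use a large tokenized corpus.
tokens = "in kraft treten und in kraft setzen und in der tat".split()
candidates = pmi_bigrams(tokens)
```

This sketch looks only at adjacent surface words. The abstract instead describes enriching raw text with structural information before extraction, which a realistic pipeline for German would need in order to capture collocations whose parts are not adjacent.
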

Research institution(s)
  • ÖFAI - Österreichisches Forschungsinstitut für Artificial Intelligence - 100%
International project participants
  • Hans Uszkoreit, Universität des Saarlandes - Germany
