Architecture and Development of High-Quality PoS Tagger


Harald Trost
  • Grant DOI 10.55776/P16614
  • Funding program Principal Investigator Projects
  • Status ended
  • Start June 1, 2003
  • End February 28, 2007
  • Funding amount € 223,771
  • Project website

Disciplines

Computer Sciences (60%); Linguistics and Literature (40%)

Keywords

    Computational Linguistics, Constraint Grammars, Linguistic Methodology, Human Language Technology, Part-of-Speech tagging, Natural Language Processing

Abstract Final report

The project aims to develop and implement a new architecture for high-quality Part-of-Speech (PoS) tagging. PoS taggers resolve the ambiguity of word forms in text, at least on the level of part of speech (e.g., German "sieben" is ambiguous between numeral and verb) or on some finer level (e.g., the gender of German "Leiter"). Currently, two types of approaches exist:

- statistical taggers, which assign each word its "most probable" reading as learned automatically from a tagged training text (i.e., without using explicit rules of the language);
- "Constraint Grammar" taggers, which use explicit, linguistically based grammar rules that are completely hand-crafted in current systems.

Both approaches have strengths as well as drawbacks. The project aims to combine them into a single tagging architecture (tagging system) in which the strengths of both approaches are accentuated while the weaknesses of each are compensated for by the other. The tagging architecture should thus be able to overcome the current quality barrier of about 93-96% reliability.

Training a statistical tagger can proceed swiftly (provided a tagged training corpus is available), since the methods are well understood and widespread, but methods for efficiently developing the rules of a Constraint Grammar tagger are still missing. Building such a tagger therefore currently requires extraordinary experience and skill in writing down the large number of individual, language-specific and (sometimes) complicated rules. To improve this situation, defining an effective methodology for creating the rules of a Constraint Grammar tagger will constitute an important subtask of the project. Apart from these more theoretical aims, a validation and practical demonstration of the developed methodology is also planned, together with an evaluation of the practical results achieved.
This amounts to the following three main objectives of the project, which are at the same time its three contributions to the field of PoS tagging:

1. proposing and advocating a novel tagger architecture that combines the statistical and the Constraint Grammar based tagging schemes into a tagging system with higher accuracy than either of its components taken alone;
2. developing a systematic method for writing the rules of a Constraint Grammar tagger, together with a novel and (provably) more powerful method of applying them;
3. implementing and evaluating a combined tagger for German.

Part-of-Speech (PoS) tagging is the process of automatically labeling each word in a text with its correct PoS label. For example, the sentence "Time flies like an arrow" should be labeled as follows: "Time (Noun) flies (Verb) like (Prep) an (Article) arrow (Noun)". PoS tags convey important linguistic information, and many natural language processing systems use PoS tagging as a pre-processing step.

Why is PoS tagging difficult? Because words are ambiguous: "Time" can be a verb or a noun, "flies" a verb or a noun, and so on. State-of-the-art taggers use statistical knowledge gained from large corpora to disambiguate between these possibilities. They generally perform quite well, selecting the correct PoS tag about 97 times out of 100. But although errors are few, they are sometimes embarrassing, i.e., they are errors no human would ever make. Moreover, many applications call for even better performance: 99 out of 100 should be achievable.

The primary goal of the project was to develop and implement a methodology for a high-quality, linguistically motivated partial PoS tagger for German that avoids "embarrassing" errors. Such a tagger performs disambiguation strictly on a linguistic basis, i.e., its architecture has the following properties:

- the initialization step labels each word with all its morphological readings ("PoS tags");
- the tagger proper removes all those morphological readings of a word that are (grammatically) impossible in the particular context.

In the course of the project we developed a methodology for expressing linguistic constraints in a concise form; that is, we concentrated on designing rules that describe impossible sequences of PoS tags. When applied to a sentence, these rules help eliminate any such sequences of PoS tags. Altogether we discovered about 160 such rules for German.
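The two-step procedure described above (label every word with all its readings, then remove readings that are impossible in context) can be illustrated with a minimal Python sketch. It is not the project's implementation: the lexicon, the single forbidden tag pair, and the function names `initialize` and `eliminate` are hypothetical, and the sketch assumes that rules are restricted to pairs of adjacent tags, which the project's roughly 160 rules need not be.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: each word form with all of its possible readings.
LEXICON: Dict[str, Set[str]] = {
    "die": {"Article"},
    "sieben": {"Numeral", "Verb"},
    "Zwerge": {"Noun"},
    "schlafen": {"Verb"},
}

# Hypothetical toy rule set: pairs of adjacent tags treated as impossible.
FORBIDDEN: Set[Tuple[str, str]] = {
    ("Article", "Verb"),
}


def initialize(words: List[str]) -> List[Set[str]]:
    """Initialization step: label each word with all of its readings."""
    return [set(LEXICON[word]) for word in words]


def eliminate(readings: List[Set[str]]) -> List[Set[str]]:
    """Remove readings that cannot combine with any reading of a neighbor.

    A reading survives only if at least one reading of the left neighbor and
    at least one reading of the right neighbor form an allowed pair with it.
    The loop repeats until nothing changes; the last reading of a word is
    never removed.
    """
    changed = True
    while changed:
        changed = False
        for i, tags in enumerate(readings):
            for tag in list(tags):  # iterate over a copy while removing
                left_ok = i == 0 or any(
                    (prev, tag) not in FORBIDDEN for prev in readings[i - 1]
                )
                right_ok = i == len(readings) - 1 or any(
                    (tag, nxt) not in FORBIDDEN for nxt in readings[i + 1]
                )
                if not (left_ok and right_ok) and len(tags) > 1:
                    tags.remove(tag)
                    changed = True
    return readings


if __name__ == "__main__":
    sentence = ["die", "sieben", "Zwerge", "schlafen"]
    for word, tags in zip(sentence, eliminate(initialize(sentence))):
        print(word, sorted(tags))
```

In this toy setting, "sieben" loses its verb reading after "die" because the pair (Article, Verb) is listed as impossible. In the shorter input "sieben Zwerge" no rule applies, so both readings of "sieben" remain, which mirrors the partial nature of the disambiguation.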
The principal advantage of the approach is that the output of the system is fully reliable in the sense that the tagger commits no errors during its operation. But this method does not normally perform full disambiguation; full disambiguation only takes place where the linguistic knowledge employed allows for it (this corresponds to the linguistic reality: many sentences are inherently ambiguous).

To achieve full disambiguation, the system is complemented by a standard statistical tagger that finalizes the disambiguation down to a single tag per word. Because our system reduces the number of possible PoS tags, which form the input to the statistical tagger, the overall quality of the combined system surpasses that of purely statistical taggers. An evaluation on a large corpus of newspaper articles demonstrated that the combined tagger incorporating our system comes close to the ideal: it almost reached 98 out of 100 correct tag assignments.
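Why pruning the candidate tags can help the statistical step may be seen in a small Python sketch. It rests on strong simplifying assumptions: the statistical tagger is replaced by a per-word frequency lookup (real statistical taggers typically score whole tag sequences), and the counts in `COUNTS` as well as the function name `pick_most_frequent` are invented for illustration, not taken from the project.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical (word, tag) counts standing in for a trained statistical model.
COUNTS: Dict[Tuple[str, str], int] = {
    ("die", "Article"): 50,
    ("sieben", "Verb"): 6,
    ("sieben", "Numeral"): 4,
    ("Zwerge", "Noun"): 3,
}


def pick_most_frequent(words: List[str], readings: List[Set[str]]) -> List[str]:
    """Choose, for each word, the most frequent tag among the allowed readings.

    Tags are sorted first so that ties are broken deterministically.
    """
    chosen = []
    for word, tags in zip(words, readings):
        best = max(sorted(tags), key=lambda tag: COUNTS.get((word, tag), 0))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    words = ["die", "sieben", "Zwerge"]

    # Without the linguistic filter, every reading is still a candidate.
    unfiltered = [{"Article"}, {"Numeral", "Verb"}, {"Noun"}]
    print(pick_most_frequent(words, unfiltered))

    # With the filter, the verb reading of "sieben" has already been removed.
    filtered = [{"Article"}, {"Numeral"}, {"Noun"}]
    print(pick_most_frequent(words, filtered))
```

With the invented counts, the unfiltered run picks the verb reading for "sieben", while the filtered run can only pick the numeral. The statistical component is unchanged in both runs; the difference comes solely from the smaller set of candidates it receives.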

Research institution(s)
  • ÖFAI - Österreichisches Forschungsinstitut für Artificial Intelligence - 100%
