FWF — Austrian Science Fund
Structured and Continuous Reinforcement Learning

Ronald Ortner (ORCID: 0000-0001-6033-2208)
  • Grant DOI 10.55776/P26219
  • Funding program Principal Investigator Projects
  • Status ended
  • Start April 10, 2014
  • End May 9, 2016
  • Funding amount € 130,536

Disciplines

Computer Sciences (50%); Mathematics (50%)

Keywords

    Reinforcement Learning, Regret Analysis, Computational Learning Theory

Abstract

In reinforcement learning, an agent tries to learn optimal behavior in an unknown environment by evaluating the feedback, usually some quantifiable and comparable reward, that its actions receive. Since the learner's actions may not pay off immediately, it must also be able to learn from delayed feedback, for example by accepting short-term discouraging feedback in order to reach a long-term goal that yields a large positive reward. Thus, in typical reinforcement learning applications such as robotics, control, or game playing, the learner receives rewarding feedback only once a given task is completed after a series of coordinated actions that individually give no or even misleading feedback.

While various reinforcement learning algorithms have been developed, a major breakthrough in practice has so far eluded these methods. One of the main obstacles to applying reinforcement learning algorithms to real-world problems is that typical algorithms are not efficient in large domains. Thus, while many potential applications could in principle be handled by reinforcement learning algorithms, from a practical point of view they are too costly, as their complexity and regret (the total reward lost with respect to an optimal strategy) grow linearly or even polynomially with the size of the underlying domain. One reason for this is that, unlike humans, reinforcement learning algorithms are usually unable to exploit similarities and structure in the problem domain.

In a precursor project, together with scientists from the SequeL team at Inria Lille, an interdisciplinary center for reinforcement learning, we were able to define very general similarity structures for reinforcement learning problems in finite domains and to prove improved theoretical regret bounds when the underlying similarity structure is known. The techniques and algorithms developed there also led to the first theoretical regret bounds for reinforcement learning in continuous domains. The proposed project takes this research on continuous reinforcement learning, a setting of particular importance for applications, a step further, not only by improving on the known bounds but also by developing efficient algorithms. Moreover, we want to investigate more general settings in which the learner does not have direct access to the domain information, but only to a set of possible models. For this setting, too, the precursor project produced first theoretical results, assuming finite domains and a model set that contains the correct model. In the proposed project, we aim to generalize these results to infinite domains and to relax the assumption on the model set, which need not contain the correct model itself, but only a good approximation of it.
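For orientation, the regret mentioned above can be made precise as follows; this is the standard formalization in the undiscounted setting, stated here as background rather than quoted from the project. If an algorithm collects rewards r_1, ..., r_T over T steps in an environment whose optimal average reward per step is \rho^*, then its regret after T steps is

    R_T = T \rho^* - \sum_{t=1}^{T} r_t .

Sublinear growth of R_T in T means that the learner's average reward converges to the optimal one; the obstacle in large domains is how strongly R_T additionally depends on the number of states and actions.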

Final report

In reinforcement learning, a learner wants to learn optimal behavior in an unknown environment. For example, the goal of the learner could be to reach a certain location or state, or to solve a complex task. The learning process is governed only by feedback from the environment: the learner observes the environment's reaction to its actions and, for example, obtains a reward for solving a given task. Since solving a task may require a longer sequence of coordinated actions, the learner must also be able to learn from delayed feedback, for example by accepting short-term discouraging feedback in order to reach a long-term goal that yields a high reward. Problem settings of this kind are in principle solvable by existing reinforcement learning algorithms, which can even be shown theoretically to be able to solve any task, provided the task has certain properties (such as that it is possible to recover from mistakes). At the same time, however, these algorithms are hardly applicable to real-world problems, mainly because the representation of even the simplest problems gives rise to huge state spaces, so that the algorithms cannot solve them in reasonable time.

In this project we developed reinforcement learning algorithms for problems with continuous state spaces, which are of particular importance for applications but for which few theoretical results had been available so far. We could show that in well-behaved environments the new algorithm provably learns faster than known algorithms (a toy illustration of the underlying state-aggregation idea follows below). A second question addressed in the project was whether a learning algorithm can learn to work with simpler representations of the learning problem. More precisely, the learner is given a set of possible representations, some of which are suitable while others may even be misleading. In this setting we could show that a learning algorithm developed in the project learns successfully even when no completely correct representation is at its disposal; it suffices that at least one representation is a good approximation of the environment. Remarkably, successful learning does not require identifying this representation, which would be harder and is sometimes even impossible (see the second sketch below).
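To make the continuous-state setting concrete, here is a minimal toy sketch in Python. It is not the project's algorithm: it runs plain Q-learning on a uniform discretization of a made-up one-dimensional environment, and all names and constants in it are hypothetical. What it illustrates is the basic device behind such results, aggregating similar states so that a continuous problem becomes a finite one; the smoother the environment, the coarser the grid that suffices, which gives some intuition for why well-behaved environments allow faster learning.

    import math
    import random

    N_CELLS = 20             # resolution of the state aggregation (hypothetical)
    ACTIONS = (-0.05, 0.05)  # move a little to the left or right

    def step(s, a):
        """Toy dynamics (invented): noisy move in [0, 1]; reward peaks near 0.8."""
        s2 = min(1.0, max(0.0, s + a + random.gauss(0.0, 0.01)))
        return s2, math.exp(-50.0 * (s2 - 0.8) ** 2)

    def cell(s):
        """Aggregate similar states: map a continuous state to its grid cell."""
        return min(N_CELLS - 1, int(s * N_CELLS))

    def train(episodes=200, steps=100, seed=0):
        random.seed(seed)
        q = [[0.0, 0.0] for _ in range(N_CELLS)]  # one value per cell and action
        alpha, gamma, eps = 0.1, 0.95, 0.1
        for _ in range(episodes):
            s = random.random()
            for _ in range(steps):
                c = cell(s)
                if random.random() < eps:
                    a = random.randrange(2)                 # explore
                else:
                    a = max((0, 1), key=lambda i: q[c][i])  # exploit
                s2, r = step(s, ACTIONS[a])
                q[c][a] += alpha * (r + gamma * max(q[cell(s2)]) - q[c][a])
                s = s2
        return q

    if __name__ == "__main__":
        q = train()
        best = max(range(N_CELLS), key=lambda c: max(q[c]))
        print(f"highest-value cell: {best}, i.e. states around {best / N_CELLS:.2f}")

Choosing N_CELLS is exactly the trade-off a theoretical analysis has to control: a finer grid approximates the environment more faithfully but multiplies the number of states the learner must explore.

The second result, learning with approximate representations, can be illustrated by a similarly hedged sketch; again the names and numbers are invented, and the setting is simplified drastically to a two-action toy problem. The learner is handed candidate models that each recommend an action and promise a mean reward. Following the optimism principle, it always trusts the remaining model with the highest promise, and it eliminates a model once the rewards actually observed fall measurably short of that promise.

    import math
    import random

    TRUE_MEANS = {"left": 0.3, "right": 0.7}  # hidden from the learner

    # Hypothetical candidate representations: (recommended action, promised reward).
    # Neither is exactly correct; the second is a good approximation.
    MODELS = [("left", 0.95),   # misleading representation
              ("right", 0.65)]  # approximately correct representation

    def pull(action):
        """One Bernoulli reward from the true environment."""
        return 1.0 if random.random() < TRUE_MEANS[action] else 0.0

    def run(horizon=3000, seed=1):
        random.seed(seed)
        active = set(range(len(MODELS)))
        n = [0] * len(MODELS)        # plays while trusting each model
        total = [0.0] * len(MODELS)  # reward collected while trusting each model
        for _ in range(horizon):
            # Optimism: trust the active model with the highest promised reward.
            i = max(active, key=lambda j: MODELS[j][1])
            action, promised = MODELS[i]
            total[i] += pull(action)
            n[i] += 1
            # Hoeffding-style test: a model whose empirical reward falls short
            # of its promise by more than the confidence width is over-optimistic.
            width = math.sqrt(math.log(2.0 * horizon) / (2.0 * n[i]))
            if total[i] / n[i] + width < promised and len(active) > 1:
                active.discard(i)
        for j, (action, promised) in enumerate(MODELS):
            mean = total[j] / n[j] if n[j] else float("nan")
            print(f"model {j} ({action}, promised {promised}): plays={n[j]}, "
                  f"empirical mean={mean:.3f}, still active={j in active}")

    if __name__ == "__main__":
        run()

In this toy version the misleading model is discarded after a handful of plays, after which the learner follows the approximately correct model for the rest of the horizon and collects close to the optimal reward, without ever determining which model is actually correct.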

Research institution(s)
  • Montanuniversität Leoben - 100%
International project participants
  • Remi Munos, Inria Lille - Nord Europe - France
  • Jan Peters, Technische Universität Darmstadt - Germany

Research Output

  • 31 Citations
  • 9 Publications
Publications
  • 2015
Title Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
    Type Journal Article
    Author Lakshmanan K
    Journal JMLR Workshop and Conference Proceedings Volume 37: Proceedings of The 32nd International Conference on Machine Learning, ICML 2015.
  • 2014
    Title Regret bounds for restless Markov bandits
    DOI 10.1016/j.tcs.2014.09.026
    Type Journal Article
    Author Ortner R
    Journal Theoretical Computer Science
    Pages 62-76
    Link Publication
  • 2016
Title Improved Learning Complexity in Combinatorial Pure Exploration Bandits
    Type Journal Article
Author Bartlett P et al.
    Journal JMLR Workshop and Conference Proceedings Volume 51: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016.
  • 2016
Title Pareto Front Identification from Stochastic Bandit Feedback
    Type Journal Article
    Author Auer P
    Journal JMLR Workshop and Conference Proceedings Volume 51: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016.
  • 2016
Title An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
    Type Journal Article
    Author Auer P
    Journal JMLR Workshop and Conference Proceedings: Proceedings of the 29th Conference on Learning Theory, COLT 2016
  • 2014
    Title Selecting Near-Optimal Approximate State Representations in Reinforcement Learning
    DOI 10.1007/978-3-319-11662-4_11
    Type Book Chapter
    Author Ortner R
    Publisher Springer Nature
    Pages 140-154
  • 2014
    Title Selecting Near-Optimal Approximate State Representations in Reinforcement Learning
    DOI 10.48550/arxiv.1405.2652
    Type Preprint
    Author Ortner R
  • 2016
    Title Optimal Behavior is Easier to Learn than the Truth
    DOI 10.1007/s11023-016-9389-y
    Type Journal Article
    Author Ortner R
    Journal Minds and Machines
    Pages 243-252
    Link Publication
  • 2016
    Title An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
    DOI 10.48550/arxiv.1605.08722
    Type Preprint
    Author Auer P
