FWF — Austrian Science Fund
InSitu - Integrated Situated Visual Scene and Natural Language Understanding for Human Robot Interaction

Michael Zillich (ORCID: )
  • Grant DOI 10.55776/TRP139
  • Funding program Translational Research
  • Status ended
  • Start March 1, 2011
  • End February 28, 2015
  • Funding amount € 369,306

Disciplines

Electrical Engineering, Electronics, Information Engineering (20%); Computer Sciences (50%); Linguistics and Literature (30%)

Keywords

Computer Vision, Cognitive Systems, Natural Language Understanding, Integration, Robotics

Abstract

Recent years have seen major advances in personal and assistive robots (from household robots to robots for elder care). Yet we are still far from truly natural human-robot interaction in everyday situations and environments. Robust visual scene understanding and natural language understanding on robots are currently two of the major roadblocks. We believe that this is in part because the two are often treated separately. Typical dialogue between humans situated in the same scene is full of cases where vision and dialogue are used jointly to ground a common understanding. Humans tend to look towards an object they are currently referring to in dialogue, guiding the attention of the dialogue partner. Similarly, object attributes extracted from (even partly) parsed utterances like "Could you hand me the red ..." will guide the search for the respective object. Vice versa, visually observing a scene that is being talked about supports understanding of ambiguous or underspecified utterances while they are being processed: "the red book on the floor" will most likely refer to a book visible to the speaker, not the one behind her back. Vision and natural language processing can thus mutually and incrementally constrain each other.

In this project, we will tackle the problem of tightly integrating visual scene understanding with natural language understanding. We believe that for robots to reach human-like performance in natural interactions, the vision, natural language, and action subsystems of the robotic architecture need to be very tightly integrated so that they can mutually constrain each other. This, in turn, requires concurrent processing of vision, language, and actions, where all algorithms must be interruptible and able to incorporate new information incrementally on the fly. It also requires a software framework that allows seamless integration of components and algorithms at a fine temporal granularity.

By providing such a tightly integrated system, robots will be able to detect objects faster and more reliably, resolve referential expressions of perceivable referents and ambiguous references more quickly, carry out intended actions more quickly, and achieve much more natural dialogues with humans in everyday environments.
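The mutual constraining described above can be illustrated with a minimal sketch. This is a hypothetical toy example, not the project's actual code: cues extracted from a partially parsed utterance re-rank visual object candidates, so the search can focus on likely referents before the sentence is even finished. The `Candidate` class, the attribute names, and the cue weights are all illustrative assumptions.

```python
# Hypothetical sketch of language-modulated attention: top-down cues
# from a partial parse are combined with bottom-up saliency scores,
# re-ranking object candidates incrementally as the utterance unfolds.

from dataclasses import dataclass

@dataclass
class Candidate:
    """A detected scene region with coarse, nameable visual attributes."""
    label: str
    color: str
    shape: str
    position: str      # e.g. "left", "right", "center"
    salience: float    # bottom-up saliency score in [0, 1]

def attention_scores(candidates, cues):
    """Combine bottom-up salience with top-down language cues.

    Each matching cue channel (color/shape/position) boosts a
    candidate; the weight of 1.0 is illustrative, not a tuned value.
    """
    scores = {}
    for c in candidates:
        score = c.salience
        for channel in ("color", "shape", "position"):
            if channel in cues and getattr(c, channel) == cues[channel]:
                score += 1.0   # top-down boost per matching cue
        scores[c.label] = score
    return scores

# A scene with two cups and a book; the utterance "the left cup ..."
# is only partially parsed, yielding shape and position cues so far.
scene = [
    Candidate("cup_a", "yellow", "cup", "left", 0.4),
    Candidate("cup_b", "red", "cup", "right", 0.6),
    Candidate("book", "red", "box", "center", 0.5),
]
partial_cues = {"shape": "cup", "position": "left"}

ranked = sorted(attention_scores(scene, partial_cues).items(),
                key=lambda kv: -kv[1])
print(ranked[0][0])  # cup_a wins despite lower bottom-up salience
```

The point of the sketch is that a candidate with low bottom-up salience can still win once language cues arrive, which is exactly the kind of incremental, interruptible re-ranking the project argues for.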

Final Report

The InSitu project looked at the tight integration of machine vision and natural language understanding. Both are tough problems in their own right, especially in the context of autonomous robots that perform tasks in everyday environments, such as "put the yellow cup on the table on the shelf." A human given that task would, already while the sentence is being uttered, follow the gaze or gesture of the speaker and look for something yellow.

In InSitu these two subproblems were therefore treated together within a common framework. Cues from (even partially) understood utterances help image understanding by focusing analysis on the relevant parts of the image, e.g., yellow objects in the above example. The visual analysis of a complete, complex 3-dimensional scene can be quite time-consuming (many seconds even on current computers). We could show that using situation-aware attentional mechanisms together with incremental processing methods, the detection of relevant objects in the scene could be significantly sped up. Depending on the situation and the given task, different attention channels are used, e.g. colour, shape, or position ("... the left cup ...").

Vice versa, visual cues from the scene feed back into language understanding. We could show how to learn object categories in a single shot from utterances such as "a medkit is a white box with a red cross on it" together with verbally describable visual features (cross, red). These learned classes generalize to qualitatively similar objects and allow considerations such as "The cross is green. Is that still a medkit?" Such semantically deep representations go beyond typical statistical methods, which learn similarity from a large number of examples without being able to explicitly explain where that similarity lies. Especially important features in this context are those that are functionally relevant, so-called affordances, like handles (to pick up) or cavities (the inside of a cup, as a container).

In InSitu we developed a taxonomy and systematic approach to describe everyday objects in terms of their affordances, and to detect those affordances in 3D scenes. The methods developed within InSitu were evaluated on three different robot platforms at the project partners TU Wien and Tufts University, Boston.
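The one-shot learning idea can be sketched in a few lines. This is a hypothetical illustration under simplifying assumptions, not the project's implementation: a class learned from a single verbal description is stored as a set of nameable features, so the system can both recognize new instances and explain exactly why a candidate fails membership. The function names and the flat attribute dictionary are invented for the example.

```python
# Hypothetical sketch of one-shot category learning from a verbal
# description: a class is a set of verbally describable features,
# so membership is checkable and explainable feature by feature
# (unlike purely statistical similarity learned from many examples).

def learn_from_utterance(name, features):
    """One-shot learning: store the class as its described features."""
    return {"name": name, "features": dict(features)}

def classify(cls, observed):
    """Return (is_member, mismatches).

    Every described feature must match; mismatches records the
    observed values that broke membership, giving an explanation."""
    mismatches = {k: observed.get(k)
                  for k, v in cls["features"].items()
                  if observed.get(k) != v}
    return (not mismatches, mismatches)

# "A medkit is a white box with a red cross on it."
medkit = learn_from_utterance("medkit",
                              {"shape": "box", "color": "white",
                               "marking": "cross",
                               "marking_color": "red"})

# "The cross is green. Is that still a medkit?"
ok, why = classify(medkit, {"shape": "box", "color": "white",
                            "marking": "cross",
                            "marking_color": "green"})
print(ok, why)  # False {'marking_color': 'green'}
```

Because the representation names its features explicitly, the negative answer comes with a reason (the marking color differs), which is the kind of explanation the report contrasts with opaque statistical similarity.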

Research institution(s)
  • Technische Universität Wien - 100%

Research Output

  • 192 Citations
  • 28 Publications
Publications
  • 2015
    Title What We Can Learn From the Primate’s Visual System
    DOI 10.1007/s13218-014-0345-9
    Type Journal Article
    Author Krüger N
    Journal KI - Künstliche Intelligenz
    Pages 9-18
  • 2014
    Title 4D Space-Time Mereotopogeometry-Part Connectivity Calculus for Visual Object Representation
    DOI 10.1109/icpr.2014.740
    Type Conference Proceeding Abstract
    Author Varadarajan K
    Pages 4316-4321
  • 2015
    Title Saliency-Based Object Discovery on RGB-D Data with a Late-Fusion Approach
    DOI 10.1109/icra.2015.7139441
    Type Conference Proceeding Abstract
    Author García G
    Pages 1866-1873
  • 2011
    Title Learning What Matters: Combining Probabilistic Models of 2D and 3D Saliency Cues
    DOI 10.1007/978-3-642-23968-7_14
    Type Book Chapter
    Author Potapova E
    Publisher Springer Nature
    Pages 132-142
  • 2012
    Title Attention-driven Segmentation of Cluttered 3D Scenes.
    Type Conference Proceeding Abstract
    Author Potapova E
    Conference Proc. of the 21st Int. Conf. on Pattern Recognition (ICPR), Tsukuba, Japan, 2012
  • 2012
    Title Web Mining Driven Object Locality Knowledge Acquisition for Efficient Robot Behavior.
    Type Conference Proceeding Abstract
    Author Vincze M Et Al
  • 2012
    Title My Robot is Smarter than Your Robot - On the Need for a Total Turing Test for Robots.
    Type Conference Proceeding Abstract
    Author Zillich M
    Conference AISB/IACAP Symposium - Revisiting Turing and his Test: Comprehensiveness, Qualia, and the Real World, Birmingham, UK, 2012
  • 2012
    Title Robust Multiple Model Estimation with Jensen-Shannon Divergence.
    Type Conference Proceeding Abstract
    Author Vincze M Et Al
    Conference Proc. of the 21st Int. Conf. on Pattern Recognition (ICPR), Tsukuba, Japan, 2012, 4 p
  • 2012
    Title Web Mining Driven Object Locality Knowledge Acquisition for Efficient Robot Behavior
    DOI 10.1109/iros.2012.6385931
    Type Conference Proceeding Abstract
    Author Zhou K
    Pages 3962-3969
  • 2014
    Title Learning of perceptual grouping for object segmentation on RGB-D data
    DOI 10.1016/j.jvcir.2013.04.006
    Type Journal Article
    Author Richtsfeld A
    Journal Journal of Visual Communication and Image Representation
    Pages 64-73
    Link Publication
  • 2014
    Title Incremental Attention-Driven Object Segmentation
    DOI 10.1109/humanoids.2014.7041368
    Type Conference Proceeding Abstract
    Author Potapova E
    Pages 252-258
  • 2014
    Title Learning to Recognize Novel Objects in One Shot through Human-Robot Interactions in Natural Language Dialogues.
    Type Conference Proceeding Abstract
    Author Krause E
    Conference Twenty-Eighth Conference on Artificial Intelligence (AAAI)
  • 2014
    Title Incremental Attention-driven Object Segmentation.
    Type Conference Proceeding Abstract
    Author Potapova E
  • 2013
    Title Incrementally Biasing Visual Search Using Natural Language Input.
    Type Conference Proceeding Abstract
    Author Krause E
    Conference Proc. of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
  • 2013
    Title Probabilistic Cue Integration for Real-Time Object Pose Tracking
    DOI 10.1007/978-3-642-39402-7_26
    Type Book Chapter
    Author Prankl J
    Publisher Springer Nature
    Pages 254-263
  • 2013
    Title Gaussian-weighted Jensen–Shannon divergence as a robust fitness function for multi-model fitting
    DOI 10.1007/s00138-013-0513-1
    Type Journal Article
    Author Zhou K
    Journal Machine Vision and Applications
    Pages 1107-1119
    Link Publication
  • 2013
    Title Advances in real-time object tracking - Extensions for robust object tracking with a Monte-Carlo particle filter.
    Type Journal Article
    Author Mörwald T
  • 2013
    Title Geometric data abstraction using B-splines for range image segmentation
    DOI 10.1109/icra.2013.6630569
    Type Conference Proceeding Abstract
    Author Mörwald T
    Pages 148-153
  • 2013
    Title Spatial Structure Analysis for Autonomous Robotic Vision Systems
    DOI 10.1109/worv.2013.6521933
    Type Conference Proceeding Abstract
    Author Zhou K
    Pages 165-170
  • 2013
    Title Advances in real-time object tracking
    DOI 10.1007/s11554-013-0388-4
    Type Journal Article
    Author Mörwald T
    Journal Journal of Real-Time Image Processing
    Pages 683-697
    Link Publication
  • 2014
    Title What Vision Can, Can’t and Should Do
    DOI 10.1007/978-3-319-06614-1_9
    Type Book Chapter
    Author Zillich M
    Publisher Springer Nature
    Pages 119-131
  • 2014
    Title Attention-Driven Object Detection and Segmentation of Cluttered Table Scenes using 2.5D Symmetry
    DOI 10.1109/icra.2014.6907584
    Type Conference Proceeding Abstract
    Author Potapova E
    Pages 4946-4952
  • 2014
    Title From Animals to Robots and Back: Reflections on Hard Problems in the Study of Cognition, A Collection in Honour of Aaron Sloman
    DOI 10.1007/978-3-319-06614-1
    Type Book
    Editors Wyatt J, Petters D, Hogg D
    Publisher Springer Nature
  • 2013
    Title 3D Information as a Way to Improve the Quality of Attention Points.
    Type Conference Proceeding Abstract
    Author Potapova E
    Conference Proc. of the Austrian Robotics Workshop, Graz, 2013
  • 2013
    Title Anytime Perceptual Grouping of 2D Features into 3D Basic Shapes
    DOI 10.1007/978-3-642-39402-7_8
    Type Book Chapter
    Author Richtsfeld A
    Publisher Springer Nature
    Pages 73-82
  • 2013
    Title A Pilot Study on Eye-tracking in 3D Search Tasks.
    Type Conference Proceeding Abstract
    Author Pirri F Et Al
    Conference Workshop on Solutions for Automatic Gaze Data Analysis (SAGA), Bielefeld, 2013
  • 2011
    Title Language-modulated attention and its tight coupling to visual processes (poster).
    Type Conference Proceeding Abstract
    Author Potapova E
    Conference Rovereto Attention Workshop: Attention and Objects, 2011
  • 2013
    Title Local 3D Symmetry for Visual Saliency in 2.5D Point Clouds
    DOI 10.1007/978-3-642-37331-2_33
    Type Book Chapter
    Author Potapova E
    Publisher Springer Nature
    Pages 434-445
