A multimodal speech interface for accessing web pages
Disciplines
Computer Sciences (90%); Linguistics and Literature (10%)
Keywords
ARTIFICIAL INTELLIGENCE, MULTIMODAL INTERFACES, LANGUAGE ENGINEERING, WEB INTERFACES
While in everyday life humans communicate with their environment through language, spoken as well as written, and supported by signs and gestures, human-computer interaction still lags far behind. Currently, graphical user interfaces (GUIs) are the de facto standard; the logical next step, however, is to move toward an even richer and more natural interaction by integrating communication via language. A prominent example of this need is the World Wide Web (WWW), which is of growing importance in everyday life. While developers of WWW pages may use any combination of text, audio, image, and video in their presentations to address the user - thus fully exploiting the multimedia possibilities of the web - the users' means of reacting are much more limited, being restricted mainly to point-and-click operations. Complex types of interaction, however, cannot be handled by mouse clicking and typing simple phrases alone. By adding better language capabilities, the gap between navigation and interaction in a communicative setting can be bridged. Language-based queries also provide the advantage of reaching through the hypertext structure directly to the required (textual) information. This frees the user from dependence on the document structure offered by the content provider, which is advantageous because users' and content providers' intentions often differ. At the same time, the user is not restricted to any predefined wording of the query. The proposed project will show new ways of integrating speech and language with classical access methods, and will investigate the respective shortcomings and advantages of different combinations. Since, due to the limitations of current speech recognition technology, a generic speech-driven web browser is still beyond current technological possibilities, we propose as a testbed application a system providing access to German-language newspapers that are available online.
There are three focal areas of research where new contributions are expected. First, empirically founded pre-design studies will provide important insights into the role of speech in a multimodal system for accessing web pages. Second, research in this project will yield insights concerning mechanisms for analyzing spoken utterances in the context of a multimodal environment. While speech recognition for spontaneous speech usually results in high word error rates, it is expected that background knowledge derived from the state of the interface will help in selecting the intended utterance. Finally, a prototypical WWW access system that includes text and speech input will be developed, allowing queries that address browsing functionality, structure, and content. In combination with the empirical research undertaken, this system will provide insights into usability, thereby giving important cues for the design of systems featuring multimodal interaction.
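The idea of using the interface state to select the intended utterance can be illustrated with a minimal sketch: rescore an n-best list from the recognizer by boosting hypotheses that mention words currently visible in the interface (e.g., link labels on the displayed newspaper page). All names, scores, and the weighting scheme here are illustrative assumptions, not details of the project's actual system.

```python
def rescore(nbest, context_words, bonus=0.2):
    """Boost ASR hypotheses that mention words visible in the interface state.

    nbest: list of (hypothesis_text, asr_score) pairs from the recognizer.
    context_words: set of lowercase words derived from the current interface state.
    bonus: illustrative weight added per matching word.
    """
    rescored = []
    for hypothesis, asr_score in nbest:
        overlap = sum(1 for w in hypothesis.lower().split() if w in context_words)
        rescored.append((hypothesis, asr_score + bonus * overlap))
    # Best-scoring hypothesis first
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical example: the page currently shows a link labelled "Sport".
context = {"sport", "politik", "zurueck"}
nbest = [
    ("zeige mir den sport teil", 0.55),
    ("zeige mir den spott heil", 0.60),  # acoustically better, contextually worse
]
best, _ = rescore(nbest, context)[0]
print(best)  # the contextually supported hypothesis wins
```

In a real system the context words would be extracted from the rendered page and dialogue history, and the combination of acoustic and context scores would be tuned empirically; the sketch only shows the principle that interface knowledge can override a raw recognition score.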