Multimedia and User Credibility Knowledge Extraction
ERA-Net: CHIST-ERA
Disciplines
Computer Sciences (80%); Mathematics (20%)
Keywords
Multimedia, User Credibility, Concept Similarity, Information Fusion
The term Web 3.0 entered the public vocabulary more than five years ago. Its definition remains unclear, but the last half decade has made one thing clear: the web has become a support for social media. Directly from cameras, phones, tablets or computers, users push multimedia data to their peers and to the world at large. MUCKE addresses this stream of multimedia social data with new and reliable knowledge extraction models designed for multilingual and multimodal data shared on social networks. It departs from current knowledge extraction models, which are mainly quantitative, by giving high importance to the quality of the processed data, in order to protect the user from an avalanche of equally topically relevant data.
It does so through two central innovations: automatic user credibility estimation for multimedia streams and adaptive multimedia concept similarity. Credibility models for multimedia streams are a highly novel topic; they will be cast as a multimedia information fusion task and will constitute the main scientific contribution of the project. Adaptive multimedia concept similarity departs from existing models by creating a semantic representation of the underlying corpora and associating a probabilistic framework with them. The utility of these two innovations will be demonstrated in an image retrieval system. Extensive evaluation will be performed to assess the reliability of the extracted knowledge against representative datasets. Additionally, a new shared evaluation task focused on user credibility estimation will be proposed.
The two core innovations rely on innovative text processing, image processing and fusion methods. Text processing will concentrate on tasks such as word sense disambiguation, concept recognition and anaphora resolution. Image processing will include parsimonious content description, large-scale concept detection and detector robustness. Multimedia fusion will focus on a flexible combination of text and image modalities based on a probabilistic framework. All proposed methods will be designed to take advantage of the structural properties of social networks. Particular focus will be placed on proposing scalable algorithms that cope with large-scale, heterogeneous data. The consortium consists of four partners, three universities and one research institute, with complementary competences that cover the scientific domains associated with the project. Together, in MUCKE, they will introduce new models for processing noisy multimodal and multilingual data that will form the basis for innovative services.
What is credible information online? Do computers understand what they do? These fundamental questions formed the foundation of the scientific investigations of the MUCKE project, an international collaboration between universities and research organisations in Austria, France, Romania and Turkey, led by TU Wien. Semantic text and image processing has reached a new level in the last half decade, as artificial intelligence methods from the second half of the 20th century have seen a rebirth under new names such as Deep Learning or Convolutional Neural Networks. In some use cases, these methods demonstrate an uncanny ability to simulate understanding. In MUCKE, by adapting these methods to a use case familiar to most users, photography (as exemplified by websites and services such as Flickr), we have shown that they can be used to verify the accuracy of the tags provided by users. In extensive tests, such methods were shown to tag photographs at quality levels similar to those of a human. This means that, just as we compare different human annotators to identify those who do a better or worse job, we can now compare human and machine annotators and assess the credibility of a human's tagging actions by observing long-term behaviour in comparison with a machine's predictions.
While this is a breakthrough, it is not immediately obvious how to extend it to other use cases. Domains such as eHealth or intellectual property protection, where credibility and semantics are very important, present new, exciting challenges, but also high rewards: improving access to trustworthy medical information for patients, general practitioners and specialists, and increasing innovation while properly rewarding innovators.
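The comparison between a user's tags and a machine's predictions described in the report can be illustrated with a minimal sketch. It is not the project's actual model: it assumes that credibility is simply the fraction of a user's tags, accumulated over time, that an automatic tagger also predicts for the same photo. The function name `tag_credibility`, the input layout and the sample data are all hypothetical.

```python
from collections import defaultdict


def tag_credibility(annotations, machine_tags):
    """Estimate per-user tagging credibility (illustrative assumption only).

    annotations:  iterable of (user, photo_id, tag) triples from human taggers.
    machine_tags: dict mapping photo_id -> set of lowercase tags predicted
                  by an automatic tagger (hypothetical input).
    Returns a dict mapping user -> share of that user's tags that the
    machine also predicted for the same photo.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for user, photo_id, tag in annotations:
        total[user] += 1
        # Case-insensitive match against the machine's tags for this photo.
        if tag.lower() in machine_tags.get(photo_id, set()):
            agree[user] += 1
    return {user: agree[user] / total[user] for user in total}


# Hypothetical sample data: two users tagging the same two photos.
annotations = [
    ("alice", "p1", "beach"),
    ("alice", "p2", "dog"),
    ("bob", "p1", "car"),
    ("bob", "p2", "Dog"),
]
machine_tags = {"p1": {"beach", "sea"}, "p2": {"dog"}}

scores = tag_credibility(annotations, machine_tags)
```

In this toy data, every tag from `alice` matches a machine prediction, while only one of the two tags from `bob` does. A real system would need more photos per user and would have to account for errors in the machine's own predictions.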
- Technische Universität Wien - 100%
- Adrian Popescu, Commissariat à l'Energie Atomique (CEA) - France
- Adrian Iftene, University Alexandru-Ioan-Cuza at Iasi - Romania
- Pinar Duygulu-Sahin, Bilkent University - Turkey
Research Output
- 184 Citations
- 34 Publications
- Title: Dataset: Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset (2015); Type: Other
- Title: Open Source Software: MUCKE Information Retrieval Evaluation System (2016); Type: Other
- Title: Dataset: Div400: A Social Image Retrieval Result Diversification Dataset (2014); Type: Other
- Title: Dataset: Div150Multi: A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries (2016); Type: Other
- 2014. Title: Exploiting health related features to infer user expertise in the medical domain; Type: Conference Proceeding Abstract; Author: Müller H et al.; Conference: Web Search Click Data workshop at WSDM, New York City, NY, USA, 2014
- 2014. Title: User intent behind medical queries; DOI: 10.1145/2637002.2637043; Type: Conference Proceeding Abstract; Author: Palotti J; Pages: 283-286
- 2014. Title: TUW@ Retrieving Diverse Social Images Task 2014; Type: Conference Proceeding Abstract; Author: Hanbury A et al.; Conference: MediaEval 2014
- 2014. Title: Insight to Hyponymy Lexical Relation Extraction in the Patent Genre Versus Other Text Genres; Type: Conference Proceeding Abstract; Author: Anderson L; Conference: IPaMin@KONVENS
- 2017. Title: A faceted approach to reachability analysis of graph modelled collections; DOI: 10.1007/s13735-017-0145-8; Type: Journal Article; Author: Sabetghadam S; Journal: International Journal of Multimedia Information Retrieval; Pages: 157-171
- 2015. Title: Div150Cred: A social image retrieval result diversification with user tagging credibility dataset; Type: Conference Proceeding Abstract; Author: Ionescu B; Conference: MMSys 2015
- 2015. Title: TUW@ TREC clinical decision support track; Type: Conference Proceeding Abstract; Author: Hanbury A et al.; Conference: Proceedings of the 2014 Text Retrieval Conference
- 2015. Title: On the use of statistical semantics for metadata-based social image retrieval; Type: Conference Proceeding Abstract; Author: Lupu M et al.; Conference: Proc. Content-Based Multimedia Indexing (CBMI) 2015
- 2015. Title: Leveraging Metropolis-Hastings Algorithm on Graph-based Model for Multimodal IR; Type: Conference Proceeding Abstract; Author: Rauber A et al.; Conference: GSB@SIGIR 2015
- 2015. Title: Toward Optimized Multimodal Concept Indexing; DOI: 10.1007/978-3-319-27932-9_13; Type: Book Chapter; Author: Rekabsaz N; Publisher: Springer Nature; Pages: 141-152
- 2015. Title: How users search and what they search for in the medical domain; DOI: 10.1007/s10791-015-9269-8; Type: Journal Article; Author: Palotti J; Journal: Information Retrieval Journal; Pages: 189-224
- 2015. Title: Div150Cred; DOI: 10.1145/2713168.2713192; Type: Conference Proceeding Abstract; Author: Ionescu B; Pages: 207-212
- 2015. Title: Credibility in Information Retrieval; DOI: 10.1561/1500000046; Type: Journal Article; Author: Ginsca A; Journal: Foundations and Trends® in Information Retrieval; Pages: 355-475
- 2016. Title: Building Evaluation Datasets for Consumer-Oriented Information Retrieval; Type: Conference Proceeding Abstract; Author: Goeuriot L; Conference: Proc. of Language Resources and Evaluation Conference (LREC) 2016
- 2016. Title: TUW@ TREC Clinical Decision Support Track 2015; Type: Conference Proceeding Abstract; Author: Hanbury A; Conference: Proceedings of the 2015 Text Retrieval Conference
- 2016. Title: Div150Multi: a social image retrieval result diversification dataset with multi-topic queries; Type: Conference Proceeding Abstract; Author: Ionescu B; Conference: MMSys 2016
- DOI: 10.1145/2578726; Type: Other
- DOI: 10.1145/2637002; Type: Other
- 2015. Title: TUW @ MediaEval 2015 Retrieving Diverse Social Images Task; Type: Conference Proceeding Abstract; Author: Hanbury A et al.; Conference: MediaEval 2015
- 2015. Title: Retrieving Diverse Social Images at MediaEval 2015: Challenge, Dataset and Evaluation; Type: Conference Proceeding Abstract; Author: Ionescu B; Conference: MediaEval 2015
- 2015. Title: On the Use of Statistical Semantics for Metadata-Based Social Image Retrieval; DOI: 10.1109/cbmi.2015.7153634; Type: Conference Proceeding Abstract; Author: Rekabsaz N; Pages: 1-4
- 2015. Title: Diagnose This If You Can; DOI: 10.1007/978-3-319-16354-3_62; Type: Book Chapter; Author: Zuccon G; Publisher: Springer Nature; Pages: 562-567
- 2015. Title: Evaluating User Image Tagging Credibility; DOI: 10.1007/978-3-319-24027-5_4; Type: Book Chapter; Author: Ginsca A; Publisher: Springer Nature; Pages: 41-52
- 2015. Title: Reachability Analysis of Graph Modelled Collections; DOI: 10.1007/978-3-319-16354-3_41; Type: Book Chapter; Author: Sabetghadam S; Publisher: Springer Nature; Pages: 370-381
- 2015. Title: CLEF eHealth evaluation lab 2015, task 2: Retrieving information about medical symptoms; Type: Conference Proceeding Abstract; Author: Palotti J
- 2014. Title: Retrieving Diverse Social Images at MediaEval 2014: Challenge, Dataset and Evaluation; Type: Conference Proceeding Abstract; Author: Ionescu B; Conference: MediaEval 2014
- 2014. Title: A Combined Approach of Structured and Non-structured IR in Multimodal Domain; DOI: 10.1145/2578726.2578801; Type: Conference Proceeding Abstract; Author: Sabetghadam S; Pages: 491-494
- 2014. Title: A review of users' search contexts for lifelogging system design; DOI: 10.1145/2637002.2637040; Type: Conference Proceeding Abstract; Author: Liu Y; Pages: 271-274
- 2014. Title: A System Framework for Concept- and Credibility-Based Multimedia Retrieval; DOI: 10.1145/2578726.2582624; Type: Conference Proceeding Abstract; Author: Bierig R; Pages: 543-546
- 2014. Title: Which One to Choose: Random Walks or Spreading Activation?; DOI: 10.1007/978-3-319-12979-2_11; Type: Book Chapter; Author: Sabetghadam S; Publisher: Springer Nature; Pages: 112-119