Project detail

Disciplines

Computer Sciences (100%)

Keywords

Information Retrieval, Domain-specific Information Retrieval, Information Retrieval Evaluation

Abstract

Final report

Information Retrieval (IR) on the World Wide Web functions very effectively and efficiently. Web search tools are designed as multi-purpose tools, applicable in as wide an array of situations as possible. Nevertheless, not all information is equal. There are areas for which these tools are conceived too broadly to be useful: health or biomedical information, intellectual property information, social science publications, blogs, press photographs, etc. A search in one of these domains is called a domain-specific search. Such a search is specific in terms of the collection of documents indexed, the search refinements arising from the domain characteristics, the specificity of domain coverage, the types of multimodal data (e.g. images, chemical formulae) present in the documents, and the end users and their tasks. Even though many domains have similar characteristics and challenges, there is as yet no general framework for developing domain-specific search solutions, and no way of characterising a domain so that the best approach or tools can be chosen.

ADmIRE will contribute to two areas of IR: (1) domain-specific IR methodologies and (2) IR evaluation. For domain-specific IR, we will develop a framework consisting of a classification scheme for domain-specific problems and a protocol giving the optimal evidence-based approach to solving a problem based on its characterisation. The protocol will be obtained from systematic reviews of IR papers. In fact, the creation of systematic reviews, and hence of guidelines for domain-specific IR, is itself an example of a (rather difficult) domain-specific search problem. We will use this problem as the central scenario for ADmIRE, carrying out the research and development needed to create semi-automated tools with which IR researchers can produce systematic reviews effectively. This scenario therefore serves simultaneously as a model of a domain-specific problem and as a means of finding the evidence-based solution.
For IR evaluation, ADmIRE will develop guidelines for future IR evaluation campaigns and publications, so that evaluation results can be used more easily and effectively to guide decisions in domain-specific search system design. It will also develop a framework for component-based evaluation based on the workflow paradigm. The framework will bring IR evaluation closer to the e-Science paradigm, allowing better sharing of experimental setups between researchers and more flexibility in performing IR experiments with multiple configurations of components. IR (and potentially other empirically based areas of computer science) has the advantage of starting from a position where experimental studies are already the cornerstone of the domain. ADmIRE will bring all of this experimental information together and organise it for use in a specific domain, which is a very challenging and yet absolutely necessary task for the development of the field of IR.

The list of search results returned by search engines, and the order in which they are presented, can have a huge impact on the outcomes of people's decisions. A doctor searching for new treatments for a disease may not find a treatment if the document describing it is not ranked highly enough in the result list, potentially leading to a patient not being given the optimal treatment. An examiner in a patent office who misses some prior art during a patent search, and hence grants a patent, may trigger legal proceedings between two companies. For this reason, it is important that search engines, particularly those used in such professional environments, are free from bias. In the ADmIRE project, we identified various search engine biases and developed methods to overcome them.

Search engines use various heuristic equations for matching documents to queries. One potential bias is the findability bias, in which a document has particular characteristics that make the likelihood of it being returned for most queries rather small. It is possible to measure the findability bias experimentally by issuing a huge collection of queries and seeing how often a document is returned. Here we proposed an analytical approach to measuring findability, which reduces the extensive experimentation needed.

A second potential bias is document length. If a search engine counts how often a query word is present in a document, then longer documents are likely to contain the word more often. By recognising that a document can be long either because it talks about many topics (e.g. a collection of short stories) or because it talks about a single topic in a long-winded way (e.g. some textbooks), we were able to propose a method to reduce document length bias in search engine results.

The final bias addressed is in evaluating search engines.
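The experimental way of measuring findability described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's analytical method: the toy term-count scorer, the rank cutoff, the function names, and the four example documents are all hypothetical. Because the scorer simply counts query-term occurrences, the sketch also shows how a longer document that repeats terms can crowd shorter ones out of the result list.

```python
from collections import Counter


def score(query, doc_text):
    """Toy scorer (an assumption): total number of query-term occurrences."""
    counts = Counter(doc_text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def retrievability(docs, queries, cutoff):
    """For each document, count the queries that return it within the top `cutoff`."""
    hits = {doc_id: 0 for doc_id in docs}
    for query in queries:
        scored = [(score(query, text), doc_id) for doc_id, text in docs.items()]
        # Keep documents matching at least one query term.
        matching = [pair for pair in scored if pair[0] > 0]
        # Rank by descending score; break ties by document id for determinism.
        ranked = sorted(matching, key=lambda pair: (-pair[0], pair[1]))
        for _, doc_id in ranked[:cutoff]:
            hits[doc_id] += 1
    return hits


# Hypothetical collection: d1 is longer and repeats terms from several topics.
docs = {
    "d1": "patent search patent claims prior art prior art",
    "d2": "treatment of disease with new drug",
    "d3": "prior art",
    "d4": "short note on search",
}
queries = ["patent search", "prior art", "disease treatment", "search"]

top1 = retrievability(docs, queries, cutoff=1)
top2 = retrievability(docs, queries, cutoff=2)
print(top1)
print(top2)
```

With a cutoff of 1, the short documents d3 and d4 are never returned even though each matches one of the queries; relaxing the cutoff to 2 makes them findable. A real measurement would use a production ranking function and a far larger query set.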
In order to know how well a search engine functions, it is tested with a number of queries, for which people judge which documents are relevant. Knowing which of the relevant documents are retrieved by the search engine makes it possible to measure its effectiveness. However, with typically millions of documents searched, it is infeasible to manually judge the relevance of every document for every query. Not having all documents manually judged means that the effectiveness measures are biased in some situations. To reduce this bias, we developed approaches that automatically select the set of documents to be manually judged so that the bias is reduced as much as possible. Furthermore, we developed approaches that reduce the bias in effectiveness measurements on an existing test collection (a collection of queries, documents, and manual relevance judgements), which allows the hundreds of existing test collections to be used to give less biased measurements.
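The source of this evaluation bias can be illustrated with a small sketch. It assumes the common depth-k pooling setup, in which only the top-k documents of each contributing system are judged and unjudged documents are counted as non-relevant. The run names, document ids, and relevance labels are hypothetical, and the code does not reproduce the pooling strategies or bias corrections developed in the project.

```python
def depth_k_pool(runs, k):
    """Union of the top-k documents of every contributing run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def precision_at(ranking, relevant, n):
    """Fraction of the top-n documents that are in `relevant`."""
    return sum(doc in relevant for doc in ranking[:n]) / n


# Hypothetical ground truth, which a real test collection never fully knows.
true_relevant = {"d1", "d3", "d5", "d7"}

# Runs that contributed to the pool, and a new run evaluated afterwards.
pooled_runs = {
    "A": ["d1", "d2", "d3", "d4"],
    "B": ["d3", "d1", "d6", "d2"],
}
new_run = ["d5", "d7", "d1", "d2"]

pool = depth_k_pool(pooled_runs, k=2)
# Only pooled documents are judged; everything else is assumed non-relevant.
judged_relevant = pool & true_relevant

measured = precision_at(new_run, judged_relevant, 4)
actual = precision_at(new_run, true_relevant, 4)
print(f"new run: measured P@4 = {measured}, actual P@4 = {actual}")
```

The new run retrieves relevant documents that no pooled run surfaced, so its measured precision falls below its actual precision, while run A is scored correctly. Choosing which documents to judge, or adjusting the measure afterwards, are the two routes to reducing this gap that the report describes.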

Research institution(s)

Technische Universität Wien - 100%

International project participants

Vivien Petras, Humboldt-Universität zu Berlin - Germany
Henning Müller Zum Hagen, Universität Hamburg - Germany
Gareth Jones, Dublin City University - Ireland
Nicola Ferro, Università degli studi di Padova - Italy
Hamish Cunningham, University of Sheffield - United Kingdom

Research Output

266 Citations
27 Publications

Publications

Title	DASyR(IR) - Document Analysis System for Systematic Reviews (in Information Retrieval)
DOI	10.1109/icdar.2015.7333830
Type	Conference Proceeding Abstract
Author	Piroi F
Pages	591-595

Title	Report on the Evaluation-as-a-Service (EaaS) Expert Workshop
DOI	10.1145/2795403.2795416
Type	Journal Article
Author	Hopfgartner F
Journal	ACM SIGIR Forum
Pages	57-65

Title	Fixed budget pooling strategies based on fusion methods
DOI	10.1145/3019612.3019692
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	919-924

Title	Word Embedding Causes Topic Shifting; Exploit Global Context!
DOI	10.1145/3077136.3080733
Type	Conference Proceeding Abstract
Author	Rekabsaz N
Pages	1105-1108

Title	Visual Pool
DOI	10.1145/3077136.3084146
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	1321-1324

Title	Splitting Water
DOI	10.1145/2766462.2767749
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	103-112

Title	Verboseness Fission for BM25 Document Length Normalization
DOI	10.1145/2808194.2809486
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	385-388

Title	A systematic approach to normalization in probabilistic models
DOI	10.1007/s10791-018-9334-1
Type	Journal Article
Author	Lipani A
Journal	Information Retrieval Journal
Pages	565-596

Title	Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models
DOI	10.18653/v1/p17-1157
Type	Conference Proceeding Abstract
Author	Rekabsaz N
Pages	1712-1721

Title	Exploration of a Threshold for Similarity Based on Uncertainty in Word Embedding
DOI	10.1007/978-3-319-56608-5_31
Type	Book Chapter
Author	Rekabsaz N
Publisher	Springer Nature
Pages	396-409

Title	Does Online Evaluation Correspond to Offline Evaluation in Query Auto Completion?
DOI	10.1007/978-3-319-56608-5_70
Type	Book Chapter
Author	Bampoulidis A
Publisher	Springer Nature
Pages	713-719

Title	Fixed-Cost Pooling Strategies Based on IR Evaluation Measures
DOI	10.1007/978-3-319-56608-5_28
Type	Book Chapter
Author	Lipani A
Publisher	Springer Nature
Pages	357-368

Title	Back to the Sketch-Board: Integrating Keyword Search, Semantics, and Information Retrieval
DOI	10.1007/978-3-319-53640-8_5
Type	Book Chapter
Author	Azzopardi J
Publisher	Springer Nature
Pages	49-61

Title	A faceted approach to reachability analysis of graph modelled collections
DOI	10.1007/s13735-017-0145-8
Type	Journal Article
Author	Sabetghadam S
Journal	International Journal of Multimedia Information Retrieval
Pages	157-171

Title	The Curious Incidence of Bias Corrections in the Pool
DOI	10.1007/978-3-319-30671-1_20
Type	Book Chapter
Author	Lipani A
Publisher	Springer Nature
Pages	267-279

Title	Fairness in Information Retrieval
DOI	10.1145/2911451.2911473
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	1171-1171

Title	Assessors Agreement: A Case Study Across Assessor Type, Payment Levels, Query Variations and Relevance Dimensions
DOI	10.1007/978-3-319-44564-9_4
Type	Book Chapter
Author	Palotti J
Publisher	Springer Nature
Pages	40-53

Title	An Initial Analytical Exploration of Retrievability
DOI	10.1145/2808194.2809495
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	329-332

Title	An Information Retrieval Ontology for Information Retrieval Nanopublications
DOI	10.1007/978-3-319-11382-1_5
Type	Book Chapter
Author	Lipani A
Publisher	Springer Nature
Pages	44-49

Title	Domain Specific Search
DOI	10.1007/978-3-319-12511-4_6
Type	Book Chapter
Author	Lupu M
Publisher	Springer Nature
Pages	96-117

Title	A Real-World Framework for Translator as Expert Retrieval
DOI	10.1007/978-3-319-11382-1_14
Type	Book Chapter
Author	Rekabsaz N
Publisher	Springer Nature
Pages	141-152

Title	Extracting Nanopublications from IR Papers
DOI	10.1007/978-3-319-12979-2_5
Type	Book Chapter
Author	Lipani A
Publisher	Springer Nature
Pages	53-62

Title	The Solitude of Relevant Documents in the Pool
DOI	10.1145/2983323.2983891
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	1989-1992

Title	The Impact of Fixed-Cost Pooling Strategies on Test Collection Bias
DOI	10.1145/2970398.2970429
Type	Conference Proceeding Abstract
Author	Lipani A
Pages	105-108

Title	Interactive Exploration of Healthcare Queries
DOI	10.1109/cbmi.2016.7500275
Type	Conference Proceeding Abstract
Author	Bampoulidis A
Pages	1-4

Title	Generalizing Translation Models in the Probabilistic Relevance Framework
DOI	10.1145/2983323.2983833
Type	Conference Proceeding Abstract
Author	Rekabsaz N
Pages	711-720

Title	Report on the Cloud-Based Evaluation Approaches Workshop 2015
DOI	10.1145/2964797.2964804
Type	Journal Article
Author	Müller H
Journal	ACM SIGIR Forum
Pages	38-41

Abstracting Domain-Specific Information Retrieval and Evaluation (ADmIRE)