Abstracting Domain-Specific Information Retrieval and Evaluation (ADmIRE)
Disciplines
Computer Sciences (100%)
Keywords
- Information Retrieval
- Domain-specific Information Retrieval
- Information Retrieval Evaluation
Information Retrieval (IR) on the World Wide Web functions very effectively and efficiently. These search tools are specifically designed as multi-purpose tools, applicable in as wide an array of situations as possible. Nevertheless, not all information is equal. There are areas for which these tools are conceived too broadly to be useful: health or biomedical information, intellectual property information, social science publications, blogs, press photographs, etc. A search in one of these domains is called a domain-specific search. Such a search is specific in terms of the collection of documents indexed, the search refinements arising from the domain characteristics, the specificity of domain coverage, the types of multimodal data (e.g. images, chemical formulae) present in the documents, and the end users and their tasks. Even though many domains have similar characteristics and challenges, there is as yet no general framework for developing domain-specific search solutions, and no way of characterising a domain to allow the best approach or tools to be chosen.

ADmIRE will contribute to two areas of IR: 1. domain-specific IR methodologies and 2. IR evaluation. For domain-specific IR, we will develop a framework consisting of a classification scheme for domain-specific problems and a protocol giving the optimal evidence-based approach to solving a problem based on its characterisation. The latter will be obtained by conducting systematic reviews of IR papers. In fact, the creation of systematic reviews, and hence of guidelines for domain-specific IR, is itself an example of a (rather difficult) domain-specific search problem. We will use this problem as the central scenario for ADmIRE, carrying out the research and development needed to create semi-automated tools with which IR researchers can create systematic reviews effectively. This scenario therefore simultaneously serves as a model of a domain-specific problem and as a means to find the evidence-based solution.

For IR evaluation, ADmIRE will develop guidelines for future IR evaluation campaigns and publications to allow easier and more effective use of evaluation results to guide decisions in domain-specific search system design, and will develop a framework for component-based evaluation based on the workflow paradigm. The framework will bring IR evaluation closer to the e-Science paradigm, allowing better sharing of experimental setups between researchers and more flexibility in performing IR experiments with multiple configurations of components (a toy sketch of this component-swapping idea follows below). IR (and potentially other empirically-based areas of computer science) has the advantage of already starting from a position where experimental studies are the cornerstone of the field. ADmIRE will bring together all of this experimental information and organise it for use in a specific domain, a challenging yet absolutely necessary task for the development of the field of IR.
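To make the component-based, workflow-style evaluation idea concrete, here is a minimal Python sketch in which each pipeline stage (tokeniser, stemmer) is a swappable component and a configuration grid enumerates experiments. All component names and the toy ranker are hypothetical illustrations, not ADmIRE's actual framework.

```python
from itertools import product
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
Stemmer = Callable[[List[str]], List[str]]

def whitespace_tokenizer(text: str) -> List[str]:
    # Lowercase and split on whitespace; stands in for a real tokeniser.
    return text.lower().split()

def noop_stemmer(tokens: List[str]) -> List[str]:
    return tokens

def s_stemmer(tokens: List[str]) -> List[str]:
    # Crude plural stripper, standing in for a real stemmer component.
    return [t[:-1] if t.endswith("s") else t for t in tokens]

def run_experiment(tokenize: Tokenizer, stem: Stemmer,
                   docs: List[str], query: str) -> List[int]:
    """One evaluation run under one configuration: rank docs by term overlap."""
    q_terms = set(stem(tokenize(query)))
    scores = [len(q_terms & set(stem(tokenize(d)))) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

docs = ["cats chase mice", "a mouse sleeps", "dogs chase cats"]
# The configuration grid *is* the workflow: each combination is one
# experiment, and results can be shared and compared per component.
for tokenize, stem in product([whitespace_tokenizer], [noop_stemmer, s_stemmer]):
    print(stem.__name__, "->", run_experiment(tokenize, stem, docs, "cat chases"))
```

Swapping in a different stemmer changes only one component of the pipeline while the rest of the experimental setup stays fixed, which is exactly the kind of sharing and reconfiguration the workflow paradigm aims at.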
The list of search results returned by search engines, and the order in which they are presented, can have a huge impact on the outcomes of people's decisions. A doctor searching for new treatments for a disease may not find a treatment if the document describing it is not ranked highly enough in the result list, so a patient may not be given the optimal treatment. An examiner in a patent office who misses some prior art during a patent search and hence grants a patent may trigger legal proceedings between two companies. For this reason, it is important that search engines, particularly those used in such professional environments, are free from bias. In the ADmIRE project, we identified various search engine biases and developed methods to overcome them.

Search engines use various heuristic equations for matching documents to queries. One potential bias is the findability bias, in which a document has particular characteristics that make the likelihood of it being returned for most queries rather small. Findability bias can be measured experimentally by querying a collection with a huge set of queries and counting how often each document is returned (a toy sketch of this measurement follows below). We proposed an analytical approach to measuring findability which reduces the extensive experimentation needed.

A second potential bias concerns document length. If a search engine counts how often a query word is present in a document, then longer documents are likely to contain the word more often. By recognising that a document can be long either because it covers many topics (e.g. a collection of short stories) or because it discusses a single topic in a long-winded way (e.g. some textbooks), we were able to propose a method to reduce document length bias in search engine results (the length factorisation is illustrated in a second sketch below).

The final bias addressed is in evaluating search engines. To know how well a search engine functions, it is tested with a number of queries, for which people judge which documents are relevant or not. Knowing which documents are relevant and retrieved by the search engine means that measures of the effectiveness of a search engine can be made. However, with typically millions of documents searched, it is infeasible to manually judge the relevance of every document for every query, and without complete judgements the effectiveness measures are biased in some situations. To reduce this bias, we developed approaches to automatically select a collection of documents that should be manually judged in order to reduce the bias as much as possible (see the pooling sketch below). Furthermore, we developed approaches that reduce the bias in effectiveness measurements given an existing test collection (a collection of queries, documents, and manual relevance judgements), allowing the hundreds of existing test collections to be used to give less biased measurements.
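As an illustration of the experimental findability measurement described above, the following Python sketch issues a set of queries against a toy scorer and counts how often each document appears in the top-c results; the Gini coefficient over those counts then summarises how unequally findable the documents are. The scorer, corpus, queries, and cutoff model are illustrative assumptions, not the project's actual setup.

```python
def score(doc: str, query: str) -> int:
    # Toy matching heuristic: total occurrences of query terms in the doc.
    return sum(doc.split().count(term) for term in query.split())

def retrievability(docs, queries, c=2):
    """Count, per document, how often it is retrieved within rank c."""
    r = {i: 0 for i in range(len(docs))}
    for q in queries:
        ranking = sorted(range(len(docs)), key=lambda i: -score(docs[i], q))
        for i in ranking[:c]:
            if score(docs[i], q) > 0:  # cutoff model: +1 if matched within rank c
                r[i] += 1
    return r

def gini(values):
    # Standard Gini coefficient: higher = more unequal findability = more bias.
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    return sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs)) / (n * total)

docs = ["apple pie recipe", "apple apple apple", "quiet obscure memoir"]
queries = ["apple", "pie", "recipe", "memoir"]
r = retrievability(docs, queries)
print(r, "Gini:", gini(r.values()))
```

The analytical approach mentioned in the report replaces this brute-force loop over a huge query set with a direct analysis of the scoring function, which is what makes the measurement tractable at collection scale.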
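The length-bias idea rests on a simple factorisation: a document's length is the product of its scope (number of distinct terms, roughly "how many topics") and its verbosity (average repetitions per term, roughly "how long-winded"). The sketch below computes both factors next to the standard BM25 length normalisation, which sees only raw length; how the two factors re-enter the published ranking formula is not shown here, so treat this as a simplified assumption rather than the paper's exact model.

```python
from collections import Counter

def bm25_norm(dl, avgdl, b=0.75):
    # Standard BM25 length normalisation: depends only on raw length |d|.
    return 1 - b + b * dl / avgdl

def length_factors(doc_terms):
    counts = Counter(doc_terms)
    scope = len(counts)                        # distinct terms: "many topics"
    verbosity = sum(counts.values()) / scope   # avg repetitions: "long-winded"
    return scope, verbosity                    # scope * verbosity == |d|

broad = "apples pears plums figs dates kiwis mangos limes".split()
verbose = "the cat sat on the mat the cat".split()
for doc in (broad, verbose):
    s, v = length_factors(doc)
    # Both docs have |d| = 8, so plain BM25 treats them identically,
    # even though only the second one is long-winded.
    print(len(doc), bm25_norm(len(doc), avgdl=8.0), "scope:", s, "verbosity:", v)
```

Separating the two factors lets a ranking function penalise verbosity without also penalising documents that are long simply because they cover many topics.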
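Finally, the evaluation bias stems from pooling: only documents in the pool, i.e. the union of each participating system's top-k results, receive manual relevance judgements, and everything outside the pool is treated as non-relevant. A minimal sketch of depth-k pooling, with illustrative runs:

```python
def depth_k_pool(runs, k=2):
    """Union of the top-k documents from each system's ranked run."""
    pool = set()
    for run in runs:
        pool.update(run[:k])
    return pool

runs = [
    ["d1", "d2", "d3", "d4"],   # system A's ranking for one query
    ["d2", "d5", "d1", "d6"],   # system B's ranking for the same query
]
print(sorted(depth_k_pool(runs, k=2)))  # only these get judged: ['d1', 'd2', 'd5']
# A later system that ranks d7 first gets no credit even if d7 is relevant.
# Fixed-budget pooling strategies instead choose *which* documents to judge
# so as to minimise this bias under a fixed number of judgements.
```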
- Technische Universität Wien - 100%
Research Output
- 266 Citations
- 27 Publications
- 2016: Rekabsaz N, "Generalizing Translation Models in the Probabilistic Relevance Framework". Conference proceeding, pp. 711-720. DOI: 10.1145/2983323.2983833
- 2016: Palotti J, "Assessors Agreement: A Case Study Across Assessor Type, Payment Levels, Query Variations and Relevance Dimensions". Book chapter, Springer Nature, pp. 40-53. DOI: 10.1007/978-3-319-44564-9_4
- 2015: Hopfgartner F, "Report on the Evaluation-as-a-Service (EaaS) Expert Workshop". Journal article, ACM SIGIR Forum, pp. 57-65. DOI: 10.1145/2795403.2795416
- 2017: Sabetghadam S, "A faceted approach to reachability analysis of graph modelled collections". Journal article, International Journal of Multimedia Information Retrieval, pp. 157-171. DOI: 10.1007/s13735-017-0145-8
- 2017: Rekabsaz N, "Word Embedding Causes Topic Shifting; Exploit Global Context!". Conference proceeding, pp. 1105-1108. DOI: 10.1145/3077136.3080733
- 2017: Lipani A, "Visual Pool". Conference proceeding, pp. 1321-1324. DOI: 10.1145/3077136.3084146
- 2017: Rekabsaz N, "Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models". Conference proceeding, pp. 1712-1721. DOI: 10.18653/v1/p17-1157
- 2017: Azzopardi J, "Back to the Sketch-Board: Integrating Keyword Search, Semantics, and Information Retrieval". Book chapter, Springer Nature, pp. 49-61. DOI: 10.1007/978-3-319-53640-8_5
- 2017: Lipani A, "Fixed budget pooling strategies based on fusion methods". Conference proceeding, pp. 919-924. DOI: 10.1145/3019612.3019692
- 2017: Lipani A, "Fixed-Cost Pooling Strategies Based on IR Evaluation Measures". Book chapter, Springer Nature, pp. 357-368. DOI: 10.1007/978-3-319-56608-5_28
- 2017: Bampoulidis A, "Does Online Evaluation Correspond to Offline Evaluation in Query Auto Completion?". Book chapter, Springer Nature, pp. 713-719. DOI: 10.1007/978-3-319-56608-5_70
- 2017: Rekabsaz N, "Exploration of a Threshold for Similarity Based on Uncertainty in Word Embedding". Book chapter, Springer Nature, pp. 396-409. DOI: 10.1007/978-3-319-56608-5_31
- 2016: Bampoulidis A, "Interactive Exploration of Healthcare Queries". Conference proceeding, pp. 1-4. DOI: 10.1109/cbmi.2016.7500275
- 2016: Lipani A, "The Solitude of Relevant Documents in the Pool". Conference proceeding, pp. 1989-1992. DOI: 10.1145/2983323.2983891
- 2014: Lupu M, "Domain Specific Search". Book chapter, Springer Nature, pp. 96-117. DOI: 10.1007/978-3-319-12511-4_6
- 2014: Lipani A, "An Information Retrieval Ontology for Information Retrieval Nanopublications". Book chapter, Springer Nature, pp. 44-49. DOI: 10.1007/978-3-319-11382-1_5
- 2014: Rekabsaz N, "A Real-World Framework for Translator as Expert Retrieval". Book chapter, Springer Nature, pp. 141-152. DOI: 10.1007/978-3-319-11382-1_14
- 2014: Lipani A, "Extracting Nanopublications from IR Papers". Book chapter, Springer Nature, pp. 53-62. DOI: 10.1007/978-3-319-12979-2_5
- 2016: Lipani A, "Fairness in Information Retrieval". Conference proceeding, pp. 1171-1171. DOI: 10.1145/2911451.2911473
- 2016: Lipani A, "The Curious Incidence of Bias Corrections in the Pool". Book chapter, Springer Nature, pp. 267-279. DOI: 10.1007/978-3-319-30671-1_20
- 2016: Lipani A, "The Impact of Fixed-Cost Pooling Strategies on Test Collection Bias". Conference proceeding, pp. 105-108. DOI: 10.1145/2970398.2970429
- 2016: Müller H, "Report on the Cloud-Based Evaluation Approaches Workshop 2015". Journal article, ACM SIGIR Forum, pp. 38-41. DOI: 10.1145/2964797.2964804
- 2015: Lipani A, "An Initial Analytical Exploration of Retrievability". Conference proceeding, pp. 329-332. DOI: 10.1145/2808194.2809495
- 2015: Piroi F, "DASyR(IR) - Document Analysis System for Systematic Reviews (in Information Retrieval)". Conference proceeding, pp. 591-595. DOI: 10.1109/icdar.2015.7333830
- 2015: Lipani A, "Splitting Water". Conference proceeding, pp. 103-112. DOI: 10.1145/2766462.2767749
- 2015: Lipani A, "Verboseness Fission for BM25 Document Length Normalization". Conference proceeding, pp. 385-388. DOI: 10.1145/2808194.2809486
- 2018: Lipani A, "A systematic approach to normalization in probabilistic models". Journal article, Information Retrieval Journal, pp. 565-596. DOI: 10.1007/s10791-018-9334-1