Knowledge Delta based improvement and continuous evaluation
Bilateral Call: France
Disciplines
Computer Sciences (80%); Mathematics (20%)
Keywords
Information Retrieval, Evaluation, Explainability
Attending industry conferences for information systems (e.g. in the medical, news, and intellectual property domains), it is easy to observe a surge, over the last 2-3 years, in semantic search systems that use artificial intelligence to produce the best results for a variety of work tasks with an underlying search application. End-users of such systems have no means of assessing their value, but have to trust the companies offering them. At the same time, companies developing these search-based applications have no reliable tools to integrate effectiveness evaluation into their testing procedures. The challenge lies in the fact that, while numerous benchmarks are available in the academic community, there is no quantification of the differences between them. Such a benchmark typically consists of a set of documents to be indexed by the search engine (the document collection), a set of queries that simulate user information needs (the query set), and a set of relevance judgements (the qrel set). For a search system to maintain optimal performance, changes in any of these need to be reflected in changes in the system parameters. But while changes in effectiveness and changes in system parameters are typically easy to observe or measure, changes in the benchmark are currently difficult, if not impossible, to measure. Building on the state of the art in representation learning, KoDicare investigates methods to understand changes in benchmarks beyond simple term statistics. Significant changes in the document collection or query set need to be quantified at a semantic level. Using such a quantification, which we denote the Knowledge Delta, we will be able to run ablation studies in which we change, in a controlled environment, units of knowledge and observe differences in the performance of the search system.
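The three benchmark components named above can be sketched as a small data structure, together with a toy Knowledge Delta between two document collections. This is only an illustration of the idea: the field names are hypothetical, and the bag-of-words centroid distance stands in for the semantic, representation-learning-based quantification the project actually investigates.

```python
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class TestCollection:
    """Hypothetical sketch of a benchmark: documents, queries, relevance judgements."""
    documents: dict  # doc_id -> text (the document collection)
    queries: dict    # query_id -> text (the query set)
    qrels: dict      # (query_id, doc_id) -> relevance grade (the qrel set)

def _centroid(texts):
    """Average bag-of-words vector over a set of texts (toy stand-in for embeddings)."""
    total = Counter()
    for t in texts:
        total.update(t.lower().split())
    n = len(list(texts)) or 1
    return {w: c / n for w, c in total.items()}

def knowledge_delta(old: TestCollection, new: TestCollection) -> float:
    """Cosine distance between collection centroids: 0 = identical, 1 = disjoint vocabulary."""
    a = _centroid(list(old.documents.values()))
    b = _centroid(list(new.documents.values()))
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0
```

A controlled ablation study would then replace a unit of knowledge (a subset of documents) in one copy of the collection, measure the resulting delta, and observe how system effectiveness changes with it.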
The ability to do so has a significant impact both on academic research (providing the means for more controlled experiments in information retrieval) and on industry (providing the means to update the search engine if and only if the environment has significantly changed). KoDicare brings together Research Studios Austria Forschungsgesellschaft, the Laboratoire d'Informatique de Grenoble, and Qwant SAS to develop the fundamental theory needed to integrate effectiveness evaluation into future (semantic) search systems.
Evaluating search systems requires setting up an environment: selecting a paradigm, metrics, a dataset, etc. The choice of an environment is rarely motivated objectively, and the impact of its variations (choosing one dataset over another, or altering one) is rarely measured. Such objectivity comes from a quantifiable understanding of the differences between datasets, documents, or test queries. In KoDicare, we generically call such a difference a "knowledge delta". Evaluating several environments, knowing their knowledge deltas, leads to measuring and qualifying "result deltas". Online systems require continuous evaluation with a stable and meaningful environment, which guarantees the reproducibility and explainability of system results. The knowledge and result deltas will be able to support such continuous evaluation and to provide explanations. The theoretical results will be confronted with real cases defined by a French company that deploys a web search engine (Qwant). Scientific and technical challenges: To our knowledge, no framework dedicated to real continuous evaluation of information retrieval systems exists, due to the numerous parameters that must be handled. The deltas proposed by KoDicare are thus a sensible way to tackle this problem. Continuous evaluation is only possible with real cases, which are often difficult to define without the help of web search companies. The strong involvement of Qwant helped the project define usable scenarios and test them on a real-life search engine.
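The notion of a result delta between two evaluation environments can be illustrated with a minimal sketch. The metric choice (precision at k) and the input shapes are assumptions made for the example, not the project's defined methodology:

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k retrieved doc ids that are judged relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def result_delta(run_old, run_new, qrels_old, qrels_new, k=10):
    """Change in mean P@k for one system across two evaluation environments.

    run_*:   query_id -> ranked list of doc ids returned by the system
    qrels_*: query_id -> set of relevant doc ids in that environment
    """
    def mean_p(run, qrels):
        scores = [precision_at_k(r, qrels.get(q, set()), k) for q, r in run.items()]
        return sum(scores) / len(scores) if scores else 0.0
    return mean_p(run_new, qrels_new) - mean_p(run_old, qrels_old)
```

Tracking result deltas alongside knowledge deltas is what would let a continuous-evaluation framework distinguish a degrading system from a shifting environment.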
- Research Studios Austria - 100%
- Mihai Lupu, Research Studios Austria, former principal investigator
- Philippe Mulhem, IMAG - France
- Christophe Servan, Qwant Research - France
Research Output
- 14 Publications
- 7 Datasets & models
2024
Title Overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance; In: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9-12, 2024, Proceedings, Part II DOI 10.1007/978-3-031-71908-0_10 Type Book Chapter Publisher Springer Nature Switzerland -
2024
Title LongEval: Longitudinal Evaluation of Model Performance at CLEF 2024; In: Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part VI DOI 10.1007/978-3-031-56072-9_8 Type Book Chapter Publisher Springer Nature Switzerland -
2024
Title AMATU@SimpleText2024: Are LLMs Any Good for Scientific Leaderboard Extraction? Type Other Author Alaa El-Ebshihy Conference Conference and Labs of the Evaluation Forum (CLEF 2024) Link Publication -
2024
Title Extended overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance Type Other Author Hsuvas Borkakoty Conference Conference and Labs of the Evaluation Forum (CLEF 2024) Link Publication -
2026
Title LongEval at CLEF 2025: Longitudinal Evaluation of IR Systems on Web and Scientific Data; In: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 16th International Conference of the CLEF Association, CLEF 2025, Madrid, Spain, September 9-12, 2025, Proceedings DOI 10.1007/978-3-032-04354-2_20 Type Book Chapter Publisher Springer Nature Switzerland -
2025
Title LongEval at CLEF 2025: Longitudinal Evaluation of IR Model Performance; In: Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V DOI 10.1007/978-3-031-88720-8_58 Type Book Chapter Publisher Springer Nature Switzerland -
2025
Title Benchmark Creation for Narrative Knowledge Delta Extraction Tasks: Can LLMs Help?; In: Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part III DOI 10.1007/978-3-031-88714-7_32 Type Book Chapter Publisher Springer Nature Switzerland -
2025
Title Extended Abstract of LongEval at CLEF 2025: Longitudinal Evaluation of IR Systems on Web and Scientific Data Type Other Author Alaa El-Ebshihy Conference Conference and Labs of the Evaluation Forum (CLEF 2025) Link Publication -
2023
Title LongEval: Longitudinal Evaluation of Model Performance at CLEF 2023; In: Advances in Information Retrieval - 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2-6, 2023, Proceedings, Part III DOI 10.1007/978-3-031-28241-6_58 Type Book Chapter Publisher Springer Nature Switzerland -
2023
Title Predicting Retrieval Performance Changes in Evolving Evaluation Environments; In: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 14th International Conference of the CLEF Association, CLEF 2023, Thessaloniki, Greece, September 18-21, 2023, Proceedings DOI 10.1007/978-3-031-42448-9_3 Type Book Chapter Publisher Springer Nature Switzerland -
2023
Title Extended Overview of the CLEF-2023 LongEval Lab on Longitudinal Evaluation of Model Performance Type Other Author Alkhalifa R Conference Conference and Labs of the Evaluation Forum (CLEF 2023) Link Publication -
2023
Title Towards Result Delta Prediction Based on Knowledge Deltas for Continuous IR Evaluation Type Other Author Alaa El-Ebshihy Conference Proceedings of the workshop QPP++ 2023: Query Performance Prediction and Its Evaluation in New Tasks, co-located with The 45th European Conference on Information Retrieval (ECIR) Link Publication -
2023
Title LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation DOI 10.1145/3539618.3591921 Type Conference Proceeding Abstract Author Deveaud R Pages 3086-3094 -
2023
Title LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation DOI 10.48550/arxiv.2303.03229 Type Preprint Author Deveaud R Link Publication
2025
Title LongEval 2025 Web Retrieval Collection DOI 10.48436/th5h0-g5f51 Type Database/Collection of data Public Access Link Link -
2025
Title LongEval 2025 CORE Retrieval Test Collection DOI 10.48436/v8phe-g8911 Type Database/Collection of data Public Access Link Link -
2025
Title LongEval 2025 CORE Retrieval Train Collection DOI 10.48436/r643n-yc044 Type Database/Collection of data Public Access Link Link -
2024
Title LongEval Train Collection Type Database/Collection of data Public Access Link Link -
2024
Title LongEval 2024 Train Collection DOI 10.48436/y60e9-k9b51 Type Database/Collection of data Public Access Link Link -
2024
Title LongEval 2024 Test Collection DOI 10.48436/xr350-79683 Type Database/Collection of data Public Access Link Link -
2023
Title kodicare_framework Type Computer model/algorithm Public Access Link Link