Knowledge Delta based improvement and continuous evaluation
Bilateral Call: France
Disciplines
Computer Sciences (80%); Mathematics (20%)
Keywords
Information Retrieval, Evaluation, Explainability
Attending industry conferences for information systems (e.g. in the medical, news, and intellectual property domains), it is easy to observe a surge, over the last 2-3 years, in semantic search systems that use artificial intelligence to produce the best results for a variety of work tasks with an underlying search application. End-users of such systems have no means of assessing their value, but have to trust the companies offering them. At the same time, companies developing these search-based applications have no reliable tools to integrate effectiveness evaluation into their testing procedures. The challenge lies in the fact that, while numerous benchmarks are available in the academic community, there is no quantification of the differences between them. Such a benchmark typically consists of a set of documents to be indexed by the search engine (the document collection), a set of queries that simulate user information needs (the query set), and a set of relevance judgements (the qrel set). For a search system to maintain optimal performance, changes in any of these need to be reflected in changes in the system parameters. But while changes in effectiveness and changes in system parameters are typically easy to observe or measure, changes in the benchmark are currently difficult, if not impossible, to measure. Building on the state of the art in representation learning, KoDicare investigates methods to understand changes in benchmarks beyond simple term statistics. Significant changes in the document collection or query set need to be quantified at a semantic level. Using such a quantification, which we denote the Knowledge Delta, we will be able to run ablation studies in which we change, in a controlled environment, units of knowledge and observe differences in the performance of the search system.
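The three benchmark components named above can be sketched as a small data structure, together with a toy Knowledge Delta between two document collections. This is only an illustration of the idea: the field names are hypothetical, and the bag-of-words centroid distance stands in for the semantic, representation-learning-based quantification the project actually investigates.

```python
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class TestCollection:
    """Hypothetical sketch of a benchmark: documents, queries, relevance judgements."""
    documents: dict  # doc_id -> text (the document collection)
    queries: dict    # query_id -> text (the query set)
    qrels: dict      # (query_id, doc_id) -> relevance grade (the qrel set)

def _centroid(texts):
    """Average bag-of-words vector over a set of texts (toy stand-in for embeddings)."""
    total = Counter()
    for t in texts:
        total.update(t.lower().split())
    n = len(list(texts)) or 1
    return {w: c / n for w, c in total.items()}

def knowledge_delta(old: TestCollection, new: TestCollection) -> float:
    """Cosine distance between collection centroids: 0 = identical, 1 = disjoint vocabulary."""
    a = _centroid(list(old.documents.values()))
    b = _centroid(list(new.documents.values()))
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0
```

A controlled ablation study would then replace a unit of knowledge (a subset of documents) in one copy of the collection, measure the resulting delta, and observe how system effectiveness changes with it.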
The ability to do so has a significant impact both on academic research (providing the means for more controlled experiments in information retrieval) and on industry (providing the means to update the search engine if and only if the environment has significantly changed). KoDicare brings together Research Studios Austria Forschungsgesellschaft, the Laboratoire d'Informatique de Grenoble, and Qwant SAS to develop the fundamental theory needed to integrate effectiveness evaluation into future (semantic) search systems.
Evaluating search systems requires setting up an environment: selecting a paradigm, metrics, a dataset, etc. The choice of an environment is rarely motivated objectively, and the impact of its variations (choosing one dataset over another, or altering one) is rarely measured. Such objectivity comes from a quantifiable understanding of the differences between datasets, documents, or test queries. In KoDicare, we generically call such a difference a "knowledge delta". Evaluating several environments, knowing their knowledge deltas, leads to measuring and qualifying "result deltas". Online systems require continuous evaluation with a stable and meaningful environment, which guarantees the reproducibility and explainability of system results. The knowledge and result deltas will be able to support such continuous evaluation and to provide explanations. The theoretical results will be confronted with real cases defined by a French company that deploys a web search engine (Qwant). Scientific and technical challenges: To our knowledge, no framework dedicated to real continuous evaluation of information retrieval systems exists, due to the numerous parameters that must be handled. The deltas proposed by KoDicare are thus a sensible way to tackle this problem. Continuous evaluation is only possible with real cases, which are often difficult to define without the help of web search companies. The strong involvement of Qwant helped the project define usable scenarios and test them on a real-life search engine.
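The notion of a result delta between two evaluation environments can be illustrated with a minimal sketch. The metric choice (precision at k) and the input shapes are assumptions made for the example, not the project's defined methodology:

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k retrieved doc ids that are judged relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def result_delta(run_old, run_new, qrels_old, qrels_new, k=10):
    """Change in mean P@k for one system across two evaluation environments.

    run_*:   query_id -> ranked list of doc ids returned by the system
    qrels_*: query_id -> set of relevant doc ids in that environment
    """
    def mean_p(run, qrels):
        scores = [precision_at_k(r, qrels.get(q, set()), k) for q, r in run.items()]
        return sum(scores) / len(scores) if scores else 0.0
    return mean_p(run_new, qrels_new) - mean_p(run_old, qrels_old)
```

Tracking result deltas alongside knowledge deltas is what would let a continuous-evaluation framework distinguish a degrading system from a shifting environment.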
- Research Studios Austria - 100%
- Mihai Lupu, Research Studios Austria, former principal investigator
- Philippe Mulhem, IMAG - France
- Christophe Servan, Qwant Research - France
Research Output
- 14 Publications
- 7 Datasets & models
2024
Title Overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance; In: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9-12, 2024, Proceedings, Part II DOI 10.1007/978-3-031-71908-0_10 Type Book Chapter Publisher Springer Nature Switzerland -
2024
Title LongEval: Longitudinal Evaluation of Model Performance at CLEF 2024; In: Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part VI DOI 10.1007/978-3-031-56072-9_8 Type Book Chapter Publisher Springer Nature Switzerland -
2024
Title AMATU@SimpleText2024: Are LLMs Any Good for Scientific Leaderboard Extraction? Type Other Author Alaa El-Ebshihy Conference Conference and Labs of the Evaluation Forum (CLEF 2024) Link Publication -
2024
Title Extended overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance Type Other Author Hsuvas Borkakoty Conference Conference and Labs of the Evaluation Forum (CLEF 2024) Link Publication -
2026
Title LongEval at CLEF 2025: Longitudinal Evaluation of IR Systems on Web and Scientific Data; In: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 16th International Conference of the CLEF Association, CLEF 2025, Madrid, Spain, September 9-12, 2025, Proceedings DOI 10.1007/978-3-032-04354-2_20 Type Book Chapter Publisher Springer Nature Switzerland -
2025
Title LongEval at CLEF 2025: Longitudinal Evaluation of IR Model Performance; In: Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V DOI 10.1007/978-3-031-88720-8_58 Type Book Chapter Publisher Springer Nature Switzerland -
2025
Title Benchmark Creation for Narrative Knowledge Delta Extraction Tasks: Can LLMs Help?; In: Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part III DOI 10.1007/978-3-031-88714-7_32 Type Book Chapter Publisher Springer Nature Switzerland -
2025
Title Extended Abstract of LongEval at CLEF 2025: Longitudinal Evaluation of IR Systems on Web and Scientific Data Type Other Author Alaa El-Ebshihy Conference Conference and Labs of the Evaluation Forum (CLEF 2025) Link Publication -
2023
Title LongEval: Longitudinal Evaluation of Model Performance at CLEF 2023; In: Advances in Information Retrieval - 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2-6, 2023, Proceedings, Part III DOI 10.1007/978-3-031-28241-6_58 Type Book Chapter Publisher Springer Nature Switzerland -
2023
Title Predicting Retrieval Performance Changes in Evolving Evaluation Environments; In: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 14th International Conference of the CLEF Association, CLEF 2023, Thessaloniki, Greece, September 18-21, 2023, Proceedings DOI 10.1007/978-3-031-42448-9_3 Type Book Chapter Publisher Springer Nature Switzerland -
2023
Title Extended Overview of the CLEF-2023 LongEval Lab on Longitudinal Evaluation of Model Performance Type Other Author Alkhalifa R Conference Conference and Labs of the Evaluation Forum (CLEF 2023) Link Publication -
2023
Title Towards Result Delta Prediction Based on Knowledge Deltas for Continuous IR Evaluation Type Other Author Alaa El-Ebshihy Conference Proceedings of the workshop QPP++ 2023: Query Performance Prediction and Its Evaluation in New Tasks, co-located with The 45th European Conference on Information Retrieval (ECIR) Link Publication -
2023
Title LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation DOI 10.1145/3539618.3591921 Type Conference Proceeding Abstract Author Deveaud R Pages 3086-3094 -
2023
Title LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation DOI 10.48550/arxiv.2303.03229 Type Preprint Author Deveaud R Link Publication
2025
Title LongEval 2025 Web Retrieval Collection DOI 10.48436/th5h0-g5f51 Type Database/Collection of data Public Access Link Link -
2025
Title LongEval 2025 CORE Retrieval Test Collection DOI 10.48436/v8phe-g8911 Type Database/Collection of data Public Access Link Link -
2025
Title LongEval 2025 CORE Retrieval Train Collection DOI 10.48436/r643n-yc044 Type Database/Collection of data Public Access Link Link -
2024
Title LongEval Train Collection Type Database/Collection of data Public Access Link Link -
2024
Title LongEval 2024 Train Collection DOI 10.48436/y60e9-k9b51 Type Database/Collection of data Public Access Link Link -
2024
Title LongEval 2024 Test Collection DOI 10.48436/xr350-79683 Type Database/Collection of data Public Access Link Link -
2023
Title kodicare_framework Type Computer model/algorithm Public Access Link Link