At CLEF 2017, the PIR-CLEF workshop will provide a pilot Lab task based on a small, preliminary test collection; a pilot evaluation using this collection will be run during the CLEF 2017 campaign to allow other groups to experiment with and provide feedback on our proposed PIR evaluation methodology. The outcomes of this activity will be a shared trial, feedback, and improvements to the experimental methodology for the comparative evaluation of PIR methods, with the intention of offering a fully tuned Lab at CLEF 2018.
The pilot test collection will provide all the traditional components of a laboratory-based evaluation experiment, such as topics and relevance judgements; in addition, it will be accompanied by a set of user-related information for modelling user profiles and introducing them into the evaluation experiment. Specifically, this personal information will consist of:
- user personal information: gender, age range, native language, and occupation.
- search logs: the history of the user's interactions with a search engine.
- the user's documents of interest: provided as raw sources from which topical user preferences can be extracted.
- basic user profiles: bag-of-words representations offering a simple model of the user's topical interests (see the sketch after this list).
- user satisfaction: a satisfaction grade assigned by the user, providing feedback on the ranking of documents.
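To make the role of the bag-of-words profiles concrete, the following Python sketch shows one way such a profile could be derived from a user's documents of interest and used to re-score a retrieved ranking. All function names, the stopword list, and the interpolation weight are our own illustrative choices, not part of the released data format or of any prescribed participant pipeline.

```python
from collections import Counter
import math
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def tokenize(text):
    # Lowercase, keep alphabetic tokens, and drop a small stopword list.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_profile(documents_of_interest):
    # Bag-of-words profile: term -> frequency normalised over all of the
    # user's documents of interest.
    counts = Counter()
    for doc in documents_of_interest:
        counts.update(tokenize(doc))
    total = sum(counts.values()) or 1
    return {term: freq / total for term, freq in counts.items()}

def profile_score(profile, document):
    # Cosine-like overlap between the profile weights and the document's
    # raw term counts.
    doc_counts = Counter(tokenize(document))
    doc_norm = math.sqrt(sum(c * c for c in doc_counts.values())) or 1.0
    prof_norm = math.sqrt(sum(w * w for w in profile.values())) or 1.0
    dot = sum(profile.get(t, 0.0) * c for t, c in doc_counts.items())
    return dot / (doc_norm * prof_norm)

def personalised_rerank(profile, ranked_docs, alpha=0.5):
    # Interpolate the original retrieval score with the profile score;
    # ranked_docs is a list of (doc_id, text, retrieval_score) tuples.
    rescored = [
        (doc_id, (1 - alpha) * score + alpha * profile_score(profile, text))
        for doc_id, text, score in ranked_docs
    ]
    return sorted(rescored, key=lambda x: x[1], reverse=True)
```

Participants are of course free to build richer profiles (e.g. from the search logs or the personal information) in place of this simple term-frequency model.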
We will use ClueWeb12, which contains over 730 million Web pages, as the basic repository from which to extract the above-mentioned user-related information. Recognising that operating a ClueWeb12 service is a significant undertaking, API access to an existing ClueWeb12 search service will be made available to participants.
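Access to the search service is expected to follow the usual query-and-parse pattern sketched below. The endpoint URL, parameter names, and response fields in this example are purely hypothetical placeholders; the actual interface will be specified in the documentation released with the service.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder URL; the real service endpoint will be
# provided to registered participants.
SERVICE_URL = "https://example.org/clueweb12/search"

def search_clueweb(query, num_results=10):
    # Issue a query to the (hypothetical) ClueWeb12 search API and return
    # a list of (document id, score) pairs parsed from a JSON response.
    params = urllib.parse.urlencode({"q": query, "n": num_results})
    with urllib.request.urlopen(f"{SERVICE_URL}?{params}") as response:
        results = json.load(response)
    return [(hit["docid"], hit["score"]) for hit in results["hits"]]
```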
The pilot workshop will also investigate the most appropriate measures for the evaluation of personalised IR systems. Numerous traditional evaluation metrics from the IR literature can also be applied to personalised search, for example Precision (P), Recall (R), Precision at n (P@n), and Mean Average Precision (MAP). In addition, several more user-centred metrics exist, such as (normalised) Discounted Cumulative Gain (nDCG), Relative Relevance (RR), and Half-Life Relevance (HLR). However, we believe there is a need for measures that embed more accurate user models and better describe user behaviour, especially in the context of personalised search. Therefore, as part of planning the task design for CLEF 2018, the PIR-CLEF 2017 workshop will explore the potential need for novel evaluation metrics for PIR, for example to allow direct comparison of user models or to flexibly embed user dynamics.
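As a reference point for the metrics mentioned above, the following sketch computes P@n, Average Precision, and nDCG for a single ranked list with graded relevance judgements. It uses the standard log2 rank discount and a linear gain for DCG; it is shown only as an illustration of the measures, not as the official scoring code of the Lab.

```python
import math

def precision_at_n(relevances, n):
    # relevances: graded judgements for the ranked list, in rank order.
    # A document counts as relevant if its grade is greater than 0.
    return sum(1 for r in relevances[:n] if r > 0) / n

def average_precision(relevances, total_relevant):
    # Sum of precision values at the ranks of the relevant retrieved
    # documents, divided by the total number of relevant documents
    # judged for the topic.
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0

def dcg(relevances):
    # Discounted Cumulative Gain with a log2 rank discount and linear gain.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # Normalise by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Example: graded judgements (0-3) for a ranked list of ten documents,
# assuming four relevant documents exist for the topic.
run = [3, 0, 2, 0, 1, 0, 0, 2, 0, 0]
print(precision_at_n(run, 5), average_precision(run, total_relevant=4), ndcg(run))
```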
The 2017 edition of the PIR-CLEF Lab will consist of:
- A pilot PIR task enabling practical exploration of our proposed PIR evaluation methodology, with the aim of offering a fully tuned Lab at CLEF 2018.
- A one-day workshop that will include a report on the PIR-CLEF pilot task, short participant presentations, invited presentations on the themes of benchmarking, personalisation and adaptation, and a discussion of potential PIR tasks for CLEF 2018.