Evaluation of Retrieval Systems (2024-2025)
Document Details
University: University of Ghardaia
Year: 2024
Author: Hamani Wissal
Summary
This document is a presentation on the evaluation of information retrieval systems. It covers the definition and purpose of IR evaluation, core metrics such as precision, recall, and the F1-score, a worked example, and the role of the Text Retrieval Conference (TREC).
Full Transcript
University of Ghardaia
Faculty of Science and Technology
Department of Mathematics and Computer Science
Module: Information Retrieval

Evaluation of Retrieval Systems
Prepared by: Hamani Wissal
Supervisor: M. Ben Guenane
Academic Year: 2024-2025

Table of Contents
1. Introduction
   - Definition
2. Evaluation Metrics
   - Overview
   - Purpose of Metrics in IR Systems
   - Recall and Precision
   - Combined Measures
3. Example
4. The Text Retrieval Conference
   - Key Features of TREC
5. Assessments
6. Implementation
7. Conclusion

Definition
The evaluation of information retrieval (IR) systems is a critical area of research aimed at determining how effectively these systems retrieve relevant documents in response to user queries. With the increasing complexity of text data and user needs, it is essential to establish robust evaluation metrics and frameworks to ensure that retrieval systems meet expectations in various domains, from search engines to recommendation systems.

IR System Overview
[Diagram: Document Collection -> Document Normalisation -> Indexer -> Indexes; Query -> Ranking/Matching Module (over the Indexes) -> Set of relevant documents -> Evaluation]

Problem Statement
How can we effectively evaluate retrieval systems to ensure they meet user needs in retrieving relevant and accurate information from vast datasets?

Why Metrics Matter
Metrics evaluate how well an IR system retrieves relevant information for users and align system performance with user needs.

Key Metrics
Recall, Precision, and combined measures (like the F1-Score).

Recall
Definition: The proportion of relevant documents retrieved out of all relevant documents available.

Precision
Definition: The proportion of retrieved documents that are relevant.
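The slides do not include code at this point, but the two definitions translate directly into set arithmetic. A minimal Python sketch, added to this transcript for illustration (the function names and document IDs are ours, not the slides'):

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical document IDs, just to exercise the functions.
retrieved = {"d2", "d5", "d7"}
relevant = {"d2", "d3", "d5", "d9"}
print(precision(retrieved, relevant))  # 2/3 ~ 0.67
print(recall(retrieved, relevant))     # 2/4 = 0.5
```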
Precision and Recall Visualization
[Venn diagram: within All Documents, the Retrieved Documents and Relevant Documents sets overlap; their intersection is the Retrieved Relevant Documents.]
Recall = (# of retrieved relevant documents) / (# of relevant documents)
Precision = (# of retrieved relevant documents) / (# of retrieved documents)

Combined Measures
In Information Retrieval (IR) systems, combined measures evaluate system performance by balancing multiple factors such as relevance, precision, recall, and user satisfaction. These measures provide a more comprehensive understanding of the system's effectiveness. The most common is the F1-Score:

F1-Score
A harmonic mean of Precision and Recall, providing a single measure that balances both concerns.
Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)

Precision, Recall, and F1-Score Example
Query: "machine learning algorithms"
Document Collection:
D1: Machine learning algorithms are widely used in data science and AI.
D2: The study of computer algorithms involves graph theory.
D3: Machine learning models require labeled datasets.
D4: Programming languages like Python are popular in software engineering.
D5: AI applications include NLP and robotics.

Retrieved Documents: {D1, D3, D5}
Relevant Documents: {D1, D3}
Metrics:
Precision = (relevant retrieved) / (total retrieved) = 2/3 ≈ 0.67
Recall = (relevant retrieved) / (total relevant) = 2/2 = 1.0
F1-Score = 2 × (Precision × Recall) / (Precision + Recall) = 0.8
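As a sanity check on these numbers, the same computation can be done with scikit-learn by encoding relevance and retrieval as binary vectors over D1..D5. This is a sketch added to the transcript, assuming scikit-learn is available; it is not from the original slides:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# One entry per document D1..D5: 1 = relevant / retrieved, 0 = not.
y_true = [1, 0, 1, 0, 0]  # relevant documents: D1, D3
y_pred = [1, 0, 1, 0, 1]  # retrieved documents: D1, D3, D5

print(precision_score(y_true, y_pred))  # 0.666... ~ 0.67
print(recall_score(y_true, y_pred))     # 1.0
print(f1_score(y_true, y_pred))         # 0.8
```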
TREC
TREC, organized by NIST since 1992, is a pivotal workshop series advancing research and benchmarking in information retrieval and search technologies.

Purpose
TREC provides a structured platform for researchers to test and compare their information retrieval systems using large-scale, standardized datasets.

Evaluation Tracks
Each year, TREC organizes multiple tracks that address specific retrieval challenges, such as ad hoc search, question answering, legal or biomedical document retrieval, and conversational systems. These tracks allow participants to focus on diverse and evolving IR problems.

Test Collections
TREC offers extensive test collections, including large corpora of documents, standardized queries, and relevance judgments (often referred to as "ground truth") provided by human assessors.

Metrics
TREC popularized evaluation metrics like Precision, Recall, Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG), which are widely used to measure the effectiveness of retrieval systems.

Evaluation Metrics
Metrics are used to quantify the performance of an IR system:
- Precision
- Recall
- F1-Score
- Mean Average Precision (MAP)
- Normalized Discounted Cumulative Gain (nDCG)
- ROC and PR Curves
(A small nDCG sketch appears after the implementation figures below.)

Test Collections
- Corpus: A collection of documents.
- Queries: A set of user questions or search terms.
- Relevance Judgments: Labels indicating which documents are relevant for each query.
Example: TREC (Text REtrieval Conference) datasets.

Relevance Judgment
- Binary Relevance: Documents are classified as either relevant or not.
- Graded Relevance: Documents are rated on a scale (e.g., highly relevant, somewhat relevant).
Relevance is subjective, varying with user needs.

User-Centered Evaluation
Measures system usability and effectiveness for end-users. Factors include time taken to find documents, user satisfaction, and query reformulation behavior.

Figure: Simple Information Retrieval using Keyword Matching in Python [image not reproduced in this transcript]

Figure: Precision, Recall, and F1-Score Evaluation [image not reproduced in this transcript]
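The two implementation figures above are images of Python code that did not survive extraction. A minimal sketch in the same spirit, written for this transcript under the assumption that the slides demonstrated naive keyword matching plus set-based evaluation:

```python
# Naive keyword-matching retrieval over the example collection,
# followed by precision/recall/F1 evaluation. Illustrative only; the
# original figures' exact code is not reproduced in this transcript.

documents = {
    "D1": "Machine learning algorithms are widely used in data science and AI.",
    "D2": "The study of computer algorithms involves graph theory.",
    "D3": "Machine learning models require labeled datasets.",
    "D4": "Programming languages like Python are popular in software engineering.",
    "D5": "AI applications include NLP and robotics.",
}

def retrieve(query: str, docs: dict) -> set:
    """Return IDs of documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return {doc_id for doc_id, text in docs.items()
            if terms & set(text.lower().split())}

def evaluate(retrieved: set, relevant: set) -> tuple:
    """Compute (precision, recall, F1) from sets of document IDs."""
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Note: this naive matcher retrieves {D1, D2, D3}, since D2 shares the
# word "algorithms" (unlike the slide example's {D1, D3, D5}), yet the
# scores come out the same: precision 2/3, recall 1.0, F1 0.8.
retrieved = retrieve("machine learning algorithms", documents)
relevant = {"D1", "D3"}
print(sorted(retrieved))
print(evaluate(retrieved, relevant))
```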
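The Metrics slides mention MAP and nDCG without showing how they are computed. For reference, here is a small sketch of nDCG added to this transcript; the graded relevance values are hypothetical, not from the slides:

```python
import math

def dcg(relevances: list) -> float:
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(rel / math.log2(rank + 2)  # ranks are 0-based here
               for rank, rel in enumerate(relevances))

def ndcg(relevances: list) -> float:
    """DCG normalized by the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Hypothetical graded relevance of a ranked result list (highest rank first).
print(ndcg([3, 0, 2]))  # < 1.0: a relevant document is ranked too low
print(ndcg([3, 2, 0]))  # 1.0: ideal ordering
```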
Figure: Visualisation [image not reproduced in this transcript]

Figure: Descriptive of sys [image not reproduced in this transcript]

Figure: Descriptive of F1-score [image not reproduced in this transcript]

Conclusion
Evaluating retrieval systems involves assessing precision, recall, and user satisfaction; a balance between these metrics is essential for effective performance. Overall, comprehensive evaluation helps improve the accuracy and efficiency of information retrieval.

References
- https://web.fe.up.pt/~ssn/wiki/_media/teach/dapi/202021/lectures/dapi2021-web-ir.pdf
- https://www.cl.cam.ac.uk/teaching/1415/InfoRtrv/lecture5.pdf

Thanks!