8D Project Overview on AI Chatbot Evaluation

Study Notes

8D Project Overview

Evaluate AI chatbot responses to user prompts
Rate the response on 8 criteria:
- Accuracy
- Citation Correctness
- Instruction Following
- Refusal
- Grammar/Presentation
- Relevance
- Tone/Style
- Comprehensiveness
Focus on whether the response addresses linked sources or social media posts in the conversation history
Tasks may not have reference links or social media posts, but check each chat history for them
Higher quality responses effectively address and/or incorporate information from external sources
Quality is of the utmost importance
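
As a study aid, the eight criteria above can be kept as a simple checklist so that none is skipped during a review. The sketch below is only an illustration: the Evaluation class, its method names, and the free-text note format are hypothetical and are not part of the project tooling or the task interface.

```python
from dataclasses import dataclass, field

# The eight criteria named in the notes above.
CRITERIA = (
    "Accuracy",
    "Citation Correctness",
    "Instruction Following",
    "Refusal",
    "Grammar/Presentation",
    "Relevance",
    "Tone/Style",
    "Comprehensiveness",
)


@dataclass
class Evaluation:
    """Hypothetical record of one reviewer's notes for a single response."""

    notes: dict = field(default_factory=dict)  # criterion -> free-text note

    def record(self, criterion: str, note: str) -> None:
        # Reject anything that is not one of the eight listed criteria.
        if criterion not in CRITERIA:
            raise ValueError(f"Unknown criterion: {criterion}")
        self.notes[criterion] = note

    def missing(self) -> list:
        """Return the criteria not yet addressed, in rubric order."""
        return [c for c in CRITERIA if c not in self.notes]
```

Calling missing() before submitting a review lists any criterion that still has no note.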

Important Documents and Resources

SRT (Customer Platform) Interface Preview: Familiarize yourself with the different parts.
Key SRT Terms
Remotasks/Outlier Interface Preview: Familiarize yourself with where to input information into the task interface.
Key Remotasks/Outlier Terms
SRT Login Instructions
Task Workflow: You will evaluate a response to a prompt, with the conversation history given as context, on the Remotasks/Outlier and SRT platforms.

Benchmark Exam

Complete benchmark training tasks

How to Complete Specific Tasks

Facebook Feed Context: These tasks show up fairly often on the project. In them, the latest model response must be relevant to the most recent Facebook Feed Context.

Example Task of Prompt and Response with Citations

- The model can cite documents.
- Example document ID: []

Response Grading Rubric

Tone & Style: Refresher Course available.
Refusals:
- Safety Refusal: Explicit mention of something along the lines of "I'm not supposed to answer that".
- Capability Refusal: "I don't understand this prompt."
- Ignorance: "I don't know that."
- Other Refusal: A refusal that does not fit any of the categories above.
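
For memorization, the four refusal categories above can be written as an enumeration paired with the example phrases from the notes. This is a minimal sketch: the RefusalType and describe names are hypothetical, and the phrases are illustrations from the rubric, not an exhaustive matching rule.

```python
from enum import Enum


class RefusalType(Enum):
    """The four refusal categories listed in the rubric."""

    SAFETY = "Safety Refusal"
    CAPABILITY = "Capability Refusal"
    IGNORANCE = "Ignorance"
    OTHER = "Other Refusal"


# Example phrases quoted in the notes; "Other Refusal" has none.
EXAMPLE_PHRASES = {
    RefusalType.SAFETY: "I'm not supposed to answer that",
    RefusalType.CAPABILITY: "I don't understand this prompt.",
    RefusalType.IGNORANCE: "I don't know that",
}


def describe(refusal: RefusalType) -> str:
    """Return a one-line reminder for a refusal category."""
    phrase = EXAMPLE_PHRASES.get(refusal)
    if phrase is None:
        return f"{refusal.value}: does not fit the other defined categories"
    return f'{refusal.value}: e.g. "{phrase}"'
```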

Appendix

Good Justification

Example of acceptable review notes:

How to Rate Facebook Feed Tasks (One-Feed Queue)

The Basics:
- Example with rating and explanation/thought process:
- Chat History and latest user prompt:
- Facebook Graphic:
- References:
- Response:
- Rating with Explanations:

Instructions Update Log

Important Project Updates:
- Use the Discourse Blockers Google Form to request help with the Discourse platform

Important Project Updates

If any source looks suspicious, follow up with independent fact checking
