RAG evaluation

Details

File: mistral/evaluation/RAG_evaluation.ipynb
Type: Jupyter Notebook
Use Cases: RAG, Evaluation
Content

Notebook content (JSON format):
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BjLq3hMOP1KR"
   },
   "source": [
    "# Evaluating RAG: Using Mistral Models for LLM as a Judge (With Structured Outputs)\n",
    "\n",
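    "This cookbook shows how to use a Mistral AI model as an LLM judge for a Retrieval-Augmented Generation (RAG) system. Structured outputs constrain the judge to return a reasoning and a score from 0 to 3 for each of three criteria: context relevance, answer relevance, and groundedness."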
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XZhlfwDMP5hj"
   },
   "source": [
    "## Imports & API Key Setup\n",
    "You can get your API key from the Mistral console: https://console.mistral.ai/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "vB4dHYDU9mST",
    "outputId": "e6ef92d6-778f-4ee5-9783-5ee1d6078a0f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: mistralai==1.5.1 in /opt/anaconda3/lib/python3.12/site-packages (1.5.1)\n",
      "Requirement already satisfied: httpx==0.28.1 in /opt/anaconda3/lib/python3.12/site-packages (0.28.1)\n",
      "Requirement already satisfied: pydantic==2.10.6 in /opt/anaconda3/lib/python3.12/site-packages (2.10.6)\n",
      "Requirement already satisfied: python-dateutil==2.9.0.post0 in /opt/anaconda3/lib/python3.12/site-packages (2.9.0.post0)\n",
      "Requirement already satisfied: jsonpath-python==1.0.6 in /opt/anaconda3/lib/python3.12/site-packages (1.0.6)\n",
      "Requirement already satisfied: typing-inspect==0.9.0 in /opt/anaconda3/lib/python3.12/site-packages (0.9.0)\n",
      "Requirement already satisfied: eval-type-backport>=0.2.0 in /opt/anaconda3/lib/python3.12/site-packages (from mistralai==1.5.1) (0.2.2)\n",
      "Requirement already satisfied: anyio in /opt/anaconda3/lib/python3.12/site-packages (from httpx==0.28.1) (4.8.0)\n",
      "Requirement already satisfied: certifi in /opt/anaconda3/lib/python3.12/site-packages (from httpx==0.28.1) (2025.1.31)\n",
      "Requirement already satisfied: httpcore==1.* in /opt/anaconda3/lib/python3.12/site-packages (from httpx==0.28.1) (1.0.7)\n",
      "Requirement already satisfied: idna in /opt/anaconda3/lib/python3.12/site-packages (from httpx==0.28.1) (3.10)\n",
      "Requirement already satisfied: annotated-types>=0.6.0 in /opt/anaconda3/lib/python3.12/site-packages (from pydantic==2.10.6) (0.7.0)\n",
      "Requirement already satisfied: pydantic-core==2.27.2 in /opt/anaconda3/lib/python3.12/site-packages (from pydantic==2.10.6) (2.27.2)\n",
      "Requirement already satisfied: typing-extensions>=4.12.2 in /opt/anaconda3/lib/python3.12/site-packages (from pydantic==2.10.6) (4.12.2)\n",
      "Requirement already satisfied: six>=1.5 in /opt/anaconda3/lib/python3.12/site-packages (from python-dateutil==2.9.0.post0) (1.16.0)\n",
      "Requirement already satisfied: mypy-extensions>=0.3.0 in /opt/anaconda3/lib/python3.12/site-packages (from typing-inspect==0.9.0) (1.0.0)\n",
      "Requirement already satisfied: h11<0.15,>=0.13 in /opt/anaconda3/lib/python3.12/site-packages (from httpcore==1.*->httpx==0.28.1) (0.14.0)\n",
      "Requirement already satisfied: sniffio>=1.1 in /opt/anaconda3/lib/python3.12/site-packages (from anyio->httpx==0.28.1) (1.3.1)\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Enter Mistral AI API Key ········\n"
     ]
    }
   ],
   "source": [
    "!pip install mistralai==1.5.1 httpx==0.28.1 pydantic==2.10.6 python-dateutil==2.9.0.post0 jsonpath-python==1.0.6 typing-inspect==0.9.0\n",
    "from pydantic import BaseModel, Field\n",
    "from enum import Enum\n",
    "from getpass import getpass\n",
    "from mistralai import Mistral\n",
    "\n",
    "# Prompt for the API key without echoing it; the model is set in the next cell\n",
    "api_key = getpass(\"Enter Mistral AI API Key\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xfYrV_vOQIwY"
   },
   "source": [
    "## Main Code For LLM As A Judge For RAG (With Structured Outputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "rXc7OkcP47dF",
    "outputId": "4168cde7-bc7a-479e-edd2-13174281e538"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🏆 RAG Evaluation:\n",
      "\n",
      "Criteria: Context Relevance\n",
      "Reasoning: The retrieved context is relevant to the query as it defines renewable energy and lists various types such as solar, wind, hydro, and geothermal energy. It provides a basic understanding of what renewable energy encompasses, which is useful for addressing the benefits of renewable energy.\n",
      "Score: 3/3\n",
      "\n",
      "Criteria: Answer Relevance\n",
      "Reasoning: The generated answer addresses the user's query by highlighting the environmental benefits of renewable energy, specifically mentioning solar and wind energy. It discusses the reduction of carbon emissions, which is a key benefit of renewable energy. However, it does not mention other types of renewable energy like hydro and geothermal, which were included in the context.\n",
      "Score: 2/3\n",
      "\n",
      "Criteria: Groundedness\n",
      "Reasoning: The generated answer is mostly grounded in the retrieved context. It mentions solar and wind energy, which are part of the context. However, it does not mention hydro and geothermal energy, which were also included in the context. Additionally, the answer introduces the benefit of reducing carbon emissions, which is not explicitly stated in the context but is a well-known benefit of renewable energy.\n",
      "Score: 2/3\n"
     ]
    }
   ],
   "source": [
    "from pydantic import BaseModel, Field\n",
    "from enum import Enum\n",
    "from mistralai import Mistral\n",
    "\n",
    "# Initialize the Mistral client with the API key\n",
    "client = Mistral(api_key=api_key)\n",
    "model = \"mistral-large-latest\"\n",
    "\n",
    "# Allowed scores; str values keep them as strings ('0' to '3') in the JSON schema\n",
    "class Score(str, Enum):\n",
    "    no_relevance = \"0\"\n",
    "    low_relevance = \"1\"\n",
    "    medium_relevance = \"2\"\n",
    "    high_relevance = \"3\"\n",
    "\n",
    "# Define a constant for the score description\n",
    "SCORE_DESCRIPTION = (\n",
    "    \"Score as a string between '0' and '3'. \"\n",
    "    \"0: No relevance/Not grounded/Irrelevant - The context/answer is completely unrelated or not based on the context. \"\n",
    "    \"1: Low relevance/Low groundedness/Somewhat relevant - The context/answer has minimal relevance or grounding. \"\n",
    "    \"2: Medium relevance/Medium groundedness/Mostly relevant - The context/answer is mostly relevant or grounded, with some gaps. \"\n",
    "    \"3: High relevance/High groundedness/Fully relevant - The context/answer is highly relevant or grounded.\"\n",
    ")\n",
    "\n",
    "# Define separate classes for each criterion with detailed descriptions\n",
    "class ContextRelevance(BaseModel):\n",
    "    explanation: str = Field(..., description=(\"Step-by-step reasoning explaining how the retrieved context aligns with the user's query. \"\n",
    "                    \"Consider the relevance of the information to the query's intent and the appropriateness of the context \"\n",
    "                    \"in providing a coherent and useful response.\"))\n",
    "    score: Score = Field(..., description=SCORE_DESCRIPTION)\n",
    "\n",
    "class AnswerRelevance(BaseModel):\n",
    "    explanation: str = Field(..., description=(\"Step-by-step reasoning explaining how well the generated answer addresses the user's original query. \"\n",
    "                    \"Consider whether the answer is helpful and on point, aligns with the user's intent, and provides valuable insights.\"))\n",
    "    score: Score = Field(..., description=SCORE_DESCRIPTION)\n",
    "\n",
    "class Groundedness(BaseModel):\n",
    "    explanation: str = Field(..., description=(\"Step-by-step reasoning explaining how faithful the generated answer is to the retrieved context. \"\n",
    "                    \"Consider the factual accuracy and reliability of the answer, ensuring it is grounded in the retrieved information.\"))\n",
    "    score: Score = Field(..., description=SCORE_DESCRIPTION)\n",
    "\n",
    "class RAGEvaluation(BaseModel):\n",
    "    context_relevance: ContextRelevance = Field(..., description=\"Evaluation of the context relevance to the query, considering how well the retrieved context aligns with the user's intent.\")\n",
    "    answer_relevance: AnswerRelevance = Field(..., description=\"Evaluation of the answer relevance to the query, assessing how well the generated answer addresses the user's original query.\")\n",
    "    groundedness: Groundedness = Field(..., description=\"Evaluation of the groundedness of the generated answer, ensuring it is faithful to the retrieved context.\")\n",
    "\n",
    "# Ask the judge model to score one (query, context, answer) triple and return the parsed RAGEvaluation\n",
    "def evaluate_rag(query: str, retrieved_context: str, generated_answer: str) -> RAGEvaluation:\n",
    "    chat_response = client.chat.parse(\n",
    "        model=model,\n",
    "        messages=[\n",
    "            {\n",
    "                \"role\": \"system\",\n",
    "                \"content\": (\n",
    "                    \"You are a judge for evaluating a Retrieval-Augmented Generation (RAG) system. \"\n",
    "                    \"Evaluate the retrieved context and generated answer against three criteria. \"\n",
    "                    \"For each criterion, provide your reasoning and a score as a string between '0' and '3'. \"\n",
    "                    \"The criteria are: \"\n",
    "                    \"Context Relevance: How relevant is the retrieved context to the query? \"\n",
    "                    \"Answer Relevance: How relevant is the generated answer to the query? \"\n",
    "                    \"Groundedness: How faithful is the generated answer to the retrieved context?\"\n",
    "                )\n",
    "            },\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": f\"Query: {query}\\nRetrieved Context: {retrieved_context}\\nGenerated Answer: {generated_answer}\"\n",
    "            },\n",
    "        ],\n",
    "        response_format=RAGEvaluation,\n",
    "        temperature=0\n",
    "    )\n",
    "    return chat_response.choices[0].message.parsed\n",
    "\n",
    "# Example usage\n",
    "query = \"What are the benefits of renewable energy?\"\n",
    "retrieved_context = \"Renewable energy includes solar, wind, hydro, and geothermal energy, which are naturally replenished.\"\n",
    "generated_answer = \"Renewable energy sources like solar and wind are environmentally friendly and reduce carbon emissions.\"\n",
    "evaluation = evaluate_rag(query, retrieved_context, generated_answer)\n",
    "\n",
    "# Print the evaluation\n",
    "print(\"🏆 RAG Evaluation:\")\n",
    "print(\"\\nCriteria: Context Relevance\")\n",
    "print(f\"Reasoning: {evaluation.context_relevance.explanation}\")\n",
    "print(f\"Score: {evaluation.context_relevance.score.value}/3\")\n",
    "\n",
    "print(\"\\nCriteria: Answer Relevance\")\n",
    "print(f\"Reasoning: {evaluation.answer_relevance.explanation}\")\n",
    "print(f\"Score: {evaluation.answer_relevance.score.value}/3\")\n",
    "\n",
    "print(\"\\nCriteria: Groundedness\")\n",
    "print(f\"Reasoning: {evaluation.groundedness.explanation}\")\n",
    "print(f\"Score: {evaluation.groundedness.score.value}/3\")\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
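The notebook gets a typed `RAGEvaluation` object directly from `client.chat.parse` through `message.parsed`. You may only have the judge's reply as a raw JSON string, for example a response saved to a log. In that case, the same validation step can be sketched with the standard library. This is a minimal sketch. It assumes the JSON uses the same field names as the Pydantic models above: `context_relevance`, `answer_relevance`, and `groundedness`, each with `explanation` and `score`. `CriterionResult` and `parse_rag_evaluation` are hypothetical names and are not part of the Mistral SDK. The sample reply uses placeholder values, not real model output.

```python
import json
from dataclasses import dataclass

# Hypothetical sketch: field names assumed to mirror the notebook's RAGEvaluation model
CRITERIA = ("context_relevance", "answer_relevance", "groundedness")
ALLOWED_SCORES = {"0", "1", "2", "3"}  # same string values as the Score enum


@dataclass
class CriterionResult:
    """Hypothetical container for one criterion's explanation and integer score."""
    explanation: str
    score: int


def parse_rag_evaluation(raw: str) -> dict[str, CriterionResult]:
    """Parse a judge reply shaped like RAGEvaluation and validate every score.

    Raises ValueError if a criterion is missing or a score is not one of '0'..'3'.
    """
    data = json.loads(raw)
    results: dict[str, CriterionResult] = {}
    for name in CRITERIA:
        if name not in data:
            raise ValueError(f"missing criterion: {name}")
        entry = data[name]
        score = str(entry["score"])  # accept "2" or 2, compare as a string
        if score not in ALLOWED_SCORES:
            raise ValueError(f"{name}: score {score!r} is not one of 0..3")
        results[name] = CriterionResult(explanation=entry["explanation"], score=int(score))
    return results


# Placeholder reply for illustration only; not real model output
raw_reply = json.dumps({
    "context_relevance": {"explanation": "Defines renewable energy types.", "score": "3"},
    "answer_relevance": {"explanation": "Addresses benefits only partially.", "score": "2"},
    "groundedness": {"explanation": "Emissions claim is not in the context.", "score": "2"},
})

for name, result in parse_rag_evaluation(raw_reply).items():
    print(f"{name}: {result.score}/3")
```

The sketch converts the string enum values to integers at this boundary, so later code can compare scores numerically. Rejecting out-of-range values mirrors the constraint that the `Score` enum expresses in the Pydantic version.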