ā Back to Cookbook
mistral reference rag
Details
File: mistral/rag/mistral-reference-rag.ipynb
Type: Jupyter Notebook
Use Cases: RAG, Reference
Content
Notebook content (JSON format):
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Web Search with References " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "The primary objective of this cookbook is to illustrate how to effectively use the lastest [Mistral Large 2](https://docs.mistral.ai/getting-started/models/models_overview/#premier-models) model to conduct **web searches** and incorporate relevant sources into your responses. A common challenge with chatbots and Retrieval-Augmented Generation (RAG) systems is their **tendency to hallucinate sources or improperly format URLs**. Mistral's advanced capabilities address these issues, ensuring accurate and reliable information retrieval.\n", "\n", "## Mistral's Web Search Capabilities\n", "The new Mistral model `mistral-large-latest` integrates web search capabilities, allowing it to **reference sources accurately in its responses**. This feature enables you to retrieve the source content and present it correctly in your responses, enhancing the reliability and credibility of the information provided. By leveraging Mistral's advanced natural language processing and web search integration, you can build more robust and trustworthy applications.\n", "\n", "\n", "\n", "\n", "Here is a step-by-step description of the process depicted in the image above:\n", "1. **Query Initiation**: The process begins with a user query.\n", "\n", "2. **Function Calling with Mistral Large**: The query is processed by the Mistral Large model, which identifies that it needs to perform a function call to gather more information. This step involves determining the appropriate tool to use for the query.\n", "\n", "3. **Tool Identification**: The Mistral model identifies the relevant tool for the query, which in this case is `web_search_wikipedia`. The tool has the user query as an argument.\n", "\n", "4. **Wikipedia Search**: The tool is called and performs a search on Wikipedia using the query. \n", "\n", "5. **Extract Relevant Chunks**: The results from the Wikipedia search are processed to extract relevant chunks of information. These chunks are then prepared to be used as references in the final answer.\n", "\n", "6. **Final Answer with References**: The chat history is sent to the Mistral Large model which uses the extracted chunks to generate a final answer. The answer includes references to the Wikipedia articles, ensuring that the information provided is accurate and well-sourced.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install mistralai==1.2.3 wikipedia==1.4.0\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Step 1: Initialize the Mistral client\n", "\n", "In this step, we initialize the Mistral client with your API key. You can get or create your API key from the [Mistral API dashboard](https://console.mistral.ai/api-keys/). **Warning**: API Key can take up to 1 minute to be activated.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mistralai import Mistral\n", "from mistralai.models import UserMessage, SystemMessage\n", "import os\n", "\n", "client = Mistral(\n", " api_key=os.environ[\"MISTRAL_API_KEY\"],\n", ")\n", "\n", "\n", "query = \"Who won the Nobel Peace Prize in 2024?\"\n", "\n", "#Add the user message to the chat_history\n", "chat_history = [\n", " SystemMessage(content=\"You are a helpful assistant that can search the web for information. Use context to answer the question.\"),\n", " UserMessage(content=query),\n", "]\n", "print(chat_history)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[SystemMessage(content='You are a helpful assistant that can search the web for information. Use context to answer the question.', role='system'), UserMessage(content='Who won the Nobel Peace Prize in 2024?', role='user')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 : Define the function calling tool to search Wikipedia.\n", "\n", "[Function calling](https://docs.mistral.ai/capabilities/function_calling/) allows Mistral models to connect to external tools. By integrating Mistral models with external tools such as user defined functions or APIs, users can easily build applications catering to specific use cases and practical problems.\n", "\n", "First, we create a tool that will search the Wikipedia API and return the results in a specific format. Once we have the tool, we can use it in a chat completion request to Mistral. The result should contain:\n", "\n", "- Name of the tool\n", "- Tool call ID\n", "- Arguments which contains the user query\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "web_search_tool = {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"web_search\",\n", " \"description\": \"Search the web for a query for which you do not know the answer\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"query\": {\n", " \"type\": \"string\",\n", " \"description\": \"Query to search the web in keyword form.\",\n", " }\n", " },\n", " \"required\": [\"query\"],\n", " },\n", " },\n", "}\n", "\n", "\n", "\n", "chat_response = client.chat.complete(\n", " model=\"mistral-large-latest\",\n", " messages=chat_history,\n", " tools=[web_search_tool],\n", ")\n", "\n", "\n", "if hasattr(chat_response.choices[0].message, 'tool_calls'):\n", " tool_call = chat_response.choices[0].message.tool_calls[0]\n", " chat_history.append(chat_response.choices[0].message)\n", " print(tool_call)\n", "else:\n", " print(\"No tool call found in the response\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "function=FunctionCall(name='web_search', arguments='{\"query\": \"Who won the Nobel Peace Prize in 2024?\"}') id='3xdgHbIKY' type='function'\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Define Method to Search Wikipedia Associated with the Tool\n", "\n", "In the previous step, we created a tool called `web_search_wikipedia`. We need to create a function that will take the tool call ID and the arguments and return the results in the specific format.\n", "\n", "The format of the results should be:\n", "```python\n", "{\n", " \"url\": str | None, # Page URL\n", " \"title\": str | None, # Page title \n", " \"description\": str | None, # Page description\n", " \"snippets\": List[str], # Relevant text snippets in a list\n", " \"date\": str | None, # date\n", " \"source\": str | None, # Source/reference\n", " \"metadata\": Dict[str, Any] # Metadata\n", "}\n", "```" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "import wikipedia\n", "import json\n", "from datetime import datetime\n", "\n", "\n", "def get_wikipedia_search(query: str) -> str:\n", " \"\"\"\n", " Search Wikipedia for a query and return the results in a specific format.\n", " \"\"\"\n", " result = wikipedia.search(query, results = 5)\n", " data={}\n", " for i, res in enumerate(result):\n", " pg= wikipedia.page(res, auto_suggest=False)\n", " data[i]={\n", " \"url\": pg.url,\n", " \"title\": pg.title,\n", " \"snippets\": [pg.summary.split('.')],\n", " \"description\": None,\n", " \"date\": datetime.now().isoformat(),\n", " \"source\": \"wikipedia\"\n", " }\n", " return json.dumps(data, indent=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Perform the Tool Call and Search Wikipedia\n", "Now that we have the tool call ID and the arguments, we can perform the tool call and search Wikipedia." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "import json\n", "from mistralai import ToolMessage\n", "\n", "\n", "query = json.loads(tool_call.function.arguments)[\"query\"]\n", "wb_result = get_wikipedia_search(query)\n", "\n", "tool_call_result = ToolMessage(\n", " content=wb_result,\n", " tool_call_id=tool_call.id,\n", " name=tool_call.function.name,\n", ")\n", "\n", "\n", "# Append the tool call message to the chat_history\n", "chat_history.append(tool_call_result)\n", "\n", "#See chunks in the response\n", "print(json.dumps(json.loads(wb_result), indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "{\n", " \"0\": {\n", " \"url\": \"https://en.wikipedia.org/wiki/2024_Nobel_Peace_Prize\",\n", " \"title\": \"2024 Nobel Peace Prize\",\n", " \"snippets\": [\n", " [\n", " \"The 2024 Nobel Peace Prize, an international peace prize established according to Alfred Nobel's will, was awarded to Nihon Hidankyo (the Japan Confederation of A- and H-Bomb Sufferers Organizations), for their activism against nuclear weapons, assisted by victim/survivors (known as Hibakusha) of the atomic bombings of Hiroshima and Nagasaki in 1945\",\n", " \" They will receive the prize at a ceremony on 10 December 2024 at Oslo, Norway\",\n", " \"\"\n", " ]\n", " ],\n", " \"description\": null,\n", " \"date\": \"2024-11-26T17:39:55.057454\",\n", " \"source\": \"wikipedia\"\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Call Mistral with the Tool Call Result\n", "The chat history now contains:\n", "\n", "- The `System` message which contains the instructions for the assistant\n", "- The `User` message which contains the original question\n", "- The `Assistant` message which contains a tool call to search Wikipedia\n", "- The `Tool call` result which contains the results of the Wikipedia search\n", "\n", "See more information about types of messages [here](https://docs.mistral.ai/capabilities/completion/#chat-messages).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for msg in chat_history:\n", " print(msg,end='\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "content='You are a helpful assistant that can search the web for information. Use context to answer the question.' role='system'\n", "content='Who won the Nobel Peace Prize in 2024?' role='user'\n", "content='' tool_calls=[ToolCall(function=FunctionCall(name='web_search', arguments='{\"query\": \"Who won the Nobel Peace Prize in 2024?\"}'), id='3xdgHbIKY', type='function')] prefix=False role='assistant'\n", "content='{\\n \"0\": {\\n \"url\": \"https://en.wikipedia.org/wiki/2024_Nobel_Peace_Prize\",\\n \"title\": \"2024 Nobel Peace Prize\",\\n \"snippets\": [\\n [\\n \"The 2024 Nobel Peace Prize, an international peace prize established according to Alfred Nobel\\'s will, was awarded to Nihon Hidankyo (the Japan Confederation of A- and H-Bomb Sufferers Organizations), for their activism against nuclear weapons, assisted by victim/survivors (known as Hibakusha) of the atomic bombings of Hiroshima and Nagasaki in 1945\",\\n \" They will receive the prize at a ceremony on 10 December 2024 at Oslo, Norway\",\\n \"\"\\n ]\\n ],\\n \"description\": null,\\n \"date\": \"2024-11-26T17:39:55.057454\",\\n \"source\": \"wikipedia\"\\n }', tool_call_id='3xdgHbIKY' name='web_search' role='tool'}'\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mistralai.models import TextChunk, ReferenceChunk\n", "\n", "def format_response(chat_response: list, wb_result:dict):\n", " print(\"\\nš¤ Answer:\\n\")\n", " refs_used = []\n", " \n", " # Print the main response\n", " for chunk in chat_response.choices[0].message.content:\n", " if isinstance(chunk, TextChunk):\n", " print(chunk.text, end=\"\")\n", " elif isinstance(chunk, ReferenceChunk):\n", " refs_used += chunk.reference_ids\n", " \n", " \n", " # Print references\n", " if refs_used:\n", " print(\"\\n\\nš Sources:\")\n", " for i, ref in enumerate(set(refs_used), 1):\n", " reference = json.loads(wb_result)[str(ref)]\n", " print(f\"\\n{i}. {reference['title']}: {reference['url']}\")\n", " \n", "\n", "# Use the formatter\n", "chat_response = client.chat.complete(\n", " model=\"mistral-large-latest\",\n", " messages=chat_history,\n", " tools=[web_search_tool],\n", ")\n", "format_response(chat_response, wb_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "š¤ Answer:\n", "\n", "The 2024 Nobel Peace Prize was awarded to Nihon Hidankyo, the Japan Confederation of A- and H-Bomb Sufferers Organizations, for their activism against nuclear weapons, supported by survivors of the 1945 atomic bombings of Hiroshima and Nagasaki. The award ceremony will take place on December 10, 2024, in Oslo, Norway.\n", "\n", "š Sources:\n", "\n", "1. 2024 Nobel Peace Prize : https://en.wikipedia.org/wiki/2024_Nobel_Peace_Prize\n", "\n", "2. Nobel Peace Prize: https://en.wikipedia.org/wiki/Nobel_Peace_Prize" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6 : Streaming completion with references\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stream_response = client.chat.stream(\n", " model=\"mistral-large-2411\",\n", " messages=chat_history,\n", " tools=[web_search_tool],\n", ")\n", "\n", "last_reference_index = 0\n", "if stream_response is not None:\n", " for event in stream_response:\n", " chunk = event.data.choices[0]\n", " if chunk.delta.content:\n", " if isinstance(chunk.delta.content, list):\n", " # Check if TYPE of chunk is a reference\n", " references_ids = [\n", " ref_id\n", " for chunk_elem in chunk.delta.content\n", " if chunk_elem.TYPE == \"reference\"\n", " for ref_id in chunk_elem.reference_ids\n", " ]\n", " last_reference_index += len(references_ids)\n", "\n", " # Map the references ids to the references data stored in the chat history\n", " references_data = [json.loads(wb_result)[str(ref_id)] for ref_id in references_ids]\n", " urls = \" \" + \", \".join(\n", " [\n", " f\"[{i}]({reference['url']})\"\n", " for i, reference in enumerate(\n", " references_data,\n", " start=last_reference_index - len(references_ids) + 1,\n", " )\n", " ]\n", " )\n", " print(urls, end=\"\")\n", " else:\n", " print(chunk.delta.content, end=\"\")\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 2024 Nobel Peace Prize was awarded to Nihon Hidankyo (the Japan Confederation of A- and H-Bomb Sufferers Organizations) for their activism against nuclear weapons, \n", "assisted by victim/survivors (known as Hibakusha) of the atomic bombings of Hiroshima and Nagasaki in 1945 [1](https://en.wikipedia.org/wiki/2024_Nobel_Peace_Prize), [2](https://en.wikipedia.org/wiki/Nobel_Peace_Prize).\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 2 }