Reach out
← Back to Cookbook

milvus rag french parliament

Details

File: third_party/Milvus/milvus_rag_french_parliament.ipynb

Type: Jupyter Notebook

Use Cases: RAG

Integrations: Milvus

Content

Notebook content (JSON format):

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Build a RAG application with Milvus Lite, Mistral and Llama-index\n",
    "\n",
    "In this notebook, we are showing how you can build a Retrieval Augmented Generation (RAG) application to interact with data from the French Parliament. It uses Ollama with Mistral for LLM operations, Llama-index for orchestration, and [Milvus](https://milvus.io/) for vector storage.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install Ollama\n",
    "\n",
    "Make sure to have Ollama installed and Running on your laptop --> https://ollama.com/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install the different dependencies "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -U pymilvus ollama llama-index-llms-ollama llama-index-vector-stores-milvus llama-index-readers-file llama-index-embeddings-mistralai llama-index-llms-mistralai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download data\n",
    "\n",
    "Note: Run this cell only if you haven't cloned the repository."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget 'https://raw.githubusercontent.com/mistralai/cookbook/main/third_party/Milvus/data/french_parliament_discussion.xml' -O './data/french_parliament_discussion.xml'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use Mistral Embedding\n",
    "\n",
    "Make sure to create an [API Key](https://console.mistral.ai/api-keys/) on Mistral's platform and load it as an environment variable.\n",
    "\n",
    "On this tutorial, we are loading the environment variable stored in our `.env` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "import os\n",
    "load_dotenv()\n",
    "\n",
    "MISTRAL_API_KEY = os.environ.get(\"MISTRAL_API_KEY\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings.mistralai import MistralAIEmbedding\n",
    "\n",
    "model_name = \"mistral-embed\"\n",
    "embed_model = MistralAIEmbedding(model_name=model_name, api_key=MISTRAL_API_KEY)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare out data to be stored in Milvus\n",
    "\n",
    "This code makes it possible to process text embeddings using Mistral Embed & Mistral-7B and store those in Milvus.\n",
    "\n",
    "**!!Make sure to have Ollama running on your laptop!!**\n",
    "\n",
    "* Initialises Mistral-7B model using Ollama\n",
    "* Service Context: Configures a service context with Mistral and the embedding model defined above\n",
    "* Vector Store: Sets up a collection in Milvus to store text embeddings, specifying the database file, collection name, vector dimensions\n",
    "* Storage Context: Configures a storage context with the Milvus vector store\n",
    "\n",
    "This makes it possible to have efficient storage and retrieval of vector embeddings for text data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/ravitheja/Desktop/mistral/lib/python3.12/site-packages/milvus_lite/__init__.py:15: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n",
      "  from pkg_resources import DistributionNotFound, get_distribution\n",
      "2025-07-14 15:21:25,453 [DEBUG][_create_connection]: Created new connection using: async-milvus_mistral_rag.db (async_milvus_client.py:599)\n"
     ]
    }
   ],
   "source": [
    "from llama_index.llms.ollama import Ollama\n",
    "from llama_index.vector_stores.milvus import MilvusVectorStore\n",
    "\n",
    "from llama_index.core import StorageContext, Settings\n",
    "\n",
    "llm = Ollama(model=\"mistral\", request_timeout=120.0)\n",
    "\n",
    "Settings.llm = Ollama(model=\"mistral\", request_timeout=120.0)\n",
    "Settings.embed_model = embed_model\n",
    "Settings.chunk_size = 350\n",
    "Settings.chunk_overlap = 20\n",
    "\n",
    "vector_store = MilvusVectorStore(\n",
    "    uri=\"milvus_mistral_rag.db\",\n",
    "    collection_name=\"mistral_french_parliament\",\n",
    "    dim=1024, \n",
    "    overwrite=True  # drop table if exist and then create\n",
    "    \n",
    "    )\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using Mistral AI API\n",
    "\n",
    "If you prefer not to run models locally or need more powerful models, you can use Mistral's API instead of Ollama. The API offers:\n",
    "- Access to more powerful models like `mistral-large` and `mistral-small`\n",
    "- No local GPU/CPU requirements\n",
    "- Consistent performance and reliability\n",
    "- Production-ready deployment\n",
    "\n",
    "Make sure to create an [API Key](https://console.mistral.ai/api-keys/) on Mistral's platform first.\n",
    "```python\n",
    "from llama_index.llms.mistralai import MistralAI\n",
    "\n",
    "# Initialize Mistral LLM\n",
    "mistral_llm = MistralAI(api_key=MISTRAL_API_KEY, model=\"mistral-7B\")\n",
    "\n",
    "# Configure settings for Mistral\n",
    "Settings.llm = mistral_llm\n",
    "```\n",
    "\n",
    "The rest of the setup using Milvus would stay the same."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Process and load the Data "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader, VectorStoreIndex\n",
    "\n",
    "docs = SimpleDirectoryReader(input_files=['data/french_parliament_discussion.xml']).load_data()\n",
    "vector_index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.tools import RetrieverTool, ToolMetadata\n",
    "\n",
    "milvus_tool_openai = RetrieverTool(\n",
    "    retriever=vector_index.as_retriever(similarity_top_k=3),  # retrieve top_k results\n",
    "    metadata=ToolMetadata(\n",
    "        name=\"CustomRetriever\",\n",
    "        description='Retrieve relevant information from provided documents.'\n",
    "    ),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Finally, ask questions to our RAG system"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The conversation in the French parliament centered around a motion and a method for action regarding the seventh wave of some issue. There was criticism towards the chosen method being considered as \"peu efficace\" (ineffective) and \"très disproportionnée\" (highly disproportionate). Additionally, there were comments about the parliament not acting democratically and without consulting other parties when it comes to implementing certain measures like the passe sanitaire or vaccinal. The session ended with applause from some groups, specifically LFI-NUPES.\n"
     ]
    }
   ],
   "source": [
    "query_engine = vector_index.as_query_engine()\n",
    "response = query_engine.query(\"What did the French parliament talk about the last time?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### If you like this tutorial, feel free to reach out on [LinkedIn](https://www.linkedin.com/in/stephen-batifol/), check out [Milvus](https://github.com/milvus-io/milvus) and join our [Discord](https://discord.gg/FG6hMJStWu)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "mistral",
   "language": "python",
   "name": "mistral"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}