Azure AI Search RAG
Details
File: third_party/Azure_AI_Search/azure_ai_search_rag.ipynb
Type: Jupyter Notebook
Use Cases: Search, RAG
Integrations: Azure
Content
Notebook content (JSON format):
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# RAG with Mistral AI, Azure AI Search and Azure AI Studio\n", "\n", "## Overview\n", "\n", "This notebook demonstrates how to integrate Mistral Embeddings with Azure AI Search as a vector store, and use the results to ground responses in the Mistral Chat Completion Model.\n", "\n", "## Prerequisites\n", "\n", "- Mistral AI API Key OR Azure AI Studio Deployed Mistral Chat Completion Model and Azure AI Studio API Key\n", "- Azure AI Search service\n", "- Python 3.x environment with necessary libraries installed\n", "\n", "## Steps\n", "\n", "1. Install required packages\n", "2. Load data and generate Mistral embeddings\n", "3. Index embeddings in Azure AI Search\n", "4. Perform search using Azure AI Search\n", "5. Ground search results in Mistral Chat Completion Model\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Required Packages\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install Required Packages\n", "!pip install azure-search-documents==11.5.1\n", "!pip install azure-identity==1.16.0 datasets==2.19.1 mistralai==1.0.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data and Generate Mistral Embeddings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "\n", "data = load_dataset(\n", " \"jamescalam/ai-arxiv2-semantic-chunks\",\n", " split=\"train[:10000]\"\n", ")\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have 10K chunks, where each chunk is roughly the length of 1-2 paragraphs in length. Here is an example of a single record:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': '2401.04088#0',\n", " 'title': 'Mixtral of Experts',\n", " 'content': '4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed Abstract We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine- tuned to follow instructions, Mixtral 8x7B â Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B â chat model on human bench- marks. 
Both the base and instruct models are released under the Apache 2.0 license.',\n", " 'prechunk_id': '',\n", " 'postchunk_id': '2401.04088#1',\n", " 'arxiv_id': '2401.04088',\n", " 'references': ['1905.07830']}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reformat the data into the structure we need; it will contain `id`, `title`, `content` (which we will embed), and `arxiv_id`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dataset({\n", " features: ['id', 'title', 'content', 'arxiv_id'],\n", " num_rows: 10000\n", "})" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = data.remove_columns([\"prechunk_id\", \"postchunk_id\", \"references\"])\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to define an embedding model to create our embedding vectors for retrieval; for that we will use Mistral AI's `mistral-embed`. There is some cost associated with this model, so be aware of that (costs for running this notebook are <$1).\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import os\n", "from mistralai import Mistral\n", "\n", "import getpass # for securely inputting API key\n", "\n", "# Fetch the API key from environment variable or prompt the user\n", "mistral_api_key = os.getenv(\"MISTRAL_API_KEY\") or getpass.getpass(\"Enter your Mistral API key: \")\n", "\n", "# Initialize the Mistral client\n", "mistral = Mistral(api_key=mistral_api_key)\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "embed_model = \"mistral-embed\"\n", "\n", "embeds = mistral.embeddings.create(\n", " model=embed_model, inputs=[\"this is a test\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can view the dimensionality of our returned embeddings, which we'll need soon when initializing our vector index:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1024" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dims = len(embeds.data[0].embedding)\n", "dims" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Index Embeddings into Azure AI Search\n", "\n", "Now we create our vector index to store our embeddings. For this, we need to set up an [Azure AI Search service](https://portal.azure.com/#create/Microsoft.Search).\n", "\n", "There are two ways to authenticate to Azure AI Search:\n", "\n", "1. **Service Key**: The service key can be found in the \"Settings -> Keys\" section in the left navbar of the Azure portal dashboard. Make sure to select the ADMIN key.\n", "2. **Managed Identity**: Using Microsoft Entra ID (f.k.a. Azure Active Directory) is a more secure and recommended way to authenticate. 
You can follow the instructions in the [official Microsoft documentation](https://learn.microsoft.com/azure/search/search-security-rbac) to set up Managed Identity.\n", "\n", "For more detailed instructions on creating an Azure AI Search service, please refer to the [official Microsoft documentation](https://learn.microsoft.com/azure/search/search-create-service-portal).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Authenticate into Azure AI Search" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using AAD for authentication.\n" ] } ], "source": [ "from getpass import getpass\n", "from azure.identity import DefaultAzureCredential\n", "from azure.core.credentials import AzureKeyCredential\n", "import os\n", "\n", "# Configuration variable\n", "USE_AAD_FOR_SEARCH = True\n", "SEARCH_SERVICE_ENDPOINT = os.getenv(\"SEARCH_SERVICE_ENDPOINT\") or getpass(\"Enter your Azure AI Search Service Endpoint: \")\n", "\n", "def authenticate_azure_search(use_aad_for_search=False):\n", " if use_aad_for_search:\n", " print(\"Using AAD for authentication.\")\n", " credential = DefaultAzureCredential()\n", " else:\n", " print(\"Using API keys for authentication.\")\n", " api_key = os.getenv(\"SEARCH_SERVICE_API_KEY\") or getpass(\"Enter your Azure AI Search Service API Key: \")\n", " if api_key is None:\n", " raise ValueError(\"API key must be provided if not using AAD for authentication.\")\n", " credential = AzureKeyCredential(api_key)\n", " return credential\n", "\n", "azure_search_credential = authenticate_azure_search(\n", " use_aad_for_search=USE_AAD_FOR_SEARCH\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a vector index" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ai-arxiv2-semantic-chunks created\n" ] } ], "source": [ "from azure.search.documents.indexes import SearchIndexClient\n", "from azure.search.documents.indexes.models import (\n", " SimpleField, SearchFieldDataType, SearchableField, SearchField,\n", " VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile,\n", " SemanticConfiguration, SemanticPrioritizedFields, SemanticField,\n", " SemanticSearch, SearchIndex\n", ")\n", "\n", "DIMENSIONS = 1024\n", "HNSW_PARAMETERS = {\"m\": 4, \"metric\": \"cosine\", \"ef_construction\": 400, \"ef_search\": 500}\n", "INDEX_NAME = \"ai-arxiv2-semantic-chunks\"\n", "\n", "# Create a search index\n", "index_client = SearchIndexClient(endpoint=SEARCH_SERVICE_ENDPOINT, credential=azure_search_credential)\n", "fields = [\n", " SimpleField(name=\"id\", type=SearchFieldDataType.String, key=True, sortable=False, filterable=True, facetable=False),\n", " SearchableField(name=\"title\", type=SearchFieldDataType.String),\n", " SearchableField(name=\"content\", type=SearchFieldDataType.String),\n", " SearchableField(name=\"arxiv_id\", type=SearchFieldDataType.String, filterable=True),\n", " SearchField(name=\"embedding\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n", " searchable=True, vector_search_dimensions=DIMENSIONS, vector_search_profile_name=\"myHnswProfile\", hidden=False)\n", "]\n", "\n", "vector_search = VectorSearch(\n", " algorithms=[HnswAlgorithmConfiguration(name=\"myHnsw\", parameters=HNSW_PARAMETERS)],\n", " profiles=[VectorSearchProfile(name=\"myHnswProfile\", algorithm_configuration_name=\"myHnsw\")]\n", ")\n", "\n", "semantic_config = 
SemanticConfiguration(\n", " name=\"my-semantic-config\",\n", " prioritized_fields=SemanticPrioritizedFields(\n", " title_field=SemanticField(field_name=\"title\"),\n", " keywords_fields=[SemanticField(field_name=\"arxiv_id\")],\n", " content_fields=[SemanticField(field_name=\"content\")]\n", " )\n", ")\n", "\n", "semantic_search = SemanticSearch(configurations=[semantic_config])\n", "index = SearchIndex(name=INDEX_NAME, fields=fields, vector_search=vector_search, semantic_search=semantic_search)\n", "result = index_client.create_or_update_index(index)\n", "print(f\"{result.name} created\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Estimate Cost for Embedding Generation\n", "\n", "As per the information from [Lunary.ai's Mistral Tokenizer](https://lunary.ai/mistral-tokenizer), one token is approximately equivalent to five characters of text. \n", "\n", "According to [Mistral's Pricing](https://mistral.ai/technology/#pricing), the cost for using `mistral-embed` is $0.1 per 1M tokens for both inputs and outputs.\n", "\n", "In the following code block, we will calculate the estimated cost for generating embeddings based on the size of our dataset and these pricing details." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Estimated cost for generating embeddings: $0.19047898000000002\n" ] } ], "source": [ "# Estimate cost for generating embeddings\n", "def estimate_cost(data, cost_per_million_tokens=0.1):\n", " total_characters = sum(len(entry['content']) for entry in data)\n", " total_tokens = total_characters / 5 # 1 token is approximately 5 characters\n", " total_cost = (total_tokens / 1_000_000) * cost_per_million_tokens\n", " return total_cost\n", "\n", "estimated_cost = estimate_cost(data)\n", "print(f\"Estimated cost for generating embeddings: ${estimated_cost}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transform Dataset for Azure AI Search Upload" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Function to transform your dataset into the format required by Azure AI Search\n", "def transform_to_search_document(record):\n", " return {\n", " \"id\": record[\"id\"],\n", " \"arxiv_id\": record[\"arxiv_id\"],\n", " \"title\": record[\"title\"],\n", " \"content\": record[\"content\"]\n", " }\n", " \n", "# Transform all documents in the dataset\n", "transformed_documents = [transform_to_search_document(doc) for doc in data]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate Embeddings" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def generate_embeddings(documents, model, batch_size=20):\n", " for i in range(0, len(documents), batch_size):\n", " batch = documents[i:i + batch_size]\n", " contents = [doc['content'] for doc in batch]\n", " embeds = mistral.embeddings.create(model=model, inputs=contents)\n", " for j, document in enumerate(batch):\n", " document['embedding'] = embeds.data[j].embedding\n", " return documents\n", "\n", "embed_model = \"mistral-embed\"\n", "generate_embeddings(transformed_documents, embed_model)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Azure AI Search doesn't allow certain unsafe characters in document keys, so we'll base64-encode `id` here." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import base64\n", "\n", "# Base64 encode IDs for Azure AI Search 
compatibility\n", "def encode_key(key):\n", " return base64.urlsafe_b64encode(key.encode()).decode()\n", "\n", "for document in transformed_documents:\n", " document['id'] = encode_key(document['id'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Upload Documents" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Uploaded 10000 documents in total\n" ] } ], "source": [ "from azure.search.documents import SearchIndexingBufferedSender\n", "\n", "# Upload documents\n", "def upload_documents(index_name, endpoint, credential, documents):\n", " buffered_sender = SearchIndexingBufferedSender(endpoint=endpoint, index_name=index_name, credential=credential)\n", " for document in documents:\n", " buffered_sender.merge_or_upload_documents(documents=[document])\n", " buffered_sender.flush()\n", " print(f\"Uploaded {len(documents)} documents in total\")\n", "\n", "upload_documents(INDEX_NAME, SEARCH_SERVICE_ENDPOINT, azure_search_credential, transformed_documents)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform a Vector Search" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ID: MjMxMC4wNjgyNSMw\n", "Arxiv ID: 2310.06825\n", "Title: Mistral 7B\n", "Score: 0.79391503\n", "Content: 3 2 0 2 t c O 0 1 ] L C . s c [ 1 v 5 2 8 6 0 . 0 1 3 2 : v i X r a # Mistral 7B Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed Abstract We introduce Mistral 7B, a 7â billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms the best open 13B model (Llama 2) across all evaluated benchmarks, and the best released 34B model (Llama 1) in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B â Instruct, that surpasses Llama 2 13B â chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license. Code: https://github.com/mistralai/mistral-src Webpage: https://mistral.ai/news/announcing-mistral-7b/ # Introduction In the rapidly evolving domain of Natural Language Processing (NLP), the race towards higher model performance often necessitates an escalation in model size. However, this scaling tends to increase computational costs and inference latency, thereby raising barriers to deployment in practical, real-world scenarios. In this context, the search for balanced models delivering both high-level performance and efficiency becomes critically essential. 
Our model, Mistral 7B, demonstrates that a carefully designed language model can deliver high performance while maintaining an efficient inference.\n", "--------------------------------------------------\n", "ID: MjMxMC4wNjgyNSMx\n", "Arxiv ID: 2310.06825\n", "Title: Mistral 7B\n", "Score: 0.7921863\n", "Content: Mistral 7B outperforms the previous best 13B model (Llama 2, [26]) across all tested benchmarks, and surpasses the best 34B model (LLaMa 34B, [25]) in mathematics and code generation. Furthermore, Mistral 7B approaches the coding performance of Code-Llama 7B [20], without sacrificing performance on non-code related benchmarks. Mistral 7B leverages grouped-query attention (GQA) [1], and sliding window attention (SWA) [6, 3]. GQA significantly accelerates the inference speed, and also reduces the memory requirement during decoding, allowing for higher batch sizes hence higher throughput, a crucial factor for real-time applications. In addition, SWA is designed to handle longer sequences more effectively at a reduced computational cost, thereby alleviating a common limitation in LLMs. These attention mechanisms collectively contribute to the enhanced performance and efficiency of Mistral 7B. Mistral 7B is released under the Apache 2.0 license. This release is accompanied by a reference implementation1 facilitating easy deployment either locally or on cloud platforms such as AWS, GCP, or Azure using the vLLM [17] inference server and SkyPilot 2. Integration with Hugging Face 3 is also streamlined for easier integration. Moreover, Mistral 7B is crafted for ease of fine-tuning across a myriad of tasks. As a demonstration of its adaptability and superior performance, we present a chat model fine-tuned from Mistral 7B that significantly outperforms the Llama 2 13B â Chat model. Mistral 7B takes a significant step in balancing the goals of getting high performance while keeping large language models efficient. Through our work, our aim is to help the community create more affordable, efficient, and high-performing language models that can be used in a wide range of real-world applications.\n", "--------------------------------------------------\n", "ID: MjQwMS4wNDA4OCMx\n", "Arxiv ID: 2401.04088\n", "Title: Mixtral of Experts\n", "Score: 0.7851202\n", "Content: Code: https://github.com/mistralai/mistral-src Webpage: https://mistral.ai/news/mixtral-of-experts/ # Introduction In this paper, we present Mixtral 8x7B, a sparse mixture of experts model (SMoE) with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B and GPT-3.5 on most benchmarks. As it only uses a subset of its parameters for every token, Mixtral allows faster inference speed at low batch-sizes, and higher throughput at large batch-sizes. Mixtral is a sparse mixture-of-experts network. It is a decoder-only model where the feedforward block picks from a set of 8 distinct groups of parameters. 
At every layer, for every token, a router network chooses two of these groups (the â\n", "--------------------------------------------------\n" ] } ], "source": [ "from azure.search.documents import SearchClient\n", "\n", "search_client = SearchClient(endpoint=SEARCH_SERVICE_ENDPOINT, index_name=INDEX_NAME, credential=azure_search_credential)\n", "from azure.search.documents.models import VectorizedQuery\n", "\n", "# Generate query embedding and perform search\n", "def generate_query_embedding(query):\n", " embed = mistral.embeddings.create(model=\"mistral-embed\", inputs=[query])\n", " return embed.data[0].embedding\n", "\n", "query = \"where is Mistral AI headquartered?\"\n", "vector_query = VectorizedQuery(vector=generate_query_embedding(query), k_nearest_neighbors=3, fields=\"embedding\")\n", "results = search_client.search(search_text=None, vector_queries=[vector_query], select=[\"id\", \"arxiv_id\", \"title\", \"content\"])\n", "\n", "for result in results:\n", " print(f\"ID: {result['id']}\\nArxiv ID: {result['arxiv_id']}\\nTitle: {result['title']}\\nScore: {result['@search.score']}\\nContent: {result['content']}\\n{'-' * 50}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ground retrieved results from Azure AI Search to Mistral-Large LLM" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mistral AI is headquartered in Paris, France.\n" ] } ], "source": [ "from mistralai import Mistral, SystemMessage, UserMessage\n", "\n", "# Initialize the client\n", "client = Mistral(api_key=mistral_api_key)\n", "context = \"\\n---\\n\".join([f\"ID: {result['id']}\\nArxiv ID: {result['arxiv_id']}\\nTitle: {result['title']}\\nScore: {result['@search.score']}\\nContent: {result['content']}\" for result in results])\n", "system_message = SystemMessage(content=\"You are a helpful assistant that answers questions about AI using the context provided below.\\n\\nCONTEXT:\\n\" + context)\n", "user_message = UserMessage(content=\"where is Mistral AI headquartered?\")\n", "\n", "# Generate the response\n", "messages = [system_message, user_message]\n", "chat_response = client.chat.complete(model=\"mistral-large-latest\", messages=messages, max_tokens=50)\n", "\n", "print(chat_response.choices[0].message.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ground Results to Mistral-Large hosted in Azure AI Studio" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mistral AI is headquartered in Paris, France.\n" ] } ], "source": [ "import getpass\n", "from mistralai import Mistral, SystemMessage, UserMessage\n", "\n", "azure_ai_studio_mistral_base_url = os.getenv(\"AZURE_AI_STUDIO_MISTRAL_BASE_URL\") or getpass.getpass(\"Enter your Azure Mistral Deployed Endpoint Base URL: \")\n", "azure_ai_studio_mistral_api_key = os.getenv(\"AZURE_AI_STUDIO_MISTRAL_API_KEY\") or getpass.getpass(\"Enter your Azure Mistral API Key: \")\n", "\n", "# Initialize the client for Azure AI Studio\n", "client = Mistral(endpoint=azure_ai_studio_mistral_base_url, api_key=azure_ai_studio_mistral_api_key)\n", "context = \"\\n---\\n\".join([f\"ID: {result['id']}\\nArxiv ID: {result['arxiv_id']}\\nTitle: {result['title']}\\nScore: {result['@search.score']}\\nContent: {result['content']}\" for result in results])\n", "system_message = SystemMessage(content=\"You are a helpful assistant that answers questions about AI using the 
context provided below.\\n\\nCONTEXT:\\n\" + context)\n", "user_message = UserMessage(content=\"where is Mistral AI headquartered?\")\n", "\n", "# Generate the response\n", "messages = [system_message, user_message]\n", "chat_response = client.chat.complete(model=\"azureai\", messages=messages, max_tokens=50)\n", "\n", "print(chat_response.choices[0].message.content)" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 2 }
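Example
The retrieval and grounding cells above can be folded into one reusable helper. The sketch below is a minimal, illustrative example rather than part of the notebook: it assumes the `mistral_api_key`, `SEARCH_SERVICE_ENDPOINT`, `INDEX_NAME`, and `azure_search_credential` values defined earlier, and the function name `retrieve_and_answer` and parameter `top_k` are hypothetical names introduced here. It only reuses calls already shown in the notebook (`mistral.embeddings.create`, `VectorizedQuery`, `SearchClient.search`, `chat.complete`).
```python
# Minimal RAG helper sketch (assumes the index built above is populated and the
# notebook's credentials/variables are available in scope).
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from mistralai import Mistral, SystemMessage, UserMessage

mistral = Mistral(api_key=mistral_api_key)  # defined earlier in the notebook
search_client = SearchClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    index_name=INDEX_NAME,
    credential=azure_search_credential,
)

def retrieve_and_answer(question: str, top_k: int = 3, model: str = "mistral-large-latest") -> str:
    # 1. Embed the question with the same model used to embed the documents.
    query_embedding = mistral.embeddings.create(
        model="mistral-embed", inputs=[question]
    ).data[0].embedding

    # 2. Run a pure vector query against the "embedding" field of the index.
    vector_query = VectorizedQuery(
        vector=query_embedding, k_nearest_neighbors=top_k, fields="embedding"
    )
    results = search_client.search(
        search_text=None,
        vector_queries=[vector_query],
        select=["id", "arxiv_id", "title", "content"],
    )

    # 3. Build a context block from the retrieved chunks.
    context = "\n---\n".join(
        f"Title: {r['title']}\nContent: {r['content']}" for r in results
    )

    # 4. Ground the chat completion on the retrieved context.
    messages = [
        SystemMessage(
            content="You are a helpful assistant that answers questions about AI "
                    "using the context provided below.\n\nCONTEXT:\n" + context
        ),
        UserMessage(content=question),
    ]
    response = mistral.chat.complete(model=model, messages=messages, max_tokens=256)
    return response.choices[0].message.content

# Example usage:
# print(retrieve_and_answer("What attention mechanisms does Mistral 7B use?"))
```
To target the Mistral deployment in Azure AI Studio instead, the same helper would be initialized with the `endpoint` and API key shown in the final cell and called with `model="azureai"`, as in the notebook.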