Reach out
← Back to Cookbook

RouterQueryEngine

Details

File: third_party/LlamaIndex/RouterQueryEngine.ipynb

Type: Jupyter Notebook

Use Cases: Query Engine

Integrations: Llamaindex

Content

Notebook content (JSON format):

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "56c23751",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/mistralai/cookbook/blob/main/third_party/LlamaIndex/RouterQueryEngine.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02c3900a-6024-42ec-af62-af1001fd4af5",
   "metadata": {},
   "source": [
    "# Router Query Engine with Mistral AI and LlamaIndex\n",
    "\n",
    "A `VectorStoreIndex` is designed to handle queries related to specific contexts, while a `SummaryIndex` is optimized for answering summarization queries. However, in real-world scenarios, user queries may require either context-specific responses or summarizations. To address this, the system must effectively route user queries to the appropriate index to provide relevant answers.\n",
    "\n",
    "In this notebook, we will utilize the `RouterQueryEngine` to direct user queries to the appropriate index based on the query type."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b451e7df-78eb-457e-ae3d-d84af7fef8e3",
   "metadata": {},
   "source": [
    "### Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad1e167f-2332-4be0-844e-0fcb0e97d663",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index\n",
    "!pip install llama-index-llms-mistralai\n",
    "!pip install llama-index-embeddings-mistralai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa002b8e-4a22-4033-ab71-0ade6065d19a",
   "metadata": {},
   "source": [
    "### Setup API Key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e2447175-3cdf-4c41-8846-d678fed72e3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ['MISTRAL_API_KEY'] = 'YOUR MISTRAL API KEY'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71dd4757-05e6-4357-a4de-681634d9a55b",
   "metadata": {},
   "source": [
    "### Set LLM and Embedding Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6572ec1c-bbbc-4d15-86a3-a7ecf72360e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a1ae8238-f53e-4e6c-8448-08c9ab3894a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader, VectorStoreIndex\n",
    "from llama_index.llms.mistralai import MistralAI\n",
    "from llama_index.embeddings.mistralai import MistralAIEmbedding\n",
    "from llama_index.core import Settings\n",
    "\n",
    "from llama_index.core.tools import QueryEngineTool, ToolMetadata\n",
    "from llama_index.core.query_engine.router_query_engine import RouterQueryEngine\n",
    "from llama_index.core.selectors.llm_selectors import LLMSingleSelector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d5ac41d1-2223-4577-a533-cbf1949607c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = MistralAI(model='mistral-large')\n",
    "embed_model = MistralAIEmbedding()\n",
    "\n",
    "Settings.llm = llm\n",
    "Settings.embed_model = embed_model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96d5cbc1-d469-4872-ab79-049ff5234819",
   "metadata": {},
   "source": [
    "### Download Data\n",
    "\n",
    "We will use `Uber 10K SEC Filings`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6f2f70e2-7a18-417e-82c0-158c784a265e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-03-31 00:24:17--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 1880483 (1.8M) [application/octet-stream]\n",
      "Saving to: ‘data/10k/uber_2021.pdf’\n",
      "\n",
      "data/10k/uber_2021. 100%[===================>]   1.79M  --.-KB/s    in 0.05s   \n",
      "\n",
      "2024-03-31 00:24:17 (38.5 MB/s) - ‘data/10k/uber_2021.pdf’ saved [1880483/1880483]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/10k/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bde9293-51cf-4406-b628-a97891bc281b",
   "metadata": {},
   "source": [
    "### Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "7f93584f-b19d-4595-a2e6-f6124be7ff2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "uber_docs = SimpleDirectoryReader(input_files=[\"./data/10k/uber_2021.pdf\"]).load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "459d163f-f15f-4363-a657-857e288093eb",
   "metadata": {},
   "source": [
    "### Index and Query Engine creation\n",
    "\n",
    " 1. VectorStoreIndex -> Specific context queries\n",
    " 2. SummaryIndex -> Summarization queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "69dbbbdc-f0f3-408a-9a2c-dd65a6db9cb4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "uber_vector_index = VectorStoreIndex.from_documents(uber_docs)\n",
    "\n",
    "uber_summary_index = VectorStoreIndex.from_documents(uber_docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "14bb6f0b-5ccd-4ed0-8227-6d488d4e5adf",
   "metadata": {},
   "outputs": [],
   "source": [
    "uber_vector_query_engine = uber_vector_index.as_query_engine(similarity_top_k = 5)\n",
    "uber_summary_query_engine = uber_summary_index.as_query_engine()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8df11259-e0ab-496d-8b0b-cb26adfb6ef4",
   "metadata": {},
   "source": [
    "### Create Tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "8ffe1664-bc36-416a-a1b0-ded7d4468508",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine_tools = [\n",
    "    QueryEngineTool(\n",
    "        query_engine=uber_vector_query_engine,\n",
    "        metadata=ToolMetadata(\n",
    "            name=\"vector_engine\",\n",
    "            description=(\n",
    "                \"Provides information about Uber financials for year 2021.\"\n",
    "            ),\n",
    "        ),\n",
    "    ),\n",
    "    QueryEngineTool(\n",
    "        query_engine=uber_summary_query_engine,\n",
    "        metadata=ToolMetadata(\n",
    "            name=\"summary_engine\",\n",
    "            description=(\n",
    "                \"Provides Summary about Uber financials for year 2021.\"\n",
    "            ),\n",
    "        ),\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65f8b9d0-b1c1-4bf4-8f78-1d67d984bc7c",
   "metadata": {},
   "source": [
    "### Create Router Query Engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "1ae0a39e-c14b-4c90-a443-8ad80e5ccd71",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = RouterQueryEngine(\n",
    "    selector=LLMSingleSelector.from_defaults(),\n",
    "    query_engine_tools=query_engine_tools,\n",
    "    verbose = True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d382576-949d-43ce-8572-7d78e8215602",
   "metadata": {},
   "source": [
    "### Querying"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1109d5ec-48fd-4457-ab1a-64948fa8d913",
   "metadata": {},
   "source": [
    "#### Summarization Query\n",
    "\n",
    "You can see that it uses `SummaryIndex` to provide answer to the summarization query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "186cc0d1-5c2f-422d-9324-749306a049aa",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1;3;38;5;200mSelecting query engine 1: This choice specifically mentions a 'summary' of Uber's financials for the year 2021, which directly aligns with the question asked..\n",
      "\u001b[0mIn 2021, Uber's Gross Bookings increased by $32.5 billion, a 56% increase compared to 2020. This growth was driven by a 66% increase in Delivery Gross Bookings due to higher demand for food delivery and larger order sizes, as well as expansion in U.S. and international markets. Mobility Gross Bookings also grew by 36% due to increased trip volumes as the business recovered from COVID-19 impacts.\n",
      "\n",
      "Uber's revenue for the year was $17.5 billion, a 57% increase from the previous year. This growth was attributed to the overall expansion of the Delivery business and an increase in Freight revenue due to the acquisition of Transplace in the fourth quarter of 2021.\n",
      "\n",
      "The net loss attributable to Uber Technologies, Inc. was $496 million, a 93% improvement from the previous year. This improvement was driven by a $1.6 billion pre-tax gain on the sale of the ATG Business to Aurora, a $1.6 billion pre-tax net benefit related to Uber’s equity investments, as well as reductions in fixed costs and increased variable cost efficiencies. The net loss also included $1.2 billion in stock-based compensation expense.\n",
      "\n",
      "Adjusted EBITDA loss was $774 million, an improvement of $1.8 billion from 2020. Mobility Adjusted EBITDA profit was $1.6 billion, and Delivery Adjusted EBITDA loss was $348 million, an improvement of $525 million from the previous year.\n",
      "\n",
      "Uber ended the year with $4.3 billion in cash and cash equivalents. The company also completed several acquisitions in 2021, including the remaining 45% ownership interest in Cornershop and 100% ownership interest in Drizly, an on-demand alcohol marketplace in North America.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What is the summary of the Uber Financials in 2021?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3806f9b-ada8-43d0-8276-4b0a2a1c8835",
   "metadata": {},
   "source": [
    "#### Specific Context Query\n",
    "\n",
    "You can see it uses `VectorStoreIndex` to answer specific context type query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "dd6c321b-2c97-4701-82a1-53455307adc2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1;3;38;5;200mSelecting query engine 0: This choice is more likely to contain detailed financial information about Uber in 2021, including revenue..\n",
      "\u001b[0mThe revenue of Uber in 2021 was $17,455 million.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What is the the revenue of Uber in 2021?\")\n",
    "print(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3 (main, Apr  7 2023, 19:08:44) [Clang 13.0.0 (clang-1300.0.29.30)]"
  },
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}