← Back to Cookbook
llamaindex agentic rag
Details
File: third_party/LlamaIndex/llamaindex_agentic_rag.ipynb
Type: Jupyter Notebook
Use Cases: Agents, RAG
Integrations: Llamaindex
Content
Notebook content (JSON format):
{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "fsW90qFYn_yI" }, "source": [ "# Using Mistral AI with LlamaIndex\n", "\n", "<a href=\"https://colab.research.google.com/github/mistralai/cookbook/blob/main/llamaindex_agentic_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n", "\n", "In this notebook we're going to show how you can use LlamaIndex with the Mistral API to perform complex queries over multiple documents including answering questions that require multiple documents simultaneously. We'll do this using a ReAct agent, an autonomous LLM-powered agent capable of using tools." ] }, { "cell_type": "markdown", "metadata": { "id": "BQfBH4rIoKkn" }, "source": [ "First we install our dependencies. We need LlamaIndex, Mistral, and a PDF parser for later." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "X_SUqRATWdJJ", "outputId": "9b866579-6c9b-480a-cf27-b9d3d8988c6b", "tags": [] }, "outputs": [], "source": [ "!pip install llama-index-core \n", "!pip install llama-index-embeddings-mistralai\n", "!pip install llama-index-llms-mistralai\n", "!pip install llama-index-readers-file\n", "!pip install mistralai pypdf" ] }, { "cell_type": "markdown", "metadata": { "id": "PlgXfmf5oVCF" }, "source": [ "Now we set up our connection to Mistral. We need two things:\n", "1. An LLM, to answer questions\n", "2. An embedding model, to convert our data into vectors for retrieval by our index.\n", "Luckily, Mistral provides both!\n", "\n", "Once we have them, we put them into a ServiceContext, an object LlamaIndex uses to pass configuration around." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "GSJk-PHwy8M-" }, "outputs": [], "source": [ "from llama_index.llms.mistralai import MistralAI\n", "from llama_index.embeddings.mistralai import MistralAIEmbedding\n", "from llama_index.core.settings import Settings\n", "\n", "api_key = \"\"\n", "llm = MistralAI(api_key=api_key,model=\"mistral-large-latest\")\n", "embed_model = MistralAIEmbedding(model_name='mistral-embed', api_key=api_key)\n", "\n", "Settings.llm = llm\n", "Settings.embed_model = embed_model" ] }, { "cell_type": "markdown", "metadata": { "id": "JV8mXq7toZxm" }, "source": [ "Now let's download our dataset, 3 very large PDFs containing Lyft's annual reports from 2020-2022." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "VE8fEvO6z3Hs" }, "outputs": [], "source": [ "!wget \"https://www.dropbox.com/scl/fi/ywc29qvt66s8i97h1taci/lyft-10k-2020.pdf?rlkey=d7bru2jno7398imeirn09fey5&dl=0\" -q -O ./lyft_10k_2020.pdf\n", "!wget \"https://www.dropbox.com/scl/fi/lpmmki7a9a14s1l5ef7ep/lyft-10k-2021.pdf?rlkey=ud5cwlfotrii6r5jjag1o3hvm&dl=0\" -q -O ./lyft_10k_2021.pdf\n", "!wget \"https://www.dropbox.com/scl/fi/iffbbnbw9h7shqnnot5es/lyft-10k-2022.pdf?rlkey=grkdgxcrib60oegtp4jn8hpl8&dl=0\" -q -O ./lyft_10k_2022.pdf" ] }, { "cell_type": "markdown", "metadata": { "id": "LhdXst1Mogb2" }, "source": [ "Now we have our data, we're going to do three things:\n", "1. Load the PDF data into memory. It will be parsed into text as we do this. That's the `load_data()` line.\n", "2. Index the data. This will create a vector representation of each document. That's the `from_documents()` line. It stores the vectors in memory.\n", "3. Set up a query engine to retrieve information from the vector store and pass it to the LLM. That's the `as_query_engine()` line.\n", "\n", "We're going to do this once for each of the three documents. If we had more than 3 we would do this programmatically with a loop, but this keeps the code very simple if a little repetitive. We've included a query to one of the indexes at the end as a test." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2cmuU8kH0gQu", "outputId": "93c5216e-d061-4c47-bc6b-8dfbf2dcccb5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lyft did not make a profit in 2022. Instead, they incurred a net loss of $1,584,511.\n" ] } ], "source": [ "from llama_index.core import SimpleDirectoryReader, VectorStoreIndex\n", "\n", "lyft_2020_docs = SimpleDirectoryReader(input_files=[\"./lyft_10k_2020.pdf\"]).load_data()\n", "lyft_2020_index = VectorStoreIndex.from_documents(lyft_2020_docs)\n", "lyft_2020_engine = lyft_2020_index.as_query_engine()\n", "\n", "lyft_2021_docs = SimpleDirectoryReader(input_files=[\"./lyft_10k_2021.pdf\"]).load_data()\n", "lyft_2021_index = VectorStoreIndex.from_documents(lyft_2021_docs)\n", "lyft_2021_engine = lyft_2021_index.as_query_engine()\n", "\n", "lyft_2022_docs = SimpleDirectoryReader(input_files=[\"./lyft_10k_2022.pdf\"]).load_data()\n", "lyft_2022_index = VectorStoreIndex.from_documents(lyft_2022_docs)\n", "lyft_2022_engine = lyft_2022_index.as_query_engine()\n", "\n", "response = lyft_2022_engine.query(\"What was Lyft's profit in 2022?\")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "bU6k9BFTouYF" }, "source": [ "Success! The 2022 index knows facts about 2022. We're almost ready to create our agent. Before we do, let's set up an array of tools for our agent to use. This turns each of the query engines we set up above into a tool, and indicates what each engine is best at answering questions about. The LLM will read these descriptions when deciding what tool to use." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "agpU5WLYxKcZ" }, "outputs": [], "source": [ "from llama_index.core.tools import QueryEngineTool, ToolMetadata\n", "\n", "query_engine_tools = [\n", " QueryEngineTool(\n", " query_engine=lyft_2020_engine,\n", " metadata=ToolMetadata(\n", " name=\"lyft_2020_10k_form\",\n", " description=\"Annual report of Lyft's financial activities in 2020\",\n", " ),\n", " ),\n", " QueryEngineTool(\n", " query_engine=lyft_2021_engine,\n", " metadata=ToolMetadata(\n", " name=\"lyft_2021_10k_form\",\n", " description=\"Annual report of Lyft's financial activities in 2021\",\n", " ),\n", " ),\n", " QueryEngineTool(\n", " query_engine=lyft_2022_engine,\n", " metadata=ToolMetadata(\n", " name=\"lyft_2022_10k_form\",\n", " description=\"Annual report of Lyft's financial activities in 2022\",\n", " ),\n", " ),\n", "]\n" ] }, { "cell_type": "markdown", "metadata": { "id": "ZufzTJKoo7RX" }, "source": [ "Now we create our agent from the tools we've set up and we can ask it complicated questions. It will reason through the process step by step, creating simpler questions, and use different tools to answer them. Then it'll take the information it gathers from each tool and combine it into a single answer to the more complex question." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;3;38;5;200mThought: The current language of the user is: English. I need to use a tool to help me answer the question.\n", "Action: lyft_2022_10k_form\n", "Action Input: {'input': 'risk factors'}\n", "\u001b[0m\u001b[1;3;34mObservation: The company faces a variety of risk factors that could adversely affect its business, financial condition, and results of operations. These risks include natural disasters such as earthquakes, fires, hurricanes, tornadoes, floods, or significant power outages, which could disrupt operations, mobile networks, the Internet, or the operations of third-party technology providers. The impact of climate change may increase these risks.\n", "\n", "In addition, public health crises such as the COVID-19 pandemic, other epidemics, political crises like terrorist attacks, war, and other political or social instability, and other geopolitical developments, could also adversely affect operations or the economy as a whole. The company has offices and employees in regions like Belarus and Ukraine that have been and may continue to be adversely affected by current wars in the region, including displacement of employees.\n", "\n", "The company's limited operating history and evolving business make it difficult to evaluate future prospects and the risks and challenges that may be encountered. The company regularly expands its platform features, offerings, and services, and changes its pricing methodologies. The company's evolving business, industry, and markets make it difficult to evaluate future prospects and the risks and challenges that may be encountered.\n", "\n", "The company's business operations are also subject to numerous risks, factors, and uncertainties, including those outside of its control. These include general macroeconomic conditions, competition in the industry, the ability to attract and retain qualified drivers and riders, changes to pricing practices, and the ability to manage growth.\n", "\n", "The company's reliance on third parties, such as Amazon Web Services, vehicle rental partners, payment processors, and other service providers, also poses a risk. The development of new offerings on the platform and the management of the complexities of such expansion, as well as the ability to offer high-quality user support and deal with fraud, are additional risk factors.\n", "\n", "If the company fails to address these risks and difficulties, its business, financial condition, and results of operations could be adversely affected.\n", "\u001b[0m\u001b[1;3;38;5;200mThought: The current language of the user is: English. I have already used the tool 'lyft_2022_10k_form' and have enough information to answer the question without using any more tools.\n", "Answer: The risk factors for the company in 2022 include natural disasters such as earthquakes, fires, hurricanes, tornadoes, floods, or significant power outages, which could disrupt operations, mobile networks, the Internet, or the operations of third-party technology providers. The impact of climate change may increase these risks. Public health crises such as the COVID-19 pandemic, other epidemics, political crises like terrorist attacks, war, and other political or social instability, and other geopolitical developments, could also adversely affect operations or the economy as a whole. The company has offices and employees in regions like Belarus and Ukraine that have been and may continue to be adversely affected by current wars in the region, including displacement of employees. The company's limited operating history and evolving business make it difficult to evaluate future prospects and the risks and challenges that may be encountered. The company regularly expands its platform features, offerings, and services, and changes its pricing methodologies. The company's evolving business, industry, and markets make it difficult to evaluate future prospects and the risks and challenges that may be encountered. The company's business operations are also subject to numerous risks, factors, and uncertainties, including those outside of its control. These include general macroeconomic conditions, competition in the industry, the ability to attract and retain qualified drivers and riders, changes to pricing practices, and the ability to manage growth. The company's reliance on third parties, such as Amazon Web Services, vehicle rental partners, payment processors, and other service providers, also poses a risk. The development of new offerings on the platform and the management of the complexities of such expansion, as well as the ability to offer high-quality user support and deal with fraud, are additional risk factors. If the company fails to address these risks and difficulties, its business, financial condition, and results of operations could be adversely affected.\n", "\u001b[0mThe risk factors for the company in 2022 include natural disasters such as earthquakes, fires, hurricanes, tornadoes, floods, or significant power outages, which could disrupt operations, mobile networks, the Internet, or the operations of third-party technology providers. The impact of climate change may increase these risks. Public health crises such as the COVID-19 pandemic, other epidemics, political crises like terrorist attacks, war, and other political or social instability, and other geopolitical developments, could also adversely affect operations or the economy as a whole. The company has offices and employees in regions like Belarus and Ukraine that have been and may continue to be adversely affected by current wars in the region, including displacement of employees. The company's limited operating history and evolving business make it difficult to evaluate future prospects and the risks and challenges that may be encountered. The company regularly expands its platform features, offerings, and services, and changes its pricing methodologies. The company's evolving business, industry, and markets make it difficult to evaluate future prospects and the risks and challenges that may be encountered. The company's business operations are also subject to numerous risks, factors, and uncertainties, including those outside of its control. These include general macroeconomic conditions, competition in the industry, the ability to attract and retain qualified drivers and riders, changes to pricing practices, and the ability to manage growth. The company's reliance on third parties, such as Amazon Web Services, vehicle rental partners, payment processors, and other service providers, also poses a risk. The development of new offerings on the platform and the management of the complexities of such expansion, as well as the ability to offer high-quality user support and deal with fraud, are additional risk factors. If the company fails to address these risks and difficulties, its business, financial condition, and results of operations could be adversely affected.\n" ] } ], "source": [ "from llama_index.core.agent import ReActAgent\n", "\n", "lyft_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)\n", "response = lyft_agent.chat(\"What are the risk factors in 2022?\")\n", "print(response)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XUWloxrl2_-K", "outputId": "d2c0e7e1-ace7-4f20-85ea-040b4e5bf57d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;3;38;5;200mThought: The current language of the user is: English. I need to use a tool to help me answer the question. First, I will find out Lyft's profit in 2020.\n", "Action: lyft_2020_10k_form\n", "Action Input: {'input': 'profit'}\n", "\u001b[0m\u001b[1;3;34mObservation: The provided information does not indicate a profit for the periods mentioned. Instead, it shows a net loss for each of the given periods, with the percentage of net loss ranging from 34.9% to 146.7%. The company uses measures such as Contribution and Contribution Margin to evaluate its operating performance and trends, aiming to achieve profitability and increase it over time. However, based on the given data, the company has not yet achieved profitability.\n", "\u001b[0m\u001b[1;3;38;5;200mThought: The current language of the user is: English. I need to use a tool to help me answer the question. First, I will find out Lyft's profit in 2022.\n", "Action: lyft_2022_10k_form\n", "Action Input: {'input': 'profit'}\n", "\u001b[0m\u001b[1;3;34mObservation: The company's gross profit for the year ended December 31, 2022, was $1,659.4 million, with a gross profit margin of 40.5%. After adjusting for certain items such as amortization of intangible assets, stock-based compensation expense, payroll tax expense related to stock-based compensation, net amount from claims ceded under the Reinsurance Agreement, transactions related to certain legacy auto insurance liabilities, and restructuring charges, the company's Contribution was $1,729.8 million, with a Contribution Margin of 42.2%.\n", "\u001b[0m\u001b[1;3;38;5;200mThought: The current language of the user is: English. Based on the information provided, Lyft did not achieve profitability in 2022. Instead, the company reported a gross profit of $1,659.4 million and a Contribution of $1,729.8 million.\n", "Answer: Lyft did not achieve profitability in 2022. Instead, the company reported a gross profit of $1,659.4 million and a Contribution of $1,729.8 million.\n", "\u001b[0mLyft did not achieve profitability in 2022. Instead, the company reported a gross profit of $1,659.4 million and a Contribution of $1,729.8 million.\n" ] } ], "source": [ "from llama_index.core.agent import ReActAgent\n", "\n", "lyft_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)\n", "response = lyft_agent.chat(\"What is Lyft's profit in 2022 vs 2020? Generate only one step at a time. Use existing tools.\")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "ofYcUSVEpFS4" }, "source": [ "Cool! As you can see it got the 2022 profit from the 2022 10-K form and the 2020 data from the 2020 report. It took both those answers and combined them into the difference we asked for. Let's try another question, this time asking about textual answers rather than numbers:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JkIZf5Hi3YkI", "outputId": "6438ae66-c457-4359-af8d-26b85c96e580" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;3;38;5;200mThought: The current language of the user is: English. I need to use a tool to help me answer the question. I will first retrieve the information about Lyft's R&D activities in 2022.\n", "Action: lyft_2022_10k_form\n", "Action Input: {'input': 'R&D'}\n", "\u001b[0m\u001b[1;3;34mObservation: Research and Development expenses for the year ended December 31, 2022, were $856,777, a decrease of $55.2 million, or 6%, compared to the prior year. This decrease was primarily due to a reduction in personnel-related costs and stock-based compensation, which were driven by reduced headcount following a transaction with Woven Planet in the third quarter of 2021. There were also reductions in Level 5 costs, web hosting fees, and autonomous vehicle research costs. However, these decreases were offset by restructuring costs related to an event in the fourth quarter of 2022, which included impairment costs of operating lease right-of-use assets, severance and benefits costs, and stock-based compensation.\n", "\u001b[0m\u001b[1;3;38;5;200mThought: The current language of the user is: English. I have retrieved the information about Lyft's R&D activities in 2022. Now, I need to retrieve the information about Lyft's R&D activities in 2021.\n", "Action: lyft_2021_10k_form\n", "Action Input: {'input': 'R&D'}\n", "\u001b[0m\u001b[1;3;34mObservation: Research and Development expenses for the company primarily consist of personnel-related compensation costs and facilities costs. These expenses also include costs related to autonomous vehicle technology initiatives. The company expenses these costs as they are incurred. There was a transaction completed with Woven Planet, a subsidiary of Toyota Motor Corporation, on July 13, 2021, which involved the divestiture of certain assets related to the company's self-driving vehicle division. As a result, certain costs related to the prior initiative to develop self-driving systems were eliminated beginning in the third quarter of 2021.\n", "\u001b[0m\u001b[1;3;38;5;200mThought: The current language of the user is: English. I have retrieved the information about Lyft's R&D activities in 2021. Now, I can compare the R&D activities in 2022 and 2021.\n", "Answer: In 2022, Lyft's R&D expenses decreased by 6% compared to the prior year, primarily due to a reduction in personnel-related costs and stock-based compensation, which were driven by reduced headcount following a transaction with Woven Planet in the third quarter of 2021. There were also reductions in Level 5 costs, web hosting fees, and autonomous vehicle research costs. However, these decreases were offset by restructuring costs related to an event in the fourth quarter of 2022, which included impairment costs of operating lease right-of-use assets, severance and benefits costs, and stock-based compensation. In 2021, Lyft's R&D expenses primarily consisted of personnel-related compensation costs and facilities costs. These expenses also included costs related to autonomous vehicle technology initiatives. The company expenses these costs as they are incurred. There was a transaction completed with Woven Planet, a subsidiary of Toyota Motor Corporation, on July 13, 2021, which involved the divestiture of certain assets related to the company's self-driving vehicle division. As a result, certain costs related to the prior initiative to develop self-driving systems were eliminated beginning in the third quarter of 2021.\n", "\u001b[0mIn 2022, Lyft's R&D expenses decreased by 6% compared to the prior year, primarily due to a reduction in personnel-related costs and stock-based compensation, which were driven by reduced headcount following a transaction with Woven Planet in the third quarter of 2021. There were also reductions in Level 5 costs, web hosting fees, and autonomous vehicle research costs. However, these decreases were offset by restructuring costs related to an event in the fourth quarter of 2022, which included impairment costs of operating lease right-of-use assets, severance and benefits costs, and stock-based compensation. In 2021, Lyft's R&D expenses primarily consisted of personnel-related compensation costs and facilities costs. These expenses also included costs related to autonomous vehicle technology initiatives. The company expenses these costs as they are incurred. There was a transaction completed with Woven Planet, a subsidiary of Toyota Motor Corporation, on July 13, 2021, which involved the divestiture of certain assets related to the company's self-driving vehicle division. As a result, certain costs related to the prior initiative to develop self-driving systems were eliminated beginning in the third quarter of 2021.\n" ] } ], "source": [ "from llama_index.core.agent import ReActAgent\n", "\n", "lyft_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)\n", "response = lyft_agent.chat(\"What did Lyft do in R&D in 2022 versus 2021? Generate only one step at a time. Use existing tools.\")\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "2NznaneKpKUs" }, "source": [ "Great! It correctly itemized the risks, noticed the differences, and summarized them.\n", "\n", "You can try this on any number of documents with any number of query engines to answer really complex questions. You can even have the query engines themselves be agents." ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "mistral_cookbook", "language": "python", "name": "mistral_cookbook" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 4 }