ollama mistral llamaindex
Details
File: third_party/LlamaIndex/ollama_mistral_llamaindex.ipynb
Type: Jupyter Notebook
Use Cases:
Integrations: Llamaindex, Ollama
Content
Notebook content (JSON format):
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# RAG Pipeline with Ollama, Mistral and LlamaIndex\n", "\n", "In this notebook, we will demonstrate how to build a RAG pipeline using Ollama, Mistral models, and LlamaIndex. The following topics will be covered:\n", "\n", "1.\tIntegrating Mistral with Ollama and LlamaIndex.\n", "2.\tImplementing RAG with Ollama and LlamaIndex using the Mistral model.\n", "3.\tRouting queries with RouterQueryEngine.\n", "4.\tHandling complex queries with SubQuestionQueryEngine.\n", "\n", "Before running this notebook, you need to set up Ollama. Please follow the instructions [here](https://ollama.com/library/mistral:instruct)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import nest_asyncio\n", "\n", "nest_asyncio.apply()\n", "\n", "from IPython.display import display, HTML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup LLM" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from llama_index.llms.ollama import Ollama\n", "\n", "llm = Ollama(model=\"mistral:instruct\", request_timeout=60.0)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "\n", "from llama_index.core.llms import ChatMessage\n", "\n", "messages = [\n", " ChatMessage(role=\"system\", content=\"You are a helpful assistant.\"),\n", " ChatMessage(role=\"user\", content=\"What is the capital city of France?\"),\n", "]\n", "response = llm.chat(messages)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<p style=\"font-size:20px\">assistant: The capital city of France is Paris. It is located in the north-central part of the country and is one of the most populous cities in Europe. 
Paris is famous for its iconic landmarks such as the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and the Champs-Élysées. The city is also known for its rich history, art, culture, fashion, and cuisine.</p>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(HTML(f'<p style=\"font-size:20px\">{response}</p>'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup Embedding Model" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/ravithejad/Desktop/llamaindex/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "/Users/ravithejad/Desktop/llamaindex/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n", " warnings.warn(\n" ] } ], "source": [ "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n", "embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import Settings\n", "\n", "Settings.llm = llm\n", "Settings.embed_model = embed_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download Data\n", "\n", "We will use the Uber and Lyft 10-K SEC filings for 2021 for the demonstration."
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zsh:1: command not found: wget\n", "zsh:1: command not found: wget\n" ] } ], "source": [ "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'\n", "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O './lyft_2021.pdf'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import SimpleDirectoryReader\n", "\n", "uber_docs = SimpleDirectoryReader(input_files=[\"./uber_2021.pdf\"]).load_data()\n", "lyft_docs = SimpleDirectoryReader(input_files=[\"./lyft_2021.pdf\"]).load_data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Index and Query Engines" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import VectorStoreIndex\n", "from llama_index.core import SummaryIndex\n", "\n", "uber_vector_index = VectorStoreIndex.from_documents(uber_docs)\n", "uber_vector_query_engine = uber_vector_index.as_query_engine(similarity_top_k=2)\n", "\n", "lyft_vector_index = VectorStoreIndex.from_documents(lyft_docs)\n", "lyft_vector_query_engine = lyft_vector_index.as_query_engine(similarity_top_k=2)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<p style=\"font-size:20px\"> The revenue of Uber in 2021 was $17,455 million.</p>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "response = uber_vector_query_engine.query(\"What is the revenue of uber in 2021 in millions?\")\n", 
"display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<p style=\"font-size:20px\"> In the provided context, it can be found that the revenue for Lyft in the year ended December 31, 2021 was approximately $3,208 million (or $3,208,323 thousand).</p>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "response = lyft_vector_query_engine.query(\"What is the revenue of lyft in 2021 in millions?\")\n", "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## RouterQueryEngine\n", "\n", "We will utilize the `RouterQueryEngine` to direct user queries to the appropriate index based on the query related to either Uber/ Lyft." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create QueryEngine tools" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from llama_index.core.tools import QueryEngineTool, ToolMetadata\n", "from llama_index.core.query_engine.router_query_engine import RouterQueryEngine\n", "from llama_index.core.selectors.llm_selectors import LLMSingleSelector\n", "\n", "query_engine_tools = [\n", " QueryEngineTool(\n", " query_engine=lyft_vector_query_engine,\n", " metadata=ToolMetadata(\n", " name=\"vector_lyft_10k\",\n", " description=\"Provides information about Lyft financials for year 2021\",\n", " ),\n", " ),\n", " QueryEngineTool(\n", " query_engine=uber_vector_query_engine,\n", " metadata=ToolMetadata(\n", " name=\"vector_uber_10k\",\n", " description=\"Provides information about Uber financials for year 2021\",\n", " ),\n", " ),\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create `RouterQueryEnine`" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ 
"\n", "query_engine = RouterQueryEngine(\n", " selector=LLMSingleSelector.from_defaults(),\n", " query_engine_tools=query_engine_tools,\n", " verbose = True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;3;38;5;200mSelecting query engine 1: The provided choices are summaries of financial reports for the year 2021. Investments made by a company are not typically included in a financial report. However, the financial report may provide information about investments through capital expenditures or acquisition costs. As such, to find out about Uber's investments, one would need to look at a separate report or section that discusses these topics..\n", "\u001b[0m" ] }, { "data": { "text/html": [ "<p style=\"font-size:20px\"> Uber invests in a variety of financial instruments, as evidenced by the references to investments and equity method investments in the provided context. However, specific details about the nature or type of these investments are not directly disclosed in the information you've given. To determine the exact nature of these investments, one would need to review more detailed sections of the financial statements, such as Note 8 - Investments.</p>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "response = query_engine.query(\"What are the investments made by Uber?\")\n", "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;3;38;5;200mSelecting query engine 0: The given choices do not provide information about investments made by Lyft in 2021. 
They only provide information about the financials of Lyft and Uber for the year 2021..\n", "\u001b[0m" ] } ], "source": [ "response = query_engine.query(\"What are the investments made by the Lyft in 2021?\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<p style=\"font-size:20px\"> The context provided indicates that Lyft invested in various areas in 2021. These include developing and launching new offerings and platform features, expanding in existing and new markets, continued investment in their platform and customer engagement, efforts to mitigate the impact of the COVID-19 pandemic, expansion of asset-intensive offerings such as Light Vehicles, Flexdrive, Lyft Rentals, Lyft Auto Care, and Driver Hubs, driver-centric service centers, and community spaces. They also expanded environmental programs, specifically their commitment to 100% EVs on their platform by the end of 2030. These offerings and programs require significant capital investments and recurring costs.</p>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SubQuestionQueryEngine\n", "\n", "We will explore how the `SubQuestionQueryEngine` can be leveraged to tackle complex queries by generating and addressing sub-queries." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create `SubQuestionQueryEngine`" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "from llama_index.core.query_engine import SubQuestionQueryEngine\n", "\n", "sub_question_query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools,\n", " verbose=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generated 2 sub questions.\n", "\u001b[1;3;38;2;237;90;200m[vector_uber_10k] Q: What is the revenue of Uber for year 2021\n", "\u001b[0m\u001b[1;3;38;2;90;149;237m[vector_lyft_10k] Q: What is the revenue of Lyft for year 2021\n", "\u001b[0m\u001b[1;3;38;2;90;149;237m[vector_lyft_10k] A: In the provided context, the revenue for Lyft in the year 2021 is $3,208,323 thousand. This can be found on page 79 (file_path: lyft_2021.pdf).\n", "\u001b[0m\u001b[1;3;38;2;237;90;200m[vector_uber_10k] A: The revenue for Uber in the year 2021, as per the provided context, was $17,455 million.\n", "\u001b[0m" ] }, { "data": { "text/html": [ "<p style=\"font-size:20px\"> In the year 2021, the revenue for Uber was $17,455 million and for Lyft it was $3,208,323 thousand. To compare, you can convert both revenues to the same units. In this case, since Uber's revenue is in millions, we can convert Lyft's revenue by dividing it by 1,000. Therefore, the comparable revenue for Lyft would be $3,208,323 / 1,000 = $3,208.323 million. 
So, Uber had a higher revenue than Lyft in 2021.</p>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "response = sub_question_query_engine.query(\"Compare the revenues of Uber and Lyft in 2021?\")\n", "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generated 2 sub questions.\n", "\u001b[1;3;38;2;237;90;200m[vector_uber_10k] Q: What investments were made by Uber in year 2021\n", "\u001b[0m\u001b[1;3;38;2;90;149;237m[vector_lyft_10k] Q: What investments were made by Lyft in year 2021\n", "\u001b[0m\u001b[1;3;38;2;237;90;200m[vector_uber_10k] A: In year 2021, Uber made purchases of non-marketable equity securities for 982 million dollars as per the provided cash flow statement. However, it is important to note that the information does not specify the details or names of the specific investments made.\n", "\u001b[0m\u001b[1;3;38;2;90;149;237m[vector_lyft_10k] A: Based on the provided context, it appears that in the year 2021, Lyft made significant investments in several areas. These include developing and launching new offerings and platform features, expanding in existing and new markets, investing in their platform and customer engagement, investing in environmental programs such as their commitment to 100% EVs on their platform by the end of 2030, and expanding support services for drivers through Driver Hubs, Driver Centers, Mobile Services, Lyft AutoCare, and the Express Drive vehicle rental program. Additionally, they have invested in asset-intensive offerings like their network of Light Vehicles, Flexdrive, Lyft Rentals, and Lyft Auto Care. 
Furthermore, they have incurred and will continue to incur costs associated with Proposition 22 in California and the COVID-19 pandemic.\n", "\u001b[0m" ] }, { "data": { "text/html": [ "<p style=\"font-size:20px\"> In the year 2021, Uber made purchases of non-marketable equity securities for 982 million dollars as previously mentioned. On the other hand, Lyft invested in various areas such as developing and launching new offerings and platform features, expanding in existing and new markets, investing in their platform and customer engagement, environmental programs like a commitment to 100% EVs on their platform by the end of 2030, and support services for drivers through Driver Hubs, Driver Centers, Mobile Services, Lyft AutoCare, Express Drive vehicle rental program. Furthermore, Lyft has invested in asset-intensive offerings like their network of Light Vehicles, Flexdrive, Lyft Rentals, and Lyft Auto Care. Additionally, they have incurred and will continue to incur costs associated with Proposition 22 in California and the COVID-19 pandemic.</p>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "response = sub_question_query_engine.query(\"What are the investments made by Uber and Lyft in 2021?\")\n", "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))" ] } ], "metadata": { "kernelspec": { "display_name": "llamaindex", "language": "python", "name": "llamaindex" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 2 }
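
Note on the Download Data cell: its recorded output is `zsh:1: command not found: wget`, so on the machine that produced this notebook the `!wget` lines did not fetch the two PDFs. The following is a minimal sketch of one possible standard-library alternative that needs no external command. The URLs are copied from the notebook; the helper names `download` and `fetch_filings` are hypothetical and are not part of the notebook or of LlamaIndex.

```python
import shutil
import urllib.request
from pathlib import Path

# URLs copied from the notebook's download cell.
BASE_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/"
    "main/docs/docs/examples/data/10k"
)
FILINGS = {
    "uber_2021.pdf": f"{BASE_URL}/uber_2021.pdf",
    "lyft_2021.pdf": f"{BASE_URL}/lyft_2021.pdf",
}


def download(url: str, dest: Path) -> Path:
    """Stream the resource at `url` into `dest` (hypothetical helper)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so the whole PDF is never held in memory at once.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def fetch_filings(dest_dir: str = ".") -> list:
    """Download both 10-K filings into `dest_dir` (hypothetical helper)."""
    target = Path(dest_dir)
    return [download(url, target / name) for name, url in FILINGS.items()]
```

Calling `fetch_filings()` in a cell before the Load Data step should leave `./uber_2021.pdf` and `./lyft_2021.pdf` where `SimpleDirectoryReader` expects them, assuming the machine has network access.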
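
Note on the revenue comparison: the `SubQuestionQueryEngine` answer reports Uber's revenue in millions and Lyft's in thousands, then converts the Lyft figure before comparing. The sketch below only re-checks that arithmetic. The two figures are copied from the recorded outputs in the notebook; the function name and the ratio are illustrative additions, not something the notebook computes.

```python
from decimal import Decimal

# Figures as reported in the notebook's recorded outputs.
UBER_REVENUE_MILLIONS = Decimal("17455")     # $17,455 million
LYFT_REVENUE_THOUSANDS = Decimal("3208323")  # $3,208,323 thousand


def thousands_to_millions(amount_thousands: Decimal) -> Decimal:
    """Convert an amount expressed in thousands to millions."""
    return amount_thousands / Decimal("1000")


# Put both revenues in the same unit before comparing them.
lyft_revenue_millions = thousands_to_millions(LYFT_REVENUE_THOUSANDS)
ratio = UBER_REVENUE_MILLIONS / lyft_revenue_millions

print(f"Lyft 2021 revenue: ${lyft_revenue_millions:,} million")
print(f"Uber 2021 revenue: ${UBER_REVENUE_MILLIONS:,} million")
print(f"Uber / Lyft revenue ratio: {ratio:.2f}")
```

`Decimal` is used here so the division by 1,000 stays exact; plain floats would also work for a rough comparison.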