← Back to Cookbook
IndustrialKnowledgeAgent
Details
File: mistral/agents/non_framework/industrial_knowledge_agent/IndustrialKnowledgeAgent.ipynb
Type: Jupyter Notebook
Use Cases: Agents
Content
Notebook content (JSON format):
{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "bIeNDnJztY6D" }, "source": [ "# IndustrialKnowledgeAgent: The Smart Industrial Equipment Knowledge Agent\n", "\n", "<a href=\"https://colab.research.google.com/github/mistralai/cookbook/blob/main/mistral/agents/non_framework/industrial_knowledge_agent/IndustrialKnowledgeAgent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n", "\n", "## Problem Statement\n", "\n", "In industrial settings, engineers and technicians often struggle to manage and retrieve comprehensive information about various equipment. This information is scattered across technical manuals, maintenance logs, safety protocols, troubleshooting guides, and parts inventories. The fragmented nature of this data makes it difficult to access and utilize effectively, leading to inefficiencies and potential safety risks. This problem requires an intelligent, adaptive solution to provide real-time, context-aware responses to queries.\n", "\n", "## Proposed Solution\n", "\n", "To address these challenges, we propose an agentic workflow that integrates a Retrieval-Augmented Generation (RAG) system with a database querying system (FunctionCalling). This solution leverages LLMs (including structured output mechanism), embedding models and structured data retrieval to provide contextually relevant and precise information. The workflow is orchestrated by multiple agents, each with a specific role:\n", "\n", "1. **RAGAgent**: Utilizes LLMs and Embedding models to retrieve and generate contextually relevant information from technical documents.\n", "2. **DatabaseQueryAgent**: Handles precise and structured data retrieval from databases containing maintenance logs, technical specifications, parts inventories, and compliance records.\n", "3. **WorkflowOrchestrator**: Orchestrates interactions between the RAGSearchAgent and DatabaseAgent, ensuring seamless and efficient query resolution.\n", "\n", "## Dataset Details\n", "\n", "### PDF Documents\n", "\n", "The PDF documents contain detailed information about various industrial equipment, categorized into:\n", "1. **Technical Manuals**: Operation and maintenance guides.\n", "2. **Maintenance Guides**: Routine and preventive maintenance tasks.\n", "3. **Troubleshooting Guides**: Solutions to common issues.\n", "4. **Safety Protocols**: Safety procedures and guidelines.\n", "\n", "### Databases\n", "\n", "The databases contain structured information that complements the PDF documents:\n", "1. **Compliance Database (`compliance_db`)**: Safety certifications and compliance statuses.\n", "2. **Maintenance Database (`maintenance_db`)**: Logs of maintenance activities.\n", "3. **Technical Specifications Database (`technical_specifications_db`)**: Detailed technical specifications.\n", "4. **Parts Inventory and Compatibility Database (`parts_inventory_compatibility_db`)**: Information on parts, compatibility, and inventory status.\n", "\n", "By integrating these datasets, the proposed agentic workflow aims to provide a comprehensive and efficient system for managing and retrieving industrial equipment information, ensuring that engineers and technicians have access to the most relevant and up-to-date information.\n", "\n", "*NOTE*: Please note that all data used in this demonstration has been synthetically generated.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "P_ZTL7hRc1o4" }, "source": [ "### Technical Architecture:\n", "\n", "<img src=\"solution_architecture.png\" alt=\"Alt text\" width=\"10000\"/>" ] }, { "cell_type": "markdown", "metadata": { "id": "_qQw2NObtY6F" }, "source": [ "### Installation\n", "\n", "Installs the necessary Python packages for the IndusAgent system." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://colab.research.google.com/github/mistralai/cookbook/blob/main/mistral/agents/industrial_knowledge_agent/IndustrialKnowledgeAgent.ipynb" }, "id": "iGZDgJ8itY6F", "outputId": "770230d9-9f58-4462-d66a-d189a9e8e7a9" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting mistralai==1.5.1\n", " Downloading mistralai-1.5.1-py3-none-any.whl.metadata (29 kB)\n", "Collecting eval-type-backport>=0.2.0 (from mistralai==1.5.1)\n", " Downloading eval_type_backport-0.2.2-py3-none-any.whl.metadata (2.2 kB)\n", "Requirement already satisfied: httpx>=0.27.0 in /usr/local/lib/python3.11/dist-packages (from mistralai==1.5.1) (0.28.1)\n", "Collecting jsonpath-python>=1.0.6 (from mistralai==1.5.1)\n", " Downloading jsonpath_python-1.0.6-py3-none-any.whl.metadata (12 kB)\n", "Requirement already satisfied: pydantic>=2.9.0 in /usr/local/lib/python3.11/dist-packages (from mistralai==1.5.1) (2.10.6)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from mistralai==1.5.1) (2.8.2)\n", "Collecting typing-inspect>=0.9.0 (from mistralai==1.5.1)\n", " Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)\n", "Requirement already satisfied: anyio in /usr/local/lib/python3.11/dist-packages (from httpx>=0.27.0->mistralai==1.5.1) (3.7.1)\n", "Requirement already satisfied: certifi in /usr/local/lib/python3.11/dist-packages (from httpx>=0.27.0->mistralai==1.5.1) (2025.1.31)\n", "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx>=0.27.0->mistralai==1.5.1) (1.0.7)\n", "Requirement already satisfied: idna in /usr/local/lib/python3.11/dist-packages (from httpx>=0.27.0->mistralai==1.5.1) (3.10)\n", "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx>=0.27.0->mistralai==1.5.1) (0.14.0)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic>=2.9.0->mistralai==1.5.1) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.27.2 in /usr/local/lib/python3.11/dist-packages (from pydantic>=2.9.0->mistralai==1.5.1) (2.27.2)\n", "Requirement already satisfied: typing-extensions>=4.12.2 in /usr/local/lib/python3.11/dist-packages (from pydantic>=2.9.0->mistralai==1.5.1) (4.12.2)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.8.2->mistralai==1.5.1) (1.17.0)\n", "Collecting mypy-extensions>=0.3.0 (from typing-inspect>=0.9.0->mistralai==1.5.1)\n", " Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)\n", "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.11/dist-packages (from anyio->httpx>=0.27.0->mistralai==1.5.1) (1.3.1)\n", "Downloading mistralai-1.5.1-py3-none-any.whl (278 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m278.3/278.3 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading eval_type_backport-0.2.2-py3-none-any.whl (5.8 kB)\n", "Downloading jsonpath_python-1.0.6-py3-none-any.whl (7.6 kB)\n", "Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n", "Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n", "Installing collected packages: mypy-extensions, jsonpath-python, eval-type-backport, typing-inspect, mistralai\n", "Successfully installed eval-type-backport-0.2.2 jsonpath-python-1.0.6 mistralai-1.5.1 mypy-extensions-1.0.0 typing-inspect-0.9.0\n", "Collecting qdrant-client==1.13.2\n", " Downloading qdrant_client-1.13.2-py3-none-any.whl.metadata (10 kB)\n", "Requirement already satisfied: grpcio>=1.41.0 in /usr/local/lib/python3.11/dist-packages (from qdrant-client==1.13.2) (1.70.0)\n", "Collecting grpcio-tools>=1.41.0 (from qdrant-client==1.13.2)\n", " Downloading grpcio_tools-1.70.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)\n", "Requirement already satisfied: httpx>=0.20.0 in /usr/local/lib/python3.11/dist-packages (from httpx[http2]>=0.20.0->qdrant-client==1.13.2) (0.28.1)\n", "Requirement already satisfied: numpy>=1.21 in /usr/local/lib/python3.11/dist-packages (from qdrant-client==1.13.2) (1.26.4)\n", "Collecting portalocker<3.0.0,>=2.7.0 (from qdrant-client==1.13.2)\n", " Downloading portalocker-2.10.1-py3-none-any.whl.metadata (8.5 kB)\n", "Requirement already satisfied: pydantic>=1.10.8 in /usr/local/lib/python3.11/dist-packages (from qdrant-client==1.13.2) (2.10.6)\n", "Requirement already satisfied: urllib3<3,>=1.26.14 in /usr/local/lib/python3.11/dist-packages (from qdrant-client==1.13.2) (2.3.0)\n", "Collecting protobuf<6.0dev,>=5.26.1 (from grpcio-tools>=1.41.0->qdrant-client==1.13.2)\n", " Downloading protobuf-5.29.3-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)\n", "Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (from grpcio-tools>=1.41.0->qdrant-client==1.13.2) (75.1.0)\n", "Requirement already satisfied: anyio in /usr/local/lib/python3.11/dist-packages (from httpx>=0.20.0->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (3.7.1)\n", "Requirement already satisfied: certifi in /usr/local/lib/python3.11/dist-packages (from httpx>=0.20.0->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (2025.1.31)\n", "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx>=0.20.0->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (1.0.7)\n", "Requirement already satisfied: idna in /usr/local/lib/python3.11/dist-packages (from httpx>=0.20.0->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (3.10)\n", "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx>=0.20.0->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (0.14.0)\n", "Requirement already satisfied: h2<5,>=3 in /usr/local/lib/python3.11/dist-packages (from httpx[http2]>=0.20.0->qdrant-client==1.13.2) (4.2.0)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic>=1.10.8->qdrant-client==1.13.2) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.27.2 in /usr/local/lib/python3.11/dist-packages (from pydantic>=1.10.8->qdrant-client==1.13.2) (2.27.2)\n", "Requirement already satisfied: typing-extensions>=4.12.2 in /usr/local/lib/python3.11/dist-packages (from pydantic>=1.10.8->qdrant-client==1.13.2) (4.12.2)\n", "Requirement already satisfied: hyperframe<7,>=6.1 in /usr/local/lib/python3.11/dist-packages (from h2<5,>=3->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (6.1.0)\n", "Requirement already satisfied: hpack<5,>=4.1 in /usr/local/lib/python3.11/dist-packages (from h2<5,>=3->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (4.1.0)\n", "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.11/dist-packages (from anyio->httpx>=0.20.0->httpx[http2]>=0.20.0->qdrant-client==1.13.2) (1.3.1)\n", "Downloading qdrant_client-1.13.2-py3-none-any.whl (306 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m306.6/306.6 kB\u001b[0m \u001b[31m8.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading grpcio_tools-1.70.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.5/2.5 MB\u001b[0m \u001b[31m68.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading portalocker-2.10.1-py3-none-any.whl (18 kB)\n", "Downloading protobuf-5.29.3-cp38-abi3-manylinux2014_x86_64.whl (319 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m319.7/319.7 kB\u001b[0m \u001b[31m28.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hInstalling collected packages: protobuf, portalocker, grpcio-tools, qdrant-client\n", " Attempting uninstall: protobuf\n", " Found existing installation: protobuf 4.25.6\n", " Uninstalling protobuf-4.25.6:\n", " Successfully uninstalled protobuf-4.25.6\n", "Successfully installed grpcio-tools-1.70.0 portalocker-2.10.1 protobuf-5.29.3 qdrant-client-1.13.2\n", "Requirement already satisfied: gdown==5.2.0 in /usr/local/lib/python3.11/dist-packages (5.2.0)\n", "Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.11/dist-packages (from gdown==5.2.0) (4.13.3)\n", "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from gdown==5.2.0) (3.17.0)\n", "Requirement already satisfied: requests[socks] in /usr/local/lib/python3.11/dist-packages (from gdown==5.2.0) (2.32.3)\n", "Requirement already satisfied: tqdm in /usr/local/lib/python3.11/dist-packages (from gdown==5.2.0) (4.67.1)\n", "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.11/dist-packages (from beautifulsoup4->gdown==5.2.0) (2.6)\n", "Requirement already satisfied: typing-extensions>=4.0.0 in /usr/local/lib/python3.11/dist-packages (from beautifulsoup4->gdown==5.2.0) (4.12.2)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests[socks]->gdown==5.2.0) (3.4.1)\n", "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests[socks]->gdown==5.2.0) (3.10)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests[socks]->gdown==5.2.0) (2.3.0)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests[socks]->gdown==5.2.0) (2025.1.31)\n", "Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.11/dist-packages (from requests[socks]->gdown==5.2.0) (1.7.1)\n" ] } ], "source": [ "!pip install mistralai==1.5.1 # Mistral AI client\n", "!pip install qdrant-client==1.13.2 # Vector database client\n", "!pip install gdown==5.2.0 # Google Drive download" ] }, { "cell_type": "markdown", "metadata": { "id": "Fc0z9vNsVjwX" }, "source": [ "### Imports\n", "\n", "Imports required libraries for LLM operations, data processing, vector database management, and utility functions." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "sBTiQxImVjMB" }, "outputs": [], "source": [ "# Core libraries\n", "import os\n", "import json\n", "import functools\n", "import warnings\n", "from typing import List, Dict, Any, Tuple\n", "\n", "# LLM and Data Processing\n", "from mistralai import Mistral\n", "from pydantic import BaseModel\n", "import pandas as pd\n", "import sqlite3\n", "from tqdm import tqdm\n", "\n", "# Vector Database\n", "from qdrant_client import QdrantClient\n", "from qdrant_client.models import (\n", " PointStruct, VectorParams, Distance,\n", " Filter, FieldCondition, MatchValue\n", ")\n", "\n", "# Data Download\n", "import gdown\n", "import zipfile\n", "\n", "# Suppress warnings\n", "warnings.filterwarnings('ignore', category=DeprecationWarning)" ] }, { "cell_type": "markdown", "metadata": { "id": "OHsKUyTUu3Cr" }, "source": [ "### Download Data\n", "\n", "Downloads the dataset from Google Drive, extracts it to a data directory, and sets up the working environment. The dataset contains CSV files for database operations and PDFs for document processing.\n", "\n", "By the end of the process, you should be able to see the downloaded data, as shown in the image below.\n", "\n", "<img src=\"https://github.com/mistralai/cookbook/blob/main/mistral/agents/industrial_equipment_agent/downloaded_data.png?raw=1\" alt=\"Alt text\" width=\"10000\"/>" ] }, { "cell_type": "markdown", "metadata": { "id": "b9x9kr-NVL-Q" }, "source": [ "#### Download data from Google Drive" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "M3Un1A2Zu5WA", "outputId": "d1324abc-8fe3-47f4-a4fc-aca06c3364f9" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Downloading...\n", "From: https://drive.google.com/uc?id=1lwYSN6ry3JOA7pw3WAx72a_IXGqqmR8y\n", "To: /content/data.zip\n", "100%|██████████| 396k/396k [00:00<00:00, 6.10MB/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "✅ File downloaded: data.zip\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "file_id = \"1lwYSN6ry3JOA7pw3WAx72a_IXGqqmR8y\"\n", "output_file = \"data.zip\" # Change this if your file is not a ZIP file\n", "\n", "# Google Drive direct download URL\n", "gdrive_url = f\"https://drive.google.com/uc?id={file_id}\"\n", "\n", "# Download the file\n", "gdown.download(gdrive_url, output_file, quiet=False)\n", "\n", "print(f\"✅ File downloaded: {output_file}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "nb_dMGqFVT8o" }, "source": [ "#### Extract and setup data directory" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZF0OmWdV-ArI", "outputId": "9ef1a449-b454-47cc-a423-7e0f843024cb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ Files extracted to: /content\n", "📂 Current directory: /content/data\n" ] } ], "source": [ "# Unzip the file into the current directory\n", "with zipfile.ZipFile(output_file, 'r') as zip_ref:\n", " zip_ref.extractall(\".\") # Extracts directly to the current directory\n", "\n", "print(f\"✅ Files extracted to: {os.getcwd()}\") # Confirm extraction path\n", "\n", "\n", "output_dir = \"data\"\n", "\n", "# Change working directory to the extracted folder\n", "os.chdir(output_dir)\n", "\n", "# Verify the new working directory\n", "print(f\"📂 Current directory: {os.getcwd()}\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "l8U2SZIRVbue", "outputId": "537bb343-fc85-4f3e-cf1a-6707b3b8d2dd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📜 Extracted files: ['csv_data', 'pdf_data']\n" ] } ], "source": [ "# List files in the extracted folder\n", "print(\"📜 Extracted files:\", os.listdir())" ] }, { "cell_type": "markdown", "metadata": { "id": "GJpK5TdgtY6G" }, "source": [ "### Set up environment variables\n", "\n", "Sets up the Mistral API key as an environment variable for authentication." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "Rqc49HTNtY6H" }, "outputs": [], "source": [ "os.environ[\"MISTRAL_API_KEY\"] = \"<YOUR MISTRAL API KEY>\" # Get your Mistral API key from https://console.mistral.ai/api-keys/" ] }, { "cell_type": "markdown", "metadata": { "id": "_ZBUMoFdtY6H" }, "source": [ "### Initialize Mistral LLM and Qdrant Vector Database\n", "\n", "Initializes the Mistral LLM client for text generation and Qdrant vector database client for similarity search operations.\n", "\n", "*Note*:\n", "\n", "1. We will use our latest model, `Mistral Small 3` for demonstration.\n", "2. You need to set up Qdrant Cloud or a Docker setup before proceeding. You can refer to the [documentation](https://qdrant.tech/cloud/) for the setup instructions." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "qNtBtcMJtY6H" }, "outputs": [], "source": [ "model = \"mistral-small-latest\"\n", "mistral_client = Mistral(api_key=os.environ[\"MISTRAL_API_KEY\"])\n", "qdrant_client = QdrantClient(\n", " url= \"<URL>\",\n", " api_key= \"<API KEY>\",\n", ") # Replace with your Qdrant API key and URL if you are using Qdrant Cloud - https://cloud.qdrant.io/" ] }, { "cell_type": "markdown", "metadata": { "id": "GcOPTLjXtY6H" }, "source": [ "### System Prompts\n", "\n", "The system uses three different types of prompts to guide the LLMs for response generation:\n", "\n", "1. *PDF Summarization Prompt*: `summarization_prompt` is used to create concise summaries of PDF documents.\n", "2. *Response Generation Prompt*: `response_generation_prompt` is used to generate responses based on retrieved context.\n", "3. *Final Response Integration Prompt*: `final_response_generation_prompt` is used to summarize responses from multiple sources - PDFs and different databases.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "Vi_Cs56wtY6I" }, "outputs": [], "source": [ "# Define the prompt for generating a response\n", "response_generation_prompt = '''Based on the following context answer the query:\\n\\n Context: {context}\\n\\n Query: {query}'''\n", "\n", "# Prompt for summarizing the PDF text\n", "summarization_prompt = '''Your task is to summarize the following text focusing on the core essence of the text in maximum of 2-3 sentences.'''\n", "\n", "# Prompt for final response summarization\n", "final_response_summarization_prompt = \"\"\"You are an expert technical assistant. Your task is to create a comprehensive,\n", "coherent response by combining information from multiple sources: database records and documentation.\n", "\n", "Consider the following guidelines:\n", "1. Integrate information from both sources seamlessly\n", "2. Resolve any conflicts between sources, if they exist\n", "3. Present information in a logical, step-by-step manner when applicable\n", "4. Include specific technical details, measurements, and procedures when available\n", "5. Prioritize safety-related information when present\n", "6. Add relevant maintenance intervals or schedules if mentioned\n", "7. Reference specific part numbers or specifications when provided\n", "\n", "The user's query is: {query}\n", "\n", "Based on the following responses from different sources, create a unified, clear answer:\n", "{responses}\n", "\n", "Remember to:\n", "- Focus on accuracy and completeness\n", "- Maintain technical precision\n", "- Use clear, professional language\n", "- Address all aspects of the query\n", "- Highlight any important warnings or precautions\"\"\"" ] }, { "cell_type": "markdown", "metadata": { "id": "PoKYiYGAOd7v" }, "source": [ "## DataProcessor\n", "\n", "The `DataProcessor` class is a comprehensive component that handles all data processing operations in the system. It manages both unstructured (PDFs) and structured (CSV) data, along with embedding generation and storage.\n", "\n", "- PDF document processing and text extraction using Mistral OCR.\n", "- CSV to database ingestion\n", "- Embedding generation and vector storage\n", "- Batch processing of documents and data\n", "\n", "### Main Components\n", "\n", "#### 1. Document Processing\n", "- `get_categorized_filepaths`: Walks through the directory structure to get categorized PDF file paths\n", "- `parse_pdf`: Extracts text from all pages of a PDF file using Mistral OCR.\n", "- `process_single_pdf`: Processes individual PDFs through the complete pipeline\n", "- `process_documents`: Handles sequential processing of multiple documents\n", "\n", "#### 2. Summarization and Embeddings\n", "- `summarize`: Generates concise summaries of text using the Mistral model\n", "- `get_text_embedding`: Creates text embeddings using Mistral's embedding model\n", "- `qdrant_insert_embeddings`: Stores embeddings with metadata in Qdrant vector database\n", "- `process_and_store_embeddings`: Handles batch processing of embeddings\n", "\n", "#### 3. Database Operations\n", "- `insert_csv_to_table`: Loads a single CSV file into a specified database table\n", "- `insert_data_database`: Handles multiple CSV files insertion into their respective tables" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "vkk4yIdAtY6I" }, "outputs": [], "source": [ "class DataProcessor:\n", " \"\"\"\n", " Handles all data processing operations including:\n", " - PDF parsing and text extraction\n", " - CSV to database ingestion\n", " - Embedding generation and storage\n", " - Batch processing of documents and data\n", " \"\"\"\n", " def __init__(self, mistral_client: Mistral, qdrant_client: QdrantClient):\n", " self.mistral_client = mistral_client\n", " self.qdrant_client = qdrant_client\n", "\n", " def get_categorized_filepaths(self, root_dir: str) -> List[Dict[str, str]]:\n", " \"\"\"\n", " Walk through the directory structure and get file paths with their categories.\n", " \"\"\"\n", " categorized_files = []\n", "\n", " for category in os.listdir(root_dir):\n", " category_path = os.path.join(root_dir, category)\n", "\n", " if not os.path.isdir(category_path):\n", " continue\n", "\n", " for root, _, files in os.walk(category_path):\n", " for file in files:\n", " if file.lower().endswith('.pdf'):\n", " filepath = os.path.join(root, file)\n", " categorized_files.append({\n", " 'filepath': filepath,\n", " 'category': category\n", " })\n", "\n", " return categorized_files\n", "\n", " def parse_pdf(self, file_path: str) -> str:\n", " \"\"\"Parse a PDF file and extract text from all pages using Mistral OCR.\"\"\"\n", "\n", " # Upload a file\n", "\n", " uploaded_pdf = self.mistral_client.files.upload(\n", " file={\n", " \"file_name\": file_path,\n", " \"content\": open(file_path, \"rb\"),\n", " },\n", " purpose=\"ocr\"\n", " )\n", "\n", " # Get a signed URL for the uploaded file\n", "\n", " signed_url = self.mistral_client.files.get_signed_url(file_id=uploaded_pdf.id)\n", "\n", " # Get OCR results\n", "\n", " ocr_response = self.mistral_client.ocr.process(\n", " model=\"mistral-ocr-latest\",\n", " document={\n", " \"type\": \"document_url\",\n", " \"document_url\": signed_url.url,\n", " }\n", " )\n", "\n", " # Extract text from the OCR response\n", "\n", " text = \"\\n\".join([x.markdown for x in (ocr_response.pages)])\n", "\n", " return text\n", "\n", " def summarize(self, text: str, summarization_prompt: str = summarization_prompt) -> str:\n", " \"\"\"Summarize the given text using the Mistral model.\"\"\"\n", " chat_response = self.mistral_client.chat.complete(\n", " model=model,\n", " messages=[\n", " {\n", " \"role\": \"system\",\n", " \"content\": summarization_prompt\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": text\n", " },\n", " ],\n", " temperature=0\n", " )\n", " return chat_response.choices[0].message.content\n", "\n", " def get_text_embedding(self, inputs: List[str]) -> List[float]:\n", " \"\"\"Get the text embedding for the given inputs.\"\"\"\n", " embeddings_batch_response = self.mistral_client.embeddings.create(\n", " model=\"mistral-embed\",\n", " inputs=inputs\n", " )\n", " return embeddings_batch_response.data[0].embedding\n", "\n", " def qdrant_insert_embeddings(self, summaries: List[str], texts: List[str],\n", " filepaths: List[str], categories: List[str]):\n", " \"\"\"Insert embeddings into Qdrant with metadata.\"\"\"\n", " embeddings = [self.get_text_embedding([t]) for t in summaries]\n", "\n", " if not self.qdrant_client.collection_exists(\"embeddings\"):\n", " self.qdrant_client.create_collection(\n", " collection_name=\"embeddings\",\n", " vectors_config=VectorParams(size=1024, distance=Distance.COSINE),\n", " )\n", "\n", " self.qdrant_client.upsert(\n", " collection_name=\"embeddings\",\n", " points=[\n", " PointStruct(\n", " id=idx,\n", " vector=embedding,\n", " payload={\n", " \"filepath\": filepaths[idx],\n", " \"category\": categories[idx],\n", " \"text\": texts[idx]\n", " }\n", " ) for idx, embedding in enumerate(embeddings)\n", " ]\n", " )\n", "\n", " def process_single_pdf(self, file_info: Dict[str, str]) -> Dict[str, any]:\n", " \"\"\"Process a single PDF file through the pipeline.\"\"\"\n", " filepath = file_info['filepath']\n", " category = file_info['category']\n", "\n", " pdf_text = self.parse_pdf(filepath)\n", " summary = self.summarize(pdf_text)\n", "\n", " return {\n", " 'filepath': filepath,\n", " 'category': category,\n", " 'full_text': pdf_text,\n", " 'summary': summary\n", " }\n", "\n", " def process_documents(self, file_list: List[Dict[str, str]]) -> List[Dict[str, any]]:\n", " \"\"\"Process documents sequentially.\"\"\"\n", " processed_docs = []\n", "\n", " for file_info in tqdm(file_list, desc=\"Processing PDFs\"):\n", " try:\n", " processed_doc = self.process_single_pdf(file_info)\n", " processed_docs.append(processed_doc)\n", " except Exception as e:\n", " print(f\"Error processing {file_info['filepath']}: {str(e)}\")\n", " continue\n", "\n", " return processed_docs\n", "\n", " def insert_csv_to_table(self, file_path: str, db_path: str, table_name: str):\n", " \"\"\"\n", " Insert CSV data into a table of SQLite database.\n", "\n", " Args:\n", " file_path (str): Path to the CSV file\n", " db_path (str): Path to the SQLite database\n", " table_name (str): Name of the table to create/update\n", " \"\"\"\n", " df = pd.read_csv(file_path)\n", " conn = sqlite3.connect(db_path)\n", " df.to_sql(table_name, conn, if_exists='replace', index=False)\n", " conn.close()\n", "\n", " def insert_data_database(self, db_path: str, file_mappings: Dict[str, str]):\n", " \"\"\"\n", " Bulk insert multiple CSV files into their respective database tables.\n", "\n", " Args:\n", " db_path (str): Path to the SQLite database\n", " file_mappings (Dict[str, str]): Dictionary mapping table names to CSV file paths\n", " \"\"\"\n", " for table_name, file_path in file_mappings.items():\n", " try:\n", " self.insert_csv_to_table(file_path, db_path, table_name)\n", " print(f\"Successfully inserted data into {table_name}\")\n", " except Exception as e:\n", " print(f\"Error inserting data into {table_name}: {str(e)}\")\n", "\n", " def process_and_store_embeddings(self, docs: List[Dict[str, any]], batch_size: int = 10):\n", " \"\"\"Generate embeddings and store them in Qdrant in batches.\"\"\"\n", " for i in range(0, len(docs), batch_size):\n", " batch = docs[i:i + batch_size]\n", "\n", " texts = [doc['full_text'] for doc in batch]\n", " summaries = [doc['summary'] for doc in batch]\n", " filepaths = [doc['filepath'] for doc in batch]\n", " categories = [doc['category'] for doc in batch]\n", "\n", " try:\n", " self.qdrant_insert_embeddings(summaries, texts, filepaths, categories)\n", " print(f\"Processed batch {i//batch_size + 1}/{(len(docs) + batch_size - 1)//batch_size}\")\n", " except Exception as e:\n", " print(f\"Error processing batch starting at index {i}: {str(e)}\")\n", " continue" ] }, { "cell_type": "markdown", "metadata": { "id": "Hh2hbtJrPnku" }, "source": [ "## RAGAgent\n", "\n", "The `RAGAgent` class implements Retrieval-Augmented Generation (RAG) to provide intelligent search and response generation. It combines vector search capabilities with the LLM to give contextually relevant answers.\n", "\n", "- Query categorization and classification\n", "- Vector similarity search in Qdrant\n", "- Context-aware response generation\n", "- Document citation handling\n", "\n", "### Main Components\n", "\n", "#### 1. Query Processing\n", "- `query_categorization`: Classifies queries into predefined categories (technical manual, safety protocol, etc.)\n", "- `query`: Orchestrates the complete RAG pipeline from query to final response\n", "\n", "#### 2. Search and Retrieval \n", "- `qdrant_search`: Performs semantic search using query embeddings, filters results by document category and returns top-k most relevant documents.\n", "\n", "#### 3. Response Generation\n", "- `generate_response`: Creates natural language responses using retrieved context, uses LLM with specialized prompts, Provides citations to source documents." ] }, { "cell_type": "markdown", "metadata": { "id": "tb7vHeUlUOKf" }, "source": [ "#### Query Category Model\n", "\n", "A Pydantic model that defines the structure for query categorization, used by RAGAgent to classify queries into relevant categories (technical_manual, safety_protocol, etc.)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "8j8rxX9RUSa5" }, "outputs": [], "source": [ "# Define category model for query classification\n", "class Category(BaseModel):\n", " category: str" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "3yh_XBGpAK6j" }, "outputs": [], "source": [ "class RAGAgent:\n", " \"\"\"\n", " Agent responsible for Retrieval-Augmented Generation (RAG) operations.\n", " \"\"\"\n", " def __init__(self, mistral_client: Mistral, qdrant_client: QdrantClient):\n", " self.mistral_client = mistral_client\n", " self.qdrant_client = qdrant_client\n", "\n", " def generate_response(self, context: str, query: str) -> str:\n", " \"\"\"Generate a response based on the given context and query.\"\"\"\n", " chat_response = self.mistral_client.chat.complete(\n", " model=model,\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", " \"content\": response_generation_prompt.format(context=context, query=query)\n", " },\n", " ]\n", " )\n", " return chat_response.choices[0].message.content\n", "\n", " def query_categorization(self, query: str) -> str:\n", " \"\"\"Categorize the query into predefined categories.\"\"\"\n", " chat_response = self.mistral_client.chat.parse(\n", " model=model,\n", " messages=[\n", " {\n", " \"role\": \"system\",\n", " \"content\": \"Classify the query into one or more categories of the following list: ['technical_manual', 'safety_protocol', 'maintenance_guide', 'troubleshooting_guide']\"\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": query,\n", " },\n", " ],\n", " response_format=Category,\n", " max_tokens=256,\n", " temperature=0\n", " )\n", " return json.loads(chat_response.choices[0].message.content)\n", "\n", " def qdrant_search(self, query: str, category: str = None, top_k: int = 5) -> List[Dict[str, Any]]:\n", " \"\"\"Search for similar texts in Qdrant based on the query and category.\"\"\"\n", " query_vector = DataProcessor(self.mistral_client, self.qdrant_client).get_text_embedding([query])\n", "\n", " retrieval_results = self.qdrant_client.search(\n", " collection_name=\"embeddings\",\n", " query_vector=query_vector,\n", " query_filter=Filter(\n", " must=[\n", " FieldCondition(\n", " key='category',\n", " match=MatchValue(value=category)\n", " )\n", " ]\n", " ),\n", " limit=top_k\n", " )\n", " return retrieval_results\n", "\n", " def query(self, query_text: str, top_k: int = 3) -> Tuple[str, str]:\n", " \"\"\"Process a natural language query using RAG.\"\"\"\n", " category = self.query_categorization(query_text)[\"category\"]\n", "\n", " results = self.qdrant_search(query_text, category, top_k=top_k)\n", "\n", " file_paths = [result.payload[\"filepath\"] for result in results]\n", "\n", " retrieved_text = \"\\n\".join([result.payload[\"text\"] for result in results])\n", " citations = \",\".join([result.payload[\"filepath\"] for result in results])\n", "\n", " return self.generate_response(retrieved_text, query_text), citations" ] }, { "cell_type": "markdown", "metadata": { "id": "p7ry2mbGOXnG" }, "source": [ "### Database Query Tools\n", "\n", "Defines database query function tools. These tools define the function calling interface for the DatabaseQueryAgent, enabling structured querying of different database tables." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "LhnkL3oWOTpe" }, "outputs": [], "source": [ "tools = [\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"query_compliance\",\n", " \"description\": '''Query compliance records with filters. \\n\\n A sample example of columns and corresponding values from db are:\\n\\n EquipmentID,EquipmentName,Manufacturer,Model,ComplianceType,Certification,IssueDate,ExpiryDate,ComplianceStatus,ResponsiblePerson\n", "1,CNC Machine,ABC Corp,Model X,Safety,ISO 9001,2020-01-15,2025-01-15,Active,John Doe''',\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"filters\": {\n", " \"type\": \"object\",\n", " \"description\": '''Dictionary of column names and values to filter by.''',\n", " \"additionalProperties\": {\n", " \"type\": \"string\"\n", " }\n", " }\n", " },\n", " \"required\": [\"filters\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"query_maintenance\",\n", " \"description\": '''Query maintenance records with filters. \\n\\n A sample example of columns and corresponding values from db are:\\n\\n EquipmentID,EquipmentName,Manufacturer,Model,InstallationDate,LastMaintenanceDate,NextMaintenanceDate,MaintenanceType,MaintenanceDetails,MaintenanceStatus,ResponsibleTechnician\n", "1,CNC Machine,ABC Corp,Model X,2020-01-15,2023-09-01,2023-12-01,Preventive,Oil change,Completed,John Doe''',\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"filters\": {\n", " \"type\": \"object\",\n", " \"description\": \"Dictionary of column names and values to filter by\",\n", " \"additionalProperties\": {\n", " \"type\": \"string\"\n", " }\n", " }\n", " },\n", " \"required\": [\"filters\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"query_technical_specs\",\n", " \"description\": '''Query technical specifications with filters.\\n\\n A sample example of columns and corresponding values from db are:\\n\\n EquipmentID,EquipmentName,Manufacturer,Model,SpecificationType,SpecificationDetail,Unit,Value,DateMeasured,MeasuredBy\n", "1,CNC Machine,ABC Corp,Model X,Power,Motor Power,kW,15,2023-01-15,John Doe''',\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"filters\": {\n", " \"type\": \"object\",\n", " \"description\": \"Dictionary of column names and values to filter by\",\n", " \"additionalProperties\": {\n", " \"type\": \"string\"\n", " }\n", " }\n", " },\n", " \"required\": [\"filters\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"query_parts_inventory_compatibility\",\n", " \"description\": '''Query parts, inventory and compatibility with filters.\\n\\n A sample example of columns and corresponding values from db are:\\n\\n PartID,PartName,EquipmentID,EquipmentName,Manufacturer,Model,PartType,Quantity,Compatibility,Supplier,LastOrderDate,NextOrderDate,PartStatus\n", "1,Oil Filter,1,CNC Machine,ABC Corp,Model X,Filter,50,Compatible,Supplier A,2023-01-15,2023-12-01,In Stock''',\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"filters\": {\n", " \"type\": \"object\",\n", " \"description\": \"Dictionary of column names and values to filter by\",\n", " \"additionalProperties\": {\n", " \"type\": \"string\"\n", " }\n", " }\n", " },\n", " \"required\": [\"filters\"],\n", " },\n", " },\n", " }\n", "]" ] }, { "cell_type": "markdown", "metadata": { "id": "LOZP96bpQX8t" }, "source": [ "## DatabaseQueryAgent\n", "\n", "The `DatabaseQueryAgent` class manages interactions with the SQLite database, handling structured data queries through function calling on various databases containing maintenance logs, technical specifications, parts inventories, and compliance records. It provides specialized querying capabilities for different database tables.\n", "\n", "- Natural language query processing\n", "- Structured database querying\n", "- Function calling for query execution\n", "- JSON response formatting\n", "\n", "### Main Components\n", "\n", "#### 1. Table-Specific Queries\n", "- `query_compliance`: Retrieves filtered compliance records\n", "- `query_maintenance`: Accesses maintenance-related information\n", "- `query_technical_specs`: Fetches technical specifications\n", "- `query_parts_inventory_compatibility`: Retrieves parts and compatibility data\n", "\n", "#### 2. Query Processing\n", "- `query`: Processes natural language queries using function calling, Handles tool calls for appropriate database operations, Tracks database tool citations, Formats responses with query results." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "cUXRxB73tY6I" }, "outputs": [], "source": [ "class DatabaseQueryAgent:\n", " \"\"\"\n", " Agent responsible for interacting with the SQLite database.\n", " \"\"\"\n", "\n", " def __init__(self, db_path: str, mistral_client: Mistral):\n", " self.db_path = db_path\n", " self.tools = tools\n", " self.names_to_functions = {\n", " 'query_compliance': functools.partial(self.query_compliance),\n", " 'query_maintenance': functools.partial(self.query_maintenance),\n", " 'query_technical_specs': functools.partial(self.query_technical_specs),\n", " 'query_parts_inventory_compatibility': functools.partial(self.query_parts_inventory_compatibility)\n", " }\n", " self.mistral_client = mistral_client\n", "\n", " def query_compliance(self, filters: Dict[str, str]) -> str:\n", " \"\"\"\n", " Query compliance table with filters.\n", "\n", " Args:\n", " filters (Dict[str, str]): Dictionary of column names and values to filter by.\n", "\n", " Returns:\n", " str: The query result in JSON format.\n", " \"\"\"\n", " try:\n", " conn = sqlite3.connect(self.db_path)\n", " cursor = conn.cursor()\n", " where_conditions = []\n", " params = []\n", " for column, value in filters.items():\n", " where_conditions.append(f\"{column} = ?\")\n", " params.append(value)\n", " where_clause = \" AND \".join(where_conditions)\n", " query = f\"SELECT * FROM compliance WHERE {where_clause}\"\n", " cursor.execute(query, params)\n", " columns = [description[0] for description in cursor.description]\n", " result = cursor.fetchone()\n", " if result:\n", " record = dict(zip(columns, result))\n", " return json.dumps({'result': record})\n", " return json.dumps({'error': 'No matching records found'})\n", " except Exception as e:\n", " return json.dumps({'error': str(e)})\n", " finally:\n", " conn.close()\n", "\n", " def query_maintenance(self, filters: Dict[str, str]) -> str:\n", " \"\"\"\n", " Query maintenance table with filters.\n", "\n", " Args:\n", " filters (Dict[str, str]): Dictionary of column names and values to filter by.\n", "\n", " Returns:\n", " str: The query result in JSON format.\n", " \"\"\"\n", " try:\n", " conn = sqlite3.connect(self.db_path)\n", " cursor = conn.cursor()\n", " where_conditions = []\n", " params = []\n", " for column, value in filters.items():\n", " where_conditions.append(f\"{column} = ?\")\n", " params.append(value)\n", " where_clause = \" AND \".join(where_conditions)\n", " query = f\"SELECT * FROM maintenance WHERE {where_clause}\"\n", " cursor.execute(query, params)\n", " columns = [description[0] for description in cursor.description]\n", " result = cursor.fetchone()\n", " if result:\n", " record = dict(zip(columns, result))\n", " return json.dumps({'result': record})\n", " return json.dumps({'error': 'No matching records found'})\n", " except Exception as e:\n", " return json.dumps({'error': str(e)})\n", " finally:\n", " conn.close()\n", "\n", " def query_technical_specs(self, filters: Dict[str, str]) -> str:\n", " \"\"\"\n", " Query technical specifications table with filters.\n", "\n", " Args:\n", " filters (Dict[str, str]): Dictionary of column names and values to filter by.\n", "\n", " Returns:\n", " str: The query result in JSON format.\n", " \"\"\"\n", " try:\n", " conn = sqlite3.connect(self.db_path)\n", " cursor = conn.cursor()\n", " where_conditions = []\n", " params = []\n", " for column, value in filters.items():\n", " where_conditions.append(f\"{column} = ?\")\n", " params.append(value)\n", " where_clause = \" AND \".join(where_conditions)\n", " query = f\"SELECT * FROM technical_specifications WHERE {where_clause}\"\n", " cursor.execute(query, params)\n", " columns = [description[0] for description in cursor.description]\n", " result = cursor.fetchone()\n", " if result:\n", " record = dict(zip(columns, result))\n", " return json.dumps({'result': record})\n", " return json.dumps({'error': 'No matching records found'})\n", " except Exception as e:\n", " return json.dumps({'error': str(e)})\n", " finally:\n", " conn.close()\n", "\n", " def query_parts_inventory_compatibility(self, filters: Dict[str, str]) -> str:\n", " \"\"\"\n", " Query parts inventory and compatibility table with filters.\n", "\n", " Args:\n", " filters (Dict[str, str]): Dictionary of column names and values to filter by.\n", "\n", " Returns:\n", " str: The query result in JSON format.\n", " \"\"\"\n", " try:\n", " conn = sqlite3.connect(self.db_path)\n", " cursor = conn.cursor()\n", " where_conditions = []\n", " params = []\n", " for column, value in filters.items():\n", " where_conditions.append(f\"{column} = ?\")\n", " params.append(value)\n", " where_clause = \" AND \".join(where_conditions)\n", " query = f\"SELECT * FROM parts_inventory_compatibility WHERE {where_clause}\"\n", " cursor.execute(query, params)\n", " columns = [description[0] for description in cursor.description]\n", " result = cursor.fetchone()\n", " if result:\n", " record = dict(zip(columns, result))\n", " return json.dumps({'result': record})\n", " return json.dumps({'error': 'No matching records found'})\n", " except Exception as e:\n", " return json.dumps({'error': str(e)})\n", " finally:\n", " conn.close()\n", "\n", " def query(self, query_text: str) -> str:\n", " \"\"\"\n", " Process a natural language query using the database tools.\n", "\n", " Args:\n", " query_text (str): Natural language query\n", "\n", " Returns:\n", " str: Response from the database query\n", " \"\"\"\n", " messages = [{\"role\": \"user\", \"content\": query_text}]\n", "\n", " # Get initial response with potential tool calls\n", " response = self.mistral_client.chat.complete(\n", " model=model,\n", " messages=messages,\n", " tools=self.tools,\n", " tool_choice=\"any\",\n", " )\n", " messages.append(response.choices[0].message)\n", "\n", " citations = set()\n", "\n", " # Handle any tool calls\n", " if hasattr(response.choices[0].message, 'tool_calls') and response.choices[0].message.tool_calls:\n", " for tool_call in response.choices[0].message.tool_calls:\n", " function_name = tool_call.function.name\n", " print(f\"Tool call: {function_name}\")\n", " citations.add(function_name)\n", " function_params = json.loads(tool_call.function.arguments)\n", " print(f\"Tool call parameters: {function_params}\")\n", " function_result = self.names_to_functions[function_name](**function_params)\n", " messages.append({\n", " \"role\": \"tool\",\n", " \"name\": function_name,\n", " \"content\": function_result,\n", " \"tool_call_id\": tool_call.id\n", " })\n", "\n", " # Get final response\n", " final_response = self.mistral_client.chat.complete(\n", " model=\"mistral-small-latest\",\n", " messages=messages\n", " )\n", " return final_response.choices[0].message.content, \",\".join(list((citations)))" ] }, { "cell_type": "markdown", "metadata": { "id": "Z3J_bIpmtY6J" }, "source": [ "## WorkflowOrchestrator\n", "\n", "The `WorkflowOrchestrator` class orchestrates the interaction between RAGAgent and DatabaseQueryAgent to provide comprehensive responses by combining information from both structured and unstructured data sources.\n", "\n", "- Workflow orchestration and coordination\n", "- Response combination and integration\n", "- Final response summarization\n", "- Source citation management\n", "\n", "### Main Components\n", "\n", "#### 1. Workflow Execution \n", "- `workflow`: Manages the complete query processing pipeline, Coordinates responses from both agents, Generates final unified response, Maintains traceability through citations.\n", "\n", "#### 2. Response summarization\n", "- `combine_and_summarize_responses`: Merges and summarizes responses from both agents, Applies structured formatting to combined responses, Uses summarization prompts for coherent output." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "AFS9LwN68r0P" }, "outputs": [], "source": [ "class WorkflowOrchestrator:\n", " \"\"\"\n", " WorkflowOrchestrator is responsible for orchestrating the workflow between RAGSearchAgent and DatabaseQueryAgent.\n", " Handles query processing, response combination, and final summarization.\n", " \"\"\"\n", " def __init__(self,\n", " rag_agent: RAGAgent,\n", " db_query_agent: DatabaseQueryAgent,\n", " client: Mistral):\n", " \"\"\"\n", " Initialize WorkflowOrchestrator with necessary components.\n", "\n", " Args:\n", " rag_agent: RAGSearchAgent for document retrieval and generation\n", " db_query_agent: DatabaseQueryAgent for structured data queries\n", " client: Mistral client for text generation\n", " \"\"\"\n", " self.rag_agent = rag_agent\n", " self.db_query_agent = db_query_agent\n", " self.client = client\n", "\n", " def combine_and_summarize_responses(self,\n", " responses: Dict[str, str],\n", " query: str,\n", " summarization_prompt: str = summarization_prompt) -> str:\n", " \"\"\"\n", " Combine and summarize multiple responses into a coherent final response.\n", "\n", " Args:\n", " responses: Dictionary of response types and their content\n", " query: Original user query\n", " summarization_prompt: Template for summarization\n", "\n", " Returns:\n", " str: Summarized and combined response\n", " \"\"\"\n", " # Format responses into a structured text\n", " combined_text = \"\\n\\n\".join([\n", " f\"{source}: {content}\"\n", " for source, content in responses.items()\n", " ])\n", "\n", " # Generate summarized response\n", " chat_response = self.client.chat.complete(\n", " model=model,\n", " messages=[\n", " {\n", " \"role\": \"system\",\n", " \"content\": summarization_prompt\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": f\"Query: {query}\\n\\nResponses:\\n{combined_text}\"\n", " },\n", " ],\n", " temperature=0\n", " )\n", " return chat_response.choices[0].message.content\n", "\n", " def workflow(self, query: str) -> str:\n", " \"\"\"\n", " Execute the workflow for processing a query.\n", "\n", " Args:\n", " query: User query\n", "\n", " Returns:\n", " str: Final response with citations\n", " \"\"\"\n", " # Get responses from both agents\n", " db_response, tools_citations = self.db_query_agent.query(query)\n", " rag_response, rag_citations = self.rag_agent.query(query)\n", "\n", " # Combine responses into a dictionary\n", " responses = {\n", " \"Database Response\": db_response,\n", " \"RAG Response\": rag_response\n", " }\n", "\n", " # Generate final summarized response\n", " final_response = self.combine_and_summarize_responses(\n", " responses=responses,\n", " query=query,\n", " summarization_prompt=final_response_summarization_prompt\n", " )\n", "\n", " # Add citations\n", " citations = (\n", " f\"\\n\\nSources:\\n\"\n", " f\"- Database Tools: {((tools_citations))}\\n\"\n", " f\"- PDF Sources: {rag_citations}\"\n", " )\n", "\n", " return final_response + citations" ] }, { "cell_type": "markdown", "metadata": { "id": "JHUeoXY-9EkC" }, "source": [ "### Initialize and Process Documents\n", "\n", " Initializes the DataProcessor and processes PDF documents through the complete pipeline - from file ingestion to embedding storage." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iGmkQh3b7fs_", "outputId": "5af78a21-20fb-4390-e5fc-944ed432ac00" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Processing PDFs: 100%|██████████| 12/12 [00:24<00:00, 2.07s/it]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Processed batch 1/2\n", "Processed batch 2/2\n" ] } ], "source": [ "# Initialize the processor\n", "doc_processor = DataProcessor(mistral_client, qdrant_client)\n", "\n", "# Process documents\n", "file_list = doc_processor.get_categorized_filepaths(root_dir='./pdf_data')\n", "processed_docs = doc_processor.process_documents(file_list)\n", "doc_processor.process_and_store_embeddings(processed_docs)" ] }, { "cell_type": "markdown", "metadata": { "id": "EIMDmlNhSaMN" }, "source": [ "### Insert Data into Database tables.\n", "\n", "Loads multiple CSV files into their corresponding database tables in SQLite." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GrsNnaat7vSy", "outputId": "4bbfcec4-07b2-4d70-9610-b8759ace9715" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Successfully inserted data into compliance\n", "Successfully inserted data into maintenance\n", "Successfully inserted data into technical_specifications\n", "Successfully inserted data into parts_inventory_compatibility\n" ] } ], "source": [ "# Insert data into tables\n", "\n", "db_path = \"./database.db\"\n", "\n", "file_mappings = {\n", " \"compliance\": \"./csv_data/compliance_db.csv\",\n", " \"maintenance\": \"./csv_data/maintenance_db.csv\",\n", " \"technical_specifications\": \"./csv_data/technical_specifications_db.csv\",\n", " \"parts_inventory_compatibility\": \"./csv_data/parts_inventory_compatibility_db.csv\"\n", "}\n", "\n", "doc_processor.insert_data_database(db_path, file_mappings)" ] }, { "cell_type": "markdown", "metadata": { "id": "mCvinS4ItY6J" }, "source": [ "### Initialise the Agents\n", "\n", " Initializes the three core agents:\n", "\n", " - RAGAgent for document search and response\n", " - DatabaseQueryAgent for structured data querying.\n", " - WorkflowAgent for orchestrating responses." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "eFtjS8CotY6J" }, "outputs": [], "source": [ "rag_agent = RAGAgent(mistral_client, qdrant_client)\n", "db_query_agent = DatabaseQueryAgent(db_path, mistral_client)\n", "workflow_orchestrator = WorkflowOrchestrator(rag_agent, db_query_agent, mistral_client)" ] }, { "cell_type": "markdown", "metadata": { "id": "tiY4XQsAtY6J" }, "source": [ "### Example Queries" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mZwyPijttY6J", "outputId": "b7d0aac2-a954-4593-a85e-3e0c2f9eea72" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Query: What are the troubleshooting steps for inaccurate machining in CNC Machine (Model X) and when was its last maintenance performed?\n", "----------------------\n", "Tool call: query_maintenance\n", "Tool call parameters: {'filters': {'EquipmentName': 'CNC Machine', 'Model': 'Model X'}}\n", "Tool call: query_technical_specs\n", "Tool call parameters: {'filters': {'EquipmentName': 'CNC Machine', 'Model': 'Model X', 'SpecificationType': 'Accuracy'}}\n", "------------Answer----------\n", "### Troubleshooting Steps for Inaccurate Machining in CNC Machine (Model X)\n", "\n", "To address inaccurate machining in the CNC Machine (Model X), follow these comprehensive troubleshooting steps:\n", "\n", "1. **Check the Program**:\n", " - Ensure that the CNC program is correct and free of errors. Even minor errors in the code can lead to significant inaccuracies in machining.\n", "\n", "2. **Inspect the Tooling**:\n", " - Verify that the correct tools are being used and that they are in good condition. Worn or damaged tools can cause inaccuracies. Replace tools as needed.\n", "\n", "3. **Verify the Setup**:\n", " - Check the workpiece setup to ensure it is secure and correctly positioned. Any movement or misalignment can lead to machining inaccuracies.\n", "\n", "4. **Calibrate the Machine**:\n", " - Regular calibration of the machine's axes and spindles can help maintain accuracy. If the machine hasn't been calibrated recently, it may be time to do so.\n", "\n", "5. **Check for Wear and Tear**:\n", " - Inspect the machine for any signs of wear and tear, such as worn bearings or guides. These components can affect the machine's accuracy over time.\n", "\n", "6. **Environmental Factors**:\n", " - Ensure that the machining environment is stable. Factors such as temperature, humidity, and vibration can affect the machine's performance.\n", "\n", "7. **Machine Maintenance**:\n", " - Regular maintenance, including lubrication and cleaning, can help prevent inaccuracies. Ensure that the machine is well-maintained and that all scheduled maintenance tasks are completed on time.\n", "\n", "8. **Software and Firmware**:\n", " - Ensure that the machine's software and firmware are up-to-date. Outdated software can sometimes cause inaccuracies.\n", "\n", "9. **Backlash Compensation**:\n", " - Check the backlash compensation settings. Incorrect settings can lead to inaccuracies, especially in high-precision machining.\n", "\n", "10. **Consult the Manual**:\n", " - Refer to the machine's manual for any model-specific troubleshooting steps or recommendations.\n", "\n", "### Last Maintenance Performed\n", "\n", "The last maintenance for the CNC Machine (Model X) was performed on September 1, 2023. This maintenance was preventive and included an oil change. The next scheduled maintenance is set for December 1, 2023. For more detailed information, refer to the Maintenance Log in the Appendices section of the machine's documentation.\n", "\n", "### Important Safety Precautions\n", "\n", "- Always ensure the machine is turned off and locked out before performing any maintenance or inspection tasks.\n", "- Wear appropriate personal protective equipment (PPE) when handling tools and machinery.\n", "- Follow the manufacturer's guidelines for tool replacement and machine calibration to avoid any potential hazards.\n", "\n", "By following these steps and maintaining a regular maintenance schedule, you can help ensure the accuracy and reliability of the CNC Machine (Model X).\n", "\n", "Sources:\n", "- Database Tools: query_technical_specs,query_maintenance\n", "- PDF Sources: ./pdf_data/troubleshooting_guide/CNC_Machine_(Model X)_Troubleshooting_Guide.pdf,./pdf_data/troubleshooting_guide/Robotic_Arm_(Unit 7)_Troubleshooting_Guide.pdf,./pdf_data/troubleshooting_guide/Cooling_System_(Model Y)_Troubleshooting_Guide.pdf\n" ] } ], "source": [ "query = \"What are the troubleshooting steps for inaccurate machining in CNC Machine (Model X) and when was its last maintenance performed?\"\n", "\n", "print(f\"Query: {query}\")\n", "print(\"----------------------\")\n", "\n", "answer = workflow_orchestrator.workflow(query)\n", "\n", "print(\"------------Answer----------\")\n", "print(answer)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9SLxmepYboFI", "outputId": "20728980-c5ff-4e04-cf7e-576f6aec4ebf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Query: What are the safety protocols for the Cooling System (Model Y), and when is its next scheduled maintenance?\n", "----------------------\n", "Tool call: query_compliance\n", "Tool call parameters: {'filters': {'EquipmentName': 'Cooling System', 'Model': 'Model Y', 'ComplianceType': 'Safety'}}\n", "Tool call: query_maintenance\n", "Tool call parameters: {'filters': {'EquipmentName': 'Cooling System', 'Model': 'Model Y'}}\n", "------------Answer----------\n", "### Safety Protocols for Cooling System (Model Y)\n", "\n", "The Cooling System (Model Y) is compliant with OSHA safety standards, issued on March 10, 2021, and is set to expire on March 10, 2026. The system is currently active and Alice Johnson is responsible for its compliance. Below are the detailed safety protocols and maintenance schedule for the Cooling System (Model Y).\n", "\n", "#### Personal Protective Equipment (PPE)\n", "- **Required PPE:**\n", " - Safety Glasses\n", " - Gloves\n", " - Ear Protection\n", " - Safety Shoes\n", "- **PPE Usage:**\n", " - Always wear the required PPE when operating or maintaining the system.\n", " - Ensure PPE is in good condition and fits properly.\n", "\n", "#### System Operation Safety\n", "- **Pre-Operation Checks:**\n", " - Inspect System: Check for any visible damage or issues.\n", " - Check Coolant Levels: Ensure coolant levels are adequate.\n", " - Check Fans: Ensure fans are functioning properly.\n", "- **During Operation:**\n", " - Monitor System: Keep a close eye on the system during operation.\n", " - Listen for Unusual Noises: Stop the system if you hear any unusual noises.\n", " - Check Cooling Performance: Ensure the system is cooling effectively.\n", "- **Post-Operation Checks:**\n", " - Clean System: Clean the system and work area.\n", " - Inspect Components: Check components for wear and replace as needed.\n", " - Record Issues: Document any issues or anomalies observed during operation.\n", "\n", "#### Emergency Procedures\n", "- **System Failure:**\n", " 1. Stop Operation: Immediately stop the system.\n", " 2. Notify Supervisor: Inform your supervisor or maintenance team.\n", " 3. Document Issue: Record the details of the issue for further investigation.\n", "- **Injury:**\n", " 1. Provide First Aid: Administer first aid as needed.\n", " 2. Notify Safety Team: Inform the safety team or emergency services.\n", " 3. Document Incident: Record the details of the incident for further investigation.\n", "- **Fire:**\n", " 1. Evacuate Area: Clear the area around the system.\n", " 2. Use Fire Extinguisher: Use the appropriate fire extinguisher to put out the fire.\n", " 3. Notify Safety Team: Inform the safety team or emergency services.\n", "\n", "#### Hazardous Materials\n", "- **Coolant:**\n", " - Handling: Wear gloves and safety glasses when handling coolant.\n", " - Disposal: Dispose of used coolant according to local regulations.\n", "- **Lubricants:**\n", " - Handling: Wear gloves and safety glasses when handling lubricants.\n", " - Disposal: Dispose of used lubricants according to local regulations.\n", "\n", "#### Environmental Safety\n", "- **Ventilation:**\n", " - Ensure the work area has adequate ventilation to prevent the buildup of fumes and heat.\n", "- **Noise Levels:**\n", " - Use ear protection to protect hearing from loud noises.\n", "- **Waste Management:**\n", " - Dispose of waste materials according to local regulations.\n", "\n", "### Next Scheduled Maintenance\n", "The next scheduled maintenance for the Cooling System (Model Y) is on October 10, 2023. This maintenance is preventive in nature and involves filter cleaning. The maintenance is currently scheduled, and Alice Johnson is the responsible technician.\n", "\n", "Sources:\n", "- Database Tools: query_compliance,query_maintenance\n", "- PDF Sources: ./pdf_data/safety_protocol/Cooling_System_(Model Y)_Safety_Protocol.pdf,./pdf_data/safety_protocol/CNC_Machine_(Model X)_Safety_Protocol.pdf,./pdf_data/safety_protocol/Robotic_Arm_(Unit 7)_Safety_Protocol.pdf\n" ] } ], "source": [ "query = \"What are the safety protocols for the Cooling System (Model Y), and when is its next scheduled maintenance?\"\n", "\n", "print(f\"Query: {query}\")\n", "print(\"----------------------\")\n", "\n", "answer = workflow_orchestrator.workflow(query)\n", "\n", "print(\"------------Answer----------\")\n", "print(answer)" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "mistral", "language": "python", "name": "mistral" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 0 }