mistral pgsql function calling

Details

File: third_party/MS_Autogen_pgsql/mistral_pgsql_function_calling.ipynb
Type: Jupyter Notebook
Use Cases: Function calling
Integrations: Autogen, PostgreSQL, Distilabel
Content

Notebook content (JSON format):
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "41122cd9",
   "metadata": {},
   "source": [
    "# Microsoft AutoGen & Mistral Large 2407 - Retrieve information from a Postgresql Database\n",
    "\n",
    "This tutorial is written as example how to use Microsoft AutoGen in Combination with Mistral Large V2 to query a Postgres Database.\n",
    "It gives a short and simple overview how to do this, it is also possible to write extra tools and add them to query other DB's or maybe VectorDB's with a perfect prompt ;)! \n",
    "\n",
    "Besides query's and prompting a good context is also really important, thats why I added a extra function to always provide the context of all avaible tables.\n",
    "\n",
    "**PLEASE NOTE:** Try to prevent using to many tools and contexts together, but use different 'chat models' instead of a single big model to do everything.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Preparation steps:\n",
    "\n",
    "In the first few steps we will install Postgresql, after that we will import a database dumb.\n",
    "I did go for the following database dumb; name.basics.tsv, found here; https://datasets.imdbws.com/\n",
    "\n",
    "- https://wiki.postgresql.org/wiki/Sample_Databases\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab418a2f-cb71-4276-a805-a434a40944e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo apt update\n",
    "!sudo apt install dirmngr ca-certificates software-properties-common gnupg gnupg2 apt-transport-https curl -y\n",
    "!curl -fSsL https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor | sudo tee /usr/share/keyrings/postgresql.gpg > /dev/null\n",
    "!echo 'deb [arch=amd64,arm64,ppc64el signed-by=/usr/share/keyrings/postgresql.gpg] http://apt.postgresql.org/pub/repos/apt/ jammy-pgdg main' | sudo tee /etc/apt/sources.list.d/pgdg.list\n",
    "!sudo apt update\n",
    "!sudo apt install postgresql-client-16 postgresql-16 -y\n",
    "!sudo service postgresql start\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c693f042",
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo -u postgres psql -U postgres -c \"ALTER ROLE postgres WITH PASSWORD 'super_secret_postgres_password';\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15e1d711",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the dataset from IMDB\n",
    "!cd /\n",
    "!sudo mkdir data\n",
    "!sudo wget https://datasets.imdbws.com/name.basics.tsv.gz\n",
    "!sudo gunzip name.basics.tsv.gz\n",
    "\n",
    "# Create a super user\n",
    "!sudo -u postgres psql -U postgres -c \"CREATE ROLE root WITH SUPERUSER;\"\n",
    "!sudo -u postgres psql -U postgres -c \"ALTER ROLE root WITH LOGIN;\"\n",
    "!sudo -u postgres psql -U postgres -c \"CREATE ROLE postgres WITH PASSWORD 'super_secret_postgres_password';\"\n",
    "\n",
    "# Import the dataset\n",
    "!sudo -u postgres psql -U postgres -c \"CREATE TABLE imdb ( nconst TEXT, primaryName TEXT, birthYear INT, deathYear INT, primaryProfession TEXT, knownForTitles TEXT);\"\n",
    "\n",
    "#It is possible you have to change the directory which containts this file\n",
    "!sudo -u postgres psql -U postgres -c \"COPY imdb FROM '/content/name.basics.tsv' WITH (HEADER true);\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da899631-32e4-4612-8a07-ae7a9bcb7195",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install pyautogen psycopg2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86b2b919-397b-416d-9ca5-69e618e8121e",
   "metadata": {},
   "source": [
    "# Import all required packages\n",
    "\n",
    "Lets importat all required packages, in this case we need autogen, the postgresql package and some other libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "938180a6-7c14-4acd-b062-992b04309baa",
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen import ConversableAgent, register_function\n",
    "from typing import List, Optional, Union, Dict, Any\n",
    "import psycopg2\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46b3e4ac-1b93-4387-90b7-8658fbe99e68",
   "metadata": {},
   "source": [
    "# The Postgress Function\n",
    "\n",
    "The Postgress function takes a valid json input, based on this input the query is executed.\n",
    "After that the function returns the output to the LLM, which will respond on that with a message to the user.\n",
    "\n",
    "Because we pre-defined how to use the tool, it is not possible to delete , update or create any records, read access only!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d82c2eee-a081-49f4-9fad-24f1f9657907",
   "metadata": {},
   "outputs": [],
   "source": [
    "def execute_postgres_query(\n",
    "    table_name: str,\n",
    "    columns: List[str],\n",
    "    filters: Optional[Dict[str, Any]] = None,\n",
    "    sort_column: Optional[str] = None,\n",
    "    sort_order: Optional[str] = None,\n",
    "    limit: Optional[int] = 150,  # Default limit of 150 rows, you can edit this yourself if needed, the AI will also be able to change this.\n",
    "):\n",
    "    # Validate input\n",
    "    if not table_name:\n",
    "        return \"Error: table_name is required\"\n",
    "    if not columns:\n",
    "        return \"Error: columns is required\"\n",
    "    if sort_column and not sort_order:\n",
    "        return \"Error: sort_order is required when sort_column is specified\"\n",
    "\n",
    "    # Generate SQL query\n",
    "    query = f\"SELECT {', '.join(columns)} FROM {table_name}\"\n",
    "    params = []\n",
    "\n",
    "    if filters:\n",
    "        filter_conditions = []\n",
    "        for column, value in filters.items():\n",
    "            if isinstance(value, str) and value.startswith('%') and value.endswith('%'):\n",
    "                filter_conditions.append(f\"{column} LIKE %s\")\n",
    "                params.append(value)\n",
    "            elif isinstance(value, list):\n",
    "                filter_conditions.append(f\"{column} NOT IN %s\")\n",
    "                params.append(tuple(value))\n",
    "            else:\n",
    "                filter_conditions.append(f\"{column} = %s\")\n",
    "                params.append(value)\n",
    "        query += \" WHERE \" + \" AND \".join(filter_conditions)\n",
    "\n",
    "    if sort_column and sort_order:\n",
    "        query += f\" ORDER BY {sort_column} {sort_order}\"\n",
    "    if limit:\n",
    "        query += f\" LIMIT {limit}\"\n",
    "\n",
    "    # Execute SQL query\n",
    "    conn = psycopg2.connect(database=\"postgres\", user=\"postgres\", password=\"super_secret_postgres_password\", host=\"localhost\", port=\"5432\")\n",
    "    cur = conn.cursor()\n",
    "    cur.execute(query, params)\n",
    "    results = cur.fetchall()\n",
    "    cur.close()\n",
    "    conn.close()\n",
    "    return results"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "282111b8-eda6-4b37-a316-9635b6678d71",
   "metadata": {},
   "source": [
    "# Define API keys\n",
    "\n",
    "Fill in your Mistral API key to access Mistral-large-2407! :D\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da4ba448-12d0-4865-8099-3b0c41b446c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "mistral_key = input(\"Enter your Mistral AI key: \")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d60345db-d288-40d3-aa6a-63305e1a4758",
   "metadata": {},
   "source": [
    "# Get All tables\n",
    "\n",
    "We want to prompt the LLM with the context of all tables, this has to be up to date, so we create a sepperate function which queries the postgres tool but with pre defined input to query all tables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33aec574-4e6d-480e-869c-7f3e82cdadaa",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_all_tables():\n",
    "    # Exclude default PostgreSQL schemas\n",
    "    excluded_schemas = ['information_schema', 'pg_catalog']\n",
    "\n",
    "    # Query to get all tables excluding the default schemas\n",
    "    table_columns = ['table_schema', 'table_name']\n",
    "    table_name = 'information_schema.tables'\n",
    "    filters = {'table_schema': excluded_schemas}\n",
    "    sort_column = 'table_schema'\n",
    "    sort_order = 'ASC'\n",
    "\n",
    "    # Execute the query to get all tables\n",
    "    tables_query_result = execute_postgres_query(\n",
    "        table_name,\n",
    "        table_columns,\n",
    "        filters,\n",
    "        sort_column,\n",
    "        sort_order\n",
    "    )\n",
    "\n",
    "    # Parse the results of the tables query\n",
    "    tables = [{'table_schema': row[0], 'table_name': row[1]} for row in tables_query_result]\n",
    "\n",
    "    # Prepare a list to store table information with columns\n",
    "    table_info = []\n",
    "\n",
    "    # Iterate over each table to get its columns\n",
    "    for table in tables:\n",
    "        schema_name = table['table_schema']\n",
    "        table_name = table['table_name']\n",
    "\n",
    "        # Query to get columns for the current table\n",
    "        columns_columns = ['column_name']\n",
    "        columns_table_name = 'information_schema.columns'\n",
    "        columns_filters = {'table_schema': schema_name, 'table_name': table_name}\n",
    "        columns_sort_column = 'ordinal_position'\n",
    "        columns_sort_order = 'ASC'\n",
    "\n",
    "        # Execute the query to get columns\n",
    "        columns_query_result = execute_postgres_query(\n",
    "            columns_table_name,\n",
    "            columns_columns,\n",
    "            columns_filters,\n",
    "            columns_sort_column,\n",
    "            columns_sort_order\n",
    "        )\n",
    "\n",
    "        # Parse the results of the columns query\n",
    "        columns = [row[0] for row in columns_query_result]\n",
    "\n",
    "        # Add table information with columns to the list\n",
    "        table_info.append({\n",
    "            'table_schema': schema_name,\n",
    "            'table_name': table_name,\n",
    "            'columns': columns\n",
    "        })\n",
    "\n",
    "    return table_info"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb49afae-cf21-4a71-a313-708cefddfd3e",
   "metadata": {},
   "source": [
    "# Execute user queries\n",
    "\n",
    "Now everything is set to use the chat and query Postgresql Database using Mistral-Large-2407"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31357af0-ddb1-4d83-8599-a72a4a55cd79",
   "metadata": {},
   "outputs": [],
   "source": [
    "def chatbot(mistral_key):\n",
    "    config_list = [\n",
    "        {\n",
    "            'model': 'mistral-large-2407', # If the responses are very slow, change this model to open-mixtral-8x22b\n",
    "            'base_url': 'https://api.mistral.ai/v1',\n",
    "            \"api_key\": mistral_key,\n",
    "            \"tool_choice\": \"auto\",\n",
    "        },\n",
    "    ]\n",
    "\n",
    "    llm_config={\n",
    "        \"config_list\": config_list,\n",
    "        \"temperature\": 0.1\n",
    "    }\n",
    "\n",
    "    user = ConversableAgent(\n",
    "        \"user\",\n",
    "        llm_config=False,\n",
    "        is_termination_msg=lambda msg: \"tool_calls\" not in msg,\n",
    "        human_input_mode=\"NEVER\",\n",
    "        max_consecutive_auto_reply=3,\n",
    "    )\n",
    "    \n",
    "    assistant = ConversableAgent(\n",
    "    name=\"assistant\",\n",
    "        llm_config=llm_config,\n",
    "        system_message=\"You are an helpful AI assistant, you use your Postgres tool to query the database. Keep in mind the possibility of to long contexts lengths when using limits wrong.\"\n",
    "    )\n",
    "\n",
    "    assistant.register_for_llm(name=\"postgres_query\", description=\"Useful for when you need query the postgres db\")(execute_postgres_query)\n",
    "    user.register_for_execution(name=\"postgres_query\")(execute_postgres_query)\n",
    "\n",
    "    LLM_CONTEXT = get_all_tables()\n",
    "\n",
    "    user.send(f\"This are all the available tables; \\n\\n  {LLM_CONTEXT} \\n\\n \", assistant, request_reply=False)\n",
    "    assistant.send(\"Thanks for the additonal context of all existing tables!\", user, request_reply=False)\n",
    "    \n",
    "    while True:\n",
    "        task = input(\"Enter the query for the LLM ('exit' to quit): \")\n",
    "        if task.lower() == 'exit':\n",
    "            break\n",
    "        context_task = f\"{task}\"\n",
    "        user.initiate_chat(assistant, message=context_task, clear_history=False)\n",
    "\n",
    "chatbot(mistral_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "047c0b8d-2076-4b89-8ae7-cd41aa983744",
   "metadata": {},
   "source": [
    "# Example questions & Conclusion\n",
    "\n",
    "Some good example questions to ask this model are;\n",
    "- Get me all people named Mistral\n",
    "- Get me all actors born in 2000, limit them to 10\n",
    "- Get me all actors from before 1950, limit them to 13\n",
    "\n",
    "\n",
    "So like you can see Mistral Large V2 or any equivelant model it is relatively easy to create a function to query a Postgres DB without giving it full access to delete records. \n",
    "\n",
    "This can give people a safer way to access different databases, without having to worry to make mistakes.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}