image description prompting pixtral

Details

File: mistral/image_understanding/image_description_prompting_pixtral.ipynb

Type: Jupyter Notebook

Use Cases: Vision, Image understanding, Data Extraction

Content

Notebook content (JSON format):

{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Image Description Extraction using Mistral's Pixtral API"
      ],
      "metadata": {
        "id": "paGlb8xLgl6c"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this notebook, we'll use the Mistral API to extract structured image descriptions in JSON format using the `pixtral-12b-2409` model. We'll send an image URL and prompt the model to return the key elements of the image, each with a short description.\n",
        "\n",
        "## Prerequisites\n",
        "Make sure you have an API key for the Mistral AI platform. We'll also show you how to load it from environment variables."
      ],
      "metadata": {
        "id": "IEzYEp52gowH"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Install the Mistral Python SDK\n",
        "!pip install mistralai"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "s3t8USmmgvHC",
        "outputId": "2f580b3e-288d-47ee-f104-bffdd662c61a"
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting mistralai\n",
            "  Downloading mistralai-1.1.0-py3-none-any.whl.metadata (23 kB)\n",
            "Requirement already satisfied: eval-type-backport<0.3.0,>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from mistralai) (0.2.0)\n",
            "Collecting httpx<0.28.0,>=0.27.0 (from mistralai)\n",
            "  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)\n",
            "Collecting jsonpath-python<2.0.0,>=1.0.6 (from mistralai)\n",
            "  Downloading jsonpath_python-1.0.6-py3-none-any.whl.metadata (12 kB)\n",
            "Requirement already satisfied: pydantic<3.0.0,>=2.9.0 in /usr/local/lib/python3.10/dist-packages (from mistralai) (2.9.2)\n",
            "Requirement already satisfied: python-dateutil==2.8.2 in /usr/local/lib/python3.10/dist-packages (from mistralai) (2.8.2)\n",
            "Collecting typing-inspect<0.10.0,>=0.9.0 (from mistralai)\n",
            "  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil==2.8.2->mistralai) (1.16.0)\n",
            "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (3.7.1)\n",
            "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (2024.8.30)\n",
            "Collecting httpcore==1.* (from httpx<0.28.0,>=0.27.0->mistralai)\n",
            "  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)\n",
            "Requirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (3.10)\n",
            "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (1.3.1)\n",
            "Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<0.28.0,>=0.27.0->mistralai)\n",
            "  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)\n",
            "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.9.0->mistralai) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.9.0->mistralai) (2.23.4)\n",
            "Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.9.0->mistralai) (4.12.2)\n",
            "Collecting mypy-extensions>=0.3.0 (from typing-inspect<0.10.0,>=0.9.0->mistralai)\n",
            "  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)\n",
            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx<0.28.0,>=0.27.0->mistralai) (1.2.2)\n",
            "Downloading mistralai-1.1.0-py3-none-any.whl (229 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m229.7/229.7 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading httpx-0.27.2-py3-none-any.whl (76 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.4/76.4 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading httpcore-1.0.6-py3-none-any.whl (78 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m78.0/78.0 kB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading jsonpath_python-1.0.6-py3-none-any.whl (7.6 kB)\n",
            "Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
            "Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
            "Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hInstalling collected packages: mypy-extensions, jsonpath-python, h11, typing-inspect, httpcore, httpx, mistralai\n",
            "Successfully installed h11-0.14.0 httpcore-1.0.6 httpx-0.27.2 jsonpath-python-1.0.6 mistralai-1.1.0 mypy-extensions-1.0.0 typing-inspect-0.9.0\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Setup\n",
        "We'll load the Mistral API key from environment variables and initialize the client. Make sure your API key is saved in your environment variables as `MISTRAL_API_KEY`.\n"
      ],
      "metadata": {
        "id": "7X4dfUvAgzjA"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Set your Mistral API key after the equals sign (the value is left blank here)\n",
        "%env MISTRAL_API_KEY="
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "pLD2umN1hpDh",
        "outputId": "14c5a452-72e6-4695-eacd-14911f453712"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "env: MISTRAL_API_KEY=\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from mistralai import Mistral\n",
        "\n",
        "# Load Mistral API key from environment variables\n",
        "api_key = os.environ[\"MISTRAL_API_KEY\"]\n",
        "\n",
        "# Model specification\n",
        "model = \"pixtral-12b-2409\"\n",
        "\n",
        "# Initialize the Mistral client\n",
        "client = Mistral(api_key=api_key)\n"
      ],
      "metadata": {
        "id": "Q57RScgEg1gS"
      },
      "execution_count": 3,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Sending Image URL for Description\n",
        "We'll prompt the model to describe the image by providing an image URL. The system message specifies the JSON structure we want, and `response_format` is set to `json_object` so that the reply is a JSON object listing the key elements and their descriptions.\n"
      ],
      "metadata": {
        "id": "mJuIiO9Ag5Sr"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Define the messages for the chat API\n",
        "messages = [\n",
        "    {\n",
        "        \"role\": \"system\",\n",
        "        \"content\": \"Return the answer in a JSON object with the next structure: \"\n",
        "                   \"{\\\"elements\\\": [{\\\"element\\\": \\\"some name of element1\\\", \"\n",
        "                   \"\\\"description\\\": \\\"some description of element 1\\\"}, \"\n",
        "                   \"{\\\"element\\\": \\\"some name of element2\\\", \\\"description\\\": \"\n",
        "                   \"\\\"some description of element 2\\\"}]}\"\n",
        "    },\n",
        "    {\n",
        "        \"role\": \"user\",\n",
        "        \"content\": \"Describe the image\"\n",
        "    },\n",
        "    {\n",
        "        \"role\": \"user\",\n",
        "        \"content\": [\n",
        "            {\n",
        "                \"type\": \"image_url\",\n",
        "                \"image_url\": \"https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg\"\n",
        "            }\n",
        "        ]\n",
        "    }\n",
        "]\n",
        "\n",
        "# Call the Mistral API to complete the chat\n",
        "chat_response = client.chat.complete(\n",
        "    model=model,\n",
        "    messages=messages,\n",
        "    response_format={\n",
        "        \"type\": \"json_object\",\n",
        "    }\n",
        ")\n",
        "\n",
        "# Get the content of the response\n",
        "content = chat_response.choices[0].message.content\n",
        "\n",
        "# Output the raw JSON response\n",
        "print(content)\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KILew1ucg79F",
        "outputId": "f3696e0b-7c52-4da0-f0f9-e3759614c016"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            " {\n",
            "    \"elements\": [\n",
            "        {\n",
            "            \"element\": \"Eiffel Tower\",\n",
            "            \"description\": \"A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape.\"\n",
            "        },\n",
            "        {\n",
            "            \"element\": \"Snow-covered Trees\",\n",
            "            \"description\": \"Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene.\"\n",
            "        },\n",
            "        {\n",
            "            \"element\": \"Snow\",\n",
            "            \"description\": \"A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere.\"\n",
            "        },\n",
            "        {\n",
            "            \"element\": \"Lamppost\",\n",
            "            \"description\": \"A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance.\"\n",
            "        }\n",
            "    ]\n",
            "}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Parsing the JSON Response\n",
        "We'll now parse the JSON response from the API and print the elements and their corresponding descriptions.\n"
      ],
      "metadata": {
        "id": "0f3TUsuag9WH"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import json\n",
        "\n",
        "# Parse the JSON content\n",
        "json_object = json.loads(content)\n",
        "elements = json_object[\"elements\"]\n",
        "\n",
        "# Print each element and its description\n",
        "for element in elements:\n",
        "    print(\"Element:\", element[\"element\"])\n",
        "    print(\"Description:\", element[\"description\"])\n",
        "    print()\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6pUlXBC6g_WD",
        "outputId": "4ec87424-87c0-40d1-df40-0aeb8bf88b9f"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Element: Eiffel Tower\n",
            "Description: A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape.\n",
            "\n",
            "Element: Snow-covered Trees\n",
            "Description: Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene.\n",
            "\n",
            "Element: Snow\n",
            "Description: A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere.\n",
            "\n",
            "Element: Lamppost\n",
            "Description: A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance.\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Conclusion\n",
        "In this notebook, we used the Pixtral model through the Mistral API to describe an image: we sent an image URL, received a structured JSON response, and parsed it to list the key elements of the image with their descriptions.\n"
      ],
      "metadata": {
        "id": "ET9JhkczhBst"
      }
    }
  ]
}
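
The parsing cell in the notebook indexes straight into `json_object["elements"]`, which works when the model follows the requested structure exactly. A minimal sketch of a more defensive variant is shown below. It assumes the reply may occasionally be malformed or miss a field. The helper name `parse_elements` and the sample string are hypothetical and not part of the original notebook; in the notebook you would pass `content` instead of `sample`.

```python
import json


def parse_elements(content: str) -> list:
    """Parse a model reply and return only its well-formed element entries.

    Hypothetical helper: it checks the structure requested by the system
    prompt, {"elements": [{"element": ..., "description": ...}, ...]}.
    """
    try:
        payload = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("Expected a JSON object at the top level")

    elements = payload.get("elements")
    if not isinstance(elements, list):
        raise ValueError("Expected an 'elements' list in the reply")

    cleaned = []
    for item in elements:
        # Keep only entries that carry both expected string fields.
        if (
            isinstance(item, dict)
            and isinstance(item.get("element"), str)
            and isinstance(item.get("description"), str)
        ):
            cleaned.append(
                {"element": item["element"], "description": item["description"]}
            )
    return cleaned


# Made-up sample reply; the second entry lacks a description and is skipped.
sample = (
    '{"elements": ['
    '{"element": "Eiffel Tower", "description": "A wrought-iron lattice tower."}, '
    '{"element": "Snow"}]}'
)

for entry in parse_elements(sample):
    print("Element:", entry["element"])
    print("Description:", entry["description"])
    print()
```

Raising `ValueError` on a structurally wrong reply, while silently skipping individual incomplete entries, is one possible design choice; a stricter pipeline might reject the whole reply instead.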