image description prompting pixtral
Details
File: mistral/image_understanding/image_description_prompting_pixtral.ipynb
Type: Jupyter Notebook
Use Cases: Vision, Image understanding, Data Extraction
Content
Notebook content (JSON format):
{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Image Description Extraction using Mistral's Pixtral API" ], "metadata": { "id": "paGlb8xLgl6c" } }, { "cell_type": "markdown", "source": [ "In this notebook, we'll use the `Mistral` API to extract structured image descriptions in JSON format using the `pixtral-12b-2409` model. We'll send an image URL and prompt the model to return key elements with descriptions.\n", "\n", "## Prerequisites\n", "Make sure you have an API key for the Mistral AI platform. We'll also show you how to load it from environment variables." ], "metadata": { "id": "IEzYEp52gowH" } }, { "cell_type": "code", "source": [ "# Install the Mistral Python SDK\n", "!pip install mistralai" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "s3t8USmmgvHC", "outputId": "2f580b3e-288d-47ee-f104-bffdd662c61a" }, "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Collecting mistralai\n", " Downloading mistralai-1.1.0-py3-none-any.whl.metadata (23 kB)\n", "Successfully installed h11-0.14.0 httpcore-1.0.6 httpx-0.27.2 jsonpath-python-1.0.6 mistralai-1.1.0 mypy-extensions-1.0.0 typing-inspect-0.9.0\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Setup\n", "We'll load the Mistral API key from environment variables and initialize the client. 
Make sure your API key is saved in your environment variables as `MISTRAL_API_KEY`.\n" ], "metadata": { "id": "7X4dfUvAgzjA" } }, { "cell_type": "code", "source": [ "%env MISTRAL_API_KEY=" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "pLD2umN1hpDh", "outputId": "14c5a452-72e6-4695-eacd-14911f453712" }, "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "env: MISTRAL_API_KEY=\n" ] } ] }, { "cell_type": "code", "source": [ "import os\n", "from mistralai import Mistral\n", "\n", "# Load Mistral API key from environment variables\n", "api_key = os.environ[\"MISTRAL_API_KEY\"]\n", "\n", "# Model specification\n", "model = \"pixtral-12b-2409\"\n", "\n", "# Initialize the Mistral client\n", "client = Mistral(api_key=api_key)\n" ], "metadata": { "id": "Q57RScgEg1gS" }, "execution_count": 3, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Sending Image URL for Description\n", "We'll prompt the model to describe the image by providing an image URL. 
The response will be returned in a structured JSON format with the key elements described.\n" ], "metadata": { "id": "mJuIiO9Ag5Sr" } }, { "cell_type": "code", "source": [ "# Define the messages for the chat API\n", "messages = [\n", " {\n", " \"role\": \"system\",\n", " \"content\": \"Return the answer in a JSON object with the following structure: \"\n", " \"{\\\"elements\\\": [{\\\"element\\\": \\\"some name of element1\\\", \"\n", " \"\\\"description\\\": \\\"some description of element 1\\\"}, \"\n", " \"{\\\"element\\\": \\\"some name of element2\\\", \\\"description\\\": \"\n", " \"\\\"some description of element 2\\\"}]}\"\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Describe the image\"\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": [\n", " {\n", " \"type\": \"image_url\",\n", " \"image_url\": \"https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg\"\n", " }\n", " ]\n", " }\n", "]\n", "\n", "# Call the Mistral API to complete the chat\n", "chat_response = client.chat.complete(\n", " model=model,\n", " messages=messages,\n", " response_format={\n", " \"type\": \"json_object\",\n", " }\n", ")\n", "\n", "# Get the content of the response\n", "content = chat_response.choices[0].message.content\n", "\n", "# Output the raw JSON response\n", "print(content)\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KILew1ucg79F", "outputId": "f3696e0b-7c52-4da0-f0f9-e3759614c016" }, "execution_count": 4, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ " {\n", " \"elements\": [\n", " {\n", " \"element\": \"Eiffel Tower\",\n", " \"description\": \"A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape.\"\n", " },\n", " {\n", " \"element\": \"Snow-covered Trees\",\n", " \"description\": \"Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene.\"\n", " },\n", " 
{\n", " \"element\": \"Snow\",\n", " \"description\": \"A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere.\"\n", " },\n", " {\n", " \"element\": \"Lamppost\",\n", " \"description\": \"A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance.\"\n", " }\n", " ]\n", "}\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Parsing the JSON Response\n", "We'll now parse the JSON response from the API and print the elements and their corresponding descriptions.\n" ], "metadata": { "id": "0f3TUsuag9WH" } }, { "cell_type": "code", "source": [ "import json\n", "\n", "# Parse the JSON content\n", "json_object = json.loads(content)\n", "elements = json_object[\"elements\"]\n", "\n", "# Print each element and its description\n", "for element in elements:\n", " print(\"Element:\", element[\"element\"])\n", " print(\"Description:\", element[\"description\"])\n", " print()\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6pUlXBC6g_WD", "outputId": "4ec87424-87c0-40d1-df40-0aeb8bf88b9f" }, "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Element: Eiffel Tower\n", "Description: A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape.\n", "\n", "Element: Snow-covered Trees\n", "Description: Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene.\n", "\n", "Element: Snow\n", "Description: A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere.\n", "\n", "Element: Lamppost\n", "Description: A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance.\n", "\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Conclusion\n", "In this notebook, we used the Mistral Pixtral model to 
describe an image by sending an image URL and receiving a structured JSON response. The descriptions provided by the model offer insights into the key elements of the image.\n" ], "metadata": { "id": "ET9JhkczhBst" } } ] }
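
Additional sketch: defensive parsing of the response

The parsing cell in the notebook indexes `json_object["elements"]` directly, which works when the model follows the structure requested in the system prompt. Requesting `json_object` output asks for JSON, but the shape of that JSON still depends on the model following the prompt, so a reply could plausibly omit a field. The sketch below shows one way to guard against that. The helper name `parse_elements` and the `sample` string are hypothetical and not part of the original notebook; the function only assumes the `elements` / `element` / `description` keys used in the prompt above.

```python
import json


def parse_elements(content: str) -> list[dict[str, str]]:
    """Parse the model's JSON reply and return only its well-formed elements.

    Hypothetical helper: assumes the {"elements": [{"element", "description"}]}
    structure requested in the notebook's system prompt.
    """
    try:
        payload = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply is not valid JSON: {exc}") from exc

    elements = payload.get("elements") if isinstance(payload, dict) else None
    if not isinstance(elements, list):
        raise ValueError('Model reply has no "elements" list')

    # Keep only entries that carry both expected string fields.
    return [
        {"element": item["element"], "description": item["description"]}
        for item in elements
        if isinstance(item, dict)
        and isinstance(item.get("element"), str)
        and isinstance(item.get("description"), str)
    ]


# Illustrative input only: the second entry lacks a description and is skipped.
sample = (
    '{"elements": ['
    '{"element": "Eiffel Tower", "description": "A lattice tower."}, '
    '{"element": "Snow"}]}'
)
for entry in parse_elements(sample):
    print(f'{entry["element"]}: {entry["description"]}')
```

In the notebook, this would be called as `parse_elements(content)` in place of the direct `json.loads(content)["elements"]` lookup. Silently skipping malformed entries is a design choice for this sketch; raising on them instead may suit stricter pipelines.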