image description prompting pixtral
Details
File: mistral/image_understanding/image_description_prompting_pixtral.ipynb
Type: Jupyter Notebook
Use Cases: Vision, Image understanding, Data Extraction
Content
Notebook content (JSON format):
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Image Description Extraction using Mistral's Pixtral API"
],
"metadata": {
"id": "paGlb8xLgl6c"
}
},
{
"cell_type": "markdown",
"source": [
"In this notebook, we'll use the `Mistral` client from the `mistralai` SDK to extract structured image descriptions as JSON with the `pixtral-12b-2409` model. We'll send an image URL and prompt the model to return the key elements of the image, each with a short description.\n",
"\n",
"## Prerequisites\n",
"Make sure you have an API key for the Mistral AI platform. We'll also show you how to load it from environment variables."
],
"metadata": {
"id": "IEzYEp52gowH"
}
},
{
"cell_type": "code",
"source": [
"# Install the Mistral Python SDK\n",
"!pip install mistralai"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "s3t8USmmgvHC",
"outputId": "2f580b3e-288d-47ee-f104-bffdd662c61a"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Collecting mistralai\n",
" Downloading mistralai-1.1.0-py3-none-any.whl.metadata (23 kB)\n",
"Requirement already satisfied: eval-type-backport<0.3.0,>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from mistralai) (0.2.0)\n",
"Collecting httpx<0.28.0,>=0.27.0 (from mistralai)\n",
" Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)\n",
"Collecting jsonpath-python<2.0.0,>=1.0.6 (from mistralai)\n",
" Downloading jsonpath_python-1.0.6-py3-none-any.whl.metadata (12 kB)\n",
"Requirement already satisfied: pydantic<3.0.0,>=2.9.0 in /usr/local/lib/python3.10/dist-packages (from mistralai) (2.9.2)\n",
"Requirement already satisfied: python-dateutil==2.8.2 in /usr/local/lib/python3.10/dist-packages (from mistralai) (2.8.2)\n",
"Collecting typing-inspect<0.10.0,>=0.9.0 (from mistralai)\n",
" Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil==2.8.2->mistralai) (1.16.0)\n",
"Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (3.7.1)\n",
"Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (2024.8.30)\n",
"Collecting httpcore==1.* (from httpx<0.28.0,>=0.27.0->mistralai)\n",
" Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)\n",
"Requirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (3.10)\n",
"Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx<0.28.0,>=0.27.0->mistralai) (1.3.1)\n",
"Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<0.28.0,>=0.27.0->mistralai)\n",
" Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.9.0->mistralai) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.9.0->mistralai) (2.23.4)\n",
"Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.9.0->mistralai) (4.12.2)\n",
"Collecting mypy-extensions>=0.3.0 (from typing-inspect<0.10.0,>=0.9.0->mistralai)\n",
" Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)\n",
"Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx<0.28.0,>=0.27.0->mistralai) (1.2.2)\n",
"Downloading mistralai-1.1.0-py3-none-any.whl (229 kB)\n",
"Downloading httpx-0.27.2-py3-none-any.whl (76 kB)\n",
"Downloading httpcore-1.0.6-py3-none-any.whl (78 kB)\n",
"Downloading jsonpath_python-1.0.6-py3-none-any.whl (7.6 kB)\n",
"Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
"Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
"Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
"Installing collected packages: mypy-extensions, jsonpath-python, h11, typing-inspect, httpcore, httpx, mistralai\n",
"Successfully installed h11-0.14.0 httpcore-1.0.6 httpx-0.27.2 jsonpath-python-1.0.6 mistralai-1.1.0 mypy-extensions-1.0.0 typing-inspect-0.9.0\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Setup\n",
"We'll load the Mistral API key from environment variables and initialize the client. Make sure your API key is saved in your environment variables as `MISTRAL_API_KEY`.\n"
],
"metadata": {
"id": "7X4dfUvAgzjA"
}
},
{
"cell_type": "code",
"source": [
"%env MISTRAL_API_KEY="
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pLD2umN1hpDh",
"outputId": "14c5a452-72e6-4695-eacd-14911f453712"
},
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"env: MISTRAL_API_KEY=\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import os\n",
"from mistralai import Mistral\n",
"\n",
"# Load Mistral API key from environment variables\n",
"api_key = os.environ[\"MISTRAL_API_KEY\"]\n",
"\n",
"# Model specification\n",
"model = \"pixtral-12b-2409\"\n",
"\n",
"# Initialize the Mistral client\n",
"client = Mistral(api_key=api_key)\n"
],
"metadata": {
"id": "Q57RScgEg1gS"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Sending Image URL for Description\n",
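"We'll prompt the model to describe the image by providing an image URL. The system message specifies the JSON structure we want, and `response_format={\"type\": \"json_object\"}` asks the API to return a JSON object listing the key elements and their descriptions.\n"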
],
"metadata": {
"id": "mJuIiO9Ag5Sr"
}
},
{
"cell_type": "code",
"source": [
"# Define the messages for the chat API\n",
"messages = [\n",
"    {\n",
"        # The system message fixes the JSON shape the model should return\n",
"        \"role\": \"system\",\n",
"        \"content\": \"Return the answer in a JSON object with the following structure: \"\n",
"                   \"{\\\"elements\\\": [{\\\"element\\\": \\\"some name of element1\\\", \"\n",
"                   \"\\\"description\\\": \\\"some description of element 1\\\"}, \"\n",
"                   \"{\\\"element\\\": \\\"some name of element2\\\", \\\"description\\\": \"\n",
"                   \"\\\"some description of element 2\\\"}]}\"\n",
"    },\n",
"    {\n",
"        \"role\": \"user\",\n",
"        \"content\": \"Describe the image\"\n",
"    },\n",
"    {\n",
"        # The image is passed by URL as an image_url content chunk\n",
"        \"role\": \"user\",\n",
"        \"content\": [\n",
"            {\n",
"                \"type\": \"image_url\",\n",
"                \"image_url\": \"https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg\"\n",
"            }\n",
"        ]\n",
"    }\n",
"]\n",
"\n",
"# Call the Mistral API to complete the chat\n",
"chat_response = client.chat.complete(\n",
"    model=model,\n",
"    messages=messages,\n",
"    response_format={\n",
"        \"type\": \"json_object\",\n",
"    }\n",
")\n",
"\n",
"# Get the content of the response\n",
"content = chat_response.choices[0].message.content\n",
"\n",
"# Output the raw JSON response\n",
"print(content)\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "KILew1ucg79F",
"outputId": "f3696e0b-7c52-4da0-f0f9-e3759614c016"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" {\n",
" \"elements\": [\n",
" {\n",
" \"element\": \"Eiffel Tower\",\n",
" \"description\": \"A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape.\"\n",
" },\n",
" {\n",
" \"element\": \"Snow-covered Trees\",\n",
" \"description\": \"Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene.\"\n",
" },\n",
" {\n",
" \"element\": \"Snow\",\n",
" \"description\": \"A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere.\"\n",
" },\n",
" {\n",
" \"element\": \"Lamppost\",\n",
" \"description\": \"A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance.\"\n",
" }\n",
" ]\n",
"}\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Parsing the JSON Response\n",
"We'll now parse the JSON response from the API and print the elements and their corresponding descriptions.\n"
],
"metadata": {
"id": "0f3TUsuag9WH"
}
},
{
"cell_type": "code",
"source": [
"import json\n",
"\n",
"# Parse the JSON content\n",
"json_object = json.loads(content)\n",
"elements = json_object[\"elements\"]\n",
"\n",
"# Print each element and its description\n",
"for element in elements:\n",
"    print(\"Element:\", element[\"element\"])\n",
"    print(\"Description:\", element[\"description\"])\n",
"    print()\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6pUlXBC6g_WD",
"outputId": "4ec87424-87c0-40d1-df40-0aeb8bf88b9f"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Element: Eiffel Tower\n",
"Description: A iconic wrought-iron lattice tower located in Paris, France, standing tall amidst a snowy landscape.\n",
"\n",
"Element: Snow-covered Trees\n",
"Description: Trees surrounding the Eiffel Tower, their branches laden with fresh snow, creating a serene and picturesque winter scene.\n",
"\n",
"Element: Snow\n",
"Description: A blanket of snow covering the ground, trees, and other structures, giving the scene a tranquil and chilly atmosphere.\n",
"\n",
"Element: Lamppost\n",
"Description: A traditional lamppost located in the foreground, partially covered in snow, adding to the winter ambiance.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Conclusion\n",
"In this notebook, we used Mistral's Pixtral model to describe an image: we sent an image URL, requested a JSON object through `response_format`, and parsed the returned elements and their descriptions with `json.loads`.\n"
],
"metadata": {
"id": "ET9JhkczhBst"
}
}
]
}
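
Supplementary sketch (not part of the notebook): the parsing cell above indexes `json_object["elements"]` directly, which assumes the model always returns exactly the requested structure. The helper below is a minimal, hypothetical alternative that validates the reply before use. The name `parse_elements` and the `sample` string are illustrative assumptions; only the `elements`, `element`, and `description` keys come from the notebook's prompt.

```python
import json


def parse_elements(content: str) -> list[dict[str, str]]:
    """Parse the model's JSON reply and return only well-formed elements.

    Hypothetical helper: raises ValueError if the reply is not valid JSON
    or lacks a top-level "elements" list.
    """
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for both malformed JSON and an unexpected shape.
    data = json.loads(content.strip())
    elements = data.get("elements") if isinstance(data, dict) else None
    if not isinstance(elements, list):
        raise ValueError("expected a JSON object with an 'elements' list")

    cleaned = []
    for item in elements:
        # Skip entries missing either expected key instead of raising KeyError.
        if isinstance(item, dict) and "element" in item and "description" in item:
            cleaned.append(
                {
                    "element": str(item["element"]),
                    "description": str(item["description"]),
                }
            )
    return cleaned


# Illustrative input only; in the notebook this would be `content`.
sample = (
    ' {"elements": ['
    '{"element": "Eiffel Tower", "description": "A tower."}, '
    '{"name": "missing keys"}]}'
)

for entry in parse_elements(sample):
    print(f"{entry['element']}: {entry['description']}")
```

In the notebook, this would be called as `parse_elements(content)` in place of the direct `json.loads(content)["elements"]` lookup. Whether to skip malformed entries or fail loudly is a design choice that depends on how the descriptions are used downstream.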