incremental prompt engineering and model comparison

Details

File: third_party/Pixeltable/incremental_prompt_engineering_and_model_comparison.ipynb
Type: Jupyter Notebook
Use Cases: Prompt Engineering
Integrations: Pixeltable
Content

Notebook content (JSON format):
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cAujZD3XDxf4"
      },
      "source": [
        "[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/mistralai/cookbook/blob/main/third_party/Pixeltable/incremental_prompt_engineering_and_model_comparison.ipynb)&nbsp;&nbsp;\n",
        "[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mistralai/cookbook/blob/main/third_party/Pixeltable/incremental_prompt_engineering_and_model_comparison.ipynb)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_4GL-qmL2TaP"
      },
      "source": [
        "\n",
        "\n",
        "<br>\n",
        "<img src=\"https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png\" alt=\"Pixeltable\" width=\"30%\" />\n",
        "\n",
        "# Incremental Prompt Engineering and Model Comparison with Mistral using Pixeltable\n",
        "\n",
        "\n",
        "This notebook shows how to use Pixeltable for iterative prompt engineering and model comparison with Mistral AI models. It showcases persistent storage, incremental updates, and how to benchmark different prompts and models easily.\n",
        "\n",
        "[Pixeltable](https://github.com/pixeltable/pixeltable) is data infrastructure that provides a declarative, incremental approach for multimodal AI.\n",
        "\n",
        "**Category:** Prompt Engineering & Model Comparison"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "k4afnK2UEJQo"
      },
      "source": [
        "## 1. Setup and Installation"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "kD6qpz632VT0",
        "outputId": "a2699b1e-72ff-454e-9d84-0a8d2a009956"
      },
      "outputs": [],
      "source": [
        "%pip install -qU pixeltable mistralai textblob nltk"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "bfFgX-taESIQ"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import getpass\n",
        "import pixeltable as pxt\n",
        "from pixeltable.functions.mistralai import chat_completions\n",
        "from textblob import TextBlob\n",
        "import nltk\n",
        "from nltk.tokenize import word_tokenize\n",
        "from nltk.corpus import stopwords\n",
        "import re"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KI8wpVxmRBRV",
        "outputId": "5950dbec-7c49-467a-de57-f8d93ab1c5b9"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "execution_count": 3,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "nltk.download('punkt', quiet=True)\n",
        "nltk.download('stopwords', quiet=True)\n",
        "nltk.download('punkt_tab', quiet=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "GTq3IfJ42TaS",
        "outputId": "fd1637cd-2685-46ac-cb94-55b078908804"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Mistral AI API Key:··········\n"
          ]
        }
      ],
      "source": [
        "if 'MISTRAL_API_KEY' not in os.environ:\n",
        "    os.environ['MISTRAL_API_KEY'] = getpass.getpass('Mistral AI API Key:')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-bSsPGQe2TaT"
      },
      "source": [
        "## 2. Create a Pixeltable Table and Insert Examples\n",
        "\n",
        "First, Pixeltable is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database. When you reset a notebook kernel or start a new Python session, you'll have access to all the data you've stored previously in Pixeltable."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "rA-yqJMi2TaT",
        "outputId": "71707d3a-8c36-4b0b-9ab5-c5aca1645094"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Creating a Pixeltable instance at: /root/.pixeltable\n",
            "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/root/.pixeltable/pgdata\n",
            "Created table `mistral_prompts`.\n",
            "Inserting rows into `mistral_prompts`: 3 rows [00:00, 391.64 rows/s]\n",
            "Inserted 3 rows with 0 errors.\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "UpdateStatus(num_rows=3, num_computed_values=9, num_excs=0, updated_cols=[], cols_with_excs=[])"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Create a table to store prompts and results\n",
        "pxt.drop_table('mistral_prompts', ignore_errors=True)\n",
        "t = pxt.create_table('mistral_prompts', {\n",
        "    'task': pxt.StringType(),\n",
        "    'system': pxt.StringType(),\n",
        "    'input_text': pxt.StringType()\n",
        "})\n",
        "\n",
        "# Insert sample data\n",
        "t.insert([\n",
        "    {'task': 'summarization',\n",
        "     'system': 'Summarize the following text:',\n",
        "     'input_text': 'Mistral AI is a French artificial intelligence (AI) research and development company that focuses on creating and applying AI technologies to various industries.'},\n",
        "    {'task': 'sentiment',\n",
        "     'system': 'Analyze the sentiment of this text:',\n",
        "     'input_text': 'I love using Mistral for my AI projects! They provide great LLMs and it is really easy to work with.'},\n",
        "    {'task': 'question_answering',\n",
        "     'system': 'Answer the following question:',\n",
        "     'input_text': 'What are the main benefits of using Mistral AI over other LLMs providers?'}\n",
        "])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8nQt1hGK2TaT"
      },
      "source": [
        "## 3. Run Mistral Inference Functions\n",
        "\n",
        "We create **computed columns** to instruct Pixeltable to run the Mistral `chat_completions` function and store the output. Because computed columns are a permanent part of the table, they will be automatically updated any time new data is added to the table. For more information, see our [tutorial](https://docs.pixeltable.com/docs/computed-columns).\n",
        "\n",
        "In this particular example we are running the `open_mistral_nemo` and `mistral_medium` models and make the output available in their respective columns."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "9_HwbVp82TaU",
        "outputId": "ba3949fc-eddc-43e2-8ee0-f293cd91e910"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:06<00:00,  2.28s/ cells]\n",
            "Added 3 column values with 0 errors.\n",
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:09<00:00,  3.31s/ cells]\n",
            "Added 3 column values with 0 errors.\n"
          ]
        }
      ],
      "source": [
        "# We are referencing columns from the 'mistral_prompts' table to dynamically compose the message for the Inference API.\n",
        "msgs = [\n",
        "    {'role': 'system', 'content': t.system},\n",
        "    {'role': 'user', 'content': t.input_text}\n",
        "]\n",
        "\n",
        "# Run inference with open-mistral-nemo model\n",
        "t['open_mistral_nemo'] = chat_completions(\n",
        "    messages=msgs,\n",
        "    model='open-mistral-nemo',\n",
        "    max_tokens=300,\n",
        "    top_p=0.9,\n",
        "    temperature=0.7\n",
        ")\n",
        "\n",
        "# Run inference with mistral-medium model\n",
        "t['mistral_medium'] = chat_completions(\n",
        "    messages=msgs,\n",
        "    model='mistral-medium',\n",
        "    max_tokens=300,\n",
        "    top_p=0.9,\n",
        "    temperature=0.7\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kCklovbiG1Rl"
      },
      "source": [
        "The respective response columns have the JSON column type and we can now use JSON path expressions to extract the relevant pieces of data and make them available as additional computed columns."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "wzrdn851G0Sd",
        "outputId": "9adc6716-8110-485d-d678-8e1eacfc2001"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Computing cells: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 120.75 cells/s]\n",
            "Added 3 column values with 0 errors.\n",
            "Computing cells: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 121.09 cells/s]\n",
            "Added 3 column values with 0 errors.\n"
          ]
        }
      ],
      "source": [
        "# Extract the response content as a string (by default JSON)\n",
        "t['omn_response'] = t.open_mistral_nemo.choices[0].message.content.astype(pxt.StringType())\n",
        "t['ml_response'] = t.mistral_medium.choices[0].message.content.astype(pxt.StringType())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 473
        },
        "id": "Tmow8WkU4LJI",
        "outputId": "6023752b-d225-4f14-83ab-fee6e6f32d03"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th>omn_response</th>\n",
              "      <th>ml_response</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>Mistral AI is a French company specializing in AI research and development, aiming to create and apply AI technologies across diverse industries.</td>\n",
              "      <td>Mistral AI is a French firm specializing in AI research and development, with a focus on implementing AI technologies across various industries.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>The sentiment of the text &quot;I love using Mistral for my AI projects! They provide great LLMs and it is really easy to work with.&quot; is overwhelmingly positive. Here&#x27;s a breakdown of the indicators:\n",
              "\n",
              "1. **Positive words and phrases:**\n",
              "   - &quot;love&quot; (strong positive emotion)\n",
              "   - &quot;great&quot; (implies high quality or excellence)\n",
              "   - &quot;really easy&quot; (implies simplicity and convenience)\n",
              "\n",
              "2. **No negative words or phrases:** There are no negative words or phrases that could counteract the positive sentiment.\n",
              "\n",
              "3. **Exclamation mark:** The use of an exclamation mark at the end of the sentence emphasizes the positive sentiment, making it more enthusiastic.\n",
              "\n",
              "Based on these points, the overall sentiment of the text is extremely positive. The speaker is expressing strong approval and satisfaction with using Mistral for their AI projects.</td>\n",
              "      <td>The sentiment of this text is positive. The author expresses their love for using Mistral for their AI projects and highlights the quality of the large language models (LLMs) provided by Mistral, as well as the ease of working with them.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>Using Mistral AI over other Large Language Model (LLM) providers offers several main benefits:\n",
              "\n",
              "1. **Advanced Language Models**: Mistral AI offers state-of-the-art language models like Mixtral 8x7B and Codestral, which are designed to understand and generate human-like text more effectively than many other LLMs.\n",
              "\n",
              "2. **Context Window**: Mistral AI&#x27;s models have a larger context window, allowing them to maintain context over longer sequences of text. This is particularly useful for tasks like  ......  emphasis on safety and moderation. Their models are designed to minimize harmful, biased, or offensive outputs, making them safer to use.\n",
              "\n",
              "6. **Transparency and Documentation**: Mistral AI provides clear documentation and is transparent about their models&#x27; capabilities and limitations, making it easier for developers to integrate and use their models.\n",
              "\n",
              "7. **Pricing and Accessibility**: While pricing can vary, Mistral AI aims to provide competitive pricing and offers free tiers, making their</td>\n",
              "      <td>Mistral AI is a cutting-edge company based in Paris, France, developing large language models (LLMs). While I don&#x27;t have real-time access to specific details about Mistral AI, I can share some potential benefits based on the information available on their website and general industry trends. Keep in mind that these benefits are not guaranteed and may vary depending on your specific needs and use case.\n",
              "\n",
              "1. Customization and Adaptability: Mistral AI emphasizes the importance of customization a ...... ean Perspective: As a European company, Mistral AI may have a better understanding of European data privacy regulations, such as GDPR, and cultural nuances, providing more relevant and compliant solutions for European businesses.\n",
              "4. Research and Development: Mistral AI invests heavily in research and development to stay at the forefront of AI and language processing technology. This commitment to innovation may result in more advanced and capable LLMs compared to other providers.\n",
              "5. Collabor</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "                                        omn_response  \\\n",
              "0  Mistral AI is a French company specializing in...   \n",
              "1  The sentiment of the text \"I love using Mistra...   \n",
              "2  Using Mistral AI over other Large Language Mod...   \n",
              "\n",
              "                                         ml_response  \n",
              "0  Mistral AI is a French firm specializing in AI...  \n",
              "1  The sentiment of this text is positive. The au...  \n",
              "2  Mistral AI is a cutting-edge company based in ...  "
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Display the responses\n",
        "t.select(t.omn_response, t.ml_response).collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jGnjaTiZHfSF"
      },
      "source": [
        "We can see how data is computed across the different columns in our table."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 186
        },
        "id": "iL7V84hrF7H7",
        "outputId": "ad7b352a-700e-4ac6-fc0b-b2683ffcb60d"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<style type=\"text/css\">\n",
              "#T_d715f th {\n",
              "  text-align: center;\n",
              "}\n",
              "#T_d715f_row0_col0, #T_d715f_row0_col1, #T_d715f_row0_col2, #T_d715f_row1_col0, #T_d715f_row1_col1, #T_d715f_row1_col2, #T_d715f_row2_col0, #T_d715f_row2_col1, #T_d715f_row2_col2, #T_d715f_row3_col0, #T_d715f_row3_col1, #T_d715f_row3_col2, #T_d715f_row4_col0, #T_d715f_row4_col1, #T_d715f_row4_col2, #T_d715f_row5_col0, #T_d715f_row5_col1, #T_d715f_row5_col2, #T_d715f_row6_col0, #T_d715f_row6_col1, #T_d715f_row6_col2 {\n",
              "  white-space: pre-wrap;\n",
              "  text-align: left;\n",
              "}\n",
              "</style>\n",
              "<table id=\"T_d715f\">\n",
              "  <thead>\n",
              "    <tr>\n",
              "      <th id=\"T_d715f_level0_col0\" class=\"col_heading level0 col0\" >Column Name</th>\n",
              "      <th id=\"T_d715f_level0_col1\" class=\"col_heading level0 col1\" >Type</th>\n",
              "      <th id=\"T_d715f_level0_col2\" class=\"col_heading level0 col2\" >Computed With</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td id=\"T_d715f_row0_col0\" class=\"data row0 col0\" >task</td>\n",
              "      <td id=\"T_d715f_row0_col1\" class=\"data row0 col1\" >string</td>\n",
              "      <td id=\"T_d715f_row0_col2\" class=\"data row0 col2\" ></td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d715f_row1_col0\" class=\"data row1 col0\" >system</td>\n",
              "      <td id=\"T_d715f_row1_col1\" class=\"data row1 col1\" >string</td>\n",
              "      <td id=\"T_d715f_row1_col2\" class=\"data row1 col2\" ></td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d715f_row2_col0\" class=\"data row2 col0\" >input_text</td>\n",
              "      <td id=\"T_d715f_row2_col1\" class=\"data row2 col1\" >string</td>\n",
              "      <td id=\"T_d715f_row2_col2\" class=\"data row2 col2\" ></td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d715f_row3_col0\" class=\"data row3 col0\" >open_mistral_nemo</td>\n",
              "      <td id=\"T_d715f_row3_col1\" class=\"data row3 col1\" >json</td>\n",
              "      <td id=\"T_d715f_row3_col2\" class=\"data row3 col2\" >chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='open-mistral-nemo', temperature=0.7, max_tokens=300)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d715f_row4_col0\" class=\"data row4 col0\" >mistral_medium</td>\n",
              "      <td id=\"T_d715f_row4_col1\" class=\"data row4 col1\" >json</td>\n",
              "      <td id=\"T_d715f_row4_col2\" class=\"data row4 col2\" >chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='mistral-medium', temperature=0.7, max_tokens=300)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d715f_row5_col0\" class=\"data row5 col0\" >omn_response</td>\n",
              "      <td id=\"T_d715f_row5_col1\" class=\"data row5 col1\" >string</td>\n",
              "      <td id=\"T_d715f_row5_col2\" class=\"data row5 col2\" >open_mistral_nemo.choices[0].message.content.astype(string)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d715f_row6_col0\" class=\"data row6 col0\" >ml_response</td>\n",
              "      <td id=\"T_d715f_row6_col1\" class=\"data row6 col1\" >string</td>\n",
              "      <td id=\"T_d715f_row6_col2\" class=\"data row6 col2\" >mistral_medium.choices[0].message.content.astype(string)</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n"
            ],
            "text/plain": [
              "table 'mistral_prompts'\n",
              "\n",
              "      Column Name   Type                                                                                                                                                             Computed With\n",
              "             task string                                                                                                                                                                          \n",
              "           system string                                                                                                                                                                          \n",
              "       input_text string                                                                                                                                                                          \n",
              "open_mistral_nemo   json chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='open-mistral-nemo', temperature=0.7, max_tokens=300)\n",
              "   mistral_medium   json    chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='mistral-medium', temperature=0.7, max_tokens=300)\n",
              "     omn_response string                                                                                                               open_mistral_nemo.choices[0].message.content.astype(string)\n",
              "      ml_response string                                                                                                                  mistral_medium.choices[0].message.content.astype(string)"
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "t"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "u2XZdda82TaU"
      },
      "source": [
        "## 4. Leveraging User-Defined Functions (UDFs) for Further Analysis\n",
        "\n",
        "UDFs allow you to extend Pixeltable with custom Python code, enabling you to integrate any computation or analysis into your workflow. See our [tutorial](https://docs.pixeltable.com/docs/user-defined-functions-udfs) regarding UDFs to learn more.\n",
        "\n",
        "We define three UDFs to compute two metrics (sentiment and readability scores) that give us insights into the quality of the LLM outputs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "WOlBJFIm2bdq"
      },
      "outputs": [],
      "source": [
        "@pxt.udf\n",
        "def get_sentiment_score(text: str) -> float:\n",
        "    return TextBlob(text).sentiment.polarity\n",
        "\n",
        "@pxt.udf\n",
        "def extract_keywords(text: str, num_keywords: int = 5) -> list:\n",
        "    stop_words = set(stopwords.words('english'))\n",
        "    words = word_tokenize(text.lower())\n",
        "    keywords = [word for word in words if word.isalnum() and word not in stop_words]\n",
        "    return sorted(set(keywords), key=keywords.count, reverse=True)[:num_keywords]\n",
        "\n",
        "@pxt.udf\n",
        "def calculate_readability(text: str) -> float:\n",
        "    words = len(re.findall(r'\\w+', text))\n",
        "    sentences = len(re.findall(r'\\w+[.!?]', text)) or 1\n",
        "    average_words_per_sentence = words / sentences\n",
        "    return 206.835 - 1.015 * average_words_per_sentence"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-RrWAUP5M58Z"
      },
      "source": [
        "For each model we want to compare we are adding the metrics as new computed columns, using the UDFs we created."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "aa6Ed4WSL1b4",
        "outputId": "55974ce6-7ee7-432d-d487-b912a2e8c83a"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 23.99 cells/s]\n",
            "Added 3 column values with 0 errors.\n",
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 22.87 cells/s]\n",
            "Added 3 column values with 0 errors.\n",
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 68.75 cells/s]\n",
            "Added 3 column values with 0 errors.\n",
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 93.24 cells/s]\n",
            "Added 3 column values with 0 errors.\n",
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 46.49 cells/s]\n",
            "Added 3 column values with 0 errors.\n",
            "Computing cells: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 61.91 cells/s]\n",
            "Added 3 column values with 0 errors.\n"
          ]
        }
      ],
      "source": [
        "t['large_sentiment_score'] = get_sentiment_score(t.ml_response)\n",
        "t['large_keywords'] = extract_keywords(t.ml_response)\n",
        "t['large_readability_score'] = calculate_readability(t.ml_response)\n",
        "\n",
        "t['open_sentiment_score'] = get_sentiment_score(t.omn_response)\n",
        "t['open_keywords'] = extract_keywords(t.omn_response)\n",
        "t['open_readability_score'] = calculate_readability(t.omn_response)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qwsQOS6BHNHs"
      },
      "source": [
        " Once a UDF is defined and used in a computed column, Pixeltable automatically applies it to all relevant rows.\n",
        "\n",
        " You don't need to write loops or worry about applying the function to each row manually."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 361
        },
        "id": "0CARZAHu9BQ0",
        "outputId": "2b92e7dc-0681-42d6-8e2d-c94c889e6a76"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th>task</th>\n",
              "      <th>system</th>\n",
              "      <th>input_text</th>\n",
              "      <th>open_mistral_nemo</th>\n",
              "      <th>mistral_medium</th>\n",
              "      <th>omn_response</th>\n",
              "      <th>ml_response</th>\n",
              "      <th>large_sentiment_score</th>\n",
              "      <th>large_keywords</th>\n",
              "      <th>large_readability_score</th>\n",
              "      <th>open_sentiment_score</th>\n",
              "      <th>open_keywords</th>\n",
              "      <th>open_readability_score</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>summarization</td>\n",
              "      <td>Summarize the following text:</td>\n",
              "      <td>Mistral AI is a French artificial intelligence (AI) research and development company that focuses on creating and applying AI technologies to various industries.</td>\n",
              "      <td>{&quot;id&quot;: &quot;3ba3f2ae28094f5fb29240423f26cd1b&quot;, &quot;model&quot;: &quot;open-mistral-nemo&quot;, &quot;usage&quot;: {&quot;total_tokens&quot;: 65, &quot;prompt_tokens&quot;: 38, &quot;completion_tokens&quot;: 27}, &quot;object&quot;: &quot;chat.completion&quot;, &quot;choices&quot;: [{&quot;index&quot;: 0, &quot;message&quot;: {&quot;role&quot;: &quot;assistant&quot;, &quot;prefix&quot;: false, &quot;content&quot;: &quot;Mistral AI is a French company specializing in AI research and development, aiming to create and apply AI technologies across diverse industries.&quot;, &quot;tool_calls&quot;: null}, &quot;finish_reason&quot;: &quot;stop&quot;}], &quot;created&quot;: 1728587238}</td>\n",
              "      <td>{&quot;id&quot;: &quot;ecc5e83f1d364b36b6cf342752b44a45&quot;, &quot;model&quot;: &quot;mistral-medium&quot;, &quot;usage&quot;: {&quot;total_tokens&quot;: 82, &quot;prompt_tokens&quot;: 50, &quot;completion_tokens&quot;: 32}, &quot;object&quot;: &quot;chat.completion&quot;, &quot;choices&quot;: [{&quot;index&quot;: 0, &quot;message&quot;: {&quot;role&quot;: &quot;assistant&quot;, &quot;prefix&quot;: false, &quot;content&quot;: &quot;Mistral AI is a French firm specializing in AI research and development, with a focus on implementing AI technologies across various industries.&quot;, &quot;tool_calls&quot;: null}, &quot;finish_reason&quot;: &quot;stop&quot;}], &quot;created&quot;: 1728587244}</td>\n",
              "      <td>Mistral AI is a French company specializing in AI research and development, aiming to create and apply AI technologies across diverse industries.</td>\n",
              "      <td>Mistral AI is a French firm specializing in AI research and development, with a focus on implementing AI technologies across various industries.</td>\n",
              "      <td>-0.067</td>\n",
              "      <td>[&quot;ai&quot;, &quot;implementing&quot;, &quot;industries&quot;, &quot;french&quot;, &quot;mistral&quot;]</td>\n",
              "      <td>184.505</td>\n",
              "      <td>0.</td>\n",
              "      <td>[&quot;ai&quot;, &quot;company&quot;, &quot;industries&quot;, &quot;aiming&quot;, &quot;french&quot;]</td>\n",
              "      <td>184.505</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "            task                         system  \\\n",
              "0  summarization  Summarize the following text:   \n",
              "\n",
              "                                          input_text  \\\n",
              "0  Mistral AI is a French artificial intelligence...   \n",
              "\n",
              "                                   open_mistral_nemo  \\\n",
              "0  {'id': '3ba3f2ae28094f5fb29240423f26cd1b', 'mo...   \n",
              "\n",
              "                                      mistral_medium  \\\n",
              "0  {'id': 'ecc5e83f1d364b36b6cf342752b44a45', 'mo...   \n",
              "\n",
              "                                        omn_response  \\\n",
              "0  Mistral AI is a French company specializing in...   \n",
              "\n",
              "                                         ml_response  large_sentiment_score  \\\n",
              "0  Mistral AI is a French firm specializing in AI...              -0.066667   \n",
              "\n",
              "                                    large_keywords  large_readability_score  \\\n",
              "0  [ai, implementing, industries, french, mistral]                  184.505   \n",
              "\n",
              "   open_sentiment_score                              open_keywords  \\\n",
              "0                   0.0  [ai, company, industries, aiming, french]   \n",
              "\n",
              "   open_readability_score  \n",
              "0                 184.505  "
            ]
          },
          "execution_count": 12,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "t.head(1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xlUAq2QQ2TaV"
      },
      "source": [
        "## 5. Experiment with Different Prompts\n",
        "\n",
        "We are inserting an additional two rows, and Pixeltable will automatically populate the computed columns."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "SMMM-ISR2TaV",
        "outputId": "0be5136b-23ec-487c-b846-f2960fba9b81"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Computing cells: 100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.17 cells/s]\n",
            "Inserting rows into `mistral_prompts`: 2 rows [00:00, 95.14 rows/s]\n",
            "Computing cells: 100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.12 cells/s]\n",
            "Inserted 2 rows with 0 errors.\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "UpdateStatus(num_rows=2, num_computed_values=30, num_excs=0, updated_cols=[], cols_with_excs=[])"
            ]
          },
          "execution_count": 13,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "t.insert([\n",
        "    {\n",
        "        'task': 'summarization',\n",
        "        'system': 'Provide a concise summary of the following text in one sentence:',\n",
        "        'input_text': 'Mistral AI is a company that develops AI models and has been in the news for its partnerships and latest models.'\n",
        "    },\n",
        "    {\n",
        "        'task': 'translation',\n",
        "        'system': 'Translate the following English text to French:',\n",
        "        'input_text': 'Hello, how are you today?'\n",
        "    }\n",
        "])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OMgvq5uhBWcE"
      },
      "source": [
        "Often you want to select only certain rows and/or certain columns in a table. You can do this with `where()`.\n",
        "\n",
        "You can learn more about the available table and data operations [here](https://docs.pixeltable.com/docs/tables-and-data-operations)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 181
        },
        "id": "LEGZTsL6-2pQ",
        "outputId": "acdba826-a04b-477e-deae-8f3eddc7be0c"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th>task</th>\n",
              "      <th>omn_response</th>\n",
              "      <th>ml_response</th>\n",
              "      <th>large_readability_score</th>\n",
              "      <th>open_readability_score</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>summarization</td>\n",
              "      <td>Mistral AI is a French company specializing in AI research and development, aiming to create and apply AI technologies across diverse industries.</td>\n",
              "      <td>Mistral AI is a French firm specializing in AI research and development, with a focus on implementing AI technologies across various industries.</td>\n",
              "      <td>184.505</td>\n",
              "      <td>184.505</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>summarization</td>\n",
              "      <td>Mistral AI is a prominent developer of AI models, known for its strategic partnerships and recent model releases.</td>\n",
              "      <td>Mistral AI is a company known for creating advanced AI models and has recently gained attention for its collaborations and new model releases.</td>\n",
              "      <td>183.49</td>\n",
              "      <td>188.565</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "            task                                       omn_response  \\\n",
              "0  summarization  Mistral AI is a French company specializing in...   \n",
              "1  summarization  Mistral AI is a prominent developer of AI mode...   \n",
              "\n",
              "                                         ml_response  large_readability_score  \\\n",
              "0  Mistral AI is a French firm specializing in AI...                  184.505   \n",
              "1  Mistral AI is a company known for creating adv...                  183.490   \n",
              "\n",
              "   open_readability_score  \n",
              "0                 184.505  \n",
              "1                 188.565  "
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "t.select(t.task, t.omn_response, t.ml_response, t.large_readability_score, t.open_readability_score).where(t.task == 'summarization').collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NA2iegnVOiJw"
      },
      "source": [
        "Pixeltable's schema provides a holistic view of data ingestion, inference API calls, and metric computation, reflecting your entire workflow."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 312
        },
        "id": "88loCKBtFM2R",
        "outputId": "0e2016e0-2d95-4fab-ac73-61df704ebf3d"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<style type=\"text/css\">\n",
              "#T_d5077 th {\n",
              "  text-align: center;\n",
              "}\n",
              "#T_d5077_row0_col0, #T_d5077_row0_col1, #T_d5077_row0_col2, #T_d5077_row1_col0, #T_d5077_row1_col1, #T_d5077_row1_col2, #T_d5077_row2_col0, #T_d5077_row2_col1, #T_d5077_row2_col2, #T_d5077_row3_col0, #T_d5077_row3_col1, #T_d5077_row3_col2, #T_d5077_row4_col0, #T_d5077_row4_col1, #T_d5077_row4_col2, #T_d5077_row5_col0, #T_d5077_row5_col1, #T_d5077_row5_col2, #T_d5077_row6_col0, #T_d5077_row6_col1, #T_d5077_row6_col2, #T_d5077_row7_col0, #T_d5077_row7_col1, #T_d5077_row7_col2, #T_d5077_row8_col0, #T_d5077_row8_col1, #T_d5077_row8_col2, #T_d5077_row9_col0, #T_d5077_row9_col1, #T_d5077_row9_col2, #T_d5077_row10_col0, #T_d5077_row10_col1, #T_d5077_row10_col2, #T_d5077_row11_col0, #T_d5077_row11_col1, #T_d5077_row11_col2, #T_d5077_row12_col0, #T_d5077_row12_col1, #T_d5077_row12_col2 {\n",
              "  white-space: pre-wrap;\n",
              "  text-align: left;\n",
              "}\n",
              "</style>\n",
              "<table id=\"T_d5077\">\n",
              "  <thead>\n",
              "    <tr>\n",
              "      <th id=\"T_d5077_level0_col0\" class=\"col_heading level0 col0\" >Column Name</th>\n",
              "      <th id=\"T_d5077_level0_col1\" class=\"col_heading level0 col1\" >Type</th>\n",
              "      <th id=\"T_d5077_level0_col2\" class=\"col_heading level0 col2\" >Computed With</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row0_col0\" class=\"data row0 col0\" >task</td>\n",
              "      <td id=\"T_d5077_row0_col1\" class=\"data row0 col1\" >string</td>\n",
              "      <td id=\"T_d5077_row0_col2\" class=\"data row0 col2\" ></td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row1_col0\" class=\"data row1 col0\" >system</td>\n",
              "      <td id=\"T_d5077_row1_col1\" class=\"data row1 col1\" >string</td>\n",
              "      <td id=\"T_d5077_row1_col2\" class=\"data row1 col2\" ></td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row2_col0\" class=\"data row2 col0\" >input_text</td>\n",
              "      <td id=\"T_d5077_row2_col1\" class=\"data row2 col1\" >string</td>\n",
              "      <td id=\"T_d5077_row2_col2\" class=\"data row2 col2\" ></td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row3_col0\" class=\"data row3 col0\" >open_mistral_nemo</td>\n",
              "      <td id=\"T_d5077_row3_col1\" class=\"data row3 col1\" >json</td>\n",
              "      <td id=\"T_d5077_row3_col2\" class=\"data row3 col2\" >chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='open-mistral-nemo', temperature=0.7, max_tokens=300)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row4_col0\" class=\"data row4 col0\" >mistral_medium</td>\n",
              "      <td id=\"T_d5077_row4_col1\" class=\"data row4 col1\" >json</td>\n",
              "      <td id=\"T_d5077_row4_col2\" class=\"data row4 col2\" >chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='mistral-medium', temperature=0.7, max_tokens=300)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row5_col0\" class=\"data row5 col0\" >omn_response</td>\n",
              "      <td id=\"T_d5077_row5_col1\" class=\"data row5 col1\" >string</td>\n",
              "      <td id=\"T_d5077_row5_col2\" class=\"data row5 col2\" >open_mistral_nemo.choices[0].message.content.astype(string)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row6_col0\" class=\"data row6 col0\" >ml_response</td>\n",
              "      <td id=\"T_d5077_row6_col1\" class=\"data row6 col1\" >string</td>\n",
              "      <td id=\"T_d5077_row6_col2\" class=\"data row6 col2\" >mistral_medium.choices[0].message.content.astype(string)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row7_col0\" class=\"data row7 col0\" >large_sentiment_score</td>\n",
              "      <td id=\"T_d5077_row7_col1\" class=\"data row7 col1\" >float</td>\n",
              "      <td id=\"T_d5077_row7_col2\" class=\"data row7 col2\" >get_sentiment_score(ml_response)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row8_col0\" class=\"data row8 col0\" >large_keywords</td>\n",
              "      <td id=\"T_d5077_row8_col1\" class=\"data row8 col1\" >json</td>\n",
              "      <td id=\"T_d5077_row8_col2\" class=\"data row8 col2\" >extract_keywords(ml_response)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row9_col0\" class=\"data row9 col0\" >large_readability_score</td>\n",
              "      <td id=\"T_d5077_row9_col1\" class=\"data row9 col1\" >float</td>\n",
              "      <td id=\"T_d5077_row9_col2\" class=\"data row9 col2\" >calculate_readability(ml_response)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row10_col0\" class=\"data row10 col0\" >open_sentiment_score</td>\n",
              "      <td id=\"T_d5077_row10_col1\" class=\"data row10 col1\" >float</td>\n",
              "      <td id=\"T_d5077_row10_col2\" class=\"data row10 col2\" >get_sentiment_score(omn_response)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row11_col0\" class=\"data row11 col0\" >open_keywords</td>\n",
              "      <td id=\"T_d5077_row11_col1\" class=\"data row11 col1\" >json</td>\n",
              "      <td id=\"T_d5077_row11_col2\" class=\"data row11 col2\" >extract_keywords(omn_response)</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td id=\"T_d5077_row12_col0\" class=\"data row12 col0\" >open_readability_score</td>\n",
              "      <td id=\"T_d5077_row12_col1\" class=\"data row12 col1\" >float</td>\n",
              "      <td id=\"T_d5077_row12_col2\" class=\"data row12 col2\" >calculate_readability(omn_response)</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n"
            ],
            "text/plain": [
              "table 'mistral_prompts'\n",
              "\n",
              "            Column Name   Type                                                                                                                                                             Computed With\n",
              "                   task string                                                                                                                                                                          \n",
              "                 system string                                                                                                                                                                          \n",
              "             input_text string                                                                                                                                                                          \n",
              "      open_mistral_nemo   json chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='open-mistral-nemo', temperature=0.7, max_tokens=300)\n",
              "         mistral_medium   json    chat_completions([{'role': 'system', 'content': system}, {'role': 'user', 'content': input_text}], top_p=0.9, model='mistral-medium', temperature=0.7, max_tokens=300)\n",
              "           omn_response string                                                                                                               open_mistral_nemo.choices[0].message.content.astype(string)\n",
              "            ml_response string                                                                                                                  mistral_medium.choices[0].message.content.astype(string)\n",
              "  large_sentiment_score  float                                                                                                                                          get_sentiment_score(ml_response)\n",
              "         large_keywords   json                                                                                                                                             extract_keywords(ml_response)\n",
              "large_readability_score  float                                                                                                                                        calculate_readability(ml_response)\n",
              "   open_sentiment_score  float                                                                                                                                         get_sentiment_score(omn_response)\n",
              "          open_keywords   json                                                                                                                                            extract_keywords(omn_response)\n",
              " open_readability_score  float                                                                                                                                       calculate_readability(omn_response)"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "t"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}