← Back to Cookbook
MistralOCR
Details
File: third_party/gradio/MistralOCR.md
Type: Markdown Guide
Use Cases: OCR
Integrations: Gradio
Content
Markdown content:
# Building Gradio app with Mistral OCR to extract text and images from PDFs This cookbook provides a step-by-step guide to setting up and using Mistral OCR with Gradio. The application allows you to extract text and images from PDFs and images using Mistral's OCR capabilities. ## Prerequisites Before you begin, ensure you have the following: * Python installed on your system. * An API key from Mistral AI. * Necessary Python packages installed. ## Step 1: Install Required Packages First, install the required Python packages. You can do this using pip: ```bash pip install gradio requests mistralai ``` ## Step 2: Set Up Environment Variables You need to set up your Mistral API key as an environment variable. You can do this in your terminal or add it to your script. You can create an API key on our [Platforme](https://console.mistral.ai/api-keys/). ```python import os os.environ["MISTRAL_API_KEY"] = "your_mistral_api_key_here" ``` ## Step 3: Import Libraries Import the necessary libraries in your Python script. ```python import gradio as gr import os import base64 import requests from mistralai import Mistral ``` ## Step 4: Initialize Mistral Client Initialize the Mistral client using your API key. ```python api_key = os.environ["MISTRAL_API_KEY"] client = Mistral(api_key=api_key) ``` ## Step 5: Define Helper Functions ### Encode Image to Base64 This function encodes an image to a base64 string. It will be required to provide local images to the service. ```python def encode_image(image_path): """Encode the image to base64.""" try: with open(image_path, "rb") as image_file: return base64.b64encode(image_file.read()).decode('utf-8') except FileNotFoundError: return "Error: The file was not found." except Exception as e: return f"Error: {e}" ``` ### Replace Images in Markdown This function replaces image placeholders in markdown with base64-encoded images. Mistral OCR is capable of outputting interleaved text and images; this function will replace placeholders to render with Gradio. ```python def replace_images_in_markdown(markdown_str: str, images_dict: dict) -> str: for img_name, base64_str in images_dict.items(): markdown_str = markdown_str.replace(f"", f"") return markdown_str ``` ### Get Combined Markdown This function combines the markdown from all pages of the OCR response. It will output a ready-to-be-rendered version and a raw markdown output without the images. ```python def get_combined_markdown(ocr_response) -> tuple: markdowns = [] raw_markdowns = [] for page in ocr_response.pages: image_data = {} for img in page.images: image_data[img.id] = img.image_base64 markdowns.append(replace_images_in_markdown(page.markdown, image_data)) raw_markdowns.append(page.markdown) return "\n\n".join(markdowns), "\n\n".join(raw_markdowns) ``` ### Fetch Content Type This function fetches the content type of a URL. The objective is to detect if we are handling PDF files or image files. ```python def get_content_type(url): """Fetch the content type of the URL.""" try: response = requests.head(url) return response.headers.get('Content-Type') except Exception as e: return f"Error fetching content type: {e}" ``` ## Step 6: Define OCR Functions ### Perform OCR on File To perform OCR on local PDF files, it is required to upload them first to the Platforme and get a signed URL that will be used for OCR tasks. ```python def perform_ocr_file(file, ocr_method="Mistral OCR"): if ocr_method == "Mistral OCR": if file.name.endswith('.pdf'): uploaded_pdf = client.files.upload( file={ "file_name": file.name, "content": open(file.name, "rb"), }, purpose="ocr" ) signed_url = client.files.get_signed_url(file_id=uploaded_pdf.id) ocr_response = client.ocr.process( model="mistral-ocr-latest", document={ "type": "document_url", "document_url": signed_url.url, }, include_image_base64=True ) client.files.delete(file_id=uploaded_pdf.id) elif file.name.endswith(('.png', '.jpg', '.jpeg')): base64_image = encode_image(file.name) ocr_response = client.ocr.process( model="mistral-ocr-latest", document={ "type": "image_url", "image_url": f"data:image/jpeg;base64,{base64_image}" }, include_image_base64=True ) combined_markdown, raw_markdown = get_combined_markdown(ocr_response) return combined_markdown, raw_markdown return "## Method not supported.", "Method not supported." ``` ### Perform OCR on URL Next, we can define a function to perform OCR on URLs. We need different ones for images and for PDF documents. ```python def perform_ocr_url(url, ocr_method="Mistral OCR"): if ocr_method == "Mistral OCR": content_type = get_content_type(url) if 'application/pdf' in content_type: ocr_response = client.ocr.process( model="mistral-ocr-latest", document={ "type": "document_url", "document_url": url, }, include_image_base64=True ) elif any(image_type in content_type for image_type in ['image/png', 'image/jpeg', 'image/jpg']): ocr_response = client.ocr.process( model="mistral-ocr-latest", document={ "type": "image_url", "image_url": url, }, include_image_base64=True ) else: return "Unsupported file type. Please provide a URL to a PDF or an image.", "" combined_markdown, raw_markdown = get_combined_markdown(ocr_response) return combined_markdown, raw_markdown return "## Method not supported.", "Method not supported." ``` ## Step 7: Create Gradio Interface Lastly, we can create the Gradio interface to interact with the OCR functions! ```python with gr.Blocks() as demo: gr.Markdown("# Mistral OCR") gr.Markdown("Upload a PDF or an image, or provide a URL to extract text and images using Mistral OCR capabilities.\n\nLearn more in the blog post [here](https://mistral.ai/news/mistral-ocr).") with gr.Tab("Upload File"): file_input = gr.File(label="Upload a PDF or Image") ocr_method_file = gr.Dropdown(choices=["Mistral OCR"], label="Select OCR Method", value="Mistral OCR") file_output = gr.Markdown(label="Rendered Markdown") file_raw_output = gr.Textbox(label="Raw Markdown") file_button = gr.Button("Process") example_files = gr.Examples( examples=[ "pixtral-12b.pdf", "receipt.png" ], inputs=[file_input] ) file_button.click( fn=perform_ocr_file, inputs=[file_input, ocr_method_file], outputs=[file_output, file_raw_output] ) with gr.Tab("Enter URL"): url_input = gr.Textbox(label="Enter a URL to a PDF or Image") ocr_method_url = gr.Dropdown(choices=["Mistral OCR"], label="Select OCR Method", value="Mistral OCR") url_output = gr.Markdown(label="Rendered Markdown") url_raw_output = gr.Textbox(label="Raw Markdown") url_button = gr.Button("Process") example_urls = gr.Examples( examples=[ "https://arxiv.org/pdf/2410.07073", "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png" ], inputs=[url_input] ) url_button.click( fn=perform_ocr_url, inputs=[url_input, ocr_method_url], outputs=[url_output, url_raw_output] ) demo.launch(max_threads=1) ``` ## Step 8: Run the Application Run the script to launch the Gradio interface. You can interact with the interface to perform OCR on files or URLs. ```bash python your_script_name.py ``` A live demo can be found [here](https://pandora-s-mistral-ocr.hf.space), and more information can be found on the [blog post](https://mistral.ai/news/mistral-ocr).