Acquiring the textual content in a messy PDF file is extra problematic than it’s useful. The issue doesn’t lie within the means to rework pixels into textual content, however slightly, in sustaining the construction of the doc. Tables, headings, and pictures must be in the best sequence. When utilizing Mistral OCR 3, it’s not the textual content conversion, however the manufacturing of enterprise usable info. The brand new AI-powered doc extraction device might be meant to boost sophisticated file extraction.
This information discusses the Mistral OCR 3 mannequin. We’ll additionally focus on its new options and their strategies of utilization, and eventually, conclude with a comparability with the open-weights DeepSeek-OCR mannequin as nicely.
Understanding Mistral OCR 3
Mistral presents its new device OCR 3 as a general-purpose one. It offers with the massive variety of paperwork current in organizations, and isn’t restricted to OCRing clear scans of invoices. Mistral offers an important enhancements that remedy a few of the frequent failures of OCR.
- Handwriting: The mannequin will get improved work on printing and handwriting of textual content on printers.
- Types: It processes sophisticated buildings of bins, labels, and blended forms of texts. It’s typical of invoices, receipts, and authorities paperwork.
- Scanned Paperwork: The system is much less affected by scanning artifacts similar to skew, distortion, low decision, and so forth.
- Advanced Tables: It gives an improved desk of reconstruction. This can embody a mix of cells, in addition to multi-rows. The output is in HTML tags with a purpose to preserve the unique format.
Mistral says that it examined the mannequin in opposition to inner benchmarks, which imply actual enterprise circumstances.
What’s New in OCR 3?
The ultimate launch provides two vital modifications to builders: high quality of the output and management. These traits amplify organized extraction powers of the mannequin.
1. New Controls for Doc Parts: The changelog of the Mistral OCR 3 associates the brand new mannequin with novel parameters and outputs. Tableformat is now capable of choose between markdown and HTML. Extractheader, extractfooter, and hyperlinks will even assist in the dealing with of particular doc sections. This is without doubt one of the foundations of its doc AI system.
2. A UI Playground for Quick Testing: Mistral OCR 3 has its OCR API and a “Doc AI Playground” in Mistral AI Studio. A playground permits you to check difficult eventualities expediently, e.g. defective scans or scribbles. Earlier than automating your course of, you may modify such parameters as desk format and verify outputs. Profitable OCR tasks ought to have a suggestions loop that’s quick.
3. Backward Compatibility: Mistral confirms that OCR 3 is suitable with the remainder of its earlier model. This can allow groups to modernize their techniques over time with out re-writing their pipeline.
Fashions and Pricing
The OCR 3 is claimed to be mistral-ocr-2512. The documentation additionally refers to a mistral-ocr-latest alias. Pricing might be achieved on a web page foundation.
- $2 per 1000 pages
- $3 per 1000 annotated pages
The second value can be if you find yourself utilizing annotations to do structured extraction. This price must be put within the funds early by the groups.
Palms-on with the Doc AI Playground
You possibly can entry Mistral OCR 3 by way of the Doc AI Playground in Mistral AI Studio. This permits for fast, sensible testing.
- Open the Doc AI Playground in Mistral AI Studio. Head over to console.mistral.ai/construct/document-ai/ocr-playground
In case you see “Choose a plan”, then join utilizing your quantity and it is possible for you to to see the next

- Add a PDF or picture file. Begin with a difficult doc, like a scanned type with a desk.
Why this picture?
A clear bill with a desk (nice first check for OCR 3 desk reconstruction)
Use this to verify:
- studying order (header fields vs line objects)
- desk extraction (rows/columns, totals)
- header/footer extraction
- Choose the OCR 3 mannequin, which can be
mistral-ocr-2512or newest. - Select a desk format. Use html for structural accuracy or markdown in case your pipeline makes use of it.

- Run the method and examine the output. Examine the studying order and desk construction.
Output:

- This primary OCR 3 run is basically flawless for a clear digital bill.
- All key fields, format sections, and the cost abstract desk are captured accurately with no textual content errors or hallucinations.
- Desk construction and numeric consistency are preserved, which is important for monetary automation.
- It reveals OCR 3 is production-ready out of the field for normal invoices.
Palms-on with the OCR API
Choice A: OCR a Doc from a URL
The OCR API helps doc URLs. It returns textual content and structured parts.
Here’s a Python instance utilizing the official SDK.
import os
from mistralai import Mistral, DocumentURLChunk
shopper = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
resp = shopper.ocr.course of(
mannequin="mistral-ocr-2512",
doc=DocumentURLChunk(document_url="https://arxiv.org/pdf/2510.04950"),
table_format="html",
extract_header=True,
extract_footer=True,
)
print(resp.pages[0].markdown[:1000])
Output:

Choice B: Add Information and OCR by file_id
This technique works for personal paperwork, not on a public URL. Mistral’s API has a /v1/recordsdata endpoint for uploads.
First, add the file utilizing Python.
import os
from mistralai import Mistral
shopper = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
uploaded = shopper.recordsdata.add(
file={"file_name": "doc.pdf", "content material": open("/content material/Resume-Pattern-1-Software program-Engineer.pdf", "rb")},
objective="ocr",
)
resp = shopper.ocr.course of(
mannequin="mistral-ocr-2512",
doc={"file_id": uploaded.id},
table_format="html",
)
print(resp.pages[0].markdown[:1000])
Output:

Dealing with Photographs and Tables
Photographs and tables within the markdown are characterised by placeholders utilized by OCR output of Mistral. The actual content material that’s extracted is given again in numerous arrays. This format offers you an choice to have the markdown as the first doc view. The image and desk assets can then be saved within the required location.
Easy OCR is step one. Structured Extraction offers the true worth. The function of thought annotations is offered within the doc AI platform by Mistral. It permits you to create a schema and unstructure paperwork with JSON. That’s the way you provide you with reliable extraction pipelines which can’t be damaged by altering an bill format by a vendor. One answer is extra sensible which is to make use of OCR 3 to enter textual content and annotations to the actual fields you require, e.g. bill numbers or totals.
Scaling Up with Batch Inference
In excessive quantity processing, a batching is required. The batch system by Mistral permits you to submit a lot of API requests in a file with a.jsonl extension. They’ll then be run as one job. The documentation signifies that /v1/ocr is without doubt one of the supported batch jobs endpoints.
Find out how to Select the Proper Mannequin
The only option is dependent upon your paperwork and constraints. Here’s a clear strategy to consider.
What to Measure
- Textual content Accuracy: Use character or phrase error charges on pattern pages.
- Construction High quality: Rating desk reconstruction and studying order correctness.
- Extraction Reliability: Measure area accuracy to your goal knowledge factors.
- Operational Efficiency: Monitor latency, throughput, and failure modes.
Let’s Evaluate
Use the next picture because the reference to match the each fashions. We chosen this picture as it’s:
A tough stress-test type with boxed fields + blended handwriting + printed textual content (nice for evaluating OCR 3 vs DeepSeek-OCR).
We’ll use this to match:
- handwriting accuracy (cursive + digits)
- field/area alignment (numbers inside little squares)
- robustness to dense layouts and small textual content
Mistral OCR 3

Output:

This result’s spectacular given the issue of the enter.
- Mistral OCR 3 accurately identifies the doc construction, headers, and most handwritten digits and textual content, changing a dense handwriting type into usable markdown.
- Some duplication and minor alignment points seem within the tables, which is anticipated for heavy handwriting grids.
- Total, it demonstrates robust handwriting recognition and format consciousness, making it appropriate for real-world type digitization with gentle post-processing
Deepseek OCR

The outcome has been beautified which makes it simpler to undergo than the earlier response. Listed here are few different issues that I observed concerning the :
- DeepSeek OCR reveals strong handwriting recognition however struggles extra with semantic accuracy and format constancy.
- Key fields are misinterpreted, similar to “Metropolis” and “State ZIP”, and desk construction is much less trustworthy with incorrect headers and duplicated rows.
- Character-level recognition is respectable, however spacing, grouping, and area that means degrade beneath dense handwriting.
Outcome:
Mistral OCR 3 clearly outperforms DeepSeek OCR on this handwriting-heavy type. It preserves doc construction, area semantics, and desk alignment way more precisely, even beneath dense handwritten grids. DeepSeek OCR reads characters fairly nicely however breaks on format, headers, and area that means, resulting in greater cleanup effort. For real-world type digitization and automation, Mistral OCR 3 is the clear winner.
Which One Ought to You Select?
Choose Mistral OCR 3 in case you require a full OCR product that features a UI and a transparent OCR API. It’s optimum in case of high-fidelity and predictable SaaS price and valuation of desk reconstruction.
Choose DeepSeek-OCR when it’s required to be hosted on-premises or self-hosted. It offers the flexibleness and management of the inference course of to the groups which are keen to manage the operations. It’s potential that many groups will resort to the each: Mistral as the first pipeline and DeepSeek as a backup of delicate paperwork.
Conclusion
The construction and workflow turn into main issues because of the modifications in Mistral OCR 3. The desk controls, JSON extraction annotations, and a playground have options similar to UI and may cut back improvement time. It is without doubt one of the highly effective productizations of doc intelligence. DeepSeek-OCR gives one other means. It considers OCR a compression downside that’s involved with LLM, and gives customers with freedom of infrastructure. These two fashions display the longer term separation of OCR expertise.
Regularly Requested Questions
A. Its key power is that it concentrates on sustaining doc construction together with sophisticated tables and studying sequences, changing scanned paperwork to helpful info.
A. It has the potential of producing tables in HTML format, which has the added benefit of sustaining advanced knowledge similar to merged cells and multi-row headers making certain better knowledge integrity.
A. Sure, Doc AI Playground within the AI Studio of Mistral provides you add paperwork and experiment with the OCR options.
Login to proceed studying and revel in expert-curated content material.
