Gemini 3 is Now Available as an OCR Model in Tensorlake

AI Summary6 min read

TL;DR

Gemini 3 is now available as an OCR model in Tensorlake, offering advanced document parsing with low edit distance and capabilities for complex visual reasoning, bulk ingestion, and structured data extraction.

Key Takeaways

  • Gemini 3 has the lowest edit distance (0.115) on OmniDocBench compared to competitors like GPT-5.1 and Claude Sonnet 4.5, making it highly accurate for document parsing.
  • Tensorlake integrates Gemini 3 into its Document Ingestion API, enabling bulk document processing, conversion to Markdown, page classification, and structured data extraction using JSON schema.
  • Gemini 3 excels in complex visual reasoning tasks, such as table structure recognition and visual question answering, but is not suitable for tasks requiring bounding boxes or strict text style detection.
  • Tensorlake manages challenges like rate limits and large file chunking automatically, allowing users to ingest massive datasets without hitting errors or exceeding token limits.
  • Users can experiment with Gemini 3 in the Tensorlake Playground or integrate it via HTTP API/SDK, with plans to add more foundation models for enhanced OCR capabilities.

Tags

aiprogrammingrag

Gemini 3 is now available within Tensorlake

Google’s Gemini model since 2.5 Flash has been great at Document Parsing. The latest Gemini 3 pushes the envelope even further. It has the lowest edit distance(0.115) on OmniDocBench compared to GPT-5.1(0.147) and Claude Sonnet 4.5.

Starting today, you can start using Gemini as an OCR Engine with Tensorlake’s Document Ingestion API. You can ingest Documents in bulk, and convert them into Markdown, classify pages or extract structured data using JSON schema. Tensorlake will take care of queuing, working with rate limits and sending you webhooks as documents are processed.

We put Gemini 3 to the test inside Tensorlake, and the results on "hostile" document layouts were immediate.

Case Study 1: Table Structure Recognition

Document: Google 2024 Environmental Report

Financial and scientific reports use visual cues, like indentation, floating columns, and symbols, to convey meaning. To test this, we fed the complex "Water Use" table from the Appendix into Gemini 3.

Google environment report

The Challenge

The table is semi wireless - some lines separating some of the rows while the columns have no boundaries. The column on the right is disconnected to the main block.

The Gemini 3 Result: Visual Understanding

Gemini3 does a perfect job on understanding this table. This is a screenshot from the Tensorlake Cloud Dashboard.

Google environment result

Case Study 2: VQA + Structured Output

Document: House Floor Plans

We wanted to test if Gemini 3 could parse visual symbols on construction documents. We fit Gemini3 into Tensorlake’s Structured Extraction pipeline.

The Input: A raw PDF of a house plan and a Pydantic schema defining the exact fields we needed (e.g., kitchen_outlets: int, description: Number of standard and GFI electrical outlets, as noted by the legend icon labeled "outlet", that are found in the kitchen and dining nook.).

For reference, here is the kitchen+dining nook area.

Kitchen Dining diagram

The circle with two lines are the outlets, as per the legend on the same page:

Kitchen dining legend

The Challenge

There is no text label saying "Outlet" on the diagram, it is only associated with the symbol in the legend The model must identify the specific circle-and-line icon defined in the legend, spatially constrain its search to the visual boundaries of the "Kitchen," and aggregate the count into our JSON structure.

The Result

Gemini 3 successfully understood the visual diagram. It returned a valid JSON object with 6 outlets, correctly distinguishing them from nearby data ports and switches.

Kitchen dining result

Tensorlake blends specialized OCR models and VLMs into a set of convenient APIs. While you could call the Gemini API directly, you would be rebuilding many undifferentiated aspects of a production pipeline. Gemini 3 is now fully integrated with Tensorlake DocAI APIs to read, classify, and extract information from documents.

Tensorlake solves the two biggest headaches of building Document Ingestion APIs using VLMs:

  1. Bulk Ingestion & Rate Limits: From our observation Gemini3 doesn’t handle spiky traffic very well. Throwing 10,000 documents at it will trigger errors due to strict quotas. Tensorlake manages the queue, handling back-off and retries automatically so you can ingest massive datasets without hitting 429 errors.

  2. Chunking Large Files: Tensorlake automatically chunks large documents into chunks of 25 pages to make sure Gemini is able to extract even the most dense pages. We ensure that the output token limit of 64k is not exceeded.

When to use (and NOT use) Gemini 3

Use Gemini 3 when:

  • Complex Visual Reasoning is required: You need to correlate a chart's color legend to a data table, or count symbols on a blueprint (as shown in the house plan example).

Do NOT use Gemini 3 when:

  • You need bounding boxes for citation: Gemini 3 does not perform layout detection of objects in documents. If your application requires strict Bounding Boxes to highlight exactly where a specific paragraph or number came from.

  • You need strict text style and font detection: Visual nuances like strikethroughs, underlines, or specific font colors are often ignored by VLMs, which focus on the "content" rather than the style.

For these tasks, you should use one of Tensorlake’s specialized models, like Model03.

How to use Gemini 3 with Tensorlake

Playground

Gemini 3 is available today in the Tensorlake Playground for experimentation:

Playground settings

Or you can select it with our HTTP API or SDK:

from tensorlake.documentai import DocumentAI, ParsingOptions

client = DocumentAI()

parse_id = client.read(
  file_url="https://tlake.link/docs/real-estate-agreement",
  parsing_options=ParsingOptions(
    ocr_model="gemini3"
  )
)

result = client.result(parse_id)
)
Enter fullscreen mode Exit fullscreen mode

What's Next

Document Ingestion has a lot of edge cases. We want our users to always have access to state of the art models so that they can solve their use cases fairly quickly by changing various aspects of the OCR pipelines with very minimal code changes.

We will add more Foundation Models as OCR model options in Tensorlake’s Document Ingestion API.

Try Tensorlake free

Want to discuss your specific use case?
Schedule a technical demo with our team.

Questions about the benchmark?
Join our Slack community

Visit Website