DeepSeek-OCR: Images Simplify Text for Large Language Models


DeepSeek is experimenting with an OCR model intended to show that compressed images of text can be more memory-efficient for GPU computation than the large number of text tokens needed to represent the same content.
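To make the memory argument concrete, here is a minimal Python sketch of why fewer tokens per page means less GPU memory. It is not DeepSeek's method. All numbers are hypothetical: 2,000 text tokens versus 200 vision tokens for the same page, and model dimensions of 32 layers with a hidden size of 4,096. The key/value cache formula assumes standard multi-head attention without grouped-query attention or other cache-saving tricks.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, hidden_size: int,
                   bytes_per_value: int = 2) -> int:
    """Approximate key/value cache size for a decoder-only transformer.

    Assumes plain multi-head attention: each layer stores one key and one
    value vector of size hidden_size per token (hence the factor of 2),
    at bytes_per_value bytes each (2 for fp16/bf16).
    """
    return 2 * num_layers * num_tokens * hidden_size * bytes_per_value


def attention_scores(num_tokens: int) -> int:
    """Entries in one full self-attention score matrix (per head, per layer)."""
    return num_tokens * num_tokens


# Hypothetical numbers, chosen only for illustration:
TEXT_TOKENS = 2000         # assumed token count of one page as plain text
VISION_TOKENS = 200        # assumed token count of the same page as a compressed image
LAYERS, HIDDEN = 32, 4096  # assumed model dimensions

text_kv = kv_cache_bytes(TEXT_TOKENS, LAYERS, HIDDEN)
vision_kv = kv_cache_bytes(VISION_TOKENS, LAYERS, HIDDEN)
kv_ratio = text_kv / vision_kv
score_ratio = attention_scores(TEXT_TOKENS) / attention_scores(VISION_TOKENS)

print(f"KV cache, text:   {text_kv / 2**20:.0f} MiB")
print(f"KV cache, vision: {vision_kv / 2**20:.0f} MiB")
print(f"KV cache ratio: {kv_ratio:.0f}x")
print(f"Attention-score ratio: {score_ratio:.0f}x")
```

Under these assumptions the script prints 1000 MiB versus 100 MiB of cache, a 10x reduction, while a naively materialized attention score matrix shrinks 100x, because it grows quadratically with sequence length. Memory-efficient attention kernels avoid storing that full matrix, so in practice the linear cache term is often the more relevant one.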

Many company documents are available as PDFs but are often scanned, which makes it difficult to convert them to text while preserving their complex structure. Even though the task sounds simple, such documents can often be converted to text only with great effort.

Images, tables, and graphics are common sources of errors, and in recent months this has prompted a surge in OCR software built on large language models (LLMs).

After its reasoning model R1, the Chinese AI developer DeepSeek is now entering this field, releasing an experimental OCR model under the MIT license.

This move may seem surprising, as OCR was not previously considered one of DeepSeek's core competencies.

Author's summary: DeepSeek experiments with an OCR model for large language models.


heise online, 2025-10-24