Chinese AI startup DeepSeek has introduced a new multimodal AI capable of processing large and complex documents using significantly fewer tokens.
DeepSeek-OCR employs visual perception as a means of compressing information.
The system is the result of research into the “role of visual encoders” for text compression in large language models (LLMs). This approach enables neural networks to handle vast amounts of information without a proportional increase in computational costs.
“With DeepSeek-OCR, we demonstrated that compressing text through visual representations allows for a 7–20 fold reduction in tokens at various stages of context. This opens a promising direction for addressing the long history problem in LLMs,” the company stated.
DeepSeek-OCR consists of two main components:
- DeepEncoder — the encoder;
- DeepSeek3B-MoE-A570M — the decoder.
The encoder serves as the main computational core of the model. It keeps activations low while processing high-resolution images and compresses them heavily, which reduces the number of vision tokens passed to the decoder.
The decoder, a Mixture-of-Experts model with 3 billion parameters in total, of which about 570 million are active for any given token, is responsible for restoring the original text. The architecture divides the neural network into several independent subnetworks, called “experts,” each specializing in its part of the input data. Together, they solve the overall task.
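To make the expert idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration only, not DeepSeek's implementation: the function `moe_forward`, the four toy experts, the gate vectors, and the choice of `top_k=2` are all hypothetical. A small gating step scores every expert for the input, only the highest-scoring experts run, and their outputs are blended by weight.

```python
import math

def softmax(scores):
    """Turn raw gating scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k experts and blend their outputs.

    Hypothetical toy routing: gate_weights holds one vector per expert,
    and each expert is a plain function from a list to a list.
    """
    # One gating score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Only the top_k experts are evaluated; the rest stay inactive.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalise over the selected experts
        expert_out = experts[i](x)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen

# Four toy "experts" (assumed for illustration only).
experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [-a for a in v],
    lambda v: [a * a for a in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]

output, chosen = moe_forward([1.0, 2.0], gate_weights, experts, top_k=2)
print(chosen, output)
```

Because only the selected experts run, the compute spent per input depends on the active subset rather than on the total parameter count, which is the general idea behind pairing a large total size with a much smaller active size.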
DeepSeek-OCR can analyze complex structured visual content such as tables, formulas, and geometric diagrams. According to the company, this makes the model particularly useful for applications in finance and scientific research.
The company noted that DeepSeek-OCR achieved 97% decoding accuracy when the compression ratio stayed below 10x. At a 20x compression ratio, accuracy fell to about 60%. The company presents this as evidence that the model preserves much of the information even at extreme levels of compression.
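The arithmetic behind these ratios can be sketched in a few lines of Python. The sketch assumes the compression ratio is the number of text tokens divided by the number of vision tokens used to represent the same content. The helper names and the 1,000-token page are hypothetical and chosen only to make the numbers easy to follow.

```python
import math

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of text tokens to the vision tokens that stand in for them."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens

def vision_tokens_needed(text_tokens: int, ratio: float) -> int:
    """Vision-token budget for a page at a target compression ratio (rounded up)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(text_tokens / ratio)

# Hypothetical page containing 1,000 text tokens.
page_text_tokens = 1000
budget_10x = vision_tokens_needed(page_text_tokens, 10)
budget_20x = vision_tokens_needed(page_text_tokens, 20)

print(budget_10x, budget_20x)
print(compression_ratio(page_text_tokens, budget_20x))
```

Under these assumptions, the same page needs 100 vision tokens at 10x and 50 at 20x, so halving the token budget is what comes with the reported drop in accuracy.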
On OmniDocBench, a benchmark for evaluating the understanding of diverse documents, DeepSeek-OCR outperformed leading optical character recognition models such as GOT-OCR 2.0 and MinerU 2.0, while using significantly fewer tokens.
Back in August, the startup updated its flagship AI model V3.
