Sunday, April 5, 2026
Catatonic Times

AI Doesn’t Just Read Texts, It “Sees” Them

by Catatonic Times
October 22, 2025
in Metaverse


Deepseek’s new OCR system processes text as images and compresses it by up to 10 times. The technology, capable of analyzing 33 million pages a day, lets AI read much longer documents.

Deepseek, a Chinese artificial intelligence company, is attracting attention with a new OCR (Optical Character Recognition) system designed to process text-based documents more efficiently. The system compresses text rendered as images, enabling AI models to handle much longer documents without hitting their memory limits.

Processing Text as Visual Data

According to Deepseek’s technical report, the system analyzes text in image form instead of processing it directly as text. This approach significantly reduces the computational load: the new OCR system can compress text by up to 10 times while retaining 97% of the information.

Large language models represent text as tokens, each typically covering a few characters. Researchers are working on models that can handle long documents and conversations running to millions of tokens, thereby expanding the context window. However, computational cost rises with the number of tokens processed at once. A large token capacity keeps the model’s memory from filling up even with long documents, but it makes processing more expensive. Deepseek’s OCR solution takes a different route: it processes very long content as an image, effectively viewing the text as pixels.
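To make the trade-off concrete, here is a minimal Python sketch. It assumes standard full self-attention, whose cost grows with the square of the sequence length; real models have other costs too, and this calculation does not come from Deepseek’s report. The page size (1,000 text tokens) and its 100-vision-token encoding are hypothetical numbers chosen to match the 10x compression figure above.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def attention_pairs(num_tokens: int) -> int:
    """Token-to-token interactions in one full self-attention layer (n * n)."""
    return num_tokens * num_tokens


# Hypothetical page: 1,000 text tokens rendered as an image and encoded into 100 vision tokens.
text_tokens, vision_tokens = 1_000, 100

ratio = compression_ratio(text_tokens, vision_tokens)
savings = attention_pairs(text_tokens) / attention_pairs(vision_tokens)

print(f"compression: {ratio:.1f}x")               # compression: 10.0x
print(f"attention work reduced: {savings:.0f}x")  # attention work reduced: 100x
```

Under this quadratic assumption, a 10x shorter sequence cuts the attention work by roughly 100x, which is why compressing the input pays off more than linearly.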

Seeing Long Texts as Pixels

The core of the system consists of two main components: DeepEncoder and Deepseek3B-MoE. DeepEncoder, which handles image processing, has 380 million parameters. Deepseek3B-MoE, responsible for text generation, has 570 million active parameters. DeepEncoder combines Meta’s 80-million-parameter SAM (Segment Anything Model) with OpenAI’s 300-million-parameter CLIP model. An intermediate 16x compressor sharply reduces the image data, increasing processing speed. For example, the 4,096 tokens produced from a 1,024 × 1,024 pixel image are reduced to just 256 tokens after compression.
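The token arithmetic in that example can be checked with a short Python sketch. It assumes the image is cut into 16 × 16-pixel patches with one token per patch (an assumption, but one consistent with a 1,024 × 1,024 image yielding 4,096 tokens) and that the 16x compressor simply divides the token count by 16. `vision_token_count` is a hypothetical helper, not part of Deepseek’s released code.

```python
def vision_token_count(width: int, height: int, patch: int = 16, compress: int = 16) -> tuple[int, int]:
    """Return (patch tokens before compression, tokens after compression).

    Assumes one token per patch x patch block of pixels and a compressor
    that reduces the token count by a fixed factor (both hypothetical).
    """
    if width % patch or height % patch:
        raise ValueError("image size must be a multiple of the patch size")
    raw = (width // patch) * (height // patch)
    if raw % compress:
        raise ValueError("patch count must be divisible by the compression factor")
    return raw, raw // compress


raw, compressed = vision_token_count(1024, 1024)
print(raw, compressed)  # 4096 256
```

A 1,024-pixel side holds 64 patches, so the image yields 64 × 64 = 4,096 patch tokens, and dividing by 16 leaves 256. The same parameter arithmetic applies to the encoder: 80 million (SAM) plus 300 million (CLIP) gives DeepEncoder’s 380 million.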

Depending on the resolution, Deepseek OCR’s standard modes use between 64 and 400 “vision tokens.” That is far fewer than the thousands of tokens conventional OCR systems typically need. In OmniDocBench tests, the system outperformed GOT-OCR 2.0 using only 100 vision tokens. It also outperformed MinerU 2.0, which required over 6,000 tokens, while using fewer than 800 tokens.

The system is tuned for different document types: it uses 64 tokens for simple presentations, 100 tokens for books and reports, and 800 tokens in a special “Gundam mode” for complex newspapers.

Deepseek OCR can process not only text but also complex visual elements such as diagrams, chemical formulas, and geometric shapes. It also works in roughly 100 languages, can preserve formatting, and can output plain text or general image descriptions on request.
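The per-document budgets quoted above for presentations, books and reports, and newspapers can be written as a simple lookup table. This is a minimal Python sketch: the dictionary keys and the `vision_budget` helper are hypothetical labels for illustration, not configuration names from Deepseek’s code, and the MinerU 2.0 figure is the article’s “over 6,000 tokens,” treated here as a lower bound.

```python
# Vision-token budgets quoted in the article; the keys are hypothetical
# labels, not configuration names from Deepseek's code.
TOKEN_BUDGETS = {
    "presentation": 64,
    "book_or_report": 100,
    "newspaper_gundam": 800,
}


def vision_budget(doc_type: str) -> int:
    """Look up the article's vision-token budget for a document type."""
    try:
        return TOKEN_BUDGETS[doc_type]
    except KeyError:
        raise ValueError(f"no budget quoted for {doc_type!r}") from None


# The article's OmniDocBench comparison: Deepseek OCR under 800 tokens
# versus MinerU 2.0 at more than 6,000 tokens per page.
MINERU_TOKENS = 6_000
factor = MINERU_TOKENS / vision_budget("newspaper_gundam")
print(f"at least {factor:.1f}x fewer tokens than MinerU 2.0")  # at least 7.5x fewer tokens than MinerU 2.0
```

Because MinerU 2.0 used more than 6,000 tokens and Deepseek OCR fewer than 800, the 7.5x figure is a floor on the reduction in that comparison, not an exact ratio.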

Processes 33 Million Pages a Day

About 30 million PDF pages were used to train the system, roughly 25 million of them English and Chinese documents. The training data also included 10 million synthetic diagrams, 5 million chemical formulas, and 1 million geometric shapes.

In real-world use, Deepseek OCR delivers very high throughput. It can process more than 200,000 pages a day on a single Nvidia A100 GPU. On 20 servers, each with eight A100 GPUs, that capacity rises to 33 million pages per day. This speed could greatly ease the production of training data for new AI models. Both the code and the model weights are publicly available.
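The cluster figure follows from simple scaling. This minimal Python sketch assumes throughput grows linearly with GPU count (an idealization that ignores I/O and scheduling overhead); the constant and function names are illustrative, not taken from Deepseek’s code.

```python
PAGES_PER_GPU_PER_DAY = 200_000  # "more than 200,000" per A100, so a lower bound
SERVERS = 20
GPUS_PER_SERVER = 8


def cluster_pages_per_day(pages_per_gpu: int, servers: int, gpus_per_server: int) -> int:
    """Daily pages for a cluster, assuming throughput scales linearly with GPU count."""
    return pages_per_gpu * servers * gpus_per_server


total = cluster_pages_per_day(PAGES_PER_GPU_PER_DAY, SERVERS, GPUS_PER_SERVER)
print(f"{SERVERS * GPUS_PER_SERVER} GPUs -> {total:,} pages/day")  # 160 GPUs -> 32,000,000 pages/day
```

At exactly 200,000 pages per GPU, 160 GPUs give 32 million pages a day. Reaching 33 million implies about 206,250 pages per GPU (33,000,000 / 160), which fits the “more than 200,000” figure.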

Copyright © 2024 Catatonic Times.
Catatonic Times is not responsible for the content of external sites.
