MLLMs can replace OCR for document extraction, large-scale study finds
A large-scale benchmarking study compares multimodal large language models (MLLMs) against traditional OCR-enhanced pipelines for document information extraction and finds that image-only inputs can achieve performance comparable to OCR-enhanced approaches. The research evaluates multiple out-of-the-box MLLMs on business documents and proposes an automated hierarchical error analysis framework that uses LLMs to diagnose failure modes.