Benchmarking AI on Tables and Engineering Drawings: Results and Findings

olegkokorin · 8/15/2025, 7:19:55 AM · businesswaretech.com

Comments (1)

olegkokorin · 3h ago
This benchmark systematically evaluates 12 AI models across two demanding domains: tabular data extraction from PDFs and engineering drawing interpretation.

Tested models: 9 LLMs with vision capabilities (GPT-4o, GPT o4 mini, GPT o3, Claude Opus 4, Gemini 2.5 Pro, Gemini 2.5 Flash, Grok 2 Vision, Qwen VL Plus, Pixtral Large) and 3 traditional layout models (Amazon Textract via boto3, Azure Prebuilt Layout, Google Layout Parser).
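As a rough illustration of how the traditional layout models are driven, here is a minimal sketch of turning an Amazon Textract AnalyzeDocument response into a table grid via boto3. The API call is shown commented out; the `sample` response dict is a hand-built stand-in for illustration, not output from the benchmark.

```python
# Sketch: map Textract CELL blocks to a {(row, col): text} grid per table.
# The live call would be (requires AWS credentials):
#   import boto3
#   client = boto3.client("textract")
#   response = client.analyze_document(
#       Document={"Bytes": page_bytes}, FeatureTypes=["TABLES"])

def cells_to_grid(response):
    """Return one {(RowIndex, ColumnIndex): text} dict per TABLE block."""
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def text_of(block):
        # Concatenate the WORD children of a CELL block.
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for cid in rel["Ids"]:
                    child = blocks[cid]
                    if child["BlockType"] == "WORD":
                        words.append(child["Text"])
        return " ".join(words)

    tables = []
    for b in response["Blocks"]:
        if b["BlockType"] != "TABLE":
            continue
        grid = {}
        for rel in b.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for cid in rel["Ids"]:
                    cell = blocks[cid]
                    if cell["BlockType"] == "CELL":
                        grid[(cell["RowIndex"], cell["ColumnIndex"])] = text_of(cell)
        tables.append(grid)
    return tables

# Hand-built miniature response: one table with a single 1x2 row.
sample = {"Blocks": [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Part"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
]}

print(cells_to_grid(sample))  # → [{(1, 1): 'Part', (1, 2): 'Qty'}]
```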

We focus on practical performance metrics: per-document extraction accuracy, processing latency, and cost. Models are evaluated zero-shot, without fine-tuning, under noisy and irregular real-world conditions. We also perform iterative multi-pass extraction to assess improvements in coverage and stability. The two datasets comprise real-world complex tables and engineering drawings, respectively.
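To make the multi-pass idea concrete, here is a minimal sketch under stated assumptions: the same non-deterministic extractor is run several times, passes are merged cell-by-cell with a majority vote, and coverage is measured against ground truth. The cell keys, the stubbed passes, and the merge-by-vote strategy are illustrative assumptions, not the benchmark's actual protocol.

```python
from collections import Counter

def merge_passes(passes):
    """Merge repeated extraction runs; passes is a list of
    {cell_key: value} dicts. Keeps the majority value per cell."""
    votes = {}
    for p in passes:
        for key, val in p.items():
            votes.setdefault(key, Counter())[val] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}

def coverage(merged, truth):
    """Fraction of ground-truth cells recovered correctly after merging."""
    hits = sum(1 for k, v in truth.items() if merged.get(k) == v)
    return hits / len(truth)

# Stub data: three noisy passes over a two-cell ground truth.
truth = {("r1", "qty"): "4", ("r1", "part"): "M6 bolt"}
passes = [
    {("r1", "qty"): "4"},                             # pass 1 missed a cell
    {("r1", "qty"): "4", ("r1", "part"): "M6 bolt"},  # pass 2 complete
    {("r1", "qty"): "1", ("r1", "part"): "M6 bolt"},  # pass 3 has a wrong value
]
merged = merge_passes(passes)
print(coverage(merged, truth))  # → 1.0: voting recovers the full table
```

The point of the sketch is why extra passes help: a cell missed or misread in one pass can be filled in or outvoted by the others, which is exactly the coverage and stability gain the multi-pass evaluation measures.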

For reproducibility and transparency, we provide the exact prompt used for testing each model on document understanding tasks, enabling other researchers to replicate or extend our experiments.

This benchmark delivers actionable insights for production deployment, revealing the trade-offs between extraction accuracy, inference speed, and operational cost, and highlighting which models are ready to be used as-is in real document processing applications.