Open‑source InternVL3.5 crushes GPT‑4V on multimodal benchmarks

3 acossta 1 8/26/2025, 10:31:27 PM medium.com ↗

Comments (1)

acossta · 2h ago
This isn’t another hype piece. The InternVL3.5 is a coherent vision‑language model that actually understands pixels and text together. It comes in sizes from 1 B up to a monster 241 B parameters, and on benchmarks like MMMU and ChartQA it beats closed models like GPT‑4V, Claude and Qwen. An open‑source LLM that competitive signals we can build cutting‑edge multimodal apps without depending on a black‑box API, which is a big deal for devs who care about hackability and reproducibility.