We built a tool that lets you augment LLM agents with visual capabilities — like OCR, object detection, and video editing — using just plain English. No need to write computer vision code.
Examples:
> “Blur all faces in this image and preview it.”
> “Extract the invoice ID, email, and totals from this invoice and overlay their locations.”
> "Redact all the sensitive data in this image, and preview the result."
> “Trim this video from 0:30 to 1:10 and add captions.”
It works with any MCP-compatible agent (Claude, OpenAI, Cursor, etc.) and turns natural language into visual AI workflows. No Python. No brittle CV pipelines. Just describe what you want, and your agent handles the rest.
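If you're curious what that looks like at the protocol level: under the hood it's standard MCP tool calls made by your agent, not code you write. Here's a rough sketch using the official `mcp` Python SDK; the server endpoint and the tool name/arguments are placeholders, so check our docs for the real ones.

    # Rough sketch of what an MCP-compatible agent does under the hood,
    # using the official `mcp` Python SDK. The server endpoint and the
    # tool name/arguments below are placeholders, not our actual API.
    import asyncio

    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async def main():
        # Hypothetical endpoint for a hosted VLM Run MCP server.
        async with sse_client("https://mcp.vlm.run/sse") as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()

                # The agent discovers the visual tools the server exposes...
                tools = await session.list_tools()
                print([t.name for t in tools.tools])

                # ...and calls one when the prompt needs it, e.g.
                # "Blur all faces in this image and preview it."
                result = await session.call_tool(
                    "blur_faces",  # placeholder tool name
                    arguments={"image_url": "https://example.com/photo.jpg"},
                )
                print(result.content)

    asyncio.run(main())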
We’d love feedback — especially from devs building LLM tools, agentic frameworks, or anything that needs visual understanding.
kernel33 · 18h ago
Are you running everything through a single end-to-end vision model, or do you dynamically dispatch to specialized OCR, detection, and segmentation backends?
fzysingularity · 17h ago
This demo showcases the latter approach with tool-calling - essentially filling in the gaps of current VLMs. That said, we're of course interested in folding all these capabilities into a single model, but that's going to take a bit more work.
What makes this approach interesting is that our VLMs need to be able to understand intermediate results (sometimes in the form of images themselves), and then delegate to other specialized tools whenever they can't perform a specific action.
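To make the delegation pattern concrete, here's a toy sketch (not our actual implementation; the planner logic and tool names are made up for illustration):

    # Toy illustration of the delegation loop described above, not our
    # production code. The planner VLM either answers directly or requests
    # one of the specialized tools; intermediate results (often images) are
    # fed back into the next planning step. All names here are hypothetical.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Step:
        tool: str | None          # None means the VLM can answer directly
        args: dict
        answer: str | None = None

    def run_ocr(args: dict) -> dict:        # stand-in for a specialized OCR backend
        return {"text": "INV-1234, total $56.78"}

    def detect_faces(args: dict) -> dict:   # stand-in for a face detector
        return {"boxes": [[10, 20, 64, 64]]}

    TOOLS: dict[str, Callable[[dict], dict]] = {
        "ocr": run_ocr,
        "detect_faces": detect_faces,
    }

    def plan(task: str, history: list[dict]) -> Step:
        # In reality this is a VLM call that looks at the task plus any
        # intermediate results (including images) and picks the next action.
        if not history:
            return Step(tool="ocr", args={"image": "invoice.png"})
        return Step(tool=None, args={}, answer=f"Extracted: {history[-1]['text']}")

    def run(task: str) -> str:
        history: list[dict] = []
        while True:
            step = plan(task, history)
            if step.tool is None:                         # VLM handled it itself
                return step.answer
            history.append(TOOLS[step.tool](step.args))   # delegate, then re-plan

    print(run("Extract the invoice ID and totals from this invoice."))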
Here's the full showcase / our docs:
[1] Colab showcase: https://colab.research.google.com/github/vlm-run/vlmrun-cook...
[2] MCP Intro / Docs: https://docs.vlm.run/mcp/introduction