How we solved multi-modal tool-calling in MCP agents – VLM Run MCP

14 points by fzysingularity | 7/2/2025, 6:27:05 PM | docs.vlm.run

Comments (6)

vlmrunadmin007 · 14h ago
It's impressive how the MCP example in https://docs.vlm.run/mcp/examples/template-search retains visual context across multiple images and tool calls. Unlike most chat interfaces, it enables seamless multi-step reasoning - like finding a logo in one image and tracking it in another - without losing state. This makes it ideal for building stateful, iterative visual workflows.
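Roughly, that chaining looks like this from an MCP client - the tool names, argument shapes, and the returned object handle below are my guesses for illustration, not the documented schema:

```python
import json

from mcp import ClientSession


async def find_and_track_logo(session: ClientSession, logo_url: str,
                              image_a: str, image_b: str):
    # Step 1: template search. Assume the tool returns a JSON text block
    # containing a persistent "object_id" handle for the matched region
    # (my guess at the response shape).
    hits = await session.call_tool(
        "template_search",
        {"template_url": logo_url, "image_url": image_a},
    )
    match = json.loads(hits.content[0].text)

    # Step 2: the follow-up call consumes that handle directly - no re-upload,
    # no base64; the intermediate visual state persists across tool calls.
    return await session.call_tool(
        "track_object",
        {"object_id": match["object_id"], "image_url": image_b},
    )
```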
fzysingularity · 14h ago
Hi HN,

We’ve been building agentic VLMs that operate over visual data (e.g., images, PDFs, videos), and were surprised at how underdeveloped the current infrastructure for multi-modal tool-calling is. MCP is all the rage these days, but it sidesteps a fundamental issue that no one seems to talk about - especially in multimodal contexts.

Some of the pain points we ran into when building our MCP server:

- LLMs call tools by-value. That’s fine for text and JSON arguments, but it completely breaks down for visual inputs.
- You can’t pass images or videos as base64 - it blows up context limits and latency, and makes for a poor dev experience (see the by-reference sketch after this list).
- Most “multimodal” MCP servers out there are single-turn demos. They assume local files and don’t support remote or persistent objects, making it impossible to build real workflows that operate on intermediate visual state - which is the core of most computer vision tasks.
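To make the by-value vs. by-reference distinction concrete, here's a minimal Python sketch. The `obj://` handle scheme is made up for illustration - the point is that a reference stays constant-size no matter how large the media is:

```python
import base64

# By-value: the raw bytes ride inside the tool-call arguments.
# A 4 MB JPEG balloons to ~5.3 MB of base64, eating context and adding latency.
with open("photo.jpg", "rb") as f:
    by_value_args = {"image": base64.b64encode(f.read()).decode("ascii")}

# By-reference: the agent passes an opaque handle to an object the server
# already hosts (or can fetch). Arguments stay tiny regardless of media size,
# and intermediate results can be chained across calls without re-uploading.
by_reference_args = {"image": "obj://session-abc123/photo.jpg"}  # hypothetical scheme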

So we built our remotely-hosted MCP server (https://docs.vlm.run/mcp/) that makes it trivial for agents to see, understand, and act on visual content using a suite of computer vision tools. We expose these tools (face detection, redaction, captioning, tracking, etc.) through a clean MCP-compatible API. Any agent that can hook into remote MCP servers - Claude, OpenAI, Cursor - can use it out of the box.
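For anyone wondering what hooking in looks like from the client side, here's a rough sketch using the official MCP Python SDK. The endpoint URL, tool name, and argument names are placeholders/assumptions - check the docs for the real schema:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

MCP_URL = "https://mcp.vlm.run/mcp"  # hypothetical endpoint; see docs.vlm.run/mcp


async def main() -> None:
    # Open a streamable-HTTP transport to the remote MCP server.
    async with streamablehttp_client(MCP_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the CV tools the server exposes (detection, redaction, ...).
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

            # Call a tool with a reference to a remote image, not inline bytes.
            # "detect_faces" and "image_url" are assumed names for illustration.
            result = await session.call_tool(
                "detect_faces",
                {"image_url": "https://example.com/group-photo.jpg"},
            )
            print(result.content)


asyncio.run(main())
```

The upside of the remote-server design is that any MCP-capable agent gets these tools with no custom glue code.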

Here are a few end-to-end examples (orchestrated by Claude, using our tools):

[1] Document Redaction: https://docs.vlm.run/mcp/examples/document-redaction

[2] Face Detection + Blurring: https://docs.vlm.run/mcp/examples/face-redaction

[3] Template Matching + Visual Search: https://docs.vlm.run/mcp/examples/template-search

[4] Video Editing: https://docs.vlm.run/mcp/examples/video-captioning

We’d love to hear what workflows you’re building - and what visual tools you'd want your agents to build on.

EarlyOom · 13h ago
Shocking how poorly frontier models perform on simple visual tasks. Best-in-domain tool calling will become the norm.
coolsank · 13h ago
Very interesting. Document redaction is definitely a great use case. Gotta check this out
slake · 5h ago
Noiceee!
mafangchang · 13h ago
Impressive!