GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2 (magazine.sebastianraschka.com)

This is a super cool project. But it would be 10x cooler if they had generated CLIP or some other embeddings for the images, so you could search for text but also do semantic vector search like "people fighting", "cats and dogs, "red tesla", "clown", "child playing with dog", etc.

jacobajit · 21m ago

I feel like street-view data is surprisingly underused for geospatial intelligence.

With current-gen multimodal LLMs, you could very easily query and plot things like "broken windows," "houses with front-yard fences," "double-parked cars," "faded lane markers," etc. that are difficult to generally derive from other sources.

For any reasonably-sized area, I'd guess the largest bottleneck is actually the Maps API cost vs the LLM inference. And ideally we'd have better GIS products for doing this sort of analysis smoothly.

m_kos · 3h ago

GitHub of the person who prepared the data. I am curious how much compute was needed for NY. I would love to do it for my metro but I suspect it is way beyond my budget.

https://github.com/yz3440

(The commenters below are right. It is the Maps API, not compute, that I should worry about. Using the free tier, it would have taken the author years to download all tiles. I wish I had their budget!)

LeifCarrotson · 2h ago

I would wager the compute for the OCR is cheap. Just get a beefy local desktop PC, if it runs overnight or even takes a week that's fine.

It's the Google Maps API costs that will sink your project if you can't get them waived as art:

https://mapsplatform.google.com/pricing/

Not sure how many panoramas there are in New York or your metro, but if it's over the free tier you're talking thousands of dollars.

daemonologist · 2h ago

The linked article mentions that they ingested 8 million panos - even if they're scraping the dynamic viewer that's $30k just in street view API fees (the static image API would probably be at least double that due to the low per-call resolution).

OCR I'd expect to be comparatively cheap, if you weren't in a hurry - a consumer GPU running PaddlePaddle server can do about 4 MP per second. If you spent a few grand on hardware that might work out to 3-6 months of processing, depending on the resolution per pano and size of your model.

ks2048 · 2h ago

It says 8 million images. So, 13.2 images/second for one week.

I'm wondering about more the data - did they use Google's API or work with Google to use the data?

jjwiseman · 28m ago

The creator gave a talk that has more details on how it was done: https://www.youtube.com/watch?v=gfODe92DzLU

IIRC he found a way to download streetview images without paying, and used the OCR built-in to macOS (which is really good).

wilson090 · 2h ago

This would probably make John Wilson's job a lot easier (https://en.wikipedia.org/wiki/How_To_with_John_Wilson)

ninju · 1h ago

There's a lot of PIZZA in New York City!

dang · 3h ago

Related. Others?

All Text in NYC - https://news.ycombinator.com/item?id=42367029 - Dec 2024 (4 comments)

All text in Brooklyn - https://news.ycombinator.com/item?id=41344245 - Aug 2024 (50 comments)

daemonologist · 1h ago

This is exceedingly fun.

A game: find an English word with the fewest hits. (It must have at least one hit that is not an OCR error, but such errors do still count towards your score. Only spend a couple of minutes.) My best is "scintillating" : 3.

cmwelsh · 37m ago

First lucky try, “calisthenics” scores a verified 1. It would be interesting if there was a Parquet file of the raw data.

https://www.alltext.nyc/search?q=Calisthenics

tkgally · 33m ago

“perplexed” gets one hit. It appears in a Bible quotation on an abortion rights poster on West 77th Street. Someone is sleeping beneath the poster:

https://www.alltext.nyc/search?q=perplexed

koolba · 1h ago

One match: https://www.alltext.nyc/search?q=Buxom

Benjammer · 1h ago

I found "intertwining" with a score of 3 also. Two instances of the word on the same sign and then a false positive third pic.

WorldPeas · 3h ago

hah, it can find all the KEST GAK stickers now: https://www.alltext.nyc/search?q=kest

adrianparsons · 2h ago

https://www.alltext.nyc/search?q=ana+peru

JackFr · 3h ago

Can’t find me any REVS tags. https://en.m.wikipedia.org/wiki/Revs_(graffiti_artist)

Instead shows me thousands of “Rev“

lildvlpr · 2h ago

I immediately looked up "Blob Dylan"

dumbfounder · 2h ago

Search for “fart” if you want a good laugh.

IncRnd · 1h ago

This is pretty cool! I'm curious what was used for OCR? Amazon Mechanical Burp?

tills13 · 3h ago

I _love_ this but it's pretty bad. I searched for "Morgue" and one of the matches was the "2025 Google" watermark which it thought was "Big Morgue"

Again, a complex problem and I love it...

cobbzilla · 2h ago

Searching for “foo” is humorous, it’s mostly restaurants with signs that say “food” but the “d” is cropped.

zxh · 15m ago

When you search 'google'... you'll see... lol

shibeprime · 2h ago

520 matches on "hotdog" 8084 matches on "massage" in no particular order

IAmGraydon · 3h ago

As others have mentioned, the idea is so cool, but the text recognition is abysmal.

lelandfe · 1h ago

It worked perfectly on the two tests I tried: the GSA building in SoHo, and BKLYN Blend in Bedstuy.

ya1sec · 2h ago

amazing. look up some graffiti writers you know

8bitsrule · 1h ago

Gosh! Maybe one of these days someone will take time off from this cultural wonderment to construct a simple, easy to use, text-to-audio.file program - you know, install, paste in some text, convert, start-up a player - so that the blind can listen to texts that aren't recorded in audiobooks. Without a CS degree.

repeekad · 41m ago

I think the issue is the compute power needed for good voice models is far from free just in hardware and electricity, so any good text to audio solution likely needs to cost some money. Wiring up Google vertex AI text to speech or the aws equivalent is probably something chat gpt could walk most people through even without a CS degree, a simple python script you could authenticate from a terminal command, and would maybe cost a couple bucks for personal usage

A service you can pay for of that simplicity probably doesn’t exist because there are other tools that integrate better with how the blind interact with computers, I doubt it’s copy and pasting text, and those tools are likely more robust albeit expensive

egypturnash · 2h ago

I typed in "fart" and none of the results on the first page were actually the word "fart".

dumbfounder · 2h ago

I also did this. But I wasn’t mad, I was amused.

brentm · 2h ago

Pretty cool

theodric · 3h ago

Cool concept, but the accuracy seems quite low. The hits for "pedo" are pretty hilarious, though! https://www.alltext.nyc/search?q=pedo&p=2

GPT-5 (openai.com)

Fight Chat Control (fightchatcontrol.eu)

GitHub is no longer independent at Microsoft after CEO resignation (theverge.com)

I tried every todo app and ended up with a .txt file (al3rez.com)

Ultrathin business card runs a fluid simulation (github.com)

I want everything local – Building my offline AI workspace (instavm.io)

Wikipedia loses challenge against Online Safety Act (bbc.com)

Emailing a one-time code is worse than passwords (blog.danielh.cc)

Claude Sonnet 4 now supports 1M tokens of context (anthropic.com)

Debian 13 “Trixie” (debian.org)

Vibechart (vibechart.net)

Claude Code is all you need (dwyer.co.za)

Show HN: The current sky at your approximate location, as a CSS gradient (sky.dlazaro.ca)

How I code with AI on a budget/free (wuu73.org)

Try and (ygdp.yale.edu)

GPT-5: Key characteristics, pricing and system card (simonwillison.net)

Wikimedia Foundation Challenges UK Online Safety Act Regulations (wikimediafoundation.org)

OpenFreeMap survived 100k requests per second (blog.hyperknot.com)

Jim Lovell, Apollo 13 commander, has died (nasa.gov)

Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

Historical Tech Tree (historicaltechtree.com)

Cursed Knowledge (immich.app)

The Chrome VRP Panel has decided to award $250k for this report (issues.chromium.org)

Meta Leaks Part 1: Israel and Meta (archive.org)

GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2 (magazine.sebastianraschka.com)

Flipper Zero dark web firmware bypasses rolling code security (rtl-sdr.com)

Getting good results from Claude Code (dzombak.com)

The Framework Desktop is a beast (world.hey.com)

GPT-5 for Developers (openai.com)

Monero appears to be in the midst of a successful 51% attack (twitter.com)

Linear sent me down a local-first rabbit hole (bytemash.net)

StarDict sends X11 clipboard to remote servers (lwn.net)

Show HN: Engineering.fyi – Search across tech engineering blogs in one place (engineering.fyi)

OpenSSH Post-Quantum Cryptography (openssh.com)

Trump Orders National Guard to Washington and Takeover of Capital’s Police (nytimes.com)

Why are there so many rationalist cults? (asteriskmag.com)

Vanishing from Hyundai’s data network (techno-fandom.org)

The surprise deprecation of GPT-4o for ChatGPT consumers (simonwillison.net)

My Lethal Trifecta talk at the Bay Area AI Security Meetup (simonwillison.net)

Windows XP Professional (win32.run)

MCP overlooks hard-won lessons from distributed systems (julsimon.medium.com)

Tor: How a military project became a lifeline for privacy (thereader.mitpress.mit.edu)

OpenAI's new open-source model is basically Phi-5 (seangoedecke.com)

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models [pdf] (arxiv.org)

Exit Tax: Leave Germany before your business gets big (eidel.io)

Show HN: Building a web search engine from scratch with 3B neural embeddings (blog.wilsonl.in)

Building Bluesky comments for my blog (natalie.sh)

Cursor CLI (cursor.com)

Project Hyperion: Interstellar ship design competition (projecthyperion.org)

Long-term exposure to outdoor air pollution linked to increased risk of dementia (cam.ac.uk)

Search all text in New York City

Comments (37)