Claude Sonnet 4 now supports 1M tokens of context (anthropic.com)

GitHub of the person who prepared the data. I am curious how much compute was needed for NY. I would love to do it for my metro, but I suspect it is way beyond my budget.

https://github.com/yz3440

(The commenters below are right. It is the Maps API, not compute, that I should worry about. Using the free tier, it would have taken the author years to download all tiles. I wish I had their budget!)

LeifCarrotson · 1h ago

I would wager the compute for the OCR is cheap. Just get a beefy local desktop PC; if it runs overnight or even takes a week, that's fine.

It's the Google Maps API costs that will sink your project if you can't get them waived as art:

https://mapsplatform.google.com/pricing/

Not sure how many panoramas there are in New York or your metro, but if it's over the free tier you're talking thousands of dollars.

daemonologist · 1h ago

The linked article mentions that they ingested 8 million panos. Even if they're scraping the dynamic viewer, that's $30k just in Street View API fees (the static image API would probably cost at least double that, due to the low per-call resolution).
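A rough way to redo this estimate for another metro, as a sketch: the per-1,000 rate below is not Google's list price but the blended rate implied by the $30k / 8M figure above (about $3.75 per 1,000 loads). `requests_per_pano` models the point about the static API needing several low-resolution calls to cover one pano, and `free_requests` is a hypothetical free allowance. Check the current pricing page for real numbers.

```python
def street_view_cost(panos: int, usd_per_1000: float,
                     requests_per_pano: int = 1, free_requests: int = 0) -> float:
    """Estimated API bill in USD (assumed flat rate, no volume tiers)."""
    requests = panos * requests_per_pano
    billable = max(0, requests - free_requests)  # nothing billed inside the free allowance
    return billable / 1000 * usd_per_1000

# Rate implied by the comment above: $30,000 for 8,000,000 panos, per 1,000 loads.
implied_rate = 30_000 * 1000 / 8_000_000

print(street_view_cost(8_000_000, implied_rate))                       # dynamic viewer, 1 load per pano
print(street_view_cost(8_000_000, implied_rate, requests_per_pano=2))  # static API, 2 tiles per pano (assumed)
```

Real pricing is tiered by monthly volume, so a flat rate is only a first-order estimate.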

OCR I'd expect to be comparatively cheap, if you weren't in a hurry - a consumer GPU running PaddlePaddle server can do about 4 MP per second. If you spent a few grand on hardware that might work out to 3-6 months of processing, depending on the resolution per pano and size of your model.
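The back-of-envelope math behind the 3-6 month figure, as a sketch: the 4 MP/s per consumer GPU comes from the comment above, while the per-pano resolution (8 or 16 MP) and the GPU count (2) are hypothetical values chosen for illustration.

```python
SECONDS_PER_DAY = 86_400

def ocr_days(panos: int, megapixels_per_pano: float,
             mp_per_second_per_gpu: float = 4.0, gpus: int = 1) -> float:
    """Wall-clock days to OCR every pano, assuming perfect scaling across GPUs."""
    total_mp = panos * megapixels_per_pano
    seconds = total_mp / (mp_per_second_per_gpu * gpus)
    return seconds / SECONDS_PER_DAY

# 8M panos on two GPUs, at two assumed pano resolutions.
for mp in (8, 16):
    print(mp, "MP/pano:", round(ocr_days(8_000_000, mp, gpus=2)), "days")
```

At those assumptions this lands at roughly 93 and 185 days, i.e. about 3 to 6 months; doubling the GPUs halves it.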

ks2048 · 1h ago

It says 8 million images. So, 13.2 images/second for one week.
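Checking the arithmetic, and turning the required rate into a GPU count, as a sketch: the 4 MP/s per GPU is the figure quoted upthread, and the 8 MP per image is a hypothetical resolution.

```python
import math

def required_rate(images: int, days: float) -> float:
    """Images per second needed to finish in the given number of days."""
    return images / (days * 86_400)

def gpus_needed(images: int, days: float, mp_per_image: float,
                mp_per_second_per_gpu: float = 4.0) -> int:
    """Whole GPUs needed to sustain that rate, assuming linear scaling."""
    return math.ceil(required_rate(images, days) * mp_per_image / mp_per_second_per_gpu)

print(round(required_rate(8_000_000, 7), 1))  # images/second for one week
print(gpus_needed(8_000_000, 7, 8))           # GPUs at an assumed 8 MP per image
```

Finishing in a week at 8 MP per image would take around 27 such GPUs, which is why a single desktop stretches this into months.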

I'm wondering more about the data: did they use Google's API or work with Google to use the data?

dang · 2h ago

Related. Others?

All Text in NYC - https://news.ycombinator.com/item?id=42367029 - Dec 2024 (4 comments)

All text in Brooklyn - https://news.ycombinator.com/item?id=41344245 - Aug 2024 (50 comments)

ninju · 13m ago

There's a lot of PIZZA in New York City!

dumbfounder · 57m ago

Search for “fart” if you want a good laugh.

daemonologist · 48m ago

This is exceedingly fun.

A game: find an English word with the fewest hits. (It must have at least one hit that is not an OCR error, but such errors do still count towards your score. Only spend a couple of minutes.) My best is "scintillating" : 3.

koolba · 18m ago

One match: https://www.alltext.nyc/search?q=Buxom

Benjammer · 38m ago

I found "intertwining" with a score of 3 also. Two instances of the word on the same sign and then a false positive third pic.

wilson090 · 1h ago

This would probably make John Wilson's job a lot easier (https://en.wikipedia.org/wiki/How_To_with_John_Wilson)

WorldPeas · 2h ago

hah, it can find all the KEST GAK stickers now: https://www.alltext.nyc/search?q=kest

adrianparsons · 1h ago

https://www.alltext.nyc/search?q=ana+peru

JackFr · 2h ago

Can’t find me any REVS tags. https://en.m.wikipedia.org/wiki/Revs_(graffiti_artist)

Instead shows me thousands of “Rev”

lildvlpr · 1h ago

I immediately looked up "Blob Dylan"

IncRnd · 37m ago

This is pretty cool! I'm curious what was used for OCR? Amazon Mechanical Burp?

cobbzilla · 1h ago

Searching for “foo” is humorous, it’s mostly restaurants with signs that say “food” but the “d” is cropped.

tills13 · 2h ago

I _love_ this but it's pretty bad. I searched for "Morgue" and one of the matches was the "2025 Google" watermark which it thought was "Big Morgue"

Again, a complex problem and I love it...

egypturnash · 1h ago

I typed in "fart" and none of the results on the first page were actually the word "fart".

dumbfounder · 56m ago

I also did this. But I wasn’t mad, I was amused.

shibeprime · 1h ago

520 matches on "hotdog", 8084 matches on "massage", in no particular order

8bitsrule · 37m ago

Gosh! Maybe one of these days someone will take time off from this cultural wonderment to build a simple, easy-to-use text-to-audio-file program (you know: install, paste in some text, convert, start up a player) so that the blind can listen to texts that aren't recorded as audiobooks. Without needing a CS degree.

ya1sec · 1h ago

amazing. look up some graffiti writers you know

IAmGraydon · 2h ago

As others have mentioned, the idea is so cool, but the text recognition is abysmal.

lelandfe · 40m ago

It worked perfectly on the two tests I tried: the GSA building in SoHo, and BKLYN Blend in Bedstuy.

brentm · 1h ago

Pretty cool

theodric · 2h ago

Cool concept, but the accuracy seems quite low. The hits for "pedo" are pretty hilarious, though! https://www.alltext.nyc/search?q=pedo&p=2