A game: find an English word with the fewest hits. (It must have at least one hit that is not an OCR error, but such errors do still count towards your score. Only spend a couple of minutes.) My best is "scintillating": 3.
GitHub of the person who prepared the data. I am curious how much compute was needed for NY. I would love to do it for my metro, but I suspect it is way beyond my budget.
Not sure how many panoramas there are in New York or your metro, but if it's over the free tier you're talking thousands of dollars.
daemonologist · 16m ago
The linked article mentions that they ingested 8 million panos - even if they're scraping the dynamic viewer that's $30k just in Street View API fees (the static image API would probably be at least double that due to the low per-call resolution).
OCR I'd expect to be comparatively cheap, if you weren't in a hurry - a consumer GPU running PaddlePaddle server can do about 4 MP per second. If you spent a few grand on hardware that might work out to 3-6 months of processing, depending on the resolution per pano and size of your model.
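Turning the throughput figure above into a rough timeline (the per-pano resolution and GPU count here are my own assumptions, not from the comment):

```python
# Back-of-envelope OCR time estimate for 8 million panoramas.
# Assumed: ~13 MP per downloaded pano and 2 consumer GPUs
# ("a few grand on hardware"), at the quoted ~4 MP/s per GPU.
PANOS = 8_000_000
MP_PER_PANO = 13            # assumed download resolution, in megapixels
GPUS = 2                    # assumed hardware budget
MP_PER_SEC_PER_GPU = 4      # throughput quoted in the comment

total_mp = PANOS * MP_PER_PANO
seconds = total_mp / (GPUS * MP_PER_SEC_PER_GPU)
months = seconds / (30 * 24 * 3600)
print(f"{months:.1f} months")  # → 5.0 months
```

Which lands inside the 3-6 month range - higher-resolution panos or fewer GPUs push it toward the top of that range.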
ks2048 · 28m ago
It says 8 million images. So, 13.2 images/second for one week.
I'm wondering more about the data - did they use Google's API or work with Google to use the data?
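The rate in the parent comment checks out:

```python
# Sanity check: 8 million images processed in one week.
images = 8_000_000
week_seconds = 7 * 24 * 3600   # 604,800 seconds in a week
rate = images / week_seconds
print(f"{rate:.1f} images/second")  # → 13.2 images/second
```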
https://github.com/yz3440
It's the Google Maps API costs that will sink your project if you can't get them waived as art:
https://mapsplatform.google.com/pricing/
All Text in NYC - https://news.ycombinator.com/item?id=42367029 - Dec 2024 (4 comments)
All text in Brooklyn - https://news.ycombinator.com/item?id=41344245 - Aug 2024 (50 comments)
Instead shows me thousands of “Rev”
Again, a complex problem and I love it...