I'm really looking for an OCR LLM that we can run locally. I think a lot of people doing heavy OCR and document extraction don't want to upload every file to the cloud, and the use case is narrower than typing into a chatbot.

Gemini is nice, but it would be good to have a pipeline that works locally on a Mac with a reasonable amount of unified memory, or on a Framework AMD board.

eithed · 1h ago

OCR doesn't hallucinate outputs: if an architecture diagram says "212.99mm", it doesn't suddenly turn into "2413m" on the other end because the LLM thought that felt better. I remember reading on HN about this happening in just such a case (but sadly my google-fu fails me in finding a link).
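A cheap guard against the kind of substitution described above is to cross-check the numbers in an LLM's extraction against the numbers a conventional OCR pass found on the same page. This is a minimal sketch under stated assumptions: both outputs are already plain strings, and the helper names `numbers_in` and `unverified_numbers` are hypothetical, not taken from any existing tool.

```python
import re

# Hypothetical pattern: integers and decimals such as "2400" or "212.99".
# Units are ignored, and it assumes "." as the decimal separator with no
# thousands separators ("1,000" would split into "1" and "000").
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(text: str) -> set[str]:
    """Return the set of numeric tokens that appear in text."""
    return set(NUMBER.findall(text))


def unverified_numbers(ocr_text: str, llm_text: str) -> set[str]:
    """Return numbers the LLM emitted that the OCR pass never saw."""
    return numbers_in(llm_text) - numbers_in(ocr_text)


if __name__ == "__main__":
    ocr = "Wall length: 212.99mm, height 2400mm"
    llm = "The wall is 2413m long and 2400mm high."
    print(sorted(unverified_numbers(ocr, llm)))  # ['2413']
```

This only flags disagreement: a number the OCR pass itself misread, or one that happens to appear elsewhere on the page, will slip through, so it is a prompt for human review rather than a correctness check.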

strangecasts · 26m ago

The case you might be thinking of is the JBIG2 implementation bug [1, 2] in Xerox photocopiers where the pattern-matching would incorrectly treat certain characters as interchangeable, leading to numbers getting rewritten in spreadsheets.

[1] https://www.bbc.com/news/technology-23588202

[2] https://www.dkriesel.com/en/blog/2013/0810_xerox_investigati...
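The failure mode in the Xerox case comes from pattern matching and substitution: the compressor stores one bitmap per distinct-looking glyph and reuses it for every later glyph that is "close enough". The toy sketch below is not JBIG2 or Xerox's code; it assumes made-up 5x3 digit bitmaps and a simple Hamming-distance threshold, and only illustrates how a threshold that is too loose silently turns one digit into another.

```python
# Hypothetical 5x3 bitmaps ('#' = black pixel); not real scanner data.
# "6" and "8" differ in exactly one pixel (row 2, column 3).
GLYPHS = {
    "6": ("###", "#..", "###", "#.#", "###"),
    "8": ("###", "#.#", "###", "#.#", "###"),
}


def hamming(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(text, threshold):
    """Toy pattern matching and substitution.

    Each glyph reuses the first stored symbol within `threshold` differing
    pixels; otherwise its bitmap is stored as a new symbol. Returns the text
    a reader would see after decoding (labels stand in for drawn bitmaps).
    """
    dictionary = []  # (label, bitmap) pairs actually stored
    decoded = []
    for ch in text:
        bitmap = GLYPHS[ch]
        for label, stored in dictionary:
            if hamming(bitmap, stored) <= threshold:
                decoded.append(label)  # reuse an existing symbol, possibly the wrong one
                break
        else:
            dictionary.append((ch, bitmap))
            decoded.append(ch)
    return "".join(decoded)


if __name__ == "__main__":
    print(compress("686", threshold=0))  # 686: strict matching keeps both digits
    print(compress("686", threshold=1))  # 666: loose matching rewrites the 8
```

Nothing in the decoded output signals that a substitution happened, which is why the corrupted numbers looked perfectly legible.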

eithed · 8m ago

That's exactly it! Thank you!

endymion-light · 1h ago

I don't mind people writing blog posts advertising their own companies, but I'd like a bit more substance on this topic. It is interesting in a way: I find myself turning to things like Gemini 2.5 for simple OCR/NLP, and now for more substantial image editing, rather than to specialized models.

I think that's mostly due to the current state of the industry: a lot of those models are internal, paywalled, or annoying to use. I don't want to waste effort signing up for a four-week trial of X service to perform a one-off task.

Unfortunately, this post didn't really explore any interesting topic in this space.

I'm not expecting a research paper, but it would be great to get some stats, graphs, examples, and meat on the bones. I opened this expecting actual examples of problems in OCR and NLP, and a demonstration of how X multimodal model solves them.

behnamoh · 1h ago

This is a nothing-burger blog post that likely made it to the front page because it mentions "LLM" in the title. Worse yet, it's actually an ad.

OtherShrezzing · 1h ago

The first thing I do on HN posts with lots of upvotes and few comments is scroll to the bottom and check whether the closing paragraph links to some SaaS product. If it does, I close the tab.

thaeli · 1h ago

Ironically, this check would be a pretty good use for an LLM.

WesleyLivesay · 1h ago

You beat me to this comment, but you are absolutely correct.
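The check described above can also be roughed out without an LLM. A sketch using Python's standard-library `html.parser`, assuming the post's HTML is already downloaded and that "closing paragraph" means the last `<p>` element; `ParagraphLinks` and `closing_links` are hypothetical names. It only lists the links; deciding whether one is a SaaS plug is still up to the reader (or, as suggested, an LLM).

```python
from html.parser import HTMLParser


class ParagraphLinks(HTMLParser):
    """Collect the href of every <a> inside each <p>, paragraph by paragraph."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # one list of hrefs per <p> element
        self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append([])
        elif tag == "a" and self.in_p:
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href value
                self.paragraphs[-1].append(href)

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False


def closing_links(html):
    """Return the links found in the last <p> of the page (empty if none)."""
    parser = ParagraphLinks()
    parser.feed(html)
    return parser.paragraphs[-1] if parser.paragraphs else []


if __name__ == "__main__":
    page = ('<p>Intro <a href="https://a.example/x">x</a></p>'
            '<p>Try <a href="https://saas.example/signup">our product</a>!</p>')
    print(closing_links(page))  # ['https://saas.example/signup']
```

Real blog templates often put footers, share buttons, or newsletter boxes after the article body, so a practical version would first narrow the HTML to the article element.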

Tractor8626 · 1h ago

OCR doesn't have a prompt injection problem.

mattigames · 1h ago

It's only prompt injection if it comes from state sponsored hackers, otherwise it's just surprise prompt augmentation.

tiahura · 1h ago

"I still believe that processing documents will be a solved problem in a couple years time."

Current 80/20-rule-ignoring AI dogma in a nutshell.

tovej · 1h ago

Are LLMs not NLP? They process natural language, no?

And I assume the multimodal tools still use OCR for text extraction, or am I missing something?

My understanding is that they're still doing OCR+NLP, just differently than traditional approaches.

universesquid · 3m ago

1.) Technically yes, but most models used for that task are NLP models, not LLMs in the modern sense.
2.) Actually, they don't. Multimodal LLMs parse PDFs by taking screenshots of each page.

LLMs solving problems OCR+NLP couldn't

Comments (14)