AI is going great for the blind (2023)

91 ljlolel 56 9/3/2025, 7:07:03 AM robertkingett.com ↗

Comments (56)

krisoft · 14h ago
> The blind and visually impaired people advocating for this have been conditioned to believe that technology will solve all accessibility problems because, simply put, humans won’t do it.

Technology is not just sprouting out of the ground on its own. It is humans who are making it. Therefore, if technology is helpful, it was humans who helped.

> Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

> I fully predict that blind people will be advocating to make actual LLM platforms accessible

Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

> I also predict web accessibility will actually get worse, not better, as coding models will spit out inaccessible code that developers won’t check or won’t even care to check.

Who knows. Either that, or some pages will become more accessible because making them accessible will take less effort on the part of the devs. It will probably be a mixed bag, with a little bit of column A and a little bit of column B.

> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.

These are all AI. They are all improving by leaps and bounds.

> An LLM will always be there, well, until the servers go down

Of course. That is a concern. This is why models you can run yourself are so important. Local models are good for latency and reliability. But even if the model runs on a remote server, as long as you control the server you decide when it gets shut down.

lxgr · 13h ago
> > Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

TFA is from 2023, when multimodal LLMs were just picking up. I do agree that that prediction (flat capability increase) has aged poorly.

> I doubt OCR and even self-driving cars will get any significant advancements.

This particular prediction has also aged quite poorly. Mistral OCR, an OCR-focused LLM, is working phenomenally well in my experience compared to "non-LLM OCRs".

stinkbeetle · 12h ago
> > I fully predict that blind people will be advocating to make actual LLM platforms accessible

> Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

AIs I have used have fairly basic interfaces - input some text or an image and get back some text or an image - is that not something that accessibility tools can already do? Or do they mean something else by "actual LLM platform"? This isn't a rhetorical question, I don't know much about interfaces for the blind.

devinprater · 11h ago
Oh no, because screen readers are dumb things. If you don't send them an announcement, through live regions or accessibility announcements on Android or iOS, they will not know that a response has been received. So the user will just sit there and tap and tap to see when a response comes in. This is especially frustrating with streaming responses, where you're not sure when streaming has completed.

Gemini for Android is awful at this when typing to it while using TalkBack: no announcements. Claude on web and Android also does nothing, and on iOS it at least places focus, accidentally I suspect, at the beginning of the response. ChatGPT on iOS and web is great; it tells me when a response is being generated and reads it out when it's done. On iOS, it sends each line to VoiceOver as it's being generated.

AI companies, and companies in general, need to understand that not all blind people talk to their devices.
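
For illustration, a minimal sketch of the kind of announcement being described here, assuming a browser-based chat UI; the CSS class and the event hooks are hypothetical, not any vendor's actual code (TypeScript):

  // A polite ARIA live region that tells a screen reader when a chat
  // response has started and finished, instead of leaving the user to tap around.
  const liveRegion = document.createElement("div");
  liveRegion.setAttribute("aria-live", "polite");
  liveRegion.setAttribute("role", "status");
  liveRegion.className = "visually-hidden"; // hypothetical class: off-screen but still spoken
  document.body.appendChild(liveRegion);

  function announce(message: string): void {
    // Clearing first helps some screen readers re-announce repeated messages.
    liveRegion.textContent = "";
    liveRegion.textContent = message;
  }

  // Hypothetical hooks into the chat client:
  // onResponseStart(() => announce("Generating response…"));
  // onResponseComplete(() => announce("Response ready."));
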
simonw · 11h ago
Sounds like I should reverse engineer the ChatGPT web app and see what they're doing.
agos · 10h ago
dang, I was hoping that with the impossibly simple interface chatGPT has and the basically unlimited budget they have, they would have done a bit better for accessibility. shameful
simonw · 12h ago
I've been having trouble figuring out how best to implement a streaming text display interface in a way that's certain to work well with screenreaders.
miki123211 · 9h ago
This really depends on the language.

In some languages, pronunciation(a+b) == pronunciation(a) + pronunciation(b). Polish mostly belongs to this category, for example. For these, it's enough to go token-by-token.

For English, it is not that simple, as e.g. the "uni" in "university" sounds completely different to the "uni" in "uninteresting."

In English, even going word-by-word isn't enough, as words like "read" or "live" have multiple pronunciations, and speech synthesizers rely on the surrounding context to choose which one to use. This means you probably need to go by sentence.

Then you have the problem of what to do with code, tables, headings, etc. While screen readers can announce roles as you navigate text, they cannot do so when announcing the contents of the live region, so if that's something you want, you'd need to build a micro screen reader of sorts.
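
To make the sentence-by-sentence point concrete, here is a rough sketch assuming an async token stream from any LLM API; announce() is the hypothetical live-region helper sketched earlier, and the sentence-boundary regex is deliberately naive (TypeScript):

  // Buffer streamed tokens and only push whole sentences to the live region,
  // so the speech synthesizer has enough context to pick the right pronunciation.
  const SENTENCE_END = /[.!?…]["')\]]?\s$/;

  async function speakStream(tokens: AsyncIterable<string>): Promise<void> {
    let buffer = "";
    for await (const token of tokens) {
      buffer += token;
      if (SENTENCE_END.test(buffer)) {
        announce(buffer.trim()); // one sentence at a time, not token by token
        buffer = "";
      }
    }
    if (buffer.trim().length > 0) {
      announce(buffer.trim()); // flush whatever remains when the stream ends
    }
  }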

devinprater · 11h ago
If it's command-line based, maybe stream based on lines, or even better, sentences rather than received tokens.
NoahZuniga · 13h ago
Gemini 2.5 has the best vision understanding of any model I've worked with. Leagues beyond gpt5/o4
IanCal · 11h ago
It's hard to overstate this. They perform segmentation and masking and provide information from that to the model and it helps enormously.

Image understanding is still drastically weaker than text performance, with glaring mistakes that are hard to understand, but the Gemini 2.5 models are far and away the best of what I've tried.

devinprater · 11h ago
There's a whole tool based on having Gemini 2.5 describe Youtube videos, OmniDescriber.

https://audioses.com/en/yazilimlar.php

johnfn · 10h ago
Interesting -- what sort of things do you use it for?
devinprater · 9h ago
Having Youtube videos described to me, basically. Since Google won't do it.
pineaux · 11h ago
Yeah, I made a small app to sell my father's books. I scanned all the books by taking pictures of the bookshelves + books (a collection of 15k books, almost all non-fiction), then fed them to different AIs. Combining Mistral OCR and Gemini worked very, very well. I ran them all past both AIs and compared the output per book, then saved all the output into a SQL database for later reference. I did some other stuff with it, then made a document out of the output and sent it to a large group of book buyers, asking them to bid on individual books and on the whole collection.
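
A hedged sketch of the pipeline described above, not the commenter's actual code: the describeShelf* helpers are hypothetical stand-ins for the Mistral OCR and Gemini calls, and better-sqlite3 stands in for the SQL store (TypeScript):

  // For each shelf photo, ask two vision models for the books they see,
  // store both answers per image, and compare them per book later.
  import Database from "better-sqlite3";

  interface BookGuess { title: string; author: string; }

  declare function describeShelfWithMistral(imagePath: string): Promise<BookGuess[]>;
  declare function describeShelfWithGemini(imagePath: string): Promise<BookGuess[]>;

  const db = new Database("books.db");
  db.exec("CREATE TABLE IF NOT EXISTS books (image TEXT, source TEXT, title TEXT, author TEXT)");
  const insert = db.prepare("INSERT INTO books (image, source, title, author) VALUES (?, ?, ?, ?)");

  async function processShelf(imagePath: string): Promise<void> {
    const [mistral, gemini] = await Promise.all([
      describeShelfWithMistral(imagePath),
      describeShelfWithGemini(imagePath),
    ]);
    for (const g of mistral) insert.run(imagePath, "mistral", g.title, g.author);
    for (const g of gemini) insert.run(imagePath, "gemini", g.title, g.author);
  }
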
jibal · 11h ago
> > Let’s not mention the fact the ==> particular <== large language model, LLM called ==> Chat GPT <== they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

giancarlostoro · 11h ago
> Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

Not sure, but the Grok avatars or characters, whatever they're called: I've experimented with them. I hate the defaults xAI made, because they don't seem to be a generic, simple AI robot or w/e, but after you tell them to stop flirting and calling you babe (seriously, what the heck lol) they can really hold a conversation. I talked to one about a musician I liked, in a very niche genre of music, and it was able to suggest an insanely accurate, relatable song from a different artist I did not know, all in real time.

I think it was last year or the year before? They did a demo with two phones, one that could see and one that could not, and the two ChatGPT instances were talking to each other, one describing the room to the other. I think we are probably at the point now where it can describe a room.

rgoulter · 14h ago
> With an LLM, it will never get annoyed, aggravated, think less of the person, or similar.

Between people, it's very commonly considered impolite to ask others for too much help. So having an info retrieval / interactive chat that will patiently answer questions is a boon for everyone.

I guess you can try and frame all 'helping' as "you're +1 if you're being helpful", but don't be surprised if not everyone sees things that way all the time.

adhamsalama · 12h ago
As long as you don't get rate-limited!
y-curious · 12h ago
I guarantee you that rate limits are a thing when you ask non-impaired people for help constantly, too. I'd be taking my chances with AI rate limits for things like "describe this character in detail" and "repeat the minute-long dialog you just delivered"
simonw · 15h ago
The headline is clearly meant to be sarcastic but the actual body of the text seems to indicate that AI back in 2023 was going pretty great for the blind - it mostly reports on others who are enthusiastic adopters of it, despite the author's own misgivings.
PhantomHour · 14h ago
There's a big difference in precisely how the technology is applied.

Transformer models making screen readers better is cool. Companies deciding to fire their human voice actors and replacing all audiobooks with slop is decidedly not cool.

You can really see this happening in translation right now. Companies left and right are firing human translators and replacing their work with slop, and it's a huge step down, because AI simply cannot match the previous level of quality. (Mr Chad Gippity isn't going to maintain puns or add notes for references that the new audience won't catch.)

And that's in a market where there is commercial pressure to have quality work. Sloppy AI translations are already hurting sales.

In accessibility, it's a legal checkbox. Companies broadly do not care. It's already nearly impossible to get people to do things like use proper aria metadata. "We're a startup, we gotta go fast, ain't got no time for that".

AI is already being used to provide a legally-sufficient but practically garbage level of accessibility. This is bad.

conradev · 11h ago
Firing voice actors is not great. Replacing human-narrated audio with AI narrated audio is not great.

But the coverage of audiobooks is… also not great? Of the books I've purchased recently, maybe 30% or less have audiobooks? What if I want to listen to an obscure book? Should I be paying a human narrator to narrate my personal library?

The copyright holders are incentivized to make money. It does not make financial sense to pay humans to narrate their entire catalog. As long as they're the only ones allowed to distribute derivative works, we're kind of in a pickle.

PhantomHour · 8h ago
> What if I want to listen to an obscure book? Should I be paying a human narrator to narrate my personal library?

You weren't doing that before AI either, were you?

The practical answer has already been "you pipe an ebook through a narrator/speech synthesizer program".

> The copyright holders are incentivized to make money.

Regulations exist. It'd be rather trivial to pass a law mandating that every ebook sold be usable with screen readers. There are already laws for websites, albeit a bit poorly enforced.

cpfohl · 12h ago
Boy, my experience with small chunks of translation between languages I know well is not the same at all. When prompted properly, the translation quality is unbelievable; it can absolutely catch nuances and puns, and add footnotes.

That said, I use it with pretty direct prompting, and I strongly prefer the "AI Partners with a Human" model.

PhantomHour · 8h ago
I'm sorry, but I simply do not believe that it handles things like puns and passages requiring footnotes; my experience is that LLMs are miserable at this even when directly "instructed" to.

But as far as my previous comment is concerned: it doesn't really matter what the "state of the art" AI is, because companies simply do not use it. They just pipe everything through the easiest and cheapest models, with human review (that does not actually get the time to do a meaningful review) optional.

Wowfunhappy · 14h ago
I did not interpret the headline as sarcastic.
simonw · 14h ago
The actual headline is:

  AI is going great for the blind.
That . (not present in the Hacker News posting) made me think it was sarcastic, combined with the author's clear dislike of generative AI.
lxgr · 13h ago
It also pattern matches to "Web3 is going just great", a popular crypto-skeptic blog – not sure if that's intentional.

There seems to be a sizable crowd of cryptocurrency hype critics who have pivoted to criticizing the AI hype (claiming that the hype itself is largely driven by the same actors, and that accordingly neither crypto nor AI has much object-level merit), ironically and sadly in quite a group-think-heavy way, considering how many valid points of criticism there are to be made of both.

ljlolel · 13h ago
I posted it with the period
Robdel12 · 8h ago
My mom is 100% blind, and everything I build, I make sure my mom can use. I remediated Visa Checkout to save a tier-1 bank contract and got them to WCAG 2.0 compliance. With my “credentials” out of the way…

Ugh, this is what sort of bothers me about the accessibility community. Something about it always comes off as preachy, like a moral argument. This is the worst way to get folks to actually care; you're just making them feel bad.

Look, the fact is everyone needs to use technology to live these days. And us devs suck ass at making those things accessible, even in the slightest. It won't be until we all age into needing it that it finally becomes a real issue that gets tackled. Until then, tools like LLMs are going to be amazingly helpful. Posts like this are missing the forest for the trees.

My mom has been using ChatGPT for a ton of things and it's been helpful. It's a massive net positive. The LLM alt tags Facebook added a long time ago: massively helpful. Perfect? Hell no. But we gotta stop acting like these improvements aren't helpful and aren't progress. It comes across as whiny. I say this as someone who is in this community.

bsoles · 1h ago
When my employer jumped on the bandwagon and built its own internal wrapper around ChatGPT, I tested it with a screen reader using keyboard navigation. And it was terrible. As long as humans don't really care about the disabled (which they really don't), I doubt AI will solve the problems of visually impaired people.
999900000999 · 15h ago
Even before the LLMs, simple voice assistants have been great for those with limited sight.

I recall speaking to a girl who thanked these voice assistants for helping her order food and cook.

Right now I'm using AI while traveling; it gets stuff 85% right, which is enough for lunch.

devinprater · 7h ago
I use AI, as a blind person. I posted a week or so ago a video in which I use TalkBack's image description feature to play a video game that has no accessibility at all. Of course, that was on Android, which isn't the most blind-friendly of OSes, and iOS doesn't have LLM image descriptions built in, nor good PS2 emulators.

Other blind people are all in on the AI hype, describing themselves as partially sighted because of AI with their Meta Ray-Ban glasses. Side note: the Ray-Ban glasses report that I died last year. I somehow missed my funeral; sorry to all those who were there without me. I do like brains, though...

Meanwhile, many LLM sites are not blind-friendly, like Claude and Perplexity, and there are sites that try but fail so exasperatingly hard that I lose any motivation for filing reports, because I can't even begin to explain what's breaking so badly. It's evident that OpenWebUI has not tested its accessibility with a screen reader. Anyway, blindness organizations (NFB mainly) have standardized on "just use ChatGPT", and everything else is the wild west where they absolutely do not care. Gemini could be more accessible, on web and especially Android, but all reports have been ignored, so I'm not going to bother with them anymore. It's sad, since their AI describes images well.

Thank goodness for the API, and tools like [PiccyBot](https://www.piccybot.com/) on iOS/Android and [viewpoint](https://viewpoint.nibblenerds.com/) and [OmniDescriber](https://audioses.com/en/yazilimlar.php) on Windows. I'm still waiting for iOS to catch up to Android in LLM image descriptions built into the screen reader. Meanwhile, at least we have [this shortcut](https://shortcutjar.com/describe/documentation/). It uses GPT-4o, but at least it's something. Apple could easily integrate with their own Apple Intelligence to call out to ChatGPT or whatever, but I guess competition has to happen. Or something. Maybe next year lol. In the meantime I'll either spend my own cents to get descriptions or share to Seeing AI or something, like a caveman.

josefresco · 9h ago
I've been volunteering for Be My Eyes* for a few years now. It's been very rewarding. I get maybe a half dozen calls each year.

I help people with very mundane and human tasks: cooking, gardening, label identification.

*https://www.bemyeyes.com/download-app/

miki123211 · 9h ago
Would you be willing to share what kind of calls you usually get? No personal or sensitive details of course.

Did the volume of calls change meaningfully with the introduction of AI into Be my Eyes?

gostsamo · 15h ago
That's one very angry take on things. As a blind person myself, AI is a net benefit and it has potential, but I also agree that there are lots of people who think that if AI solutions are good enough, there is no need to invest in actually accessible GUIs. That last take is extremely wrong, because AI will always be the slowest solution, and one that might be badly prompted or just hallucinate. Just today someone was complaining that Gemini's code editor is not fully accessible and was looking for advice, so I'd give the author points for mentioning that the AI interface itself might be inaccessible. Not to mention that chat web interfaces often lack proper ARIA descriptions for some basic operations.
simne · 3h ago
Could you estimate the number of blind people using AI?

If their number is significant, they could themselves be the foundation for some AI business, even if all other consumers turn away from AI.

As for web accessibility, I must say it is terrible even for non-blind people, but AI could also change this for the better.

What I mean is: you may have heard of PalmPilot organizers. They were very limited in hardware, but there was a private company that provided a proxy browser, which took ordinary web sites and showed a version optimized for the PalmPilot's small display, plus had a mode for offline reading. With today's AI, one could do much better.

ccgreg · 14h ago
The IETF AI Preferences working group is currently discussing whether or not to include an example of bypassing AI preferences to support assistive technologies. Oddly enough, many publishers oppose that.
1gn15 · 10h ago
What does bypassing AI preferences mean? Just ignoring them?
ianbicking · 9h ago
Probably ignoring things like robots.txt, I'm guessing? But I'd be curious what exactly the list of things is, and if it's growing. Would it go as far as ChatGPT filling in CAPTCHAs?

autocomplete="off" is an instance of something that user agents willfully ignore based on their own heuristics, and I'm assuming accessibility tools have always ignored a lot of similar things.

ccgreg · 7h ago
A lot of publishers do not care about blind people, and would prefer that they be unable to use AI to read.
stinkbeetle · 12h ago
> While the stuff LLMs is giving us is incorrect information, it’s still information that the sighted world won’t or refuses to give us.

I don't understand what's going on here. Is he angry at us horrible sighteds for refusing to give them incorrect information? Or because we refuse to tell them when their LLMs give them incorrect information? Or does he think that we're refusing to give them correct information, which makes it okay that the LLM is giving them incorrect information?

ants_everywhere · 12h ago
It's nearly impossible to think clearly when you're angry.
jibal · 10h ago
> He's angry at us horrible sighteds

It's not about you so no need to be personally offended.

Have a bit of empathy and do a bit of research and it's not hard to understand that accessibility is limited.

stinkbeetle · 2h ago
I'm not offended at all, just trying to understand what was written. What exactly the gripe is and what he wants.

My empathy is not the problem here. Having a disability doesn't give you a free pass to be a bitter asshole.

fwip · 12h ago
I believe they're saying that sometimes-wrong information from a machine is preferable to no information (without machine). At least to some people.
jibal · 10h ago
Obviously.
fwip · 10h ago
I was trying to be gentle. :P
paulsutter · 15h ago
> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.

Great to read that blind folks get so much benefit from LLMs. But this one quote seemed odd. The most amazing OCR and document attribution products are becoming available due to LLMs.

rafram · 13h ago
LLM/VLM-based OCR is highly prone to hallucination - the model does not know when it can’t read a text, it can’t estimate its own confidence, and it deals with fuzzy/unclear texts by simply making things up. I would be very nervous using it for anything critical.
paulsutter · 12h ago
There are really amazing products coming
rafram · 11h ago
I’ll believe it when I see it.
jibal · 10h ago
The article is nearly 2 years old ... people don't have perfect foresight.
ljlolel · 15h ago
This was 2023 so I can only assume it’s gotten even better!
RianAtheer · 11h ago
Amazing to see AI making real impact for accessibility! Tools like this are game-changers for the visually impaired. AI in banking is fascinating, bringing faster fraud detection, personalized services, and cost savings, while also raising concerns around job shifts and bias. The key is partnership, not replacement: humans and AI working together. You can also read about this amazing article "The Rise of AI in Banking: Friend or Foe" on finance gossips