AI is going great for the blind (2023)

60 points by ljlolel | 35 comments | 9/3/2025, 7:07:03 AM | robertkingett.com ↗

Comments (35)

krisoft · 3h ago
> The blind and visually impaired people advocating for this have been conditioned to believe that technology will solve all accessibility problems because, simply put, humans won’t do it.

Technology is not just sprouting out of the ground on its own. It is humans who are making it. Therefore, if technology is helpful, it was humans who helped.

> Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

> I fully predict that blind people will be advocating to make actual LLM platforms accessible

Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

> I also predict web accessibility will actually get worse, not better, as coding models will spit out inaccessible code that developers won’t check or won’t even care to check.

Who knows. Either that, or some pages will become more accessible because making them accessible will take less effort on the devs' part. It will probably be a mixed bag, with a little bit of column A and a little bit of column B.

> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.

These are all AI. They are all improving by leaps and bounds.

> An LLM will always be there, well, until the servers go down

Of course. That is a concern. This is why models you can run yourself are so important. Local models are good for latency and reliability. But even if the model runs on a remote server, as long as you control that server, you decide when it gets shut down.
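To make "models you can run yourself" concrete, here is a minimal sketch (assuming a local Ollama or llama.cpp server exposing the usual OpenAI-compatible endpoint on Ollama's default port; the model name is just an example of something you might have installed):

  // Query a locally hosted model over its OpenAI-compatible HTTP API.
  // No third party involved: if this server goes down, it's because you turned it off.
  async function askLocalModel(prompt: string): Promise<string> {
    const res = await fetch("http://localhost:11434/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llava", // example: a locally installed vision-capable model
        messages: [{ role: "user", content: prompt }],
      }),
    });
    const data = await res.json();
    return data.choices[0].message.content;
  }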

lxgr · 2h ago
> > Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

TFA is from 2023, when multimodal LLMs were just picking up. I do agree that that prediction (flat capability increase) has aged poorly.

> I doubt OCR and even self-driving cars will get any significant advancements.

This particular prediction has also aged quite poorly. Mistral OCR, an OCR-focused LLM, is working phenomenally well in my experience compared to "non-LLM OCRs".

NoahZuniga · 2h ago
Gemini 2.5 has the best vision understanding of any model I've worked with. Leagues beyond GPT-5/o4.
devinprater · 10m ago
There's a whole tool, OmniDescriber, built around having Gemini 2.5 describe YouTube videos.

https://audioses.com/en/yazilimlar.php

IanCal · 36m ago
It's hard to overstate this. They perform segmentation and masking and provide information from that to the model, and it helps enormously.

Image understanding is still drastically weaker than text performance, with models making glaring mistakes that are hard to understand, but the Gemini 2.5 models are far and away the best of what I've tried.

pineaux · 9m ago
Yeah, I made a small app to sell my father's books. I scanned all the books by taking pictures of the bookshelves plus the books (a collection of 15k books, almost all non-fiction), then fed them to different AIs. Combining Mistral OCR and Gemini worked very, very well. I ran everything past both AIs and compared the output per book, then saved all the output into a SQL database for later reference. I did some other stuff with it, then made a document out of the output and sent it to a large group of book buyers. I asked them to bid on individual books and on the whole collection.
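
For the curious, a simplified sketch of what such a cross-checking step can look like (the two helpers are hypothetical placeholders for the real Mistral OCR and Gemini calls, and the agreement check is deliberately crude):

  // Hypothetical stubs: wire up the real Mistral OCR and Gemini SDK calls here.
  async function ocrWithMistral(imagePath: string): Promise<string> {
    throw new Error("replace with a Mistral OCR API call");
  }
  async function describeWithGemini(imagePath: string): Promise<string> {
    throw new Error("replace with a Gemini API call");
  }

  interface BookRecord {
    image: string;
    mistral: string;
    gemini: string;
    agreed: boolean;
  }

  // Run each photo past both models and flag disagreements for manual review;
  // the resulting rows can then be inserted into a SQL database for later reference.
  async function catalogShelf(imagePaths: string[]): Promise<BookRecord[]> {
    const norm = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
    const records: BookRecord[] = [];
    for (const image of imagePaths) {
      const [mistral, gemini] = await Promise.all([
        ocrWithMistral(image),
        describeWithGemini(image),
      ]);
      records.push({ image, mistral, gemini, agreed: norm(mistral) === norm(gemini) });
    }
    return records;
  }
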
stinkbeetle · 1h ago
> > I fully predict that blind people will be advocating to make actual LLM platforms accessible

> Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

AIs I have used have fairly basic interfaces - input some text or an image and get back some text or an image - is that not something that accessibility tools can already do? Or do they mean something else by "actual LLM platform"? This isn't a rhetorical question, I don't know much about interfaces for the blind.

devinprater · 15m ago
Oh no, because screen readers are dumb things. If you don't send them an announcement, through live regions or accessibility announcements on Android or iOS, they will not know that a response has been received. So the user will just sit there and have to tap and tap to see when a response comes in. This is especially frustrating with streaming responses, where you're not sure when streaming has completed.

Gemini for Android is awful at this when typing to it while using TalkBack: no announcements. Claude on web and Android also does nothing, and on iOS it at least places focus, accidentally I suspect, at the beginning of the response. ChatGPT on iOS and web is great; on the web it tells me when a response is being generated and reads it out when it's done. On iOS, it sends each line to VoiceOver as it's generated.

AI companies, and companies in general, need to understand that not all blind people talk to their devices.
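
For web developers wondering what the fix looks like, a minimal sketch of the live-region announcements described above, assuming a plain DOM chat app (the element and class names are made up):

  // A live region: screen readers announce whatever text is written into it.
  // "polite" waits for current speech to finish; "assertive" would interrupt.
  const live = document.createElement("div");
  live.setAttribute("aria-live", "polite");
  live.setAttribute("role", "status");
  live.className = "visually-hidden"; // CSS that hides it visually but not from assistive tech
  document.body.appendChild(live);

  // Call these from the chat app's existing request lifecycle.
  function announceGenerating(): void {
    live.textContent = "Generating response…";
  }

  function announceDone(responseText: string): void {
    // Writing new text into the live region is what triggers the announcement.
    live.textContent = "Response ready. " + responseText;
  }
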
simonw · 2m ago
Sounds like I should reverse engineer the ChatGPT web app and see what they're doing.
simonw · 1h ago
I've been having trouble figuring out how best to implement a streaming text display interface in a way that's certain to work well with screen readers.
devinprater · 13m ago
If it's command-line based, maybe stream based on lines or, even better, sentences rather than received tokens.
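
A small sketch of that idea - buffering streamed tokens and emitting only complete sentences, so the screen reader gets natural chunks instead of word fragments (the boundary regex is deliberately naive; abbreviations will fool it):

  // Accumulate streamed tokens; call emit() once per complete sentence.
  function sentenceBuffer(emit: (sentence: string) => void) {
    let buffer = "";
    return {
      push(token: string): void {
        buffer += token;
        // Naive sentence boundary: ., !, or ? followed by whitespace.
        const boundary = /[.!?]\s+/g;
        let lastEnd = 0;
        let match: RegExpExecArray | null;
        while ((match = boundary.exec(buffer)) !== null) {
          lastEnd = match.index + match[0].length;
        }
        if (lastEnd > 0) {
          emit(buffer.slice(0, lastEnd).trim());
          buffer = buffer.slice(lastEnd);
        }
      },
      flush(): void {
        if (buffer.trim()) emit(buffer.trim());
        buffer = "";
      },
    };
  }

  // Usage on the command line: print each sentence as it completes.
  // const sb = sentenceBuffer(s => console.log(s));
  // for await (const token of tokenStream) sb.push(token);
  // sb.flush();
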
giancarlostoro · 34m ago
> Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

Not sure, but the Grok avatars, or characters, whatever - I've experimented with them, though I hate the defaults xAI made, because they don't seem to be a generic, simple AI robot or whatever. After you tell them to stop flirting and calling you babe (seriously, what the heck, lol), they can really hold a conversation. I talked to one about a musician I liked, in a very niche genre of music, and it was able to suggest an insanely accurate, relatable song from a different artist I did not know, all in real time.

I think it was last year or the year before? There was a demo with two phones, one that could see and one that could not, where two ChatGPT instances talked to each other and one described the room to the other. I think we are probably at the point now where it can describe a room.

rgoulter · 3h ago
> With an LLM, it will never get annoyed, aggravated, think less of the person, or similar.

Between people, it's very commonly considered impolite to request excessive help from other people. So having an information-retrieval / interactive chat tool that will patiently answer questions is a boon for everyone.

I guess you can try and frame all 'helping' as "you're +1 if you're being helpful", but don't be surprised if not everyone sees things that way all the time.

adhamsalama · 1h ago
As long as you don't get rate-limited!
y-curious · 53m ago
I guarantee you that rate limits are a thing when you ask non-impaired people for help constantly, too. I'd be taking my chances with AI rate limits for things like "describe this character in detail" and "repeat the minute-long dialog you just delivered".
simonw · 4h ago
The headline is clearly meant to be sarcastic but the actual body of the text seems to indicate that AI back in 2023 was going pretty great for the blind - it mostly reports on others who are enthusiastic adopters of it, despite the author's own misgivings.
PhantomHour · 3h ago
There's a big difference in precisely how the technology is applied.

Transformer models making screen readers better is cool. Companies deciding to fire their human voice actors and replace all audiobooks with slop is decidedly not cool.

You can really see this happening in translation right now. Companies left and right are firing human translators and replacing their work with slop, and it's a huge step down in quality, because AI simply cannot match the previous level of quality. (Mr Chad Gippity isn't going to maintain puns or add notes for references that the new audience won't catch.)

And that's in a market where there is commercial pressure to have quality work. Sloppy AI translations are already hurting sales.

In accessibility, it's a legal checkbox. Companies broadly do not care. It's already nearly impossible to get people to do things like use proper ARIA metadata. "We're a startup, we gotta go fast, ain't got no time for that."

AI is already being used to provide a legally-sufficient but practically garbage level of accessibility. This is bad.
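
To make "proper ARIA metadata" concrete, a made-up example of the kind of markup that routinely gets skipped:

  // An icon-only button: sighted users see a trash can; a screen reader
  // announces just "button" unless someone bothers to label it.
  const del = document.createElement("button");
  del.textContent = "🗑"; // the icon conveys nothing to a screen reader
  del.setAttribute("aria-label", "Delete attachment"); // the line that gets skipped

  // Purely decorative images should be explicitly hidden from assistive tech.
  const flourish = document.createElement("img");
  flourish.src = "divider.svg";
  flourish.setAttribute("alt", ""); // empty alt means "ignore me"; a missing alt is read as noise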

conradev · 50m ago
Firing voice actors is not great. Replacing human-narrated audio with AI-narrated audio is not great.

But the coverage of audiobooks is… also not great? Of the books I've purchased recently, maybe 30% or less have audiobooks? What if I want to listen to an obscure book? Should I be paying a human narrator to narrate my personal library?

The copyright holders are incentivized to make money. It does not make financial sense to pay humans to narrate their entire catalog. As long as they're the only ones allowed to distribute derivative works, we're kind of in a pickle.

cpfohl · 58m ago
Boy, my experience with small chunks of translation between languages I know well is not the same at all. When prompted properly, the translation quality is unbelievable, and the model can absolutely catch nuances, preserve puns, and add footnotes.

That said, I use it with pretty direct prompting, and I strongly prefer the "AI Partners with a Human" model.

Wowfunhappy · 3h ago
I did not interpret the headline as sarcastic.
simonw · 3h ago
The actual headline is:

  AI is going great for the blind.
That trailing period (not present in the Hacker News posting) made me think it was sarcastic, combined with the author's clear dislike of generative AI.
lxgr · 2h ago
It also pattern matches to "Web3 is going just great", a popular crypto-skeptic blog – not sure if that's intentional.

There seems to be a sizable crowd of cryptocurrency hype critics who have pivoted to criticizing the AI hype (claiming that the hype itself is also largely driven by the same actors, and that accordingly neither crypto nor AI has much object-level merit) - ironically and sadly, in quite a group-think-heavy way, considering how many valid points of criticism there are to be made of both.

ljlolel · 2h ago
I posted it with the period
RianAtheer · 28m ago
Amazing to see AI making real impact for accessibility! Tools like this are game-changers for the visually impaired. AI in banking is fascinating, bringing faster fraud detection, personalized services, and cost savings, while also raising concerns around job shifts and bias. The key is partnership, not replacement: humans and AI working together. You can also read about this amazing article "The Rise of AI in Banking: Friend or Foe" on finance gossips
999900000999 · 3h ago
Even before the LLMs, simple voice assistants have been great for those with limited sight.

I recall speaking to a girl who thanked these voice assistants for helping her order food and cook.

Right now I'm using AI while traveling; it gets stuff 85% right, which is enough for lunch.

stinkbeetle · 1h ago
> While the stuff LLMs is giving us is incorrect information, it’s still information that the sighted world won’t or refuses to give us.

I don't understand what's going on here. Is he angry at us horrible sighteds for refusing to give them incorrect information? Or because we refuse to tell them when their LLMs give them incorrect information? Or does he think that we're refusing to give them correct information, which makes it okay that the LLM is giving them incorrect information?

ants_everywhere · 1h ago
It's nearly impossible to think clearly when you're angry.
fwip · 53m ago
I believe they're saying that sometimes-wrong information from a machine is preferable to no information (without machine). At least to some people.
ccgreg · 3h ago
The IETF AI-Preference standard group is currently discussing whether or not to include an example of bypassing AI preferences to support assistive technologies. Oddly enough, many publishers oppose that.
gostsamo · 3h ago
That's one very angry take on things. As a blind person myself, I find AI a net benefit, and it has potential, but I also agree that there are lots of people who think that if AI solutions are good enough, there is no need to invest in actually accessible GUIs. That last take is extremely wrong, because AI will always be the slowest solution, and one which might be badly prompted or just hallucinate. Just today someone was complaining that Gemini's code editor is not fully accessible and was looking for advice, so I'd give the author points for mentioning that the AI interface itself might be inaccessible. Not to mention that chat web interfaces often lack proper ARIA descriptions for some basic operations.
paulsutter · 4h ago
> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.

Great to read that blind folks get so much benefit from LLMs. But this one quote seemed odd: the most amazing OCR and document attribution products are becoming available because of LLMs.

rafram · 1h ago
LLM/VLM-based OCR is highly prone to hallucination - the model does not know when it can’t read a text, it can’t estimate its own confidence, and it deals with fuzzy/unclear texts by simply making things up. I would be very nervous using it for anything critical.
paulsutter · 1h ago
There are really amazing products coming
rafram · 28m ago
I’ll believe it when I see it.
ljlolel · 4h ago
This was 2023 so I can only assume it’s gotten even better!