Ask HN: AI voice systems that no longer recognize touch tone input?

I’ve been trying to convert a dense 60 page paper document to Markdown today from photos taken on my iPhone. I know this is probably not the best way to do it but it’s still been surprising to find that even the latest cloud models are struggling to process many of the pages. Lots of hallucination and “I can’t see the text” (when the photo is perfectly clear). Lots of retrying different models, switching between LLMs and old fashioned OCR, reading and correcting mistakes myself. It’s still faster than doing the whole transcription manually but I thought the tech was further along.

bugglebeetle · 5m ago

Try this:

https://github.com/rednote-hilab/dots.ocr

HocusLocus · 37m ago

By 1990 Omnipage 3 and its successors were 'good enough' and with their compact dictionaries and letter form recognition were miracles of their time at ~300MB installed.

In 2025 LLMs can 'fake it' using Trilobites of memory and Petaflops. It's funny actually, like a supercomputer being emulated in real time on a really fast Jacquard loom. By 2027 even simple hand held calculator addition will be billed in kilowatt-hours.

fcoury · 10m ago

I really wanted this to be good. Unfortunately it converted a page that contained a table that is usually very hard for converters to properly convert and I got a full page with "! Picture 1:" and nothing else. On top of that, it hung at page 17 of a 25 page document and never resumed.

nawazgafar · 7m ago

Author here, that sucks. I'd love to recreate this locally. Would you be willing to share the PDF?

treetalker · 23m ago

I presume this doesn't handle handwriting.

Does anyone have a suggestion for locally converting PDFs of handwriting into text, say on a recent Mac? Use case would be converting handwritten journals and daily note-taking.

nawazgafar · 14m ago

Author here, I tested it with this PDF of a handwritten doc [1], and it converted both pages accurately.

1. https://github.com/pnshiralkar/text-to-handwriting/blob/mast...

password4321 · 13m ago

I don't know re: handwriting so only barely relevant but here is a new contender for a CLI "OCR Tool using Apple's Vision Framework API": https://github.com/riddleling/macocr

ntnsndr · 19m ago

+1. I have tried a bunch of local models (albeit the smaller end, b/c hardware limits), and I can't get handwriting recognition yet. But online Gemini and Claude do great. Hoping the local models catch up soon, as this is a wonderful LLM use case.

UPDATE: I just tried this with the default model on handwriting, and IT WORKED. Took about 5-10 minutes on my laptop, but it worked. I am so thrilled not to have to send my personal jottings into the cloud!

simonw · 16m ago

This one should handle handwriting - it's using Qwen 2.5 VL which is a vision LLM that is very good at handwritten text.

abnry · 27m ago

I would really like a tool to reliably get the title of PDF. It is not as easy as it seems. If the PDF exists online (say a paper or course notes) a bonus would be to find that or related metadata.

s0rce · 25m ago

Zotero does an ok job at this for papers.

roscas · 1h ago

Almost perfect, the PDF I tested it missed only a few symbols.

But that is something I will use for sure. Thank you.

nawazgafar · 19m ago

Glad to hear it! What types of symbols did it miss?

david_draco · 1h ago

Looking at the code, this converts PDF pages to images, then transcribes each image. I might have expected a pdftotext post-processor. The complexity of PDF I guess ...

firesteelrain · 1h ago

There is a very popular Python module called ocrmypdf. I used it to help my HOA and OCR’ing of old PDFs.

https://github.com/ocrmypdf/OCRmyPDF

No LLMs required.

enjaydee · 25m ago

Saw this tweet the other day that helped me understand just how crazy PDF parsing can be

https://threadreaderapp.com/thread/1955355127818358929.html

moritonal · 41m ago

I imagine part of the issue is how many PDFs are just a series of images anyway.

westurner · 49m ago

Shell: GNU parallel, pdftotext

Python: PyPdf2, PdfMiner.six, Grobid, PyMuPdf; pytesseract (C++)

paperetl is built on grobid: https://github.com/neuml/paperetl

annotateai: https://github.com/neuml/annotateai :

> annotateai automatically annotates papers using Large Language Models (LLMs). While LLMs can summarize papers, search papers and build generative text about papers, this project focuses on providing human readers with context as they read.

pdf.js-hypothes.is: https://github.com/hypothesis/pdf.js-hypothes.is:

> This is a copy of Mozilla's PDF.js viewer with Hypothesis annotation tools added

Hypothesis is built on the W3C Web Annotations spec.

dokieli implements W3C Web Annotations and many other Linked Data Specs: https://github.com/dokieli/dokieli :

> Implements versioning and has the notion of immutable resources.

> Embedding data blocks, e.g., Turtle, N-Triples, JSON-LD, TriG (Nanopublications).

A dokieli document interface to LLMs would be basically the anti-PDF.

Rust crates: rayon handles parallel processing, pdf-rs, tesseract (C++)

pdf-rs examples/src/bin/extract_page.rs: https://github.com/pdf-rs/pdf/blob/master/examples/src/bin/e...

firesteelrain · 1h ago

Ironically, Ollama likely is using Tesseract under the hood. Python library ocrmypdf uses Tesseract too. https://github.com/ocrmypdf/OCRmyPDF

leodip · 37m ago

Nice! I wonder what is the hardware required to run qwen2.5vl locally. A 6gb 2cpu VPS can do?

wittjeff · 41m ago

Please add a license file. Thanks!

nawazgafar · 20m ago

Will do!