ClawPDF – Open-Source Virtual/Network PDF Printer with OCR and Image Support

88 miles 12 5/19/2025, 12:31:33 PM github.com

Comments (12)

sirjaz · 1h ago
We need more things like this. I know people don't like Windows Server because it is not open source, but it is simple to use and get up and running. Also, user management is easy.
sowbug · 30m ago
OT: someone please make an RPi image that "prints" a page to an e-ink display. I want to duct-tape an RPi Zero and a rechargeable battery to the back of a display, then be able to print recipes to it while cooking. Other people might print board-game rules, or speech notes while rehearsing -- anything that you'd typically print and then throw away after brief use.

I know I could make a PDF, sideload it to a Kindle, etc. Too many steps. I just want the display to appear as a printer on my phone.

xrendan · 23m ago
I have some really old code that pretty much does this, I'll see if I can find it.
xrendan · 15m ago
Ugh, I don't have it. It was from before I used git.

Basically, to do this you run a CUPS server that exposes itself as a network printer and prints to a specified PDF directory. Then you have a program watching that directory for new files; whenever a new one shows up, it opens whatever PDF viewer you want in full screen.

Set up a shared PDF printer: https://askubuntu.com/questions/1310867/how-to-set-up-shared...
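A minimal Python sketch of the watcher half of that setup, assuming a CUPS-PDF style printer that drops finished jobs into a local directory. `PDF_DIR` (`~/PDF`) and `VIEWER_CMD` (`evince --fullscreen`) are placeholders, not anything taken from the linked answer; swap in whatever directory and viewer you actually use. It polls instead of using inotify so it stays stdlib-only.

```python
import subprocess
import time
from pathlib import Path

# Assumptions: where the PDF printer writes jobs, and which viewer to launch.
PDF_DIR = Path.home() / "PDF"
VIEWER_CMD = ["evince", "--fullscreen"]


def new_pdfs(directory: Path, seen: set[str]) -> list[Path]:
    """Return PDFs in `directory` not yet in `seen` (oldest first) and record them."""
    fresh = []
    for entry in sorted(directory.glob("*.pdf"), key=lambda p: p.stat().st_mtime):
        if entry.name not in seen:
            seen.add(entry.name)
            fresh.append(entry)
    return fresh


def watch(directory: Path, viewer_cmd: list[str], interval: float = 1.0) -> None:
    """Poll `directory` forever and show the newest printed PDF full screen."""
    # Skip files that were already there when the watcher started.
    seen = {p.name for p in directory.glob("*.pdf")}
    viewer = None
    while True:
        fresh = new_pdfs(directory, seen)
        if fresh:
            # Close the previously displayed page, if its viewer is still open.
            if viewer is not None and viewer.poll() is None:
                viewer.terminate()
            viewer = subprocess.Popen([*viewer_cmd, str(fresh[-1])])
        time.sleep(interval)
```

On the Pi you would start it with `watch(PDF_DIR, VIEWER_CMD)`. A sturdier version would probably also wait until a new file's size stops changing before opening it, since CUPS may still be writing the job when the file first appears.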

IlikeKitties · 26m ago
Sounds pretty vibe-codable; why don't you try it yourself?
hoistbypetard · 1h ago
That looks really useful.

But, also, wow! Windows-only and AGPLv3 is not a combination I think I've ever seen before.

criddell · 2h ago
Why use Tesseract for this? Windows' built-in OCR is so much better in my experience.
jeroenhd · 1h ago
Microsoft's OCR engine supports Windows 10.0.10240.0 and up. This project intends to support Windows 7 and up.

In theory you could maintain code paths for both, offering a slimmer package for Windows 10+, but that'd also cost more time and effort to maintain.
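A minimal sketch of what "code paths for both" could look like: gate on the OS version and fall back to Tesseract below 10.0.10240. It is written in Python purely for illustration (ClawPDF itself is a .NET project), and the backend labels `"windows-ocr"` and `"tesseract"` are hypothetical, not the project's actual code.

```python
# Minimum version quoted above for Microsoft's OCR engine (10.0.10240).
WIN10_OCR_MIN = (10, 0, 10240)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string like '6.1.7601' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def choose_ocr_backend(windows_version: str) -> str:
    """Pick the built-in Windows OCR on 10.0.10240+, else fall back to Tesseract."""
    # Tuples compare element by element, so (6, 1, 7601) < (10, 0, 10240).
    if parse_version(windows_version)[:3] >= WIN10_OCR_MIN:
        return "windows-ocr"
    return "tesseract"
```

On Windows, `platform.version()` typically returns a string like `"10.0.19045"` that could be passed in directly. The maintenance cost jeroenhd mentions comes from everything behind that branch: two OCR integrations to package and test instead of one.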

Also, not many people know Windows comes with an OCR API. It's extremely underused in my opinion.

atmanactive · 1h ago
Windows OCR is used by PowerToys.

https://github.com/microsoft/PowerToys

skeeter2020 · 1h ago
I suspect it's because of the vintage of this project. It's built on .NET Framework 4.x, hence Windows-only.

Edit: and it goes deep into COM for device interfaces. Wow, a blast from the past!

Oras · 1h ago
Yeah, Tesseract has lots of issues, especially with identifying tables.
kittikitti · 17m ago
This is an incredible idea! I really like it because it sounds so obvious once you've seen it, but I never thought of it before! I wonder in what other ways we could integrate GPTs, LLMs, and other AI into the simple "Print" functionality across all our devices.