Show HN: Annotate any document and train extraction by example, not prompts

2 avloss11 0 9/17/2025, 7:26:37 PM deeptagger.com ↗

Hi HN, I built a tool for teaching LLMs how to extract structured data from documents by annotating, not prompt engineering. I’d love your feedback.

How it works:
- Upload a document (DOCX, PDF, image, etc.)
- Select and tag parts of it (supports nesting, arrays, custom tag structures)
- Upload another document → click "predict" → see editable annotations
- Amend them and save as a new example
- Call the API with a third document → get JSON back
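To make the "nesting, arrays" part concrete, here is a minimal sketch of how tagged spans could collapse into the JSON you get back. This is illustrative only: `Span`, `to_json_value`, the tag names, and the invoice values are all made up for the example and are not the actual API or schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Span:
    """A tagged region of a document (hypothetical structure, not the real schema)."""
    tag: str
    text: str = ""
    children: list["Span"] = field(default_factory=list)


def to_json_value(spans):
    """Collapse sibling spans into a dict; a tag that repeats becomes an array."""
    result = {}
    for span in spans:
        # A span with children is a nested object; otherwise it is a leaf string.
        value = to_json_value(span.children) if span.children else span.text
        if span.tag in result:
            if not isinstance(result[span.tag], list):
                result[span.tag] = [result[span.tag]]
            result[span.tag].append(value)
        else:
            result[span.tag] = value
    return result


# Made-up annotations for a made-up invoice.
invoice = [
    Span("invoice_number", "INV-001"),
    Span("line_item", children=[Span("description", "Widget"), Span("amount", "10.00")]),
    Span("line_item", children=[Span("description", "Gadget"), Span("amount", "5.50")]),
    Span("total_value", "15.50"),
]

extracted = to_json_value(invoice)
print(json.dumps(extracted, indent=2))
```

In this sketch a tag that appears only once stays a scalar, so a real schema would probably need to declare which tags are arrays up front.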

Use cases:
- Identify "important clauses" in contracts
- Extract "total value" from invoices
- Anything subjective, like "healthy ingredients" on a label
- Anything objective, like "postcode" or "phone number"

You could even tag things like "good rhymes" in a poem: basically anything an LLM can understand.

The key idea: instead of iterating endlessly on prompts (and sometimes regressing), you just iterate on examples. Each example improves accuracy in a concrete way, and you often need far fewer examples than traditional ML approaches require.
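One way to picture the "sometimes regressing" point: every saved example can double as a regression test. The sketch below is an assumption about how you might check that yourself, not a description of what the tool does internally. `field_accuracy`, `toy_predict`, and the sample documents are hypothetical stand-ins.

```python
def field_accuracy(predict, examples):
    """Fraction of annotated fields that predict() reproduces exactly."""
    correct = 0
    total = 0
    for document, expected in examples:
        predicted = predict(document)
        for key, value in expected.items():
            total += 1
            if predicted.get(key) == value:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical saved examples: (document text, corrected annotations).
examples = [
    ("Invoice A. Total: 15.50", {"total_value": "15.50"}),
    ("Invoice B. Total: 7.00", {"total_value": "7.00"}),
]


def toy_predict(document):
    # Stand-in for a real extraction call: take the text after "Total:".
    return {"total_value": document.split("Total:")[-1].strip()}


score = field_accuracy(toy_predict, examples)
print(f"field accuracy: {score:.2f}")
```

Re-running a check like this after each new example (or each prompt change) shows whether earlier documents still extract correctly.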

We’re also on Product Hunt today (currently #5), but feedback from HN would be much appreciated.

Comments (0)