In college (about 15 years ago) I worked for a professor who was compiling precinct-level results for old elections. My job was just to request the info and then do manual data entry. It was abysmally slow.

This application seems very good, but it's still a bit amazing that lawmakers haven't just required that all data be uploaded as CSV! Even if every CSV were in a slightly different format, it would be way easier for everyone (LLM or not).
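To make the "slightly different format" point concrete, here is a minimal sketch of reconciling two CSVs whose headers differ. The column names, the alias table, and the sample rows are all made up for illustration; real county files will need their own mapping.

```python
import csv
import io

# Hypothetical header variants (assumption): real county files will differ.
ALIASES = {
    "precinct": {"precinct", "precinct name", "pct"},
    "candidate": {"candidate", "candidate name", "name"},
    "votes": {"votes", "total votes", "vote count"},
}


def canonical(header):
    """Map a raw column header to a canonical name, or None if unknown."""
    key = header.strip().lower()
    for name, variants in ALIASES.items():
        if key in variants:
            return name
    return None


def normalize(csv_text):
    """Yield rows as dicts with canonical keys and integer vote counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for raw in reader:
        row = {}
        for header, value in raw.items():
            name = canonical(header)
            if name is not None:
                row[name] = value.strip()
        # Strip thousands separators such as "1,045" before converting.
        row["votes"] = int(row["votes"].replace(",", ""))
        yield row


# Two invented files with different headers for the same three fields.
county_a = "Precinct,Candidate,Votes\nWard 1,Smith,120\n"
county_b = 'PCT,Candidate Name,Total Votes\nWard 2,Smith,"1,045"\n'

rows = list(normalize(county_a)) + list(normalize(county_b))
```

The per-file differences collapse into one small lookup table, which is a far smaller job than OCR plus manual verification.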

simonw · 3h ago

This is such an excellent example of a responsible and thorough application of vision LLMs to a gnarly data entry problem.

polskibus · 2h ago

It’s also an excellent example of how the lack of a mandated machine-readable format for gov publishing is a PITA.

Mtinie · 35m ago

If I was in power and wanted to continue said rule, I’d definitely discourage the adoption of any standardized formatting for election results.

Not, you know, for any nefarious purpose…but because what we’ve used forever was good enough for grandpappy, so it’s obviously good enough for us.

/cough

sitkack · 2h ago

json to qr code would be a good start. PRIOR ART inb4 a troll.

GardenLetter27 · 1h ago

Why is the original source data not available anywhere digitally?

Since it's printed, it is clearly already in a database somewhere. Why can't that just be made public too?

Seems bizarre to OCR printed documents (although I am aware of many companies doing this to parse invoices, etc.)

simonw · 1h ago

Welcome to government data.

One key problem is that the US has tens of thousands of local governments, and each of them gets to solve problems in its own way.

Digital literacy of the kind that understands why releasing a CSV file is more valuable than a PDF is rare enough that most of them won't have someone with that level of thinking in a decision-making role.

nxrabl · 3h ago

Very interesting! Is this the state of the art for accurate OCR of tabular PDFs, or is there other work in the space to compare against?

SnooSux · 3h ago

There are lots of posts on HN about developments in, and companies doing, OCR and document extraction. It's a classic CV problem, but it has still come a long way in the past couple of years.

dwillis · 1h ago

Yeah, this is a very well-traveled road, but LLMs have made some big improvements. If you asked me (the guy who wrote the original piece linked above) what I'd use if accuracy alone was the goal, it would probably be AWS Textract. But accuracy and structure? Gemini.

benob · 2h ago

I wonder how difficult it would be to bias a model so that it subtly corrupts election results when performing OCR.

croemer · 1h ago

Surely not hard but why?

bilbo0s · 1h ago

Easier to steal elections?

Don't have to bother with gerrymandering, or slick legal ways to arrest people for voting with the wrong documents. Or just good old fashioned intimidation, like making the polling place the police station or the ICE detention facility.

It's just a lot smoother process when you can simply write some software to manipulate the count.

Who's gonna check?

(No, seriously, who's gonna check? Because you also need to lay off everyone in that department once you're in power.)

simonw · 1h ago

Corrupted OCR won't help you steal elections. The result counting is a different process, with well designed checks and safeguards.

The problem is that once the counts are done and have been reported, a lot of places then print those results out on paper and scan those papers into a PDF for anyone who asks for a copy!

dwillis · 1h ago

Many jurisdictions do rate-limiting audits using the original ballots, so futzing with the results wouldn't necessarily make that easier. Also, cast vote records are public in many states - those are records of each ballot cast. So people can check.
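A rough sketch of what "people can check" could look like for a single contest: re-tally the cast vote records and compare against the reported totals. The record layout (`ballot_id`, `choice`) and the numbers are invented for illustration; real CVR exports vary by vendor and state.

```python
from collections import Counter


def tally(cast_vote_records):
    """Count one contest's choices across all cast vote records."""
    return Counter(record["choice"] for record in cast_vote_records)


def discrepancies(reported, cast_vote_records):
    """Return {choice: (reported, recounted)} for every total that differs."""
    counted = tally(cast_vote_records)
    choices = set(reported) | set(counted)
    return {
        choice: (reported.get(choice, 0), counted.get(choice, 0))
        for choice in choices
        if reported.get(choice, 0) != counted.get(choice, 0)
    }


# Invented sample data (assumption): three ballots in one contest.
cvrs = [
    {"ballot_id": 1, "choice": "Smith"},
    {"ballot_id": 2, "choice": "Jones"},
    {"ballot_id": 3, "choice": "Smith"},
]

matching = discrepancies({"Smith": 2, "Jones": 1}, cvrs)
tampered = discrepancies({"Smith": 3, "Jones": 1}, cvrs)
```

Here `matching` comes back empty, while `tampered` flags the one total that no longer agrees with the ballots.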

philips · 1h ago

I think you mean risk limiting, right?

bilbo0s · 5m ago

Freudian Slip?

philips · 1h ago

You may consider reading about risk limiting audits. https://www.voting.works/audits

How OpenElections uses LLMs

Comments (18)