As a founder working with financial models, I was spending way too much time copy-pasting numbers from PDF balance sheets and income statements into Excel.

So I built Assess Finance, a tool that extracts financial data from any PDF (even scanned ones) and automatically generates clean, standardized:

Income Statements

Balance Sheets

Cash Flow Reports

It’s fast, works with multi-year reports, and exports to Excel/CSV. No AI hype, just real time saved.

Would love your feedback. I also wrote a breakdown of how it works under the hood (OCR + financial structure mapping) if anyone’s interested.

Comments (1)

igor_strelkov · 9h ago

Here’s how it works under the hood:

1. PDF Parsing: We detect whether the PDF is native (text-based) or scanned (image-based). Native PDFs are parsed using pdfplumber; scanned files go through Tesseract OCR.

2. Table Extraction: We use heuristics + a fine-tuned model to identify financial tables (not just any table) and extract structured data like Revenue, EBITDA, Net Income, etc., even if labels vary.

3. Standardization Engine: A rule-based mapper matches extracted rows to a standardized chart of accounts (GAAP/IFRS-style), handling multi-year columns and inconsistent formats across companies.

4. Validation Layer: We auto-check for accounting errors (e.g., Assets ≠ Liabilities + Equity), date mismatches, or missing totals. Flagged reports are pushed for manual review or cleanup.

5. Export Formats: Outputs are returned as standardized Excel/CSV files, ready for financial modeling, BI dashboards, or credit analysis.
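The routing in step 1 comes down to asking whether the PDF has a usable text layer. Below is a minimal sketch of that decision. It assumes the per-page text has already been extracted (for instance with pdfplumber's `page.extract_text()`, which returns little or no text for image-only pages). The function names and both thresholds are hypothetical and were not given in the post.

```python
from typing import Optional, Sequence

# Hypothetical thresholds (not from the post): a page needs at least
# MIN_CHARS non-whitespace characters to count as having a text layer, and a
# document is routed to OCR when at least SCANNED_PAGE_RATIO of its pages lack one.
MIN_CHARS = 20
SCANNED_PAGE_RATIO = 0.5


def page_has_text(text: Optional[str], min_chars: int = MIN_CHARS) -> bool:
    """True if a page's extracted text looks like a real text layer."""
    if not text:
        return False
    return sum(1 for ch in text if not ch.isspace()) >= min_chars


def classify_pdf(page_texts: Sequence[Optional[str]]) -> str:
    """Return 'native' (parse the text layer) or 'scanned' (send to OCR).

    page_texts holds whatever text extraction returned for each page,
    with None or "" for pages that are just images.
    """
    if not page_texts:
        return "scanned"
    pages_without_text = sum(1 for t in page_texts if not page_has_text(t))
    if pages_without_text / len(page_texts) >= SCANNED_PAGE_RATIO:
        return "scanned"
    return "native"


print(classify_pdf(["Total revenue 1,234,567 for fiscal year 2023"] * 3))  # native
print(classify_pdf([None, "", "   "]))  # scanned
```

Routing page by page instead of per document is an alternative worth weighing, since some filings mix typed pages with scanned exhibits.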
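The post does not describe the step 2 heuristics or the fine-tuned model, so the following only illustrates what a keyword-based heuristic for telling financial tables apart from other tables might look like. It assumes tables arrive as lists of string cells with the line-item label in the first column; the keyword list, the number pattern and the 0.3 threshold are all hypothetical.

```python
import re
from typing import Sequence

# Hypothetical vocabulary of line-item words; a production system would need
# far more (and, per the post, a fine-tuned model alongside the heuristics).
FINANCIAL_KEYWORDS = re.compile(
    r"revenue|sales|ebitda|net income|operating income|gross profit|"
    r"total assets|total liabilities|equity|cash|depreciation|expenses",
    re.IGNORECASE,
)
# Accepts cells such as 1,234  (1,234)  -56.7  $1,200.50  12%
NUMBER = re.compile(r"^\(?-?[$€£]?\s*\d[\d,]*(\.\d+)?\)?%?$")


def is_numeric(cell: str) -> bool:
    return bool(NUMBER.match(cell.strip()))


def financial_table_score(rows: Sequence[Sequence[str]]) -> float:
    """Share of rows shaped like 'known line item, number, ...'."""
    if not rows:
        return 0.0
    hits = 0
    for row in rows:
        if len(row) < 2:
            continue
        label, values = row[0], row[1:]
        if FINANCIAL_KEYWORDS.search(label) and any(is_numeric(v) for v in values):
            hits += 1
    return hits / len(rows)


def is_financial_table(rows: Sequence[Sequence[str]], threshold: float = 0.3) -> bool:
    return financial_table_score(rows) >= threshold


income = [
    ["Revenue", "1,200", "1,050"],
    ["Cost of sales", "(700)", "(640)"],
    ["Gross profit", "500", "410"],
    ["Net income", "120", "95"],
]
staff = [["Name", "Role"], ["Alice", "CEO"], ["Bob", "CFO"]]
print(is_financial_table(income), is_financial_table(staff))  # True False
```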
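Step 3's rule-based mapping can be sketched as an ordered list of regular-expression rules that map raw labels to standardized account names. A parser handles the number formats that vary between companies: thousands separators, parentheses for negatives and dash placeholders. The rules, account names such as `CostOfRevenue`, and the `{period: {account: value}}` output shape are hypothetical choices for this sketch, not the tool's actual chart of accounts.

```python
import re
from typing import Dict, Optional, Sequence

# Hypothetical rules: (pattern on the cleaned row label, standardized account).
# The first matching rule wins, so more specific patterns should come first.
MAPPING_RULES = [
    (re.compile(r"^(total\s+)?(net\s+)?(revenues?|sales|turnover)$", re.I), "Revenue"),
    (re.compile(r"cost of (goods sold|sales|revenue)", re.I), "CostOfRevenue"),
    (re.compile(r"^gross (profit|margin)$", re.I), "GrossProfit"),
    (re.compile(r"^net (income|profit|earnings)", re.I), "NetIncome"),
    (re.compile(r"^total assets$", re.I), "TotalAssets"),
    (re.compile(r"^total liabilities$", re.I), "TotalLiabilities"),
    (re.compile(r"^total (shareholders'?|stockholders'?)?\s*equity$", re.I), "TotalEquity"),
]
DASH_PLACEHOLDERS = {"", "-", "\u2013", "\u2014"}  # hyphen, en dash, em dash


def map_label(label: str) -> Optional[str]:
    """Map a raw label to a standardized account, or None if no rule matches."""
    cleaned = re.sub(r"\s+", " ", label).strip().rstrip(":")
    for pattern, account in MAPPING_RULES:
        if pattern.search(cleaned):
            return account
    return None  # unmapped rows could be queued for review


def parse_amount(cell: str) -> Optional[float]:
    """Normalize cells like '1,234', '(1,234)', '-1234', '$ 1,234.5' or a dash."""
    text = cell.strip().replace("$", "").replace(",", "").replace(" ", "")
    if text in DASH_PLACEHOLDERS:
        return None
    negative = text.startswith("(") and text.endswith(")")
    try:
        value = float(text.strip("()"))
    except ValueError:
        return None
    return -value if negative else value


def standardize(header: Sequence[str], rows: Sequence[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Turn a raw table into {period: {account: value}}.

    header[0] is the label column; header[1:] are period columns such as
    'FY2023' or 'Dec 31, 2023', reduced to the four-digit year when present.
    """
    periods = []
    for h in header[1:]:
        match = re.search(r"(?:19|20)\d{2}", h)
        periods.append(match.group(0) if match else h.strip())
    result: Dict[str, Dict[str, float]] = {p: {} for p in periods}
    for row in rows:
        account = map_label(row[0])
        if account is None:
            continue
        for period, cell in zip(periods, row[1:]):
            value = parse_amount(cell)
            if value is not None:
                result[period][account] = value
    return result


header = ["", "FY2023", "FY2022"]
rows = [
    ["Total revenues", "1,200", "1,050"],
    ["Cost of sales", "(700)", "(640)"],
    ["Net income:", "120", "\u2014"],
    ["Other income (misc.)", "5", "6"],
]
print(standardize(header, rows))
```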
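The balance-sheet identity and missing-total checks in step 4 might look like the following minimal sketch. It assumes statements have already been standardized into `{period: {account: value}}` dictionaries with hypothetical keys such as `TotalAssets`, treats "date mismatches" as periods that appear in one statement but not the other, and uses a made-up tolerance of 0.5 for rounding differences.

```python
from typing import Dict, List

Statement = Dict[str, Dict[str, float]]  # {period: {account: value}}

# Hypothetical: largest gap allowed between Assets and Liabilities + Equity.
TOLERANCE = 0.5


def check_balance_sheet(period: str, accounts: Dict[str, float]) -> List[str]:
    """Flag missing totals and violations of Assets = Liabilities + Equity."""
    required = ("TotalAssets", "TotalLiabilities", "TotalEquity")
    missing = [name for name in required if name not in accounts]
    if missing:
        return [f"{period}: missing totals {', '.join(missing)}"]
    assets = accounts["TotalAssets"]
    liabilities_and_equity = accounts["TotalLiabilities"] + accounts["TotalEquity"]
    if abs(assets - liabilities_and_equity) > TOLERANCE:
        return [
            f"{period}: Assets ({assets:,.0f}) != "
            f"Liabilities + Equity ({liabilities_and_equity:,.0f})"
        ]
    return []


def validate_report(balance_sheet: Statement, income_statement: Statement) -> List[str]:
    """Collect every issue; a non-empty list would send the report to manual review."""
    issues: List[str] = []
    for period in sorted(balance_sheet):
        issues.extend(check_balance_sheet(period, balance_sheet[period]))
    unmatched = set(balance_sheet) ^ set(income_statement)
    if unmatched:
        issues.append("periods present in only one statement: " + ", ".join(sorted(unmatched)))
    return issues


balance_sheet = {
    "2023": {"TotalAssets": 1000.0, "TotalLiabilities": 600.0, "TotalEquity": 400.0},
    "2022": {"TotalAssets": 900.0, "TotalLiabilities": 600.0, "TotalEquity": 250.0},
}
income_statement = {"2023": {"Revenue": 1200.0}, "2021": {"Revenue": 1000.0}}
for issue in validate_report(balance_sheet, income_statement):
    print(issue)
# 2022: Assets (900) != Liabilities + Equity (850)
# periods present in only one statement: 2021, 2022
```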
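For step 5, the CSV half of the export can be handled by Python's standard `csv` module. This sketch assumes the same hypothetical `{period: {account: value}}` shape. It writes one row per account with the newest period first, a common layout for spreadsheet models. Writing .xlsx files would need a third-party library such as openpyxl and is left out here.

```python
import csv
import io
from typing import Dict, Sequence


def to_csv(data: Dict[str, Dict[str, float]], account_order: Sequence[str]) -> str:
    """Render {period: {account: value}} as a wide CSV string.

    Rows follow account_order, columns run from newest to oldest period
    (sorted as strings, which works for four-digit years), and values
    missing for a period are left blank.
    """
    periods = sorted(data, reverse=True)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Account", *periods])
    for account in account_order:
        writer.writerow([account, *(data[p].get(account, "") for p in periods)])
    return buffer.getvalue()


data = {"2022": {"Revenue": 1050.0}, "2023": {"Revenue": 1200.0, "NetIncome": 120.0}}
print(to_csv(data, ["Revenue", "NetIncome"]), end="")
# Account,2023,2022
# Revenue,1200.0,1050.0
# NetIncome,120.0,
```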

No LLMs involved yet, just focused, fast, deterministic extraction and mapping logic. But we’re experimenting with retrieval-augmented generation (RAG) for interpreting footnotes.

Happy to answer any questions or go deeper on architecture, caching, or product edge cases.

Show HN: I built a tool that extracts structured financial data from PDFs
