I built a neural classifier to replace Plaid's transaction categories

2 WilliXL 0 5/5/2025, 9:06:58 PM

I recently shut down a startup I was building. It was a rewards platform for health-related spending. My users were scattered across the US, but mostly in SF, NYC, LA, Chicago, and Boston.

The core product relied on inferring whether a transaction was health-related or not. I quickly realized that adding rules and heuristics on top of Plaid's categories wouldn't work. Not to mention that Plaid's categorization was way too inaccurate to be deciding financial rewards on.

Here's an account of what I built to make it work, verified with a cleaned dataset of 6k data points collected from my platform.

First of all, Plaid's baseline categorization accuracy was low:
- Categorization accuracy was 65.22% overall
- Accuracy was better for well-known merchants (Plaid identified an "Entity ID") at 83.99%

I tried RAG to start, but that immediately fell apart due to name collisions and regional duplication.

Thankfully I was able to start with Plaid's already cleaned transaction data. To better resolve entities, my pipeline took in:
- Transaction amount (for product band heuristics)
- Location
- POS method (in-person vs. online)
- A list of known bank-specific formatting quirks that I collected as I tried to build this pipeline (for now limited to the Big Banks™)

Using that data I could much better figure out:
- Which entity the purchase was made from among entities with duplicate names (mostly SMBs)
- How to collapse regional identifiers into a single parent organization

Side note: did you know that Orangetheory has a different regional identifier for every location? For example, "Orangetheory", "OTF", "otf", "otf {city}", and "orangetheory {city}" are all possible names. This one took so long to solve robustly.
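To make the two resolution steps above concrete, here is a minimal sketch of what they could look like. It is not my actual pipeline: the alias table, the `Transaction` and `KnownEntity` types, and both function names are hypothetical, and it only uses the name, location, and POS method, not the amount or bank-specific quirks.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical alias table: parent organization -> patterns for its name variants.
PARENT_ALIASES = {
    "Orangetheory": [r"^orangetheory\b", r"^otf\b"],
}

@dataclass
class Transaction:          # hypothetical input record
    name: str
    amount: float
    city: str
    in_person: bool

@dataclass
class KnownEntity:          # hypothetical row in a known-entities table
    entity_id: str
    name: str
    city: str

def collapse_to_parent(raw_name: str) -> str:
    """Map a raw merchant string to its parent organization if an alias matches."""
    cleaned = re.sub(r"\s+", " ", raw_name).strip()
    lowered = cleaned.lower()
    for parent, patterns in PARENT_ALIASES.items():
        if any(re.search(p, lowered) for p in patterns):
            return parent
    return cleaned

def resolve_entity(txn: Transaction, entities: list[KnownEntity]) -> Optional[KnownEntity]:
    """Pick one known entity; use the city to break ties between duplicate names."""
    parent = collapse_to_parent(txn.name).lower()
    candidates = [e for e in entities if e.name.lower() == parent]
    if len(candidates) > 1 and txn.in_person:
        local = [e for e in candidates if e.city.lower() == txn.city.lower()]
        if local:
            candidates = local
    # Return None when the match is still ambiguous or missing.
    return candidates[0] if len(candidates) == 1 else None

entities = [
    KnownEntity("e1", "Core Yoga", "Chicago"),
    KnownEntity("e2", "Core Yoga", "Boston"),
    KnownEntity("e3", "Orangetheory", "Chicago"),
]
for name in ["Orangetheory", "OTF", "otf", "otf chicago", "orangetheory boston"]:
    print(name, "->", collapse_to_parent(name))
```

Returning `None` for an ambiguous match is a deliberate choice in this sketch: when money is attached to the label, an unresolved transaction is safer than a wrong one.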

This approach also let me provide a custom category to look for. In my case it was "health-related" or not, which I defined with the FSA/HSA eligibility rules (in JSON format), plus some other properties like fitness/studio class merchants and supplements.
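As a rough illustration of a category defined in JSON, here is a sketch with a made-up schema. The field names and merchant types are assumptions for the example, not the real FSA/HSA rule set, and the plain lookup stands in for the classifier itself. The "needs more info" label mirrors the tag I used for marketplace cases.

```python
import json

# Hypothetical category definition; the schema and values are illustrative only.
CATEGORY_JSON = """
{
  "category": "health-related",
  "eligible_merchant_types": ["pharmacy", "dental", "fitness_studio", "supplements"],
  "marketplace_merchant_types": ["marketplace"]
}
"""

def label_transaction(merchant_type: str, definition: dict) -> str:
    """Label a resolved merchant type against a custom category definition."""
    if merchant_type in definition["marketplace_merchant_types"]:
        # Marketplaces sell both eligible and ineligible items, so defer.
        return "needs more info"
    if merchant_type in definition["eligible_merchant_types"]:
        return definition["category"]
    return "not " + definition["category"]

definition = json.loads(CATEGORY_JSON)
for merchant_type in ["pharmacy", "marketplace", "restaurant"]:
    print(merchant_type, "->", label_transaction(merchant_type, definition))
```

Keeping the definition in data rather than code means a different custom category only needs a different JSON file.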

The results:
- 87.28% accuracy on classifying "health-related" spend (with a "needs more info" tag for marketplace cases like Amazon)
- 95.78% accuracy on personal finance category classification, with only 300 known entities logged in my database. So this can definitely improve with more effort put into expanding the known entities list

I made this writeup mostly as catharsis after shutting down my startup, and to warn of potential things to look out for when trying to properly utilize transaction data.

But I really do believe that this kind of infra, semantic understanding of financial data, is becoming increasingly valuable as financial data becomes more available. And new businesses can be built with it. I am considering expanding more on this infra as a developer API or toolkit. So if you're working on financial rewards, personal finance apps, FSA/HSA/expense platforms, accounting tools, etc. I'd love to hear from you!
