Ask HN: How do you programmatically track changes in SEC filings?

1 HupDup 0 6/21/2025, 5:20:34 AM
I'm working on a project that requires analyzing how corporate strategy evolves by tracking changes in the language of SEC filings over several years. For example, I want to answer questions like: "When did major cloud providers first start identifying 'AI' as a core business driver, and how did the surrounding language change each year?" or "How has the sentiment of the 'Risk Factors' section for SaaS companies shifted from 'growth' to 'efficiency' since 2022?"

The naive approach seems to be downloading all the filings, converting them to text, and running keyword searches, but this feels brittle and misses semantic context. Vector search over chunked documents is better, but handling tables and maintaining context across a decade of reports is non-trivial.

For those who have worked with this kind of unstructured, time-series text data: what are the most effective techniques, and what are the non-obvious challenges? I'm trying to figure out whether this is a genuinely hard data science problem or whether there are established solutions I'm overlooking.
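To make the naive baseline concrete, here is a minimal sketch, assuming the text of one section (say, "Risk Factors") has already been extracted for each fiscal year. The function names (`term_rate`, `split_sentences`, `year_over_year`) and the input layout are hypothetical, and the sentence splitter is deliberately crude. It reports how often a term appears per 1,000 words and how much of the section's sentence sequence carried over from the prior year.

```python
import re
from difflib import SequenceMatcher


def term_rate(text: str, term: str) -> float:
    """Occurrences of `term` per 1,000 words (case-insensitive, whole-word match)."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0.0
    hits = len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))
    return 1000.0 * hits / len(words)


def split_sentences(text: str) -> list[str]:
    """Crude splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def year_over_year(sections: dict[int, str], term: str) -> list[dict]:
    """For each year (ascending), report the term rate and the similarity of the
    sentence sequence to the previous year's (None for the first year)."""
    rows = []
    prev_sentences = None
    for year in sorted(sections):
        sentences = split_sentences(sections[year])
        similarity = None
        if prev_sentences is not None:
            # ratio() = 2 * matched sentences / total sentences in both years
            similarity = SequenceMatcher(
                None, prev_sentences, sentences, autojunk=False
            ).ratio()
        rows.append(
            {
                "year": year,
                "rate_per_1000": term_rate(sections[year], term),
                "similarity_to_prior": similarity,
            }
        )
        prev_sentences = sentences
    return rows
```

Calling `year_over_year({2021: text_2021, 2022: text_2022}, "AI")` returns one row per year. This only catches exact wording and verbatim sentence reuse, which is precisely the brittleness described above: a paraphrased risk factor counts as entirely new text.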
