Stack Overflow intentionally includes false data in open data dumps

9 stevage 2 9/3/2025, 12:34:42 PM meta.stackexchange.com ↗

Comments (2)

egberts1 · 4h ago
TRW Credit Score (now Experian) frequently seeds its database with fake entries so that it can prove in court that a copied database is its own.

And it has resulted in massive out-of-court settlements, or so I've heard third-hand.

Back in the day, as a DB tester, I was told to use only one set of names and that everything was logged. That kinda diminishes the point of having a bounds-checking unit test.

Seeding your database with "inert" artifacts is just another form of corporate copyright protection and defensive posturing.

stevage · 2h ago
It's a bit odd in SO's case how easily the fake data can be detected and removed. A company would only really fall afoul of it if they were doing large-scale scraping that scooped up SO data alongside a lot of other stuff. Any manual inspection would detect it pretty quickly.