I think a fundamental mistake many developers make is using caching to solve problems rather than to improve efficiency.
It's the equivalent of adding more RAM to fix poor memory management, or adding more CPUs and servers to compensate for resource-heavy, slow requests and complex queries.
If your application requires caching to function effectively, then you have a core issue that needs to be resolved. If you don't address that issue, caching itself will eventually become the problem as your application grows more complex and active.
chamomeal · 24m ago
Idk I think caching is a crucial part of many well-designed systems. There’s a lot of very cache-able data out there. If invalidating events are well defined or the data is fine being stale (week/month level dashboards, for example), that’s a fantastic reason to use a cache. I’d much rather just stuff those values in a cache than figure out any other more complicated solution.
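A rough sketch of the stale-is-fine case: week-level dashboard numbers stuffed into a small TTL cache instead of recomputing the rollup on every view. All names here (`TTLCache`, `expensive_aggregation`, `weekly_signups`) are hypothetical stand-ins, not anyone's real API.

```python
import time

class TTLCache:
    """Minimal dict-backed cache where entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # past its TTL: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

def expensive_aggregation(week):
    # Stand-in for a slow week-level SQL rollup.
    return {"week": week, "signups": 1234}

# A day of staleness is acceptable for a weekly dashboard.
cache = TTLCache(ttl_seconds=24 * 3600)

def weekly_signups(week):
    cached = cache.get(week)
    if cached is not None:
        return cached
    result = expensive_aggregation(week)
    cache.set(week, result)
    return result
```

The point is that the "invalidation event" here is just the clock, which is exactly why this kind of data is so cache-friendly.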
I also just think it’s a necessary evil of big systems. Sometimes you need derived data. You can even think about databases as a kind of cache: the “real” data is the stream of every event that ever updated data in the database! (Yes, this is stretching the meaning of cache lol)
However I agree that caching is often an easy bandaid for a bad architecture.
This talk on Apache Samza completely changed how I think about caching and derived data in general: https://youtu.be/fU9hR3kiOK0?si=t9IhfPtCsSyszscf
And this interview has some interesting insights on the problems that caching faces at super large scale systems (twitter specifically): https://softwareengineeringdaily.com/2023/01/12/caching-at-t...
A friend of mine once argued that adding a cache to a system is almost always an indication that you have an architectural problem further down the stack, and you should try to address that instead.
The more software development experience I gain the more I agree with him on that!
IgorPartola · 44m ago
If you think of it as a cache, yes. If you think of it as another data layer then no.
For example, let’s say that every web page your CMS produces is created using a computationally expensive compilation. But the final product is more or less static and only gets updated every so often. You can basically have your compilation process pull the data from your source of truth such as your RSBMS but then store the final page (or large fragments of it) in something like MongoDB. In other words the cache replacement happens at generation time and not on demand. This means there is always a cached version available (though possibly slightly stale), and it is always served out of a very fast data store without expensive computation. I prefer this style of caching to on demand caching because it means you avoid cache invalidation issues AND the thundering herd problem.
Of course this doesn’t work for every workflow, but it can get you quite far. And yes, this example can also be sort of solved with a static site generator, but look beyond that at things like document fragments, etc. This works very well for dynamic content where the read-to-write ratio is high.
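A minimal sketch of that generate-time replacement idea, with hypothetical names (`update_page`, `serve_page`, and plain dicts standing in for the RDBMS and the fast store): the renderer runs on the write path, not the read path, so reads never miss.

```python
rendered_pages = {}   # stand-in for a fast store like MongoDB
source_of_truth = {}  # stand-in for the RDBMS

def expensive_compile(doc):
    # Stand-in for the computationally expensive page compilation.
    return "<html><body>" + doc["body"].upper() + "</body></html>"

def update_page(slug, body):
    # Write path: update the source of truth, then regenerate the
    # stored page immediately — cache replacement at generation time.
    source_of_truth[slug] = {"body": body}
    rendered_pages[slug] = expensive_compile(source_of_truth[slug])

def serve_page(slug):
    # Read path: always served from the fast store. At worst slightly
    # stale, but never a cache miss and never a thundering herd.
    return rendered_pages[slug]
```

Because every write overwrites the rendered copy, there is no separate invalidation step to forget.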
chamomeal · 20m ago
I already typed a longer comment elsewhere that I don’t feel like reiterating, but I agree with you. Caching is a natural outcome of not having infinite time and memory for running programs. Sometimes it’s a bandaid over bad design, but often it’s a responsible decision to take load off of other important systems.
cpursley · 36m ago
Lost me at DumpsterFireDB as cache. But if the goal is to create an even worse architecture that's even harder to maintain, go for it.
DrBazza · 56m ago
I'd argue the database falls into that category.
The two questions no one seems to ask are 'do I even need a database?', and 'where do I need my database?'
There are alternative data storage 'patterns' that aren't databases, though ultimately some sort of (structured) query language gets invented to query them.
jitl · 1h ago
Yeah, my architecture problem is that Postgres RDS EBS storage is slow as a dog. Sure, our data won’t go poof if we lose an instance, but it’s so slow.
(It’s not really my architecture problem. My architecture problem is that we store pages as grains of sand in a db instead of in a bucket, and that we allow user defined schemas)
barrkel · 50m ago
Caches suck because invalidation needs to be sprinkled all over the place in what is often an abstraction-violating way.
Then there's memoization, often a hack for an algorithm problem.
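The classic illustration of memoization-as-hack, as a sketch: naive recursive Fibonacci is exponential, `functools.lru_cache` papers over it, and an iterative rewrite removes the need for the cache entirely.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Exponential without the cache; the decorator hides the
    # algorithmic problem rather than fixing it.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_iterative(n):
    # The cache-free fix: reorganize the logic instead of memoizing it.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Both return the same values; only one of them needed a cache to be usable.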
I once "solved" a huge performance problem with a couple of caches. The stain of it lies on my conscience. It was actually admitting defeat in reorganizing the logic to eliminate the need for the cache. I know that the invalidation logic will have caused bugs for years. I'm sure an engineer will curse my name for as long as that code lives.
jmull · 1h ago
That's true in my experience.
Caches have perfectly valid uses, but they are so often used in fundamentally poor ways, especially with databases.
AtheistOfFail · 1h ago
I disagree. For large search pages where you're building payloads from multiple records that don't change often, it could be beneficial to use a cache. Your cache ends up helping the most common results to be fetched less often and return data faster.
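A sketch of that pattern, with hypothetical names (`fetch_records`, `search`, and a plain dict as the cache): the assembled payload is cached under the normalized query, so hot searches skip the multi-record assembly.

```python
payload_cache = {}

def fetch_records(query):
    # Stand-in for joining several slow tables or services per result.
    return [{"id": i, "title": f"{query} result {i}"} for i in range(3)]

def search(query):
    # Normalize so "Foo " and "foo" share one cache entry.
    key = query.strip().lower()
    if key in payload_cache:
        return payload_cache[key]
    payload = {"query": key, "results": fetch_records(key)}
    payload_cache[key] = payload
    return payload
```

In a real system you'd also bound the cache and evict on writes to the underlying records; this only shows the read path.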
eatonphil · 23m ago
Many of these points are not compelling to me when 1) you can filter both rows and columns (in postgres logical replication anyway [0]) and 2) SQL views.
Having caching by default (like in Convex) is a really neat simplification to app development.
[0] https://www.postgresql.org/docs/current/logical-replication-...
hoppp · 2h ago
The cache service is a database of sorts that usually stores key value pairs.
The differences are in persistence, scaling, and read/write permissions.
barrkel · 49m ago
No, what makes a cache a cache is invalidation. A cache is stale data. It's a latent out of date calculation. It's misinformation that risks surviving until it lies to the user.
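That staleness risk fits in a few lines. This sketch (hypothetical names, plain dicts for the db and cache) shows a read-through cache lying to the user the moment a write path skips invalidation:

```python
db = {"user:1": "Ada"}
cache = {}

def get_user(key):
    # Read-through: populate the cache on first read.
    if key not in cache:
        cache[key] = db[key]
    return cache[key]

def rename_user(key, name, invalidate=True):
    db[key] = name
    if invalidate:
        # The step that makes it a cache rather than misinformation.
        cache.pop(key, None)

get_user("user:1")                               # warms the cache with "Ada"
rename_user("user:1", "Grace", invalidate=False) # a write path that forgot
stale = get_user("user:1")                       # still returns "Ada"
```

One forgotten `invalidate` anywhere in the codebase and the cache survives as an out-of-date answer, which is the whole complaint.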
Supermancho · 52m ago
i.e., a cache is a database. The difference is in features and usage.
jayd16 · 38m ago
So I guess this guy wants Firestore (or the OSS equivalent)?