I don’t see the state file as a complete downside. It is very simple and very easy to understand. It makes it easy to tell or predict what terraform will do given the current state and desired state.
Its simpleness makes troubleshooting easier: the state files are easy to read and manipulate or repair in the event of a drift, mismatch, or botched provider update.
With the solution proposed it feels like the state becomes a black box I shouldn’t put my hands in. I wonder how the troubleshooting scenarios change with it.
Personally, I haven’t ran into the scaling issue described; at any given time there is usually only one entity working with the state file. We do use terragrunt for larger systems but it is manageable. ~1000 engineer org.
dwroberts · 16m ago
If you use a tool like Atmos (https://atmos.tools/) you kind of fix this issue already for free - because it takes the place of the root module, it actually manages the state of each sub module separately (they each have their own individual state file rather than being converged into one).
lawnchair · 11m ago
I don't think it fixes it. Atmos makes splitting and managing multiple states easier, but it still splits the graph. It doesn't change the underlying execution model.
sausagefeet · 37m ago
Hey! One of the Stategraph developers here and can answer any questions. The major motivation is just how small scale Terraform/Tofu start to breakdown and creates work for users when they have to refactor for performance issues that shouldn't exist. So we want a drop in solution that just dissolves those issues without the user having to do anything.
giveita · 32m ago
Not an expert, but doesn't microservices help with this. Each microservice has its own YAMLesque resource descriptor (TF, cloudformation, whatever) and is managed independently. My team can add a SQS or S3 without locking your team.
I might be wrong regarding more sophisticated infra though.
sausagefeet · 26m ago
Not necessarily. The guidance is to split your TF code across multiple states which might feel like it make sense but for your microservices to communicate that beed to share some base infrastructure, such as networking, so where does that live? Putting dependencies in their own state means that you lose the ability to understand how changing them impacts all of your infrastructure because you have this information black hole at the boundary of their state.
With Stategraph, you'll get all the benefits and isolation of separate state files, but when you changed resources, you'll get meaningful plans around all of the infrastructure they impact, not just the statically defined boundaries of a state file.
lawnchair · 25m ago
Author here. You are right that splitting by microservice reduces overlap. The problem is shared resources never go away such as VPCs IAM or databases so contention shows up there.
Splitting state files is the common workaround but that only creates new problems like cross state dependencies and orchestration glue. The real issue is the storage model which is a single JSON blob with a global lock. Treating state as a graph with proper concurrency control avoids contention while keeping a cohesive view of infrastructure.
spinningarrow · 3m ago
Do you have an example you can share?
We have about 30 services with each managing their own terraform state. We also have a shared infra repo managing some top level items. We haven’t run into any issues (with any regularity at least) that I can think of but I’m wondering if this could be a good tool for us as we grow and things become even more complex?
pst · 53m ago
This is awesome. Having a single state for all resources in an environment is critical for keeping all the moving pieces in check and a core design aspect of Kubestack. But the growing state files quickly become a bottleneck. I'm definitely giving this a good test drive. Very excited.
sausagefeet · 31m ago
Thank you, that is great to hear! We're pushing pretty hard to get a pre-alpha out to get some foundations testable by the community.
arccy · 27m ago
so kind of like crossplane where each resource is managed individually?
tuananh · 46m ago
can it be a sqlite db in s3 with locking implemented with s3?
sausagefeet · 35m ago
Hello, Stategraph developer here, the answer is: probably not. That doesn't resolve the core issue of state being managed as a big blob.
I don’t see the state file as a complete downside. It is very simple and very easy to understand. It makes it easy to tell or predict what terraform will do given the current state and desired state.
Its simpleness makes troubleshooting easier: the state files are easy to read and manipulate or repair in the event of a drift, mismatch, or botched provider update.
With the solution proposed it feels like the state becomes a black box I shouldn’t put my hands in. I wonder how the troubleshooting scenarios change with it.
Personally, I haven’t ran into the scaling issue described; at any given time there is usually only one entity working with the state file. We do use terragrunt for larger systems but it is manageable. ~1000 engineer org.
I might be wrong regarding more sophisticated infra though.
With Stategraph, you'll get all the benefits and isolation of separate state files, but when you changed resources, you'll get meaningful plans around all of the infrastructure they impact, not just the statically defined boundaries of a state file.
Splitting state files is the common workaround but that only creates new problems like cross state dependencies and orchestration glue. The real issue is the storage model which is a single JSON blob with a global lock. Treating state as a graph with proper concurrency control avoids contention while keeping a cohesive view of infrastructure.
We have about 30 services with each managing their own terraform state. We also have a shared infra repo managing some top level items. We haven’t run into any issues (with any regularity at least) that I can think of but I’m wondering if this could be a good tool for us as we grow and things become even more complex?