Launch HN: Datafruit (YC S25) – AI for DevOps
Demo video: https://www.youtube.com/watch?v=2FitSggI7tg.
Right now, we have two main methods to interact with Datafruit:
(1) Automated infrastructure audits: agents periodically scan your environment to find cost-optimization opportunities, detect infrastructure drift, and validate your infra against compliance requirements.
(2) Chat interface (available as a web UI and through Slack): ask the agent questions for real-time insights, or assign it tasks directly, such as investigating spend anomalies, reviewing security posture, or applying changes to IaC resources.
Having worked at FAANG and various high-growth startups, we realized that infra work requires an enormous amount of context, often more than traditional software engineering. The business decisions, the codebase, and the cloud itself are all extremely important to almost any task you get assigned. To maximize the agents' chances of success, we do a fair amount of context engineering. Not hallucinating is super important!
One thing that has worked incredibly well for us is a multi-agent system with specialized sub-agents, each with access to the tool calls and documentation for its specialty. Agents choose to "hand off" to one another when they decide another agent is better suited to the task. However, all agents share the same context (https://cognition.ai/blog/dont-build-multi-agents). We're pretty happy with this approach and believe it could work in other disciplines that require a lot of specialized expertise.
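To make that concrete, the shape of it is roughly the sketch below (illustrative only, not our actual code; the agent names, prompts, and message format are made up):

    # Illustrative sketch of the single-context handoff pattern: specialized
    # sub-agents, one shared message history, and a handoff that only changes
    # which specialist acts next.
    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        name: str
        system_prompt: str                           # specialty docs + instructions
        tools: dict = field(default_factory=dict)    # tool name -> callable

    @dataclass
    class Session:
        agents: dict
        active: str
        context: list = field(default_factory=list)  # shared by ALL agents

        def handoff(self, target: str, reason: str) -> None:
            # A handoff swaps the active specialist; the full conversation and
            # tool-call history stays visible to every agent (no forked context).
            self.context.append({"role": "system",
                                 "content": f"handoff {self.active} -> {target}: {reason}"})
            self.active = target

    iam_agent = Agent("iam", "You are the IAM specialist ...")
    cost_agent = Agent("cost", "You are the cost-optimization specialist ...")
    session = Session(agents={"iam": iam_agent, "cost": cost_agent}, active="cost")
    session.handoff("iam", "request involves creating a scoped temporary role")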
Infrastructure is probably the most mission-critical part of any software organization, and it needs extremely heavy guardrails to stay safe. Language models are not yet at the point where they can be trusted to make changes directly (we've talked to a couple of startups where the Claude Code + AWS CLI combo has taken their infra down). Right now, Datafruit receives read-only access to your infrastructure and can only make changes through pull requests to your IaC repositories. The agent also operates in a sandboxed virtual environment, so it couldn't run cloud CLI commands even if it wanted to!
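In practice that guardrail is two layers: read-only cloud credentials, plus a gate in front of the agent's tool calls. A heavily simplified illustration of the second layer (not our actual code):

    # Simplified illustration of the guardrail: cloud tool calls are checked
    # against a read-only allowlist before they run; anything mutating is
    # rejected and has to be proposed as a pull request to the IaC repo instead.
    import fnmatch
    import boto3

    READ_ONLY_PATTERNS = ("describe_*", "get_*", "list_*")  # boto3-style operation names

    class ReadOnlyClient:
        def __init__(self, service: str):
            self._client = boto3.client(service)

        def call(self, operation: str, **kwargs):
            if not any(fnmatch.fnmatch(operation, p) for p in READ_ONLY_PATTERNS):
                raise PermissionError(
                    f"{operation} is not read-only; propose it as an IaC pull request")
            return getattr(self._client, operation)(**kwargs)

    ec2 = ReadOnlyClient("ec2")
    ec2.call("describe_instances")                             # allowed
    # ec2.call("terminate_instances", InstanceIds=["i-123"])   # raises PermissionError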
Where LLMs can add significant value is in reducing the constant operational inefficiencies that eat up cloud spend and delay deadlines—the small-but-urgent ops work. Once Datafruit indexes your environment, you can ask it to do things like:
"Grant @User write access to analytics S3 bucket for 24 hours"
-> Creates temporary IAM role, sends least-privilege credentials, auto-revokes tomorrow
"Find where this secret is used so I can rotate it without downtime"
-> Discovers all instances of your secret, including old cron jobs you might not know about, so you can safely rotate your keys (a rough sketch of the scanning step is below this list)
"Why did database costs spike yesterday?"
-> Identifies expensive queries, shows optimization options, implements fixes
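To make the secret-rotation example concrete: the discovery step, heavily simplified, looks something like the sketch below. The real scan also has to cover IaC repos, container images, and so on; this fragment just checks Kubernetes with the standard Python client, and the names are illustrative.

    # Much-simplified sketch of the "find where this secret is used" scan:
    # look through Kubernetes Secrets and pod environment variables for a
    # known value. (A real scan also covers repos, containers, cron jobs, etc.)
    import base64
    from kubernetes import client, config

    def find_secret_usages(secret_value: str) -> list:
        config.load_kube_config()              # or load_incluster_config()
        v1 = client.CoreV1Api()
        hits = []

        for s in v1.list_secret_for_all_namespaces().items:
            for key, b64 in (s.data or {}).items():
                if base64.b64decode(b64).decode(errors="ignore") == secret_value:
                    hits.append(f"Secret {s.metadata.namespace}/{s.metadata.name}:{key}")

        for pod in v1.list_pod_for_all_namespaces().items:
            for c in pod.spec.containers:
                for env in (c.env or []):
                    if env.value == secret_value:
                        hits.append(f"Pod {pod.metadata.namespace}/{pod.metadata.name} env {env.name}")
        return hits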
We charge a straightforward subscription for the managed version, but we also offer a bring-your-own-cloud model: all of Datafruit can be deployed on Kubernetes using Helm charts for enterprise customers whose data can't leave their VPC.
For the time being, we're installing the product ourselves on customers' clouds; it doesn't exist in a self-serve form yet. We'll get there eventually, but in the meantime, if you're interested, we'd love for you to email us at founders@datafruit.dev. We would love to hear your thoughts! If you work with cloud infra, we're especially interested in learning what kinds of work you wish you could offload onto an agent.
YC, you want founders of these companies to have 10 years working at Ford Motor Company. It's all the more reason I want to write my blog article, "FAANG, please STFU. I wish I could be focused on 100k Requests per Second, but instead I'm dealing with engineers who have no idea why their ORM is creating terrible queries. Please stop telling them about GraphQL."
"Grant @User write access to analytics S3 bucket for 24 hours" Can the user even have access to this? Do they need write access or can't understand why they are getting errors on read? What happens when they forget in 30 days they asked your LLM for access and now their application does not work because they decided to borrow this S3 bucket instead of asking for one of their own. Yes this happened.
"Find where this secret is used so I can rotate it without downtime" Well, unless you are scanning all our Github repos, Kubernetes secret and containers, you are going to miss the fact this secret was manually loaded into Kubernetes/loaded into flat file in Docker container or stored in some random secret manager none of us are even aware of.
""Why did database costs spike yesterday?" -> Identifies expensive queries, shows optimization options, implements fixes
How? Likely it's because bad schema or lack of understanding with ORMs. Fix is going to be some PR somewhere to Dev who probably does not understand what they are reviewing.
Most of our headaches come from the fact that Devs almost never give a shit about Ops, their bosses don't give a shit about Ops, and Ops is trying desperately to keep this train, which is on fire, from derailing. We don't need AI YOLOing more stuff into Prod; we need AI to tell their bosses what the downtime they are causing is costing our company so that maybe, just maybe, they will actually care.
There is still benefit for non-Infra people. But non-Infra people don't understand system design, so the benefits are limited. Imagine a "mechanic AI". Yes, you could ask it all sorts of mechanic questions, and maybe it could even do some work on the car. But if you wanted to, say, replace the entire engine with a different one, that is a systemic change with farther-reaching implications than an AI will explain, much less perform competently. You need a mechanic to stop you and say, uh, no, please don't change the engine; explain to me what you're trying to do and I'll help you find a better solution. Then you need a real mechanic to manage changing the tires on the moving bus so it doesn't crash into the school. But having an AI could help the mechanic do all of that more smoothly.
Another thing I'd love to see more of is people asking the AI for advice. Most devs seem to avoid asking Infra people for architectural/design advice. This leads to them putting together a system using their limited knowledge, and it turns out to be an inferior design to what an Infra person would have suggested. Hopefully they will ask AI for advice in the future.
Something we’ve been dealing with is trying to get the agents to not over-complicate their designs, because they have a tendency to do so. But with good prompting they can be very helpful assistants!
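For what it's worth, a lot of that "good prompting" is just explicit constraints in the system prompt, roughly along these lines (paraphrased, not our exact prompt):

    # Paraphrased example of the kind of constraints that keep designs small
    # (not our exact prompt).
    DESIGN_CONSTRAINTS = """\
    - Prefer the smallest change that satisfies the request.
    - Do not introduce new services, modules, or abstractions unless the task
      cannot be done with existing ones; say so explicitly when that happens.
    - Every proposed resource must map to a stated requirement.
    """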
> Right now, Datafruit receives read-only access to your infrastructure
> "Grant @User write access to analytics S3 bucket for 24 hours" > -> Creates temporary IAM role, sends least-privilege credentials, auto-revokes tomorrow
These statements directly conflict with one another.
So it needs "iam:CreateRole," "iam:AttachRolePolicy," and other similar permissions. Those are not "read-only." And they make it effectively an admin in the account.
What safeguards are in place to make sure it doesn't delete other roles, or make production-impacting changes?
How is the auto-revoke handled? Will it require human intervention to merge a PR/apply the Terraform configuration, or will it do it automatically?
Also, auto-revoke right now can be handled by creating a role in Terraform with a time-bound condition in its trust policy, so it can no longer be assumed after a certain time. But we're exploring deeper integrations with identity providers like Okta to handle this better.
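Roughly, the idea is a DateLessThan condition on aws:CurrentTime in the trust policy. Here's an illustrative boto3 version to keep it short (the same shape works in Terraform; the account ID, user, role name, and bucket are placeholders):

    # Illustrative time-bounded role: the trust policy stops allowing
    # sts:AssumeRole after the deadline, so access expires even if cleanup
    # is late. All identifiers below are placeholders.
    import json
    from datetime import datetime, timedelta, timezone
    import boto3

    iam = boto3.client("iam")
    expires = (datetime.now(timezone.utc) + timedelta(hours=24)).strftime("%Y-%m-%dT%H:%M:%SZ")

    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/example-user"},  # placeholder
            "Action": "sts:AssumeRole",
            "Condition": {"DateLessThan": {"aws:CurrentTime": expires}},
        }],
    }

    iam.create_role(RoleName="temp-analytics-writer",
                    AssumeRolePolicyDocument=json.dumps(trust_policy),
                    MaxSessionDuration=3600)
    iam.put_role_policy(
        RoleName="temp-analytics-writer",
        PolicyName="scoped-s3-write",
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{"Effect": "Allow",
                           "Action": ["s3:PutObject"],
                           "Resource": "arn:aws:s3:::example-analytics-bucket/*"}],  # placeholder
        }),
    )
    # A scheduled cleanup still deletes the role afterwards; the condition just
    # guarantees access stops working even if that cleanup runs late.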
I consulted for an early stage company that was trying to do this during the GPT-3 era. Despite the founders' stellar reputation and impressive startup pedigree, it was exceedingly difficult to get customers to provide meaningful read access to their AWS infrastructure, let alone the ability to make changes.
And yeah, we are noticing that it’s difficult to convince people to give us access to their infrastructure. I hope that a BYOC model will help with that.
> we’ve talked to a couple of startups where the Claude Code + AWS CLI combo has taken their infra down
Do you care to share what language model(s) you use?
Why does that need an AI? I’m pretty sure many tools for those things exist, and they predate LLMs.
It is workflow automation at the end of the day. I would rather pick a SOAR or AI-SOC tool where automation like this is very common, e.g. BlinkOps or Torq.
We have not spent as much time working in the security space, and I do think that purpose-built solutions are better if you only care about security. We are purposefully trying to stay broad, which might mean that our agents lack depth in specific verticals.
Also, as a daily AI user (claude code / codex subs), I'm not sure I want YOLO AIs anywhere near my infra.
I don't mind letting AIs help with infra, but only with the configs and infra-as-code files, and it will never have any form of access to anything outside its little box. It's significantly faster at writing out the port ranges for an FTP (don't ask) ingress than I can by hand.
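(For context, the kind of config I mean is roughly the following: the FTP control port plus the passive data-port range the server is configured with. All values here are illustrative.)

    # Roughly the ingress the AI types out for me: FTP control port plus a
    # passive data-port range (illustrative CIDRs and ports, not my real config).
    FTP_INGRESS_RULES = [
        {"protocol": "tcp", "from_port": 21,    "to_port": 21,    "cidr": "203.0.113.0/24"},  # control
        {"protocol": "tcp", "from_port": 30000, "to_port": 30100, "cidr": "203.0.113.0/24"},  # passive data
    ]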
that's because infrastructure is complicated. the AWS console isn't that bad (it's not great, and you should just use terraform whenever possible because clickops is dull, error-prone work); there's just a lot to know in order to deploy infrastructure cost-effectively.
this is more like "we don't want to hire infra engineers who know what they're doing so here's a tool to make suggestions that a knowledgeable engineer would make, vet and apply. just Trust Us."
https://www.uspto.gov/trademarks/search/likelihood-confusion
> Trademarks don’t have to be identical to be confusingly similar. Instead, they could just be similar in sound, appearance, or meaning, or could create a similar commercial impression.