Thanks for sharing this. I'm actually starting to explore integrating an agent into one of my SaaS solutions, based on a client request.
To be honest, my experience with agents is still pretty limited, so I’d really appreciate any advice, especially around best practices or a roadmap for implementation. The goal is to build something that can learn and reflect the company’s culture, answer situational questions like “what to do in this case,” assist with document error checking, and generally serve as a helpful internal assistant.
All of this came from the client's desire to have a tool that aligns with their internal knowledge and workflows.
Is something like this feasible in terms of quality and reliability? And beyond hallucinations, are there major security concerns or roadblocks I should be thinking about?
ursaguild · 8h ago
Ingesting documents and using natural language to search your org's docs with an internal assistant sounds more like a good use case for RAG[1]. Agents are best when you need to autonomously plan and execute a series of actions[2]. You can combine the two, but knowing when to do so depends on the use case.
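Roughly, that retrieval step looks something like the sketch below. It's a toy, with word-overlap scoring standing in for embeddings and a vector store, just to show the shape of the pipeline:

    # Toy sketch of the RAG flow: index internal docs, retrieve the most
    # relevant ones for a question, and pass them to the LLM as grounding
    # context. Real systems use embeddings + a vector store; word overlap
    # here just stands in for a relevance score.

    def score(question: str, doc: str) -> float:
        """Crude relevance: fraction of question words found in the doc."""
        q_words = set(question.lower().split())
        return len(q_words & set(doc.lower().split())) / max(len(q_words), 1)

    def retrieve(question: str, docs: list[str], k: int = 3) -> list[str]:
        """Return the k most relevant documents for the question."""
        return sorted(docs, key=lambda d: score(question, d), reverse=True)[:k]

    def build_prompt(question: str, docs: list[str]) -> str:
        """Stuff the retrieved docs into the prompt the LLM will answer from."""
        context = "\n\n".join(retrieve(question, docs))
        return (
            "Answer using only the company documents below.\n\n"
            f"Documents:\n{context}\n\nQuestion: {question}"
        )

    company_docs = [
        "Purchase orders over $5,000 require two approvals.",
        "Incident reports are due within 24 hours of discovery.",
        "All client data must stay in the EU region.",
    ]

    # The resulting prompt is what you actually send to the model.
    print(build_prompt("What do I do when an incident is discovered?", company_docs))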
I really like the OpenAI approach and how they outlined the thought process of when and how to use agents.
In this case, the agent would also need to learn from new events, such as project lessons learned.
Just curious: can a RAG[1] system actually learn from new situations over time in this kind of setup, or is it purely pulling from what's already there?
ursaguild · 7h ago
Especially with a client, consider the word choices around "learning". When using LLMs, agents, or RAG, the system isn't learning (yet) but making a decision based on the context you provide. Most models are a fixed snapshot. If you provide up-to-date information, it will be able to give you an output based on that.
"Learning" happens when initially training the LLM or, arguably, when fine-tuning. Neither of which is needed for your use case as presented.
ednite · 7h ago
Thanks for the clarification, really appreciate it. It helps frame things more precisely.
In my case, there will be a large amount of initial data fed into the system as context. But the client also expects the agent to act more like a smart assistant or teacher, one that can respond to new, evolving scenarios.
Without getting into too much detail, imagine I feed the system an instruction like: “Box A and Box B should fit into Box 1 with at least 1" clearance.” Later, a user gives the agent Box A, Box B, and now adds Box D and E, and asks it to fit everything into Box 1, which is too small. The expected behavior would be that the agent infers that an additional Box 2 is needed to accommodate everything.
So I understand this isn't "learning" in the training sense, but rather pattern recognition and contextual reasoning based on prior examples and constraints.
Basically, I should be saying "contextual reasoning" instead of "learning."
Does that framing make sense?
mardef · 1h ago
The LLM has no memory carrying over from your initial instructions to your later ones.
In practice you have to send the entire conversation history with every prompt, so think of it as appending to an expanding list of rules that you send every time.
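Concretely, it looks something like this (OpenAI's Python client used as an example; any chat-style API with a messages list works the same way, and the model name is a placeholder):

    # The model only ever sees what is in `history` on this call, so the
    # rules from turn 1 have to be resent on turn 2, 3, ... "Memory" is
    # just this ever-growing list.
    from openai import OpenAI

    client = OpenAI()
    history = [
        {"role": "system",
         "content": 'Box A and Box B should fit into Box 1 with at least 1" clearance.'},
    ]

    def ask(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model name
            messages=history,      # the ENTIRE history, every single time
        )
        answer = response.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        return answer

    print(ask("Now add Box D and Box E. Does everything still fit in Box 1?"))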
mousetree · 8h ago
You can ingest new documents and data into the RAG system as needed.
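Right, keeping it current is mostly just upserting new or updated documents into the index on some schedule, no retraining involved. A rough sketch (the in-memory dict stands in for a real vector store such as pgvector or Qdrant):

    # New or updated documents are simply (re)ingested; the model itself
    # never changes. A real pipeline would also compute embeddings here.
    from datetime import datetime, timezone

    index: dict[str, dict] = {}   # doc_id -> {"text": ..., "ingested_at": ...}

    def upsert_document(doc_id: str, text: str) -> None:
        """Add a new document or overwrite a stale version of an existing one."""
        index[doc_id] = {
            "text": text,
            "ingested_at": datetime.now(timezone.utc),
        }

    # Initial load
    upsert_document("handbook-v1", "Purchase orders over $5,000 require two approvals.")

    # Later: a "lessons learned" doc from a finished project lands in the index
    upsert_document("project-x-retro", "Confirm box clearances before ordering packaging.")

    print(f"{len(index)} documents indexed")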
eric-burel · 9h ago
The complexity of an agent may range from something relatively simple to whatever level of complexity you want. So your project sounds doable, but you'll have to run some exploration to get proper answers. Regarding reliability, quality, and security, it is as important to learn how to observe an agent system as it is to learn how to implement one. An agent/LLM-based solution is proven to work only if you observe that it actually works; experiments, tests, and monitoring are not optional the way they can be in, say, web development. As for security concerns, you'd want to take a look at the OWASP top 10 for LLMs: https://owasp.org/www-project-top-10-for-large-language-mode...
LLMs/agents indeed have their own new set of vulnerabilities.
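On the testing point: even a handful of assertion-style eval cases run against the live system on every change catches a lot. A minimal sketch, where answer_question stands in for whatever your agent's entry point ends up being:

    # Evals as tests: a small fixed set of questions with properties the
    # answer must (and must not) have. `answer_question` is a placeholder
    # for the real RAG/agent call.

    def answer_question(question: str) -> str:
        """Placeholder for the real agent entry point."""
        raise NotImplementedError

    EVAL_CASES = [
        # (question, substring the answer must contain, substrings it must not)
        ("Who approves purchase orders over $5,000?", "two approvals", ["no approval needed"]),
        ("Where must client data be stored?", "EU", ["any region"]),
    ]

    def run_evals() -> None:
        for question, must_contain, must_not in EVAL_CASES:
            answer = answer_question(question).lower()
            assert must_contain.lower() in answer, f"missing fact for: {question}"
            assert not any(bad.lower() in answer for bad in must_not), f"bad claim for: {question}"
        print(f"{len(EVAL_CASES)} eval cases passed")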
ednite · 9h ago
That’s sound advice, really appreciate the link. Regarding your point about continuous monitoring, that’s actually the first thing I mentioned to the client.
It’s still highly experimental and needs to be observed, corrected, and tweaked constantly, kind of like teaching a child, where feedback and reinforcement are key.
I may share my experience with the HN community down the line. Thanks again!
abelanger · 8h ago
I'm a big fan of https://github.com/humanlayer/12-factor-agents because I think it gets at the heart of engineering these systems for usage in your app rather than a completely unconstrained demo or MCP-based solution.
In particular you can reduce most concerns around security and reliability when you treat your LLM call as a library method with structured output (Factor 4) and own your own control flow (Factor 8). There should never be a case where your agent is calling a tool with unconstrained input.
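To illustrate those two factors (my own rough sketch, not code from the repo): the model only ever proposes a structured action, and application code validates it against an allowlist and decides what actually runs.

    # Factor 4: the LLM call behaves like a library function returning
    # structured output. Factor 8: control flow lives in your code, so a
    # tool can never be called with unconstrained input.
    import json

    ALLOWED_ACTIONS = {"search_docs", "check_document", "answer"}

    def parse_action(llm_output: str) -> dict:
        """Validate the model's proposed action before anything executes."""
        action = json.loads(llm_output)                 # fails loudly on non-JSON
        if action.get("name") not in ALLOWED_ACTIONS:
            raise ValueError(f"unexpected action: {action.get('name')!r}")
        if not isinstance(action.get("args"), dict):
            raise ValueError("args must be an object")
        return action

    def run(action: dict) -> str:
        """Your control flow, not the model's."""
        if action["name"] == "search_docs":
            return f"searching for {action['args'].get('query', '')!r}"
        if action["name"] == "check_document":
            return f"checking {action['args'].get('doc_id', '')!r}"
        return action["args"].get("text", "")           # "answer"

    # Whatever the LLM returned, it goes through the same gate.
    proposal = '{"name": "search_docs", "args": {"query": "PO approval policy"}}'
    print(run(parse_action(proposal)))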
ednite · 8h ago
I guess I’ve got some reading and research ahead of me. I definitely support the idea of treating LLM calls more like structured library functions rather than letting them run wild.
Definitely bookmarking this for reference. Appreciate you sharing it.
trevinhofmann · 6h ago
Others have given some decent advice based on your comment, but would you be interested in a ~30 minute (video) call to dive a bit deeper so I can give more tailored suggestions?
helsinki · 5h ago
Has anyone solved scoped permissions in multi-agent systems? For instance, if a user asks an orchestrator agent to:
1) Search GitHub for an issue in their repo.
2) Fix the issue and push it to GitHub.
3) Search Jira for any tasks related to this bug, and update their status.
4) Post a message to Slack, notifying the team that you fixed it.
Now, let’s assume this agent is available to 1000 users at a company. How does the system obtain the necessary GitHub, Jira, and Slack permissions for a specific user?
The answer is fairly obvious if the user approves each action as the task propagates between agents, but how do you do this in a hands-free manner? Let’s assume the user is only willing to approve the necessary permissions once, after submitting their initial prompt and before the orchestrator agent attempts to call the GitHub agent.
If anyone could offer any advice on this, I would really appreciate it. Thank you!
simonw · 4h ago
I would solve this using the equivalent of a service account - I would give that "agent" an identity - "CodeBot" or whatever - and then treat that as an actor which has permission to read things on Jira, permission to send notifications to Slack, permission to access the GitHub API, etc.
Then I would control who had permission to tell it what to do, and log everything in detail.
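A rough sketch of that shape (names and permissions are made up): the agent acts as one service identity with an explicitly granted permission set, every action is checked against it, and the requesting user is recorded in the audit log.

    # Service-account pattern: "codebot" is the actor with a fixed grant
    # set; each action is permission-checked and logged with the human
    # who requested it. Illustrative only.
    import logging
    from datetime import datetime, timezone

    logging.basicConfig(level=logging.INFO)
    audit_log = logging.getLogger("codebot.audit")

    SERVICE_ACCOUNT = "codebot"
    GRANTED = {
        "github:read", "github:push",
        "jira:read", "jira:update_status",
        "slack:post",
    }

    def execute(requesting_user: str, permission: str, action: str) -> None:
        """Run an action as the service account, or refuse, and log either way."""
        allowed = permission in GRANTED
        audit_log.info(
            "time=%s user=%s actor=%s permission=%s action=%r allowed=%s",
            datetime.now(timezone.utc).isoformat(),
            requesting_user, SERVICE_ACCOUNT, permission, action, allowed,
        )
        if not allowed:
            raise PermissionError(f"{SERVICE_ACCOUNT} lacks {permission}")
        # ... actual GitHub/Jira/Slack API call would go here ...

    execute("alice", "slack:post", "notify #eng that the issue is fixed")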
CMCDragonkai · 5h ago
Yea we have been developing Polykey for this purpose. Sent you an email to discuss.
_pdp_ · 8h ago
IMHO this guide should have been called "a theoretical guide for building agents". In practice, you cannot build agents like that if you want them to do useful things.
Also, the examples provided are not only impractical but potentially bad practice. Why do you need a manager pattern to control a bunch of language-translation agents when most models will do fine on their own, especially for Latin-based languages? In practice a single LLM will not only be more cost-effective but also better for the overall user experience.
Also, prompting is the real unsung hero that barely gets a mention. In practice you cannot get away with just a couple of lines describing the problem/solution at a high level. Prompts are complex and very much an art form because, frankly, there is no science whatsoever behind them, just intuition. Yet in practice they have an enormous effect on overall agent performance.
This guide is not really aimed at educating developers on how to build agents, but at business executives and decision-makers who need a high-level understanding without getting into the practical implementation details. It glosses over the technical challenges and complexity that developers actually face when building useful agent systems in production environments.
3abiton · 6h ago
Do you have any good practical guide in mind?
ramesh31 · 7h ago
Tools are the only thing that matters, and are what you should focus on, not "agents" as a separate concept. Locking yourself into any particular agent framework is silly; they are nothing but LLM-calling while loops connected to JSON/XML parsers. Tools define and shape the entirety of an agent's capability to do useful things, and through MCP can be trivially shared with virtually any agentic process.
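For what it's worth, the whole while loop really is about this much code (call_llm is a stand-in for any chat API, and the tool set here is trivial):

    # Bare-bones "LLM-calling while loop connected to a JSON parser".
    # `call_llm` is a placeholder for any chat API that returns JSON text.
    import json
    from pathlib import Path

    def call_llm(messages: list[dict]) -> str:
        """Placeholder: returns the model's reply, expected to be JSON."""
        raise NotImplementedError

    TOOLS = {
        "read_file": lambda args: Path(args["path"]).read_text(),
        "done": lambda args: args.get("answer", ""),
    }

    def agent(task: str, max_steps: int = 10) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):                    # this loop is the "agent"
            reply = call_llm(messages)
            call = json.loads(reply)                  # {"tool": ..., "args": {...}}
            result = TOOLS[call["tool"]](call["args"])
            if call["tool"] == "done":
                return result
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"tool result: {result}"})
        raise RuntimeError("step limit reached")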
gavmor · 6h ago
Yes, I wonder if it ever makes sense to partition an agent's toolkit across multiple "agents"—besides horizontal scaling. Why should one process have access to APIs that another doesn't? Authorization and secrets, maybe, but functionality?
DarmokJalad1701 · 5h ago
Depending on the capability or context size of the model, it is easy to imagine a situation where specifying too many tools (or MCPs) can overwhelm it, affecting the accuracy.
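One mitigation is to not hand the model the full catalog at all and instead pick a small relevant subset of tool definitions per request before the LLM call. A rough sketch, with keyword matching standing in for smarter routing (embeddings, a classifier, etc.):

    # Trim the tool list before the LLM call so the model sees a handful
    # of relevant tools instead of the full catalog.
    TOOL_CATALOG = {
        "github_search_issues": "search issues in a GitHub repository",
        "github_push_fix": "commit and push a fix to a GitHub repository",
        "jira_update_status": "update the status of a Jira ticket",
        "slack_post_message": "post a message to a Slack channel",
        "calendar_create_event": "create a calendar event",
    }

    def select_tools(request: str, limit: int = 3) -> dict[str, str]:
        """Keep only the tools whose description overlaps most with the request."""
        words = set(request.lower().split())
        ranked = sorted(
            TOOL_CATALOG.items(),
            key=lambda item: len(words & set(item[1].lower().split())),
            reverse=True,
        )
        return dict(ranked[:limit])

    # Only these few tool definitions get sent along with the prompt.
    print(select_tools("fix the github issue and update the jira ticket"))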
[1] https://www.willowtreeapps.com/craft/retrieval-augmented-gen...
[2] https://www.willowtreeapps.com/craft/building-ai-agents-with...