Thanks for sharing this. I'm actually starting to explore integrating an agent into one of my SaaS solutions, based on a client request.
To be honest, my experience with agents is still pretty limited, so I’d really appreciate any advice, especially around best practices or a roadmap for implementation. The goal is to build something that can learn and reflect the company’s culture, answer situational questions like “what to do in this case,” assist with document error checking, and generally serve as a helpful internal assistant.
All of this came from the client's desire to have a tool that aligns with their internal knowledge and workflows.
Is something like this feasible in terms of quality and reliability? And beyond hallucinations, are there major security concerns or roadblocks I should be thinking about?
ursaguild · 8h ago
Ingesting documents and using natural language to search your org's docs with an internal assistant sounds more like a good use case for RAG[1]. Agents are best when you need to autonomously plan and execute a series of actions[2]. You can combine the two, but knowing when to do so depends on the use case.
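Roughly, that retrieval step looks something like the sketch below. It's a toy, with word-overlap scoring standing in for embeddings and a vector store, just to show the shape of the pipeline:

    # Toy sketch of the RAG flow: index internal docs, retrieve the most
    # relevant ones for a question, and pass them to the LLM as grounding
    # context. Real systems use embeddings + a vector store; word overlap
    # here just stands in for a relevance score.

    def score(question: str, doc: str) -> float:
        """Crude relevance: fraction of question words found in the doc."""
        q_words = set(question.lower().split())
        return len(q_words & set(doc.lower().split())) / max(len(q_words), 1)

    def retrieve(question: str, docs: list[str], k: int = 3) -> list[str]:
        """Return the k most relevant documents for the question."""
        return sorted(docs, key=lambda d: score(question, d), reverse=True)[:k]

    def build_prompt(question: str, docs: list[str]) -> str:
        """Stuff the retrieved docs into the prompt the LLM will answer from."""
        context = "\n\n".join(retrieve(question, docs))
        return (
            "Answer using only the company documents below.\n\n"
            f"Documents:\n{context}\n\nQuestion: {question}"
        )

    company_docs = [
        "Purchase orders over $5,000 require two approvals.",
        "Incident reports are due within 24 hours of discovery.",
        "All client data must stay in the EU region.",
    ]

    # The resulting prompt is what you actually send to the model.
    print(build_prompt("What do I do when an incident is discovered?", company_docs))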
I really like the OpenAI approach and how they outlined the thought process of when and how to use agents.
In this case, the agent would also need to learn from new events, such as project lessons learned.
Just curious: can a RAG[1] system actually learn from new situations over time in this kind of setup, or is it purely pulling from what's already there?
ursaguild · 7h ago
Especially with a client, consider the word choices around "learning". When using LLMs, agents, or RAG, the system isn't learning (yet) but making a decision based on the context you provide. Most models are a fixed snapshot. If you provide up-to-date information, it will be able to give you an output based on that.
"Learning" happens when initially training the LLM or, arguably, when fine-tuning. Neither of which is needed for your use case as presented.
ednite · 7h ago
Thanks for the clarification, really appreciate it. It helps frame things more precisely.
In my case, there will be a large amount of initial data fed into the system as context. But the client also expects the agent to act more like a smart assistant or teacher, one that can respond to new, evolving scenarios.
Without getting into too much detail, imagine I feed the system an instruction like: “Box A and Box B should fit into Box 1 with at least 1" clearance.” Later, a user gives the agent Box A, Box B, and now adds Box D and E, and asks it to fit everything into Box 1, which is too small. The expected behavior would be that the agent infers that an additional Box 2 is needed to accommodate everything.
So I understand this isn't "learning" in the training sense, but rather pattern recognition and contextual reasoning based on prior examples and constraints.
Basically, I should be saying "contextual reasoning" instead of "learning."
Does that framing make sense?
mardef · 1h ago
The LLM has no memory carrying over from your initial instructions to your later ones.
In practice you have to send the entire conversation history with every prompt, so think of it as appending to an expanding list of rules that you send every time.
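Concretely, it looks something like this (OpenAI's Python client used as an example; any chat-style API with a messages list works the same way, and the model name is a placeholder):

    # The model only ever sees what is in `history` on this call, so the
    # rules from turn 1 have to be resent on turn 2, 3, ... "Memory" is
    # just this ever-growing list.
    from openai import OpenAI

    client = OpenAI()
    history = [
        {"role": "system",
         "content": 'Box A and Box B should fit into Box 1 with at least 1" clearance.'},
    ]

    def ask(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model name
            messages=history,      # the ENTIRE history, every single time
        )
        answer = response.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        return answer

    print(ask("Now add Box D and Box E. Does everything still fit in Box 1?"))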
mousetree · 8h ago
You can ingest new documents and data into the RAG system as needed.
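Right, keeping it current is mostly just upserting new or updated documents into the index on some schedule, no retraining involved. A rough sketch (the in-memory dict stands in for a real vector store such as pgvector or Qdrant):

    # New or updated documents are simply (re)ingested; the model itself
    # never changes. A real pipeline would also compute embeddings here.
    from datetime import datetime, timezone

    index: dict[str, dict] = {}   # doc_id -> {"text": ..., "ingested_at": ...}

    def upsert_document(doc_id: str, text: str) -> None:
        """Add a new document or overwrite a stale version of an existing one."""
        index[doc_id] = {
            "text": text,
            "ingested_at": datetime.now(timezone.utc),
        }

    # Initial load
    upsert_document("handbook-v1", "Purchase orders over $5,000 require two approvals.")

    # Later: a "lessons learned" doc from a finished project lands in the index
    upsert_document("project-x-retro", "Confirm box clearances before ordering packaging.")

    print(f"{len(index)} documents indexed")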
eric-burel · 9h ago
The complexity of an agent may range from something relatively simple to whatever level of complexity you want. So your project sounds doable, but you'll have to run some exploration to get proper answers. Regarding reliability, quality, and security, it is as important to learn how to observe an agent system as it is to learn how to implement one. An agent/LLM-based solution is proven to work only if you observe that it actually works; experiments, tests, and monitoring are not optional the way they can be in, say, web development. As for security concerns, you'd want to take a look at the OWASP top 10 for LLMs: https://owasp.org/www-project-top-10-for-large-language-mode...
LLMs/agents indeed have their own new set of vulnerabilities.
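On the testing point: even a handful of assertion-style eval cases run against the live system on every change catches a lot. A minimal sketch, where answer_question stands in for whatever your agent's entry point ends up being:

    # Evals as tests: a small fixed set of questions with properties the
    # answer must (and must not) have. `answer_question` is a placeholder
    # for the real RAG/agent call.

    def answer_question(question: str) -> str:
        """Placeholder for the real agent entry point."""
        raise NotImplementedError

    EVAL_CASES = [
        # (question, substring the answer must contain, substrings it must not)
        ("Who approves purchase orders over $5,000?", "two approvals", ["no approval needed"]),
        ("Where must client data be stored?", "EU", ["any region"]),
    ]

    def run_evals() -> None:
        for question, must_contain, must_not in EVAL_CASES:
            answer = answer_question(question).lower()
            assert must_contain.lower() in answer, f"missing fact for: {question}"
            assert not any(bad.lower() in answer for bad in must_not), f"bad claim for: {question}"
        print(f"{len(EVAL_CASES)} eval cases passed")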
ednite · 9h ago
That’s sound advice, really appreciate the link. Regarding your point about continuous monitoring, that’s actually the first thing I mentioned to the client.
It’s still highly experimental and needs to be observed, corrected, and tweaked constantly, kind of like teaching a child, where feedback and reinforcement are key.
I may share my experience with the HN community down the line. Thanks again!
abelanger · 8h ago
I'm a big fan of https://github.com/humanlayer/12-factor-agents because I think it gets at the heart of engineering these systems for usage in your app rather than a completely unconstrained demo or MCP-based solution.
In particular you can reduce most concerns around security and reliability when you treat your LLM call as a library method with structured output (Factor 4) and own your own control flow (Factor 8). There should never be a case where your agent is calling a tool with unconstrained input.
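To illustrate those two factors (my own rough sketch, not code from the repo): the model only ever proposes a structured action, and application code validates it against an allowlist and decides what actually runs.

    # Factor 4: the LLM call behaves like a library function returning
    # structured output. Factor 8: control flow lives in your code, so a
    # tool can never be called with unconstrained input.
    import json

    ALLOWED_ACTIONS = {"search_docs", "check_document", "answer"}

    def parse_action(llm_output: str) -> dict:
        """Validate the model's proposed action before anything executes."""
        action = json.loads(llm_output)                 # fails loudly on non-JSON
        if action.get("name") not in ALLOWED_ACTIONS:
            raise ValueError(f"unexpected action: {action.get('name')!r}")
        if not isinstance(action.get("args"), dict):
            raise ValueError("args must be an object")
        return action

    def run(action: dict) -> str:
        """Your control flow, not the model's."""
        if action["name"] == "search_docs":
            return f"searching for {action['args'].get('query', '')!r}"
        if action["name"] == "check_document":
            return f"checking {action['args'].get('doc_id', '')!r}"
        return action["args"].get("text", "")           # "answer"

    # Whatever the LLM returned, it goes through the same gate.
    proposal = '{"name": "search_docs", "args": {"query": "PO approval policy"}}'
    print(run(parse_action(proposal)))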
ednite · 8h ago
I guess I’ve got some reading and research ahead of me. I definitely support the idea of treating LLM calls more like structured library functions rather than letting them run wild.
Definitely bookmarking this for reference. Appreciate you sharing it.
trevinhofmann · 6h ago
Others have given some decent advice based on your comment, but would you be interested in a ~30 minute (video) call to dive a bit deeper so I can give more tailored suggestions?
helsinki · 5h ago
Has anyone solved scoped permissions in multi-agent systems? For instance, if a user asks an orchestrator agent to:
1) Search GitHub for an issue in their repo.
2) Fix the issue and push it to GitHub.
3) Search Jira for any tasks related to this bug, and update their status.
4) Post a message to Slack, notifying the team that you fixed it.
Now, let’s assume this agent is available to 1000 users at a company. How does the system obtain the necessary GitHub, Jira, and Slack permissions for a specific user?
The answer is fairly obvious if the user approves each action as the task propagates between agents, but how do you do this in a hands-free manner? Let’s assume the user is only willing to approve the necessary permissions once, after submitting their initial prompt and before the orchestrator agent attempts to call the GitHub agent.
If anyone could offer any advice on this, I would really appreciate it. Thank you!
simonw · 4h ago
I would solve this using the equivalent of a service account - I would give that "agent" an identity - "CodeBot" or whatever - and then treat that as an actor which has permission to read things on Jira, permission to send notifications to Slack, permission to access the GitHub API, etc.
Then I would control who had permission to tell it what to do, and log everything in detail.
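A rough sketch of that shape (names and permissions are made up): the agent acts as one service identity with an explicitly granted permission set, every action is checked against it, and the requesting user is recorded in the audit log.

    # Service-account pattern: "codebot" is the actor with a fixed grant
    # set; each action is permission-checked and logged with the human
    # who requested it. Illustrative only.
    import logging
    from datetime import datetime, timezone

    logging.basicConfig(level=logging.INFO)
    audit_log = logging.getLogger("codebot.audit")

    SERVICE_ACCOUNT = "codebot"
    GRANTED = {
        "github:read", "github:push",
        "jira:read", "jira:update_status",
        "slack:post",
    }

    def execute(requesting_user: str, permission: str, action: str) -> None:
        """Run an action as the service account, or refuse, and log either way."""
        allowed = permission in GRANTED
        audit_log.info(
            "time=%s user=%s actor=%s permission=%s action=%r allowed=%s",
            datetime.now(timezone.utc).isoformat(),
            requesting_user, SERVICE_ACCOUNT, permission, action, allowed,
        )
        if not allowed:
            raise PermissionError(f"{SERVICE_ACCOUNT} lacks {permission}")
        # ... actual GitHub/Jira/Slack API call would go here ...

    execute("alice", "slack:post", "notify #eng that the issue is fixed")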
CMCDragonkai · 5h ago
Yea we have been developing Polykey for this purpose. Sent you an email to discuss.
_pdp_ · 8h ago
IMHO this guide should have been called "a theoretical guide for building agents". In practice, you cannot build agents like that if you want them to do useful things.
Also, the examples provided are not only impractical but potentially bad practice. Why do you need a manager pattern to control a bunch of language-translation agents when most models will do fine on their own, especially for Latin-based languages? In practice a single LLM will not only be more cost-effective but also better for the overall user experience.
Also, prompting is the real unsung hero that barely gets a mention. In practice you cannot get away with just a couple of lines describing the problem/solution at a high level. Prompts are complex and very much an art form because, frankly, there is no science whatsoever behind them, just intuition. Yet in practice they have an enormous effect on overall agent performance.
This guide is not really aimed at educating developers on how to build agents, but at business executives and decision-makers who need a high-level understanding without getting into the practical implementation details. It glosses over the technical challenges and complexity that developers actually face when building useful agent systems in production environments.
3abiton · 6h ago
Do you have any good practical guide in mind?
ramesh31 · 7h ago
Tools are the only thing that matters, and are what you should focus on, not "agents" as a separate concept. Locking yourself into any particular agent framework is silly; they are nothing but LLM-calling while loops connected to JSON/XML parsers. Tools define and shape the entirety of an agent's capability to do useful things, and through MCP can be trivially shared with virtually any agentic process.
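For what it's worth, the whole while loop really is about this much code (call_llm is a stand-in for any chat API, and the tool set here is trivial):

    # Bare-bones "LLM-calling while loop connected to a JSON parser".
    # `call_llm` is a placeholder for any chat API that returns JSON text.
    import json
    from pathlib import Path

    def call_llm(messages: list[dict]) -> str:
        """Placeholder: returns the model's reply, expected to be JSON."""
        raise NotImplementedError

    TOOLS = {
        "read_file": lambda args: Path(args["path"]).read_text(),
        "done": lambda args: args.get("answer", ""),
    }

    def agent(task: str, max_steps: int = 10) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):                    # this loop is the "agent"
            reply = call_llm(messages)
            call = json.loads(reply)                  # {"tool": ..., "args": {...}}
            result = TOOLS[call["tool"]](call["args"])
            if call["tool"] == "done":
                return result
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"tool result: {result}"})
        raise RuntimeError("step limit reached")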
gavmor · 6h ago
Yes, I wonder if it ever makes sense to partition an agent's toolkit across multiple "agents"—besides horizontal scaling. Why should one process have access to APIs that another doesn't? Authorization and secrets, maybe, but functionality?
DarmokJalad1701 · 5h ago
Depending on the capability or context size of the model, it is easy to imagine a situation where specifying too many tools (or MCPs) can overwhelm it, affecting the accuracy.
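One mitigation is to not hand the model the full catalog at all and instead pick a small relevant subset of tool definitions per request before the LLM call. A rough sketch, with keyword matching standing in for smarter routing (embeddings, a classifier, etc.):

    # Trim the tool list before the LLM call so the model sees a handful
    # of relevant tools instead of the full catalog.
    TOOL_CATALOG = {
        "github_search_issues": "search issues in a GitHub repository",
        "github_push_fix": "commit and push a fix to a GitHub repository",
        "jira_update_status": "update the status of a Jira ticket",
        "slack_post_message": "post a message to a Slack channel",
        "calendar_create_event": "create a calendar event",
    }

    def select_tools(request: str, limit: int = 3) -> dict[str, str]:
        """Keep only the tools whose description overlaps most with the request."""
        words = set(request.lower().split())
        ranked = sorted(
            TOOL_CATALOG.items(),
            key=lambda item: len(words & set(item[1].lower().split())),
            reverse=True,
        )
        return dict(ranked[:limit])

    # Only these few tool definitions get sent along with the prompt.
    print(select_tools("fix the github issue and update the jira ticket"))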
[1] https://www.willowtreeapps.com/craft/retrieval-augmented-gen...
[2] https://www.willowtreeapps.com/craft/building-ai-agents-with...