I really don't understand how people can give an AI access to a pile of tools and data sources and unleash it on customers. It's horrible UX in my experience, and at times worse than a phone tree.
My view is that you need to transition slowly and carefully to AI-first customer support.
1. Know the scope of problems an AI can solve with high probability. Related prompt: "You can ONLY help with the following issues."
2. Escalate to a human immediately if it's out of scope: "If you cannot help, escalate to a human immediately by CCing bob@smallbiz.co" (steps 1 and 2 are sketched in code after this list).
3. Have an "unlocked agent" that your customer service person can use to answer a question and evaluate how well the agent performs in helping. Use this to drive your development roadmap.
4. If the "unlocked agent" becomes good at solving a problem, add that to the in-scope solutions.
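Concretely, steps 1 and 2 reduce to a system prompt plus an escalation hook. A minimal sketch, assuming an OpenAI-style chat client - the scope list, model name, and notify_human stub are placeholders, not a real deployment:

    # Minimal sketch of steps 1 and 2: a hard scope list plus an
    # escalation sentinel. The scope list, model name, and the
    # notify_human stub are illustrative placeholders.
    from openai import OpenAI

    IN_SCOPE = ["order status", "password resets", "shipping address changes"]

    SYSTEM_PROMPT = (
        "You are a customer support assistant. "
        f"You can ONLY help with the following issues: {', '.join(IN_SCOPE)}. "
        "If you cannot help, reply with exactly ESCALATE and nothing else."
    )

    def notify_human(address: str, transcript: str) -> None:
        """Placeholder: CC the address or ping the on-call person's phone."""
        print(f"escalating to {address}:\n{transcript}")

    def handle(client: OpenAI, user_message: str) -> str:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_message},
            ],
        ).choices[0].message.content
        if reply.strip() == "ESCALATE":
            notify_human("bob@smallbiz.co", user_message)
            return "Let me loop in a teammate who can help with this."
        return reply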
Finally, you should probably have some way to test existing conversations when you make changes. (It's on my TODO list)
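Something like the following replay harness is what I have in mind - an untested sketch where the transcript file, its schema, and the substring check are placeholders for a real eval:

    # Untested sketch: replay saved conversations against the current
    # agent and flag answers that drift. "transcripts.jsonl", its
    # schema, and the substring check are made-up placeholders.
    import json

    def replay(agent, path: str = "transcripts.jsonl"):
        """agent: any callable str -> str, e.g. handle() above with a
        client partially applied."""
        failures = []
        with open(path) as f:
            for line in f:
                case = json.loads(line)  # {"user": "...", "must_contain": "..."}
                reply = agent(case["user"])
                if case["must_contain"].lower() not in reply.lower():
                    failures.append({"user": case["user"], "got": reply})
        return failures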
I've implemented this for a few small businesses, and the process is so seamless that no one has suspected interaction with an AI. For one client, there's not even a visible escalation step: they get pinged on their phone and take over the chat!
myhf · 16h ago
The purpose of customer support is to convince the customer that it is not worth their time to pursue support. A worse experience achieves that goal faster.
Using GenAI is a huge breakthrough in this field, because it is a socially acceptable way to tell someone you don't care about their issue.
politelemon · 15h ago
You've articulated it better than I could. I think, reading through this author's post, they've misunderstood the objectives.
The purpose has been achieved, in that there is a large drop rate. The product manager has met their goals, cut costs, and might be looking forward to their bonus.
It would be far more expensive to make the LLM behave effectively than it would be to do nothing. Any product manager that sincerely cared about customer support wouldn't be inflicting a personalised callous disregard for service. Instead they'd be focusing on improving documentation, help, and processes. But that's not innately quantifiable in a way that leads to bonuses, and therefore goes unnoticed.
risyachka · 22h ago
>> really don't understand how people can give an AI access to a pile of tools and data sources and unleash it on customers
It's pretty simple. When a non-tech person sees faked demos of what it can do, it looks epic, and everyone extrapolates the results and thinks AI is that good.
small_scombrus · 16h ago
Doubly so if the person deciding what gets implemented doesn't really get what their staff actually do.
LLMs' ability to give convincing-sounding answers is like catnip for service desk managers who have never actually been on the desk itself.
gillesjacobs · 23h ago
Nice framing for PMs, but technically it is way too rosy. MCP is real but still full of low utility services and security issues, so “skills as plug-ins” is not production ready. A2A protocols were only just announced this year (Google, etc.) and actual inter-agent interoperability is still research grade, with debugging across agents being a nightmare. Orchestration layers (skills, workflows, multi-agent) look clean in diagrams but turn into brittle state machines under load. LLM “confidence scores” are basically uncalibrated logits dressed up as probabilities.
In short: nice industry roadmap, but we are nowhere near robust, trustworthy multi-agent systems yet.
gabriel666smith · 23h ago
The idea of giving an LLM with a tool any kind of control over an actual user's account remains (though you put this more elegantly) batshit insane to me.
Even assuming you've correctly auth'd the user contacting you (big assumption!), allowing that user to very literally prompt a 'semi-confident thing with tools' - however many layers of abstraction away the tool is - feels very, very far away from a real-world, sensible implementation right now.
Just shoot the tool prompts over to a human operator, if it's so necessary! Sense-check!
gabriel666smith · 23h ago
I MVP'd one of these (a basic sequence of LLM customer support 'agents') at my last job, I guess spring 2024. So much has changed since then!
'Routing through increasingly specialised agents' was my approach, and the only thing that would've done the job (in MVP form) at the time. There weren't many models that would fit our (v good) CS & Product teams' dataset of "probable queries from customers" into a single context window.
I never personally got my MVP beyond sitting with it beside the customer support inbox, talking to customers. And AFAIK it never moved beyond that after I left.
Nor should it have been, probably - there are (wild, & mostly ineffable) trade-offs that you make the moment you stop actually talking to users at the very moment they get in touch. I don't remember ever making a trade-off like that where it was worthwhile.
I _do_ remember it as perhaps the most worthwhile time I ever spent doing product-y work.
I say that because: To consider a customer support query type that might be 0.005% of all queries received by the CS team, even my trash MVP had to walk a path down a pretty intricate tree of agents and possible query types.
So - if you believe that 'solving the problems users have with your product' = 'making a better product', then talking to an LLM that was an advocate for a tiny subset of users, and knew very intimately the details of their issue with your product - that felt really good. It felt like it was a very pure version of what _I_ should be to devs, as any kind of interface between them and our users.
It was very hard to stay a believer in the idea of a 'PM' after seeing that, at least. As a person who preferred to just let people get on with things.
I enjoyed the linked post; it's really interesting to see how far things have come. I'm surprised nobody has built 'talk to your customers at scale', yet - this feels like a far more interesting problem than 'avoid talking to your customers at scale'.
I'm also not surprised, I guess, since it's an incredibly bespoke job to do properly, I imagine, for most products.
majormajor · 16h ago
> I enjoyed the linked post; it's really interesting to see how far things have come. I'm surprised nobody has built 'talk to your customers at scale', yet - this feels like a far more interesting problem than 'avoid talking to your customers at scale'.
This sounds hard to pull off in a very similar way to getting good data through surveys.
I generally don't want to talk to my tools. If I'm motivated to talk to you, it's probably because something went wrong. And even if I talked to you when not annoyed, I'd struggle to articulate more than "it's working good" at any given moment - when what you really want as a product person is to know "it's working good, but I had to internalize this workaround for my use case that I don't even think about now, but originally found offputting and almost bounced because of" or whatever.
barbazoo · 1d ago
> Confidence calibration: When your agent says it's 60% confident, it should be right about 60% of the time. Not 90%, not 30%. Actual 60%.
With current technology (LLM), how can an agent ever be sure about its confidence?
fumeux_fume · 1d ago
The author's inner PM comes out here and makes some wild claims. Calibration is something we can do with traditional classification models, but not with most off-the-shelf LLMs. Even if you devised a way to determine whether the LLM's confidence claim matched its actual performance, you wouldn't be able to calibrate or tune it like you would a more traditional model.
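For what it's worth, the measurement itself is standard even when the tuning isn't: expected calibration error over logged pairs of (stated confidence, was the answer correct). A rough sketch with made-up data:

    # Rough sketch: expected calibration error (ECE) over logged pairs
    # of (stated confidence, whether the answer was actually correct).
    # The sample data at the bottom is made up for illustration.
    def expected_calibration_error(preds, n_bins: int = 10) -> float:
        bins = [[] for _ in range(n_bins)]
        for conf, correct in preds:
            bins[min(int(conf * n_bins), n_bins - 1)].append((conf, correct))
        ece = 0.0
        for b in bins:
            if not b:
                continue
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += (len(b) / len(preds)) * abs(avg_conf - accuracy)
        return ece

    # An agent that says "60% confident" but is right 90% of the time:
    logged = [(0.6, True)] * 9 + [(0.6, False)]
    print(expected_calibration_error(logged))  # ~0.3: badly miscalibrated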
esafak · 1d ago
I was about to say "Using calibrated models", then I found this interesting paper:
Calibrated Language Models Must Hallucinate
https://arxiv.org/abs/2311.14648
https://www.youtube.com/watch?v=cnoOjE_Xj5g
I'm typically pretty critical of PM oriented pieces, but I found this to be a decent overview of how to reason about building these systems from first principles + some of the non-tech pain points + how to address them.
jbmsf · 19h ago
As an engineer, I like this framework but can think of approximately zero PMs who could use it to build a product.
pluto_modadic · 13h ago
Reading this as a security engineer trying to get ahead of misguided PMs who buy into the AI hype and don't know 1) that it's immature, 2) that it's not secure, and 3) whether their business use case is viable for the R&D we're about to put into it.
I get the feeling there's going to be either 1) a great revert of the features, 2) a bunch of hurried patches, or 3) a bunch of legacy systems operating on MCP v0.00-beta (metaphorically speaking)
:lol_sob:
harryf · 13h ago
We need to take the focus off cost savings. None of this tech is anywhere near mature enough to replace humans yet.
Far better to focus on enhancing human capabilities with agents.
For example while a human talks to a customer on the phone, AI is fetching useful context about the customer and suggesting talking points to improve the human conversation.
One example of a direct benefit for businesses using AI this way is reducing onboarding times for new employees.
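The phone example is essentially a copilot loop. A rough sketch, assuming an OpenAI-style client - crm_lookup, the model name, and the prompt are illustrative placeholders:

    # Rough sketch of the call-copilot idea: as transcript text streams
    # in, fetch customer context and suggest talking points for the
    # human agent. crm_lookup and the model name are placeholders.
    from openai import OpenAI

    def crm_lookup(customer_id: str) -> str:
        """Placeholder: pull recent orders/tickets from your CRM."""
        return "Premium plan since 2022; two open shipping tickets."

    def suggest_talking_points(client: OpenAI, customer_id: str, transcript: str) -> str:
        prompt = (
            f"Customer context: {crm_lookup(customer_id)}\n"
            f"Live call transcript so far: {transcript}\n"
            "Suggest 2-3 short talking points for the human agent."
        )
        return client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content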
ricardobeat · 1d ago
What does the PM title even mean at this point? It's a bit surprising to see a deep dive into technical architecture - though there is massive value in understanding what's involved - framed as a PM responsibility; this is more TPM (technical program manager) territory, which is a different job.
In my book, PMs ideally focus on understanding scope, user needs, and how to measure success, while implementation details - orchestration strategies, evaluation, and making sure your system delivers the capabilities you want in general - are engineering responsibilities.
charcircuit · 1d ago
This post does not do a deep dive into technical architecture.
MangoToupe · 23h ago
The PM's role is to whip devs until the requirements are met. That seems apt here, even if the requirements make zero sense.
ramesh31 · 1d ago
Stop trying to treat these things as more than they are. Stop trying to be clever. These models are the single most complex things ever created by humans; the summation of decades of research, trillions in capex, and the untold countless hours of thousands of people smarter than you and I. You will not meaningfully add to their capabilities with some hacked together reasoning workflows. Work within the confines of what they can actually do; anything else is complete delusion.
sixo · 1d ago
This is a nonsensical opinion by a person who doesn't know what they're talking about, and probably didn't read the article.
These models are tools, and LLM products bundle these tools with other tools; 90% of UX amounts to bundling these well. The article here gives a great sense of what this takes.
dang · 1d ago
> This is a nonsensical opinion by a person who doesn't know what they're talking about, and probably didn't read the article.
Ok, but can you please make your substantive points without putting others down? Your comment would be fine without this bit.
https://news.ycombinator.com/newsguidelines.html
The AI bundling problem is over. The user interface problem is over. You won't need a UI for your apps in a few years, agents are going to drive _EVERYTHING_. If you want a display for some data, the agent will slap together a dashboard on the fly from a composable UI library that's easy to work with, all hot loaded and live-revised based on your needs.
bopbopbop7 · 1d ago
You must be an easy person to market to.
CuriouslyC · 1d ago
I use agents to do so much stuff on my computer, MCPs are easy to roll so you can give them whatever powers you want. Being able to just direct agents to do stuff on my computer via voice is amazing. The direct driving still sucks so they're not a general UI yet, and the models need to be a bit more consistent/smarter in general, but it'll be there very soon.
heyitsguay · 23h ago
What do you do with agents?
CuriouslyC · 23h ago
I use them as an intelligence layer over disk cleanup tools and to manage deployments/cloud configs; I have big repo organization workflows; they can manage my KDE system settings; I use them as editors on documents all over my filesystem (to add comments for revision, not to rewrite - that's not consistent enough); and I use them to do deep research on topics and save reports, and to look at my Google Analytics and SEO data and suggest changes to my pages. Frankly, if I had my druthers I wouldn't use a mouse: the agent would use visual tracking (eye/hand) along with words and body language to just quickly figure out what I want.
LtWorf · 14h ago
> they can manage my KDE system settings
Why do you even have KDE installed if AI has replaced GUIs?
antonvs · 18h ago
You’re saying you’ve found a useful assistant for menial tasks. That’s not consistent with the strong claims you were making upthread.
CuriouslyC · 17h ago
My claim is that the "useful assistant for menial tasks" is the Wright brothers' flyer to what we'll have in a few years. If you have voice chat with an agent on your phone that can just do everything you'd need an app for, what's the point of an app? And it's gonna happen, because if your app doesn't let people's agents handle their business and your competitors' do, people are gonna switch if they can. The computer interfaces of the future are going to be made for agents first.
antonvs · 2h ago
> My claim is that the "useful assistant for menial tasks" is the Wright brothers flyer to what we'll have in a few years.
I agree with that.
But what you originally wrote was, "The AI bundling problem is over. The user interface problem is over." It would probably make more sense to say "...will be over."
People tend to be sensitive to those kinds of claims because there's a lot of hype around all this at the moment. So when people seem to imply that what we have right now is much more capable than it actually is, there tends to be pushback.
ares623 · 22h ago
The Juicero moment for software
CuriouslyC · 22h ago
Tell me you don't want to go hands free and have the star trek computer do everything for you. We could be there in ~5 years.
bopbopbop7 · 21h ago
We also could have warp drives next year!
CuriouslyC · 21h ago
Except that the main blocker on the star trek computer is the hooks we wire into the agent to manage the computer. Current-gen models are almost smart enough, though their long-context support and ability to use tools are a little shaky in general (I have walked a lot of agents through using tools; correct shell command use needs more RL for sure). None of this requires outlandish advances; it's all just the natural progression of the track we're on.
bopbopbop7 · 21h ago
You’re either a decent troll, or absolutely delusional.
bluefirebrand · 14h ago
I genuinely do not want this, it sounds like shit
tomrod · 23h ago
I won't use agents for everything. Why would I expect every task to use agents? This is like saying everything is on the web. No, there is a substantial number of things on the web, but not everything.
anuramat · 23h ago
why would anyone want more non-determinism than absolutely necessary?
alehlopeh · 1d ago
Who maintains that UI library? Or does the AI create it on the fly too? Why even bother with a library at that point? Just do a bespoke implementation.
CuriouslyC · 1d ago
The library will exist to maintain high quality/consistency and reduce load times. Also, it's faster to generate a page with parameterized components than to recreate all the components. It's a win all around from an engineering perspective, and nobody has to maintain them, there could be an artifact registry where people publish their components and you or AI can just select nice ones for the given use case.
alehlopeh · 22h ago
Why are people publishing their components when the UI problem is over and no one builds UIs anymore?
CuriouslyC · 21h ago
A widget != a UI. I don't need a stripe app, but things like visualizations are still useful. I want to be able to pull up a graph of my sales on stripe over the last 72 hours using a specific type of plot, cross referenced with my promotions in a dashboard side by side with consistent colors so it's easy to scan. The agent will be able to pull high quality plots of the right type that theme according to my preferences and slot into my dashboard neatly, and I won't have to hassle with stripe or my adtech or analytics or any of that except to configure the agent.
tomrod · 23h ago
Contrary to my other comment, I 100% agree to this.
tomrod · 1d ago
I have a hard time determining whether you are supporting or critiquing the article. I'm 60% confident it's a critique (I jest - a play on the content :) ).