A few weeks ago, I processed a product refund with Amazon via its agent. It was simple, straightforward, and surprisingly obvious that a language model was behind it, based on how it responded to my frustration at being asked tons of questions. But in the end, it processed my refund without ever connecting me with a human being.
I don't know whether Amazon relies on LLMs or SLMs for this and similar interactions, but it makes tons of financial sense to use SLMs for narrowly scoped agents. In use cases like customer service, most of the intelligence behind an LLM is wasted on the task the agent is built for.
Wouldn't surprise me if down the road we start suggesting role-specific SLMs rather than general LLMs as both an ethics- and security-risk mitigation too.
automatic6131 · 1h ago
You can (used to?) get a refund on Amazon with a normal CRUD app flow. Putting an SLM and a conversational interface over it is a backwards step.
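The whole thing is a couple of deterministic calls against the order record. A rough sketch of what that looks like (endpoint and field names made up, not Amazon's):

    # Hypothetical refund flow as a plain CRUD endpoint -- no model involved.
    from dataclasses import dataclass

    @dataclass
    class Order:
        id: str
        status: str          # e.g. "delivered"
        refundable: bool

    def request_refund(order: Order, reason: str) -> str:
        # Deterministic eligibility check instead of a conversation.
        if not order.refundable:
            return "not_eligible"
        order.status = "refund_pending"
        # ...enqueue the refund with the payments service here...
        return "refund_pending"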
torginus · 1h ago
I just had my first experience with a customer service LLM. I needed to get my account details changed, and for that I needed to use the customer support chat.
The LLM told me what information they needed and what the process was, and I followed through the whole thing.
After I went through it all, it reassured me that everything was in order and my request was being processed.
For two weeks, nothing happened. I emailed the (human) support staff, and they responded that they could see no such request in their system. Turns out the LLM had hallucinated the entire customer flow and was just spewing BS at me.
dotancohen · 45m ago
This is reason number two why I always request the service ticket number.
Reason number one being that when the rep feels you are going to hold them accountable, to the point of requesting such a number, they figure you're not the type of client to pull shenanigans with. Maybe they suspect me of being a corporate QC agent? Either way, requesting such a number demonstrably reduces friction.
ttctciyf · 1h ago
There really should be some comeback for this type of enshAItification.
We're supposed to think "oh it's an LLM, well, that's ok then"? A question we'll be asking more frequently as time goes on, I suspect.
exe34 · 1h ago
That's why I take screenshots of anything that I don't get an email confirmation for.
quietbritishjim · 41m ago
Air Canada famously lost a court case recently (though the actual interaction happened in 2022) after its chatbot promised a discount the airline didn't actually offer. They tried to argue that the chatbot was a "separate legal entity that is responsible for its own actions"!! It still took that person a court case and countless hours to get the discount, so it's hardly a victory really.
https://www.bbc.co.uk/travel/article/20240222-air-canada-cha...
This is why law in its current form is wrong in every country and jurisdiction.
We need "cumulative cases" that work like this: you submit your complaint to an existing cumulative case or open a new one, and these are vetted by prosecutors.
They accumulate evidence over time, and once the claims add up to a respectable sum, a court case is opened (paid for by the corporation). Everyone receives what they are owed if/when the case is won. If the case loses, is appealed, and loses again, that cumulative case is banned.
Cumulative cases would have greater repercussions for large corporate entities than "single person goes to court for several months to fight for a $40 discount".
And the people who rightfully complained eventually get a nice surprise in their bank accounts.
iagooar · 36m ago
I think that part of the beauty of LLMs is their versatility in so many different scenarios. When I build my agentic pipeline, I can plug in any of the major LLMs, add a prompt to it, and have it go off to do its job.
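Concretely, the pipeline only has to code against one tiny interface. A minimal sketch (the Protocol and the commented-out wrappers are illustrative, not any particular SDK):

    from typing import Protocol

    class ChatModel(Protocol):
        def complete(self, system: str, user: str) -> str: ...

    class AgentStep:
        # One step of the pipeline; any ChatModel can be plugged in.
        def __init__(self, model: ChatModel, prompt: str):
            self.model = model
            self.prompt = prompt

        def run(self, task: str) -> str:
            return self.model.complete(system=self.prompt, user=task)

    # Swapping providers is then a one-line change (wrappers hypothetical):
    # step = AgentStep(OpenAIChat("gpt-4o"), prompt)
    # step = AgentStep(LocalSLM("phi-3-mini"), prompt)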
Specialized, fine-tuned models sit somewhere in between LLMs and traditional procedural code. The fine-tuning process takes time and is a risk if it goes wrong. In the meantime, the LLMs by major providers get smarter every day.
Granted, latency and cost are real concerns. But unless you have a very specific task performed at huge scale, you might be better off using an off-the-shelf LLM.
flowerthoughts · 1h ago
No mention of mixture-of-experts. Seems related. They do list a DeepSeek R1 distillate as an SLM. The introduction starts with a sales pitch, and there's a call-to-action at the end. This seems like marketing with source references sprinkled in.
That said, I also think the "Unix" approach to ML is right. We should see more splits; however, currently all these tools rely on great language comprehension. Sure, we might be able to train a model on only English and delegate translation to another model, but that will certainly lose (much-needed) color. So if all of these agents will need comprehensive language understanding anyway, to be able to communicate with each other, is an SLM really better than MoE?
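For concreteness, the thing MoE buys you is that only a few small experts actually run per token. A toy sketch of top-k gating (toy shapes, nothing like a production router):

    import numpy as np

    def moe_forward(x, gate_w, experts, k=2):
        # Route one token to its top-k experts; only those k ever run.
        logits = gate_w @ x                        # one score per expert
        top = np.argsort(logits)[-k:]              # indices of the k best
        weights = np.exp(logits[top] - logits[top].max())
        weights /= weights.sum()                   # softmax over chosen experts
        return sum(w * experts[i](x) for w, i in zip(weights, top))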
What I'd love to "distill" out of these models is domain knowledge that is stale anyway. It's great that I can ask Claude to implement a React component, but why does the model that can do taxes so-so also try to write a React component so-so? Perhaps what's needed is a search engine to find agents. Now we're into expensive marketplace-subscription territory, but that's probably viable for companies. It'll create a larger us-them chasm, though, and the winner takes it all.
mg · 1h ago
I wonder how the math turns out when we compare the energy use of local vs remote models from first principles.
A server needs energy to build, house, power, and maintain it. It is optimized for throughput and can be used close to 100% of the time. To use the server, additional energy is needed to send packets through the internet.
A local machine needs energy to build and power it. If it lives inside a person's phone or laptop, one could say housing and maintenance are free. It is optimized to have a nice form factor for personal use, and it is used maybe 10% of the time or so. No energy for internet packets is needed when using the local machine.
My initial gut feeling is that the server will have far better energy efficiency in terms of how many calculations it can do over its lifetime versus how much energy it consumes over that lifetime. But I would love to see the actual math.
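A back-of-envelope version, where every number is an assumption to argue with rather than a measurement:

    # Every figure below is a rough assumption, not a measurement.
    server_power_w = 700      # one datacenter accelerator under load
    server_tokps   = 5000     # tokens/s across heavily batched requests
    laptop_power_w = 30       # laptop package power while generating
    laptop_tokps   = 20       # one local user, small model

    server_jpt = server_power_w / server_tokps    # joules per token, active
    laptop_jpt = laptop_power_w / laptop_tokps

    print(f"server: {server_jpt:.2f} J/token, laptop: {laptop_jpt:.2f} J/token")
    # ~0.14 vs ~1.5 J/token: batching gives the server roughly 10x here,
    # before counting embodied energy, idle time, and network overhead --
    # exactly the terms the real math would need to pin down.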
danhor · 49m ago
As the local machine is there anyway, only the increase in its energy usage should be counted, while the server exists solely for this use case (its cost spread across all users).
The local machine is usually also highly constrained in computing power, energy (when battery-powered), and thermals, so I would expect the compute spent to be very different. The remote user will happily choose a large(r) model, while for the local use case a highly optimized (small) model will be chosen.
rayxi271828 · 1h ago
Wonder what I'm missing here. A smaller number of repetitive tasks - that's basically just simple coding + some RPA sprinkled on top, no?
Once you've settled on a few well-known paths of action, wouldn't you want to freeze those paths and make them 100% predictable, for the most part?
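Something like a router where the frozen paths are plain code and the model is only the fallback (all names hypothetical):

    def process_refund(req):  return f"refund queued: {req}"     # frozen path
    def update_address(req):  return f"address updated: {req}"   # frozen path
    def slm_fallback(req):    return f"(model handles: {req})"   # hypothetical SLM call

    HANDLERS = {"refund": process_refund, "address_change": update_address}

    def route(intent: str, req: str) -> str:
        # Known intents stay 100% predictable; only the long tail hits the model.
        handler = HANDLERS.get(intent)
        return handler(req) if handler else slm_fallback(req)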
janpmz · 3h ago
One could start with a large model for exploration during development, and then distill it down to a small model that covers the variety of the task and fits on a USB drive. E.g., when I use a model for gardening purposes, I could prune away knowledge about other topics.
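The distillation step itself is conceptually small. A toy sketch of the usual soft-label loss (temperature and the numpy framing are illustrative, not a recipe):

    import numpy as np

    def softmax(z, T=1.0):
        e = np.exp(z / T - (z / T).max())
        return e / e.sum()

    def distill_loss(teacher_logits, student_logits, T=2.0):
        # KL(teacher || student) on temperature-softened distributions.
        # Train only on in-domain (gardening) prompts, and everything
        # off-topic simply never gets taught to the student.
        p = softmax(teacher_logits, T)
        q = softmax(student_logits, T)
        return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)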
dotancohen · 17m ago
What would you need an LLM for while gardening? I'm imagining problem solving, like asking "what worm looks like a small horse hair?". But that would require the LLM to know what a horse hair is. In other words, not a distilled model, but rather a model that contains pretty much anything our gardener's imagination might make analogies out of.
loktarogar · 3h ago
Pruning is exactly what you're looking for in a gardening SLM
moqizhengz · 1h ago
How can SLMs be the future of AI when we are not even sure whether LMs are the future of AI?