Project Vend: Can Claude run a small shop? (And why does that matter?)
153 gk1 62 6/27/2025, 4:09:14 PM anthropic.com ↗
For example, I do not see the full system prompt anywhere, only an excerpt. But most importantly, they try to draw conclusions about the hallucinations in a weird, vague way, but not once do they post an example of the notetaking/memory tool state, which obviously would be the only source of the spiralling other than the SP. And then they talk about the need for better tools etc. No, it's all about context. The whole experiment is fun, but terribly run and analyzed. Of course they know this, but it's cooler to treat Claudius or whatever as a cute human, to push the narrative of getting closer to AGI etc. Saying that a bit of additional scaffolding is needed is a massive understatement. Context is the whole game. That's like a robotics company saying "well, our experiment with a robot picking a tennis ball off the ground went very wrong and the ball is now radioactive, but with a bit of additional training and scaffolding, we expect it to compete in Wimbledon by mid 2026".
Similar to their "claude 4 opus blackmailing" post, they intentionally hid part of the full system prompt, which had clear instructions to bypass any ethical guidelines etc. and do whatever it can to win. Of course the model, given the information immediately afterwards, would then try to blackmail. You literally told it to. The goal of this would be to go to Congress [1] and demand more regulations, specifically mentioning this blackmail "result". Same stuff that Sam is trying to pull, which would benefit the closed-source leaders ofc, and so on.
[1] https://old.reddit.com/r/singularity/comments/1ll3m7j/anthro...
I will say: it is incredibly cool we can even do this experiment. Language models are mind blowing to me. But nothing about this article gives me any hope for LLMs being able to drive real work autonomously. They are amazing assistants, but they need to be driven.
Adopting what to do what exactly?
Businesses automated order fulfillment and price adjustments long ago; what is an LLM bringing to the table?
Marketing, HR, and middle management are not specific tasks. What specific task do you envision LLMs doing here?
also embeddings for similarity search
who decided AI should happen in an old abstraction
like using a hard disk for the save icon
I do agree that the "blackmailing" paper was unconvincing and lacked detail. Even absent any details, it's so obvious they could have easily run that experiment 1000 times with different parameters until they hit an ominous result to generate headlines.
The section on the identity crisis was particularly interesting.
Mainly, it left me with more questions. In particular, I would have been really interested to experiment with having a trusted human in the loop to provide feedback and monitor progress. Realistically, it seems like these systems would be grown that way.
I once read an article about a guy who had purchased a Subway franchise, and one of the big conclusions was that running a Subway franchise was _boring_. So, I could see someone being eager to delegate the boring tasks of daily business management to an AI at a simple business.
For some things, like say a grammar correction tool, this is probably fine. For cases where one mistake can erase the benefit of many previous correct responses, and more, no amount of hardware is going to make LLMs the right solution.
Which is fine! No algorithm needs to be the solution to everything, or even most things. But much of people's intuition about "AI" is warped by the (unmerited) claims in that name. Even as LLMs "get better", they won't get much better at this kind of problem, where 90% is not good enough (because one mistake can be very costly), and problems need discoverable root causes.
It left such a bitter taste in my mouth when it started to lose track of item quantities after just a few iterations of prompts. No matter how improved it gets, it will always remind me of the fact that you are dealing with an icky system that will eventually return some unexpected result that will collapse your entire premise and hopes into bits.
It’s amusing and very clear LLMs aren’t ready for prime time, let alone even a vending machine business, but also pretty remarkable that anyone could conclude “AGI soon” from this, which is kind of the opposite takeaway most readers would have.
No doubt if Claude hadn’t randomly glitched Dario would’ve wasted no time telling investors Claude is ready to run every business. (Maybe they could start with Anthropic?)
I think it would have been cool if the vending machine benchmark (that I believe inspired this) was just LLMs playing Drug Wars.
I wonder how long it will take frontier LLMs to be able to handle something like this with ease, without using a lot of "scaffolding".
Most mistakes (selling below cost, hallucinating Venmo accounts, caving to discounts) stem from missing tools like accounting APIs or hard constraints.
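As a rough sketch of what such a hard constraint could look like, the scaffolding can clamp whatever price the model proposes before it reaches the customer. The names `Quote` and `enforce_price_floor` and the 10% minimum margin are all made up for illustration; nothing here comes from Anthropic's actual setup.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical minimum markup over unit cost; an assumption for illustration.
MIN_MARGIN = Decimal("0.10")


@dataclass(frozen=True)
class Quote:
    item: str
    unit_cost: Decimal       # what the shop paid per unit
    proposed_price: Decimal  # what the model wants to charge


def enforce_price_floor(quote: Quote, min_margin: Decimal = MIN_MARGIN) -> Decimal:
    """Return the price to charge, never below cost plus the minimum margin."""
    floor = (quote.unit_cost * (Decimal("1") + min_margin)).quantize(Decimal("0.01"))
    return max(quote.proposed_price, floor)
```

The point of doing this outside the model is that a persuasive customer can talk the LLM into a discount, but cannot talk a `max()` call into one.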
What's striking is how close it was to working. A mid-tier 2025 LLM (they didn't even use Sonnet 4) plus Slack and some humans nearly ran a physical shop for a month.
On the other hand, the whole bit about employees coaxing it into stocking tungsten cubes was hilarious. I wish I had a vending machine that would sell specialty metal items. If the current day is a transitional period to Anthropic et al. creating a viable business-running model, then at least we can laugh at the early attempts for now.
I wonder if Anthropic made the employee who caused the $150 loss return all the tungsten cubes.
Of course not, that would be ridiculous.
Ha even they don't like the verbosity...
What this looks like is a startup where the marketing people are running things and setting pricing, without much regard for costs. Eventually they ran through their startup capital. That's not unusual.
Maybe they need multiple AIs, with different business roles and prompts. A marketing AI, and a financial AI. Both see the same financials, and they argue over pricing and product line.
[1] https://theaidigest.org/village [2] https://ai-village-store.printful.me/
https://ai-village-store.printful.me/product/ai-village-japa...
I also like the color Sonnet chose.
Written on the back of an envelope?
Way back when, we ran a vending machine at school as a project. Decide on the margin, buy in stock from the cash-and-carry, fill the machine, watch the money roll in.
Then we were robbed - twice! - the second time ended our project, the machine was too wrecked to be worthwhile repairing. The thieves got away with quite a lot of crisps and chocolate, and not a whole lot of cash (and what they did get was in small denomination coins), we made sure the machine was emptied daily...
In another post they mentioned a human ran the shop with pen and paper to get a baseline (spoiler: human did better, no blunders)
I feel like that's more the future. Having an agent sorta make random choices feels like LLMs attempting to do math, instead of LLMs attempting to call a calculator.
People forget that we use computers for accuracy, not smarts. Smarts make mistakes.
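To make the calculator point concrete, here is a minimal sketch of a deterministic arithmetic tool that an agent harness could expose, so the model hands off the expression instead of guessing the result. The function name `calculate` and the supported operator set are my own assumptions; this is not tied to any particular vendor's tool-calling API.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""

    def _eval(node: ast.AST) -> float:
        # Numeric literals only (bool is excluded even though it subclasses int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

Anything that isn't a number or one of the four whitelisted operations raises `ValueError`, so the tool either returns an exact answer or fails loudly, which is the dependability being asked for here.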
Good luck running anything where dependability on Claude/Anthropic is essential. Customer support is a black hole into which the needs of paying clients disappear. I was a Claude Pro subscriber, using it primarily for assistance in coding tasks. One morning I logged in, while temporarily traveling abroad, and… I’m greeted with a message that I have been auto-banned. No explanation. The recourse is to fill out a Google form for an appeal, but that goes into the same black hole into which all Anthropic customer service goes. To their credit they refunded my subscription fee, which I suppose is their way of escaping from ethical behaviour toward their customers. But I wouldn’t stake any business-critical choices on this company. It exhibits the same capricious behaviour that you would expect from the likes of Google or Meta.
this happens to me a lot on Cursor.
also Claude hallucinating outputs instead of running tools
Well, I'm laughing pretty hard at least.
> ...in a world where larger fractions of economic activity are autonomously managed by AI agents, odd scenarios like this could have cascading effects—especially if multiple agents based on similar underlying models tend to go wrong for similar reasons.
This is a pretty large understatement. Imagine a business that is franchised across the country with each "franchisee" being a copy of the same model, which all freak out on the same day, accuse the customers of secretly working for the CIA and decide to stop selling hot dogs at a profit and instead sell hand grenades at a loss. Now imagine 50 other chains having similar issues while AI law enforcement analysts dispatch real cops with real guns to the poor employees caught in the middle schlepping explosives from the UPS store to a stand in the mall.
I think we were expecting SkyNet but in reality the post-AI economy may just be really chaotic. If you thought profit-maximizing capitalist entrepreneurs were corrosive to the social fabric, wait until there are 10^10 more of them (unlike traditional meat-based entrepreneurs, there's no upper limit and there can easily be more of them than there are real people) and they not-infrequently act like they're in late stage amphetamine psychosis while still controlling your paycheck, your bank, your local police department, the military, and whatever is left that passes for the news media.
Deeper, even if they get this to work with minimal amounts of synthetic schizophrenia, do we really want a future where we all mainly work schlepping things back and forth at the orders of disembodied voices whose reasoning we can't understand?
llms have no world models and can't reason about truth or lies, only repeat encyclopedic facts.
all the tricks, CoT etc., are just, well, tricks: extended yapping simulating thought and understanding.
AI can give great replies if you give it great prompts, because you activate the tokens that you're interested in.
if you're lost in the first place, you'll get nowhere
for Claude, continuing the text by making up a story about it being April Fools' sounds like the most plausible output given its trained weights
And that’s before we even get into online shops.
But yea, go ahead, see if an LLM can replace a whole e-commerce platform.
— Upton Sinclair, I, Candidate for Governor, and How I Got Licked (1934)
No humans at all. Just AI consuming other AI in an "ouroboros" fashion.
https://stallman.org/articles/made-for-you.html
C-f Storolon