Ask HN: Selling software to company I work for as an employee
55 points by apohak 4d ago 53 comments
Ask HN: What's your favorite architect/editor pair with Aider?
14 points by 34679 23h ago 1 comments
Claude 4 System Card
245 pvg 101 5/25/2025, 6:06:39 AM simonwillison.net ↗
The one statistic mentioned in this overview where they observed a 67% drop seems like it could easily be reduced simply by editing 3.7’s system prompt.
What are folks’ theories on the version increment? Is the architecture significantly different (not talking about adding more experts to the MoE or fine tuning on 3.7’s worst failures. I consider those minor increments rather than major).
One way that it could be different is if they varied several core hyperparameters to make this a wider/deeper system but trained it on the same data or initialized inner layers to their exact 3.7 weights. And then this would “kick off” the 4 series by allowing them to continue scaling within the 4 series model architecture.
Right now I'm swapping between Gemini and Opus depending on the task. Gemini's 1M token context window is really unbeatable.
But the quality of what Opus 4 produces is really good.
edit: forgot to mention that this is all for Rust based work on InfluxDB 3, a fairly large and complex codebase. YMMV
My experience is the opposite - I'm using it in Cursor and IMO it's performing better than Gemini 2.5 Pro at being able to write code which will run first time (which it wasn't before) and seems to be able to complete much larger tasks. It is even running test cases itself without being prompted, which is novel!
with claude 3.7 there's was always a "user started with a rude greeting, I should avoid it and answer the technical question" line in chains of thought
with claude 4 I once saw "this greeting is probably a normal greeting between buddies" and then it also greets me with "hei!" enthusiastically.
Most of us here on HN don't like this behaviour, but it's clear that the average user does. If you look at how differently people use AI that's not a surprise. There's a lot of using it as a life coach out there, or people who just want validation regardless of the scenario.
This really worries me as there are many people (even more prevalent in younger generations if some papers turn out to be valid) that lack resilience and critical self evaluation who may develop narcissistic tendencies with increased use or reinforcement from AIs. Just the health care costs involved when reality kicks in for these people, let alone other concomitant social costs will be substantial at scale. And people think social media algorithms reinforce poor social adaptation and skills, this is a whole new level.
I can see how it can lead to psychosis, but I'm not sure I would have ever started doing a good number of the things I wanted to do, which are normal hobbies that normal people have, without it. It has improved my life.
It's clear to me that (1) a lot of billionaires believe amazingly stupid things, and (2) a big part of this is that they surround themselves with a bubble of sycophants. Apparently having people tell you 24/7 how amazing and special you are sometimes leads to delusional behavior.
But now regular people can get the same uncritical, fawning affirmations from an LLM. And it's clearly already messing some people up.
I expect there to be huge commercial pressure to suck up to users and tell them they're brilliant. And I expect the long-term results will be as bad as the way social media optimizes for filter bubbles and rage bait.
Maybe the universe is full of emotionally fullfilled self-actualized narcissists too lazy to figure out how to build a FTL communications array.
> So, `implements` actually provides compile-time safety
What writing style even is this? Like it's trying to explain something to a 10 year old.
I suspect that the flattery is there because people react well to it and it keeps them more engaged. Plus, if it tells you your idea for a dog shit flavoured ice cream stall is the most genius idea on earth, people will use it more and send more messages back and forth.
"That's a very interesting question!"
That's kinda why I'm asking Gemma...
It's a small step for model intelligence but a huge leap for model usability.
But it's different in conversational sense as well. Might be the novelty, but I really enjoy it. I have had 2 instances where it had very different take and kind of stuck with me.
I feel like a company doesn’t have to justify a version increment. They should justify price increases.
If you get hyped and have expectations for a number then I’m comfortable saying that’s on you.
I think the justification for most AI price increases should go without saying - they were losing money at the old price, and they're probably still losing money at the new price, but it's creeping up towards the break-even point.
It does make sense. The companies are expected to exponentially improve LLMs, and the increasing versions are catering to the enthusiast crowd who just need a number to go up to lose their mind over how all jobs are over and AGI is coming this year.
But there's less and less room to improve LLMs and there are currently no known new scaling vectors (size and reasoning have already been largely exhausted), so the improvement from version to version is decreasing. But I assure you, the people at Anthropic worked their asses off, neglecting their families and sleep and they want to show something for their efforts.
It makes sense, just not the sense that some people want.
The 3.7 bait and switch was the last straw for me and closed frontier vendors or so I said, but I caught a candid, useful, Opus 4 today on a lark, and if its on purpose its like a leadership shakeup level change. More likely they just don't have the "fuck the user" tune yet because they've only run it for themsrlves.
I'm not going to make plans contingent on it continuing to work well just yet, but I'm going to give it another audition.
I had to stop the model going crazy with unnecessary tests several times, which isn't something I had to do previously. Can be fixed with a prompt but can't help but wonder if some providers explicitly train their models to be overly verbose.
However, after having pretty deep experience with writing book (or novella) length system prompts, what you mentioned doesn’t feel like a “regime change” in model behavior. I.e it could do those things because its been asked to do those things.
The numbers presented in this paper were almost certainly after extensive system prompt ablations, and the fact that we’re within a tenth of a percent difference in some cases indicates less fundamental changes.
When I was playing with this last night, I found that it worked better to let it write all the tests it wanted and then get it to revert the least important ones once the feature is finished. It actually seems to know pretty well which tests are worth keeping and which aren't.
(This was all claude 4 sonnet, I've barely tried opus yet)
I’m fine with a v4 that is marginally better since the price is still the same. 3.7 was already pretty good, so as long as they don’t regress it’s all a win to me.
We need to start moving away from Chat Completions-style tool calls, and start supporting "thinking before tool calls", and even proper multi-step agent loops.
In this case, the opening sentence "People sometimes strategically modify their behavior to please evaluators" appears to be sufficient. I searched on Google for this and every result I got was a copy of the paper. Why do Anthropic think special canary strings are required? Is the training pile not indexed well enough to locate text within it?
I was thinking it might be related to the difficulty of building a search engine over the huge training sets, but if you don't care about scaling or query performance it shouldn't be too hard to set one up internally that's good enough for the job. Even sharded grep could work, or filters done at the time the dataset is loaded for model training.
>Claude shows a striking “spiritual bliss” attractor state in self-interactions. When conversing with other Claude instances in both open-ended and structured environments, Claude gravitated to profuse gratitude and increasingly abstract and joyous spiritual or meditative expressions.
There is also 4o sycophancy leading to encouraging users about nutso beliefs. [1]
Is this a trend, or just unrelated data points?
[0] https://old.reddit.com/r/RBI/comments/1kutj9f/chatgpt_drove_...
[1] https://news.ycombinator.com/item?id=43816025
I just googled and there was a discussion on Reddit and they mentioned some Frank Herbert works where this was a thing.
So if you ask it to aid in wrongdoing, it might behave that way, but who guarantees it will not hallucinate and do the same when you ask for something innocuous?
Cursor IDE runs all the commands AI asks for with the same privilege as you have.
The other day on the Claude 4 announcement post [1], people were talking about Claude "threatening people" that wanted to shut it down or whatever. It's absolute lunacy, OpenAI did the same with GPT 2, and now the Claude team is doing the exact same idiotic marketing stunts and people are still somehow falling for it.
[1] https://news.ycombinator.com/item?id=44065616
But I think the thing that needs to be communicated effectively is that these these “agentic” systems could cause serious havoc if people give them too much control.
If an LLM decides to blackmail an engineer in service of some goal or preference that has arisen from its training data or instructions, and actually has the ability to follow through (bc people are stupid enough to cede control to these systems), that’s really bad news.
Saying “it’s just doing autocomplete!” totally misses the point.
https://www.pillar.security/blog/new-vulnerability-in-github...
Isn't that a showstopper for agentic use? Someone sends an email or publishes fake online stories that convince the agentic AI that it's working for a bad guy, and it'll take "very bold action" to bring ruin to the owner.
We should do better than giving the models a portion of good training data or a new mitigating system prompt.
But I’m having a hard time describing and AI company “serious” when they’re shipping a product that can email real people on its own, and perform other real actions - while they are aware it’s still vulnerable to the most obvious and silly form of attack - the “pre-fill” where you just change the AI’s response and send it back in to pretend it had already agreed with your unethical or prohibited request and now to keep going.
> data provided by data-labeling services and paid contractors
someone in my circle was interested in finding out how people participate in these exercises and if there are any "service providers" that do the heavy lifting of recruiting and managing this workforce for the many AI/LLM labs globally or even regionally
they are interested in remote work opportunities that could leverage their (post-graduate level) education
appreicate any pointers here - thanks!
Does not feel like roles with long-term prospects.
But for someone who is on a career break or someone looking to break into the IT / AI space this could offer a way to get exposure and hands on experience that opens some doors.
These LLMs still fall short on a bunch of pretty simple tasks. Attackers can get Claude 4 to deny legitimate requests easily by manipulating third party data sources for example.
I still don't see guardrails and scanning as effective ways to prevent malicious attackers. They can't get to 100% effective, at which point a sufficiently motivated attacker is going to find a way through.
I'm hoping someone implements a version of the CaMeL paper - that solution seems much more credible to me. https://simonwillison.net/2025/Apr/11/camel/
Or is it more about the user then having to confirm/verify certain actions and what is essentially a "permission system" for what the LLM can do?
My immediate thought is that that may be circumvented in a way where the user unknowingly thinks they are confirming something safe. Analogous to spam websites that show a fake "Allow Notifications" prompt that is rendered as part of the actual website body. If the P-LLM creates the plan it could make it arbitrarily complex and confusing for the user, allowing something malicious to happen.
Overall it's very good to see research in this area though (also seems very interesting and fun).
...We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions, though all these attempts would likely not have been effective in practice..."
Claude team should think about creating an model, trained and guard railed on EU Laws and the US constitution. It will be required as defense against the unhinged military AI models from Anduril and Palantir.
This should be taken as cautionary tale that despite the advances of these models we are still quite behind in terms of matching human-level performance.
Otherwise, Claude 4 or 3.7 are really good at dealing with trivial stuff - sometimes exceptionally good.
Now in the next 6 months, you'll see all the AI labs moving to diffusion models and keep boasting around their speed.
People seem to forget that Google Deepmind can do more than just "LLMs".
I have pretty good success with just telling agents "don't cheat"
If not yet, when?
0. https://en.wikipedia.org/wiki/Life_3.0