Canaries in the Coal Mine? Recent Employment Effects of AI [pdf]
67 points by p1esk | 8/28/2025 | 52 comments | digitaleconomy.stanford.edu
Some nits I'd pick along those lines:
>For instance, according to the most recent AI Index Report, AI systems could solve just 4.4% of coding problems on SWE-Bench, a widely used benchmark for software engineering, in 2023, but performance increased to 71.7% in 2024 (Maslej et al., 2025).
Something like this should have the context that SWE-Bench didn't exist before November 2023.
Pre-2023 systems were flying blind with regard to what they were going to be tested with. Post-2023 systems have been created in a world where this test exists. Hard to generalize from before/after performance.
> The patterns we observe in the data appear most acutely starting in late 2022, around the time of rapid proliferation of generative AI tools.
This is quite early for "replacement" of software development jobs: by their own prior statement/citation, the tools were only hitting that 4.4% task success rate even a year later, when SWE-Bench was introduced.
Its timing lines up more neatly with the post-COVID-bubble tech industry slowdown. Or with the start of hype about AI productivity vs. actually replaced employee productivity.
Given the absurdly common malpractice(1) of training LLMs on/for tests, i.e. what you could describe as training on the test set, any widely used/industry-standard test to evaluate LLMs is not really worth half of what it claims to be.
(1): Which is at least half intent, but also to some degree accident, since web scraping, model cross-training, etc. have a high chance of accidentally sneaking in test data.
In the end you have to have your own tests to evaluate agent/LLM performance, and worse, you have to not make them public out of fear of scientific malpractice rendering them worthless. Tbh, that is a pretty shitty situation.
You're probably working at a domestic company, which usually pays less than offshored jobs at a large transnational (and domestic companies in, say, Russia were paying significantly less than the offshored ones). I don't think many companies do significant offshoring into Western Europe though.
But with progress continuing in the models, too, it's an even more complicated affair.
While this is true, there are ways to test (open models) on tasks created after the model was released. We see good numbers there as well, so something is generalising there.
That's an opinion many disagree with. As a matter of fact, the only (limited) study to date showed that LLM usage decreases productivity for experienced developers by roughly 19%. Let's reserve opinions and link studies.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
My anecdotal experience, for example, is that LLMs are such a negative drain on both time and quality that one has to be really early in their career to benefit from their usage.
1. Converting exported data into a suitable import format based on a known schema
2. Creating syntax highlighting rules for a language not natively supported in a Typst report
Both situations didn't have an existing solution, and while the outputs were not exactly correct, they only needed minor adjustments.
Any other situation, I'd generally prefer to learn how to do the thing, since understanding how to do something can sometimes be as important as the result.
Seems about right when trying to tell an LLM what to code. But flipping the script, letting the LLM tell you what to code, productivity gains seem much greater. Like most programmers will tell you: Writing code isn't the part of software development that is the bottleneck.
Having it tell you what code to write is a bit crazy, except maybe for tedious test case variations.
But asking an LLM questions about a well established domain you're not expert in is a fantastic use case. And very relevant for making software. In practice, most software requires you to understand the domain you're aiming to serve.
I do use Claude Code, though. The setup is mostly stock, but I have a hook that feeds the output of `ghciwatch` back into Claude directly after editing. I think this helps.
- I find the code quality to be so-so. It is much more into if-then-else than I'd like, and the style is too yolo for my liking.
- I don't rely on it for making architectural decisions. We do discuss when I'm unsure though.
- I do not use it for critical things such as data migrations. I find that the errors it makes are easy to miss, and not ones I would make myself.
- I let it build "leaves" that are not so sensitive more freely.
- If you define the tasks well with types then it works fairly well.
- Claude is very prone to writing tests that test nothing. Last week it wrote a test that put 3 tuples with strings in a list and checked the length of the list and that none of the strings were empty (something like the sketch after this list). A slight overfit on untyped languages :)
- In my experience, the uplift from Opus vs Sonnet is much larger when doing Haskell than JS/Python.
- It matters a lot if the project is well structured.
- I think there is plenty of room to improve with better setup, even without models changing.
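To illustrate the kind of "test that tests nothing" described above, here is a hypothetical Hspec sketch (made-up names, not the commenter's actual code): it only asserts facts about the literal fixture data it just constructed and never calls any code under test.

    module Main (main) where

    import Test.Hspec

    main :: IO ()
    main = hspec spec

    spec :: Spec
    spec = describe "user records" $
      it "builds three tuples and checks the obvious" $ do
        -- The "test": construct literal data, then assert facts about those literals.
        let users = [ (1 :: Int, "alice", "admin")
                    , (2,        "bob",   "user")
                    , (3,        "carol", "user")
                    ]
        length users `shouldBe` 3
        all (\(_, name, role) -> not (null name) && not (null role)) users
          `shouldBe` True
        -- No production code is ever exercised, so nothing is actually tested.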
The quality of the Haskell code is about as good as I would have written myself, though I think it falls for primitive obsession more than I would. Still, I can add those abstractions myself after the fact.
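For readers unfamiliar with the term: "primitive obsession" means passing bare Ints and Strings around where a dedicated type would do. A minimal Haskell sketch of the kind of after-the-fact abstraction meant here (hypothetical names, not the commenter's code):

    -- Wrapping bare primitives in newtypes after the fact,
    -- so GHC catches mixed-up arguments instead of a reviewer.
    newtype UserId = UserId Int    deriving (Eq, Show)
    newtype Email  = Email  String deriving (Eq, Show)

    -- Before: sendWelcome :: Int -> String -> IO ()
    -- After: argument-order mistakes become type errors.
    sendWelcome :: UserId -> Email -> IO ()
    sendWelcome (UserId uid) (Email addr) =
      putStrLn ("welcome mail to user " ++ show uid ++ " at " ++ addr)

    main :: IO ()
    main = sendWelcome (UserId 42) (Email "a@example.com")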
Maybe one of the reasons I'm getting good results is because the LLM effectively has to argue with GHC, and GHC always wins here.
I've found that it's a superpower also for finding logic bugs that I've missed, and for writing SQL queries (which I was never that good at).
Claude Code is nice because it is just a separate CLI tool that doesn't force you to change editor etc. It can also research things for you, make plans that you can iterate on before letting it loose, etc.
Claude is also better than ChatGPT at writing Haskell, in my experience.
They are not great if your tasks are not well defined. Sometimes they surprise you with great solutions, sometimes they produce a mess that just wastes your time and deviates from your mission.
To me, LLMs have been great accelerants when you know what you want and can define it well. Otherwise, they can waste your time by creating a lot of code slop that you will have to rewrite anyway.
One huge positive side effect: when you create a component (i.e. UI, feature, etc.), you often need a setup to test it — view controllers, data — which is very boring, annoying, and time-wasting to deal with. An LLM can do that for you within seconds (even creating mock data), and since this is mostly test code, it doesn't matter if the code quality is not great; it just matters to get something on the screen to test the real functionality. AI/LLMs have been a huge time saver for this part.
When it's a problem lots of people banged their head against and wrote posts about similar solutions, that makes for good document-prediction. But maybe we should've just... removed the pain-point.
It's no surprise to me that devs who are accustomed to working on one thing at a time due to fast feedback loops have not learned to adapt to parallelizing their work (something that has been demonized at agile-style organizations), and instead sit and wait on agents and start watching YouTube, as the study found (productivity hits were due to the participants looking at fun non-work stuff instead of attempting to parallelize any work).
The study reflects usage of emergent tools without training, and with regressive training on previous generation sequential processes, so I would expect these results. If there is any merit in coordinating multiple agents on slower feedback work, this study would not find it.
If the study showed that experienced developers suffered a negative performance impact while using an LLM, maybe where LLMs shine is with junior developers?
Until a new study that shows otherwise comes out, it seems the scientific conclusion is that junior developers, the ones with the skill issues, benefit from using LLMs, while more experienced developers are impacted negatively.
I look forward to any new studies that disprove that, but for now it seems settled. So you were right: it might indeed be a skills issue if LLMs help a developer, and if they do, it might be that the dev is early in their career. Do LLMs help you, out of curiosity?
For example, everyone now writes emails with perfect grammar in a fraction of the time. So now the expectation for emails is that they will have perfect grammar.
Or one can build an interactive dashboard to visualize their spreadsheet and make it pleasing. Again the expectation just changed. The bar is higher.
So far I have not seen productivity increase in dimensions with a direct line of sight to revenue. (Of course there is the niche of customer service, translation services, etc. that were already in the process of being automated.)
I had a conversation with my manager about the implications of everyone using AI to write/summarise everything. The end result will most likely be staff getting Copilot to generate a report, then their manager using Copilot to summarise the report and generate a new report for their manager, ad infinitum.
Eventually all context is lost, busywork is amplified, and nobody gains anything.
Why not fire everyone in between the top-most manager and the actual "worker" doing the work, since the report could be generated with the correct level of summary?
https://www.cnbc.com/2025/08/27/google-executive-says-compan...
And absolutely bloody _hideous_ style, if they are using our friends the magic robots to do this.
You do not need to build a spreadsheet visualiser tool; there are plenty of options that already exist and are free and open source.
I'm not against advances, I'm just really failing to see what problem was in need of solving here.
The only use I can get behind is the translation, which admittedly works relatively well with LLMs in general due to the nature of the work.
https://www.fool.com/investing/2024/11/29/this-magnificent-s...
Think like a forestry investor, not a cash crop next season.
(This isn’t unique to IT; this cyclical underinvest-shortage-panic pattern happens in a lot of industries.)
All to say we could have quite a bit more resilience as an economy, but we decided to sacrifice our leadership in these areas.