LLMs are very useful tools for software development, but focusing on employment numbers doesn't really get at whether they will automate or augment labor (to use the paper's words). Behaviors are changing not just because of outcomes but because of hype, expectations, and B2B sales. You'd expect the initial corporate behaviors to look much the same whether or not LLMs turn into fully fire-and-forget employee-replacement tools.
Some nits I'd pick along those lines:
>For instance, according to the most recent AI Index Report, AI systems could solve just 4.4% of coding problems on SWE-Bench, a widely used benchmark for software engineering, in 2023, but performance increased to 71.7% in 2024 (Maslej et al., 2025).
Something like this should have the context that SWE-Bench did not exist before late 2023.
Pre-2023 systems were flying blind with regard to what they were going to be tested on, while post-2023 systems were created in a world where this test exists. It's hard to generalize from before/after performance.
> The patterns we observe in the data appear most acutely starting in late 2022, around the time of rapid proliferation of generative AI tools.
This is quite early for "replacement" of software development jobs: by their own prior statement/citation, the tools were only hitting that 4.4% task success rate a year later, when SWE-Bench was introduced.
Its timing lines up more neatly with the post-COVID-bubble tech industry slowdown, or with the start of hype about AI productivity, as opposed to actual productivity from replaced employees.