It's interesting to compare this with the Post Office scandal in the UK. Very different incidents, but reading this, there is arguably a root assumption that people made in both cases, which is that "the software can't be wrong". For developers, this is a hilariously silly assumption, but non-developers looking at it from the outside don't have the background or training to understand that software can be this fragile. They look at a situation like the Post Office scandal and think "Either this piece of software that we paid millions for, and that was developed by a bunch of highly trained engineers, is wrong, or these people are just ripping us off". Same thing with the Therac-25: the software had worked on previous models, and the rest of the company just had this unspoken assumption that it simply wasn't possible for anything to be wrong with it, so testing it specifically wasn't needed.
benrutter · 2h ago
> software quality doesn't appear because you have good developers. It's the end result of a process, and that process informs both your software development practices, but also your testing. Your management. Even your sales and servicing.
If you only take one thing away from this article, it should be this one! The Therac-25 incident is a horrifying and important part of software history. It's really easy to think type systems, unit testing and defensive coding can solve all software problems. They can definitely help a lot, but the real failure in the story of the Therac-25, from my understanding, is that it took far too long for incidents to be reported, investigated and fixed.
There was a great Cautionary Tales podcast about the device recently[0]. One thing mentioned was that, even aside from the catastrophic accidents, Therac-25 machines were routinely seen by users to show unexplained errors, but these issues never made it to the desk of someone who might fix them.
[0] https://timharford.com/2025/07/cautionary-tales-captain-kirk...
I was going to recommend that exact podcast episode but you beat me to it. Totally worth listening, especially if you're interested in software bugs.
Another interesting fact mentioned in the podcast is that the earlier (manually operated) version of the machine did have the same fault. But it also had a failsafe fuse that blew so the fault never materialized. Excellent demonstration of the Swiss Cheese Model: https://en.wikipedia.org/wiki/Swiss_cheese_model
AdamN · 1h ago
This is true, but there also need to be good developers. It can't just be great process and low-quality developer practices. There needs to be: 1/ high-quality individual processes (development being one of them), 2/ high-quality delivery mechanisms, 3/ feedback loops to improve that quality, 4/ out-of-band mechanisms to inspect and improve the quality.
Fr3dd1 · 1h ago
I would argue that a good process always has a good self-correction mechanism built in. This way, the work done by a "low quality" software developer (and this includes almost all of us at some point in time) is always taken into account by the process.
quietbritishjim · 1h ago
Right, but if everyone is low quality then there's no one to do that correction.
That may seem a bit hypothetical but it can easily happen if you have a company that systematically underpays, which I'm sure many of us don't need to think hard to imagine, in which case they will systematically hire poor developers (because those are the only ones that ever applied).
ZaoLahma · 35m ago
Replace the "hire poor developers" with "use LLM driven development", and you have the rough outline for a perfect Software Engineering horror movie.
It used to be that the poor performers (dangerous hip-shootin', code-committin' cowpokes) were limited in the amount of code they could produce per time unit, leaving enough time for others to correct course. Now the cowpokes are producing ridiculous amounts of code that you just can't keep up with.
anal_reactor · 22m ago
Sad truth is that the average dev is average, but it's not polite to say this out loud. This is particularly important at scale: when you are big tech, at some point you hit a wall and no matter how much you pay you can't attract any more good devs, simply because all the good devs are already hired. This means that corporate processes must be tailored for the average dev, and exceptional devs can only exist in start-ups (or hermetically closed departments). The side effect is that the whole job market promotes the skill of fitting into a corporate environment over the skill of programming. So as a junior dev, it makes much more sense for me to learn how to promote my visibility during useless meetings than to learn a new technology. And that's how the bar keeps getting lower.
haunter · 9m ago
My "favorite" part:
>One failure occurred when a particular sequence of keystrokes was entered on the VT100 terminal that controlled the PDP-11 computer: If the operator were to press "X" to (erroneously) select 25 MeV photon mode, then use "cursor up" to edit the input to "E" to (correctly) select 25 MeV Electron mode, then "Enter", all within eight seconds of the first keypress and well within the capability of an experienced user of the machine, the edit would not be processed and an overdose could be administered. These edits were not noticed as it would take 8 seconds for startup, so it would go with the default setup
Kinda reminds me how everything is touchscreen nowadays, from car interfaces to industry-critical software.
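For anyone who hasn't internalized how such a race plays out, here is a deliberately toy Python sketch of the shared-state hazard the quote describes (the real Therac-25 code was PDP-11 assembly with its own task scheduler; the names and timings below are purely illustrative): the setup task snapshots the selected mode once, spends its "eight seconds" configuring, and never sees the operator's correction.

    import threading
    import time

    # Toy model of the hazard described in the quote above -- NOT the actual
    # Therac-25 code. The setup task copies the selected mode once, then spends
    # its "eight seconds" configuring hardware, so an operator edit that lands
    # inside that window never reaches the beam setup.
    shared = {"mode": "X"}               # operator erroneously selected X (photon mode)
    applied = []

    def setup_task():
        mode_snapshot = shared["mode"]   # mode is latched at the start of setup
        time.sleep(0.8)                  # stands in for the ~8 s magnet setup
        applied.append(mode_snapshot)    # beam is configured from the stale copy

    t = threading.Thread(target=setup_task)
    t.start()
    time.sleep(0.1)
    shared["mode"] = "E"                 # operator cursors up and corrects to electron mode
    t.join()

    print("operator's final selection:", shared["mode"])  # E
    print("mode actually applied:     ", applied[0])      # X -- the edit was silently lost

The fix is not subtle once you see it; the hard part, as the article argues, is having a process that surfaces the symptom at all.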
elric · 1h ago
One of the commenters on the article wrote this:
> Throughout the 80s and 90s there was just a feeling in medicine that computers were dangerous <snip> This is why, when I was a resident in 2002-2006 we still were writing all of our orders and notes on paper.
I was briefly part of an experiment with electronic patient records in an ICU in the early 2000s. My job was to basically babysit the server processing the records in the ICU.
The entire staff hated the system. They hated having to switch to computers (this was many years pre-iPad and similarly sleek tablets) to check and update records. They were very much used to writing medications (what, when, which dose, etc.) onto bedside charts, which were very easy to consult and very easy to update. Any kind of data loss in those records could have fatal consequences. Any delay in getting to the information could be bad.
This was *not* just a case of doctors having unfounded "feelings" that computers were dangerous. Computers were very much more dangerous than pen and paper.
I haven't been involved in that industry since then, and I imagine things have gotten better since, but still worth keeping in mind.
jacquesm · 54m ago
Now we have Chipsoft, arguably one of the worst players in the entire IT space that has a near monopoly (around me, anyway) on IT for hospitals. They charge a fortune, produce crap software and the larger they get the less choice there is for the remainder. It is baffling to me that we should be enabling such hostile players.
misja111 · 34m ago
I worked for them in the early 2000s. There was nothing wrong with the people working there, except for the two founders, a father and son. They were absolutely ruthless. And, as so often, that ruthless mentality was what enabled them to gain dominance over the market. I could tell some crazy stories about how they ran the company, but better not, because it might get me sued.
But if you understand Dutch, you can read more about them e.g. here: https://www.quotenet.nl/zakelijk/a41239366/chipsoft-gerrit-h...
skinwill · 43m ago
Around here we have Epic. If you want a good scare, look up their corporate Willy Wonka-esque jail/campus and their policy of zero remote work.
greazy · 33m ago
It's still an issue. I've heard stories of EMR systems going down, forcing staff to use pen and paper. It boggles my mind that such systems don't have redundancy.
These are commercial products being deployed.
isopede · 2h ago
I strongly believe that we will see an incident akin to Therac-25 in the near future. With as many people running YOLO mode on their agents as there are, Claude or Gemini is going to be hooked up to some real hardware that will end up killing someone.
Personally, I've found even the latest batch of agents fairly poor at embedded systems, and I shudder at the thought of giving them the keys to the kingdom to say... a radiation machine.
SCdF · 1h ago
The Horizon (UK Post Office accounting software) scandal killed multiple subpostmasters through suicide, and bankrupted and destroyed the lives of dozens or hundreds more.
The core takeaway developers should have from Therac-25 is not that this happens just on "really important" software, but that all software is important, and all software can kill, and you need to always care.
maweki · 19m ago
But there is still a difference here. Provenance and proper traceability would have allowed the subpostmasters to show their innocence and prove the system fallible.
In the Therac-25 case, the killing was quite immediate and it would have happened even if the correct radiation dose was recorded.
hahn-kev · 1h ago
From what I've read about that incident, I don't know what the devs could have done. The company sure was a problem, but so were the laws basically saying a computer can't be wrong. No dev can solve that problem.
sim7c00 · 58m ago
As you point out, this was a mess-up on a lot of levels. It's an interesting effect though, not to be dismissed: how your software works, and how it's perceived and trusted, can impact people psychologically.
fuckaj · 57m ago
Given whole truth testimony?
the-grump · 1h ago
The 737 MAX MCAS debacle was one such failure, albeit involving a wider system failure and not purely software.
Agreed on the future but I think we were headed there regardless.
jonplackett · 1h ago
Yeah reading this reminded me a lot of MCAS. Though MCAS was intentionally implemented and intentionally kept secret.
sim7c00 · 59m ago
Talk to anyone in these industries about 'automation' on medical or critical-infrastructure devices and they will tell you NO. No touching our devices with your rubbish.
I am pretty confident they won't let Claude touch it if they don't even let deterministic automations run...
That being said, maybe there are places. But this is always the sentiment I got: no automating, no scanning, no patching. The device is delivered certified and any modification will invalidate that. Any changes need to be validated and certified.
It's a different world than making apps, that's for sure.
Not to say mistakes aren't made and change doesn't happen, but I don't think people designing medical devices will be going yolo mode on their dev cycle anytime soon... give the folks in safety-critical systems engineering some credit.
throwaway0261 · 2m ago
> but I don't think people designing medical devices will be going yolo mode on their dev cycle anytime soon
I don't have the same faith in corporate leadership as you, at least not when they see potentially huge savings by firing some of the expensive developers and using AI to write more of the code.
Maxion · 1h ago
> Personally, I've found even the latest batch of agents fairly poor at embedded systems
I mean, even in simple CRUD web apps where the data models are more complex and the same data has multiple structures, the LLMs get confused after the second data transformation (at most).
E.g. You take in data with field created_at, store it as created_on, and send it out to another system as last_modified.
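To make the parent's example concrete, here is a minimal sketch (the field names are the commenter's; the two mapping functions are hypothetical) of the renaming chain that models tend to muddle once it is spread across a codebase instead of pinned down in one place:

    # One timestamp, three names: created_at on input, created_on in storage,
    # last_modified on output. Keeping the mapping explicit and in one place is
    # exactly the discipline that tends to get lost.

    def ingest(payload: dict) -> dict:
        """Incoming API payload -> internal storage record."""
        return {"created_on": payload["created_at"]}

    def export(record: dict) -> dict:
        """Internal storage record -> payload for the downstream system."""
        return {"last_modified": record["created_on"]}

    incoming = {"created_at": "2025-01-01T12:00:00Z"}
    print(export(ingest(incoming)))   # {'last_modified': '2025-01-01T12:00:00Z'}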
rossant · 1h ago
The first commenter on this site introduces himself as "a physician who did a computer science degree before medical school." He is now president of the Ray Helfer Society [1], "an honorary society of physicians seeking to provide medical leadership regarding the prevention, diagnosis, treatment and research concerning child abuse and neglect."
While the cause is noble, the medical detection of child abuse faces serious issues with undetected and unacknowledged false positives [2], since ground truth is almost never knowable. The prevailing idea is that certain medical findings are considered proof beyond reasonable doubt of violent abuse, even without witnesses or confessions (denials are extremely common). These beliefs rest on decades of medical literature regarded by many as low quality because of methodological flaws, especially circular reasoning (patients are classified as abuse victims because they show certain medical findings, and then the same findings are found in nearly all those patients—which hardly proves anything [3]).
I raise this point because, while not exactly software bugs, we are now seeing black-box AIs claiming to detect child abuse with supposedly very high accuracy, trained on decades of this flawed data [4, 5]. Flawed data can only produce flawed predictions (garbage in, garbage out). I am deeply concerned that misplaced confidence in medical software will reinforce wrongful determinations of child abuse, including both false positives (unjust allegations potentially leading to termination of parental rights, foster care placements, imprisonment of parents and caretakers) and false negatives (children who remain unprotected from ongoing abuse).
[1] https://hs.memberclicks.net/executive-committee
[2] https://news.ycombinator.com/item?id=37650402
[3] https://pubmed.ncbi.nlm.nih.gov/30146789/
[4] https://rdcu.be/eCE3l
[5] https://www.sciencedirect.com/science/article/pii/S002234682...
I'd be interested in knowing how many of y'all are being taught about this sort of thing in college ethics/safety/reliability classes.
I was taught about this in engineering school, as part of a general engineering course also covering things like bathtub reliability curves and how to calculate the number of redundant cooling pumps a nuclear power plant needs. But it's a long time since I was in college.
Is this sort of thing still taught to engineers and developers in college these days?
https://strawpoll.com/NMnQNX9aAg6
It was taught in a first-year software ethics class on my Computer Science programme, back in 2010. I'm wondering if they still do.
3D30497420 · 1h ago
I studied design and I wish we'd had a design ethics class, which would have covered instances like this.
mdavid626 · 1h ago
Some sanity checks are always a good idea before running such a destructive action (IF beam_strength > REASONABLY_HIGH_NUMBER THEN error). Of course the UI bug is hard to catch, but the sanity check would have prevented this completely and the machine would just end up in an error state, rather than killing patients.
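As a rough illustration of that IF/THEN guard (the constant, the unit and the function names here are illustrative assumptions, not anything from the actual Therac-25 code), the check is independent of whatever mode the UI believes it selected:

    # Hypothetical sketch of a last-line sanity check: refuse to fire if the
    # requested dose is outside a hard-coded plausible range.
    MAX_PLAUSIBLE_DOSE_RADS = 200      # illustrative limit, not a real clinical value

    class DoseError(Exception):
        pass

    def fire_beam(requested_dose_rads: float) -> None:
        if not (0 < requested_dose_rads <= MAX_PLAUSIBLE_DOSE_RADS):
            # end up in an error state instead of delivering the overdose
            raise DoseError(f"implausible dose requested: {requested_dose_rads} rads")
        print(f"delivering {requested_dose_rads} rads")

    fire_beam(180)        # plausible request goes through
    try:
        fire_beam(15000)  # the Therac-25 overdoses were estimated to be on this order
    except DoseError as e:
        print("blocked:", e)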
b_e_n_t_o_n · 1h ago
Invariants are so useful to enforce, even for toy projects. They should never be triggered outside of dev, but if they are, sometimes it's better to just let it crash.
bzzzt · 56m ago
Making sure the beam is off before crashing would be better though.
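That ordering can be captured in the invariant helper itself. A tiny sketch under the same assumptions as above (beam_off() is a hypothetical stand-in for whatever the real shutdown path would be): drive the hardware to its safe state first, then halt.

    def beam_off() -> None:
        # hypothetical stand-in: de-energize the source, close the shutter, etc.
        print("beam disabled")

    def check_invariant(condition: bool, message: str) -> None:
        """Crash on a violated invariant, but only after reaching a safe state."""
        if not condition:
            beam_off()                                           # safe state first...
            raise SystemExit(f"invariant violated: {message}")   # ...then stop

    check_invariant(True, "mode table and dose table agree")     # passes silently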
linohh · 1h ago
In my university this case was (and probably still is) the subject of the first lecture in the first semester. A lot to learn here, and one of the prime examples of how the DEPOSE model [Perrow 1984] works for software engineering.
Forgret · 1h ago
What surprised me most was that only one developer was working on such unpredictable technology, whereas I'd think you need at least five developers just to be able to discuss options.
vemv · 1h ago
My (tragically) favorite part, from Wikipedia:
> A commission attributed the primary cause to generally poor software design and development practices, rather than singling out specific coding errors.
Which to me reads as "this entire codebase was so awful that it was bound to fail in some way or other".
rgoulter · 44m ago
Hmm. "poor software design" suggests a high risk that something might go wrong; "poor development practice" suggests that mistakes won't get caught/remedied.
By focusing on particular errors, there's the possibility you'll think "problem solved".
By focusing on process, you hope to catch mistakes as early as possible.
mellosouls · 1h ago
TIL TheDailyWTF is still active. I'd thought it had settled to greatest hits only some years ago.
greatgib · 54m ago
This story is kind of old.
But I'm also suspicious that this was AI-generated content, due to this weird paragraph ("one" becoming "they"):
It's worth noting that there was one developer who wrote all of this code. They left AECL in 1986, and thankfully for them, no one has ever revealed their identity. And while it may be tempting to lay the blame at their feet—they made every technical choice, they coded every bug—it would be wildly unfair to do that.
edot · 41m ago
Isn’t that the pronoun to use when you’re unsure of gender? This article didn’t feel AI-y to me.
I was taught about this incident in university many years ago. It's undeniably an important lesson that shouldn't be forgotten.
amelius · 1h ago
> The Therac-25 was the first entirely software-controlled radiotherapy device.
This says it all.
voxadam · 24m ago
(2021)
autonomousErwin · 2h ago
This reminds me of the 2003 Belgian election that was impossibly skewed by a supernova light years away sending charged particles which (allegedly) managed to get through our atmosphere and flip a bit. Not the only case where it's happened.
jve · 1h ago
On the bright side, wow, those computers are really sturdy: takes a whole supernova to just flip a bit :)
kijin · 1h ago
Well the thing is, millions of stars go supernova in the observable universe every single day. Throw in the daily gamma ray burst as well, and you've got bit flips all over the place.
napolux · 2h ago
The most deadly bug in history. If you know any other deadly bug, please share! I love these stories!
https://www.theguardian.com/uk-news/2024/jan/09/how-the-post...
One member of the development team, David McDonnell, who had worked on the Epos system side of the project, told the inquiry that “of eight [people] in the development team, two were very good, another two were mediocre but we could work with them, and then there were probably three or four who just weren’t up to it and weren’t capable of producing professional code”.
What sort of bugs resulted?
As early as 2001, McDonnell’s team had found “hundreds” of bugs. A full list has never been produced, but successive vindications of post office operators have revealed the sort of problems that arose. One, named the “Dalmellington Bug”, after the village in Scotland where a post office operator first fell prey to it, would see the screen freeze as the user was attempting to confirm receipt of cash. Each time the user pressed “enter” on the frozen screen, it would silently update the record. In Dalmellington, that bug created a £24,000 discrepancy, which the Post Office tried to hold the post office operator responsible for.
Another bug, called the Callendar Square bug – again named after the first branch found to have been affected by it – created duplicate transactions due to an error in the database underpinning the system: despite being clear duplicates, the post office operator was again held responsible for the errors.
BoxOfRain · 1h ago
More heads should have rolled over this in my opinion, absolutely despicable that they cheerfully threw innocent people in prison rather than admit their software was a heap of crap. It makes me so angry this injustice was allowed to prevail for so long because nobody cared about the people being mistreated and tarred as thieves as long as they were 'little people' of no consequence, while senior management gleefully covered themselves in criminality to cover for their own uselessness.
It's an archetypal example of 'one law for the connected, another law for the proles'.
benrutter · 2h ago
Many bugs rather than a single one, but the botched London Ambulance dispatch software from the 90s is probably one of the most deadly software issues of all time, although there aren't any estimates I know of that try to quantify the number of lives lost as a result.
http://www0.cs.ucl.ac.uk/staff/a.finkelstein/papers/lascase....
Not even close. Israel apparently has AI bombing target intel & selection systems called Gospel and Lavender - https://www.theguardian.com/world/2024/apr/03/israel-gaza-ai.... Claims are these systems have a selectivity of 90% per bombing, and they were willing to bomb up to 20 civilians per person classified by the system as a Hamas member. So assuming that is true, 90% of the time, they kill one Hamas member, and up to 20 innocents. 10% of the time, they kill up to 21 innocents and no Hamas members.
Killing 20 innocents and one Hamas member is not a bug - it is callous, but that's a policy decision and the software working as intended. But when it is a false positive (10% of the time), due to inadequate / outdated data and inadequate models, that could reasonably be classified as a bug - so all 21 deaths for each of those bombings would count as deaths caused by a bug. Apparently at least the earlier versions of Gospel were trained on positive examples that indicate someone is a member of Hamas, but not on negative examples; other problems could be due to, for example, insufficient data, and interpolation outside the valid range (e.g. using pre-war data about, e.g. how quickly cell phones are traded, or people movements, when behaviour is different post-war).
I'd therefore estimate that deaths due to classification errors from those systems is likely in the thousands (out of the 60k+ Palestinian deaths in the conflict). Therac-25's bugs caused 6 deaths for comparison.
danadam · 1h ago
Some Google Pixel phones couldn't dial the emergency number (still can't?). I don't know if there were any deadly consequences of that.
https://www.androidauthority.com/psa-google-pixel-911-emerge...
The MCAS related bugs @ Boeing led to 300+ deaths, so it's probably a contender.
solids · 2h ago
Was that a bug or a failure to inform pilots about a new system?
thyristan · 1h ago
In the same vein, one could argue that Therac-25 was not actually a software bug but a hardware problem. Interlocks that could have prevented the accidents, and that were present in earlier Therac models, were missing. The software was written with those interlocks in mind. Greedy management/hardware engineers skipped them for the -25 version.
It's almost never just software. It's almost never just one cause.
actionfromafar · 1h ago
Just to point it out even more clearly: there's almost never a single root cause.
AdamN · 1h ago
Both. And really, MCAS itself was fine; the issue was the metering systems (the angle-of-attack sensors) and the handling of conflicting data. That part of the puzzle was definitely a bug in the logic/software.
phire · 56m ago
That wasn't a bug.
They deliberately designed it to only look at one of the angle-of-attack sensors, because if they had designed it to look at both, then they would have had to implement a warning message for conflicting data.
And if they had implemented a warning message, they would have had to tell the pilots about the new system, and train them how to deal with it.
It wasn't a mistake in logic either. This design went through their internal safety certification, and passed.
As far as I'm aware, MCAS functioned exactly as designed, zero bugs. It's just that the design was very bad.
kijin · 1h ago
Remember the Airbus that crashed in the middle of the Atlantic because one of the pilots kept pulling on his yoke, and the computer decided to average his input with normal input from the other pilot?
Conflict resolution in redundant systems seems to be one of the weakest spots in modern aircraft software.
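For what it's worth, the cross-check being discussed is conceptually simple. This is a hedged sketch only (the 5.5-degree disagree threshold and the activation angle are illustrative numbers, not Boeing's actual control law): act only when the redundant sensors agree, otherwise flag the disagreement and stand down.

    # Illustrative two-sensor cross-check: disengage and alert on disagreement
    # instead of trusting a single (possibly faulty) angle-of-attack vane.
    DISAGREE_THRESHOLD_DEG = 5.5       # illustrative
    ACTIVATION_AOA_DEG = 14.0          # illustrative

    def aoa_command(left_aoa_deg: float, right_aoa_deg: float) -> dict:
        if abs(left_aoa_deg - right_aoa_deg) > DISAGREE_THRESHOLD_DEG:
            return {"action": "disengage", "alert": "AOA DISAGREE"}
        mean_aoa = (left_aoa_deg + right_aoa_deg) / 2
        if mean_aoa > ACTIVATION_AOA_DEG:
            return {"action": "trim_nose_down", "alert": None}
        return {"action": "none", "alert": None}

    print(aoa_command(12.0, 12.5))   # sensors agree, no action needed
    print(aoa_command(22.0, 4.0))    # one faulty vane: stand down rather than trim

The hard part, as the thread notes, is that adding the disagree path has training and certification consequences, which is exactly what the original design was trying to avoid.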
NitpickLawyer · 1h ago
I would say plenty of both. They obviously had to inform the pilots, but the way the system didn't permanently back off after 2-3 (whatever) rounds of "oh, the pilot trimmed manually, so after 10 seconds we do the same thing again" was a major, major logic blunder. A failure across the board, if only from the perspective of end-to-end / integration testing.
Worryingly, inadequate e2e / full integration testing was also at the root of other Boeing blunders, like the Starliner capsule.
fuckaj · 50m ago
Not a bug. A non-airworthy plane they tried to patch up with software.
reorder9695 · 19m ago
The plane was perfectly airworthy without MCAS, that was never the issue. The issue was it handled differently enough at high angles of attack to the 737NG that pilots would've needed additional training or possibly a new type rating without MCAS changing the trim in this situation. The competition (Airbus NEO family) did not need this kind of new training for existing pilots, so airlines being required to do this for new Boeing but not Airbus planes would've been a huge commercial disadvantage.
fuckaj · 4m ago
I may have understood wrong, but I thought it was possible to get into an unrecoverable stall?
echelon · 2h ago
The 737 Max MCAS is arguably a bug. That killed 346 people.
Not a "bug" per se, but texting while driving kills ~400 people per year in the US. It's a bug at some level of granularity.
To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.
b_e_n_t_o_n · 1h ago
> To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.
These kinds of calculations always make me wonder... say someone wasted one minute of everybody's life, is the cost ~250 lives? One minute? Somewhere in between?
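Purely as a back-of-the-envelope, and under the loud assumption that wasted minutes aggregate linearly across people (which is exactly the premise being questioned), one minute taken from everyone alive works out to roughly 15,000 person-years:

    # Back-of-the-envelope: one minute of everyone's time, aggregated.
    PEOPLE = 8_000_000_000
    MINUTES_PER_YEAR = 60 * 24 * 365              # 525,600
    LIFETIME_YEARS = 80

    person_years = PEOPLE / MINUTES_PER_YEAR
    print(round(person_years))                    # ~15,221 person-years
    print(round(person_years / LIFETIME_YEARS))   # ~190 eighty-year "lifetimes"

Whether that aggregate means anything, given that no individual loses more than a minute, is of course the actual question.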
rvz · 2h ago
We're more likely to get a similar incident very quickly if we continue with the cult of 'vibe-coding' and keep throwing basic software engineering principles out of the window, as I said before. [0]
Take this post-mortem here [1] as a great warning and which also highlights exactly what could go horribly wrong if the LLM misreads comments.
What's even scarier is that each time I stumble across a freshly minted project on GitHub with a considerable amount of attention, not only is it 99% vibe-coded (very easy to detect), but it completely lacks any tests.
Makes me question whether the user prompting out the code even understands how to write robust and battle-tested software.
[0] https://news.ycombinator.com/item?id=44764689
[1] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
The idea of 'vibe-coding' safety-critical software is beyond terrifying. Timing- and safety-critical software is hard enough to talk about intelligently, even harder to code, harder yet to audit, and damn near impossible to debug, and all that's without neophyte code monkeys introducing massive black boxes full of poorly understood voodoo to the process.
auggierose · 1h ago
Wondering if that "one developer" is here on HN.
Forgret · 59m ago
Hahaha, it would be interesting. Maybe he just commented on the post here?