Famous cognitive psychology experiments that failed to replicate

85 PaulHoule 47 9/17/2025, 6:55:28 PM buttondown.com ↗

Comments (47)

epolanski · 19m ago
> Claimed result: Women risk being judged by the negative stereotype that women have weaker math ability, and this apprehension disrupts their math performance on difficult tests.

I'll never understand stances trying to hide biological differences between different sexes or ethnic backgrounds.

We know for a fact that sex or ethnicity impacts body yet we seem unable to cope with the idea that there are also differences in how brains (and hormones) work.

Women have, on average, a higher emotional intelligence which is e.g. tied to higher linguistic proficiency. That helps in many different fields and, on average, women tend to learn languages easier than men.

At the same time, on average, they may perform slightly worse than men in highly computational fields (math or chess).

I want to reiterate what I'm getting at before the rest of the post:

Genetics matter when you look at very large samples, but they are irrelevant on smaller (or single) samples.

I feel NBA provides a great example.

On average, African Americans are taller than white men and have a higher muscular density.

On large samples, they tend to outperform white men. But as soon as you make the samples smaller, even at elite levels, you find out that Larry Bird (30+ years ago) or Nikola Jokic (today) are the best players in the world.

The same applies to women: just because averages over large samples explain some statistics, such as females on average performing worse at math, that doesn't change the fact that women can be the best chess players or cryptographers in the world.

us-merul · 10m ago
> We know for a fact that sex or ethnicity impacts body yet we seem unable to cope with the idea that there are also differences in how brains work.

Here is your error. You’re assuming that a physical difference in morphology is linked to behavioral or neural correlates. That’s not the case, since observed statistical- or group-level differences need not be driven by biology. You’re assuming biological determinism, and the evidence for direct genetic effects on behavior isn’t there.

epolanski · 4m ago
It's not an error unless you're able to demonstrate the opposite.

I have yet to see studies that demonstrate that different sexes, hormones or even ethnicities do not impact cognitive abilities or higher proficiency in different fields.

Whereas I've seen plenty that show that women, on average, demonstrate higher cognitive abilities linked to verbal proficiency or text comprehension. Women also tend to have better memory than men.

Facts are that there are genetic differences in how our brains work. And let's not ignore the huge importance of hormones.

us-merul · 2m ago
And how are you able to rule out that societal or environmental effects are the primary driver? How is your argument not circular, that observed differences are therefore the result of biology?
myhf · 15m ago
Circular reasoning can be used to "prove" anything, so it's not helpful as a basis for policy making.
delichon · 2h ago
Approximate replication rates in psychology:

  social      37%
  cognitive   42%
  personality 55%
  clinical    44%
So a list of famous psychology experiments that do replicate may be shorter.

https://www.nature.com/articles/nature.2015.18248

NewJazz · 1h ago
I think one would wish the famous ones to be more often replicable.
tomjakubowski · 28m ago
Nonreplicable publications are cited more than replicable ones (2021)

> We use publicly available data to show that published papers in top psychology, economics, and general interest journals that fail to replicate are cited more than those that replicate. This difference in citation does not change after the publication of the failure to replicate. Only 12% of postreplication citations of nonreplicable findings acknowledge the replication failure.

https://www.science.org/doi/10.1126/sciadv.abd1705

Press release: https://rady.ucsd.edu/why/news/2021/05-21-a-new-replication-...

sunscream89 · 48m ago
There may be minute details like having a confident frame of reference for the confidence tests. Cultures, even psychologies might swing certain ideas and their compulsions.
jbentley1 · 1h ago
This is a great list for people who want to smugly say "Um, actually" a lot in conversation.

Based on my brief stint doing data work in psychology research, amongst many other problems they are AWFUL at stats. And it isn't a skill issue as much as a cultural one. They teach it wrong and have a "well, everybody else does it" attitude towards p-hacking and other statistical malpractice.
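A rough sketch of why p-hacking is such a problem, assuming every test is independent and there is no real effect at all (both are simplifying assumptions for illustration, and `familywise_error` is a made-up helper name): the chance of at least one "significant" result grows quickly with the number of things you test.

```python
def familywise_error(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha across independent tests
    when no real effect exists (assumption: all nulls are true)."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them must avoid it for the study to come up empty.
    return 1.0 - (1.0 - alpha) ** num_tests


# Compare one planned test against 5 or 20 tried outcomes/subgroups.
for k in (1, 5, 20):
    print(k, round(familywise_error(k), 3))
```

With 20 looks at the data, a false positive somewhere is more likely than not, which is why reporting only the test that "worked" is misleading.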

wduquette · 28m ago
"they are AWFUL at stats."

SF author Michael Flynn was a process control engineer as his day job; he wrote about how designing statistically valid experiments is incredibly difficult, and the potential for fooling yourself is high, even when you really do know what you are doing and you have nearly perfect control over the measurement setup.

And on top of it you're trying to measure the behavior of people not widgets; and people change their behavior based on the context and what they think you're measuring.

There was a lab set up to do "experimental economics" at Caltech back in the late 80's/early 90's. Trouble is, people make different economic decisions when they are working with play money rather than real money.

sputr · 1h ago
As someone who's part of a startup (hrpotentials.com) trying to bring truly scientifically valid psychological testing into HR processes .... yeah. We've been at it for almost 7 years, and we're finally at a point where we can say we have something that actually makes scientific sense - and we're not inventing anything new, just commercializing the science! It only took an electrical engineer (not me) with a strong grasp of statistics working for years with a competent professor of psychology to separate the wheat from the chaff. There's some good science there, it's just ... not used much.
PaulHoule · 58m ago
Yeah, this is an era which is notorious for pseudoscience.
odyssey7 · 42m ago
There’s surely irony here
Waterluvian · 1h ago
Um, actually I’d say it is the responsibility of all scientists, both professional and amateur, to point out falsehoods when they’re uttered, and not an act of smugness.
rolph · 47m ago
[um] has several contexts, but is usually a cue that something unexpected, something off the average, is about to be said.

[actually] is a neutral declaration that some cognitive structure was presented, but that it is at odds with physically observable fact, which will now be laid out to you.

sunrunner · 3m ago
No mention of the Stanford Prison Experiment I notice.
glial · 2h ago
The incentive of all psychology researchers is to do new work rather than replications. Because of this, publicly-funded psychology PhDs should be required to perform study replication as part of their training. Protocol + results should be put in a database.
gwd · 52m ago
How interesting would it be if every PhD thesis had to have a "replication" section, where they tried to replicate some famous paper's results.
aeve890 · 1h ago
>Source: Stern, Gerlach, & Penke (2020)

Wow, what are the odds?

https://en.wikipedia.org/wiki/Stern%E2%80%93Gerlach_experime...

NooneAtAll3 · 1h ago
I'm still amazed that wikipedia doesn't have a redirect away from its mobile site
dang · 56m ago
(It's on my list to rewrite those URLs in HN comments at least)
Terr_ · 1h ago
> Source: Hagger et (63!) al. 2016

I can't help chuckling at the idea that over 1.98 * 10^87 people were involved in the paper.
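For anyone who wants to check the arithmetic, a quick sketch in Python (the variable name `authors` is just for illustration):

```python
import math

# Read "et (63!) al." as a factorial: 63 * 62 * ... * 2 * 1
authors = math.factorial(63)

# Print in scientific notation to compare against 1.98 * 10^87
print(f"{authors:.2e}")
```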

picardo · 24m ago
Well, at least the growth mindset study is not fully debunked yet. It's basically a modern interpretation of what we've known to be true about self-fulfilling prophecies. If you tell children they can be smart and competent if they work hard, then they will work hard and become smart and competent. This should be a given.
fsckboy · 1h ago
famous cognitive psychology experiments that do replicate: IQ tests

http://www.psychpage.com/learning/library/intell/mainstream....

in fact, the foundational statistical models considered the gold standard for statistics today were developed for this testing.

alphazard · 1h ago
> in fact, the foundational statistical models considered the gold standard for statistics today were developed for this testing.

The normal distribution predates the general factor model of IQ by hundreds of years.[0]

You can try other distributions yourself, it's going to be hard to find one that better fits the existing IQ data than the normal (bell curve) distribution.

[0] https://en.wikipedia.org/wiki/Normal_distribution#History

fsckboy · 55m ago
Darwin's cousin, Francis Galton, for whom the log-normal distribution is often called the Galton distribution, was among the first to investigate psychometrics.

apparently hundreds of years late to the game, he still coined the term "median"

more tidbits here https://en.wikipedia.org/wiki/Francis_Galton#Statistical_inn...

gwd · 54m ago
> Smile to Feel Better Effect

> Claimed result: Holding a pen in your teeth (forcing a smile-like expression) makes you rate cartoons as funnier compared to holding a pen with your lips (preventing smiling). More broadly, facial expressions can influence emotional experiences: "fake it till you make it."

I read this about a decade ago, and started, when going into a situation where I wanted to have a natural smile, grimacing maniacally like I had a pencil in my teeth. The thing is, it's just so silly, it always makes me laugh at myself, at which point I have a genuine smile. I always doubted whether the claimed connection was real, but it's been a useful tool anyway.

sunscream89 · 50m ago
Yeah, the marshmallow one taught me to have patience and look for the long returns on investments of personal effort.

I think there may be something to a few of these, and more may need considering regarding how these are conducted.

Let’s leave open our credulities for the inquest of time.

bogtog · 46m ago
Little of this is considered cognitive psychology. The vast majority would be viewed as "social psychology"

Setting that aside, of all the scientific fields I'm aware of, psychology has taken the replication crisis most seriously. Rigor across all areas of psychology is steadily increasing: https://journals.sagepub.com/doi/full/10.1177/25152459251323...

systemstops · 1h ago
Is anyone tracking how much damage to society bad social science has done? I imagine it's quite a bit.
feoren · 55m ago
We rack up quite a lot of awfulness with eugenics, phrenology, the "science" that influenced Stalin's disastrous agriculture policies in the early USSR, overpopulation scares leading to China's one-child policy, etc. Although one could argue these were back-justifications for the awfulness that people wanted to do anyway.
systemstops · 47m ago
Those things were not done by awful people though - they all thought they were serving the public good. We only judge it as awful now because of the results. Nearly all of these ideas (Lysenkoism I think was always fringe) were embraced by the educated elites of the time.
feoren · 3m ago
Lysenkoism! That's the one. Thank you for reminding me of the name (and for knowing what I was grasping at).

I think some "bad people" used eugenics and phrenology to justify prior hate, but they were also effective tools at convincing otherwise "good people" to join them.

izabera · 55m ago
i'm struggling to imagine many negative effects on society caused by the specific papers in this list
systemstops · 41m ago
Public policies were made (or justified) based on some of this research. People used this "settled science" to make consequential decisions.

Stereotype threat for example was widely used to explain test score gaps as purely environmental, which contributed to the public seeing gaps as a moral emergency that needed to be fixed, leading to affirmative action policies.

blindriver · 1h ago
Papers should not be accepted until an independent lab has replicated the results. It’s pretty simple, but people are incentivized to not care if it’s replicable because they need the paper published to advance their career.
ausbah · 1h ago
i wonder what the replication rate is for ML papers
PaulHoule · 56m ago
From working in industry and rubbing shoulders with CS people who prioritize writing papers over writing working software I’m sure that in a high fraction of papers people didn’t implement the algorithm they thought they implemented.
avdelazeri · 34m ago
Don't get me started, I have seen repos that I'm fairly sure never ran in their presented form. A guy in our lab thinks authors purposefully mess up their code when publishing on GitHub to make it harder to replicate. I'm starting to come around on his theory.
WesolyKubeczek · 29m ago
> Claimed result: Listening to Mozart temporarily makes you smarter.

This belongs in a dungeon crawl game. You find an artifact that plays music to you. Depending on the music played (depends on the artifact's enchantment and blessed status), it can buff or debuff your intelligence by several points temporarily.

Animats · 59m ago
> Most results in the field do actually replicate and are robust [citation needed], so it would be a pity to lose confidence in the whole field just because of a few bad apples.

Is there a good list of results that do consistently replicate?

hn_throw_250915 · 1h ago
I thought we knew that these were vehicles by wannabe self-help authors to puff up their status for money. See for example “Grit” and “Deep Work” and other bullshit entries in a breathlessly hyped up genre of pseudoscience.
juujian · 1h ago
Now I want to know which cognitive psychology experiments were successfully replicated though.
SpaceManNabs · 1h ago
One thing that confuses me is that some of these papers were successfully replicated, so juxtaposing them to the ones that have not been replicated at all given the title of the page feels a bit off. Not sure if fair.

The ego depletion effect seems intuitively surprising to me. Science is often unintuitive. I do know that it is easier to make forward-thinking decisions when I am not tired, so I don't know.

ceckGrad · 50m ago
>some of these papers were successfully replicated, so juxtaposing them to the ones that have not been replicated at all given the title of the page feels a bit off. Not sure if fair.

I don't like Giancotti's claims. He wrote: >This post is a compact reference list of the most (in)famous cognitive science results that failed to replicate and should, for the time being, be considered false.

I don't agree with Giancotti's epistemological claims but today I will not bloviate at length about the epistemology of science. I will try to be brief.

If I understand Marco Giancotti correctly, one particular point is that Giancotti seems to be saying that Hagger et al. have impressively debunked Baumeister et al.

The ego depletion "debunking" is not really what I would call a refutation. It says, "Results from the current multilab registered replication of the ego-depletion effect provide evidence that, if there is any effect, it is close to zero. ... Although the current analysis provides robust evidence that questions the strength of the ego-depletion effect and its replicability, it may be premature to reject the ego-depletion effect altogether based on these data alone."

Maybe Baumeister's protocol was fundamentally flawed, but the counter-argument from Hagger et al. does not convince me. I wasn't thrilled with Baumeister's claims when they came out, but now I am somehow even less thrilled with the claims of Hagger et al., and I absolutely don't trust Giancotti's assessment. I could believe that Hagger executed Baumeister's protocol correctly, but I can't believe Giancotti has a grasp of what scientific claims "should" be "believed."

taeric · 1h ago
The idea isn't that it is easier to do things when not tired. It is that you specifically get tired exercising self control.

I think that can be subtly confused by people thinking you can't get better at self control with practice? That is, I would think a deliberate practice of doing more and more self control every day should build up your ability to do more self control. And it would be easy to think that that means you have a stamina for self control that depletes in the same way that aerobic fitness can work. But, those don't necessarily follow each other.