Yes, it's old, but even worse, it is not a well-argued review. Yes, Bayesian statistics is slowly gaining the upper hand at higher levels of statistics, but you know what should be taught to first-year undergrads in science? Exploratory data analysis! One of the first books I voluntarily read in stats was Mosteller and Tukey's Data Analysis and Regression. A gem. Another great book is Judea Pearl's Book of Why.
nxobject · 32d ago
On the subject of prioritizing EDA:
I need to look this up, but I recall in the 90s a social psychology journal briefly had a policy of "if you show us you're handling your data ethically, you can just show us a self-explanatory plot if you're conducting simple comparisons instead of NHST". That was after some early discussions about statistical reform in the 90s - Cohen's "The Earth is round (p < .05)" I think kick-started things off.
wiz21c · 32d ago
Definitely. It always amazes me that in many situations, I'm applying some stats algorithm just to conclude: let's look at these data some more...
jononor · 32d ago
Yes. And the same for DS/ML people also, please. The number of ML people who can meaningfully drill down and actually understand the data is surprisingly low sometimes. Even worse for being able to understand a phenomenon _using data_.
Charon77 · 32d ago
When you have a lot of fancy metrics/models/bootstraps to throw at a problem, people will just see what sticks.
jononor · 31d ago
Happens all the time. Problems come quickly when the datasets used for evaluation are not clean, or the evaluation is incorrect - data leakage, problematic imbalance between groups, distribution shifts vs the actual production data. Or people just checking the average performance, but not typical or worst case. Have seen many people run in circles chasing metrics that are meaningless to the task they are supposed to be solving.
hnuser123456 · 32d ago
Okay, apparently this is the core of the debate?:
Frequentists view probability as a long-run frequency, while Bayesians view it as a degree of belief.
Frequentists treat parameters as fixed, while Bayesians treat them as random variables.
Frequentists don't use prior information, while Bayesians do.
Frequentists make inferences about parameters, while Bayesians make inferences about hypotheses.
---
If we state the full nature of our experiment, what we controlled and what we didn't... how can it be a "degree of belief"? Sure, it's impossible to be 100% objective, but it is easy to add enough background info to your paper so people can understand the context of your experiment and why you got your results. "we found that at our college in this year, when you ask random students on the street this question, 40% say this, 30% say this..." and then considering how the college campus sample might not fully represent a desired larger sample population... what is different? you can confidently say something about the students you sampled, less so about the town as a whole, less so about the state as a whole...
I don't know, I finished my science degree after 10 years and apparently have an even mix of these philosophies.
Would love to learn more if someone's inclined.
jdietrich · 31d ago
You can never state the full nature of your experiment. Even the simplest experiment under the most controlled conditions has a bunch of unknown unknowns - you can never be certain that you didn't get a bad batch of reagents or a bit flip in your data or just royally screwed something up. Unless you're omniscient, there are always unknown unknowns lurking somewhere in your method.
In the case of your survey, you can't really state anything about the students you sampled with absolute confidence. You might have been asking a question about something that a lot of people are inclined to lie about. You might have been subconsciously biased and subtly influenced people towards the "right" answer. Slight inconsistencies in the wording of the question might radically alter how people respond. The student you tasked with conducting the interviews might have just stayed in bed and fabricated the data.
Bayesianism gives us an incredibly powerful framework for reasoning about this innate and unavoidable fog of uncertainty; frequentism largely pretends that it doesn't exist. The ongoing replication crisis shows why this is not merely pedantry, but the single most urgent issue in science.
throwawaymaths · 31d ago
the replication crisis is more about poor experiment design, low-n studies, and cherry picking results.
and that's not to say cherry-picking is always bad. Let's say you set up an n=5 experiment and you drop your instrument on the floor on experiment #4. Please cherry-pick that one away.
usgroup · 30d ago
>frequentism largely pretends that it doesn't exist. The ongoing replication crisis shows why this is not merely pedantry, but the single most urgent issue in science.
If you mean that Frequentist methods have no way of dealing with parameter uncertainty then your statement is false.
If you mean that some people who use Frequentist methods don't deal with parameter uncertainty then it may sometimes be the case.
joshjob42 · 31d ago
Well even for simple things there's a large difference. Say you toss a coin N times and observe heads x times. What is the probability of your next toss coming up heads?
A frequentist arguably would say the question doesn't really have any meaning, since probabilities are about long-run frequencies of things occurring. They might do various tests, or tell you the probability of that outcome under various assumed probabilities for heads.
A Bayesian would make an initial assumption about the probability of any given probability, and then compute a posterior using the likelihood function the frequentist may have, and give you a distribution for what you should believe the true probability of heads to be on your next coin toss.
In general, the latter is more meaningful and informative. There's also pretty good arguments that any coherent method of representing credences is isomorphic to probability, see Cox's theorem.
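To make the contrast concrete, here's a minimal sketch in plain Python (the N and x values and the uniform Beta(1,1) prior are assumptions for illustration, not anything from the thread):

    # Coin tossed N times, x heads observed; what about the next toss?
    N, x = 10, 7                 # example data, assumed for illustration

    # Frequentist-flavoured point estimate: the MLE of the long-run frequency.
    p_mle = x / N                # 0.7

    # Bayesian answer with a uniform Beta(1, 1) prior: the posterior is
    # Beta(1 + x, 1 + N - x), and the posterior predictive probability of
    # heads on the next toss is its mean (Laplace's rule of succession).
    p_next = (x + 1) / (N + 2)   # ~0.667

    print(p_mle, p_next)

With a flat prior the two numbers converge as N grows; they differ most when data are scarce.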
Here is a whole collection of methods for how to estimate p and calculate a confidence interval for it: https://en.wikipedia.org/wiki/Binomial_distribution#Confiden...
One of the methods is Bayesian; the rest are not.
Not mentioned in the list, but you can also use likelihood ratio intervals calculated from a likelihood profile: another Frequentist method.
None of the methods -- including the Bayesian one -- requires an informative prior.
edanm · 31d ago
> If we state the full nature of our experiment, what we controlled and what we didn't... how can it be a "degree of belief"?
Here's a question for you: what's the millionth digit of Pi? (In the standard decimal expansion, of course.)
This is a question (kind of an "experiment") which literally has only one, constant answer, that is theoretically knowable. But unless you search online, or happen to know the answer, you actually don't know which of the digits 0-9 it is. And I can just as easily ask about the 10^100th digit of Pi, which is, again, a constant - and yet no one knows what it is.
So using the Frequentist approach to statistics doesn't make much sense - there's no repeated experiment with possible different outcomes.
But there is a real sense in which your answer should be "it's one of the digits 0-9 with 1/10 probability each". That answer makes sense, because the answer isn't unknowable, just unknown, and probability reflects your lack of knowledge and degree of belief.
immibis · 29d ago
The repeatable experiment is asking for different digits of different normal irrational numbers. If repeated enough times, the answer will be each digit 0-9, each one about 1/10 of the time.
usgroup · 31d ago
I think Bayesian methods have gained ground in sciences such as Sociology, Psychology and Ecology, which are mostly observational but still attempt to make models with interpretable parameters.
With observational studies, representing confounders and uncertainty is a primary concern, because they are the most important source of defeaters. Here, Bayesian software such as brms, Stan, and pyMC becomes a flexible way to integrate many sources of uncertainty. Although, I suspect methods like SEM still dominate for their use cases.
Personally, I find myself using Bayesian methods in a similar bag-of-tricks way that I use Frequentist methods, mostly because it's difficult to believe that complex phenomena are well described by either, so I use whatever makes the case best.
perrygeo · 32d ago
Frequentist stats aren't wrong. It's just a special case that has been elevated to unreasonable standards. When the physical phenomenon in question is truly random, frequentist methods can be a convenient mathematical shortcut. But should we be teaching scientists the "shortcut"? Should we be forcing every publication to use these shortcuts? Statistics' role in the scientific reproducibility crisis says no.
tgv · 31d ago
NHST, which is part of frequentist statistics, is wrong, plain and simple. It answers the wrong question (what's the probability of the data given the hypothesis vs. what's the probability of the hypothesis given the data), and will favor H1 under conditions that can be manipulated in advance.
There is a total lack of understanding of how it works, but people think they know how to use it. There are numerous articles out there containing statements like "there were no differences in age between the groups (p > 0.05)". Consequently, it is the wrong thing to teach.
That's apart from the more philosophical question: what does it mean when I say that there's a 40% chance that team A will beat team B in the match tomorrow?
StopDisinfo910 · 31d ago
NHST is not wrong. It’s widely misused by people who barely understand any statistics.
Reducing frequentist statistics to testing and p-value is a huge mistake. I have always wondered if that’s how it is introduced to some and that’s why they don’t get the point of the frequentist approach.
Estimation theory makes a lot of sense - to me a lot more than pulling priors out of thin air. It’s also a lot of relatively advanced mathematics if you want to teach it well as defining random variables properly requires a fair bit of measure theory. I think the perceived gap comes from there. People have a somewhat hand wavy understanding of sampling and an overall poor grounding in theory and then think Bayes is better because it looks simpler at first.
zozbot234 · 31d ago
> Estimation theory makes a lot of sense - to me a lot more than pulling priors out of thin air.
You're "pulling priors out of thin air" whether you realize it or not; it's the only way that estimation makes sense mathematically. Frequentist statistics is broadly equivalent to Bayesian statistics with a flat prior distribution over the parameters, and what expectations correspond to a "flat" distribution ultimately depends on how the model is parameterized, which is in principle an arbitrary choice - something that's being "pulled out of thin air". Of course, Bayesian statistics also often involves assigning "uninformative" priors out of pure convenience, and frequentists can use "robust" statistical methods to exceptionally take prior information into account; so the difference is even lower than you might expect.
There's also a strong argument against NHST specifically that works from both a frequentist and a Bayesian perspective: NHST rejects the Likelihood principle https://en.wikipedia.org/wiki/Likelihood_principle hence one could ask whether NHST is even "properly" frequentist.
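On the parameterization point above, a small grid-approximation sketch (numpy; the 3-heads-out-of-10 data are assumed for illustration) shows that two different senses of "flat" give different answers:

    import numpy as np

    x, n = 3, 10                             # assumed example data
    p = np.linspace(1e-6, 1 - 1e-6, 100_000)
    lik = p**x * (1 - p)**(n - x)            # binomial likelihood, constant dropped

    # Prior flat in p itself:
    w_flat = lik / lik.sum()
    # Prior flat in log-odds(p), which is proportional to 1/(p(1-p)) in p:
    w_logit = lik / (p * (1 - p))
    w_logit /= w_logit.sum()

    print((p * w_flat).sum())    # posterior mean ~ (x+1)/(n+2) = 0.333
    print((p * w_logit).sum())   # posterior mean ~ x/n = 0.300

Same data, same likelihood; the only change is which parameterization the "uninformative" prior is flat in.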
StopDisinfo910 · 31d ago
> You're "pulling priors out of thin air" whether you realize it or not
No, you are not. That's an argument I have often seen put forward by people who want the Bayesian approach to be the one true approach. There are no priors whatsoever involved in a frequentist analysis.
People who say that generally refer to MLE being somewhat equivalent to MAP estimation with a uniform prior over the region. That's true, but that's the usual mistake I'm complaining about: reducing estimators to MLE.
The assertion in itself doesn’t make sense.
> Of course, Bayesian statistics also often involves assigning "uninformative" priors out of pure convenience
That’s very hand wavy. The issue is that priors have a significant impact on posteriors, one which is often deeply misunderstood by casual statisticians.
NeutralCrane · 31d ago
Frequentists' big complaint about priors is that they are subjective and influence the conclusions of the study. But the Frequentist approach is equivalent to using a non-informative prior, which is itself a subjective prior that influences the conclusions of the study. It is making the assumption that we know literally nothing about the phenomenon under examination outside of the collected data, which is almost never true.
tgv · 31d ago
> There are no prior whatsoever involved in a frequentist analysis.
It may not be everywhere, but even in the simplest case of NHST, there certainly is. It assumes no difference between H0 and H1. And NHST is basically the topic of this entire thread: it's what we should have stopped teaching a long time ago.
addcommitpush · 31d ago
Let's say you run the most basic regression Y = X beta + epsilon. The X is chosen out of the set of all possible regressors Z (say you run income ~ age + sex, where you also could have used education, location, whatever).
Is that not equivalent to a prior that the coefficient on variables in Z but not in X is zero?
OrderlyTiamat · 30d ago
NHST is wrong as a matter of theory. It is a weird amalgamation of null hypothesis testing (Fisher) and significance testing (Neyman and Pearson). Those two approaches by themselves are correct, and theoretically sound, given the appropriate assumptions are met.
NHST is not associated with any statistician, and you will find no author claiming to be its inventor. It is a misunderstanding of statistics apparently originating from psychology back in the 1960s, or at least that's as far back as I've found it.
kccqzy · 32d ago
Frequentist methods are strictly less general. For example, Laplace used probability theory to estimate the mass of Saturn. But with a frequentist interpretation we have to imagine a large number of parallel universes where everything remains the same except for the mass of Saturn. That's overly prescriptive of what probability means. Whereas in Bayesian statistics what probability means is strictly more general. You can manipulate probabilities even without fully defining them (maximum entropy) subject to intuitive rules (sum rule, product rule, Bayes' theorem), and the results of such manipulation are still correct and useful.
StopDisinfo910 · 32d ago
Laplace's is a typical use of inferential statistics to build an estimator. I don't really understand your point about parallel universes here. They're absolutely not necessary for any of the sampling to make sense. Every time you try to measure anything, you are indeed taking a sample of the set of measurements you could have gotten given the tools you are using.
I fear you operate under the illusion that frequentist statistics are somehow limited to hypothesis testing. It is absolutely not the case.
perrygeo · 32d ago
Drawing a sample of Saturns from an infinite set of Saturns! It's completely absurd, but that's what you get when you take a mathematical tool for coin flips and apply it to larger scientific questions.
I wonder if the generality of the Bayesian approach is what's prevented its wide adoption? Having a prescribed algorithm ready to plug in data is mighty convenient! Frequentism lowered the barrier and let anyone run stats, but more isn't necessarily a good thing.
IshKebab · 32d ago
I dunno about you guys but I have no problems imagining randomly sampling Saturn.
mitthrowaway2 · 31d ago
What do you mean by "randomly sampling" here?
IshKebab · 31d ago
I mean, Saturn was formed by some process right? And it must be sensitive to some initial conditions that - although maybe not really random, we can treat as random. Now imagine going back in time and changing those conditions a bit so that Saturn ended up differently. Do that 1000 times, giving you 1000 different Saturns. Now pick one randomly.
NeutralCrane · 31d ago
The point is that you can’t do that. That’s the entire conundrum with Frequentism. They object to stating anything about the probability of Saturn, because from an objectivist point of view, any statement is either true or it isn’t, and therefore all probabilistic statements about it must be 0% or 100%. Instead they resort to statements about the frequencies over the long term from repeated processes, like the one you have. There are two problems with this:
1. They aren’t answering the original question. The question is about the probability of a property of Saturn. Not about the process of repeatedly forming thousands of alternative Saturns. This seems like a subtle difference but that’s only because Frequentism has been the default for so long. It doesn’t attempt to answer the questions people are actually asking.
2. The assumptions it makes to answer that alternative question are just as flawed. We can’t go back in time and change the conditions surrounding Saturn’s creation. We can’t run 1000s of repeated trials of the creation of Saturn. For a group of people so ideologically opposed to a statement as simple as “the probability of this flipped coin being heads is 50%”, it seems absurd that they are fine with their entire framework being built around a premise that doesn’t exist and cannot exist.
rxtexit · 30d ago
In other words, creating a model that has little to do with reality then sampling from it to come up with a result that has little to do with reality.
Yes, I think this is kind of standard practice in many fields.
If someone questions this just quote "all models are wrong, but some are useful" as if the quote is actually saying "all models are wrong, but all models are useful".
chuckadams · 31d ago
We do that right now with computer simulations. Not exactly the hardest of evidence, but if the time machine were possible, someone in the future would have done it by now.
roenxi · 31d ago
> But with a frequentist interpretation we have to imagine a large number of parallel universes where everything remains the same except for the mass of Saturn. That's overly prescriptive of what probability means.
That isn't much of an argument to the mathematicians. Nobody ever came up with a compelling explanation for what -1 sheep looks like, and yet negative numbers turned out to be extremely practical. If it is absurd and provably works, then the math community can roll with that.
kccqzy · 31d ago
It turns out -1 sheep is well defined: we use numbers to represent both absolute quantity and relative difference. A sheep died today, therefore the relative difference between today and yesterday is -1 sheep.
Math people prefer to generalize. But with frequentism, it is not possible to generalize because "frequency" is baked into its very name. Indeed you can imagine Bayesian statistics as the generalization of frequentism.
wenc · 32d ago
Frequentist methods are unintuitive and seemingly arbitrary to a beginner (hypothesis testing, 95% confidence, p=0.05).
Bayesian methods are more intuitive, and fit how most people reason when they reason probabilistically. Unfortunately, Bayesian computational methods are often less practical to use in non-trivial settings (usually involving some MCMC).
I'm a Bayesian reasoner, but happily use frequentist computation methods (max likelihood estimation) because they're just more tractable.
porridgeraisin · 31d ago
I'm familiar with hypothesis testing, MCMC, and MLE. Can you explain how they are bayesian or frequentist?
jampekka · 31d ago
p-value testing is problematic, but frequentist CIs typically map to credible intervals with uninformative priors. In practice, Bayesian analyses tend to use such weak priors that they are essentially uninformative.
Maximum likelihood also tends to be equivalent to MAP with uninformative priors.
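A quick numerical check of that correspondence (a sketch using scipy; the 40-heads-out-of-100 data and the Jeffreys Beta(1/2, 1/2) prior are assumptions for illustration):

    from scipy import stats

    x, n = 40, 100                          # assumed example data
    p_hat = x / n

    # Frequentist 95% CI (normal-approximation / Wald interval).
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    z = stats.norm.ppf(0.975)
    print(p_hat - z * se, p_hat + z * se)   # roughly (0.30, 0.50)

    # Bayesian 95% credible interval under the Jeffreys Beta(1/2, 1/2) prior:
    # the posterior is Beta(x + 1/2, n - x + 1/2).
    post = stats.beta(x + 0.5, n - x + 0.5)
    print(post.ppf(0.025), post.ppf(0.975)) # roughly (0.31, 0.50)

With this much data the two intervals are nearly indistinguishable; the prior only matters when the data are weak.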
I find a lot of Bayesian analysis is a bit of cargo culting and frequentist/ML formulations are dismissed with tribalism.
NeutralCrane · 31d ago
Frequentist stats are wrong. They are built entirely on a flawed, non-consistent premise. The only reason for their use is because Bayesian approaches were computationally unfeasible for a long time, but that is no longer the case.
NewsaHackO · 32d ago
It’s weird how random people can submit non peer reviewed articles to preprint repos. Why not just use a blog site, medium or substack?
jxjnskkzxxhx · 32d ago
> Why not just use a blog site, medium or substack?
Because it looks more credible, obviously. In a sense it's cargo cult science: people observe this is the style of science, and so copy just the style; to a casual observer it appears to be science.
nickpsecurity · 32d ago
Professional science has been doing that for a long time, if one considers that many published works were never independently tested and replicated. If it's from a scientist and uses scientific descriptions, many just repeat it from there.
jxjnskkzxxhx · 32d ago
Overly reductionistic. At the same time a proper rebuttal isn't worth the time for someone who's clearly not looking to understand.
NeutralCrane · 31d ago
Publication in a journal is not a requirement for the scientific method. If anything, the insistence that something not published in a scientific journal is not science is, itself, cargo cult scientism.
groceryheist · 32d ago
Two reasons:
1. Preprint servers create DOIs, making works better citable.
2. Preprint servers are archives, ensuring works remain accessible.
My blog website won't outlive me for long. What happened to geocities could also happen to medium.
lametti · 31d ago
You should check out https://rogue-scholar.org/ - full-text archiving and DOIs for science blogs. I use it and it works great.
SoftTalker · 32d ago
Who would want to cite a random unreviewed preprint?
mitthrowaway2 · 32d ago
You don't get a free pass to not cite relevant prior literature just because it's in the form of an unreviewed preprint.
If you're writing a paper about a longstanding math problem and the solution gets published on 4chan, you still need to cite it.
hansvm · 31d ago
In some fields, sure, cite the 4chan source, ideally with an archived link.
Pure math tends to be much more conservative in citations than other fields though, and even when writing a paper about a longstanding math problem you wouldn't necessarily bother to include existing solutions. You reference the things you actually used, and even then you assume some common background knowledge for your audience and don't reference every little undergrad topology theorem or whatever. The point is to be honest with the reader about what was helpful for this work in particular, both to properly attribute things you actually used and to make any searches based on your work more targeted and fruitful.
NooneAtAll3 · 32d ago
tbf, you cite the paper that described and discussed said solution in the more appropriate form
mousethatroared · 32d ago
You cite the form you encountered and if you're any good of a researcher you will have encountered the original 4chan anon post, Borges' short story, or Chomsky's linguistic paper.
bowsamic · 32d ago
It happens way more than you'd expect. In my PhD I used to cite unreviewed preprints that were essential to my work but simply, for whatever reason, hadn't been pushed to publication. This is more common for long review-like papers.
jononor · 32d ago
Anyone who found something useful in it and is writing a new paper.
That something is unreviewed does not mean that it is bad or useless.
amelius · 32d ago
Maybe other pseudoscientists who agree with the ideas presented and want to create a parallel universe with alternative facts?
mousethatroared · 32d ago
And people who care more for gatekeeping will stick to academic echo chambers. The list of community driven medical discoveries encountering entrenched professional opposition is quite long.
Both models are fallible, which is why discernment is so important.
jononor · 32d ago
You can do that with reviewed papers too :)
T-A · 32d ago
> It’s weird how random people can submit non peer reviewed articles to preprint repos.
Assuming that you are referring to the Arxiv, they can't:
https://info.arxiv.org/help/endorsement.html
Why the gatekeeping? Only what is said matters, not who says it.
tsimionescu · 32d ago
That's a cute fantasy, but it doesn't work beyond a tiny scale. Credentials are critical to help filter data - 8 billion people all publishing random info can't be listened to.
SoftTalker · 32d ago
> 8 billion people all publishing random info can't be listened to.
Yet it's what we train LLMs on.
tsimionescu · 32d ago
It's what we train LLMs on to make them learn language, a thing that all healthy adult human beings are experts on using. It's definitely not what we train LLMs on if we want them to do science.
> We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of ``textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval
We train on the internet because, for example, I speak a fairly niche English dialect influenced by Hebrew, Yiddish and Aramaic, and there are no digitised textbooks or dictionaries that cover this language. I assume the base weights of models are still using high quality materials.
birn559 · 32d ago
Which are known to be unreliable beyond basic things that most people that have some relevant experience get right anyway.
billfruit · 32d ago
GitHub lets anyone upload code. It works perfectly fine.
tsimionescu · 32d ago
There's no problem in letting anyone upload. The problem is in claiming that we should give the same amount of attention to the work of anyone, that "only what is said matters". Just like we don't run random code off github, we have no reason to read random papers on arxiv. And, even on github, anyone using a project knows that "who is maintaining this project" is a major decision factor.
billfruit · 32d ago
My objection was to the concept of gatekeeping/barriers to entry for posting/uploading. Not that everything uploaded demands the same attention.
tsimionescu · 31d ago
Sure and you're right about that. But the thread was about not judging a paper on the basis of authorship - my quote was a direct quote from the post I replied to.
NeutralCrane · 31d ago
With all the issues surrounding modern science and the replication crisis, much of which stems from the current standard of journal publications, I would argue that your alternative doesn’t scale any better.
birn559 · 32d ago
Whether what is said has any merit can be very hard to judge beyond things that are well known.
In addition, peer reviews are anonymous for both sides (as far as possible).
ujkiolp · 32d ago
[flagged]
tomhow · 31d ago
You can't comment like this on Hacker News, no matter what you're replying to.
https://news.ycombinator.com/newsguidelines.html
Peer review specifically checks that what is being said passes scrutiny by experts in the field, so it is very much about what is being said.
SJC_Hacker · 32d ago
Then why isn't it double-blind?
BDPW · 32d ago
Often reviewing is executed double blind for exactly this reason. This can be difficult in small fields where you can more-or-less guess who's working on what, but the intent is definitely there.
mcswell · 32d ago
I've reviewed computational linguistics papers in the past (I'm retired now, and the field is changing out from under me, so I don't do it any more). But all the reviews I did were double blind.
watwut · 32d ago
Yeah, that is why 4chan became famous for being the source of trustworthy and valuable scientific research. /s
randomNumber7 · 31d ago
Science is in a strange state, but I don't think the current HN audience (of inexperienced ai script kids) is the crowd to have a valuable discussion about it.
billfruit · 32d ago
GitHub works on a similar model, without any barrier of entry, and it works well.
naveen99 · 31d ago
I would start with GitHub. Arxiv, Show HN, Twitter, TikTok, Super Bowl ad if you are going for maximum look-at-me-I-am-not-wearing-diapers effect.
constantcrying · 32d ago
>It’s weird how random people can submit non peer reviewed articles to preprint repos.
It is weird how people use a platform exactly how it is supposed to be used.
usgroup · 31d ago
I consider myself an applied Statistician amongst other things, and I find this to be an ideological take mostly.
When we do Statistics, we are firstly doing Applied Mathematics, which we are secondly extending to account for uncertainty for our particular problem. Whether your final model is good will largely depend on how it serves the task it was built for and/or how likely its critics believe it is to be falsified in its alternative hypothesis space. That is, a particular uncertainty extension is not necessary nor sufficient.
For less usual examples, engineers may use Interval Arithmetic to deal with propagation uncertainty, quants might use maximin to hedge a portfolio, management science makes use of scenario analysis (deterministic models under different scenarios): all deal with uncertainty, none necessarily invoke either Frequentist or Bayesian intuitions.
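For instance, interval propagation needs nothing more than bookkeeping over bounds; a toy sketch in plain Python (the current and resistance tolerance bands are made-up numbers for illustration):

    # Propagate tolerances through V = I * R with intervals: no distribution,
    # no prior, just guaranteed lower/upper bounds on the result.
    def interval_mul(a, b):
        products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
        return (min(products), max(products))

    current = (1.9, 2.1)       # amps, assumed tolerance band
    resistance = (9.5, 10.5)   # ohms, assumed tolerance band
    print(interval_mul(current, resistance))   # (18.05, 22.05) volts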
So, in my opinion, the most useful thing to teach neophytes is how to model with Maths. Second, it is how to make cases for the model under uncertainty.
derbOac · 31d ago
I've published stats papers on Bayesian methodology, consider myself a kind of Bayesianist, and stances like the linked essay, and many of the posts here, make me sad and a bit frustrated. As you're saying, it's become an ideological, not functional, debate. Different inferential paradigms have their benefits and costs, and Bayesian methods have their own problems.
Whenever you make an inference, it's a gamble, in a sort of literal sense. The choice of frequentist versus Bayesian methods basically is a bet about whether the bias due to your prior is small enough to offset the variance due to the lack of one. If the bias due to a "misguided" prior is high enough, you'll end up making a worse inference than not using a prior.
You can, of course, use an optimally conservative prior, a reference prior, but in many cases what you're left with is a uniform prior, and therefore frequentist inference. It's not always the same as a uniform prior, but in practice it often is. I think the philosophical arguments against frequentist lines of thought are often unresolvable and involve strawman characterizations of frequentism, often derived from misperceptions of young adults learning any sort of stats for the first time.
There's also a very strong argument against strong priors in competitive situations, such as in achievement tests for admissions or some such thing. Imagine making priors for your test score based on your demographic background, for instance. Technically this might be ok from a subjective Bayesian perspective but I think almost no one would agree this is acceptable, and the reasons why are telling about statistical inference more generally.
Many of the other arguments apply equally to Bayesian methodology too. The worst problems of NHST are not actually about the tests, they're about how they are used and interpreted, and would crop up if everyone was using credible intervals too.
I kind of think Bayesian methodology should be taught more at an earlier stage, but it would be irresponsible in my mind to not teach frequentist methodology at the same time.
nurettin · 31d ago
You have a jar with five green and three blue marbles. The first random marble you pick is green; what is the chance you get blue next? There is no use for Bayes here.
Now you have two jars you can't see inside of, 5g/3b and 3b/5r. You take one jar and want to guess which one you picked. You start pulling marbles and updating your priors until you reach an acceptable certainty. Now you have to use Bayes or similar.
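A minimal sketch of that second scenario in plain Python (it assumes draws with replacement and an example draw sequence, just to show the update):

    # Jar A: 5 green, 3 blue.  Jar B: 3 blue, 5 red.  Which jar did we grab?
    like = {
        "A": {"green": 5/8, "blue": 3/8, "red": 0.0},
        "B": {"green": 0.0, "blue": 3/8, "red": 5/8},
    }
    prior = {"A": 0.5, "B": 0.5}           # no idea which jar we took
    draws = ["blue", "blue", "green"]      # assumed example observations

    for color in draws:
        # Bayes: posterior is proportional to prior times likelihood.
        post = {jar: prior[jar] * like[jar][color] for jar in prior}
        total = sum(post.values())
        prior = {jar: p / total for jar, p in post.items()}
        print(color, prior)
    # Blue draws leave it at 50/50; the first green settles it on jar A.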
These are tools, not ideologies. People who pit these tools against each other are demagogues.
NeutralCrane · 31d ago
> You have a jar with five green, three blue marbles. First random marble you pick is green, what is the chance you get blue next? There is no use for bayes here.
There’s no use for Frequentism there either, it’s basic probability theory.
nurettin · 31d ago
And "frequentism" uses that basic probability theory, unless you have a "frequentist" answer to the first example that is something other than 3/7?
ratorx · 31d ago
Are you implying that Bayesian doesn’t use basic probability theory or gives a different answer? To calculate the simplest possible conditional probability, you need to have defined probability first.
nurettin · 30d ago
The first example is not a demonstration of Bayesian inference. It uses a simple tool used in the "frequentist" approach. The Bayesian approach example is below that. My point is that the two are mutually exclusive in these scenarios. I am trying to demonstrate that there are formulas for different scenarios and you don't pit them against each other. The response to that was just pedantic.
ratorx · 30d ago
Your example is flawed because the example has nothing to do with the difference between Bayesian and Frequentist inference, because neither approach is needed.
In the same way that you don’t use Bayesian approach, you also have not done anything frequentist. All you need for this problem is to calculate some probabilities. There is nothing “frequentist” about calculating probabilities (and nothing Bayesian about it either), because it is more fundamental than that.
There is nothing Bayesian or Frequentist about basic probability theory. They are both interpretations that rely on the existence of probability theory to make sense. So solving a problem with basic probability theory and claiming I did “Frequentism” is correct, but meaningless.
nurettin · 30d ago
Maybe you would care to write a sensible example, then?
ratorx · 30d ago
In my opinion, Bayes is the more general approach. But the Frequentist approach can be simpler if your prior is uniform and you only care about the output, not the distribution.
E.g. Neural Networks, where the computational cost makes modelling the problem in a Bayesian way much more expensive.
Or you can say the Frequentist approach is simpler for estimating the probability of heads for a biased coin, as long as you have no prior and only care about the most likely value, not the distribution.
nurettin · 30d ago
It is not about "more expensive computation"; that is just absolute nonsensical rhetoric. Again, given these basic examples, the probability of outcomes has nothing to do with Bayes. Bayes needs at least two scenarios to distinguish between. Simply P and 1-P reduces the formula to a counting problem. I am done with this.
ratorx · 30d ago
> absolute nonsense rhetoric
Could you explain why you believe this?
Bayesian models at the scale of modern LLMs are not commonly used because the equivalent techniques (like MCMC) are more expensive, which limits how big and useful the model can be. This is a practical example of when pragmatic frequentism is better in a real scenario.
> it is a counting problem
It isn’t a counting problem in the general case. If you use MLE and do number of heads / total flips, that is the Frequentist approach. Of course I deliberately picked the most simple random variable I could think of, so the APPROACH could be differentiated.
The Bayesian approach starts with a prior. This is implicit in Frequentist, but explicit in Bayesian. In this case, the equivalent prior is a uniform distribution between 0 and 1. Then the Bayesian approach to the problem uses Bayes theorem to decide how to update the uniform distribution based on the result of every flip.
Is the result the same? Yes - because these are different approaches, which are both valid. However, in this case the Frequentist approach resulted in a simpler solution because these implicit assumptions matched the ones we would do anyway and matched our intuition. However, if you believed that the prior distribution was non-uniform, then Bayes may become easier.
> Bayes needs at least two scenarios
The general case is that Bayes needs a prior distribution (in this case, the probability of heads being uniform between 0 and 1, which is a Beta(1,1) distribution). Then you use Bayes' rule conditioned on the data to generate the "update" rule that produces the posterior, given the result of n coin flips.
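As a sketch of that update rule (the flip sequence is made up; scipy is only used to summarize the posterior):

    from scipy import stats

    a, b = 1.0, 1.0             # Beta(1, 1): the uniform prior on p(heads)
    flips = "HTHHTHHH"          # assumed example data

    for f in flips:
        if f == "H":            # conjugate update: heads bumps a, tails bumps b
            a += 1
        else:
            b += 1

    posterior = stats.beta(a, b)
    print(posterior.mean())                 # 0.7 for 6 heads out of 8 here
    print(posterior.ppf([0.025, 0.975]))    # 95% credible interval for p(heads)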
getnormality · 31d ago
What I hear when I read this: the way we do things today has definite and well-known problems. Wouldn't it be wonderful to do things in a different way whose problems are not yet well-understood or widely known?
throwaway81523 · 31d ago
Some time back I remember a blog post about stuff you could straightforwardly do with frequentist statistics that was much more difficult with Bayesian methods. I thought I bookmarked it but have no idea where it is now. I half remembered it being on Andrew Gelman's blog, but I spent a while looking there for it. No luck.
This helped a non-statistician like myself understand what it is I was supposedly taught wrong: https://xkcd.com/1132/
(I still only sorta get it: I know it's reductionist so as to be funny, but to me those dice are quite literally a hidden variable)
firejake308 · 31d ago
You're right that the dice are literally intended to represent a hidden variable. The difference between the frequentist and the Bayesian is that the frequentist only accounts for the hidden variable, whereas the Bayesian also accounts for the pre-test probability of the sun exploding. Since the probability of the sun exploding is 1 in a bazillion, multiplying that by 35 still gives you a very, very low post-test probability that the sun has gone nova, because 35 in a bazillion is still pretty unlikely.
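Spelled out with Bayes' theorem in a few lines of Python (the 1-in-a-million prior is just a stand-in for "1 in a bazillion"):

    p_nova = 1e-6                  # prior: stand-in for "1 in a bazillion"
    p_yes_if_nova = 35 / 36        # detector answers truthfully unless both dice show 6
    p_yes_if_no_nova = 1 / 36      # detector lies only when both dice show 6

    # Posterior probability the sun exploded, given the detector said "yes".
    numer = p_yes_if_nova * p_nova
    denom = numer + p_yes_if_no_nova * (1 - p_nova)
    print(numer / denom)           # ~3.5e-5: about 35x the prior, still tiny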
rawgabbit · 31d ago
It is like running a casino versus gambling as an individual.
If I was running a casino and presiding over tens of thousands of bets daily, I would use frequency statistics to guarantee the house always wins.
If I was an individual gambler and determining my odds at a particular bet, I would use Bayesian.
usgroup · 31d ago
It depends what the null hypothesis is here, but by construction, under a reasonable null, the p-value for an appropriate test would not be acceptable under a Frequentist framework.
bmacho · 32d ago
Article is from 2012, compare [0] and [1].
The pdf got replaced for some reason (bug, sensitive information in the meta or idk), but the article seems to have stayed the same, except the date.
[0]: https://arxiv.org/pdf/1201.2590v1.pdf
[1]: https://web.archive.org/web/0if_/https://arxiv.org/pdf/1201....