Bayesian updating is a very attractive idea. We take a guess at our uncertainty about a parameter of interest by specifying a prior distribution for it; then we collect some data and update our beliefs about the parameter by combining our data with our prior to produce a posterior distribution; then we collect more data and combine our old posterior with our new data to get a new posterior; etc… If we update enough we will eventually converge on the truth, so they say. This idea feels really good, and often makes people feel as though Bayesian updating will save them from having to think about confusing things like p-values or confidence intervals.
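The update loop described above can be sketched with the simplest conjugate case: a Beta prior on a success probability, updated with Binomial counts. This is a minimal sketch, not a general recipe. The Beta(1, 1) prior, the batch counts and the function names are all hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate Beta-Binomial update: the posterior is Beta(alpha + s, beta + f)."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (1.0, 1.0)  # hypothetical uniform prior on the success probability
batches = [(7, 3), (4, 6), (12, 8)]  # hypothetical (successes, failures) per data set

posterior = prior
for s, f in batches:
    posterior = update_beta(*posterior, s, f)  # yesterday's posterior is today's prior
    print(f"after batch {(s, f)}: Beta{posterior}, mean = {beta_mean(*posterior):.3f}")

# Updating once with the pooled data gives exactly the same posterior.
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
assert posterior == update_beta(*prior, total_s, total_f)
```

Note what the loop never does: it never asks whether a single constant success probability is a sensible description of the data. It will concentrate the posterior either way.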

The problem with this thinking is that Bayesian updating requires the choice of a particular statistical model, and this choice must be correct for Bayesian updating to work.

In a recent paper, leading Bayesians argue that doing Bayesian statistics well often requires using p-values and confidence intervals too:

To sum up, what Bayesian updating does when the model is false (i.e., in reality, always) is to try to concentrate the posterior on the best attainable approximations to the distribution of the data, ‘best’ being measured by likelihood. But depending on how the model is misspecified, and how $\theta$ represents the parameters of scientific interest, the impact of misspecification on inferring the latter can range from non-existent to profound. Since we are quite sure our models are wrong, we need to check whether the misspecification is so bad that inferences regarding the scientific parameters are in trouble. It is by this non-Bayesian checking of Bayesian models that we solve our principal–agent problem.
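One common form of checking a Bayesian model from outside the updating loop is a posterior predictive check: simulate replicate data sets from the fitted model and ask whether they reproduce some feature of the real data. The result is, in effect, a p-value. A minimal sketch, assuming a Poisson model for counts with a Gamma(1, 1) prior on the rate and using the variance-to-mean ratio as the test statistic; the counts and function names are hypothetical. Only the standard library is used, so the Poisson sampler is hand-written.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """One Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def dispersion(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, much larger if overdispersed."""
    m = statistics.mean(counts)
    return statistics.variance(counts) / m if m > 0 else 0.0  # all-zero data: no spread


def posterior_predictive_pvalue(counts, a=1.0, b=1.0, n_rep=2000, seed=1):
    """Poisson model with a Gamma(a, b) prior (shape a, rate b) on the rate.

    Conjugacy gives the posterior Gamma(a + sum(counts), b + n). For each posterior
    draw of the rate we simulate a replicate data set; the p-value is the share of
    replicates at least as dispersed as the observed counts.
    """
    rng = random.Random(seed)
    n = len(counts)
    shape, rate = a + sum(counts), b + n
    observed = dispersion(counts)
    exceed = 0
    for _ in range(n_rep):
        lam = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale, not a rate
        replicate = [poisson_draw(lam, rng) for _ in range(n)]
        exceed += dispersion(replicate) >= observed
    return exceed / n_rep


clumped = [0, 0, 1, 0, 12, 3, 0, 0, 15, 1, 0, 9, 0, 2, 0, 0, 20, 1, 0, 4]  # hypothetical
steady = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]  # hypothetical
print(posterior_predictive_pvalue(clumped))
print(posterior_predictive_pvalue(steady))
```

For the clumped counts the check returns a value near 0, flagging overdispersion the Poisson model cannot produce, however tightly the posterior for the rate has concentrated. For the steady counts it raises no alarm.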

And frequentists have been using Bayesian ideas for a while now (e.g. empirical Bayes / shrinkage).
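As a minimal sketch of the empirical Bayes / shrinkage idea, assume a normal model in which each group's estimate has the same known standard error; the prior's mean and variance are then estimated from the estimates themselves by the method of moments. This is one simple variant among several (the James–Stein estimator is a close relative), and the function name and numbers are hypothetical.

```python
import statistics


def empirical_bayes_shrink(estimates, se):
    """Shrink group estimates toward their grand mean (normal-normal model, common known se).

    The prior mean and variance are estimated from the estimates themselves
    (method of moments), which is what makes this "empirical" Bayes.
    """
    grand_mean = statistics.mean(estimates)
    # Observed spread = true between-group variance + sampling variance.
    tau2 = max(statistics.variance(estimates) - se**2, 0.0)
    weight = tau2 / (tau2 + se**2)  # share of each deviation from the mean that is kept
    return [grand_mean + weight * (y - grand_mean) for y in estimates]


est = [2.0, -1.0, 4.0, 0.5, 1.5, -2.0]  # hypothetical group estimates
print(empirical_bayes_shrink(est, se=1.5))
```

Each estimate keeps its rank but moves toward the grand mean; the noisier the estimates relative to their spread, the stronger the pull.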

Can we stop bickering now…

1. February 10, 2013 10:19 pm

I read a similar comment about the appeal of Bayesian approaches in a Bayesian text today. I cannot for the life of me see how anyone sees Bayesian approaches as being an intuitive or appealing way of doing statistics, compared to traditional methods. I see them as being vague, extremely poorly explained from both a philosophical and computational standpoint (to the point of being incomprehensible), and not altogether coherent, whereas tests based on statistical distributions and p values are very clear and straightforward, as is conditional probability generally. Likelihood computations are murkier, but the basic ideas are still clear.

I’m getting a little tired of the Bayesian bandwagon, because they never really show me just how their methods provide a better understanding of whatever I’m studying, nor do statisticians in general seem to have any real sense that we have a LOT of other things to consider that likely affect the quality of our results as much or more than these statistical technicalities.

• February 10, 2013 11:10 pm

Hi Jim. Thanks for the comment. But the only part of your comment that I agree with is “…a LOT of other things…affect the quality of our results…” But even here I’m curious if you have an example of what kinds of things you have in mind — measuring the wrong variables? asking the wrong questions (i.e. type ‘III’ error)? experimental subjects dying? hurricane destroyed my study site?

I absolutely do not believe that Bayesian approaches are worse than non-Bayesian approaches, nor do I believe that the opposite is true. What matters much more is how well you use the approach that you choose. If you find Bayes to be troublesome, then my advice is don’t use Bayes. It’s like that Star Trek episode when Dr. Beverly Crusher’s patient says something like ‘it hurts when I raise my arm’, and Crusher replies ‘well, don’t raise your arm then’. There are many people who use Bayesian methods very well. If you don’t want to read their papers then you might be missing out; then again, there are lots of good papers to choose from these days, so maybe it’s no big deal?

The point of my post wasn’t to attack Bayesian approaches, but rather to attack a justification for Bayesian approaches that I dislike. In other words, Bayesian approaches come with many benefits (as do many non-Bayesian approaches), but Bayesian updating is not one of them.

What’s wrong with Bayesian updating in particular? It assumes that your model is perfect. Models are never perfect, obviously. Therefore, a better way to view Bayesian models is not as an update of our beliefs, but rather as just another model to be tested, probed, and checked. In other words, check the assumptions of the model that results from Bayes’ theorem, and learn from the aspects of the model that fail. See…no different than p-values and all that, but with a prior. What’s the problem?

• February 11, 2013 1:28 am

Thanks Steve. Well, mainly I was ranting about how poorly presented Bayesian approaches/techniques are compared to frequentist techniques, where the concepts and methods are crystal clear and tight, to me anyway. Like him or hate him, Fisher was at least clear about exactly what he was doing and why, and he had an airtight mathematical foundation for it. Bayes on the other hand…I often feel like there’s no there there–or if instead there really is something there, it’s not explained very well…which in turn makes me think there’s nothing really there. I am just not convinced we need anything other than sound application of maximum likelihood principles, and some good hard thinking about the nature of our system, which comes from an integrated combination of theory, observations, and the literature.

To answer your question–yes you mentioned several of the issues. The main one in my mind is we have to measure the right variable(s), where “right” choices require a solid understanding of the system in question, and we have to measure them correctly–i.e. at the right scale, accurately etc. And just the logistics of getting enough of the right kind of data are a major effort, or can be. Spending time trying to figure out whether to follow Fisher or Bayes isn’t going to help us if we mess that up.

I like your idea about using the theorem to evaluate the legitimacy of one’s model instead of assuming the model’s fine and updating your probabilities like an automaton. I don’t know how to implement that in practice though.

• February 11, 2013 3:33 am

I hear where you’re coming from but your tone is just a little over the top for me — e.g. describing Fisher as “…crystal clear and tight…” with an “…airtight mathematical foundation…”. If only that were true.

For example, here’s a description of something bad that can happen if one applies Fisher’s “airtight mathematical foundation”:
http://www.stat.columbia.edu/~gelman/research/published/power4r.pdf

This article describes issues with the estimation of small effects in small samples. Essentially the problem is that, in small samples, null hypothesis testing with maximum likelihood estimation allows you to interpret estimates of effect sizes ONLY if the estimates are very large. When the true effect is small, estimates large enough to be ‘significant’ should turn up only roughly 5% of the time, i.e. at about the significance level. If we collect enough data we might not mind waiting until Fisher’s theory tells us we can interpret our estimates, and when that happens we’ll be ‘rewarded’ with an unreasonably large estimate of the effect. So in these settings p-values and ML can lead scientists to make unreasonable claims with ‘confidence’. Bayes (with informative priors), on the other hand, tends to shrink estimates towards more moderate and reasonable values.
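The mechanism described above can be sketched by simulation. A minimal sketch, assuming a normal sampling distribution for each small study's estimate, a two-sided test at 1.96 standard errors, and a normal prior centered at zero for the shrinkage step; the true effect, standard error, prior SD and function name are all hypothetical.

```python
import random
import statistics


def simulate_type_m(true_effect=0.1, se=1.0, prior_sd=0.2, n_sims=20000, z_crit=1.96, seed=42):
    """How often small studies reach significance, and how inflated those estimates are."""
    rng = random.Random(seed)
    # Normal-normal shrinkage: with a N(0, prior_sd^2) prior the posterior mean is shrink * estimate.
    shrink = prior_sd**2 / (prior_sd**2 + se**2)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)  # the ML estimate from one small study
        if abs(estimate) > z_crit * se:        # "statistically significant" at about 5%
            significant.append(estimate)
    mean_abs_sig = statistics.mean(abs(e) for e in significant)
    return {
        "share_significant": len(significant) / n_sims,
        "exaggeration_ratio": mean_abs_sig / abs(true_effect),
        "mean_abs_shrunk": shrink * mean_abs_sig,
    }


print(simulate_type_m())
```

Because every significant estimate has to clear 1.96 standard errors while the assumed true effect is only 0.1, the exaggeration ratio is necessarily large; the prior pulls those same estimates back toward zero.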

Don’t get me wrong…blind application of Bayes’ theorem is no good either. My overall position on these issues is that some very smart people have been debating the foundations of statistics for over a century now, and we still aren’t anywhere near a consensus. The ‘Bayesians’ have good points and so do the ‘frequentists’, not to mention the fact that almost zero applied (or even theoretical??) statisticians are 100% in one camp or the other. It’s just not a debate where it’s valid to offhandedly dismiss one side or the other.

To end on what will probably be a slightly more agreeable note, I do think that a big problem with Bayes is the large amount of time required to learn the mathematical and computational skills to properly apply the methods:
https://stevencarlislewalker.wordpress.com/2012/10/19/the-catch-22-of-slide-22-pedagogical-troubles-with-conjugate-priors/

2. February 11, 2013 12:11 pm

Thanks Steve, real good answer, and the catch-22 article is great–the time/effort investment issues you discuss there are definitely my concerns.

W.r.t. the Gelman article: OK I can see that apparently some people forget that you need a large effect size when your sample is small, to really detect an effect, although that issue’s obvious to me (and most of us I would guess: as always, make sure you have a good sample size!) What I don’t get is this: on *exactly* what basis did they set the mean of their prior distribution to zero? Presumably because we know, a priori, that X and Y chromosomes segregate as 1:1, therefore boys = girls. But they don’t explain that; they just set the mean to zero, mentioning briefly that they altered the variance to test for sensitivity to the shape of the distribution. This is a biologically sound basis for a prior distribution for sure–chromosomes in diploids *do* usually segregate 1:1, but they’re not trying to test what “usually” happens in diploids: they’re interested in what happens in the subset of human beings labeled by some as “most attractive”. If you already know that X/Y chromosomes *always* segregate 1:1 in humans, then you **already know the answer** to what it is you’re supposedly testing, which is: “no, there’s no difference in boys/girls, there *can’t* be–outside of sampling error”. Of course, this is not Gelman et al.’s doing–it’s inherent rather in the original study they’re critiquing.
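A sensitivity analysis of the kind described can be sketched with a conjugate normal prior on the effect: rerun the update with several prior standard deviations and see how far the posterior moves. The estimate, standard error and grid of prior SDs below are hypothetical, not taken from the article, and the function name is illustrative.

```python
import math


def normal_posterior(estimate, se, prior_mean, prior_sd):
    """Conjugate normal update: precision-weighted average of prior mean and estimate."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, math.sqrt(post_var)


estimate, se = 5.0, 3.0  # hypothetical effect estimate and standard error (percentage points)
for prior_sd in (0.5, 1.0, 2.0, 5.0, 100.0):
    mean, sd = normal_posterior(estimate, se, prior_mean=0.0, prior_sd=prior_sd)
    print(f"prior N(0, {prior_sd}^2): posterior mean {mean:.2f}, sd {sd:.2f}")
```

Changing prior_mean is the analogous check on the centering question. One caution: centering the prior on the observed estimate and then updating on that same estimate counts the data twice.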

Here’s a practical problem for me. We know that autocorrelation in a time series leads to spurious trends more than 5% of the time (way more as the lag-1 ac coefficient approaches 1.0). So we observe a time series of some observed variable, having a high lag-1 ac, say 0.9 or whatever, and a very steep linear regression slope, say b = 1.0. How do we set a Bayesian prior to estimate the relative likelihoods between the hypotheses of (1) the slope is spurious, due to auto-correlated data vs (2) the slope is real (due to some external driver of the variable). I have no theoretical basis to guide me, nothing analogous to chromosome segregation rules. And it’s not helped all that much if I have a suspected driver variable measured either, because, if that one’s autocorrelated as well, I will get spurious correlations between the two variables at a high rate. How does the Reverend Bayes help me in my hour of need here? Would he tell me to set the mean of the prior distribution of true slopes at zero, or at the slope value I’ve actually observed in the data? Or do one of each and compare them? Or something else again? And why?
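The first claim, that autocorrelation inflates the rate of spurious trends, can be checked by simulation. A minimal sketch, assuming trend-free Gaussian AR(1) noise with unit innovation variance, series of length 50, and the naive OLS slope t-test that treats residuals as independent; the function names and settings are hypothetical.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Trend-free AR(1) noise x_t = phi * x_{t-1} + e_t, started from its stationary law (|phi| < 1)."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi**2))
    series = []
    for _ in range(n):
        series.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return series


def slope_t_stat(y):
    """OLS slope of y on time 0..n-1 and its naive t statistic (assumes independent errors)."""
    n = len(y)
    t = range(n)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def false_trend_rate(phi, n=50, reps=2000, t_crit=2.01, seed=7):
    """Share of trend-free AR(1) series whose naive slope test is 'significant'.

    t_crit = 2.01 approximates the two-sided 5% critical value of t with 48 df.
    """
    rng = random.Random(seed)
    hits = sum(abs(slope_t_stat(ar1_series(n, phi, rng))) > t_crit for _ in range(reps))
    return hits / reps


for phi in (0.0, 0.5, 0.9):
    print(phi, false_trend_rate(phi))
```

With phi = 0 the rate sits near the nominal 5%; with phi = 0.9 it is far higher. This quantifies the problem but does not by itself settle which prior to use.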

3. February 13, 2013 2:40 am

Pedantic point: the paper you link to is by one leading self-described Bayesian (Andrew) and one leading self-described *frequentist* (Cosma). 😉

• February 13, 2013 4:35 pm

No…this is a very good point. I figured that it wasn’t very interesting to point out that a frequentist is critical of Bayesian updating. But now that I think about it, just pointing out that a Bayesian and frequentist are writing together is probably the best evidence for my belief that the Bayesian-frequentist debate is just not relevant anymore.