My problem with this stuff is that most people who use Bayesian terminology aren’t doing any math. They’re just using math as metaphor for their vibes.
And let’s say you did the math. You create a spreadsheet to do a calculation, and the inputs are some guesses and the output is a probability. Are you really going to rely on that number? When have you made an important decision that way?
GiveWell actually has done some calculations, but they warn us not to take the results too seriously due to all the uncertainty and modeling assumptions.
And sure, it makes sense for gamblers, but that’s because such games are designed to make probability calculations work.
> Are you really going to rely on that number?
Only if I check all the assumptions and calculations very, very, VERY carefully. If I did that, then I would. But it's insanely hard to make sure you didn't make a mistake, and I consider this one of the biggest downsides of Bayes in practice.
That said, I'm pretty OK with people who use the math as a metaphor for vibes. Done well, I think this leads to better vibes, at least sometimes.
Is the aleatoric/epistemic distinction actually adding value here? In practice there is a line between stuff one might be able to model, and everything else (that might as well be aleatory even if one could in theory control it by spending resources to measure it and then push to change it). The former goes in the epsilon in the model, and the rest goes into the model itself. Statisticians I know usually prefer either frequentist or Bayesian approaches, but will use the tools from the other toolbox when it is useful to do so.
Well, hello!
First, thanks for the distinction between "uncertainty in the world" (aleatoric) and "uncertainty about the world" (epistemic), that seems really useful.
I got caught up on the phrase "optimal decision procedures".
I get that you refer to maximising expected value. I just want to question the implication that taking decisions to do so results in optimal decisions in an individual setting with a) finite resources b) finite time
Individually I might run out of either or both before my decisions can average out.
So I would argue that many scenarios exist where the Bayesian optimal does not align with a practical optimal.
> I just want to question the implication that taking decisions to do so results in optimal decisions in an individual setting with a) finite resources b) finite time
This is quite similar to my argument! With an infinite amount of time, I think I'd create a Bayesian model. But creating them is so hard (and so "fragile") that I rarely want to do that in practice.
I see. I was getting at "given a model, I might be unable to realize returns as I can't take action long enough", but as I understand you, you'd (given you could) embed the drainage of resources into the model in the first place?
I don't know how to model Bayesian inference when that inference is somehow self-reflective and aware of the costs of its own reasoning. I'd be quite surprised if that doesn't run into halting problem type issues, though!
I am in the 99.9% complement to the 0.1% b-probability competence set you are in. Even so, a question:
Given the 'fragility' of Bayesian analysis IRL w/r/t parameters...is there an approach/formalism in Bayes World™ that has multiple models 'checking on each other'? To make this vague idea somewhat clearer, I am not asking about updating a given model. Or improving its prior. Nor am I talking about ensembles of models (as I assume the IPCC Posse refers to...) but instead multiple concurrent models that somehow <energetic hand waving> learn from each other and adapt to explore 'parameter space' for valid values?
So to model real phenomenon "P", we have models A, B, C, ... etc., and they not only operate on the data with different parameterizations, they also interact, learn from each other's mistakes, and update accordingly -- based on P data and {A, B, C, ...} prediction errors.
Is that a sensible question? If so, is there such a practice or formalism?
The simple answer, as far as I know, is no. The complex answer is that in Bayes, whatever you do has to be done by creating distributions such that the rules of probability do what you want. So (not what you're asking) you can create an ensemble of models and then the rules of probability will naturally mean that the models that fit the data better have more influence in the posterior. If you wanted to do what you're saying, you'd need to have some kind of complex cross-model prior where the likelihood of model A having parameters θ depends on the probability of model B having parameters φ. Then when you do inference you'll both figure out how likely A and B are (as in an ensemble) but maybe the parameters would also influence each other. I'm not aware of anyone doing that but it's the closest thing to your idea I can think of!
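If it helps, here's a tiny sketch of the ensemble version (models and data entirely made up) showing how the rules of probability push weight toward whichever model explains the data better:

```python
# A toy Bayesian ensemble (model averaging) with made-up models and data.
# Each "model" is a Beta prior on a coin's heads-probability; the data is
# 16 heads in 20 flips. Models whose prior predictive explains the data
# better automatically get more weight in the posterior.
import numpy as np
from scipy.special import betaln, gammaln

def log_evidence(k, n, a, b):
    # Beta-Binomial marginal likelihood: P(k heads | n flips, Beta(a, b) prior),
    # with the heads-probability integrated out.
    log_choose = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    return log_choose + betaln(a + k, b + n - k) - betaln(a, b)

k, n = 16, 20
models = {"flat prior Beta(1,1)": (1, 1), "fair-ish prior Beta(20,20)": (20, 20)}
logs = np.array([log_evidence(k, n, a, b) for a, b in models.values()])
weights = np.exp(logs - logs.max())
weights /= weights.sum()  # posterior model probabilities, assuming an equal prior over models
for name, w in zip(models, weights):
    print(f"{name}: posterior weight {w:.2f}")
```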
Appreciate the good listening and thoughtful answer.
Did you deliberately construct your example so that "leaving money on the table" would be literal?
"You take a penny and flip it 20 times, getting 16 heads". Even if I was 'pretty sure' this was a fair coin, I wouldn't bet on it.
I don't trust people who would offer coin games like this. I don't want to be adversely selected for choosing to play a mug's game, so I wouldn't risk a guaranteed payout of any amount of money beyond perhaps the entertainment value of getting swindled.
My struggle with Bayesian reasoning - as typically used by non-statisticians - is the absence of error bars. If somebody says "I think there is a 5% chance of AGI by 2030" my immediate question is "what would persuade you it was 6%?". I would be more accepting of somebody who said "5% plus or minus 5 points", thus acknowledging just how squishy our assumptions are.
And then when new information comes along we should do _two_ things: adjust our prediction *and* our bounds.
So for example if you had asked me last week what I thought the odds of the LLM bubble bursting this year I would have said 50% plus or minus 25 points, because, as Warren Buffett pointed out, markets are capable of staying irrational for a very long time. And then Microsoft quietly mentioned that it is dialing back the massive investments it had planned in data centers to run LLMs, and my new guess would be 70% plus or minus 10 points, which means it's time to take some Nvidia profits out of the market.
The other advantage of this is that it helps you to think about best case and worst case scenarios, and whether you are risk averse or risk tolerant. If I were risk averse, I would go with the 80% number and decide it's a good time to take any profits I have in Nvidia off the table. If I were risk tolerant, I would go with the 60% number and take just half of my money and let the rest ride.
[The other problem I have is the one you point out, that in real life it's really hard to build a decent model. In my experience, when most non-experts talk about using Bayesian reasoning, they can't even justify their initial prior.]
Yeah, I feel like Bayes focuses on a single number which happens to be a probability, but doesn't provide bounds on it. Comparing to your example, it's like saying "How many new LLMs will come out in 2030? 5." Here, the Bayesian thing would be to provide some kind of distribution or at least a range over the values. If you did that, even for values which end up being a probability, you would get useful error bars.
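Something like this toy version is what I have in mind (numbers invented), putting a distribution on the probability itself and reading the error bars off it:

```python
from scipy.stats import beta
# Invented example: express "around 70%, but squishy" as a Beta distribution
# over the probability itself, and read a credible interval off it as error bars.
belief = beta(7, 3)               # mean 0.7, fairly spread out
low, high = belief.interval(0.9)  # central 90% credible interval
print(f"point estimate {belief.mean():.2f}, 90% interval [{low:.2f}, {high:.2f}]")
```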
Yeah that's another kicker too. There's a whole lot of information we hold about an uncertain event, beyond the bare credence as a number between 0 and 1. Some things are "let's say 50% because I have no idea", others are "precisely 50% because the math says so", and Bayes treats them exactly the same.
So on second thought, I'm not really convinced that Bayes is complete enough as a formalization of proper probabilistic reasoning.
For hardcore Bayesians, I think the attitude is that the kind of uncertainty you're talking about is "inside the model" and is supposed to be "integrated out" when you state the final probability.
Like, maybe I think there's a 90% chance that AGI would happen by 2050, *unless* there's large-scale nuclear war, in which case there's only a 10% chance. If I think there's a 30% chance of nuclear war, then I'd just compute .7 × 90% + .3 × 10% = 66%. But then if there's a new giant nuclear arms control treaty, I might change to .9 × 90% + .1 × 10% = 82%.
Probably you could do something where you only state *bounds* on your priors and get some sort of bound on the final result? This could be quite useful, since it's often not obvious which priors are really impacting the solution. I'd be surprised if there's not some academic work out there somewhere, although nothing comes to mind...
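A crude version of that bounds idea would be to just sweep the shaky prior over a range and report the range of final answers. Reusing the made-up numbers above:

```python
import numpy as np
# Crude "bounds on the prior" sketch, reusing the invented numbers above:
# instead of committing to a single P(nuclear war), sweep it over a range
# and report the range of the final probability.
p_agi_no_war, p_agi_war = 0.90, 0.10
answers = [(1 - p_war) * p_agi_no_war + p_war * p_agi_war
           for p_war in np.linspace(0.1, 0.3, 21)]
print(f"P(AGI by 2050) lands somewhere between {min(answers):.2f} and {max(answers):.2f}")
```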
>find new word "stochastic"
>use context clues to piece together meaning
>confirm with google
>means random? Why not just say random?
>years pass
>find new word "aleatoric"
:P
Hipsters of a certain age learned the word from Read, Eat, Sleep by The Books.
https://www.youtube.com/watch?v=UvBnyVwd47s
I know, I know! I kid you not, I tried asking some LLMs if there was any equivalent in everyday language to express the distinction between "aleatoric" and "epistemic" uncertainty.
> And yet, here I have a blog where I’ve examined if seed oil is bad for you and if alien aircraft are visiting earth and if it’s a good idea to take statins or use air purifiers or get colonoscopies or eat aspartame or practice gratitude or use an ultrasonic humidifier. And I have used formal Bayesian models never. Why?
This is the thing I was waiting for someone to say. In practice, we don't actually "do Bayes" in the formal sense of the word. Not only because it's a lot of work, but also because it's incredibly fragile. A slight error in the middle of a 500-line model means we get a completely wrong result, and your intuition probably won't save you because you've explicitly gone into "computer mode" instead of using the usual heuristics, intuitions and shortcuts, and being careful with them.
I still remember the crazy endless Substack post from some guy trying to Bayes his way through the question of whether COVID was a lab leak or zoonosis, and ending up with massive odds in favor of lab leak. Yet most other people who considered the evidence tended to come in somewhere in the ballpark of 50%. Guess who I trust more!
So yeah, I'm not too surprised that there would be a backlash against people bringing up their Bayesianism all the time. Yet, as you say, it really *is* the formalization of how credences get updated, so it's 100% good that people are learning to reason at that level, and to translate their reasoning into Bayesian form.
EDIT: the COVID guy came up with massive odds in favor of lab leak, not zoonosis.
I couldn't agree more. It's *incredibly* fragile, in a way that I think is impossible to appreciate without experience. I think the most common mistake is that people state priors over different variables without seeming to worry about the dependence between those variables in the prior. (This is why I don't trust Bayesian versions of the Drake equation, for example.)
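Here's a toy illustration of why that dependence matters (all numbers invented): a Drake-style product of uncertain factors, where each factor keeps exactly the same marginal prior and only the correlation between factors changes:

```python
import numpy as np
# Toy illustration of why prior dependence matters (all numbers invented).
# A Drake-style estimate multiplies several uncertain factors. Below, the
# marginal prior on each log-factor is identical in both cases; only the
# correlation between factors changes, and the spread of the product changes a lot.
rng = np.random.default_rng(0)
n_factors, sigma, n_samples = 5, 1.0, 100_000

# Independent log-normal factors.
indep = rng.normal(0, sigma, size=(n_samples, n_factors))

# Strongly correlated log-normal factors with the same marginals.
rho = 0.8
cov = sigma**2 * (np.full((n_factors, n_factors), rho) + (1 - rho) * np.eye(n_factors))
corr = rng.multivariate_normal(np.zeros(n_factors), cov, size=n_samples)

for name, draws in [("independent", indep), ("correlated", corr)]:
    product = np.exp(draws.sum(axis=1))  # product of log-normal factors
    lo, hi = np.percentile(product, [5, 95])
    print(f"{name}: 90% of the mass between {lo:.2g} and {hi:.2g}")
```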
I wouldn't trust a model without some kind of aerospace-level verification of each line of the model. Except it's worse, because there's no real modularity.
I've daydreamed about inventing some kind of "safe" "mini-Bayesian" reasoning. I'm not sure what this even means, but it seems like there's got to be SOMETHING between "state complete formal model of everything" and "read a lot and see how you feel at the end".
can you say more about why the Bayesian models are so fragile in your experience? what happens if you do a sensitivity analysis?
it seemed to me that the issue with the rootclaim stuff was more about the incredible number of degrees of freedom available in choosing how you factorize the joint probability distributions and what variables you pick to factorize over
I mean broadly speaking you just have a very large range of possible choices that yield different predictions. I'm not sure exactly how one would do a sensitivity analysis in general. You could certainly do that for single parameters in a given model. But what about different distribution choices? What if you've assumed things are independent when you shouldn't have?
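For a single parameter it's at least mechanical, something like refitting under a range of priors and watching how far the answer moves (toy coin data, just for illustration):

```python
# Toy sensitivity check on a single prior (made-up data: 16 heads in 20 flips).
# Sweep the strength of a Beta prior centered on 0.5 and watch the posterior
# mean of the heads-probability move. This is the mechanical kind of check;
# it says nothing about wrong distribution families or wrongly assumed independence.
k, n = 16, 20
for prior_strength in [2, 10, 50, 200]:       # prior pseudo-observations
    a0 = b0 = prior_strength / 2              # Beta prior centered on 0.5
    post_mean = (a0 + k) / (a0 + b0 + n)      # Beta-Binomial posterior mean
    print(f"prior strength {prior_strength:3d}: posterior mean {post_mean:.2f}")
```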
I guess you could go with a formal, yet explicitly sub-complex model? Kind of like how, for regression, I can just do a linear regression as a first approximation, which will probably give me somewhat reasonable results where I have lots of data, but I am aware I shouldn't put too much trust in it, especially not when extrapolating. Or I could use a prior as a sane "ok this is what my intuition/reading etc. tell me" value, and then, depending on my felt uncertainty regarding all the parts of the process I'm unsure about (correct model? all info collected? situation understood? etc.), I could update my prior more or less. Kind of like using so-called pseudo-observations, but less formal.
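Roughly, I'm imagining something like this sketch (all numbers are placeholders): encode the gut feeling as pseudo-observations whose count says how much I trust it, then update on whatever real data there is:

```python
# Rough sketch of the pseudo-observation idea (all numbers are placeholders).
# Encode a gut prior of "about 70%" as a Beta prior whose strength says how
# much I trust that gut feeling, then update on whatever real data exists.
gut_estimate = 0.7
trust_in_gut = 8                 # treat the gut feeling as 8 prior observations
a0 = gut_estimate * trust_in_gut
b0 = (1 - gut_estimate) * trust_in_gut

successes, trials = 2, 10        # hypothetical real observations
posterior_mean = (a0 + successes) / (trust_in_gut + trials)
print(f"gut prior {gut_estimate:.2f} -> posterior mean {posterior_mean:.2f}")
```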
Look, if you define everything from pure frequentism (i.e. only aleatoric uncertainty exists) to full Bayesian as Bayesian, then of course you can't escape being Bayesian. You've defined everything as Bayesian, even the opposite. Your coin example is 100% aleatoric uncertainty, for example.
There are plenty of ways to make decisions in the presence of epistemic uncertainty without resorting to Bayesianism. You can use heuristics (e.g. when in doubt, take the safe bet -- don't bet on the coin). You can perform rule-based reasoning (e.g. if someone takes out one coin and asks you to bet on it, always refuse).
Sure, these methods may be strictly worse than Bayesian decision making, in the sense that they have poorer expected value, but people don't have to optimize for expected value. If anything, that's what defines Bayesianism to me: the belief that "winning bets", maximizing expected value, is the sine qua non of life and decision making. It's not.
I'm curious if you saw the place where I explicitly said it's debatable if the coin example is really Bayesian?
Yes, I did. And I'm giving my opinion that it's not Bayesian in the sense you define in the article. Your later elaborations on the coin problem -- like the case where you don't know how many of each coin there are -- do have epistemic uncertainty, and are "Bayesian" in the sense you describe.
So let's consider the case where you don't know how many of each coin there are. You don't know if they are nice and it's 50% or they're nasty and it's 1% gold. You might have some belief, which you can represent as a probability distribution. Cool, then go do Bayesian things if you like. But very often you have no prior belief in such cases, no probability function. (No, the "uninformative prior" doesn't actually represent this.) How do you do Bayesian reasoning now? And don't tell me everyone always has some belief -- that's a Bayesian catechism, not reality.
Or you just don't want to be Bayesian. You could just always refuse bets you can't quantify. Sure, maybe you miss some easy wins, but who says life is about winning? Maybe you just want to keep things simple. Or you just have a simple heuristic, that doesn't rise to the level of "combining aleatoric and epistemic uncertainty". "If I don't know the probability, act as if it's the worst case always, so I act as if 0% gold, even if I actually believe it's something else, because I recognize my thinking can be crap sometimes." That doesn't combine epistemic with aleatoric -- it just says if there's epistemic uncertainty then discard the aleatoric. That's non-Bayesian by your definition. That's also a reasonable rule. Again, maybe not maximum EV, but who says you have to max EV?
You're making a lot of statements that, while I don't 100% endorse them, I have sympathy with, and that I thought I had *expressed* sympathy with in the post. So maybe you can help me understand where I went wrong.
Here are a couple of things that I'm worried I might not have said as clearly as I should have:
- I agree that Bayesian reasoning is more valuable/reliable/meaningful when you have stronger reasons to believe your prior is true.
- I agree that in practice, you don't want to be Bayesian in every situation. (My argument was that stating a formal prior and reasoning on the basis of it is hard/expensive/dangerous. But there could be other reasons!)
- I agree there's no such thing as an uninformative prior. (OK, I *know* I didn't make this mistake. :))
Maybe one difficulty is that my goal with the initial coin example was to explain why people are obsessed with Bayesianism. And I think that example *is* a correct description of why people get so obsessed! But (I think) I agree with you that the initial coin example is a poor model for how Bayes is often used in practice.
Very well explained.