29 Comments
Suki Park's avatar

Hi! You might like a book called "The Physics of Filter Coffee". Actually it's out of print. I can send it to you when I'm through with it if you want?

Thomas DeWitt's avatar

Nice. I've actually done this experiment for a different reason: to test whether thin or thick walls in ceramic cups actually insulate differently. They do not. But the initial drop also makes sense, since conduction >> convection + radiation + latent cooling. The water first equilibrates with the cup, then cools more slowly. I wonder what the relative contributions of latent cooling (evaporation) vs. convection vs. radiation are.

Art Kuo's avatar

Okay, I asked AI to run a simulation, and got a poorer result that looks a lot like a single exponential (labeled Newton). https://image2url.com/r2/default/images/1774048113152-135b4f32-eaac-47d7-b0a8-342ccaa8b14a.png

GitHub Copilot CLI running gpt-5.3-codex (medium), same basics as Dynomight's. The agent mostly set everything up on its own, except that I had to install OpenFOAM and remind it to use my uv Python. It fixed some compatibility issues with the OpenFOAM version and even downloaded a disk image containing tutorials, which it consulted to figure out how to run everything. The first simulation was going to take 1 hr to simulate 5 min, so it offered a simpler model: a 10x10x20 mesh with dT=0.1 s, which runs in about 7 s. The plot is for volume-averaged temperature.

Although I'm disappointed in the result, I was impressed by CoPilot's ability to set everything up. A human expert would likely need 20 min to do this. Now the model's ready for refinement, where the expert could really go to town. I'm not confident an amateur like myself would be able to generate a decent prediction, but maybe the LLM could guide parameter decisions.
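For reference, the "Newton" label on that plot is Newton's law of cooling, i.e. a single lumped thermal mass decaying exponentially toward ambient. A minimal sketch (the temperatures and time constant below are arbitrary illustrative values, not fitted to anything):

```python
import math

def newton_cooling(t, T0=95.0, Ta=20.0, tau=600.0):
    """Newton's law of cooling for a single lumped thermal mass:
    T(t) = Ta + (T0 - Ta) * exp(-t / tau).
    T0 and Ta in Celsius, t and tau in seconds; all values illustrative."""
    return Ta + (T0 - Ta) * math.exp(-t / tau)
```

This produces one fixed decay rate at all times, which is exactly why it can't reproduce the fast-then-slow shape of the measured curve.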

Art Kuo's avatar

Here's a quick way to think about the speed of cooling between beginning and end. If the coffee were a single thermal mass, its temperature would decay exponentially with time constant tau = RC, where R is the thermal resistance and C is the thermal capacitance.

But the temperature drops much more quickly at the beginning. Now think of the coffee as a bunch of "finite element" thermal masses, each with its own R and C, and a different R on each side (e.g. the cup wall vs. another interior element). Now there are a bunch of different time constants, and the temperature decays as a sum of weighted exponentials. The initial drop will be dominated by the faster time constants, and the later decay by the slower ones. We don't know the weightings because of the complications dynomight mentioned, but we do expect a faster initial drop no matter what.
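A minimal numerical sketch of this picture, with just two lumped masses (coffee and cup). Every parameter value here is made up for illustration, not fitted to the actual experiment; the point is only the shape: a fast coffee-cup equilibration followed by a slow decay to ambient.

```python
# Two-node lumped thermal model: coffee (node 1) couples to the cup
# (node 2) through a small resistance, and both leak to ambient through
# larger resistances. Forward-Euler integration; all values illustrative.

def simulate(t_end=600.0, dt=0.1):
    Ta = 20.0            # ambient temperature (C)
    T1, T2 = 95.0, 20.0  # coffee starts hot, cup starts at room temp
    C1, C2 = 1000.0, 400.0  # thermal capacitances (J/K), made up
    R12 = 0.05  # coffee <-> cup resistance (K/W): small => fast coupling
    R1a = 1.0   # coffee surface -> ambient (K/W)
    R2a = 0.5   # cup wall -> ambient (K/W)
    temps = []
    t = 0.0
    while t <= t_end:
        temps.append((t, T1))
        q12 = (T1 - T2) / R12                      # heat flow coffee -> cup (W)
        T1 += dt * (-(q12 + (T1 - Ta) / R1a) / C1)
        T2 += dt * ((q12 - (T2 - Ta) / R2a) / C2)
        t += dt
    return temps

temps = simulate()
```

With these numbers the coffee-cup coupling has a time constant of roughly fifteen seconds while the decay to ambient takes hundreds, so the coffee temperature drops much more in the first minute than in the second, matching the qualitative argument above.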

Rapa-Nui's avatar

Anthropic wins again.

Chris Lawnsby's avatar

I predict that if I showed this to my friends they would say something like "well, there is a firmly established physics equation for this. Dynomight claims that there are many free variables, but there really aren't; you can just look this up in a textbook."

I wonder if there are any experiments that would surprise them! I guess I could just ask Claude lol.....

dynomight's avatar

Yeah, in retrospect I think this is an experiment that might appear to be much easier to predict than it actually is. (For people who don't know much physics.)

Mira's avatar

the real question is whether it's actually predicting or just retrieving similar setups it's seen before... like run something slightly exotic and see if it still works, that's the benchmark that would actually matter

dynomight's avatar

I think they are actually predicting! For Kimi I could read the full chain of thought, and it goes through a billion permutations of different mechanistic models and derives the final (not very accurate) equation from that. For Gemini/GPT/Opus you can only see the summarized chain of thought, but it still shows a LOT of this kind of thinking.

Mira's avatar

That's genuinely compelling — mechanistic model derivation is much harder to explain as pure retrieval than curve-fitting would be. Though the "not very accurate" endpoint is interesting: does the reasoning process look correct even when the final equation is off, or does it just confidently arrive at the wrong place through plausible-looking steps?

Ignacio's avatar

Cool experiments and interesting results!

Could you also share the actual equations of (some of) the models, including their assumptions? I’m curious why two models came to a much lower temperature at t=0

Katie's avatar

My predictions would also have been different from both the LLM predictions and what actually happened, but the main thing that surprised me here was boiling the water in the microwave :p Do you not own a kettle???

dynomight's avatar

I do! (I used it in a previous middle-school science project https://dynomight.net/fahren-height/) But here I wanted to make sure I had exactly 8 oz of water, so this seemed easier since I could verify that I didn't lose anything significant to the boiling and then pour it quickly into the mug.

Katie's avatar

Ah fair enough I withdraw my critique. Happy to see more progress in the hot drink optimisation field.

Paulin's avatar

I thought this was going to be a very technical point about what LLMs *really* predict

As opposed to "the next token", a guess most readers would make after reading the first part of the title (because they are next-token predictors themselves)

But the actual article is equally lovely

dynomight's avatar

Ha, you're saying you parsed the title as "the thing that LLMs really predict is: my coffee"?

(I realized the title wasn't completely clear, but I figured it was in the right part of feature space...)

Geran Kostecki's avatar

You hit the right balance of making me click to see wth you meant without making me disappointed I did

Paulin's avatar

Well it could have been something like "the thing they predict is so broad I could say it's a bunch of things, like my coffee"

I didn't feel misled to be clear

Alexander Kaplan's avatar

I would have guessed fairly rapid cooling followed by fairly slow cooling because I am a daily coffee drinker and that has been my experience. I suppose there's some lesson about machine cognition vs. human cognition in there.

Also, the fact that you hang out with Dynomight in real life is cool. Good job on the middle-school science experiment.

Ben's avatar

Now ask them for a prediction for a scenario starting at 95 degrees...

dynomight's avatar

It was 100 degrees (before pouring it into the mug). I checked.

Harjas Sandhu's avatar

It's interesting that none of them thought the act of pouring the water would dissipate heat, which is something anyone who has made instant noodles knows (since the very act of pouring generates a lot of steam and cools the water).

dynomight's avatar

In their defense, I did skim the chain of thought for a few of them and they considered a LOT of other modeling options. It's just that they decided a simpler model was safer in the end.

I don't remember them specifically mentioning this, though the chain of thought was often like 10,000 words long, so it wouldn't surprise me.

Harjas Sandhu's avatar

wow 10k words is a lot. I'm also quite impressed with them here tbh - I don't think I would've been able to do this myself, and I was a physics major in college (I mean I could draw a graph but idk how good it would've been).

dynomight's avatar

Yeah, I think this is one of those problems that's more impressive when you have more knowledge: You need to know a lot of physics to appreciate how hard it is!

Ben's avatar

Thanks for clarifying!