16 Comments
Mira's avatar

the real question is whether it's actually predicting or just retrieving similar setups it's seen before... like run something slightly exotic and see if it still works, that's the benchmark that would actually matter

Ignacio's avatar

Cool experiments and interesting results!

Could you also share the actual equations of (some of) the models, including their assumptions? I’m curious why two models came to a much lower temperature at t=0

Katie's avatar

My predictions also would have been different from both the LLM predictions and what actually happened, but the main thing that surprised me here was boiling the water in the microwave :p Do you not own a kettle???

dynomight's avatar

I do! (I used it in a previous middle-school science project https://dynomight.net/fahren-height/) But here I wanted to make sure I had exactly 8 oz of water, so this seemed easier since I could verify that I didn't lose anything significant to the boiling and then pour it quickly into the mug.

Paulin's avatar

I thought this was going to be a very technical point about what LLMs *really* predict

As opposed to "the next token", a guess most readers would make after reading the first part of the title (because they are next-token predictors themselves)

But the actual article is equally lovely

dynomight's avatar

Ha, you're saying you parsed the title as "the thing that LLMs really predict is: my coffee"?

(I realized the title wasn't completely clear, but I figured it was in the right part of feature space...)

Paulin's avatar

Well it could have been something like "the thing they predict is so broad I could say it's a bunch of things, like my coffee"

I didn't feel misled to be clear

Alexander Kaplan's avatar

I would have guessed fairly rapid cooling followed by fairly slow cooling because I am a daily coffee drinker and that has been my experience. I suppose there's some lesson about machine cognition vs. human cognition in there.

Also, the fact that you hang out with Dynomight in real life is cool. Good job on the middle-school science experience.

Ben's avatar

Now ask them for a prediction for a scenario starting at 95 degrees...

dynomight's avatar

It was 100 degrees (before pouring it into the mug). I checked.

Harjas Sandhu's avatar

It's interesting that none of them thought that the act of pouring the water would dissipate heat, which is something anyone who has made instant noodles knows (since the very act of pouring generates a lot of steam and cools the water)

dynomight's avatar

In their defense, I did skim the chain of thought for a few of them and they considered a LOT of other modeling options. It's just that they decided a simpler model was safer in the end.

I don't remember them specifically mentioning this, though the chain of thought was often like 10,000 words long, so it wouldn't surprise me.

Harjas Sandhu's avatar

wow 10k words is a lot. I'm also quite impressed with them here tbh - I don't think I would've been able to do this myself, and I was a physics major in college (I mean I could draw a graph but idk how good it would've been).

dynomight's avatar

Yeah, I think this is one of those problems that's more impressive when you have more knowledge: You need to know a lot of physics to appreciate how hard it is!

Ben's avatar

Thanks for clarifying!