the real question is whether it's actually predicting or just retrieving similar setups it's seen before... like run something slightly exotic and see if it still works, that's the benchmark that would actually matter
Cool experiments and interesting results!
Could you also share the actual equations of (some of) the models, including their assumptions? I’m curious why two models came to a much lower temperature at t=0
Sure, you can see them all here: https://dynomight.net/coffee/#:~:text=Appendix
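The models in the appendix vary, but most are some variant of Newton's law of cooling. A minimal sketch of why a model can land on a lower temperature at t=0 (the numbers here are my own illustrative guesses, not taken from any of the models): if a model budgets for heat absorbed by a room-temperature mug, the effective starting temperature drops before exponential cooling even begins.

```python
import math

def newton_cooling(T0, T_amb, k, t):
    """Newton's law of cooling: T(t) = T_amb + (T0 - T_amb) * exp(-k*t)."""
    return T_amb + (T0 - T_amb) * math.exp(-k * t)

# Hypothetical mug-heating correction: mixing ~250 g of boiling water
# with a 300 g ceramic mug (c ~ 0.9 J/g/degC) at 20 degC. Assuming the
# water and mug rapidly equilibrate, energy balance gives the new T(0):
m_w, c_w, T_w = 250.0, 4.18, 100.0   # water: mass (g), specific heat, temp
m_m, c_m, T_m = 300.0, 0.9, 20.0     # mug: mass (g), specific heat, temp
T_mix = (m_w * c_w * T_w + m_m * c_m * T_m) / (m_w * c_w + m_m * c_m)
print(round(T_mix, 1))  # roughly 83.6 degC, well below 100
```

So a model with a mug term starts its curve ~16 °C lower than one that ignores the mug, which is one plausible reason two models disagreed at t=0.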
My predictions also would have been different from both the LLM predictions and what actually happened, but the main thing that surprised me here was boiling the water in the microwave :p Do you not own a kettle???
I do! (I used it in a previous middle-school science project https://dynomight.net/fahren-height/) But here I wanted to make sure I had exactly 8 oz of water, so this seemed easier since I could verify that I didn't lose anything significant to the boiling and then pour it quickly into the mug.
I thought this was going to be a very technical point about what LLMs *really* predict
As opposed to "the next token", a guess most readers would make after reading the first part of the title (because they are next-token predictors themselves)
But the actual article is equally lovely
Ha, you're saying you parsed the title as "the thing that LLMs really predict is: my coffee"?
(I realized the title wasn't completely clear, but I figured it was in the right part of feature space...)
Well it could have been something like "the thing they predict is so broad I could say it's a bunch of things, like my coffee"
I didn't feel misled to be clear
I would have guessed fairly rapid cooling followed by fairly slow cooling because I am a daily coffee drinker and that has been my experience. I suppose there's some lesson about machine cognition vs. human cognition in there.
Also, the fact that you hang out with Dynomight in real life is cool. Good job on the middle-school science experiment.
Now ask them for a prediction for a scenario starting at 95 degrees...
It was 100 degrees (before pouring it into the mug). I checked.
It's interesting that none of them thought that the act of pouring the water would dissipate heat, which is something anyone who has made instant noodles knows (since the very act of pouring generates a lot of steam and cools the water)
In their defense, I did skim the chain of thought for a few of them and they considered a LOT of other modeling options. It's just that they decided a simpler model was safer in the end.
I don't remember them specifically mentioning this, though the chain of thought was often like 10,000 words long, so it wouldn't surprise me.
wow 10k words is a lot. I'm also quite impressed with them here tbh - I don't think I would've been able to do this myself, and I was a physics major in college (I mean I could draw a graph but idk how good it would've been).
Yeah, I think this is one of those problems that's more impressive when you have more knowledge: You need to know a lot of physics to appreciate how hard it is!
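For what it's worth, the evaporative loss while pouring is easy to bound: each gram that flashes off as steam carries away the latent heat of vaporization (~2260 J/g near 100 °C). A back-of-envelope sketch (the 2 g of evaporation is purely my guess, not a measurement):

```python
L_vap = 2260.0   # J/g, latent heat of vaporization of water near 100 degC
c_w = 4.18       # J/(g*degC), specific heat of liquid water
m = 237.0        # g, roughly 8 oz of water
m_evap = 2.0     # g, hypothetical amount that evaporates during the pour

# Heat carried off by the steam, divided by the remaining water's heat capacity:
dT = m_evap * L_vap / (m * c_w)
print(round(dT, 1))  # a few degC of cooling just from the pour
```

So even a couple of grams of visible steam knocks several degrees off, which is consistent with the instant-noodle intuition above.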
Thanks for clarifying!