Discussion about this post

dynomight

OK, turns out I originally screwed up the examples and used one example with an illegal move! (Yet the examples were helpful anyway? What?)

Anyway, I've re-run those experiments and updated the figures now.

Hugh (Spike) McLarty

Maybe a dumb question, but how do your LLMs know that you want them to win? I.e., to prefer moves that lead to winning? Every published game, except for the occasional draw, has a sequence of moves that led to a win AND a sequence of moves that led to a loss. And when grandmasters play, they usually play grandmasters, i.e., their games offer comparably ranked examples of both winning and losing. Even if the model was trained to distinguish the winner’s moves from the loser’s, how does it know that you want it to play ‘like the winners’?
