Pure k-shot ICL? No finetuning?
I'd like to sponsor your friend...
It's definitely ICL with no finetuning. (Not 100% sure what k-shot means, though!) He says this only works because he used a base model.
Given how much I wish this kind of AI wasn't possible, I think I'll just hope he doesn't see your offer to sponsor...
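For readers unsure about the term: "k-shot" in-context learning (ICL) means putting k worked examples directly in the prompt and letting the model continue the pattern, with no weight updates. The sketch below only illustrates the idea. The friend's actual setup is not described in this thread, so the function name `build_k_shot_prompt`, the `them:`/`me:` labels, and the sample messages are all hypothetical.

```python
def build_k_shot_prompt(examples, new_message, author="me"):
    """Build a plain-text prompt for a base (non-chat) model.

    `examples` is a list of k (message, reply) pairs written by the person
    being imitated. The prompt ends with an open reply line, so the model's
    most natural continuation is a reply in the same voice.
    All names and the transcript format here are illustrative assumptions.
    """
    parts = []
    for message, reply in examples:
        parts.append(f"them: {message}\n{author}: {reply}\n")
    # Leave the final reply open for the model to complete.
    parts.append(f"them: {new_message}\n{author}:")
    return "\n".join(parts)


# Hypothetical past exchanges; k = 2 here.
examples = [
    ("Is ultrasonic humidifier dust a real problem?", "Probably, if you use tap water."),
    ("Should I trust this study?", "Check the sample size first."),
]
prompt = build_k_shot_prompt(examples, "Are LLMs good at stylometry?")
print(prompt)
```

A base model tends to suit this style of prompt because it simply continues the text, whereas a chat-tuned model is more likely to answer in its own assistant voice.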
I have a Discord bot emulating me using LLaMA-3-405B and somewhat janky retrieval over a few hundred thousand Discord conversation chunks. It's not that hard. It has a tendency to overindex on the current conversation context though.
https://github.com/osmarks/autobotrobot/blob/master/src/sentience.py#L71, though it doesn't have the code for the retrieval part.
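Since the linked file leaves out the retrieval step, here is a minimal sketch of what retrieval over conversation chunks could look like. It is not the commenter's implementation: it assumes a simple bag-of-words cosine similarity in place of whatever embedding model is really used, and the helper names `embed`, `cosine`, and `retrieve` are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (an assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(chunks, query, k=2):
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:k]


# Hypothetical conversation chunks.
chunks = [
    "the cat sat on the mat",
    "llama models are large",
    "discord bots are fun to write",
]
top = retrieve(chunks, "writing a discord bot", k=1)
print(top)
```

The retrieved chunks would then be placed in the prompt ahead of the live conversation. Weighting them too lightly relative to that conversation is one plausible reason a bot would overindex on the current context.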
Authorship attribution would work if you've written a lot of text under your real name with features similar to the text on dynomight.net. Another plausible explanation is just that the data is somewhere in the model's training data, even if it isn't searchable on the Internet.
I'm sure it would be easy to identify me with stylometry. But the model didn't seem to know who I was when directly prompted. Very odd!
“OK, it can replace me, but at least there was a 'me' worthy of replacement. My replacement even knows my name!”
Your friend is pursuing an ultimately sterile dead end. You would be better served to pick human collaborators you respect than a random agglomeration of strangers boiled down into a Mad Libs generator inferred from your past writing. This approach is the apotheosis of intellectual masturbation.
Regarding whether personality is only around 4-6 bits, I think it's possible to make a good-faith case to the contrary. But it's difficult to reconcile that with your defense of Myers-Briggs. Even with your proposal to discretize the four axes into 5 categories each, that yields only 5^4 = 625 personality types, or about 9.3 bits. Going up to 10 bits gets us to 1024 combinations. Yes, 10 > 6, but I doubt that most people who'd scoff at a 6-bit upper bound would accept a 10-bit one.
The model's matching of your real initials to your Dynomight writing is a bit scary, though! I don't have a ton of writing on the internet under my real name, but I do worry about the stylometric capabilities of future models. I haven't had much time to write in the last year anyway, but I'm torn between wanting to seed future generations of models with my writing and worrying about getting my Substack account matched to my actual identity.
I made another estimate for bits: How many random people would I have to sample to find someone with a personality "very similar" to mine? My guess was that I'm "1 in 2000". So that gets us to... about 11 bits.
Regarding stylometry, I'm pretty sure that even current methods could easily identify most of us. I'd imagine it's within the grasp of a single dedicated person today.
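The bit counts in this exchange can be checked with a short calculation. This sketch assumes all outcomes are equally likely, which is a simplification, so that picking out one of N options takes log2(N) bits. The helper name `bits_for` is made up for illustration.

```python
import math


def bits_for(outcomes):
    """Bits needed to single out one of `outcomes` equally likely options."""
    return math.log2(outcomes)


# Four Myers-Briggs axes, each discretized into 5 categories.
types_5_per_axis = 5 ** 4
print(types_5_per_axis, round(bits_for(types_5_per_axis), 2))  # 625 9.29

# Being "1 in 2000".
print(round(bits_for(2000), 2))  # 10.97

# Exactly 10 bits distinguishes this many types.
print(2 ** 10)  # 1024
```

So 625 types falls a little short of 10 bits, and "1 in 2000" rounds up to 11.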
I'm curious to hear how you came up with your 1 in 2000 estimate! Perhaps your history has been different, but since high school, I've been in somewhat intellectually cloistered environments. Between honors and AP classes in high school, then attending a selective college and a grad program, and then working at jobs that employ people from a narrower band of cognitive styles, I have a hard time drawing inferences about how rare my personality style is in the general population. I also think there's a bit of a feedback loop here: perhaps if I'd taken classes and worked with more normies, some of my idiosyncrasies would've been smoothed out.
Edited to add: I realize that this sounds like bragging, but that really wasn't my intention! I don't actually think it's a positive thing that I'm so out of touch with most of the population.
Truth is, I just made it up. But if I had to justify it: Like you, I interact with people that are strongly selected to be similar to me, but even so, only a small fraction share my particular sense of humor.
I mean, what fraction of the general population would ever be interested in reading dynomight.net? 5%? 1%? And even in that group, what fraction find https://dynomight.net/warby-parker/ funny?
Sure, but I want to read your writing, not what the AI is spitting out.
♡
> Do LLMs have emergent stylometry abilities?
LLMs have emergent stylometry abilities. https://www.lesswrong.com/posts/dLg7CyeTE4pqbbcnp/language-models-model-us
I’ve never read your writing or heard of your blog (thanks, TLDR link!), but I suppose you should be further alarmed/amused/attempting to cope: if your content mirrors the voice of the AI in this post (and you seem struck by the fact that it DOES), then congrats! AI just earned you a new reader!!
In all seriousness, your comment about the state of modern AI and the comparative advantage of humans was… revelatory.
Great post mate.
> "That even though we do have alien overlords who made us and who look down on us with pity, and they aren’t benevolent alien overlords who mostly leave us alone, and they don’t have the decency to keep their existence a secret, and they are so horrifically awful that we’d prefer to be dead than live under their rule, and there’s no way to rebel, and they’ll never leave, and they’ve taken away everyone we love, at least they allow us to keep living."
Sounds like a description of the deeply creepy webcomic 'Everything is Fine' (I'm not saying it's aliens, but...).
https://m.webtoons.com/en/horror/everything-is-fine/list?title_no=2578
OK this is REALLY weird.
I DO NOT feel thankful having learned this! Not at all. I guess that's the joke.
This was definitely a missed opportunity for a good joke. ("Underrated reasons to be apprehensive"?)
> Do LLMs have emergent stylometry abilities?
Of course they do. If you watch a language model early in training, it learns increasingly fine syntactic detail before it makes much semantic sense. Human communication is "scored" more on somewhat higher-level features; LLMs have to predict the exact wording the input is likely to use.
From what little you've shared (trying to automate you, seeing human personality as "4 or 6 bits"), I get a distinct impression that your friend's a dick.
I've sent this to him (with commentary: "check out this sick burn") but have not been able to get any response. He's maddeningly even-keeled.