Isn't our current best answer public research and education? I.e., researchers who cannot be fired.
I'm part of the long tail of people who have been creating free content for decades without generating any significant following. I like to think it's just because I'm unlucky, bad at self-promotion, and/or working on things that are too niche. (Could it be that I'm a boring or not very good writer/photographer/coder? That's a theoretical possibility....)
Now I'm actually excited that, thanks to LLMs, my unnoticed work will be incorporated into the corpus of human intelligence. Yes, this abstract "reward" is a new motivator for me to continue producing and sharing things!
Great point, makes sense!
Maybe a variant of Spotify's royalty model could work? Pool revenue (e.g. from taxes) and distribute it in proportion to how much your ideas are used (e.g. an AI model could scan for derivative works). Ideally I think you'd want to distribute it in proportion to how much value is created downstream (an IP VAT?), but I'm not sure that's feasible.
Obviously prone to fraud and abuse, but that's just a bigger version of what Spotify already encounters.
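Concretely, the mechanics are just a pro-rata split. Here's a minimal sketch (the pool size, fee, and usage counts are all invented for illustration, and the derivative-work scanner is hand-waved away; measuring "usage" is the actual hard part):

```python
# Minimal sketch of a pooled, pro-rata royalty split. All numbers are
# made up; `usage` stands in for whatever a (hypothetical) derivative-work
# scanner would report per creator.

def distribute(pool: float, usage: dict[str, float], fee: float = 0.10) -> dict[str, float]:
    """Split `pool`, minus a platform fee, in proportion to each creator's usage."""
    total = sum(usage.values())
    payable = pool * (1 - fee)
    return {creator: payable * count / total for creator, count in usage.items()}

# e.g. a $1M pool split among three creators:
print(distribute(1_000_000, {"alice": 700, "bob": 200, "carol": 100}))
# {'alice': 630000.0, 'bob': 180000.0, 'carol': 90000.0}
```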
I think the writing analog of Spotify is (was?) Medium. And these seem to have similar failure modes, because they only reward based on "amount of content consumed", which tends to lead to a lot of "slop". (Medium has fallen. Spotify is mostly OK, though a lot of people complain about AI-generated music, etc.)
If you really could measure "downstream value created", you'd be on to something, though!
Not sure if this is relevant, but I've long thought that the compulsory licensing system in US music law is awesome and should be expanded to other domains. Not that I've thought through the details; it just seems like a generally great idea.
Sigh. An old dilemma: Charles Dickens had to fight copyright problems in the US, and Mark Twain had to fight copyright problems in the UK. I write legal documents (labels) in the US and review similar docs for a certification agency. I frequently find that my words have been copied by other writers. (I know this because I deliberately write certain elements to be recognizable to myself as my writing. These elements might be recognized by other people too, if anybody ever read what I write, but nobody does. It's just my internal way of proving to myself that I exist.) I haven't worried about AI, yet, because I have to generate new text, along with legal boilerplate, from trade-secret chemical formulations. Those chemistries aren't something AI has access to at the moment. But once I've sent my words out into the world, anyone can copy them. A copyright is simply a thing that gives you the right to go to court to demand legal protection; you must still have the money to defend your creation.
I think that in theory those expressions might qualify as "creative expression"? And in theory copyright violation is a crime (at least in some jurisdictions). But in practice, it's virtually never pursued criminally except in the most egregious cases, and good luck convincing anyone with just a few recognizable phrases as evidence.
I was very surprised to learn that "trap streets" don't actually work. Courts accept them as proof that something was copied, but (at least for maps) they don't care, because fake facts are still "facts".
Yep, it always comes back to integrity and ethics. Even the law doesn't work without them, as we're finding out every day lately.
Would it be that hard, with current AI search products like Google's AI answer at the top of search results, to include links to the sources for that answer?
Google no longer creates an AI summary for the Chess question, but there is a link button under the answer in the screenshots you shared. Did this link to your post on Chess?
Personally, I don't trust those AI answers at all, since they're often wrong, or missing important context, but I will often click on the link that the answer is pulling from, to quickly judge whether the answer is trustworthy or not. If Google answers this AI question by linking to your post, I assume you'd get a lot of attention from people like me who either want to see the source, or want a deeper dive to the question. Anyone asking "can llms play chess" is probably already the sort of person likely to click that little link button and thus be exposed to your content.
Alternatively, I'm odd (true) and most people don't even know that the little link icon is a button you can press. Or it links to something else that isn't you? (Why would it do that?) Whether the AI summary links to your post or not will completely change my opinion on whether the current state of things is a major problem or actually helpful to creators.
I know normal LLMs not connected to the internet don't cite sources, but with current capabilities that really doesn't matter to me. And if there were an LLM amazing enough to reproduce well-researched articles in their entirety, with no hallucinations, from "memory" (that is, not searching for an answer but producing it without connecting to the internet), then I imagine writers are on their way out anyways.
This was a while ago, but I'm ~75% confident I checked the link button and it didn't go to me. I don't think this was malicious or anything, just a quirk of how they were (I assume) doing RAG.
It's odd that they only had the one link at the top. Usually they seem to provide links for each individual claim, which I agree seems far less rude (as judged by my probably-obsolete social norms that were developed for humans).
> if there were an LLM amazing enough to reproduce well-researched articles in their entirety, with no hallucinations, from "memory" (that is, not searching for an answer but producing it without connecting to the internet), then I imagine writers are on their way out anyways.
Yeah, in this scenario "writers" wouldn't be providing much value, and one could argue they should be on their way out. What I worry about in this scenario (aside from the fact that I like writing...) is that "unbundling" fact creation from writing destroys some of the incentive that previously existed to create the underlying facts in the first place.
One way or another, though, I strongly expect AI will become "the interface". The benefits are just too large!
Long comment that perhaps doesn't make much sense, but you did end with "New ideas needed" after all!
With a new "interface", my brain runs to those late-90s/early-00s startups like Google, Yahoo, or LinkExchange (https://en.wikipedia.org/wiki/LinkExchange) that allowed people to monetize webpages that had previously just provided informational content for free. Back then, search engines (and I guess homepages) were the "new interface", and a few savvy people were able to make it seriously big.
Maybe there's an opportunity for a company to write a robots.txt-style bit of code that requires a payment from the AI company making the query, which it can't make without specific permission? Permission would be granted in exchange for sharing how many queries the AI makes for which links (maybe plus user feedback judging the usefulness of this or that link? Though that might be unnecessary, since I assume AI companies are already interested in outputting the most useful RAG results). These payments would be collected by the company (call it "DisallowAI.com"; the domain is available for $0.01, while disallow.ai is owned but unused) and distributed to its users based on usage, minus a fee.
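To make that concrete: blocking known AI crawlers already works today with plain robots.txt; the licensing/payment part would be a new, purely hypothetical extension (the "License" line and URL below are invented, not part of any standard):

```
# Real and works today: block known AI crawlers outright
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Hypothetical extension (not part of any standard): point paying,
# licensed crawlers at terms instead of a blanket block
# License: https://disallowai.example/terms
```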
With enough prevalence, the most useful sites couldn't be used (and linked to) for RAG, which would be a strong incentive for AI companies to sign onto the system or be outcompeted by a RAG AI that did sign on (maybe even one developed by DisallowAI itself?). That is, unless OpenAI, Copilot, or whoever pays a tenth of a cent (or whatever) to DisallowAI along with the data on the number of queries, which is then distributed to creators in proportion to usage.
Or maybe it's too difficult to make OpenAI pay DisallowAI a small fee to link to work protected by this code. Even then, perhaps the most useful new interface would be the one that has permission to link to, and use, the most useful information available via RAG. Then monetization could just copy the subscription model, or an ad-based scheme like our current interfaces.
But what's to stop OpenAI (or anyone else) from just doing RAG in spite of this code, which apparently already happens? Conveniently, there would now be a centralized company, with a lot of money, that has a financial incentive to ensure its users' work isn't taken without permission. Maybe they could claim at first that they didn't know, but that's hard to claim after the first demand letter.
I suppose the New York Times lawsuit would make or break this idea. Even if it fails, though, perhaps the law would recognize the difference between an LLM using an article as training data and repeating its information, versus directly linking to an article plus a summary. Cool idea. Not sure if it's technically possible, practicable, or already implemented by someone.
Basically the RIAA or MAFIAA for websites.
Yes, exactly. I find it very fitting it’s called MAFIAA, and I imagine that has to be intentional.
I like the way you're thinking! I feel like something along these lines should in principle be possible.
> But what's to stop OpenAI (or anyone else) from just doing RAG in spite of this code, which apparently already happens?
I think trade secrets law might be the key here! In principle, a company could create a "wall" where, in order to access the information, everyone essentially has to sign an NDA. If that qualified as a "reasonable effort" to keep the information secret, then none of copyright or patents or whatever would matter. Just trying to access the information would be industrial espionage. Even *planning* to access the information, plus taking any step toward it, would be "conspiracy" to commit industrial espionage.
I have no idea if any of this would actually hold up in court, though. And there's a serious problem: if you ever voluntarily share the information with any party who hasn't signed the NDA, it becomes a regular fact that can be shared freely. So maybe this is one of those libertarian fantasies about how all you need is contract law and you can create your own de-novo legal system...
I think Tyler believes he will be posthumously reified by the AI, so long as he keeps feeding it.
I just love how you're able to make such a boring topic so interesting! This is a great read. You remind me of the YouTuber and author Exurb1a.
Copyright law, boring? Outrage!
I liked this post. Thank you for leavening the big ideas with humor.
To your point, I wonder how all this will collide with a different, related, projected outcome of the LLM age: the internet and public discourse being flooded with AI spam and lies (generated photos, videos, voice, and text). Given all that, people would presumably retreat to large trusted brands: media companies that would pinky-swear that they checked their facts, did their research, and wouldn't get bamboozled by AI-generated video "evidence". In other words, if Wikipedia gets overrun, people will retreat to Britannica.
And, in THAT world, maybe the LLM Wild West will be more manageable specifically because we’ve moved back to gatekeepers.
That's an interesting point. I feel like we already see a prelude of that: the regular web is already flooded with SEO-generated garbage "content". Much of this is probably now AI-generated, but it happened even pre-LLMs, and as far as I can tell, the effects are much as you predict: people basically don't trust any of the sites that come up in a search engine unless they're some kind of "trusted brand".
A related issue is that it would be great if there were some kind of "proof of work". People sometimes accuse me of using LLMs to write this blog. I think most people would trust me, since it existed pre-LLMs. But in general, I think a lot of people will want some guarantee that text comes from an actual human who's willing to put their reputation on the line. (Bad for pseudonymous bloggers...)
Yeah, I think that's right. There's that standard (C2PA) for cryptographically verifying that a photo was taken by a real camera. It feels harder for text, but I'm sure interesting identity technologies are on the horizon. Will they matter more than just bog-standard reputation? Probably not.
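The signing half is already easy; the hard part is exactly the guarantee above, that a *human* wrote the text. Here's a minimal sketch with the Python `cryptography` package (the key and post are invented): it proves only that the keyholder signed these exact bytes, nothing about how the text was produced.

```python
# Sketch of authorship attestation via a digital signature (illustrative only).
# This ties text to a keyholder's reputation; it cannot prove the text
# wasn't generated by an LLM before signing.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()  # blogger keeps this secret
public_key = signing_key.public_key()       # published, e.g. on an about page

post = "My latest blog post...".encode()
signature = signing_key.sign(post)

# Any reader can check the signature; verify() raises InvalidSignature
# if the text was altered or signed by a different key.
public_key.verify(signature, post)
print("signature checks out")
```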
In a world where reputation matters more, I think we're going to see more honor culture (in all its forms). I wonder what duels will look like.
Great read! This has increased the chance that I will provide you with resources in times of need or have sex with you. Hopefully that’s enough incentive to encourage future articles.
Yay, glowing pixels! My favorite!
Mmmm StarCraft
The fact that StarCraft is so popular despite being so brutal and unforgiving honestly gives me some weird faith in humanity.
I'll defend it to the death as my favorite game and the best game of all time (even though I work professionally on a different video game), but when I think about sitting down to play, I'm usually too tired.
Great game to watch though. Next season of ASL has a Zerg in every group!
Wonder what it tells us about Korean culture that it was such a good fit. And how much of their fertility crisis can be laid at the feet of Blizzard 💀