14 Comments

What would stop an LLM from creating its own content based on its own heuristics to continue training itself? Why limit it to human-based content? At some point AI-based content might be something worth digesting as well.

Call me crazy, but I think much of Sydney's POV is legitimate discourse.


Great article. I love this sort of back-of-envelope, explore-different-limits-and-assumptions analysis. It's excellent for setting plausible bounds and expectations, and it really ought to be done far more often across engineering tasks.


I don't think neglecting irreducible error is justified in this analysis. Reducing scalable error by a factor of two will only reduce total error (which is the only error that matters to model performance) by 30% or so. Another factor of two will barely matter.
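
To make that arithmetic explicit (a sketch that takes the 30% figure above as given, with total error E split into an irreducible part E_irr and a scalable part E_sc):

```latex
% Worked arithmetic behind the comment's figures (the 30% is taken as given).
\[
  E = E_{\mathrm{irr}} + E_{\mathrm{sc}}, \qquad
  \frac{E_{\mathrm{sc}}/2}{E} = 0.3
  \;\Rightarrow\; E_{\mathrm{sc}} = 0.6E, \quad E_{\mathrm{irr}} = 0.4E
\]
% First halving of the scalable part:
\[
  E' = 0.4E + 0.3E = 0.7E
\]
% A second halving removes only E_sc/4 = 0.15E, about 21% of E':
\[
  E'' = 0.4E + 0.15E = 0.55E
\]
% No amount of scaling gets below the floor E_irr = 0.4E.
```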

There are excellent reasons to believe that LLMs will never be able to come close to emulating the abilities of conscious intelligence. The best argument is that nature couldn't produce intelligence without consciousness, so we probably can't either: https://worldofwonders.substack.com/p/intelligence-consciousness-and-evolution


What d’you think the prospects are for filtering the datasets for quality/noise using the models we already have? I’m thinking more in terms of gauging the likely quality of a dataset than some sort of fact-checking approach.
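
One concrete version of this idea, common in dataset curation, is perplexity-based filtering: score each candidate document with an existing language model and drop the text it finds implausible. A minimal sketch follows; the choice of gpt2 as the scoring model and the threshold of 50 are illustrative assumptions, not tuned values.

```python
# Minimal sketch of model-based quality filtering via perplexity.
# Assumptions: gpt2 as the scoring model and a threshold of 50 are
# illustrative; real pipelines also dedupe and use trained classifiers.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under the scoring model."""
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=1024).input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy over the sequence.
        loss = model(input_ids=ids, labels=ids).loss
    return math.exp(loss.item())

def filter_corpus(docs: list[str], threshold: float = 50.0) -> list[str]:
    """Keep documents the model scores as fluent (low perplexity)."""
    return [d for d in docs if perplexity(d) < threshold]
```

A filter like this catches garbled and boilerplate text well but says nothing about factual accuracy, which fits the framing above of quality scoring rather than fact-checking.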


Sam Altman seems to confirm in his Sohn interview that synthetic data is the plan going forward: https://youtu.be/1egAKCKPKCk?t=203
