24 Comments
DH:

How to test whether the Scribble Method is any good:

1. Find historical data on phenomena that grew linearly on a semi-log plot for some period of time.

2. Divide the data into "before" and "after" periods.

3. Use Scribble to extrapolate the "before" data through 2+ orders of magnitude on the vertical axis.

4. Does what actually happened historically fall between, say, the 20/80 Scribble percentiles?

(Ideally this would be done in a blinded way such that the forecaster doesn't know what specifically he's forecasting.)

The hard part is finding data from enough things that grow/grew exponentially over time. Semiconductor performance is the most obvious one. The stock market maybe. What else might we use to back-test the Scribble Method?

I'd like to see back tests of the AI 2027 methodology as well. Because frankly, most forecasting methods are bullshit, and almost all long-term nontrivial forecasts are wrong. I wouldn't be surprised if Scribble does better than the more "rigorous" techniques.
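For concreteness, here's a minimal sketch of how step 4 might be scored, assuming the scribbles have already been digitized onto the same time grid as the held-out data. The function, the 20/80 band, and the synthetic data below are placeholders, not anything from the post:

```python
import numpy as np

def backtest_coverage(series, split_idx, scribbles, lo=20, hi=80):
    """Fraction of held-out points that fall inside the [lo, hi] scribble band.

    series:    historical values on the log scale, in time order.
    split_idx: index separating the "before" data (shown to the forecaster)
               from the held-out "after" data.
    scribbles: array of shape (n_scribbles, len(series) - split_idx) holding
               the forecaster's extrapolations over the held-out period.
    """
    actual = np.asarray(series)[split_idx:]
    band_lo = np.percentile(scribbles, lo, axis=0)
    band_hi = np.percentile(scribbles, hi, axis=0)
    return np.mean((actual >= band_lo) & (actual <= band_hi))

# Toy example on synthetic exponential growth (i.e., linear on a log scale):
rng = np.random.default_rng(1)
t = np.arange(30)
log_values = 0.3 * t + rng.normal(0, 0.2, t.size)             # pretend history
fake_scribbles = 0.3 * t[10:] + rng.normal(0, 0.5, (50, 20))  # 50 pretend scribbles
print(backtest_coverage(log_values, split_idx=10, scribbles=fake_scribbles))
```

A well-calibrated forecaster should land inside a 20/80 band roughly 60% of the time, though coverage on any single series will be lumpy because consecutive points are correlated.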

dynomight:

I totally agree. I've been vaguely considering something like this, but it seems relatively hard.

Really, it's more than just a method for evaluation—this should be a method for *training*. If you go through a few hundred examples and get feedback, I'm sure you'll improve dramatically.

Also, maybe we can find "super-scribblers"??

Tim Dingman:

I really thought this was going to be about Squiggle: https://www.squiggle-language.com/

Dylan Black:

Love the scribbling averager, that’s super useful.

I feel it's important to clarify that the reason it's better to attach some math to your forecast when you can is not JUST that the math can handle well-defined rules, but also that it lets you check your conclusions against any existing data (not just a rule; the model is the rule) and generalize out of distribution in a principled fashion. To post-dict as well as predict.

Related to this, one of the most devastating problems pointed out with AI 2027 was its extreme insensitivity to some of the initial conditions. That seems like a flaw, and it was only revealed by probing the mathematical model, which itself demonstrates the utility of attaching some (possibly made-up) math to a forecast.
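As a toy example of the kind of probe that's only possible once there's a model to probe (the functional form and numbers here are made up, not AI 2027's actual model): sweep an initial condition over a wide range and see how much the headline forecast actually moves.

```python
import numpy as np

# Toy forecast: years until a threshold is reached under exponential growth
# with a fixed doubling time, starting from level x0. Purely illustrative.
def years_to_threshold(x0, threshold=1e4, doubling_time=0.5):
    return doubling_time * np.log2(threshold / x0)

# Sensitivity probe: an 8x change in the starting level moves this forecast
# by only about 1.5 years, which is the sort of thing you only notice by
# poking at an explicit model.
for x0 in (0.5, 1.0, 2.0, 4.0):
    print(x0, round(float(years_to_threshold(x0)), 2))
```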

dynomight:

I agree in general but there's a subtle issue here: Do you want to critique the model, or the particular forecast? There can be situations where a model performs poorly in some situations but performs well in the particular case you actually care about. This is sort of what I was trying to get at with this paragraph: https://dynomight.net/scribbles/#:~:text=weaker

Dylan Black:

Yeah, I suppose you could argue that adding math could make a forecast more wrong simply by enlarging the domain it makes claims about, and thus generating predictions that are both irrelevant and wrong.

Eh, either way I love the scribble averager, fantastic idea.

Benny:

The scribble app seems to have a bug. When I draw a single line that plateaus without ever crossing the 1 mo. threshold, the data analysis plot tells me incorrectly that my model has a 100% chance of surpassing the 1 mo. threshold. Not sure if I'm doing something wrong, but also it's quite apt for the app to have such a bug, seeing as how it was made with an AI :-P

dynomight:

Weird. Likely it is a bug but I can't easily reproduce it. Can you export the CSV for your line and paste it here?

Benny:

FWIW I'm using the app on mobile

Benny:

# Plot Line Tracer Export
# Calibration Data
# BottomLeft: PixelX=125, PixelY=363.45001220703125, PlotX=2020, PlotY=0
# TopRight: PixelX=733, PixelY=19.45001220703125, PlotX=2050, PlotY=8.5
# Data
Line,Point,PixelX,PixelY,PlotX,PlotY
1,1,26.00,160.85,2015.115132,5.006075
1,2,95.00,147.85,2018.519737,5.327296
1,3,149.00,107.85,2021.184211,6.315668
1,4,254.00,100.85,2026.365132,6.488634
1,5,331.00,77.85,2030.164474,7.056947

dynomight:

Huh. According to the CSV, that line does in fact cross the 1-month mark, which sits at PlotY = log10(365.25/12*24*60*60) ≈ 6.4199. That doesn't line up at all with the screenshot. Must be some kind of display issue on mobile... (I'm actually amazed it "works" on mobile at all...)
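For reference, the same check as a quick snippet, using the PlotY values from the pasted CSV:

```python
import math

# 1-month threshold on the PlotY (log10 seconds) axis
one_month = math.log10(365.25 / 12 * 24 * 60 * 60)   # ≈ 6.4199

plot_y = [5.006075, 5.327296, 6.315668, 6.488634, 7.056947]  # from the CSV above
print([round(y, 3) for y in plot_y if y > one_month])         # [6.489, 7.057]: the last two points cross
```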

James McDermott:

I don't really get the scribbling part. Are you really saying that these were 50 trajectories that were, individually, plausible, i.e., that you were thinking about a specific trajectory when drawing each one? Or is this just a quick way of creating a distribution, equivalent to basically drawing the fastest and slowest possible curves and then filling in between them?

dynomight:

Not sure if this helps, but I tried to imagine a mental distribution over possible futures. Then each scribbled line is a "sample" from that distribution.

Victualis:

Is this the same as trying to think of multiple different narrative drivers and seeing what scenarios they can lead to, or are you just sampling from one basic narrative?

dynomight:

Each time you draw a curve, you sort of think "ok, in this case maybe compute spending slows down but there's a lot of algorithmic progress and..." and see where that takes you. Do that repeatedly, so that the collection of lines represents how likely you think all the different futures are.
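As a rough sketch of what the aggregation step might look like once the lines are digitized (the shapes, random trajectories, and threshold below are purely illustrative, not the app's actual internals):

```python
import numpy as np

# Each row of `scribbles` is one sampled trajectory of log10(task length in
# seconds) on a shared grid of years. All numbers here are illustrative.
def summarize_scribbles(years, scribbles, threshold):
    bands = {p: np.percentile(scribbles, p, axis=0) for p in (20, 50, 80)}
    ever_crosses = (scribbles >= threshold).any(axis=1)  # per line: does it ever cross?
    return bands, ever_crosses.mean()

rng = np.random.default_rng(0)
years = np.arange(2025, 2051)
# 50 fake "scribbles": upward-drifting random walks starting near log10 = 4
scribbles = 4 + np.cumsum(rng.normal(0.15, 0.1, size=(50, years.size)), axis=1)

bands, p_cross = summarize_scribbles(years, scribbles, threshold=6.42)  # ~1 month in log10 seconds
print(f"P(a line ever reaches ~1 month) = {p_cross:.0%}")
print("20th/80th percentile in 2050:", bands[20][-1].round(2), bands[80][-1].round(2))
```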

Sol Hando:

Your graph of nuclear weapons is missing South Africa, although I can see why you wouldn't include it for simplicity's sake.

dynomight:

Other complexities include Kazakhstan and Ukraine, uncertainty about the date for Israel, and if you should count Belgium, Germany, Italy, Netherlands, Turkey, or Belarus! (https://en.wikipedia.org/wiki/Nuclear_sharing)

Paul B.:

This reminds me of a cognitive model of decision-making in a multi-stage task involving uncertainty. The model basically posits that humans randomly sample trajectories and then aggregate that information in some way to make a decision. I figure there are many other stochastic models of cognition.

Paul B.:

Rather than a diffusion model, I think there is some work by Jared Hotaling simulating sampling from a tree like this (https://imgur.com/VeHmSsk). It isn't well-known work, but it jibes with what I feel my mind is doing when I'm imagining outcomes under multi-stage uncertainty (I repeatedly sample different paths that could be taken, or maybe I just go with whatever the first sample says).

dynomight:

Sort of like drawing a decision tree? In principle, I think it's nice that you can be quantitative about different decision points with something like that. But I worry that if you tried to draw a tree of all possible outcomes leading to AGI (or not) by 2050 it would have billions of nodes.

Paul B.:

If we don't stick to firm nodes that every scribble must traverse, I figure your scribble thinking is rather analogous to this: just doodling possible paths and aggregating the trajectories.
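Something like this toy sketch, where the branch probabilities and step sizes are made up purely for illustration: rather than enumerating the tree, sample paths through it and aggregate.

```python
import random

# Toy multi-stage tree: at each stage, the log10 task length either keeps
# compounding or stalls. Branch probabilities and increments are invented.
def sample_path(n_stages=25, start=4.0):
    level = start
    for _ in range(n_stages):
        if random.random() < 0.7:              # "progress continues" branch
            level += random.uniform(0.1, 0.3)
        else:                                  # "progress stalls" branch
            level += random.uniform(-0.05, 0.05)
    return level

# 10,000 sampled paths stand in for the 2**25 leaves of the full tree.
random.seed(0)
samples = [sample_path() for _ in range(10_000)]
print(sum(s >= 6.42 for s in samples) / len(samples))  # fraction reaching ~1 month (log10 seconds)
```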

dynomight:

Oh, I totally agree. It's the same basic principle of sort of "forecasting in detail" and then summarizing.
