10 Comments

My starting assumption is that preventing people from building any harmful AIs is basically impossible in the long run, and that attempting to do so is counterproductive because it will disproportionately affect ethical AI creators rather than the underground, unethical ones that will try to evade restrictions, causing adverse selection.

So, other initiatives to prevent the possible bad outcomes from AI, rather than preventing bad AI itself, seem quite important even if technically orthogonal. I'd like to see a lot more work in the world going to tracking and mitigating new pathogens, keeping tabs on radioactive materials, and all the other stuff that will make society more resilient to bad actors.

Maybe we must first locate the thing [gene(s)] that makes about 50ish% of humans bad [no empathy, incapable of acquiring critical thinking skills vs. magical thinking, and so on. We must first develop a list of qualities; can AI do this?] and then eliminate that before making a new god. The current gods are insufficient to the task.

BTW "AI guardian" is possibly literally Davidad's plan https://www.lesswrong.com/posts/KX3xx8LTnE7GKoFuj/boundaries-for-formalizing-a-bare-bones-morality

author

Good catch! Certainly, Davidad seems to follow some similar-ish logic and reach a similar-ish conclusion, although I must admit I'm not well-read enough on AI safety to fully understand some of this stuff (https://www.lesswrong.com/posts/pKSmEkSQJsCSTK6nH/an-open-agency-architecture-for-safe-transformative-ai#Fine_grained_decomposition).

author

This is one of those cases where I was reeaaalllly hoping someone would come along with some devastating logic and tell me I was all completely wrong. I guess I agree with the conclusion, but holy hell—building a godlike AI and trusting it to protect us is not... reassuring.

Fortunately I (and Davidad) think the things we want are objectively definable. That would solve outer alignment. (Not inner alignment tho ¯\_(ツ)_/¯ )
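
To make "objectively definable" a bit more concrete, here is a deliberately toy sketch (my own illustration, not Davidad's actual formalism): a boundary-respect property written as a checkable predicate over world states, so any proposed plan either satisfies it or it doesn't. The agents, positions, and radii below are all invented for the example.

```python
# Toy illustration of an "objectively definable" safety property:
# an actor respects other agents' boundaries iff it stays outside
# each agent's protected region. Not Davidad's formalism; just a sketch.
from dataclasses import dataclass

@dataclass
class Agent:
    x: float
    y: float
    radius: float  # size of the protected region around this agent

def respects_boundaries(actor_pos: tuple[float, float], others: list[Agent]) -> bool:
    """True iff the actor stays outside every other agent's protected region."""
    ax, ay = actor_pos
    return all((ax - o.x) ** 2 + (ay - o.y) ** 2 >= o.radius ** 2 for o in others)

others = [Agent(x=0.0, y=0.0, radius=1.0)]
print(respects_boundaries((2.0, 0.0), others))  # True: outside the boundary
print(respects_boundaries((0.5, 0.0), others))  # False: inside the boundary
```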

I think this problem is what is described here (https://arbital.com/p/pivotal/) as the need for a "pivotal act":

--------------------------------

"Potential pivotal acts:

* human intelligence enhancement powerful enough that the best enhanced humans are qualitatively and significantly smarter than the smartest non-enhanced humans

* a limited Task AGI that can:

- upload humans and run them at speeds more comparable to those of an AI

- prevent the origin of all hostile superintelligences (in the nice case, only temporarily and via strategies that cause only acceptable amounts of collateral damage)

- design or deploy nanotechnology such that there exists a direct route to the operators being able to do one of the other items on this list (human intelligence enhancement, prevent emergence of hostile SIs, etc.)"

--------------------------------

So I pretty much agree with everything you wrote here: aligning one AGI is hard enough, but you then have to use that AGI to (pivotally) obstruct others from being created, or else the very next one might ruin everything. There could be game-theory scenarios that break this theory, but it certainly seems reasonable!
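
A quick back-of-the-envelope illustration of why "the very next one might ruin everything" pushes toward a pivotal act (the probability below is a made-up toy value, not an estimate): if each independently built AGI has even a modest chance of being catastrophically misaligned, the odds of surviving many such rolls shrink fast.

```python
# If each new AGI independently has probability p of being catastrophically
# misaligned, the chance that all n of them turn out fine is (1 - p) ** n.
p = 0.05  # toy value chosen for illustration only

for n in (1, 10, 50, 100):
    print(f"n = {n:3d}: P(all safe) = {(1 - p) ** n:.3f}")
# n =   1: P(all safe) = 0.950
# n =  10: P(all safe) = 0.599
# n =  50: P(all safe) = 0.077
# n = 100: P(all safe) = 0.006
```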

One way is to create tools that become the standard for building AIs. These tools would have security measures built in, which would significantly limit accidental bad AIs.
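
As a rough sketch of what "security measures built in" could mean in practice, the standard tooling itself could refuse to launch large runs that lack a safety sign-off. Everything below (names, thresholds, the API) is hypothetical and invented for illustration.

```python
# Hypothetical sketch: a training launcher that gates large runs behind a
# safety evaluation. No real framework is being described here.

COMPUTE_THRESHOLD_FLOPS = 1e25  # made-up policy threshold

class LaunchRefused(Exception):
    pass

def launch_training(estimated_flops: float, safety_eval_passed: bool) -> str:
    """Refuse runs above the compute threshold unless a safety eval signed off."""
    if estimated_flops > COMPUTE_THRESHOLD_FLOPS and not safety_eval_passed:
        raise LaunchRefused("Run exceeds the compute threshold with no safety sign-off.")
    return "training started"

# Small runs go through; big unapproved runs are refused by the tooling itself.
print(launch_training(estimated_flops=1e22, safety_eval_passed=False))  # training started
try:
    launch_training(estimated_flops=1e26, safety_eval_passed=False)
except LaunchRefused as err:
    print(f"refused: {err}")
```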

The other way is to reach an AI advantage through computational superiority. Unlike home PCs, clusters (and, in the far future, quantum computers) require a center of operations which, if deemed to be used for wrongdoing, can easily be taken out of commission with a conventional weapon. Who is going to punish such a missile strike? Only states with nuclear weaponry, which is about where we are now.

I am not so worried about an AI being smart enough to destroy countries, let alone civilizations, but rather about people being too dumb and losing the information war.

After all, the most convenient war is the one where the opponents give themselves up.

I think option 1 is the unwritten, unstated, intrusive-thought-like, cognitive-dissonance-causing, anti-memetic plan of every major AI lab. And their AI will enthusiastically execute that plan anyway, so they may as well try to get it to do that at least somewhat in their preferred way.
