445. Moral Error and Moral Disagreement

Yudkowsky begins the post with a quote from Richard Chappell:

“When Bob says ‘Abortion is wrong’, and Sally says, ‘No it isn’t’, they are disagreeing with each other.

I don’t see how Eliezer can accommodate this. On his account, what Bob asserted is true iff abortion is prohibited by the morality_Bob norms. How can Sally disagree? There’s no disputing (we may suppose) that abortion is indeed prohibited by morality_Bob…

Since there is moral disagreement, whatever Eliezer purports to be analysing here, it is not morality.”

But moral disagreement, moral error, and moral progress are the primary motivations for Yudkowsky’s work on FAI: without moral disagreement there wouldn’t be much need for FAI.

Richard claims, “There’s no disputing (we may suppose) that abortion is indeed prohibited by morality_Bob.”

Yudkowsky replies: “We may not suppose, and there is disputing. Bob does not have direct, unmediated, veridical access to the output of his own morality.”

He concedes that describing morality as a “computation” did not function as the “Word of Power” he thought he was emitting. He proposes reading “idealized abstract dynamic” instead, as “a more comfortable label to apply to morality.”

We don’t have complete introspective access to this “idealized abstract dynamic”, just as we don’t have complete access to our other desires and motives.

Yudkowsky conceives of morality_Bob “as something that unfolds into Bob’s morality”. This is like the way one can describe, in 6 states and 2 symbols, a Turing machine that will write 4.640 × 10¹⁴³⁹ 1s to its tape before halting.

So morality_Bob refers to a compact folded specification, and not a flat list of outputs.
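A small Turing machine can make the “compact folded specification” point concrete. The minimal Python sketch below uses the 2-state, 2-symbol busy beaver as a stand-in, because the 6-state machine mentioned above would never finish running. The transition table has only four entries, but you cannot read its output off the table. You have to unfold it by running it. The function and constant names are illustrative assumptions, not anything from the post.

```python
from collections import defaultdict

# Hypothetical example: transition table of the 2-state, 2-symbol busy beaver.
# (state, symbol read) -> (symbol to write, head move, next state); "H" = halt.
BUSY_BEAVER_2 = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}


def run_turing_machine(table, start="A", halt="H", max_steps=10_000):
    """Unfold a compact transition table into its output by simulating it.

    Returns (number of 1s left on the tape, number of steps taken).
    """
    tape = defaultdict(int)  # blank tape: every cell reads as 0 until written
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        # Look up the rule for the current state and symbol, then apply it.
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return sum(tape.values()), steps


ones, steps = run_turing_machine(BUSY_BEAVER_2)
print(f"{ones} ones written in {steps} steps")
```

This is only an illustration of the analogy, not a model of morality. Nothing short of simulating the four rules reveals that the machine halts after 6 steps with four 1s on the tape. In the same way, a compact dynamic like morality_Bob can have outputs that Bob has not yet computed and can be mistaken about.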

So, how can Bob be wrong about the output of his own morality?

First, there are errors about instrumental goals caused by empirical mistakes (which can also lead to mistaken generalizations of terminal values):

Bob could be empirically mistaken about the state of fetuses, perhaps believing fetuses to be aware of the outside world. (Correcting this might change Bob’s instrumental values but not terminal values.)

Bob could have formed his beliefs about what constituted “personhood” in the presence of confusion about the nature of consciousness, so that if Bob were fully informed about consciousness, Bob would not have been tempted to talk about “the beginning of life” or “the human kind” in order to define personhood.

Then there are meta-ethical mistakes: if Bob believes that the word of Allah determines moral truth, he’ll hold many moral beliefs that he would change if he recognized the falsity of the Koran.

But maybe Bob and Sally really do have different moral algorithms. In that case they wouldn’t really have a disagreement. (Yudkowsky postulates that “disagreement has two prerequisites: the possibility of agreement and the possibility of error.” I don’t think this definition is useful, but let’s not fight over words.)

You cannot have a disagreement about which algorithm should direct your actions without first having the same meaning of should, and no matter how you try to phrase this in terms of “what ought to direct your actions” or “right actions” or “correct heaps of pebbles”, in the end you will be left with the empirical fact that it is possible to construct minds directed by any coherent utility function.

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes.

Yudkowsky thinks it’s unlikely that two humans are in fact entirely different optimization processes:

Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths… Now, perhaps some psychopaths would not be persuadable in-principle to take the pill that would, by our standards, “fix” them.

According to Yudkowsky, it’s unreasonable to say to someone:

“We have nothing to argue about, we are only different optimization processes.”

That should be reserved for paperclip maximizers, not used against humans whose arguments you don’t like.

Well, I think it is very unlikely that there are no stable psychopaths, and it may very well be true that lots of humans are “different optimization processes”.

After all, many commenters don’t share Yudkowsky’s belief in the “moral unity of mankind”, as it were. The following comment by Roko starts a great thread in which cousin_it and wedrifid offer further persuasive counterarguments to CEV.


“I would not extrapolate the volitions of people whose volitions I deem to be particularly dangerous, in fact I would probably only extrapolate the volition of a small subset (perhaps 1 thousand – 1 million) people whose outward philosophical stances on life were at least fairly similar to mine.”


“Then you are far too confident in your own wisdom. The overall FAI strategy has to be one that would have turned out okay if Archimedes of Syracuse had been able to build an FAI, because when you zoom out to the billion-year view, we may not be all that much wiser than they.

I’m sure that Archimedes of Syracuse thought that Syracuse had lots of incredibly important philosophical and cultural differences with the Romans who were attacking his city.

Had it fallen to Archimedes to build an AI, he might well have been tempted to believe that the whole fate of humanity would depend on whether the extrapolated volition of Syracuse or of Rome came to rule the world – due to all those incredibly important philosophical differences.

Without looking in Wikipedia, can you remember what any of those philosophical differences were?

And you are separated from Archimedes by nothing more than a handful of centuries.”


“‘The overall FAI strategy has to be one that would have turned out okay if Archimedes of Syracuse had been able to build an FAI, because when you zoom out to the billion-year view, we may not be all that much wiser than they.’

“Wiser”? What’s that mean?

Your comment makes me think that, as of 12 August 2008, you hadn’t yet completely given up on your dream of finding a One True Eternal Morality separate from the computation going on in our heads. Have you changed your opinion in the last two years?”


“I like what Roko has to say here and find myself wary of Eliezer’s reply. He may have just been signalling naivety and an irrational level of egalitarianism so people are more likely to ‘let him out of the box’. Even so, taking this and the other statements EY has made on FAI behaviours (yes, those that he would unilaterally label friendly) scares me.”


“The question why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of ‘sincerely want’. If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can’t he task it with looking at himself and inferring his best idea of how to infer humanity’s wishes? How do we determine, in general, which things a document like CEV must spell out and which things can/should be left to the mysterious magic of ‘intelligence’?”


“This has been my thought exactly. Barring all but the most explicit convolution any given person would prefer their own personal volition to be extrapolated. If by happenstance I should be altruistically and perfectly infatuated by, say, Sally, then that’s the FAI’s problem. It will turn out that extrapolating my volition will then entail extrapolating Sally’s volition. The same applies to caring about ‘humanity’, whatever that fuzzy concept means when taken in the context of unbounded future potential.

I am also not sure how to handle those who profess an ultimate preference for a possible AI that extrapolates other than their own volition. I mean, clearly they are either lying, crazy or naive. It seems safer to trust someone who says ‘I would ultimately prefer FAI<…> but I am creating FAI<…> for the purpose of effective cooperation.’

Similarly, if someone wanted to credibly signal altruism to me, it would be better to try to convince me that CEV<someone> has a lot of similarities with CEV<benefactor> that arise due to altruistic desires, rather than saying that they truly sincerely prefer CEV<someone, benefactor>. Because the latter is clearly bullshit of some sort.

How do we determine, in general, which things a document like CEV must spell out, and which things can/should be left to the mysterious magic of “intelligence”?

I have no idea, I’m afraid.”

Sigh. This is all very frustrating.


This entry was posted in CEV, Lesswrong Zusammenfassungen, meta-ethics.