442. Morality as Fixed Computation

Fortunately I’m not the only one who is somewhat confused by Yudkowsky’s meta-ethics:

Lukeprog writes on his blog:

His 442nd post is Morality as Fixed Computation, an attempt to “boil down” his theory of morality and be clear about it, but I remain as baffled as ever.

And Yudkowsky begins this post with:

Toby Ord commented:

Eliezer,  I’ve just reread your article and was wondering if this is a good quick summary of your position (leaving apart how you got to it):

‘I should X’ means that I would attempt to X were I fully informed.

Toby’s a pro, so if he didn’t get it, I’d better try again.

Maybe we can do better.

Suppose we program a superintelligent AI whose only goal is to maximize the satisfaction of our preferences. Such an AI would simply alter our preferences so that, e.g., we are blissed out by iron atoms, because preferences like that are easy to fulfill. But according to Yudkowsky such a future would be horrible:

We, ourselves, do not imagine the future and judge, that any future in which our brains want something, and that thing exists, is a good future.  If we did think this way, we would say: “Yay!  Go ahead and modify us to strongly want something cheap!”  But we do not say this, which means that this AI design is fundamentally flawed: it will choose things very unlike what we would choose; it will judge desirability very differently from how we judge it.

I agree that such a future would be suboptimal, but I don’t fully understand why he agrees with me. After all, he claims (this time explicitly in the comment section) that the only justification for our moral norms is our moral norms themselves. And the only causal explanation for why we have our current moral norms is evolutionary psychology plus memetic history.

Yudkowsky would of course reply that this kind of meta-circularity is not problematic, and that it is just like the recursive justification we employ when we try to justify using our brains, Occam’s Razor, or math.

Maaaaybe…. But the whole thing still looks fishy.

Yudkowsky then uses the analogy of two different calculators:

A calculator that, when you press ‘2’, ‘+’, and ‘3’, tries to compute:
“What is 2 + 3?”

A calculator that, when you press ‘2’, ‘+’, and ‘3’, tries to compute:
“What does this calculator output when you press ‘2’, ‘+’, and ‘3’?”

The Type 1 calculator, as it were, wants to output 5.

The Type 2 “calculator” could return any result; and in the act of returning that result, it becomes the correct answer to the question that was internally asked.

We are more like the Type 1 calculator. We want to do what is “right”, not whatever satisfies our preferences (or so we believe). And if somebody changed our preferences, the fulfillment of our desires wouldn’t be right anymore.

Like the Type 1 calculator, we want to know the answer to a fixed question, namely “What is right?”. And since we don’t have full access to our neurological makeup, and our introspection through other means is limited as well, we can’t just print out the question. So we hope to build a superintelligent agent that can help us “extract” the full question in detail.
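
Since the analogy is really a computational one, here is a minimal Python sketch of my own that makes the asymmetry explicit (the function names and the hard-coded 17 are invented for illustration, not taken from Yudkowsky’s post):

```python
# Type 1: evaluates a fixed, external question -- "What is 2 + 3?"
def type1_calculator(a, b):
    return a + b            # the answer is fixed by arithmetic, not by the device


# Type 2: "answers" the question "What will this calculator output?"
def type2_calculator(a, b):
    result = 17             # pick anything at all...
    return result           # ...and it is automatically "correct", because the
                            # question asked was about the device's own output


print(type1_calculator(2, 3))  # 5 -- returning 7 here would simply be a mistake
print(type2_calculator(2, 3))  # 17 -- returning 7 would have been just as "right"
```

If “should” merely meant “whatever I end up deciding or preferring”, we would be Type 2 calculators, and an AI that rewrites our preferences could never make us worse off by our own lights.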

Anyway, the key point of this post is the idea that,

… what we name by ‘right’ is a fixed question, or perhaps a fixed framework. We can encounter moral arguments that modify our terminal values, and even encounter moral arguments that modify what we count as a moral argument; nonetheless, it all grows out of a particular starting point.  We do not experience ourselves as embodying the question “What will I decide to do?” which would be a Type 2 calculator; anything we decided would thereby become right.  We experience ourselves as asking the embodied question:  “What will save my friends, and my people, from getting hurt?  How can we all have more fun?  …” where the “…” is around a thousand other things.

So ‘I should X’ does not mean that I would attempt to X were I fully informed.

‘I should X’ means that X answers the question, “What will save my people?  How can we all have more fun? How can we get more control over our own lives?  What’s the funniest jokes we can tell?  …”

And I may not know what this question is, actually; I may not be able to print out my current guess nor my surrounding framework; but I know, as all non-moral-relativists instinctively know, that the question surely is not just “How can I do whatever I want?”

Well, I wouldn’t describe Yudkowsky’s metaethics as explicitly ‘non-relativistic’, although there are certainly “more” relativistic metaethics out there.

Anyway, here are some hopefully illuminating comments:

Toby Ord:

“In one way, I’m glad that you didn’t like my attempted summary as I think the position therein is false, but it does mean that we should keep looking for a neat summary. You currently have:

‘I should X’ means that X answers the question, “What will save my people? How can we all have more fun? How can we get more control over our own lives? What’s the funniest jokes we can tell? …”

But I’m not clear where the particular question is supposed to come from. I understand that you are trying to make it a fixed question in order to avoid deliberate preference change or self-fulfilling questions. So let’s say that for each person P, there is a specific question Q_P such that:

For a person P, ‘I should X’, means that X answers the question Q_P.

Now how is Q_P generated? Is it what P would want were she given access to all the best empirical and moral arguments (what I called being fully informed)? If so, do we have to time index the judgment as well? i.e. if P’s preferences change at some late time T1, then did the person mean something different by ‘I should X’ before and after T1, or was the person just incorrect at one of those times? What if the change is just through acquiring better information (empirical or moral)?”

Eliezer Yudkowsky replies:

“So lets say that for each person P, there is a specific question Q_P such that:

For a person P, ‘I should X’, means that X answers the question Q_P.

Now how is Q_P generated?

Generated? By that do you mean, causally generated? Q_P is causally generated by evolutionary psychology and memetic history.

Do you mean how would a correctly structured FAI obtain an internal copy of Q_P? By looking/guessing at person P’s empirical brain state.

Do you mean how is Q_P justified? Any particular guess by P at “What is good?” will be justified by appeals to Q_P; if they somehow obtained an exact representation of Q_P then its pieces might or might not all look individually attractive.

These are all distinct concepts!

Is it what P would want were she given access to all the best empirical and moral arguments (what I called being fully informed)? If so, do we have to time index the judgment as well? i.e. if P’s preferences change at some late time T1, then did the person mean something different by ‘I should X’ before and after T1, or was the person just incorrect at one of those times? What if the change is just through acquiring better information (empirical or moral)?

(Items marked in bold have to be morally evaluated.)

I do believe in moral progress, both as a personal goal and as a concept worth saving; but if you want to talk about moral progress in an ideal sense rather than a historical sense, you have to construe a means of extrapolating it – since it is not guaranteed that our change under moral arguments resolves to a unique value system or even a unique transpersonal value system.

So I regard Q_P as an initial state that includes the specification of how it changes; if you construe a volition therefrom, I would call that EV_Q_P.

If you ask where EV_Q_P comes from causally, it is ev-psych plus memetic history plus your own construal of a specific extrapolation of reactivity to moral arguments.

If you ask how an FAI learns EV_Q_P it is by looking at the person, from within a framework of extrapolation that you (or rather I) defined.

If you ask how one would justify EV_Q_P, it is, like all good things, justified by appeal to Q_P.

If P’s preferences change according to something that was in Q_P or EV_Q_P then they have changed in a good way, committed an act of moral progress, and hence – more or less by definition – stayed within the same “frame of moral reference”, which is how I would refer to what the ancient Greeks and us have in common but a paperclip maximizer does not.

Should P’s preferences change due to some force that was / would-be unwanted, like an Unfriendly AI reprogramming their brain, then as a moral judgment, I should say that they have been harmed, that their moral frame of reference has changed, and that their actions are now being directed by something other than “should”.”
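
To see whether I’ve understood the Q_P / EV_Q_P distinction, here is a toy Python sketch of my own reading. Everything in it is invented for illustration: the class names, the “endorsement” test, and the example values are mine, not Yudkowsky’s.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QP:
    """Person P's 'fixed question': current values plus the rule that
    determines which changes to those values count as moral progress."""
    values: List[str]
    update: Callable[[List[str], str], List[str]]


def cautious_update(values: List[str], argument: str) -> List[str]:
    """Toy update rule: accept an argument only if the current values
    already endorse something in it; unendorsed rewrites are rejected."""
    if any(value in argument for value in values):
        return values + [argument]
    return values


def extrapolate(q: QP, arguments: List[str]) -> List[str]:
    """Stand-in for EV_Q_P: run Q_P's own update rule over an externally
    fixed sequence of arguments, so every step is defined before it is
    used and nothing refers to its own final output."""
    values = list(q.values)
    for argument in arguments:
        values = q.update(values, argument)
    return values


q_p = QP(values=["friends", "fun"], update=cautious_update)

print(extrapolate(q_p, [
    "more fun through better jokes",   # endorsed by "fun" -> accepted
    "bliss out on iron atoms",         # endorsed by nothing -> rejected
]))
```

The structural point, as I read it, is that the extrapolation procedure is fixed from the outside (by the FAI programmer’s construal); the values being extrapolated don’t get to redefine the procedure that extrapolates them.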

Okay, let me get this straight:

1. Moral norms are justified by this fixed computation in our heads, which in turn is justified by itself. You can’t justify this fixed computation by anything else. This sounds stupid, but really isn’t, because the recursive justification of Occam’s Razor makes sense, too. Okay, let’s be generous and agree with Yudkowsky on that.

2. Changes in moral norms that are caused by hearing new arguments or acquiring new information are good, because our current fixed computation regards these types of changes as good. But the very nature of this fixed computation was itself changed many times by things that it would deem pretty bad. So the fixed computation looks at its origins and history and is appalled, but hey, we should trust it nonetheless because, well, it’s all we have.

Yeah, that makes sense.

Robin Hanson:

“With Toby, I’m still not clear on what is being suggested. Apparently you approve of some processes that would change your moral beliefs and disapprove of others, but you aren’t willing to describe your approval in terms of how close your beliefs would get to some ideal counterfactual such as “having heard and understood all relevant arguments.” So you need some other way to differentiate approved vs. disapproved influences.”

Eliezer Yudkowsky:

“Apparently you approve of some processes that would change your moral beliefs and disapprove of others,

Well, yes. For example, learning a new fact is approved. Administering to me a drug is unapproved. Would you disagree with these moral judgments?

you aren’t willing to describe your approval in terms of how close your beliefs would get to some ideal counterfactual such as “having heard and understood all relevant arguments”

Oh, I’d be perfectly willing to describe it in those terms, if I thought I could get away with it. But you can’t get away with that in FAI work.

Words like “relevant” assume precisely that distinction between approved and unapproved.

Humans don’t start out all that tremendously coherent, so the “ideal counterfactual” cannot just be assumed into existence – it’s at least possible that different orders in which we “hear and understand” things would send us into distinct attractors.

You would have to construe some specific counterfactual, and that choice itself would be morally challengeable; it would be a guess, part of your Q_P. It’s not like you can call upon an ideal to write code; let alone, write the code that defines itself.

For EV_Q_P to be defined coherently, it has to be bootstrapped out of Q_P with a well-defined order of operations in which no function is called before it has been defined. You can’t say that EV_Q_P is whatever EV_Q_P says it should be. That either doesn’t halt, or outputs anything.

When you use a word like ideal in “ideal counterfactual”, how to construe that counterfactual is itself a moral judgment. If that counterfactual happens to define “idealness”, you need some non-ideal definition of it to start with, or the recursion has no foundation.”
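
The “doesn’t halt, or outputs anything” remark can be made concrete with a throwaway Python sketch (again my own illustration; nothing here comes from Yudkowsky):

```python
def ev_no_foundation():
    """'EV is whatever EV says it should be', read as a procedure:
    the definition refers only to itself, so evaluation never bottoms out."""
    return ev_no_foundation()   # raises RecursionError instead of halting


def ev_trivially_satisfied(candidate: str) -> bool:
    """The same definition read as a constraint: 'EV equals whatever EV
    outputs' is satisfied by every candidate, so it rules nothing out."""
    return candidate == candidate


print(ev_trivially_satisfied("protect my friends"))   # True
print(ev_trivially_satisfied("maximize paperclips"))  # True
# ev_no_foundation()  # uncomment to watch the unfounded version blow the stack
```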

Interesting, so a recursion needs a foundation. But what’s the foundation of our morality itself? And please don’t say “morality itself”.

 
