toto A tutorial on additive utility functions

1. Introduction

The aim of decision theory is to help decision makers (DM) who face very complex problems choosing between the different possible alternatives, taking into account the consequences of each decision and the DM's preferences (see the definition of the glossary of Cybernetics).

In most practical situations, the difficulty in the act of taking a decision results mainly from two problems:

  1. First, when the DM takes her decision, some data are still uncertain. For instance, in Lauritzen and Spiegelhalter (88), a doctor must determine whether her patient has a dyspnea, a breath illness, so as to give him/her the right medication. No medical device appropriate to diagnose this illness is available to the doctor. So her diagnostic can only be based on the dyspnea's symptoms. Unfortunately, these are quite similar to those of tuberculosis, lung cancer and bronchitis. A good knowledge of the patient's activities can help the doctor in her diagnostic. Indeed, journeys in Asia statistically increase the risk of tuberculosis, smoking is known to increase lung cancer risks, and so on. The doctor will have certain pieces of information available, but there will still remain some uncertainty when she diagnoses the illness.

    It may also happen that available data are incomplete. See for instance the ninth chapter of Pearl (88).

  2. The second problem lies in the number and the complexity of the parameters (also referred to as attributes or criteria) that the DM takes into account to reach her decision. For example, in Keeney and Raiffa (93), secretary Bracamontes, of the public works ministry, must advise president Echeverria on the possible construction of a new airport for Mexico city, and especially on the best location to build it. His advice must take into account many parameters, including noise pollution, the comfort level of the neighbor populations, the evolution of air traffic, new runway construction methods, security, and so on. Here, the complexity results not only from the high number of criteria, but also from their interaction. For instance, it is rather clear that the comfort level of the neighbor populations is intimately related to the level of noise pollution.

    Note that the number of criteria to be taken into account can be awfully high. For instance, in Andersen, Andreassen, and Woldbye (86), there are up to 25 relevant criteria which, of course, interact with each other. This gives some insight of how complex practical situations can be.

We can see intuitively that a computer program that can help decision makers taking their decisions must rely on two essential components:

  1. a good modelisation of the DM's preferences (in order to take decisions that comply with the DM's desiderata);

  2. a good management of risks and uncertainties.

Now, what is meant by "a good modelisation of the DM's preferences"? When taking her decision, the DM has to choose between a certain number of alternatives. Each alternative will have different consequences (also referred to as outcomes). And it is obvious that, for the DM, some consequences will be more appealing than others. The DM can thus "order" the consequences according to their being more or less appealing. Modeling the DM's preferences simply amounts to finding a mathematical or computer model that represents this ordering.

For "small" problems, the ordering may be represented by a two-dimensional array: rows and columns would correspond to the possible consequences, and to each cell of the array would be assigned one of the three following values: "-1" if the outcome corresponding to the row is preferred to that of the column, "0" if the DM is indifferent between the two outcomes, and "1" if the DM prefers the outcome corresponding to the column to that of the row. Once the array is available, extracting the preferences of the decision maker from it is a very easy task. This type of representation, known as pairwise comparison, is quite similar to the arrays of figure 2 in Eckenrode (65).

Of course, when the number of outcomes is high, this modelisation is inefficient because the size of the array is much too big. In such cases, other models have to be used, the most popular of which is utility theory.


tocTable of contents

2. Utility theory

The principle behind utility theory is quite simple: it is to assign to each object on which the DM has preferences a real number, in such a way that the higher the number, the preferred the object. Thus, comparing objects amounts to comparing their associated numbers, which is a trivial task for a computer. The DM expresses her preferences through a set of attributes (or criteria). Each attribute can take a certain number of values (aka levels of satisfaction). For instance, when you wish to buy a car, your comparison of the different cars available on the market will certainly be based on their brand, their price, their color, their level of comfort, and so on. So each car can be represented as a tuple (price,brand,etc). The objects can thus be represented as tuples of levels of satisfaction of attributes, and modeling the DM's preferences over these objects thus amounts to evaluate a multiple argument real valued function. This is precisely what we call a utility function. In mathematical terms:

Definition 1: Utility function.

Let be the set of objects over which the DM has preferences and be the set of real numbers. Let be the DM's preference relation, that is, means that the DM prefers to or is indifferent between and .
is a utility function representing if and only if

for all .

When there are multiple attributes, the sets of levels of satisfaction of which are the 's, i in {1,...,n}, the object set can be represented as . A function is a utility function representing if and only if

for all , .

Utility functions are computationally very attractive because they provide easy and fast ways to extract the DM's preferences. Moreover, unlike the pairwise comparison method, in many practical situations, they do not usually consume huge amounts of memory.


tocTable of contents

3. Decision under certainty, risk or uncertainty

As we saw in section 2.1, the DM's preferences over the set of possible alternatives are related to the consequences induced by these alternatives. As an illustration, Savage (54) gives the following example: your wife is cooking an omelet. She has already broken five eggs in a plate and she asks you to complete the cooking. There remains one unbroken egg and you wonder whether you should put it in the omelet or not. Here, you have three possible alternatives:

  1. you can break the egg in the plate containing the other eggs;

  2. you can break it into another plate and examine it before mixing it with the five other eggs;

  3. you can not use the egg.

How can we find the best suitable decision? Well, simply by examining the consequences of each decision. Thus, if the egg is good the first alternative should be better than the other ones because the omelet will be bigger, but if it is not, by choosing the first alternative we loose the five other eggs. If we choose the second alternative and the egg is good, we stain a dish that we will have to wash, and so on. By closely examining the consequences of each alternative, we should be able to select that which seems to be the most preferable.

As shown in this example, each alternative may have several consequences depending on whether the egg is good or not. In Decision Theory, these uncertain factors (here the state of the egg) are called events and, as in Probability Theory, elementary (or atomic) events are very important. They are called states of nature. To each state of nature (the egg is good or not), the choice of one of the three possible alternatives will induce a unique consequence. Thus alternatives can be represented as sets of pairs (event, consequence), which are called acts. More formally, let be the set of possible alternatives, be the set of all possible consequences, and the set of states of nature. An act is a mapping from to that assigns to each state a consequence in . Thus the constant act corresponding to choosing the first alternative (break the egg in the plate containing the other eggs) is such that (good egg) = "big omelet" and (bad egg) = "lose 5 eggs".

As alternatives can be represented by acts, we can associate to the DM's preference relation over alternatives a preference relation over the set of acts (see Savage (54) or von Neumann & Morgenstern (44) for a detailed discussion on this matter). Let us denote by this preference relation. A utility function representing is thus a mapping from to such that


Of course, preferences over acts reveal both preferences over consequences (the DM will certainly prefer having a big omelet than losing five eggs) and the DM's belief in the chances that each event obtains. Thus if the DM knows that his wife is very careful about the food she keeps in her fridge, the impact of the pair (bad egg, lose 5 eggs) in the evaluation of alternative 1 will be marginal, whereas it will become important if the DM knows that his wife usually does not pay attention to this sort of things. Thus function must take into account the plausibility of realisation of the events. Of course, this can be done only through the knowledge that the DM has of the events and not through their "true" plausibility of realisation because the decisions choosen by the DM are only based on what he/she knows. For instance, consider a decision inducing two different consequences depending on whether head or tail obtains when tossing a fair coin. You do not know that the coin is fair, but I tossed the coin thrice and you saw that two heads and one tail obtained. So your decision will be based on the fact that the chance of obtaining a head seems higher than that of having a tail, although in practice they have the same chance of realisation. Of course, different kinds of knowledge will involve different models for function . The following three are the most important ones in Decision Theory:

  • Decision under certainty: for every state of nature a given act has always the same consequence. For instance, when you decide to choose a menu rather than another one in a restaurant, consequences are known for sure: you know what you will eat (well, at least this should be).

    Let be the preference relations over the set of acts and be that over the set of consequences. Assume that and are represented by and respectively. Let denote the consequence of "act". Then Decision under certainty amounts to:

    for all .

  • Decision under risk: An act can have several consequences, depending on the realisation of an event. Moreover, an "objective" probability distribution over the events is supposed to exist and to be known by the DM. This is the case when you decide whether to gamble or not on a game such as loto: the probability of winning is known. The model used in decision under risk is called the expected utility model and has been axiomatised by von Neumann & Morgenstern (44). As an act can be represented by pairs (event,consequence) and as there exist probabilities on events, acts can be represented by sets of pairs (consequence, probability of the consequence). These sets are usually called lotteries. Thus assume that an act corresponds to the lottery , i.e. this act has consequence with probability , with probability and so on. Then von Neumann-Morgenstern show that function representing preferences over acts is decomposable as:

    ,

    where is a utility function representing the DM's preferences over the outcomes.

  • Decision under uncertainty: this is quite similar to the preceding case except that we do not assume the existence of a probability distribution on the events set but rather this one is derived from a set of axioms (see Savage (54)) that express that fact that the DM has a "rational" behaviour. Here the probability distribution is not "objective" as in von Neumann-Morgenstern's theory but it is subjective, that is it expresses the beliefs of the DM concerning the chances of realisation of the events (instead of the "true" chance that the event will obtain). In this model, as in von Neumann-Morgenstern, the utility of an act is decomposable as:

    ,

    where is a utility function representing the DM's preferences over the outcomes, and is the subjective probability that the DM assigns to event .

In the remaining of this tutorial, we will study only decision under certainty.


tocTable of contents

4. Additive utility theory under certainty

When I said that utility functions do not consume huge amounts of memory, well, shame on me, I lied. In fact, in the general case, they require as much memory as pairwise comparison tables and, moreover, their construction is a nightmare as they require asking many cognitively difficult questions to the DM. Fortunately, in most practical situations, a property called additive separability holds, that enables tremendous memory savings and that requires to ask the DM only "simple" questions to construct the whole utility function. We saw that the outcome set can be an n-dimensional set and thus utility functions are multiple parameters functions. Suppose that there are 5 attributes and that each one has 10 levels of satisfactions, then the outcome set is a 510 = 9765625 element set and, to store the utility function into memory, up to 9765625 bytes (or words) may be needed. If, on the other hand, the utility function can be separated as follows:

,

then storing each requires only 10 bytes, and the whole utility is stored in at most 50 bytes, which seems to be a reasonable amount for modern computers. Such a utility function is called an additive utility function. In mathematical terms:


Definition 2: Additive utility function.
is an additive utility function if and only if it is a utility function and there exist functions such that . In other words,
.

Additive utilities are computationally much more appealing than general utility functions. We have seen that they reduce tremendously the memory storage requirements, but this is not the only advantage: Let us recall that these functions are used to represent the DM's preference relations. When the utility is additively separable, this means that there exists some kind of independence between the attributes, and this independence makes the elicitation of the utility much easier to perform. The following example may help understanding why:

Example 1:

Let be a preference relation over a set . Suppose that is representable by an additive utility function and that . By definition 2, the following equation holds:


which is obviously equivalent to . But then, for all in ,

.


Thus, again by definition 2, . In other words, the existence of an additive utility allowed us to prove that does not depend on the value of . This simple remark simplifies considerably the utility elicitation procedure.

Another advantage of additive utility functions lies in their computation: thanks to the additive separability, they are usually very simple mathematical functions and therefore they can be usually computed very quickly, especially when using parallel algorithms.

Unfortunately, not all utility functions are additively separable, as shown in the following example, taken from Fishburn (70), page 42:

Example 2:

Suppose that you have to choose between chicken and fish for today's and tomorrow's dinners. Then = {chicken today, fish today} x {chicken tomorrow, fish tomorrow}. If you always have to choose between these two meals, one day you will get bored of eating the same meal, and you will have a preference relation much like the following:

(fish today,chicken tomorrow) (chicken today,fish tomorrow) (fish today,fish tomorrow) (chicken today,chicken tomorrow).

In such a case, the attributes are not independent. Indeed, what you prefer eating tomorrow is conditioned by what you eat today. One consequence is that the utility elicitation process requires a huge number of comparisons between elements of .


tocTable of contents

5. Existence of additive utilities

As is shown in the above example, all utilities are not additively separable. So, it would be very interesting to know in which cases preference relations can be represented by additive utility functions and in which ones they cannot.

From a theoretical point of view, necessary and sufficient conditions have been provided that ensure additive representability; see for example Jaffray (74b) for conditions in two-dimensional outcome sets, Jaffray (74a) for conditions on sets of arbitrary finite dimensions, Fishburn (92) for an extension of these results. In their ninth chapter, Krantz, Luce, Suppes and Tversky, A. (1971) also provide interesting necessary and sufficient conditions. However, from a practical point of view, the results mentioned above are very difficult to use because they are highly technical. The alternative, often used in additive conjoint measurement (this is the true name for the theory of additive utilities), is to provide only sufficient -- but easily testable -- conditions of existence.

In most research papers, for mathematical reasons, it is assumed that the set of outcomes is a Cartesian product (of levels of satisfaction of attributes). Note that there exist a few papers that provide testable (sufficient) conditions for the case where is a subset of a Cartesian product. Let us cite for instance Chateauneuf and Wakker (93), Segal (91) and Segal (94).

In the sequel, we will assume that is an n-dimensional Cartesian product, i.e., that


Given a binary relation over , I introduce the indifference relation , i.e., this is the symmetric part of , the strict preference relation , and . Now, let us see some necessary conditions for the existence of additive utilities representing .

tocTable of contents


5.1. Necessary conditions

Suppose that an additive utility exists, representing , i.e., for all and all ,


Then, clearly, the following axiom holds:

Axiom 1: weak ordering.
is a weak order of , i.e., is complete (for all , or ) and transitive (for all , if and , then ).

Now, suppose that and that is representable by an additive utility . Then . But this inequality is obviously equivalent to for all , which, in turn, is equivalent to . Therefore, the following axiom, which is sometimes referred to as coordinate independence (see e.g. Wakker (89), page 30), is necessary for additive representability.

Axiom 2: independence.
For all and all ,
.

Note that the independence axiom enables to define, for every subset of , the weak order on as: for all , if and only if for some . In particular, is the weak order induced by on , i.e., for all , [there exists such that ].

Using an argument similar to that used for showing that axiom 2 is necessary for additive representability, it is easily shown that the axiom below, namely the Thomsen condition, is necessary as well.

The above figure illustrates the Thomsen condition. The curves passing through points A and B, C and D, and E and F are called indifference curves. They represent points in the outcome set that are all indifferent (from the Decision Maker's point of view). The first indifference in the Thomsen condition is simply illustrated by the fact that A and B belong to the same indifference curve. Similarly, the second indifference means that C and D belong to the same indifference curve. Then the Thomsen condition implies that E and F must also belong to the same indifference curve. Note that by permuting the order of variables , the Thomsen condition would also imply that if A and B, and E and F belong to the same indifference curves, then this is also the case for C and D.

Unfortunately, even though the above axioms are necessary, they are not sufficient to ensure additive representability. For instance, if on = {0,1,2,3}2 is represented by , as defined in table 1,

\ 0 1 2 3
0 0 3 6 9
1 6 9 12 18
2 15 21 24 27
3 24 30 33 36

Table 1:


then satisfies weak ordering; independence holds because increases with and ; there exist neither distinct nor distinct such that and , so that the Thomsen condition trivially holds on . However, no additive utility represents else , and , which would imply that , or, equivalently, that . And this is impossible since .

More generally, using arguments similar to that used for independence, it is rather straightforward to show that additive representability requires, for all , the following axiom.


Axiom 4: mth-order cancellation axiom.
Consider elements in , . Let in be such that is a permutation of for every .
Then .

Note that independence and the Thomsen condition are special cases of cancellation axioms. Note also that if the (m+1)st-order cancellation axiom holds, the mth-order one clearly also holds. From a theoretical point of view, cancellation axioms are useful since they are necessary conditions for additive representability (when the sets 's arefinite, the infinite-order cancellation axiom is even sufficient to ensure additive representability: see Adams (65)). But they are difficult to handle in practical situations when their order increases because they require to test numerous preference relations. Therefore, only those of low order can be easily tested. To circumvent this difficulty, in the finite case, i.e., when the 's are finite, Scott (64) showed that additive representability amounted to solving a system of linear inequalities. In the infinite case --- the case studied in this web page --- one way out is to assume some (nonnecessary) structural conditions which enable to derive any cancellation axiom from independence and the Thomsen condition. Such structural conditions are connectedness of topological spaces (see Debreu (60) and Wakker (89)), or solvability w.r.t. every component (Krantz, Luce, Suppes and Tversky (71)).


tocTable of contents


5.2. Structural conditions

In the rest of this page, I will only speak of a particular form of solvability called restricted solvability. If you are interested in the connectedness conditions, read the excellent book Wakker (89). In the beginning of the sixties, some researchers gave structural conditions they called "solution of equations" (Luce Tukey (64)), "solvability" (Krantz (64)) and "unrestricted solvability" (Fishburn (70)). These were the first form of structural conditions used to ensure the existence of additive utility functions. Later, researchers provided a more general condition called restricted solvability. This condition is still used nowadays.


Axiom 5: restricted solvability.
For every , , , and every , if , then there exists such that . This property is called restricted solvability w.r.t. the first attribute. Parallel definitions hold for the other attributes.

In many problems in economics, the 's can be thought of as continuous: consider for instance attributes like money, time, infinitely divisible consumer goods, etc. In such cases, the above axiom can be assumed safely.

Usually, the structure induced by restricted solvability is strong enough to ensure that some high-order cancellation axioms can be derived from some low-order ones. This is particularly useful for the elicitation of additive utility functions since only independence and the Thomsen condition need be checked. However, restricted solvability alone is not strong enough to ensure additive representability, as is shown in the following example:


Example 3:

Let , where . Let be a preference relation on such that:

  • for all in and for all in , ,
  • for all and for all , > or ( = and ), where , and .

It can be easily shown that restricted solvability holds w.r.t. both and : if , then ; if , then = and so, for all , . Similarly, independence trivially holds. As for the Thomsen condition, if and if , then = = , hence . However is not representable by an additive utility function. Indeed, if it were, the lexicographic order on would also be additively representable, which is known to be impossible.


In this example, two problems prevent additive representability. The first one, the easiest to fix, is simply that the second attribute of is useless since for all , . Hence, even if it satisfies restricted solvability, it cannot structure the outcome space. In such a case, we say that this attribute is inessential. To prevent this, the following axiom shall be used:


Axiom 6: essentialness.
For every , there exist and such that .

The second problem highlighted in example 3 is that it may happen that the outcome space has so many elements that it cannot be represented by real valued functions. This is precisely the case of the lexicographic order on 2. Therefore, to ensure additive representability, we need another axiom that guarantee that the outcome space has no more elements than the set of the real numbers. This axiom is usually an Archimedean axiom. Note that the latter need not always be explicitly stated: condition (H) in Jaffray (74b) is a necessary and sufficient condition for additive representability, closely related to cancellation axioms and which includes implicitly an Archimedean property; similarly, connectedness and separability of the topological spaces in the topological approach to additive conjoint measurement ensure the Archimedean property. However, in this web page, it must be explicitly stated. The Archimedean axiom as stated in the classical theorems is illustrated in figure 1.

Figure 1: The Archimedean axiom.


For simplicity, assume that = 2, and that is representable by an additive utility function . Let , and be arbitrary elements of such that . Then . Let be such that . Then, by definition of function , , or, equivalently, . Similarly, let be such that . Then , or equivalently, . By induction, it is obvious that, at the kth step, one gets . So, when k tends toward the infinity, also tends toward the infinity. If there exists an element such that for all 's, then, since is a utility function, has a finite value such that . Therefore, the sequence --- which is usually referred to as a standard sequence --- must be finite. Of course, a similar result should hold when the sequence is constructed from right to left, instead of from left to right as in figure 1. This suggests the following definition and axiom.


Definition 3: standard sequence w.r.t. the ith component.
For any set of consecutive integers, a set is a standard sequence w.r.t. the first component if and only if the following properties hold:
  • ,
  • for all .
Parallel definitions hold for standard sequences w.r.t. other components.
Axiom 7: Classical Archimedean axiom w.r.t. ith component.
Any bounded standard sequence w.r.t. the th component is finite: if is a standard sequence with mesh
,
i.e.,
,
and if there exist , such that
for all k in , then is finite.
tocTable of contents


5.3. Existence theorems

In the late sixties/beginning of the seventies, researchers proved, using semigroups (see Krantz, Luce, Suppes and Tversky, A. (1971)), that the above axioms were sufficient for additive representability. Two representation theorems were given, depending on the number of dimensions of Cartesian product :


Theorem 1: additive representation on two-dimensional Cartesian products.
Let , and let be a weak order on . Suppose that (,) satisfies independence (axiom 2), the Thomsen condition (axiom 3), restricted solvability w.r.t. every Xi (axiom 5), essentialness w.r.t. every Xi (axiom 6), and the Archimedean axiom (axiom 7). Then there exist real-valued functions on and on such that:

for all .

Moreover, if there also exist two other functions, and , verifying the above equation, then there exist some constant and in such that

for all .
Theorem 2: additive representation on n-dimensional Cartesian products.
Let be a n-dimensional Cartesian product, , and let be a weak order on . Suppose that (,) satisfies independence (axiom 2), restricted solvability w.r.t. every  (axiom 5), essentialness w.r.t. every  (axiom 6), and the Archimedean axiom (axiom 7). Then there exist real-valued functions, on , such that


Moreover, if there also exist other functions verifying the above equation, then there exist some constant and in such that:

.

Note the uniqueness properties in both theorems: when restricted solvability holds, additive utilities are unique up to scale (the a constant) and location (the b's). Remind that restricted solvability is a structural condition and therefore is not necessary for additive representability. When it does not hold, but additive utilities do exist, these are generally not unique up to scale and location, as is shown in the example below. This shows how strong an assumption is restricted solvability.


Example 4:

Let X = {1,2} {1,2,3} be a two-dimensional Cartesian product. Let be a weak order represented by the additive utility function f(x,y) = x + 10 y. Then is also represented by g(x,y) = ln x + y2. Hence additive utilities representing are not unique up to scale and location.


tocTable of contents

6. Generalization of the existence theorems

As we saw at the end of section 2.5.3, restricted solvability is a very strong structural assumption. This is even worse with connectedness of topological spaces. The major disadvantage of these structural conditions is that they prevent the use of additive utilities in many applications, for instance in fields like medical decision making (where the set of states of health usually satisfies neither restricted solvability nor connectedness); or economics, where some consumer goods may not be infinitely divisible. In general, this is true for any application in which some attributes are discrete while others are continuous.


Example 5:

Assume you want to buy a new car. Each car can be modeled as a tuple (brand, horse-power, comfort, color, price, etc). Preferring one car to another is then equivalent to preferring one tuple to another. The problem with the tuples described above is that they cannot satisfy restricted solvability with respect to every dimension, else the brand dimension would be dense in the price dimension; in other words, there should exist as many brands as there are prices, which is of course not true. Hence, in as simple a situation as buying a car, theorems 1 and 2 cannot be applied.

Therefore, it would be interesting to weaken the structural assumptions, in order to broaden the range of applications of additive conjoint measurement. I think extensions can be divided into two separate classes:

  1. additive representability theorems on subsets of Cartesian products. It is indeed a strong hypothesis to assume that all outcomes in the Cartesian product do exist. For instance, if our decision problem is to buy a new car, then the comparisons between different cars amounts to compare tuples of attributes such as price, comfort, brand, etc. It seems quite unrealistic to assume that a $10000 brand new Ferrari exists, but this is precisely what we would assume if, in our Cartesian product, a $10000 Golf exists.

  2. additive representability theorems in which restricted solvability does not hold with respect to every dimension of . Here, I will mainly present some of the results of my Ph-D thesis, although I will also mention works by Nakamura.

tocTable of contents


6.1. Existence theorems


tocTable of contents


6.2. Counterexamples


tocTable of contents

7. References


tocTable of contents