Transcript of: Scoring Rules, Generalized Entropy, and Utility Maximization

Page 1

Scoring Rules, Generalized Entropy, and Utility Maximization

Robert Nau

Fuqua School of Business

Duke University

(with Victor Jose and Robert Winkler)

Presentation for IEOR Seminar

Berkeley, October 29, 2006

Page 2

Overview

• Scoring rules are reward functions for defining subjective probabilities and eliciting them in forecasting applications and experimental economics (de Finetti, Brier, Savage, Selten...)

• Cross-entropy, or divergence, is a physical measure of information gain in communication theory and machine learning (Shannon, Kullback-Leibler...)

• Utility maximization is the decision maker’s objective in Bayesian decision theory and game theory (von Neumann & Morgenstern, Savage...)

Page 3

General connections

• Any decision problem under uncertainty may be used to define a scoring rule or measure of divergence between probability distributions.

• The expected score or divergence is merely the expected-utility gain that results from solving the problem using the decision maker’s “true” probability distribution p rather than some other “baseline” distribution q.

Page 4

Specific results

• We explore the connections among the best-known parametric families of generalized scoring rules, divergence measures, and utility functions.

• The expected scores obtained by truthful probability assessors turn out to correspond exactly to well-known generalized divergences.

• They also correspond exactly to expected-utility gains in financial investment problems with utility functions drawn from the linear-risk-tolerance (a.k.a. HARA) family.

• These results generalize to incomplete markets via a primal-dual pair of convex programs.

Page 5

Part 1: Scoring rules

• Consider a probability forecast for a discrete event with n possible outcomes (“states of the world”).

• Let ei = (0, ..., 1, ..., 0) denote the indicator vector for the ith state (where 1 appears in the ith position).

• Let p = (p1, ..., pn) denote the forecaster’s true subjective probability distribution over states.

• Let r = (r1, ..., rn) denote the forecaster’s reported distribution (if different from p).

• Let q = (q1, ..., qn) denote a baseline (“prior”) distribution upon which the forecaster seeks to improve.

Page 6

Definition of a scoring rule

• A scoring rule is a function S(r, ei, q) that determines the forecaster’s score (reward) for giving the forecast r, relative to the baseline q, when the ith state is subsequently observed to occur.

• Let S(r, p, q) = Σ_i p_i S(r, e_i, q) denote the forecaster’s expected score for reporting r when her true distribution is p and the baseline distribution is q.

• Thus, in general, a scoring rule can be expressed as a function of three vector-valued arguments, which is linear in the 2nd argument.

Page 7

Proper scoring rules

• The scoring rule S is [strictly] proper if S(p, p, q) ≥ [>] S(r, p, q) for all r [≠ p], i.e., if the forecaster’s expected score is [uniquely] maximized when she reports her true probabilities.

• Henceforth let S(p, q) := S(p, p, q) denote the forecaster’s expected score for a truthful forecast, as a function of p and q.

• S is [strictly] proper iff S(p, q) is [strictly] convex in p.

Page 8

Proper scoring rules, continued

• If S is strictly proper, then it is uniquely determined from its expected-score function S(p, q) by McCarthy’s (1956) formula:

$$S(p, e_i, q) \;=\; S(p, q) \;+\; \nabla_p S(p, q)\cdot(e_i - p)$$

• Thus, a strictly proper scoring rule is completely characterized by its expected-score function.

• Henceforth only strictly proper scoring rules will be considered, and it will be assumed that r = p.
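For example, applying McCarthy’s formula to the expected-score function S(p, q) = Σ_i p_i ln(p_i/q_i) (the weighted logarithmic case that appears later in the talk) recovers the weighted logarithmic score, since the ith component of the gradient is ln(p_i/q_i) + 1:

$$S(p, e_i, q) \;=\; \sum_j p_j \ln\frac{p_j}{q_j} \;+\; \Big(\ln\frac{p_i}{q_i} + 1\Big) \;-\; \sum_j p_j\Big(\ln\frac{p_j}{q_j} + 1\Big) \;=\; \ln\frac{p_i}{q_i}.$$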

Page 9

Standard scoring rules

• The three most commonly used scoring rules all assume a uniform baseline distribution (q_i = 1/n for all i), which will be temporarily suppressed.

• Quadratic scoring rule:
$$S(r, e_i) = 2 r_i - \sum_{j=1}^{n} r_j^2$$

• Spherical scoring rule:
$$S(r, e_i) = \frac{r_i}{\big(\sum_{j=1}^{n} r_j^2\big)^{1/2}}$$

• Logarithmic scoring rule:
$$S(r, e_i) = \ln r_i$$
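A minimal Python sketch of these three rules, with a crude grid search confirming that truthful reporting maximizes the expected quadratic score (the distribution p below is the three-state example used in the figures later in the talk):

```python
import numpy as np

def quadratic(r, i):
    return 2 * r[i] - np.sum(r ** 2)

def spherical(r, i):
    return r[i] / np.sqrt(np.sum(r ** 2))

def logarithmic(r, i):
    return np.log(r[i])

def expected_score(score, r, p):
    # expected score for reporting r when the forecaster's true distribution is p
    return sum(p[i] * score(r, i) for i in range(len(p)))

p = np.array([0.05, 0.25, 0.70])
grid = [np.array([a, b, 1.0 - a - b])
        for a in np.arange(0.01, 0.98, 0.01)
        for b in np.arange(0.01, 0.98, 0.01) if a + b <= 0.98]
best = max(grid, key=lambda r: expected_score(quadratic, r, p))
print(best)   # prints [0.05 0.25 0.7]: the expected score peaks at r = p
```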

Page 10

History of standard scoring rules

• The quadratic scoring rule was introduced by de Finetti (1937, 1974) to define subjective probability; later used by Brier (1950) as a tool for evaluating and paying weather forecasters; more recently advocated by Selten (1998) for paying subjects in economic experiments.

• The spherical and logarithmic rules were introduced by I.J. Good (1971), who also noted that the spherical and quadratic rules could be generalized to positive exponents other than 2, leading to...

Page 11

Generalized scoring rules (uniform q)

• Power scoring rule (→ quadratic at β = 2):
$$S_\beta(r, e_i) = \beta\, r_i^{\beta-1} - (\beta-1)\sum_{j=1}^{n} r_j^{\beta}$$

• Pseudospherical scoring rule (→ spherical at β = 2):
$$S_\beta(r, e_i) = \frac{r_i^{\beta-1}}{\big(\sum_{j=1}^{n} r_j^{\beta}\big)^{(\beta-1)/\beta}}$$

• Both rules → rescaled logarithmic rule at β = 1.

Page 12

Weighted scoring rules (arbitrary q)

• Our first contribution is merely to point out that the power and pseudospherical rules can be weighted by an arbitrary baseline distribution q and scaled so as to be valid for all real β.

• Under the weighted rules, the score is zero in all states iff p = q, and the expected score is positive iff p ≠ q.

• Thus, the weighted rules measure the “value added” of p over q as seen from the forecaster’s perspective.

Page 13

Weighted power scoring rule:
$$S_\beta(r, e_i, q) \;=\; \frac{\beta\,(r_i/q_i)^{\beta-1} - (\beta-1)\sum_{j=1}^{n} q_j (r_j/q_j)^{\beta} - 1}{\beta(\beta-1)}$$

Weighted pseudospherical scoring rule:
$$S_\beta(r, e_i, q) \;=\; \frac{1}{\beta-1}\left(\frac{(r_i/q_i)^{\beta-1}}{\big(\sum_{j=1}^{n} q_j (r_j/q_j)^{\beta}\big)^{(\beta-1)/\beta}} - 1\right)$$
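A minimal Python sketch, assuming the weighted score formulas as written above, that checks two of the properties just claimed: the score vanishes in every state when the reported distribution equals the baseline, and truthful reporting is optimal in expectation:

```python
import numpy as np

def weighted_power_score(r, i, q, beta):
    # weighted power score for report r, state i, baseline q (formula as written above)
    return (beta * (r[i] / q[i]) ** (beta - 1)
            - (beta - 1) * np.sum(q * (r / q) ** beta) - 1) / (beta * (beta - 1))

def weighted_pseudospherical_score(r, i, q, beta):
    norm = np.sum(q * (r / q) ** beta) ** ((beta - 1) / beta)
    return ((r[i] / q[i]) ** (beta - 1) / norm - 1) / (beta - 1)

def expected_score(score, r, p, q, beta):
    return sum(p[i] * score(r, i, q, beta) for i in range(len(p)))

beta = 0.5
p = np.array([0.05, 0.25, 0.70])
q = np.array([1 / 3, 1 / 3, 1 / 3])

# the score is zero in every state when the reported distribution equals the baseline
print([round(weighted_power_score(q, i, q, beta), 12) for i in range(3)])            # [0.0, 0.0, 0.0]
print([round(weighted_pseudospherical_score(q, i, q, beta), 12) for i in range(3)])  # [0.0, 0.0, 0.0]

# truthful reporting (r = p) beats a sample of dishonest reports in expectation
rng = np.random.default_rng(0)
honest = expected_score(weighted_power_score, p, p, q, beta)
dishonest = [expected_score(weighted_power_score, rng.dirichlet(np.ones(3)), p, q, beta)
             for _ in range(1000)]
print(honest >= max(dishonest))   # True, reflecting strict propriety
```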

Page 14

Properties of weighted scoring rules

• Both rules are strictly proper for all real β.

• Both rules → the weighted logarithmic rule ln(p_i/q_i) at β = 1.

• For the same p, q, and β, the vector of weighted power scores is an affine transformation of the vector of weighted pseudospherical scores, since both are affine functions of (p_i/q_i)^(β−1).

• However, the two rules present different incentives for information-gathering and honest reporting.

• The special cases β = 0 and β = ½ have interesting properties but have not been previously studied.
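To spell out the affine relationship (a sketch using the score formulas as written above, with the shorthand m(p,q) = Σ_j q_j (p_j/q_j)^β and x_i = (p_i/q_i)^(β−1)):

$$S^{\mathrm{pow}}_\beta(p, e_i, q) \;=\; \frac{x_i}{\beta-1} \;-\; \frac{(\beta-1)\,m(p,q) + 1}{\beta(\beta-1)},
\qquad
S^{\mathrm{sph}}_\beta(p, e_i, q) \;=\; \frac{x_i}{(\beta-1)\,m(p,q)^{(\beta-1)/\beta}} \;-\; \frac{1}{\beta-1},$$

so, for fixed p, q, and β, each score vector is an increasing affine function of the other.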

Page 15

Special cases of weighted scores

Page 16

Weighted expected score functions

• Weighted power expected score:
$$S_\beta(p, q) \;=\; \frac{\sum_{i=1}^{n} q_i (p_i/q_i)^{\beta} - 1}{\beta(\beta-1)}$$

• Weighted pseudospherical expected score:
$$S_\beta(p, q) \;=\; \frac{\big(\sum_{i=1}^{n} q_i (p_i/q_i)^{\beta}\big)^{1/\beta} - 1}{\beta-1}$$
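A short numerical check, assuming the formulas above, that these closed forms agree with the direct expectation Σ_i p_i S_β(p, e_i, q), and that the power expected score dominates the pseudospherical one (a point made again with Figure 3 below):

```python
import numpy as np

def power_expected(p, q, beta):
    return (np.sum(q * (p / q) ** beta) - 1) / (beta * (beta - 1))

def pseudospherical_expected(p, q, beta):
    return (np.sum(q * (p / q) ** beta) ** (1 / beta) - 1) / (beta - 1)

def power_score(p, i, q, beta):
    # weighted power score in state i for a truthful report
    return (beta * (p[i] / q[i]) ** (beta - 1)
            - (beta - 1) * np.sum(q * (p / q) ** beta) - 1) / (beta * (beta - 1))

p = np.array([0.05, 0.25, 0.70])
q = np.array([0.20, 0.30, 0.50])

for beta in [-1.0, 0.5, 2.0, 3.0]:
    direct = sum(p[i] * power_score(p, i, q, beta) for i in range(3))
    assert np.isclose(direct, power_expected(p, q, beta))                   # closed form matches
    assert power_expected(p, q, beta) >= pseudospherical_expected(p, q, beta) - 1e-12
print("power expected score >= pseudospherical expected score on all test cases")
```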

Page 17

Special cases of expected scores (table comparing the power and pseudospherical expected scores at selected values of β)

Page 18

Figure 1. Weighted power score vs. beta (uniform q); curves shown for State 1 (p = 0.05), State 2 (p = 0.25), and State 3 (p = 0.70).

• Behavior of the weighted power score for n = 3.

• For fixed p and q, the scores diverge as β → ±∞.

• For β << 0 [β >> 2], only the lowest- [highest-] probability event is distinguished from the others.

Page 19

• By comparison, the weighted pseudospherical scores approach fixed limits as β → ±∞.

• Again, for β << 0 [β >> 2], only the lowest- [highest-] probability event is distinguished from the others.

Figure 2. Weighted pseudospherical score vs. beta (uniform q); curves shown for State 1 (p = 0.05), State 2 (p = 0.25), and State 3 (p = 0.70).

Page 20

The corresponding expected scores vs. β are equal at β = 1, where both rules converge to the weighted logarithmic scoring rule, but elsewhere the weighted power expected score is strictly larger.

Figure 3. Expected scores vs. beta (p = 0.05, 0.25, 0.70; uniform q); curves shown for the power and pseudospherical expected scores.

Page 21

Part 2. Entropy

• In statistical physics, the entropy of a system with n possible internal states having probability distribution p is defined (up to a multiplicative constant) by
$$H(p) = -\sum_{i=1}^{n} p_i \ln p_i$$

• In communication theory, the entropy H(p) is the expected “self-information” of an event drawn from a stationary random process with distribution p, i.e., the average number of bits required to optimally encode it (Shannon 1948).

Page 22

The KL divergence

• The cross-entropy, or Kullback–Leibler divergence, between two distributions p and q measures the expected information gain (reduction in the average number of bits per event) due to replacing the “wrong” distribution q with the “right” distribution p:
$$D_{KL}(p \,\|\, q) \;=\; \sum_{i=1}^{n} p_i \ln\frac{p_i}{q_i}$$

Page 23

Properties of the KL divergence

• Additivity with respect to independent partitions of the state space:
$$D_{KL}(p_A \otimes p_B \,\|\, q_A \otimes q_B) \;=\; D_{KL}(p_A \,\|\, q_A) \;+\; D_{KL}(p_B \,\|\, q_B)$$

• Thus, if A and B are independent events whose initial distributions qA and qB are respectively updated to pA and pB, the total expected information gain in their product space is the sum of the separate expected information gains, as measured by their KL divergences.

Page 24

Properties of the KL divergence

• Recursivity with respect to the splitting of events: if state 1 is split into two sub-events, so that p = (p_1a, p_1b, p_2, ..., p_n) with p_1 = p_1a + p_1b (and similarly for q), then
$$D_{KL}(p \,\|\, q) \;=\; D_{KL}\big((p_1, \ldots, p_n) \,\|\, (q_1, \ldots, q_n)\big) \;+\; p_1\, D_{KL}\Big(\big(\tfrac{p_{1a}}{p_1}, \tfrac{p_{1b}}{p_1}\big) \,\Big\|\, \big(\tfrac{q_{1a}}{q_1}, \tfrac{q_{1b}}{q_1}\big)\Big)$$

• Thus, the total expected information gain does not depend on whether the true state is resolved all at once or via a sequential splitting of events.
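A quick numerical sanity check of both properties (the distributions below are arbitrary illustrative values):

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Additivity: KL over a product of independent partitions is the sum of the parts
pA, qA = [0.3, 0.7], [0.5, 0.5]
pB, qB = [0.1, 0.6, 0.3], [0.2, 0.4, 0.4]
prod_p = np.outer(pA, pB).ravel()
prod_q = np.outer(qA, qB).ravel()
assert np.isclose(kl(prod_p, prod_q), kl(pA, qA) + kl(pB, qB))

# Recursivity: splitting state 1 into two sub-events does not change the total divergence
p_fine = [0.10, 0.15, 0.30, 0.45]            # state 1 (prob 0.25) split into 0.10 + 0.15
q_fine = [0.20, 0.20, 0.30, 0.30]            # baseline state 1 (prob 0.40) split into 0.20 + 0.20
p_coarse, q_coarse = [0.25, 0.30, 0.45], [0.40, 0.30, 0.30]
conditional = 0.25 * kl([0.10 / 0.25, 0.15 / 0.25], [0.20 / 0.40, 0.20 / 0.40])
assert np.isclose(kl(p_fine, q_fine), kl(p_coarse, q_coarse) + conditional)
print("additivity and recursivity hold on these examples")
```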

Page 25

Other divergence/distance measures

• The Chi-square divergence (Pearson 1900) is used by frequentist statisticians to measure goodness of fit:
$$\chi^2(p \,\|\, q) \;=\; \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}$$

• The Hellinger distance is a symmetric measure of distance between two distributions that is popular in machine learning applications:
$$H(p, q) \;=\; \Big(\sum_{i=1}^{n} \big(\sqrt{p_i} - \sqrt{q_i}\big)^2\Big)^{1/2}$$

Page 26

Onward to generalized divergence...

• The properties of additivity and recursivity can be regarded as axioms for a measure of expected information gain, and together they imply the KL divergence.

• However, weaker axioms of “pseudoadditivity” and “pseudorecursivity” lead to parametric families of generalized divergence.

• These generalized divergences “interpolate” between and “extrapolate” beyond the KL divergence, the Chi-square divergence, and the Hellinger distance.

Page 27

Power divergence

• The directed divergence of order β, a.k.a. the power divergence, was proposed by Havrda & Charvát (1967) and further elaborated by Rathie & Kannappan (1972), Cressie & Read (1980), and Haussler & Opper (1997), among others:
$$D_\beta(p \,\|\, q) \;=\; \frac{\sum_{i=1}^{n} q_i (p_i/q_i)^{\beta} - 1}{\beta(\beta-1)}$$

• It is pseudoadditive and pseudorecursive for all β, and it coincides with the KL divergence at β = 1.

• It is identical to the weighted power expected score; hence the power divergence is the implicit information measure behind the weighted power scoring rule.

Page 28

Pseudospherical divergence

• An alternative generalized entropy was introduced by Arimoto (1971) and further studied by Sharma & Mittal (1975), Boekee & Van der Lubbe (1980), and Lavenda & Dunning-Davies (2003), for β > 1:
$$H_\beta(p) \;=\; \frac{\beta}{\beta-1}\Big(1 - \big(\sum_{i=1}^{n} p_i^{\beta}\big)^{1/\beta}\Big)$$

• The corresponding divergence, which we call the pseudospherical divergence, is obtained by introducing a baseline distribution q and dividing out the unnecessary β in the numerator:
$$D_\beta(p \,\|\, q) \;=\; \frac{\big(\sum_{i=1}^{n} q_i (p_i/q_i)^{\beta}\big)^{1/\beta} - 1}{\beta-1}$$

Page 29

Properties of the pseudospherical divergence

• It is defined for all real β (not merely β > 1).

• It is pseudoadditive but generally not pseudorecursive.

• It is identical to the weighted pseudospherical expected score, hence the pseudospherical divergence is the implicit information measure behind the weighted pseudospherical scoring rule.

Page 30

Interesting special cases

• The power and pseudospherical divergences both coincide with the KL divergence at β = 1.

• At β = 0, β = ½, and β = 2 they are linearly (or at least monotonically) related to the reverse KL divergence, the squared Hellinger distance, and the Chi-square divergence, respectively:
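In detail, writing D^pow_β and D^sph_β for the two divergences as given above, and χ² and H for the Chi-square divergence and Hellinger distance from the earlier slide, the limits work out (a sketch computed from those formulas) to:

$$\begin{aligned}
\beta = 0:\quad & D^{\mathrm{pow}}_0(p\,\|\,q) = D_{KL}(q\,\|\,p), & D^{\mathrm{sph}}_0(p\,\|\,q) &= 1 - e^{-D_{KL}(q\,\|\,p)},\\
\beta = \tfrac12:\quad & D^{\mathrm{pow}}_{1/2}(p\,\|\,q) = 2\,H(p,q)^2, & D^{\mathrm{sph}}_{1/2}(p\,\|\,q) &= 2\Big(1 - \big(\textstyle\sum_i \sqrt{p_i q_i}\big)^2\Big),\\
\beta = 2:\quad & D^{\mathrm{pow}}_2(p\,\|\,q) = \tfrac12\,\chi^2(p\,\|\,q), & D^{\mathrm{sph}}_2(p\,\|\,q) &= \sqrt{1 + \chi^2(p\,\|\,q)} - 1.
\end{aligned}$$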

Page 31

Where we’ve gotten so far...

• There are two parametric families of weighted, strictly proper scoring rules which correspond exactly to two well-known families of generalized divergence, each of which has a full “spectrum” of possibilities (−∞ < β < ∞).

• But what is the decision-theoretic significance of these quantities?

• What are some guidelines for choosing among the two families and their parameters?

Page 32

Part 3. Financial decisions under uncertainty with linear risk tolerance

• Suppose that an investor with subjective probability distribution p and utility function u bets or trades optimally against a risk-neutral opponent or contingent claim market with distribution q.

• For any risk-averse utility function, the investor’s gain in expected utility yields an economic measure of the divergence between p and q.

• In particular, suppose the investor’s utility function belongs to the linear risk tolerance (HARA) family, i.e., the family of generalized exponential, logarithmic, and power utility functions.

Page 33

Risk aversion and risk tolerance

• Let y denote gain or loss relative to a (riskless) status-quo wealth position, and let u(y) denote the utility of y.

• The monetary quantity τ(y) ≡ −u′(y)/u″(y) is the investor’s local risk tolerance at y (the reciprocal of the Pratt–Arrow measure of local risk aversion).

• The usual decision-analytic rule of thumb is as follows: an investor with current wealth y and local risk tolerance τ(y) is roughly indifferent to accepting a 50-50 gamble between the wealth positions y + τ(y) and y − ½τ(y), i.e., indifferent to gaining τ(y) or losing ½τ(y) with equal probability.
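For example, with constant risk tolerance (exponential utility u(y) = 1 − e^(−y/τ)) and τ normalized to 1, the 50-50 gamble between gaining τ and losing ½τ has expected utility

$$\tfrac12\,u(1) + \tfrac12\,u(-\tfrac12) \;=\; \tfrac12\big(1 - e^{-1}\big) + \tfrac12\big(1 - e^{1/2}\big) \;\approx\; 0.316 - 0.324 \;=\; -0.008 \;\approx\; 0 \;=\; u(0),$$

so the investor is indeed approximately indifferent between the gamble and the status quo.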

Page 34

Linear risk tolerance (LRT) utility

• The most commonly used utility functions in decision analysis and financial economics have the property of linear risk tolerance, i.e., τ(y) = τ + βy, where τ > 0 and β is the risk-tolerance coefficient.

• If the unit of money is normalized so that the risk tolerance equals 1 at y = 0 (status-quo wealth), then τ = 1, and the utility function is u(y) = g_β(y), where:
$$g_\beta(y) \;=\; \frac{(1+\beta y)^{(\beta-1)/\beta} - 1}{\beta-1}$$
(for 1 + βy > 0, with limiting cases g_0(y) = 1 − e^(−y) and g_1(y) = ln(1+y)).
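A minimal Python sketch of g_β as written above, confirming the special cases named in Figure 4 (quadratic, exponential, logarithmic, square-root) and the mutual tangency at y = 0:

```python
import numpy as np

def g(y, beta):
    # normalized LRT utility: risk tolerance 1 + beta*y, with g(0) = 0 and g'(0) = 1
    if np.isclose(beta, 0.0):
        return 1.0 - np.exp(-y)                       # exponential (constant risk tolerance)
    if np.isclose(beta, 1.0):
        return np.log1p(y)                            # logarithmic
    return ((1.0 + beta * y) ** ((beta - 1.0) / beta) - 1.0) / (beta - 1.0)

y = 0.3
print(np.isclose(g(y, -1.0), y - y ** 2 / 2))                 # beta = -1: quadratic utility
print(np.isclose(g(y, 2.0), np.sqrt(1.0 + 2.0 * y) - 1.0))    # beta = 2: square-root utility

# mutual tangency at y = 0: g_beta(0) = 0 and g_beta'(0) = 1 for every beta
eps = 1e-6
print([round((g(eps, b) - g(0.0, b)) / eps, 4) for b in (-1.0, 0.0, 0.5, 1.0, 2.0)])  # all ~1.0
```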

Page 35

Special cases of normalized LRT utility

Page 36

Figure 4. Normalized LRT utility functions (beta = risk tolerance coefficient); curves shown for beta = −1 (quadratic), beta = 0 (exponential), beta = 1 (logarithmic), and beta = 2 (square-root).

Qualitative properties of LRT utility

g_β(0) = 0 and g_β′(0) = 1 for all β: the functions {g_β(y)} are mutually tangent, with dollar-utile parity, at y = 0.

Page 37

The investor’s decision model

• Model Y: the investor seeks the payoff vector y that maximizes her own LRT expected utility under her distribution p subject to not decreasing the opponent’s linear expected utility (i.e., expected value) under his distribution q.

• The investor’s reward in state i is her own ex post utility payoff g_β(y_i).

Page 38

A modified decision model

• Model Y′: the investor seeks the payoff vector y that maximizes the sum of her own LRT expected utility under her distribution p and the opponent’s linear expected utility (expected value) under his distribution q.

• The investor’s reward in state i is her own ex post utility payoff g_β(y_i) plus the opponent’s ex ante expected monetary payoff.

Page 39

Main result

1. In the solution of Model Y, the investor’s utility payoff in state i is the weighted pseudospherical score, whose expected value is the pseudospherical divergence.

2. In the solution of Model Y′, the investor’s reward in state i is the weighted power score, whose expected value is the power divergence.

3. For any p, q, and β, the weighted power expected score (power divergence) is greater than or equal to the weighted pseudospherical expected score (pseudospherical divergence).
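A numerical sketch of results 1 and 2, assuming the reconstructed g_β and score formulas above; the closed-form optimal payoffs below come from the first-order conditions of each model, and β, p, q are arbitrary example values:

```python
import numpy as np

beta = 0.5
p = np.array([0.05, 0.25, 0.70])
q = np.array([0.20, 0.30, 0.50])

def g(y, b=beta):
    # normalized LRT utility with risk tolerance 1 + b*y
    return ((1.0 + b * y) ** ((b - 1.0) / b) - 1.0) / (b - 1.0)

def pseudospherical_score(i):
    norm = np.sum(q * (p / q) ** beta) ** ((beta - 1) / beta)
    return ((p[i] / q[i]) ** (beta - 1) / norm - 1) / (beta - 1)

def power_score(i):
    return (beta * (p[i] / q[i]) ** (beta - 1)
            - (beta - 1) * np.sum(q * (p / q) ** beta) - 1) / (beta * (beta - 1))

# Model Y: maximize E_p[g(y)] subject to E_q[y] <= 0; the first-order conditions give
lam = np.sum(q * (p / q) ** beta) ** (1 / beta)
y_Y = ((p / (lam * q)) ** beta - 1) / beta
print(np.allclose(g(y_Y), [pseudospherical_score(i) for i in range(3)]))   # True (result 1)

# Model Y': maximize E_p[g(y)] + E_q[-y]; the pointwise first-order conditions give
y_Yp = ((p / q) ** beta - 1) / beta
reward = g(y_Yp) + np.sum(q * (-y_Yp))      # utility payoff plus opponent's expected payoff
print(np.allclose(reward, [power_score(i) for i in range(3)]))             # True (result 2)
```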

Page 40

Observations

• Insofar as Model Y is a more “realistic” investment problem than Model Y′, the pseudospherical divergence appears to be more economically meaningful than the power divergence.

• The same results are obtained if the investor is endowed with linear utility while the opponent is risk averse with risk tolerance coefficient 1.

• Both of these problems involve non-decreasing risk tolerance on the part of the more-risk-averse agent only if β is between 0 and 1.

Page 41

Extension to incomplete markets

• Suppose the investor faces an incomplete market in which asset prices are supported by a convex set of risk-neutral distributions.

• Let Q denote the matrix whose rows are the extreme points of the set of risk-neutral distributions.

• Then the investor seeks the payoff vector y that maximizes her own LRT expected utility under her distribution p subject to the constraint Qy ≤ 0.

• This is a convex optimization problem whose dual is to find the risk neutral distribution in the convex hull of the rows of Q that minimizes the pseudospherical divergence from p.

Page 42

Details of duality relationship

• Let z denote a vector of non-negative weights, summing to 1, for the k rows of Q.

• Then zᵀQ is a supporting risk-neutral distribution in the convex hull of the rows of Q, and the primal-dual pair of optimization problems is as follows (with D_β denoting the pseudospherical divergence):
$$\text{(Primal)}\quad \max_{y}\; \sum_{i} p_i\, g_\beta(y_i) \;\;\text{s.t.}\;\; Qy \le 0
\qquad\qquad
\text{(Dual)}\quad \min_{z \ge 0,\; \mathbf{1}^{\top}z = 1}\; D_\beta\big(p \,\|\, z^{\top}Q\big)$$
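A numerical sketch of this duality, assuming the reconstructed g_β and pseudospherical divergence above; β, p, and the two extreme risk-neutral distributions in Q are arbitrary example values:

```python
import numpy as np
from scipy.optimize import minimize

beta = 0.5
p = np.array([0.05, 0.25, 0.70])
Q = np.array([[0.20, 0.30, 0.50],            # two extreme risk-neutral distributions
              [0.40, 0.40, 0.20]])           # spanning a hypothetical incomplete market

def g(y):
    # normalized LRT utility with risk tolerance 1 + beta*y
    return ((1.0 + beta * y) ** ((beta - 1.0) / beta) - 1.0) / (beta - 1.0)

def pseudospherical_div(p, q):
    return (np.sum(q * (p / q) ** beta) ** (1.0 / beta) - 1.0) / (beta - 1.0)

# Primal: maximize expected LRT utility subject to Qy <= 0
res = minimize(lambda y: -np.dot(p, g(y)), x0=np.zeros(3), method="SLSQP",
               bounds=[(-1.9, None)] * 3,                    # keeps 1 + beta*y > 0
               constraints=[{"type": "ineq", "fun": lambda y: -(Q @ y)}])
primal_value = -res.fun

# Dual: minimize the pseudospherical divergence over the convex hull of the rows of Q
zs = np.linspace(0.0, 1.0, 20001)
dual_value = min(pseudospherical_div(p, z * Q[0] + (1.0 - z) * Q[1]) for z in zs)

print(round(primal_value, 6), round(dual_value, 6))   # by the stated duality, these should (nearly) coincide
```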

Page 43

Conclusions

• The commonly used power & pseudospherical scoring rules can be improved by incorporating a not-necessarily-uniform baseline distribution.

• The resulting weighted expected scores are equal to well-known generalized divergences.

• The weighted pseudospherical scoring rule and its divergence have a more natural utility-theoretic interpretation than the weighted power versions.

• Values of β between 0 and 1 appear to be the most interesting, and the cases β = 0 and β = ½ have so far been under-explored.