The Nobel Prizes
1]. This corresponds to the agent choosing any uniform distribution [e, e + 1] at
cost c(e). The density of a uniform distribution looks like a box. As e varies the
box moves to the right. In first best, the agent should receive a constant payment
if he chooses the first-best level of effort e_FB. This can be implemented by paying
the agent a fixed wage if the observed outcome x ≥ e_FB and something low enough
(a punishment) if x < e_FB. The scheme works because two conditions hold: (a) the
agent can be certain to avoid punishment by choosing
the first-best effort level
and (b) the moving support allows the principal to infer with certainty that an
agent is slacking if x < e_FB and hence punish him severely enough to make him
choose first best. In the general model, inferences are always imperfect, but will
still play a central role in trading off risk versus incentives.
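The moving-support argument can be checked numerically. The sketch below is only an illustration under assumed primitives that are not in the text: a quadratic cost c(e) = e²/2, a square-root utility of pay, a first-best effort of 0.5, a fixed wage of 1 and a punishment payment of 0. It computes the agent's expected payoff from each effort level when x is uniform on [e, e + 1] and the contract pays the wage only if x ≥ e_FB.

```python
import math

# All numbers below are hypothetical choices made for illustration.
E_FB = 0.5          # assumed first-best effort level
WAGE = 1.0          # fixed wage paid when x >= E_FB
PUNISHMENT = 0.0    # low payment when x < E_FB


def cost(e):
    """Assumed effort cost c(e) = e^2 / 2."""
    return 0.5 * e * e


def utility(pay):
    """Assumed risk-averse utility of pay."""
    return math.sqrt(pay)


def prob_no_punishment(e, threshold=E_FB):
    """P(x >= threshold) when x is uniform on [e, e + 1]."""
    return min(1.0, max(0.0, e + 1.0 - threshold))


def expected_payoff(e):
    """Agent's expected utility of pay minus effort cost under the step contract."""
    p = prob_no_punishment(e)
    return p * utility(WAGE) + (1.0 - p) * utility(PUNISHMENT) - cost(e)


# Search a grid of effort levels for the agent's best response.
grid = [i / 100 for i in range(151)]
best_effort = max(grid, key=expected_payoff)
print(best_effort)
```

With these numbers the gap in utility between the wage and the punishment exceeds the marginal cost of effort below e_FB, so the best response on the grid is e_FB itself: any lower effort risks punishment and any higher effort only adds cost.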
C. Second-best with Two Actions
I proceed to characterize the optimal incentive scheme in the special case where
the agent chooses between just two distributions F_L and F_H. This special case
will reveal most of the insights from the basic agency model without having to
deal with technical complications. Assume that F_H dominates the distribution
F_L in the sense of first-order stochastic dominance: for any z, the probability
that x > z is higher under F_H than F_L. This is consistent with assuming that the
high distribution is a more costly choice for the agent: c_L < c_H. As an example,
for x = e + ε and e_L < e_H, F_H first-order stochastically dominates F_L regardless of
how ε is distributed.
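The additive example can be verified directly. The sketch below assumes an arbitrary discrete noise distribution and two effort levels, all hypothetical, and checks on a grid of thresholds z that P(x > z) under the high effort is never below P(x > z) under the low effort.

```python
def survival(z, effort, noise_values, noise_probs):
    """P(effort + eps > z) for a discrete noise distribution."""
    return sum(p for v, p in zip(noise_values, noise_probs) if effort + v > z)


# Hypothetical, deliberately irregular noise distribution (probabilities sum to 1).
noise_values = [-2.0, -0.5, 0.0, 0.3, 4.0]
noise_probs = [0.10, 0.30, 0.20, 0.25, 0.15]

e_low, e_high = 1.0, 1.7   # hypothetical effort levels with e_low < e_high

z_grid = [i / 10 for i in range(-30, 80)]
gaps = [
    survival(z, e_high, noise_values, noise_probs)
    - survival(z, e_low, noise_values, noise_probs)
    for z in z_grid
]

dominates = all(g >= -1e-12 for g in gaps)   # weak dominance at every z
strict_somewhere = any(g > 1e-12 for g in gaps)
print(dominates, strict_somewhere)
```

The check works for any noise distribution because shifting e upward only adds noise realizations to the event x > z and never removes any.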
Assume that the principal wants to implement H; the other case is uninteresting,
since L is optimally implemented with a fixed payment. Let μ and λ be
(non-negative) Lagrangian multipliers associated with the incentive compatibility
constraint (2) and the participation constraint (3) in the principal’s program
(1)–(3). The optimal second-best contract, denoted s_H(x), is characterized by
    u′(s_H(x))⁻¹ = λ + μ[1 − f_L(x)/f_H(x)], for every x.⁴    (4)
Here f_L(x) and f_H(x) are the density functions of F_L(x) and F_H(x). It is easy
to see that both constraints (2) and (3) are binding and therefore μ and λ are
strictly positive.⁵
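A small worked example may help make (4) concrete. The sketch below is not from the lecture. It assumes three discrete outcomes (so probabilities stand in for densities), a utility u(s) = 2√s, and hypothetical costs and reservation utility. With this utility 1/u′(s) = √s, so (4) reads √s(x) = λ + μ[1 − f_L(x)/f_H(x)]. Because the term in brackets has mean zero under f_H, a binding participation constraint gives λ = (Ū + c_H)/2, and a binding incentive constraint gives μ = (c_H − c_L) / (2 Σ (f_H − f_L)²/f_H).

```python
import math

# Hypothetical primitives chosen for illustration.
f_H = [0.2, 0.3, 0.5]    # outcome probabilities under the high action
f_L = [0.5, 0.3, 0.2]    # outcome probabilities under the low action
c_H, c_L = 0.3, 0.0      # action costs, c_L < c_H
U_bar = 1.0              # assumed reservation utility of the agent


def u(s):
    """Assumed utility of pay; 1/u'(s) = sqrt(s)."""
    return 2.0 * math.sqrt(s)


# Multipliers implied by binding participation and incentive constraints.
lam = (U_bar + c_H) / 2.0
spread = sum((h - l) ** 2 / h for h, l in zip(f_H, f_L))
mu = (c_H - c_L) / (2.0 * spread)

# Condition (4): sqrt(s_H(x)) = lam + mu * (1 - f_L(x) / f_H(x)).
root_pay = [lam + mu * (1.0 - l / h) for h, l in zip(f_H, f_L)]
if min(root_pay) < 0:
    raise ValueError("closed form invalid: sqrt(s) would be negative")
s_H = [r * r for r in root_pay]


def expected_utility(probs, action_cost):
    """Agent's expected utility of the contract s_H net of the action cost."""
    return sum(p * u(s) for p, s in zip(probs, s_H)) - action_cost


U_high = expected_utility(f_H, c_H)
U_low = expected_utility(f_L, c_L)
print(lam, mu, s_H, U_high, U_low)
```

In this example both multipliers are strictly positive, the agent is held to the reservation utility, and he is exactly indifferent between the two actions, which is what binding constraints (2) and (3) mean.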
The characterization is simple but informative. First, note that the optimal
incentive scheme deviates from first-best, which pays the agent a fixed wage,
because the right-hand side varies with x. The reason of course is that the prin-
cipal needs to provide an incentive to get the agent to put out high effort. Second,
the shape of the optimal incentive scheme only depends on the ratio f_H(x)/f_L(x).
In statistics, this ratio is known as the likelihood ratio; denote it l(x). The likeli-
hood ratio at x tells how likely it is that the observed outcome x originated from
the distribution H rather than the distribution L. A value higher than 1 speaks
in favor of H and a value less than 1 speaks in favor of L.
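To see how the likelihood ratio reads the evidence, consider a sketch under an assumed specification that is not in the text: x = e + ε with standard normal noise, e_L = 0 and e_H = 1. The ratio equals 1 at the midpoint between the two effort levels and moves away from 1 as the outcome becomes more extreme.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


E_LOW, E_HIGH = 0.0, 1.0   # hypothetical effort levels


def likelihood_ratio(x):
    """l(x) = f_H(x) / f_L(x) for the assumed normal example."""
    return normal_pdf(x, E_HIGH) / normal_pdf(x, E_LOW)


for outcome in (-1.0, 0.5, 2.0):
    print(outcome, likelihood_ratio(outcome))
```

An outcome of 2 speaks in favor of H (ratio above 1), an outcome of −1 speaks in favor of L (ratio below 1), and the midpoint 0.5 is uninformative.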
Denote by s_λ the constant value of s(x) that satisfies (4) with μ = 0. It is the
optimal risk sharing contract corresponding to λ.⁶ The second-best contract (μ > 0)
deviates from optimal risk sharing (the fixed payment s_λ) in a very intuitive
way. The agent is punished when l(x) is less than 1, because x is evidence
against high effort. The agent is paid a bonus when l(x) is larger than 1 because
the evidence is in favor of high effort. The deviations are bigger the stronger
the evidence. So, the second-best scheme is designed as if the principal were
making inferences about the agent’s choice, as in statistics. This is quite surpris-
ing because in the model the principal knows that the agent is choosing high
effort given the contract she offers to the agent before the outcome x is observed.
So, there is nothing to infer at the time the outcome is realized.
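The bonus-and-punishment pattern is easy to tabulate. The sketch below assumes logarithmic utility, so that 1/u′(s) = s and (4) becomes s(x) = λ + μ[1 − 1/l(x)], with s_λ = λ. The multiplier values and the list of likelihood ratios are illustrative inputs only; they are not solved from a full program.

```python
# Illustrative multipliers, not derived from any particular program (1)-(3).
LAM = 1.0
MU = 0.4


def pay(l_x, lam=LAM, mu=MU):
    """Pay implied by (4) under assumed log utility: s = lam + mu * (1 - 1/l)."""
    return lam + mu * (1.0 - 1.0 / l_x)


s_lambda = pay(1.0, mu=0.0)            # optimal risk sharing: constant pay
ratios = [0.5, 0.8, 1.0, 1.25, 2.0]    # hypothetical values of l(x)
deviations = [pay(l) - s_lambda for l in ratios]

for l, d in zip(ratios, deviations):
    label = "bonus" if d > 0 else "punishment" if d < 0 else "no adjustment"
    print(l, round(d, 4), label)
```

The deviation from s_λ is negative when l(x) < 1, zero at l(x) = 1, positive when l(x) > 1, and grows with the strength of the evidence.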
II. THE INFORMATIVENESS PRINCIPLE
The fact that the basic agency model thinks like a statistician is very helpful for
understanding its behavior and predictions. An important case is the answer that
the model gives to the question: When will an additional signal y be valuable,
because it allows the principal to write a better contract?
A. Additional Signals
One might think that if y is sufficiently noisy this could swamp the value of any
additional information embedded in y. That intuition is wrong. This is easily
seen from a minor extension of (4). Let the optimal incentive scheme that implements
H using both signals be s_H(x,y). The characterization of this scheme follows
exactly the same steps as that for s_H(x). The only change we need to make
in (4) is to write s_H(x,y) in place of s_H(x) and the joint density f_i(x,y) in place of
f_i(x) for i = L,H. Of course, the Lagrange multipliers will not have the same values
if y is valuable.
Considering this variant of (4) we see that if the likelihood ratio l(x,y) =
f_H(x,y)/f_L(x,y) depends on x as well as y, then on the left-hand side the optimal
solution s_H(x,y) must depend on both x and y. In this case y is valuable.