Decision Making In Prisoner’s Dilemma



5.4 Heuristics – natural selection and modularity

After Simon introduced his theory of bounded rationality, and Kahneman and Tversky their prospect theory and the heuristics-and-biases critique of classical rationality, many people were puzzled by how much the animal rationale is plagued by biased judgment and by irrational decision making that fails to maximize utility. I will try to outline an explanation here: evolutionary theory and evolutionary psychology indicate that our powers of reasoning might represent the best possible trade-off between all relevant costs and benefits.


From an evolutionary perspective the animal rationale is an improbable organism, since, technically, it would require a general utility-evaluating and utility-maximizing brain module to evolve and to prevail over the other modules (when in conflict). The human brain evolved as a modular, domain-specific and multi-functional system (each module has its special function – visual perception, auditory perception, speech, face recognition, planning, emotional reactivity, etc.). Pinker (2009) notes that without preprogrammed, domain-specific modules, perception or speech would be computationally infeasible (a general all-purpose neural-network computer is not able to perform these tasks). The human mind must be pre-equipped with several different formats of representation and processing to interpret reality effectively.
For a detailed account of various brain modules see Gazzaniga et al., 2002; Gazzaniga, 1992; Koukolík, 2002, 2006. For a functional, cognitivist analysis of individual modules see Pinker, 2009. Following Chomsky’s postulation of the LAD (language acquisition device), Pinker (1994) and Jackendoff (1995) wrote about the language module. Leslie (1992) assumes that the human brain has a genetically preprogrammed “theory of mind module”. There is no evidence today for a general utility-evaluating and utility-maximizing brain module.

6. Theory of games

6.1 How to play the game

6.11 Beginnings and applications


The Prisoner’s Dilemma game was introduced by M. Flood and M. Dresher around 1950 and formalized by A. W. Tucker shortly thereafter. One of the first important accounts of this game is Games and Decisions by Luce & Raiffa (1957). The Prisoner’s Dilemma (sometimes called a mixed-motive game) has earned its status as a “classic” game, and it serves as a tool for modeling cooperative action, conflicts, bargaining, and decision making not only in psychology, but also in various other disciplines, such as economics, political science, biology, and ethics (a few examples are summarized below).
(1) Economics: e.g. for modeling oligopolistic competition vs. collusion (see Samuelson & Nordhaus, 1991, pp. 568-571, 607-609, 629-633).
(2) Political science: e.g. strategic arms control, deterrence, the role of possible retaliation in the prevention of armed conflict, or elimination of the advantage of a first attack. In game-theoretic terms it is necessary to eliminate the possibility of achieving the T outcome and/or of inflicting the S outcome on the opponent, and to leave the parties with a choice between the P and (preferably) R outcomes; see Pilisuk & Rapoport, 1964; Schelling 1960a, 1960b/2005, 1962, 2009 (the meaning of T, S, P, R is explained below).
Strategic games are used by political scientists to model political conflicts. Pilisuk & Rapoport (1964) designed a series of strategic games of intermediate realism: on the one hand, although more realistic than the Prisoner’s Dilemma, they still contain the basic C or D choice (see below) and the dilemma inherent in the Prisoner’s Dilemma (which in their games is repeated under various circumstances); on the other hand, they simulate conditions and actions such as the possibility of winning by reallocating military resources to economic ends, inspection (random or player-induced) and the option to withdraw at inspection showdowns (at a cost), negotiation, the waging of limited war, concession (ceding units to the opponent), or asymmetries (different rewards and different starting conditions for each player). Their games are “simple” enough to allow clear and exact analysis and replication. Another early “realistic” international game was designed by Guetzkow, 1957.
(4) Evolutionary theory and biology (Trivers, 1971; Maynard Smith, 1974; Axelrod & Hamilton, 1981; Dawkins, 2006b, pp. 202-233).
(5) Ethics: breaking promises and cheating, lying as destruction of trust, coercive threats to enforce cooperation, the role of conscience as an intrinsic coercive rule, reciprocal and unilateral credible and binding restrictions, the incompatibility of individual rationality and social good; see for example Schelling, 1968.


6.12 A 2 x 2 game

The Prisoner’s Dilemma is basically a game of choosing whether to cooperate (C) or defect (D). The outcome of the decision of player A (whether to cooperate or defect) is determined by the decision of player B (whether to cooperate or defect), and vice versa. The decision of the other player is not known beforehand. The possible outcomes are as follows: when both players cooperate, both get R (the reward for mutual cooperation); when both players defect, both get P (the punishment for mutual defection); when player A cooperates and player B defects, player A gets S (the sucker’s payoff), while player B gets T (the temptation to defect). The preference ranking of the four payoffs from highest to lowest is T > R > P > S. It is also assumed that 2R > (T + S), i.e. that the reward for mutual cooperation is greater than the average of the temptation and the sucker’s payoff (so that the two players cannot gain more by taking turns exploiting each other – alternating cooperation and defection – than by mutual cooperation), see Axelrod 2006, pp. 8-10.
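To make the payoff conditions concrete, here is a minimal sketch in Python (the function name is mine, not part of any cited source) that checks whether a candidate payoff vector constitutes a proper Prisoner’s Dilemma:

def is_valid_pd(T, R, P, S):
    """Check the two Prisoner's Dilemma payoff conditions:
    T > R > P > S (preference ranking) and 2R > T + S
    (mutual cooperation beats alternating exploitation)."""
    return (T > R > P > S) and (2 * R > T + S)

# The payoffs used throughout this section satisfy both conditions:
print(is_valid_pd(T=5, R=3, P=1, S=0))   # True
# A payoff vector that violates 2R > T + S is not a proper Prisoner's Dilemma:
print(is_valid_pd(T=7, R=3, P=1, S=0))   # False (2R = 6 < T + S = 7)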




6.13 Question of trust

Deutsch (1958) perceives playing C as an expression of trust and playing D as an expression of suspicion/distrust towards the opponent. His interpretation, however, is not exhaustive: D can also be interpreted as an expression of competitiveness, as an attempt to exploit the opponent, or as hedging against being exploited. In this context we can nevertheless remind ourselves that Coleman (1994) sees trust as one of the foundations of interpersonal relations.


“The potential trustor must decide between not placing trust, in which case there is no change in utility, and placing trust, in which case the expected utility relative to his current status is the potential gain times the chance of gain minus the potential loss times the chance of loss. A rational actor will place trust if […] the ratio of chance of gain to the chance of loss is greater than the ratio of the amount of the potential loss to the amount of the potential gain” (Coleman, 1994, p. 99).
According to Coleman, placing trust follows utility-maximizing rationality, but the calculation that Coleman assumes is not always possible, as will be demonstrated in this section on strategic games.
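As an illustration of Coleman’s decision rule (a rough Python sketch in the quotation’s own terms – chance of gain p, potential gain G, potential loss L – with hypothetical function and variable names of my own):

def place_trust(p_gain, gain, loss):
    """Coleman-style rule: place trust iff the expected change in utility
    p*G - (1 - p)*L is positive, i.e. iff p/(1 - p) > L/G."""
    expected_utility = p_gain * gain - (1 - p_gain) * loss
    return expected_utility > 0

print(place_trust(p_gain=0.8, gain=10, loss=30))  # True:  0.8*10 = 8 > 0.2*30 = 6
print(place_trust(p_gain=0.5, gain=10, loss=30))  # False: 0.5*10 = 5 < 0.5*30 = 15

The point made above still applies: in strategic games the inputs p, G, and L are often not available, so this calculation cannot always be carried out.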


6.14 Strictly dominant strategy

Table 6.1: An example of Prisoner’s Dilemma payoff matrix

                                  Player B
                        Cooperate            Defect
Player A   Cooperate    R = 3, R = 3         S = 0, T = 5
           Defect       T = 5, S = 0         P = 1, P = 1

(Upper-left cell: reward for mutual cooperation for both players. Upper-right cell: sucker’s payoff for player A, temptation to defect for player B. Lower-left cell: temptation to defect for player A, sucker’s payoff for player B. Lower-right cell: punishment for mutual defection for both players.)
(Table 6.1 is adapted from Axelrod, 2006, p. 8. The payoffs to player A are listed first. The values are small integers that satisfy the payoff conditions defined above. The payoffs can be understood as points gained or dollars received from a bank.)


Now let’s consider how a reasonable utility-maximizer would act. Player A knows that Player B can either cooperate (1) or defect (2).
(1) Player A might assume that Player B will cooperate, so he (Player A) can get either R = 3 (for his cooperation) or T = 5 (for defection). Ergo, a utility-maximizing Player A will defect.
(2) Player A might assume that Player B will defect, so he (Player A) can get either S = 0 (for his cooperation) or P = 1 (for defection). Ergo, a utility-maximizing Player A will defect.
In short, defection in the single-move Prisoner’s Dilemma is strictly dominant. This reasoning holds for both utility-maximizing players, so it can be shown that both players will defect and gain only P each, instead of R each. Mutual defection is the most reasonable thing to do from each player’s point of view, although the average payoff is the worst possible (2R > (T + S) > 2P). Individual utility-maximizing leads to the lowest possible utility gained by both. Axelrod & Hamilton (1981) and Axelrod (2006) also show that defection in a single encounter is the solution that evolution would select (fitness in the quotation ought to be understood as differential replication of competing genes):
“If the payoffs are in terms of fitness, and the interactions between pairs of individuals are random and not repeated, then any population with a mixture of heritable strategies evolves to a state where all individuals are defectors” (Axelrod, 2006, p. 92).
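The dominance argument in (1) and (2) above can also be expressed as a brute-force check over the payoff matrix of Table 6.1. The following Python sketch (with data-structure and function names of my own) simply verifies that D yields a strictly higher payoff than C against every possible move of the opponent:

# Payoffs to player A, indexed by (A's move, B's move); values from Table 6.1.
PAYOFF_A = {
    ("C", "C"): 3,  # R
    ("C", "D"): 0,  # S
    ("D", "C"): 5,  # T
    ("D", "D"): 1,  # P
}

def strictly_dominates(move, other):
    """True if `move` yields a strictly higher payoff than `other`
    against every possible move of the opponent."""
    return all(PAYOFF_A[(move, b)] > PAYOFF_A[(other, b)] for b in ("C", "D"))

print(strictly_dominates("D", "C"))  # True: 5 > 3 and 1 > 0, so defection is strictly dominant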


6.15 Superrationality and conscience

Some theorists, however, came up with the idea that conscience (Rapoport, 1964) or superrationality (Hofstadter, 1985) can lead players to play C even in the single-move Prisoner’s Dilemma. If one superrational player knows (or supposes) that he is playing with another superrational player, he will circumvent the “always defect” rational strategy and go for the mutually (and in the end also individually) more rewarding “superrational” cooperative strategy, because he knows the other superrational player would do the same. Behaving according to one’s own conscience towards other people with a conscience works similarly (having a conscience is displayed by playing C). Schelling (1968, p. 37) remarked that the role of conscience in Rapoport’s account – “conscience may be socially superior to individual rationality” – is that of a useful constraint. Conscience and superrationality are subjective phenomena, but their application by one subjective agent eventually requires a manifestation of identical (or analogous – e.g. a conscientious player can successfully play a cooperative game with a superrational player) subjective phenomena in the behavior of the other player.



6.16 Matching heuristic and control heuristic

According to Morris et al. (1998), cooperative behavior in the single-move Prisoner’s Dilemma might stem from the so-called “matching heuristic” – that is, you act as if you believe your opponent is going to cooperate and you “reciprocate” this cooperation. Following the matching heuristic means that, even in single-shot games, if you expect cooperation from your opponent, you cooperate; if you expect defection, you defect; and if you are uncertain, you cooperate “in good faith” that your opponent is going to cooperate. Morris et al. (1998) found that players adhered more strongly to the matching heuristic when their opponents were presented to them as similar to themselves.


Shafir & Tversky (1992) discovered, and Morris et al. (1998) further examined, another heuristic applied by players in the single-shot Prisoner’s Dilemma. This heuristic was called the “control heuristic” – players defected if they expected that their opponent would either cooperate or defect, but they cooperated if they were uncertain about the likelihood of their opponent’s cooperation. Since the opponent can in any case only either cooperate or defect, players’ cooperative choices in the uncertainty condition reflect an “illusion of control” or “quasi-magical” thinking (as Shafir & Tversky, 1992 put it). Morris et al. (1998) found that players adhered more strongly to the control heuristic when the opponent’s move was presented to them as an “open fate” future event than when it was a “sealed fate” past event.
Cooperation in single-move Prisoner’s Dilemma might be partially explained by various other reasons, such as inattention of decision makers (Binmore, 1999), or it can be viewed as a spillover from experience in preceding iterated Prisoner’s Dilemmas that could have strengthened cooperativeness in some players (Chater et al., 2008).
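To make the difference between the two heuristics concrete, the following Python sketch expresses each as a decision rule over the player’s expectation about the opponent; this is my own simplified rendering of the behavioral findings, not an algorithm given by Shafir & Tversky or Morris et al.:

def matching_choice(expectation):
    """Matching heuristic: mirror the expected move of the opponent;
    when uncertain, cooperate 'in good faith'."""
    if expectation == "cooperate":
        return "C"
    if expectation == "defect":
        return "D"
    return "C"          # uncertain -> cooperate

def control_choice(expectation):
    """Control heuristic: defect whenever the opponent's move seems settled
    either way; cooperate only under uncertainty (the 'illusion of control')."""
    if expectation in ("cooperate", "defect"):
        return "D"
    return "C"          # uncertain -> cooperate

for e in ("cooperate", "defect", "uncertain"):
    print(e, matching_choice(e), control_choice(e))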


6.17 Iterated Prisoner’s Dilemma

Now, what about the iterated Prisoner’s Dilemma? Luce & Raiffa (1957, pp. 94-102) showed that in an iterated Prisoner’s Dilemma of any finite length (that is, in any given finite sequence of moves) a rational player should still always defect: he cannot expect cooperation from the other player on the last move (the last move is basically the same as a single-move Prisoner’s Dilemma), and since he cannot be rewarded by the other player’s cooperation on the last move, he has no incentive to cooperate on the next-to-last move (which thus becomes the same as a single-move Prisoner’s Dilemma), and so on all the way back to the very first move. This is called the multistage Prisoner’s Dilemma paradox, and it arises (as we saw) from backward induction from a known terminating point (Rapoport, 1967a, p. 143). In Luce and Raiffa’s “classical” account of the iterated Prisoner’s Dilemma there is no “shadow of the future”, that is, no expectation of possible cooperation by the other player in the next move, and hence no incentive to maintain a cooperative strategy. Although theoretically sound, this depiction represents a failure similar to Samuelson’s failure to aggregate sequential bets (Kahneman & Lovallo, 1993; Rabin, 2000; see section 4). On the other hand, Silverstein et al. (1998) point out that because reinforcement of defection is immediate, while reinforcement of cooperation is both probabilistic and somewhat delayed (and hence discounted by subjects – see for example also Wilson, 1986; Rainer & Rachlin, 1993; see section 6.19 on the discount parameter, too), defection should be more strongly reinforced in subjects.
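Luce and Raiffa’s backward-induction argument can be illustrated with a short recursion (a toy Python sketch, not their formal proof): the last round reduces to a single-move Prisoner’s Dilemma, and each earlier round then collapses to the same case:

def rational_move(rounds_left):
    """Backward induction for a Prisoner's Dilemma of known finite length.
    On the last round defection is strictly dominant; since cooperation on
    later rounds cannot be secured, every earlier round collapses to the
    same single-move game."""
    if rounds_left == 1:
        return "D"                       # last move: single-shot dilemma
    # The current round is strategically identical to the one after it.
    return rational_move(rounds_left - 1)

print([rational_move(n) for n in (1, 2, 10)])   # ['D', 'D', 'D']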


Some examples of actual empirical findings about the iterated Prisoner’s Dilemma: ten pairs of subjects in Rapoport et al., 1976, played a 300-trial Prisoner’s Dilemma; the average rate of cooperation was 70%. In a study by Monterosso et al., 2002, the median rate of cooperation among 45 pairs of subjects was 65.0%, with a cooperation rate of 46.8% at the lower quartile and 76.3% at the upper quartile (the iterated games in their study were very long, with a median duration of 1807 trials). In our study we obtained similar rates of cooperation in the 30-trial iterated games. However, some researchers obtained much lower mean cooperation rates; for example, in a study by Jones et al. (1968) the rates were about 30% across various pay-off conditions.
So, players in reality do not behave as Luce and Raiffa presumed, although they (often) tend to increase the rate of defection near the end of the game when the length of the game is known (see Dawkins, 2006a). Axelrod (1981) successfully eliminated end-game effects by determining the length of the games probabilistically in his second computer tournament – there was a 0.00346 chance of the game ending with each given move, instead of a given number of moves.
Rapoport & Dale (1966), using data from Rapoport & Chammah (1965a), confirmed the existence of an end effect, that is, a decline in the frequency of cooperation towards the end of an iterated Prisoner’s Dilemma game or of a definite sequence of it. The development of a typical 25-run game (averaged across 649 pairs of players in Rapoport & Chammah’s (1965a) study) was summarized by Rapoport & Dale (1966, p. 364): “There is a short initial decline following the start effect. Then the mean frequency of C remains practically constant until about four plays from the end, at which time the end effect sets in, presumably as the players anticipate the final defection.”
An end effect was also found, for example, by McClintock et al., 1963; Rapoport et al., 1976; Andreoni & Miller, 1993; Hauk & Nagel, 2001; Hauk, 2003; Bó, 2005; Bereby-Meyer & Roth, 2006. The end effect has been called into question by some sporadic empirical results (Morehouse, 1966). Also, Oskamp (1974) found that after a gradual initial increase the cooperation of subjects (in a 50-run game) was stable (i.e. there was no significant end effect). In an experiment by Jones et al. (1968) cooperation in an iterated game (150 trials) decreased linearly as a function of time (rather than abruptly towards the end of the game). Scodel et al. (1959) and Minas et al. (1960) found an overall decrease of cooperation over the course of an iterated game, rather than a pronounced “end effect”.
There is an opposite start effect – an increased likelihood of cooperation at the beginning of an iterated Prisoner’s Dilemma game. One of the reasons for relatively high initial cooperation may be the player’s tendency to build a reputation as an altruistic/cooperative player, which can enhance his gains (as well as the mutual gains of both players) during future interaction (Kreps et al., 1982; Kreps & Wilson, 1982; Milgrom & Roberts, 1982; Andreoni & Miller, 1993).
Aumann (1959) showed that in indefinitely long games players should adopt a cooperative strategy. Luce and Raiffa thus identified the conditions generally most prohibitive for sustaining cooperative behavior (the single-move Prisoner’s Dilemma and the Prisoner’s Dilemma of known finite length), and Aumann identified the general condition that best sustains cooperation (infinite games). Somewhere in between lie, in this respect, games of unknown finite length and games of random duration.

6.18 Teach and be taught

Rapoport (1967a) suggested a complex general rule for determining the optimal policy for the iterated Prisoner’s Dilemma (Harris, 1969, offers a further refinement of Rapoport’s approach). Rapoport used dynamic programming to derive the optimal policy, modeling the sequential joint decisions of the players as a Markovian decision process. The basic logic of his solution is quite simple (and actually constitutes something of a general formulation of Rapoport’s (1964) and Hofstadter’s (1985) arguments – see 6.15).


The assumption is that in the iterated Prisoner’s Dilemma game player A can also consider information other than that given by the payoff matrix (if he considers only this information and maximizes his immediate reward, he will always play D). Player B’s decision on trial t − u (u = 1, 2, …, t − 1) – let’s say, to cooperate – can transmit information to player A (let’s say, about the willingness of player B to cooperate in the future). This transmission of information means that what happens at t − u can influence what happens at t: the players can teach, be taught, build mutual trust, collude, or come to distrust each other. “Furthermore, we suspect that the suggested way of perceiving the game may not be far away from the player’s actual perception of it” (Rapoport, 1967a, p. 137).
Pilisuk et al. (1965, p. 492) had already assumed there was “a system of gestures and responses in the game itself which so teaches the lesson of cooperation.” Pilisuk et al. (1965, p. 506) present a revealing statement by one of their subjects: “At first I thought he [the other player] was stupid letting me win like that. But after a while I saw that he was trying to get me to turn over more factories [to play C] so we could both win.” Hence, some dynamics will develop in the process of iterated play (no decision being strictly dominant) and the decisions will change according to each player’s expectations (subjective probabilities) about the behavior of the other player and about the effect of his own decisions on the behavior of the other player (etc.); these expectations at trial t will be (at least partly) based upon what happened at t − u. A player should, in this model, follow his optimal policy (gain-maximizing relative to his expectations), unless the expectations change – for example, if player B starts exploiting player A’s cooperative behavior (a simple sketch of such expectation updating is given below).
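A minimal illustration of this teaching/learning dynamic (a toy Python sketch with a learning rule of my own choosing, not Rapoport’s dynamic-programming solution): player A keeps a running estimate of the probability that B will cooperate, updates it after each of B’s moves, and cooperates only while the estimate stays above a threshold:

def play_with_learning(b_moves, prior=0.5, learning_rate=0.3, threshold=0.4):
    """Player A's choices against a fixed sequence of B's moves.
    A's expectation that B cooperates is nudged towards 1 after each C
    by B and towards 0 after each D; A cooperates while the expectation
    exceeds the threshold."""
    expectation = prior
    a_moves = []
    for b in b_moves:
        a_moves.append("C" if expectation > threshold else "D")
        observed = 1.0 if b == "C" else 0.0
        expectation += learning_rate * (observed - expectation)
    return a_moves

# B cooperates for a while, then starts exploiting A's cooperation;
# after enough defections A's expectation drops and A switches to D:
print(play_with_learning(list("CCCCDDDD")))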


6.19 Shadow of the future

Luce & Raiffa (1957) showed that, theoretically, an iterated Prisoner’s Dilemma of known finite length leads to mutual defection, although in reality mutual defection sets in only near the end of the game. Before the end of the game (not near the end) players behave as if there were a certain probability of another encounter with the other player on the next move (and this probability drops rapidly near the end of the game; if the length of the game is not announced, there is of course no such drop in the probability of another encounter). If we do not know the length of the game (and for the most part, as shown by Axelrod 1980a, 1980b, even if we do), there is always a certain probability (w) of another encounter with the other player, which influences, as we shall see, the decision making. W is also called the discount parameter. It reflects the tendency to value a certain good obtained now more than the same good obtained in the future; for our purposes, though, we can think of w simply as the probability of another encounter with the other player. The discount parameter was, as far as I know, introduced into the study of experimental games by Shubik (1963).


We will analyze here a sequence of moves in the iterated Prisoner’s Dilemma (and the special case of the single-move Prisoner’s Dilemma) using the parameter w (the discount parameter). The result we want to demonstrate is that if the shadow of the future (reflected by the discount parameter w) looms small, as in the case of the single-move Prisoner’s Dilemma, we should expect not cooperation but defection. A higher rate of cooperation when the parameter w was increased was found, for example, in experiments by Roth & Murnighan, 1978; Bó, 2005.
The parameter w reflects the probability of meeting a given player again (in the single-move Prisoner’s Dilemma w = 0, for a two-round Prisoner’s Dilemma w = 0.5, for a 200-trial iterated game w = 0.99654). The lower the probability w of meeting a given opponent again, the less value is attributed to subsequent moves and the less incentive there is to cooperate (for w = 0 the strategy should be to always defect; for a mathematical explanation see the following paragraphs).
The diminishing cumulative payoff (Cp) for mutual cooperation of both players over subsequent moves can be expressed as follows (see Axelrod 2006, p. 207):
(*) Cp = R + wR + w²R + w³R + … = R/(1 − w)
(Where R is the reward for mutual cooperation.)
When w is relatively high (relative to the payoff parameters), it pays to cooperate (or, to be more exact, to follow a nice, retaliatory, but forgiving strategy such as Tit for Tat). If, for example, the payoffs are T = 5, R = 3, P = 1, S = 0, and w = 0.9, we can plug the respective values into equation (*) to calculate the cumulative payoff for mutual cooperation:
Cp = R/(1 − w) = 3/(1 − 0.9) = 30
We can compare Cp with the cumulative payoff of the defecting strategy (Dp). If you follow the defecting strategy, you get T on the first move and then P on the subsequent moves, since we can suppose that the opponent retaliates:
(**) Dp = T + wP + w²P + w³P + … = T + wP·(1 + w + w² + …) = T + wP/(1 − w) = 5 + 0.9·1/0.1 = 14
As you can see, in this case (relatively high w = 0.9) Cp = 30 is bigger than Dp = 14. It can be demonstrated that the critical value of w that makes Tit for Tat (that is, a nice, retaliatory, and forgiving strategy) a collectively stable strategy is:
(***) w ≥ (T - R) / (T - P)
(Note: a collectively stable strategy is a strategy that cannot be invaded by defectors – or, if it is invaded, the defectors get lower payoffs than the original strategy (here Tit for Tat); see Axelrod, 2006, p. 218. Suppose player A follows strategy s. Player B cannot do better than use the same strategy s if and only if strategy s is collectively stable. To be more exact, a collectively stable strategy in the strict sense can be invaded by neither defectors nor cooperators, and, as Selten & Hammerstein (1984) demonstrated, a population of Tit for Tats can be invaded by ALL C (or other nice strategies, as we might add), even if it cannot be invaded by defectors; see also Boyd & Lorberbaum, 1987.)
With payoffs T = 5, R = 3, P = 1, w must be at least 0.5 if you are to expect cooperation to appear and endure throughout the game (a two-round game in this case). For the single-move game you should expect defection for any payoff parameters that are allowed in the Prisoner’s Dilemma (T > R > P > S and 2R > (T + S)). If you plug the values into equations (*) and (**), you can see that if w = 0.5, then Cp = 3/0.5 = 6 and Dp = 5 + 0.5/0.5 = 6. If w = 0.1, then Cp = 3/0.9 = 3.333 and Dp = 5 + 0.1/0.9 = 5.111. If w is low, mutual defection is almost inevitable.
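The calculations in equations (*), (**), and (***) can be reproduced with a few lines of Python (a sketch using the payoffs of Table 6.1):

T, R, P, S = 5, 3, 1, 0   # payoffs from Table 6.1

def coop_payoff(w):
    # (*)  Cp: cumulative payoff for sustained mutual cooperation
    return R / (1 - w)

def defect_payoff(w):
    # (**) Dp: defect against a retaliating opponent -- T once, then discounted P forever
    return T + w * P / (1 - w)

critical_w = (T - R) / (T - P)   # (***) collective-stability threshold; 0.5 for these payoffs
print("critical w:", critical_w)

for w in (0.9, 0.5, 0.1):
    print(w, round(coop_payoff(w), 2), round(defect_payoff(w), 2))
# w = 0.9: Cp = 30.0, Dp = 14.0  -> cooperation pays
# w = 0.5: Cp = 6.0,  Dp = 6.0   -> break-even at the critical value
# w = 0.1: Cp = 3.33, Dp = 5.11  -> defection pays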
Given that w is not known with certainty (which is the case in real life, where businesses go bankrupt and people die or move away), trying to calculate a strategy becomes mainly a problem of decision under uncertainty: if you knew w and the payoffs, and could hence calculate the optimal strategy for both players, there would be no dilemma (between cooperation and defection), but a single best strategy to employ. A short “shadow of the future” (low w) might also explain the “banker’s paradox” (see Tooby & Cosmides, 1996): the more desperately a person needs help, the less likely she is to receive it, because the worse/more desperate her condition, the smaller the probability that she will be able to reciprocate.


