The Prisoners’ Dilemma

Adapted from BBC









If there is one thing that every social scientist should know, it is the prisoners’ dilemma game (PD).

The PD describes a particular kind of social interaction where, when everyone acts in his or her own self-interest, the outcome is collectively self-defeating. It is collectively self-defeating in the sense that a better outcome for everyone was available and could have been achieved (but only if each had selected an action that was, paradoxically, not in his or her own self-interest). The PD underpins a range of important problems and debates in political economy. From the problems of political order, climate change, collusion in markets and sticky prices to the debate over the limits to individual freedom, if you understand the PD, you understand the problem and the debate.

The name, ‘a prisoners’ dilemma’, comes from an illustration of the interaction developed by Albert Tucker in 1950 (see Poundstone, 1993). It goes like this. Two burglars are arrested by the police and held in separate cells. The police do not have sufficient evidence to convict them with certainty, so the District Attorney (Tucker was in California) engages in a bit of plea bargaining. The DA tells each that if they both confess, then they will each serve 2 years in prison. If one confesses and the other does not, then the person confessing will be set free and the person who is convicted on the basis of the other person’s confession will receive 3 years in prison. If neither confesses, then the DA acknowledges that he or she will probably secure a conviction on a lesser charge with the result that each spends 1 year in prison. 
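
The DA's offer can be written down as a small table of jail terms. The sketch below is only an illustration: the names `YEARS`, `best_reply_for_a` and `is_equilibrium` are hypothetical and do not come from Tucker or the article. It assumes, as the story does, that each prisoner cares only about his or her own sentence, and it looks up A's best reply to each choice B might make.

```python
# Jail terms in years, taken from the DA's offer:
# (A's choice, B's choice) -> (A's years, B's years).
CONFESS, DONT = "Confess", "Don't Confess"
YEARS = {
    (CONFESS, CONFESS): (2, 2),
    (CONFESS, DONT): (0, 3),
    (DONT, CONFESS): (3, 0),
    (DONT, DONT): (1, 1),
}

def best_reply_for_a(b_choice):
    """Return the choice that minimises A's own jail time, given B's choice."""
    return min((CONFESS, DONT), key=lambda a: YEARS[(a, b_choice)][0])

def is_equilibrium(a_choice, b_choice):
    """True if neither prisoner can cut his or her own sentence by switching alone."""
    a_ok = best_reply_for_a(b_choice) == a_choice
    # The table is symmetric, so B's best reply mirrors A's.
    b_ok = best_reply_for_a(a_choice) == b_choice
    return a_ok and b_ok

for b in (CONFESS, DONT):
    print(f"If B plays {b!r}, A's best reply is {best_reply_for_a(b)!r}")
print("Mutual Confess is an equilibrium:", is_equilibrium(CONFESS, CONFESS))
print("Mutual Don't Confess is an equilibrium:", is_equilibrium(DONT, DONT))
```

Under these numbers the best reply is ‘Confess’ in both cases, so mutual ‘Confess’ passes the equilibrium check and mutual ‘Don’t Confess’ does not.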

If each cares only about how much time he or she may spend in jail, then the rational course of action for each is to ‘Confess’, with the result that they each spend 2 years in prison. But here is the dilemma: each would actually be better off if they both chose ‘Don’t Confess’ (as this yields 1 year each in prison). Is there something wrong, then, with the reasoning that leads them to ‘Confess’? No. The problem is the nature of the interaction: mutual ‘Don’t Confess’ is better, but it is not an equilibrium when each prisoner is concerned only with minimizing his or her jail time. To see this, suppose A thinks B has chosen ‘Don’t Confess’; then A’s best course of action is not ‘Don’t Confess’ (yielding 1 year) but ‘Confess’, because this yields no jail time. The lack of communication between the prisoners is not important for this insight. If they could talk and both agreed not to confess, the same problem would arise, only now over whether to keep to this agreement. Each would ‘Confess’ because this is the best course of action whether the other person chooses ‘Confess’ (2 years rather than 3) or ‘Don’t Confess’ (no jail time rather than 1 year). That is, to use some common terminology, each would ‘defect’ on the agreement rather than ‘cooperate’ by abiding by it.

Hobbes (1651) famously considered essentially the same interaction, but among many more than two people, in his analysis of the contractual origins of the State. This many-person version is now known as a Public Goods game or, in some versions, a Common Pool Resource game. In Hobbes, people in the state of nature have a choice between a) arming (to protect themselves) and b) disarming (to avoid the expense of arming, but potentially exposing themselves to the predations of others who have armed). When everyone arms, the outcome is mutual terror and life is ‘nasty, brutish and short’. It would be much better if everyone disarmed. The difficulty is that mutual disarmament is not a stable outcome (an equilibrium): when everyone else disarms, the best course of action for any individual is to arm, because that person can then dominate all others. For Hobbes, this meant that individuals should agree to give up the freedom to arm by, as it were, contracting to create a State with a monopoly on the use of force. Freedom and efficiency work against each other in these types of interactions and Hobbes opted, in effect, for efficiency.

For ‘disarm’ and ‘arm’ substitute ‘reduce greenhouse gas emissions’ and ‘business as usual’ and the same logic captures the essence of the problem of combatting climate change. Each country would benefit from an outcome where everyone reduces greenhouse gas emissions but each country has an incentive to ‘free ride’ on every other country’s efforts in that direction (and so no one decides to reduce greenhouse gas emissions and the climate hots up).
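
The free-rider logic can be illustrated with a linear public goods game, a standard textbook formulation that is not taken from the article. Everything numerical here is an assumption chosen for illustration: four players, an endowment of 10, and a multiplier of 1.6 on the common pot. ‘Contributing’ stands in for reducing emissions (or disarming) and ‘free riding’ for business as usual (or arming); the function name `payoff` is hypothetical.

```python
def payoff(contributes, others_contributing, n, endowment=10.0, multiplier=1.6):
    """Payoff to one player in a linear public goods game (illustrative parameters).

    Contributors put their whole endowment into a common pot; the pot is
    multiplied and shared equally among all n players, contributors or not.
    """
    contributors = others_contributing + (1 if contributes else 0)
    share = multiplier * endowment * contributors / n
    kept = 0.0 if contributes else endowment
    return kept + share

n = 4
for others in range(n):
    free_ride = payoff(False, others, n)
    contribute = payoff(True, others, n)
    print(f"{others} others contribute: free ride {free_ride:.1f}, contribute {contribute:.1f}")

print("Payoff to each when everyone contributes:", payoff(True, n - 1, n))
print("Payoff to each when no one contributes:", payoff(False, 0, n))
```

With these assumed parameters, free riding pays more than contributing whatever the others do, yet everyone is better off when all contribute than when none do. The same pattern holds whenever the multiplier is greater than 1 but smaller than the number of players.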

The examples quickly multiply. Most types of pollution (and forms of agreement) fit this pattern. The choice between ‘don’t litter’ and ‘litter’ is instructive in this respect. The Hobbesian solution is to take away the right to litter by making littering illegal. This is what we have done, but interestingly it does not work 100 percent: some people still litter. In one sense this is not surprising. If enforcement of the law is sufficiently imperfect, then the prisoners’ dilemma resurfaces over the decision of whether to obey the law. It is a version of the problem that the prisoners encounter when they can communicate. They would quickly see and agree that each should choose ‘Don’t Confess’ but, with no mechanism to enforce the agreement, each would decide to break it and ‘Confess’. The surprise, from this perspective, is not that some people litter but that some do not. It seems that their decisions must be made in a different way. They must value ‘not littering’ in some additional way, in its own right, just as there is ‘honour among thieves’, which also explains why thieves do not always ‘Confess’. We capture this additional motivation analytically by positing that some people have a ‘social preference’ for ‘not littering’ or for abiding by agreements. This is what experiments also show (see Camerer, 2003).

There have been many experiments in the laboratory where people (mainly students) play this game. As a result we know a lot about when people generically ‘cooperate’ (that is, disarm, don’t confess, don’t litter, keep an agreement, etc.) as opposed to generically ‘defect’ (that is, arm, confess, litter, renege on an agreement, etc.). ‘Cooperation’ is much more likely if the interaction is going to be repeated. It is also more likely in one-shot games when individuals expect others to ‘cooperate’ (see Gachter, 2007, on these regularities). For both reasons, people who belong to the same group tend to ‘cooperate’ more among themselves than with those outside their group (see Corr et al, 2015). In addition, there is a hard core (between 10 and 30 percent of the population) who are not guided by these types of considerations: they either always ‘defect’ or always ‘cooperate’ (see Gachter, 2007). Their behaviour is unconditional, and there are typically more ‘defectors’ than ‘cooperators’ of this unconditional kind. We also, interestingly, often find that students studying economics (ceteris paribus, of course) are more likely to defect than students from other disciplines (see Frank and Gilovich, 1993).
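
One way to see why repetition and expectations matter is to replay the two-prisoner game over several rounds. The sketch below is an assumption-laden illustration, not a model of the experiments cited: it reuses the jail terms from the DA's offer, fixes an arbitrary 10 rounds, and uses ‘tit for tat’ (cooperate first, then copy the other player's last move) as a stand-in for a conditional cooperator. Tit for tat is a well-known strategy from the wider literature rather than something the article discusses, and all function names are hypothetical.

```python
# (my move, other's move) -> my jail years in one round; "C" = cooperate, "D" = defect.
YEARS = {("C", "C"): 1, ("C", "D"): 3, ("D", "C"): 0, ("D", "D"): 2}

def always_defect(opponent_history):
    """Unconditional defector: ignores what the other player has done."""
    return "D"

def always_cooperate(opponent_history):
    """Unconditional cooperator: ignores what the other player has done."""
    return "C"

def tit_for_tat(opponent_history):
    """Conditional cooperator: cooperate first, then copy the other's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def play(strategy_a, strategy_b, rounds=10):
    """Return total jail years (A, B) over repeated play of the one-round game."""
    seen_by_a, seen_by_b = [], []  # each player's record of the other's past moves
    total_a = total_b = 0
    for _ in range(rounds):
        a, b = strategy_a(seen_by_a), strategy_b(seen_by_b)
        total_a += YEARS[(a, b)]
        total_b += YEARS[(b, a)]
        seen_by_a.append(b)
        seen_by_b.append(a)
    return total_a, total_b

print("Two conditional cooperators:", play(tit_for_tat, tit_for_tat))
print("Two unconditional defectors:", play(always_defect, always_defect))
print("Conditional cooperator vs defector:", play(tit_for_tat, always_defect))
print("Unconditional cooperator vs defector:", play(always_cooperate, always_defect))
```

In this sketch two conditional cooperators sustain cooperation throughout, while a conditional cooperator who meets an unconditional defector is exploited only in the first round. That is broadly in the spirit of the finding that cooperation depends on what people expect others to do.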



is Professor of Political Economy at King’s College London. He is the co-author with Yanis Varoufakis of the textbook ‘Game Theory: A Critical Introduction’, and has published widely on the topic of behavioural and experimental economics.

