Stationary Concepts for Experimental 2×2-Games

Selten, Reinhard; Chmura, Thorsten (2008) “Stationary Concepts for Experimental 2×2-Games.” American Economic Review, Volume 98, Number 3, June 2008, pp. 938-966(29). DOI: http://dx.doi.org/10.1257/aer.98.3.938; [PDF] provided by zju.edu.cn.

==notes by yinung==

Mixed equilibrium can be viewed as a dynamic equilibrium solution (a stationary concept).

Mixed equilibrium has several interpretations. One interpretation is that of a rational recommendation for a one-shot game. Another interpretation looks at mixed equilibrium as a result of evolutionary or learning processes in a situation of frequently repeated play with two populations of randomly matched opponents. One may speak of mixed equilibrium as a behavioral stationary concept.

Quantal response equilibrium (Richard D. McKelvey and Thomas R. Palfrey 1995)

assumes that players give quantal best responses to the behavior of the others (see Section IB). In the exponential form of quantal response equilibrium considered here, the probabilities are proportional to an exponential with the expected payoff times a parameter in the exponent.
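The exponential (logit) form described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the payoff-matrix layout (`A[i][j]` for player 1, `B[i][j]` for player 2, with row `i` chosen by player 1 and column `j` by player 2), the parameter name `lam`, and the bisection solver are all assumptions made for illustration.

```python
import math


def logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit_qre_2x2(A, B, lam, tol=1e-12):
    """Return (p, q): the probabilities of row 0 and column 0 in a logit QRE.

    With two strategies, "probability proportional to exp(lam * expected
    payoff)" reduces to a logistic function of the expected payoff difference.
    The layout of A and B and the name lam are illustrative assumptions.
    """
    def col_response(p):
        # Expected payoff of column 0 minus column 1 for player 2.
        diff = p * (B[0][0] - B[0][1]) + (1 - p) * (B[1][0] - B[1][1])
        return logistic(lam * diff)

    def row_response(q):
        # Expected payoff of row 0 minus row 1 for player 1.
        diff = q * (A[0][0] - A[1][0]) + (1 - q) * (A[0][1] - A[1][1])
        return logistic(lam * diff)

    # Bisection on f(p) = p - row_response(col_response(p)).
    # f(0) < 0 and f(1) > 0 always hold, so a fixed point lies in between.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid - row_response(col_response(mid)) < 0:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2
    return p, col_response(p)


# Hypothetical game with a unique mixed Nash equilibrium at p = 2/3, q = 1/4.
A = [[3, 0], [0, 1]]
B = [[0, 1], [2, 0]]
p_qre, q_qre = logit_qre_2x2(A, B, lam=50.0)
```

With `lam = 0` both players randomize uniformly; as `lam` grows, the fixed point in this hypothetical game moves toward the mixed Nash equilibrium.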

Action-sampling equilibrium

…in a stationary situation, a player takes a sample of seven observations of the strategies played on the other side, and then optimizes against this sample.
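A rough Python sketch of this idea follows. It computes the probability that a strategy is optimal against a random sample of seven opponent choices and then looks for a stationary point of the two players' responses. The function names, the payoff layout, the even split in case of ties, and the bisection solver are assumptions for illustration, not details confirmed by these notes.

```python
from math import comb


def action_sampling_response(opp_prob_first, own_payoffs, n=7):
    """Probability of choosing own action 0 after sampling n opponent choices.

    own_payoffs[a][b] is the own payoff for own action a against opponent
    action b. opp_prob_first is the probability that the opponent plays
    their action 0. Ties are split evenly (an assumption of this sketch);
    the exact comparison below assumes integer payoffs.
    """
    total = 0.0
    for k in range(n + 1):  # k = number of sampled opponent plays of action 0
        weight = comb(n, k) * opp_prob_first**k * (1 - opp_prob_first)**(n - k)
        pay0 = k * own_payoffs[0][0] + (n - k) * own_payoffs[0][1]
        pay1 = k * own_payoffs[1][0] + (n - k) * own_payoffs[1][1]
        if pay0 > pay1:
            total += weight
        elif pay0 == pay1:
            total += 0.5 * weight
    return total


def action_sampling_equilibrium(A, B, n=7, tol=1e-12):
    """Return (p, q) such that each side's mix reproduces itself via sampling.

    A[i][j], B[i][j]: payoffs of player 1 and player 2 when player 1 picks
    row i and player 2 picks column j (layout assumed for illustration).
    """
    # Re-index player 2's payoffs as [own action][opponent action].
    B_own = [[B[0][0], B[1][0]], [B[0][1], B[1][1]]]

    def composite(p):
        q = action_sampling_response(p, B_own, n)
        return action_sampling_response(q, A, n)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid - composite(mid) < 0:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2
    return p, action_sampling_response(p, B_own, n)


# Matching pennies as a simple check: the stationary point is (0.5, 0.5).
MP_A = [[1, -1], [-1, 1]]
MP_B = [[-1, 1], [1, -1]]
p_as, q_as = action_sampling_equilibrium(MP_A, MP_B)
```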

Payoff-sampling equilibrium (Osborne and Rubinstein 1998)

…envisions a stationary situation in which a player takes two samples of equal size, one for each of her pure strategies. She then compares the sum of her payoffs in the two samples and plays the strategy with the higher payoff sum…. The best fitting sample size turns out to be six for each of both samples. The name “payoff-sampling equilibrium” refers to the sampling of own payoffs for each pure strategy.
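The comparison of two payoff sums can be sketched in Python as follows. Each pure strategy gets its own sample of six opponent choices, the payoffs in each sample are summed, and the strategy with the higher sum is played. The function names, the payoff layout, the even split on ties, and the bisection solver are illustrative assumptions rather than details from the paper.

```python
from math import comb


def binom_pmf(k, n, prob):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * prob**k * (1 - prob)**(n - k)


def payoff_sampling_response(opp_prob_first, own_payoffs, n=6):
    """Probability of choosing own action 0 under payoff sampling.

    own_payoffs[a][b]: own payoff for own action a against opponent action b.
    One sample of size n is drawn per own action. Ties between the two payoff
    sums are split evenly (an assumption); integer payoffs are assumed so that
    the equality test is exact.
    """
    total = 0.0
    for k0 in range(n + 1):  # opponent plays of action 0 in the first sample
        sum0 = k0 * own_payoffs[0][0] + (n - k0) * own_payoffs[0][1]
        w0 = binom_pmf(k0, n, opp_prob_first)
        for k1 in range(n + 1):  # same count in the second sample
            sum1 = k1 * own_payoffs[1][0] + (n - k1) * own_payoffs[1][1]
            weight = w0 * binom_pmf(k1, n, opp_prob_first)
            if sum0 > sum1:
                total += weight
            elif sum0 == sum1:
                total += 0.5 * weight
    return total


def payoff_sampling_equilibrium(A, B, n=6, tol=1e-12):
    """Return (p, q), a stationary point of the two sampling responses."""
    # Re-index player 2's payoffs as [own action][opponent action].
    B_own = [[B[0][0], B[1][0]], [B[0][1], B[1][1]]]

    def composite(p):
        q = payoff_sampling_response(p, B_own, n)
        return payoff_sampling_response(q, A, n)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid - composite(mid) < 0:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2
    return p, payoff_sampling_response(p, B_own, n)


# Hypothetical game used only to exercise the functions.
A = [[3, 0], [0, 1]]
B = [[0, 1], [2, 0]]
p_ps, q_ps = payoff_sampling_equilibrium(A, B)
```

Unlike action sampling, the player here never needs to observe the opponent's choice directly; only her own realized payoffs enter the comparison.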

Impulse balance equilibrium

proposed by Selten is based on learning direction theory (Selten and Joachim Buchta 1999)… is applicable to the repeated choice of the same parameter in learning situations in which the decision maker receives feedback, not only about the payoff for the choice taken, but also for the payoffs connected to alternative actions.
…. The decision maker is assumed to have a tendency to move in the direction of the impulse….

… that impulse learning is very different from reinforcement learning.

In reinforcement learning, the payoff obtained for a pure strategy played in the preceding period determines the increase of the probability for this strategy. … Reinforcement learning depends entirely on one's own payoffs (“is entirely based on observed own payoffs”).

In impulse learning it is not the payoff in the preceding period that is of crucial importance. It is the difference between what could have been obtained and what has been received, which moves the behavior in the direction of the higher payoff. … Impulse learning depends on the opponent's strategy and one's own payoff (“requires feedback on the other player’s choice and the knowledge of the player’s own payoff”).

In the stationary distribution, expected upward impulses are equal to expected downward impulses. … losses are counted double in the computation of impulses as in prospect theory (Daniel Kahneman and Amos Tversky 1979).
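The balance condition can be made concrete with a small Python sketch. Everything beyond "expected impulses in the two directions are equal" is an assumption of this sketch. Player 1 chooses U or D and player 2 chooses L or R. Player 1 forgoes `c_L` by playing D against L and `c_R` by playing U against R. Player 2 forgoes `d_U` by playing L against U and `d_D` by playing R against D. The four impulse sizes are taken as given, with the double counting of losses assumed to have been applied already. The names `c_L`, `c_R`, `d_U`, `d_D` are hypothetical and do not appear in these notes.

```python
import math


def impulse_balance_2x2(c_L, c_R, d_U, d_D):
    """Return (p_U, q_L) at which expected impulses balance for both players.

    Assumed balance equations (p = P(U), q = P(L)):
        player 1: (1 - p) * q * c_L == p * (1 - q) * c_R
        player 2: p * q * d_U       == (1 - p) * (1 - q) * d_D
    All four impulse sizes must be positive. Solving the two equations gives
    the closed form below, with c = c_L / c_R and d = d_U / d_D.
    """
    if min(c_L, c_R, d_U, d_D) <= 0:
        raise ValueError("impulse sizes must be positive")
    c = c_L / c_R
    d = d_U / d_D
    p_U = math.sqrt(c) / (math.sqrt(c) + math.sqrt(d))
    q_L = 1.0 / (1.0 + math.sqrt(c * d))
    return p_U, q_L


# Hypothetical impulse sizes, chosen only to exercise the function.
p_U, q_L = impulse_balance_2x2(c_L=4.0, c_R=1.0, d_U=2.0, d_D=3.0)

# Expected impulses in each direction, for checking the balance condition.
up_1 = (1 - p_U) * q_L * 4.0        # player 1 pulled toward U
down_1 = p_U * (1 - q_L) * 1.0      # player 1 pulled toward D
to_R_2 = p_U * q_L * 2.0            # player 2 pulled toward R
to_L_2 = (1 - p_U) * (1 - q_L) * 3.0  # player 2 pulled toward L
```

When all four impulse sizes are equal, the sketch returns `(0.5, 0.5)`; making one impulse larger shifts probability toward the strategy it pulls toward.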

===Differences among the five equilibrium concepts===

The five concepts can be thought of as stationary states of dynamic learning models. Learning models differ with respect to their requirements on prior knowledge of the game and on feedback after each period.

Reinforcement learning: Nash, quantal response, and payoff-sampling equilibrium belong to this category.

one needs knowledge of one’s own payoff matrix, as well as feedback on the other player’s choice

==original Abstract:==

Five stationary concepts for completely mixed 2×2-games are experimentally compared: Nash equilibrium, quantal response equilibrium, action-sampling equilibrium, payoff-sampling equilibrium (Martin J. Osborne and Ariel Rubinstein 1998), and impulse balance equilibrium. Experiments on 12 games, 6 constant sum games, and 6 nonconstant sum games were run with 12 independent subject groups for each constant sum game and 6 independent subject groups for each nonconstant sum game. Each independent subject group consisted of four players 1 and four players 2, interacting anonymously over 200 periods with random matching. The comparison of the five theories shows that the order of performance from best to worst is as follows: impulse balance equilibrium, payoff-sampling equilibrium, action-sampling equilibrium, quantal response equilibrium, Nash equilibrium.