Selten, Reinhard; Chmura, Thorsten (2008) “Stationary Concepts for Experimental 2×2-Games.” American Economic Review, Volume 98, Number 3, June 2008, pp. 938-966. DOI: http://dx.doi.org/10.1257/aer.98.3.938; [PDF] provided by zju.edu.cn.
==notes by yinung==
The ideas in this paper apply to equilibrium solutions of long-run or repeated games.
A mixed equilibrium can be viewed as a dynamic equilibrium solution (a stationary concept).
Mixed equilibrium has several interpretations. One interpretation is that of a rational recommendation for a one-shot game. Another interpretation looks at mixed equilibrium as a result of evolutionary or learning processes in a situation of frequently repeated play with two populations of randomly matched opponents. One may speak of mixed equilibrium as a behavioral stationary concept.
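As a reference point for the other concepts, the mixed Nash equilibrium of a completely mixed 2×2 game follows from indifference conditions: each player mixes so that the *other* player is indifferent between their two pure strategies. A minimal sketch, assuming a payoff layout `A[i][j]`, `B[i][j]` (row i = Up/Down for player 1, column j = Left/Right for player 2) and a hypothetical example game that is not one of the paper's 12 games:

```python
from fractions import Fraction


def mixed_nash_2x2(A, B):
    """Completely mixed Nash equilibrium of a 2x2 bimatrix game.

    A[i][j], B[i][j]: payoffs of player 1 and player 2 when player 1 plays
    row i (0 = Up, 1 = Down) and player 2 plays column j (0 = Left, 1 = Right).
    Returns (p, q) with p = Pr(Up) and q = Pr(Left), as exact fractions.
    """
    # q makes player 1 indifferent: q*A00 + (1-q)*A01 == q*A10 + (1-q)*A11
    q = Fraction(A[1][1] - A[0][1],
                 A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # p makes player 2 indifferent: p*B00 + (1-p)*B10 == p*B01 + (1-p)*B11
    p = Fraction(B[1][1] - B[1][0],
                 B[0][0] - B[1][0] - B[0][1] + B[1][1])
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no completely mixed equilibrium")
    return p, q


# Hypothetical example game.
A = [[9, 0], [0, 1]]
B = [[0, 1], [1, 0]]
print(mixed_nash_2x2(A, B))  # (Fraction(1, 2), Fraction(1, 10))
```

Note that p depends only on player 2's payoffs `B` and q only on player 1's payoffs `A`: in mixed Nash equilibrium a player's own payoffs do not affect their own mix, which is one reason the behavioral concepts below can predict quite different frequencies.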
Using experiments on 2×2 games, the paper compares five stationary concepts:
Quantal response equilibrium (Richard D. McKelvey and Thomas R. Palfrey 1995)
Choice probabilities take an exponential (logit) form.
assumes that players give quantal best responses to the behavior of the others (see Section IB). In the exponential form of quantal response equilibrium considered here, the probabilities are proportional to an exponential with the expected payoff times a parameter in the exponent.
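A minimal sketch of the exponential (logit) form for a 2×2 game, assuming the payoff layout `A[i][j]`/`B[i][j]` (row i = Up/Down for player 1, column j = Left/Right for player 2) and a hypothetical example game. Each choice probability is a logistic function of λ (`lam`) times the expected payoff difference. The equilibrium is found by bisection on the composed response map, which for games like this one (where that map is decreasing) has a unique root; the solver is my own choice, not the paper's estimation procedure.

```python
import math


def logit_prob(u_first, u_second, lam):
    """Logit choice: Pr(first) = exp(lam*u1) / (exp(lam*u1) + exp(lam*u2)).

    Computed as a numerically stable logistic function of lam*(u1 - u2).
    """
    z = lam * (u_first - u_second)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def qre_2x2(A, B, lam, tol=1e-12):
    """Logit quantal response equilibrium (p, q); p = Pr(Up), q = Pr(Left)."""
    def row_response(q):  # player 1's logit response to q
        u_up = q * A[0][0] + (1 - q) * A[0][1]
        u_down = q * A[1][0] + (1 - q) * A[1][1]
        return logit_prob(u_up, u_down, lam)

    def col_response(p):  # player 2's logit response to p
        u_left = p * B[0][0] + (1 - p) * B[1][0]
        u_right = p * B[0][1] + (1 - p) * B[1][1]
        return logit_prob(u_left, u_right, lam)

    # Bisection on g(q) = col_response(row_response(q)) - q.
    # g is continuous with g(0) >= 0 and g(1) <= 0, so a root exists.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if col_response(row_response(mid)) > mid:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2
    return row_response(q), q


A = [[9, 0], [0, 1]]  # hypothetical example game
B = [[0, 1], [1, 0]]
for lam in (0.0, 1.0, 100.0):
    p, q = qre_2x2(A, B, lam)
    print(f"lambda={lam}: p={p:.3f}, q={q:.3f}")
```

At λ = 0 both players randomize 50/50; as λ grows, the logit QRE of this example moves toward its mixed Nash equilibrium (p, q) = (0.5, 0.1).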
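Action-sampling equilibrium
Each player draws a sample of seven observations (a "magic number") of the opponent's strategy choices and best-responds to that sample.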
…in a stationary situation, a player takes a sample of seven observations of the strategies played on the other side, and then optimizes against this sample.
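The sampling step can be sketched for a 2×2 game: given the opponent's mixed strategy, the number of times the opponent's strategy 0 appears in a sample of seven is binomial, and the player picks the pure strategy with the higher total payoff against that sample. The 50/50 tie rule, the payoff layout (`own[i][j]` = own payoff of own strategy i against opponent strategy j), the example game, and the bisection solver are assumptions of this sketch.

```python
from math import comb


def sampling_best_response(own, opp_prob, n=7):
    """Pr(choosing own strategy 0) when best-responding to n sampled opponent moves.

    own[i][j]: own payoff for own strategy i against opponent strategy j.
    opp_prob: probability that the opponent plays its strategy 0.
    Ties are broken 50/50 (an assumption of this sketch).
    """
    prob = 0.0
    for k in range(n + 1):  # k = sampled opponent moves equal to strategy 0
        weight = comb(n, k) * opp_prob**k * (1 - opp_prob) ** (n - k)
        pay0 = k * own[0][0] + (n - k) * own[0][1]
        pay1 = k * own[1][0] + (n - k) * own[1][1]
        if pay0 > pay1:
            prob += weight
        elif pay0 == pay1:
            prob += weight / 2
    return prob


def action_sampling_eq(A, B, n=7, tol=1e-12):
    """Action-sampling equilibrium (p, q) of a 2x2 game; p = Pr(Up), q = Pr(Left)."""
    B_own = [[B[0][0], B[1][0]], [B[0][1], B[1][1]]]  # player 2's payoffs, own strategy first

    def row(q):
        return sampling_best_response(A, q, n)

    def col(p):
        return sampling_best_response(B_own, p, n)

    lo, hi = 0.0, 1.0  # bisection on g(q) = col(row(q)) - q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if col(row(mid)) > mid:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2
    return row(q), q


p, q = action_sampling_eq([[9, 0], [0, 1]], [[0, 1], [1, 0]])  # hypothetical game
print(f"p={p:.4f}, q={q:.4f}")
```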
Payoff-sampling equilibrium (Osborne and Rubinstein 1998)
…envisions a stationary situation in which a player takes two samples of equal size, one for each of her pure strategies. She then compares the sum of her payoffs in the two samples and plays the strategy with the higher payoff sum…. The best fitting sample size turns out to be six for each of both samples. The name “payoff-sampling equilibrium" refers to the sampling of own payoffs for each pure strategy.
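A sketch of the payoff-sampling rule for a 2×2 game, assuming the opponent plays a fixed mixed strategy: each own pure strategy gets its own independent sample of six payoffs, and the strategy with the higher payoff sum is played. The 50/50 tie rule, the payoff layout, the example game, and the bisection solver are assumptions, not details taken from the paper.

```python
from math import comb


def binomial_weights(n, prob):
    """Pr(k successes in n draws) for k = 0..n."""
    return [comb(n, k) * prob**k * (1 - prob) ** (n - k) for k in range(n + 1)]


def payoff_sampling_response(own, opp_prob, n=6):
    """Pr(choosing own strategy 0) under payoff sampling.

    own[i][j]: own payoff for own strategy i against opponent strategy j.
    Each own strategy gets an independent sample of n payoffs against an
    opponent who plays strategy 0 with probability opp_prob; the higher
    payoff sum wins, and ties are split 50/50 (assumed tie rule).
    """
    w = binomial_weights(n, opp_prob)
    prob = 0.0
    for k, wk in enumerate(w):      # sample for own strategy 0
        sum0 = k * own[0][0] + (n - k) * own[0][1]
        for m, wm in enumerate(w):  # independent sample for own strategy 1
            sum1 = m * own[1][0] + (n - m) * own[1][1]
            if sum0 > sum1:
                prob += wk * wm
            elif sum0 == sum1:
                prob += wk * wm / 2
    return prob


def payoff_sampling_eq(A, B, n=6, tol=1e-12):
    """Payoff-sampling equilibrium (p, q); p = Pr(Up), q = Pr(Left)."""
    B_own = [[B[0][0], B[1][0]], [B[0][1], B[1][1]]]  # player 2's view

    def row(q):
        return payoff_sampling_response(A, q, n)

    def col(p):
        return payoff_sampling_response(B_own, p, n)

    lo, hi = 0.0, 1.0  # bisection on g(q) = col(row(q)) - q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if col(row(mid)) > mid:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2
    return row(q), q


p, q = payoff_sampling_eq([[9, 0], [0, 1]], [[0, 1], [1, 0]])  # hypothetical game
print(f"p={p:.4f}, q={q:.4f}")
```

The difference from action sampling is the information used: here only own payoffs enter the samples, which is why the notes below group this concept with reinforcement-style learning.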
Impulse balance equilibrium
An equilibrium concept that incorporates an idea from prospect theory (loss aversion).
proposed by Selten is based on learning direction theory (Selten and Joachim Buchta 1999)… is applicable to the repeated choice of the same parameter in learning situations in which the decision maker receives feedback, not only about the payoff for the choice taken, but also for the payoffs connected to alternative actions.
…. The decision maker is assumed to have a tendency to move in the direction of the impulse….
This concept differs from reinforcement learning.
… that impulse learning is very different from reinforcement learning.
In reinforcement learning, the payoff obtained for a pure strategy played in the preceding period determines the increase of the probability for this strategy. … It therefore depends only on one's own payoffs (it "is entirely based on observed own payoffs").
In impulse learning it is not the payoff in the preceding period that is of crucial importance. It is the difference between what could have been obtained and what has been received, which moves the behavior in the direction of the higher payoff. … It therefore depends on both the opponent's choice and one's own payoffs (it "requires feedback on the other player’s choice and the knowledge of the player’s own payoff").
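To make the contrast concrete, here are two toy one-step update rules for a player with two strategies. Both are illustrative assumptions, not the models in the paper: the first is a simple cumulative-propensity (Roth-Erev-style) reinforcement rule, the second is a hypothetical impulse step whose step size `step` is made up. What matters is which inputs each rule needs.

```python
def reinforcement_update(propensities, played, payoff):
    """Cumulative reinforcement: only the realized own payoff is used.

    The propensity of the strategy just played grows by its payoff; choice
    probabilities are proportional to propensities (payoffs assumed >= 0).
    """
    new = list(propensities)
    new[played] += payoff
    total = sum(new)
    return new, [x / total for x in new]


def impulse_update(prob0, own, played, opp_played, step=0.05):
    """Hypothetical impulse step for Pr(own strategy 0).

    Requires the opponent's choice and the own payoff matrix `own`
    (own[i][j] = payoff of own strategy i against opponent strategy j),
    so the forgone payoff of the other strategy can be computed.
    """
    other = 1 - played
    received = own[played][opp_played]
    forgone = own[other][opp_played]
    impulse = max(0.0, forgone - received)   # zero if the choice was already best
    direction = 1.0 if other == 0 else -1.0  # push toward the better strategy
    return min(1.0, max(0.0, prob0 + direction * step * impulse))


own = [[1, 0], [0, 1]]  # hypothetical own payoff matrix
# Strategy 0 was played, the opponent chose strategy 1, the realized payoff is 0.
print(reinforcement_update([1.0, 1.0], played=0, payoff=own[0][1]))
print(impulse_update(0.5, own, played=0, opp_played=1))
```

With a realized payoff of 0 the reinforcement rule leaves the probabilities unchanged, whereas the impulse rule still moves toward strategy 1, because it uses the forgone payoff that only feedback on the opponent's choice reveals.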
In equilibrium, expected upward and expected downward impulses are equal, and impulses arising from losses are counted double.
In the stationary distribution, expected upward impulses are equal to expected downward impulses. … losses are counted double in the computation of impulses as in prospect theory (Daniel Kahneman and Amos Tversky 1979).
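A sketch of how an impulse balance equilibrium might be computed for a 2×2 game, based on my reading of the construction; the details are assumptions to check against the paper. Payoffs above a player's pure-strategy security level (maximin payoff) s are shrunk to s + (x − s)/2, which weights losses double relative to gains. An impulse is the forgone minus the received payoff in this transformed game. In equilibrium, the probability of each strategy times the expected impulse away from it balances. The payoff layout, example game, and bisection solver are likewise my own choices.

```python
def transform(own):
    """Halve gains above the pure-strategy security level (losses count double)."""
    s = max(min(row) for row in own)  # pure maximin payoff
    return [[x if x <= s else s + (x - s) / 2 for x in row] for row in own]


def impulse_balance_response(own, opp_prob):
    """Pr(own strategy 0) that balances expected impulses, given opponent's mix.

    own[i][j]: own payoff for own strategy i against opponent strategy j.
    opp_prob: probability that the opponent plays its strategy 0.
    Assumes that in each column one strategy is strictly better.
    """
    t = transform(own)
    opp = [opp_prob, 1 - opp_prob]
    # Expected impulse away from strategy 0 (toward 1), and away from 1 (toward 0).
    away0 = sum(opp[j] * max(0.0, t[1][j] - t[0][j]) for j in range(2))
    away1 = sum(opp[j] * max(0.0, t[0][j] - t[1][j]) for j in range(2))
    # Balance condition: x * away0 == (1 - x) * away1
    return away1 / (away0 + away1)


def impulse_balance_eq(A, B, tol=1e-12):
    """Impulse balance equilibrium (p, q); p = Pr(Up), q = Pr(Left)."""
    B_own = [[B[0][0], B[1][0]], [B[0][1], B[1][1]]]  # player 2's view

    def row(q):
        return impulse_balance_response(A, q)

    def col(p):
        return impulse_balance_response(B_own, p)

    lo, hi = 0.0, 1.0  # bisection on g(q) = col(row(q)) - q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if col(row(mid)) > mid:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2
    return row(q), q


p, q = impulse_balance_eq([[9, 0], [0, 1]], [[0, 1], [1, 0]])  # hypothetical game
print(round(p, 6), round(q, 6))  # 0.75 0.25
```

For this hypothetical game, whose mixed Nash equilibrium is p = 0.5, q = 0.1, the sketch returns p = 0.75, q = 0.25, illustrating how far impulse balance can move away from Nash predictions.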
===Differences among the five equilibrium concepts===
The five concepts can be thought of as stationary states of dynamic learning models. Learning models differ with respect to their requirements on prior knowledge of the game and on feedback after each period.
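Reinforcement learning: Nash, quantal response, and payoff-sampling equilibrium belong to this group.
The other two (action-sampling and impulse balance equilibrium) both require more information: one's own payoffs plus the opponent's choice.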
one needs knowledge of one’s own payoff matrix, as well as feedback on the other player’s choice
Five stationary concepts for completely mixed 2×2-games are experimentally compared: Nash equilibrium, quantal response equilibrium, action-sampling equilibrium, payoff-sampling equilibrium (Martin J. Osborne and Ariel Rubinstein 1998), and impulse balance equilibrium. Experiments on 12 games (6 constant sum games and 6 nonconstant sum games) were run with 12 independent subject groups for each constant sum game and 6 independent subject groups for each nonconstant sum game. Each independent subject group consisted of four players 1 and four players 2, interacting anonymously over 200 periods with random matching. The comparison of the five theories shows that the order of performance from best to worst is as follows: impulse balance equilibrium, payoff-sampling equilibrium, action-sampling equilibrium, quantal response equilibrium, Nash equilibrium.
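The design figures above can be checked with a little arithmetic, assuming each independent group consists of different subjects. The per-period random matching inside one group can also be sketched. The uniform random permutation and the helper name `matching_schedule` are assumptions of this sketch; the paper's actual matching procedure may have had further constraints.

```python
import random

CONSTANT_SUM_GAMES, GROUPS_PER_CONSTANT_SUM = 6, 12
NONCONSTANT_SUM_GAMES, GROUPS_PER_NONCONSTANT_SUM = 6, 6
PLAYERS_PER_ROLE, PERIODS = 4, 200

total_groups = (CONSTANT_SUM_GAMES * GROUPS_PER_CONSTANT_SUM
                + NONCONSTANT_SUM_GAMES * GROUPS_PER_NONCONSTANT_SUM)
total_subjects = total_groups * 2 * PLAYERS_PER_ROLE  # four players 1 + four players 2


def matching_schedule(periods=PERIODS, n=PLAYERS_PER_ROLE, seed=None):
    """One random matching per period: player 1 number i meets player 2 number partners[i]."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(periods):
        partners = list(range(n))
        rng.shuffle(partners)  # uniform random permutation (assumed matching rule)
        schedule.append(partners)
    return schedule


print(total_groups, total_subjects)  # 108 864
schedule = matching_schedule(seed=1)
print(len(schedule), schedule[0])
```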