Types of Nash Equilibria

Generative Adversarial Networks recently became fairly popular in a range of fields as they had success in a variety of difficult tasks.  They allowed the synthesis of naturally looking images, they automatically generate textual labels for given images and they manage to generate chemical compounds with desired properties and many more things. However, the problem whether their training process could be considered as finished (i.e. the system converges) was addressed only recently in a Paper by Heusel et al.

A GAN is a system where two different neural networks play a game against each other. The first tries to generate data, while the second network judges this generated data. The latter ones job is to distinguish whether a given datum was generated by the former network or whether it came from a ground truth distribution (i.e. training data-set). The goal of the first network is to generate data in a way which makes it hard for the second network to do its job.  At the beginning of the game both networks (i.e. both Players) do not have knowledge about the desired data distribution nor their opponents. During training the first network learns to generate data similar to the given distribution as this makes the job for the opponent harder. And the second network gets better at discriminating between real- and fake-data. Both players increase their score by adapting their respective strategy in a systematic manner.

However, after a while of playing and learning both players might find themselves in a situation where it doesn’t make sense any more to make strategy adjustments without losing ground against their opponent. They reached a Nash-Equilibrium (NE).

In a more abstract setting you could picture such a situation like in the image below. The first player tries to minimize the reward while the other player wants to maximize it.  If they end up in one of the two marked spots none of them has an incentive to change its strategy as both would get a lower score after the opponent gets a chance to adapts his strategy.

No Jumps allowed.

However, neural networks typically are trained by gradient descent and hence both players can only slowly adjust their strategy. In fact, both players follow (almost) continuous trajectories in this combined strategy space. This restriction turns out to be a major burden for both players as they are easily captured in so called local Nash-Equilibria. Although they are not a problem in the above sample min-max-game, they are in similar max-game. In the max-game both players need to work together in order to maximize the reward. 

If the combined strategy happens to be the marked left point, both players got trapped in a sub-optimal NE. Although, Player B has reached the optimal strategy, Player A could choose the optimal strategy on the right side from which both would benefit. However, due to the restriction that only local changes are allowed the system is trapped and doomed to be sub-optimal. There is simply no trajectory out of the left local minima, from which both players would benefit directly.

The problem is even more severe, because there infinitely many local NE in the max-game which are even worse then the previously described ones.

Additionally, they form a barrier which prevents the players from reaching the top right and bottom left parts of the strategy-space. Hence, generator and discriminator seems to get sucked into these sub-optimal solutions.

Two player games are hard.