In my earlier post I talked about how the GAN training procedure can be modified slightly so that it minimises the \$KL[Q\|P]\$ divergence. I have also talked about how GANs can be understood as

• in the D-step, performing direct log-probability-ratio estimation from data, obtaining an estimate of \$\log\frac{Q(x)}{P(x)}\$, then
• in the M-step, using these estimates to minimise a divergence.

From this perspective, training a binary classifier discriminatively is only one way to obtain estimates of the log-probability ratios in question, but I mentioned there are other ways.

This post is basically just having some fun with this idea: I show a different iterative algorithm with a similar flavour. Instead of using a binary classifier, I'm going to use binary pairwise preference learning as the adversarial problem.

As before, we have a generator \$G\$ and a discriminator \$D\$ that try to "fight against each other". However, the discriminator now solves a different task: instead of binary classification it performs a two-alternative forced choice (2AFC) task.

2AFC Discriminator

The preference network \$D\$ receives two inputs, \$x_1\$ and \$x_2\$, one of which is a real datapoint sampled from \$P\$, while the other is a synthetic one generated from \$Q\$ (using the generator \$G\$). \$D\$ has to guess which one of its inputs is real. Let's assume that \$D\$ takes the following symmetric and separable form:

\$\$ D(x_1,x_2;\psi) = \Phi[s(x_1;\psi) - s(x_2;\psi)],\$\$

where \$\Phi\$ is the logistic sigmoid, and \$s\$ is a trainable function (e.g. a deep neural network) which I will call the preference function.
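As a minimal sketch of this form, here is the symmetric, separable 2AFC discriminator with a hypothetical toy linear preference function standing in for a deep network (the linear `s` is an assumption purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def s(x, psi):
    # Hypothetical toy preference function; in the post this would be
    # a deep network with parameters psi.
    return psi * x

def D(x1, x2, psi):
    # Symmetric, separable 2AFC discriminator:
    # D(x1, x2) = sigmoid(s(x1) - s(x2)),
    # the predicted probability that x1 is the real input.
    return sigmoid(s(x1, psi) - s(x2, psi))
```

Note that the symmetry \$D(x_1, x_2) = 1 - D(x_2, x_1)\$ holds by construction for any choice of \$s\$.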

The parameters \$/psi\$ of \$D\$ are updated using the following objective:

\$\$ \mathcal{L}_{D}(\psi) = -\mathbb{E}_{x_1\sim P}\mathbb{E}_{x_2\sim Q} \log(D(x_1,x_2;\psi)) \$\$

This is similar to a standard binary classification loss, but not quite the same (if you don't understand the objective, note the symmetry \$D(x_1,x_2) = 1 - D(x_2,x_1)\$ and try to derive it yourself from the binary classification loss).
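The objective can be estimated by a Monte Carlo average over paired samples. A sketch, reusing the hypothetical toy linear preference function from before (the Gaussian stand-ins for \$P\$ and \$Q\$ are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def s(x, psi):
    return psi * x  # hypothetical toy preference function

def discriminator_loss(psi, x_real, x_fake):
    # L_D(psi) = -E_{x1 ~ P} E_{x2 ~ Q} log D(x1, x2; psi),
    # estimated here by averaging over paired samples.
    d = sigmoid(s(x_real, psi) - s(x_fake, psi))
    return -np.mean(np.log(d))

rng = np.random.default_rng(0)
x_real = rng.normal(1.0, 1.0, size=10000)   # stand-in for samples from P
x_fake = rng.normal(-1.0, 1.0, size=10000)  # stand-in for samples from Q
```

With these samples, a preference function that scores real points higher (e.g. `psi = 1.0`) should achieve a lower loss than an indifferent one (`psi = 0.0`, for which the loss is exactly \$-\log 0.5\$).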

Bayes optimal behaviour

Let's consider what happens if we fix \$Q\$ and optimise \$D\$ until convergence. At this point we can assume that \$D\$ approximately implements the Bayes optimal behaviour, which in this case is the following:

\$\$ \Phi[s(x_1) - s(x_2)] \approx \frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)} \$\$

Working backwards from here, we get an expression for the preference function:

\begin{align}
\Phi[s(x_1) - s(x_2)] &\approx \frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)} \\
s(x_1) - s(x_2) &\approx \Phi^{-1}\left[\frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)}\right] \\
s(x_1) - s(x_2) &\approx \log\left[\frac{\frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)}}{1 - \frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)}}\right] \\
s(x_1) - s(x_2) &\approx \log\left[\frac{P(x_1)Q(x_2)}{P(x_2)Q(x_1)}\right] \\
s(x_1) - s(x_2) &\approx \log\frac{P(x_1)}{Q(x_1)} - \log\frac{P(x_2)}{Q(x_2)},
\end{align}

Therefore, we can say that once \$D\$ has converged, the preference function \$s\$ approximates the log-probability ratio between \$P\$ and \$Q\$ (up to an additive constant, which cancels in the difference above):

\$\$ s(x) \approx \log\frac{P(x)}{Q(x)}\$\$
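This identification can be sanity-checked numerically: plugging \$s(x) = \log\frac{P(x)}{Q(x)}\$ into the 2AFC form recovers the Bayes-optimal posterior exactly. A small check with toy Gaussians (the particular \$P\$ and \$Q\$ below are assumptions chosen just for the check):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy choice of distributions: P = N(1, 1), Q = N(-1, 1),
# so log P(x)/Q(x) is available in closed form.
P = lambda x: gauss_pdf(x, 1.0, 1.0)
Q = lambda x: gauss_pdf(x, -1.0, 1.0)

def s_opt(x):
    # Bayes-optimal preference function: s(x) = log P(x)/Q(x)
    return np.log(P(x)) - np.log(Q(x))

x1, x2 = 0.3, -0.7
lhs = sigmoid(s_opt(x1) - s_opt(x2))                           # 2AFC discriminator output
rhs = P(x1) * Q(x2) / (P(x1) * Q(x2) + P(x2) * Q(x1))          # Bayes-optimal posterior
```

The two quantities agree to floating-point precision, matching the derivation above.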

and therefore the loss for the generator becomes

\$\$ \mathcal{L}_{G}(\theta) \approx - \mathbb{E}_{x\sim Q_{\theta}} \log\frac{P(x)}{Q(x)} = KL[Q\|P]. \$\$
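To see that this really does recover the KL divergence, one can Monte Carlo estimate the generator loss with the Bayes-optimal \$s\$ on toy Gaussians (the distributions below are assumptions chosen so that \$KL[Q\|P]\$ is known in closed form):

```python
import numpy as np

def gauss_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

# Toy choice: P = N(1, 1), Q = N(-1, 1), for which
# KL[Q||P] = (mu_P - mu_Q)^2 / 2 = 2 analytically.
rng = np.random.default_rng(0)
x = rng.normal(-1.0, 1.0, size=200000)  # samples from Q

# Generator loss with the converged preference function s(x) = log P(x)/Q(x):
# L_G = -E_{x~Q}[s(x)], which should approximate KL[Q||P].
s = gauss_logpdf(x, 1.0, 1.0) - gauss_logpdf(x, -1.0, 1.0)
kl_estimate = -np.mean(s)
```

With 200,000 samples the estimate lands very close to the analytic value of 2.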

Is this useful?

I don't know; maybe, but most likely not. That's beside the point, though. I just wanted to highlight the arbitrariness of using binary classification in the GAN procedure. One can imagine a whole class of generative modelling procedures based on the same underlying idea of direct probability-ratio estimation, and it is unclear (to me) which method will work best.