
Adversarial Preference Loss

In my earlier post I talked about how the GAN training procedure can be modified slightly so that it minimises the $KL[Q\|P]$ divergence. I have also talked about how GANs can be understood as

  • in the D-step, performing direct log-probability-ratio estimation from data, obtaining an estimate of $\log\frac{Q(x)}{P(x)}$, then
  • in the M-step, using these estimates to minimise a divergence

From this perspective, training a binary classifier discriminatively is only one way to obtain estimates of the log-probability ratios in question, but I mentioned there are other ways.

This post is basically just having some fun with this idea: I show a different iterative algorithm that has a similar flavour. Instead of using a binary classifier, I’m going to use binary pairwise preference learning as the adversarial problem.

Adversarial preference training

As before, we have a generator $G$ and a discriminator $D$ that try to "fight against each other". However, the discriminator now solves a different task: instead of binary classification it performs a two-alternative forced choice (2AFC) task.

2AFC Discriminator

The preference network $D$ receives two inputs, $x_1$ and $x_2$, one of which is a real datapoint sampled from $P$, while the other is a synthetic sample generated from $Q$ (via the generator $G$). $D$ has to guess which one of its inputs is real. Let’s assume that $D$ takes the following symmetric and separable form:

$$ D(x_1,x_2;\psi) = \Phi[s(x_1;\psi) - s(x_2;\psi)], $$

where $\Phi$ is the logistic sigmoid, and $s$ is a trainable function (e.g. a deep neural network) which I will call the preference function.
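For concreteness, here is a minimal PyTorch sketch of this kind of discriminator. The MLP architecture and the input dimension are placeholders of my choosing, not something specified above.

```python
import torch
import torch.nn as nn

class PreferenceFunction(nn.Module):
    """Trainable preference function s(x; psi), here a small MLP (placeholder architecture)."""
    def __init__(self, x_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # one scalar score per input
        return self.net(x).squeeze(-1)

def discriminator(s, x1, x2):
    """D(x1, x2) = sigmoid(s(x1) - s(x2)): the probability that x1 is the real sample."""
    return torch.sigmoid(s(x1) - s(x2))
```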

The parameters $\psi$ of $D$ are updated using the following objective:

$$ \mathcal{L}_{D}(\psi) = -\mathbb{E}_{x_1\sim P}\mathbb{E}_{x_2\sim Q} \log D(x_1,x_2;\psi) $$

This is similar to a standard binary classification loss, but not quite the same (if you don’t understand the objective, note the symmetry $D(x_1,x_2) = 1 - D(x_2,x_1)$ and try to derive it yourself from the binary classification loss).
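Continuing the PyTorch sketch above, the discriminator objective might look like the following, where `x_real` is a minibatch from $P$ and `x_fake` a minibatch from $Q$ (how those minibatches are obtained is left as an assumption here):

```python
import torch.nn.functional as F

def discriminator_loss(s, x_real, x_fake):
    # L_D(psi) = -E_{x1~P, x2~Q} log D(x1, x2), with D(x1, x2) = sigmoid(s(x1) - s(x2)).
    # log(sigmoid(t)) is computed via F.logsigmoid(t) for numerical stability.
    return -F.logsigmoid(s(x_real) - s(x_fake)).mean()
```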

Bayes optimal behaviour

Let’s consider what happens if we fix $Q$, and optimise $D$ until convergence. At this point we can assume that $D$ approximately implements the Bayes optimal behaviour, which in this case will be the following:

$$ \Phi[s(x_1) - s(x_2)] \approx \frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)} $$

Working backwards from here, we get an expression for the preference function:

\begin{align}
\Phi[s(x_1) - s(x_2)] &\approx \frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)} \\
s(x_1) - s(x_2) &\approx \Phi^{-1}\left[\frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)}\right] \\
s(x_1) - s(x_2) &\approx \log\left[\frac{\frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)}}{1 - \frac{P(x_1)Q(x_2)}{P(x_1)Q(x_2) + P(x_2)Q(x_1)}}\right] \\
s(x_1) - s(x_2) &\approx \log\left[\frac{P(x_1)Q(x_2)}{P(x_2)Q(x_1)}\right] \\
s(x_1) - s(x_2) &\approx \log\frac{P(x_1)}{Q(x_1)} - \log\frac{P(x_2)}{Q(x_2)},
\end{align}
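As a quick numerical sanity check of this algebra (a toy example of my own, not from the derivation above): take two known 1D Gaussians for $P$ and $Q$, plug the true log-ratio in as the preference function, and confirm that $\Phi[s(x_1) - s(x_2)]$ recovers the Bayes-optimal probability.

```python
import numpy as np
from scipy.stats import norm

# Toy densities (my choice): P = N(0, 1), Q = N(1, 1).
P, Q = norm(0.0, 1.0), norm(1.0, 1.0)
s = lambda x: P.logpdf(x) - Q.logpdf(x)          # the claimed optimal preference function
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

x1, x2 = 0.3, -1.2
lhs = sigmoid(s(x1) - s(x2))                     # Phi[s(x1) - s(x2)]
rhs = P.pdf(x1) * Q.pdf(x2) / (P.pdf(x1) * Q.pdf(x2) + P.pdf(x2) * Q.pdf(x1))
print(lhs, rhs)                                  # the two numbers agree
```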

Therefore, we can say that when $D$ has converged, the preference function $s$ approximates the log-probability ratio between $P$ and $Q$ (up to an additive constant, which does not affect the generator's gradients):

$$ s(x) \approx \log\frac{P(x)}{Q(x)} $$

and therefore the loss for the generator becomes

$$ \mathcal{L}_{G}(\theta) \approx -\mathbb{E}_{x\sim Q_{\theta}} \log\frac{P(x)}{Q(x)} = KL[Q\|P]. $$
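In code, the generator's step might look roughly like this, continuing the sketch from before; `generator` and the noise samples `z` are placeholders, and the alternating training loop is only assumed, not prescribed above.

```python
def generator_loss(s, generator, z):
    # With s frozen (treated as an approximation of log P(x)/Q(x)),
    # minimising -E_{x~Q_theta} s(x) approximately minimises KL[Q || P]
    # (up to an additive constant that does not affect the gradients).
    x_fake = generator(z)        # x ~ Q_theta
    return -s(x_fake).mean()

# One adversarial round (assumed training loop):
#   1. update psi   by descending discriminator_loss(s, x_real, generator(z).detach())
#   2. update theta by descending generator_loss(s, generator, z), holding psi fixed
```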

Is this useful?

I don’t know; maybe, but most likely not. That’s beside the point, though. I just wanted to highlight the arbitrariness of using binary classification in the GAN procedure. One can imagine a whole class of generative modelling procedures based on the same underlying idea of direct probability-ratio estimation, and it is unclear (to me) which method will work best.
