GANs — Generative Adversarial Networks




Explained Simply for Engineering Students

Last Updated: March 2026

📌 Key Takeaways

  • Definition: GANs consist of two competing networks — a Generator (creates fake data) and a Discriminator (distinguishes real from fake).
  • Training: Adversarial — Generator tries to fool Discriminator; Discriminator tries to catch Generator. Both improve through competition.
  • Objective: Generator minimises log(1−D(G(z))); Discriminator maximises log(D(x)) + log(1−D(G(z))).
  • Applications: Image synthesis, data augmentation, style transfer, deepfakes, drug discovery.
  • Key challenges: Mode collapse, training instability, vanishing gradients.
  • Introduced by: Ian Goodfellow et al., 2014 — called “the most interesting idea in machine learning in the last 20 years” by Yann LeCun.

1. What is a GAN?

A Generative Adversarial Network (GAN) is a deep learning framework where two neural networks compete against each other to generate increasingly realistic synthetic data.

Introduced by Ian Goodfellow in 2014, GANs represented a fundamentally new paradigm in generative modelling. Before GANs, generating realistic images required handcrafted rules or approximate methods. GANs learn to generate data by having two networks play a game — and through competition, both become better.

Analogy — The Art Forger and the Detective

Imagine an art forger (Generator) who creates fake paintings and tries to pass them off as genuine. A detective (Discriminator) examines paintings and tries to determine which are genuine and which are fakes. As the detective gets better at catching fakes, the forger gets better at making convincing ones. As the forger improves, the detective must get sharper. Through this competition, the forger eventually becomes so skilled that even an expert cannot tell the difference. This is essentially how GAN training works.

2. Generator and Discriminator

The Generator (G)

The Generator takes a random noise vector z (sampled from a simple distribution like Gaussian or uniform) and transforms it into a synthetic data sample — an image, a sound, or text. Its goal is to generate outputs indistinguishable from real data.

G: z → x_fake (where z is random noise and x_fake is the generated sample)

The Generator never sees real data directly — it only receives feedback through the Discriminator’s judgements. It learns by trying to make the Discriminator classify its outputs as real.

The Discriminator (D)

The Discriminator takes a sample (either real from the training data or fake from the Generator) and outputs a probability between 0 and 1 — how likely the sample is to be real. D(x) → probability ∈ [0, 1], where 1 = real and 0 = fake.

The Discriminator is essentially a binary classifier, trained on real samples (label=1) and generated samples (label=0).

|                 | Generator                | Discriminator                |
|-----------------|--------------------------|------------------------------|
| Input           | Random noise vector z    | Real or fake sample x        |
| Output          | Synthetic sample x_fake  | Probability P(real) ∈ [0,1]  |
| Goal            | Fool the Discriminator   | Catch the Generator’s fakes  |
| Loss minimised  | log(1 − D(G(z)))         | Binary cross-entropy         |
| Sees real data? | No                       | Yes                          |

3. Adversarial Training Process

GAN training alternates between updating the Discriminator and the Generator:

  1. Sample real data: Draw a batch of real examples x from the training dataset.
  2. Generate fake data: Sample random noise vectors z, generate fakes: x_fake = G(z).
  3. Train Discriminator: Update D to maximise its ability to distinguish real from fake. D should output high values for real data and low values for fake.
  4. Train Generator: Freeze D, update G to minimise D’s ability to distinguish. G wants D(G(z)) to be close to 1 — fools D into thinking fakes are real.
  5. Repeat for many iterations until G produces convincing outputs.

The ideal equilibrium is a Nash equilibrium — G produces perfectly realistic data and D can do no better than random guessing (D(x) = 0.5 for all inputs). In practice, perfect equilibrium is rarely achieved, and training is inherently unstable.
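The equilibrium claim can be made precise. For a fixed Generator, maximising V pointwise over D(x) yields the optimal Discriminator in closed form (following the derivation in the original 2014 paper):

```latex
\max_{D}\; p_{\text{data}}(x)\,\log D(x) + p_g(x)\,\log\bigl(1 - D(x)\bigr)
\quad\Longrightarrow\quad
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

At the optimum of the game the Generator's distribution matches the data, p_g = p_data, so D*(x) = 1/2 for every input: the Discriminator can do no better than random guessing.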

4. The Minimax Objective

The GAN training objective is a minimax game:

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]

  • The Discriminator maximises V — it wants log D(x) (real classified as real) to be high and log(1−D(G(z))) (fake classified as fake) to be high.
  • The Generator minimises V — it wants D(G(z)) to be close to 1 (fakes classified as real), making log(1−D(G(z))) very negative.

In practice, the Generator’s loss is often changed to maximise log D(G(z)) instead of minimising log(1−D(G(z))). This provides stronger gradients early in training when the Generator is weak and D easily rejects all fakes.
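The difference in gradient signal can be checked numerically. This small sketch (using PyTorch autograd; the value 0.01 is just an illustrative operating point) evaluates both Generator losses at D(G(z)) = 0.01, i.e. a Discriminator that confidently rejects a fake:

```python
import torch

# D(G(z)) early in training, when the Discriminator easily rejects fakes
d_out = torch.tensor(0.01, requires_grad=True)

# Original (saturating) loss: minimise log(1 - D(G(z)))
loss_sat = torch.log(1 - d_out)
grad_sat = torch.autograd.grad(loss_sat, d_out)[0]  # d/dx log(1-x) = -1/(1-x)

# Non-saturating loss: maximise log D(G(z)), i.e. minimise -log D(G(z))
loss_ns = -torch.log(d_out)
grad_ns = torch.autograd.grad(loss_ns, d_out)[0]    # d/dx -log(x) = -1/x

print(f"saturating gradient:     {grad_sat.item():.2f}")  # ≈ -1.01
print(f"non-saturating gradient: {grad_ns.item():.2f}")   # ≈ -100.00
```

The non-saturating loss delivers a gradient roughly 100× larger exactly where the original loss saturates, which is why it became the default in practice.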

5. Training Challenges

Mode Collapse

The Generator finds a small set of outputs that consistently fool the Discriminator and produces only those — ignoring the full diversity of the real data distribution. Example: a GAN trained on handwritten digits produces only one or two digit types. Solutions: Wasserstein GAN (WGAN), minibatch discrimination, unrolled GANs.

Training Instability

GAN training is notoriously unstable — the Generator and Discriminator must improve at a balanced pace. If the Discriminator becomes too strong too quickly, its gradients provide no useful signal to the Generator (everything is classified as fake with near-certainty). If the Generator becomes too strong, the Discriminator never improves. Careful hyperparameter tuning, learning rate scheduling, and architectural choices are essential.

Vanishing Gradients

When the Discriminator is perfect (correctly classifies all samples), the Generator’s gradient becomes near-zero — it receives no learning signal. The original GAN paper’s solution: use the non-saturating loss for the Generator. A more robust solution: use Wasserstein distance (WGAN).
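As a sketch of the WGAN alternative: the critic (WGAN's discriminator) outputs an unbounded score rather than a probability, and the losses are plain differences of means, with no log or sigmoid to saturate. Function names here are illustrative, not from the original text:

```python
import torch

def critic_loss(scores_real, scores_fake):
    # Critic maximises E[D(x)] - E[D(G(z))]; return the negation
    # so it can be minimised with a standard optimiser.
    return scores_fake.mean() - scores_real.mean()

def generator_loss(scores_fake):
    # Generator maximises E[D(G(z))].
    return -scores_fake.mean()

def clip_weights(critic, c=0.01):
    # Original WGAN approximates the required Lipschitz constraint by
    # clipping critic weights after each update; the later WGAN-GP
    # variant replaces clipping with a gradient penalty.
    for p in critic.parameters():
        p.data.clamp_(-c, c)

print(critic_loss(torch.tensor([2.0, 4.0]), torch.tensor([1.0, 1.0])))  # tensor(-2.)
```

Because the critic's score is unbounded, its gradient stays informative even when it separates real from fake perfectly, which is what makes WGAN training more robust to an over-strong critic.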

Evaluation Difficulty

Unlike classification, where accuracy gives an objective score, evaluating GAN output quality is subjective. Common metrics: FID (Fréchet Inception Distance) measures the similarity between the distributions of real and generated images in a deep feature space; IS (Inception Score) measures quality and diversity. Human evaluation remains the gold standard.
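The core of FID can be sketched in a few lines. In practice the means and covariances come from Inception-v3 features of real and generated images; that feature-extraction step is omitted here, and only the Fréchet distance between the two fitted Gaussians is shown:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID between N(mu1, sigma1) and N(mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*(sigma1 @ sigma2)^(1/2))
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean))

# Identical distributions give distance 0; a shifted mean raises it.
print(frechet_distance(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2)))  # ~0.0
print(frechet_distance(np.zeros(2), np.eye(2), np.ones(2), np.eye(2)))   # ~2.0
```

Lower FID means the generated distribution is closer to the real one; 0 would mean the two sets of features are statistically identical.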

6. GAN Variants

| Variant | Key Innovation | Best For |
|---------|----------------|----------|
| DCGAN | Convolutional layers in G and D — stable training | Image generation baseline |
| WGAN | Wasserstein distance instead of JS divergence — mitigates mode collapse | Stable training, any domain |
| Conditional GAN (cGAN) | Condition G and D on class labels — generate specific classes | Class-conditional generation |
| CycleGAN | Unpaired image-to-image translation | Style transfer, domain adaptation |
| StyleGAN2 | Style-based generator — state-of-the-art photorealism | Face generation, artistic images |
| Pix2Pix | Paired image-to-image translation | Sketch→Photo, Map→Satellite |
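Of these, the conditional GAN modification is small enough to sketch. Both networks receive the class label in addition to their usual input, here via an embedding concatenated to the noise vector (layer sizes are illustrative, not prescribed by the variant):

```python
import torch
import torch.nn as nn

N_CLASSES, LATENT_DIM, IMG_SIZE = 10, 100, 28 * 28

class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(N_CLASSES, N_CLASSES)
        self.model = nn.Sequential(
            nn.Linear(LATENT_DIM + N_CLASSES, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, IMG_SIZE),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Condition on the class by concatenating the label embedding to z,
        # so the same noise vector can be steered to any requested digit.
        return self.model(torch.cat([z, self.label_emb(labels)], dim=1))

G = ConditionalGenerator()
z = torch.randn(4, LATENT_DIM)
labels = torch.tensor([0, 1, 2, 3])       # request digits 0-3
print(G(z, labels).shape)                  # torch.Size([4, 784])
```

The Discriminator is conditioned the same way, receiving the label alongside the (real or fake) image, so it learns to reject samples that do not match their claimed class.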

7. Applications

DomainApplication
Computer VisionPhoto-realistic image synthesis, face generation (This Person Does Not Exist)
Data AugmentationGenerate synthetic training data for rare classes or limited datasets
Medical ImagingGenerate synthetic MRI/CT scans to augment small medical datasets
Style TransferConvert photos to paintings, change artistic style, season transfer
Drug DiscoveryGenerate novel molecular structures with desired properties
Super ResolutionEnhance low-resolution images (SRGAN)
EngineeringGenerate new product designs, material structures, topology optimisation

8. Python Code — Simple GAN for MNIST


import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Hyperparameters
LATENT_DIM = 100
IMG_SIZE = 28 * 28  # MNIST
BATCH_SIZE = 64
EPOCHS = 50
LR = 0.0002

# --- Generator ---
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.LeakyReLU(0.2),
            nn.BatchNorm1d(256),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.BatchNorm1d(512),
            nn.Linear(512, IMG_SIZE),
            nn.Tanh()  # Output in [-1, 1]
        )
    def forward(self, z):
        return self.model(z)

# --- Discriminator ---
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(IMG_SIZE, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output: P(real)
        )
    def forward(self, x):
        return self.model(x)

# --- Setup ---
G = Generator()
D = Discriminator()
criterion = nn.BCELoss()
opt_G = optim.Adam(G.parameters(), lr=LR, betas=(0.5, 0.999))
opt_D = optim.Adam(D.parameters(), lr=LR, betas=(0.5, 0.999))

# Load MNIST
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])  # Normalise to [-1, 1]
])
dataloader = DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=BATCH_SIZE, shuffle=True
)

# --- Training Loop ---
for epoch in range(EPOCHS):
    for real_imgs, _ in dataloader:
        batch_size = real_imgs.size(0)
        real_imgs = real_imgs.view(batch_size, -1)  # Flatten

        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # === Train Discriminator ===
        opt_D.zero_grad()
        # Real images
        d_real = D(real_imgs)
        loss_real = criterion(d_real, real_labels)
        # Fake images
        z = torch.randn(batch_size, LATENT_DIM)
        fake_imgs = G(z).detach()  # detach: don't backprop through G
        d_fake = D(fake_imgs)
        loss_fake = criterion(d_fake, fake_labels)
        # Total D loss
        loss_D = (loss_real + loss_fake) / 2
        loss_D.backward()
        opt_D.step()

        # === Train Generator ===
        opt_G.zero_grad()
        z = torch.randn(batch_size, LATENT_DIM)
        fake_imgs = G(z)
        d_output = D(fake_imgs)
        # Generator wants D to think fakes are real
        loss_G = criterion(d_output, real_labels)
        loss_G.backward()
        opt_G.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{EPOCHS}] | D Loss: {loss_D.item():.4f} | G Loss: {loss_G.item():.4f}")
    

9. Frequently Asked Questions

Are GANs still relevant with diffusion models?

GANs dominated image generation from 2014–2021. Diffusion models (DALL-E, Stable Diffusion, Midjourney) now produce higher quality and more diverse outputs and have largely replaced GANs for photorealistic image generation. However, GANs are still widely used for: real-time generation (diffusion is slow), medical data augmentation, video generation, and tasks where training stability is manageable. GANs remain important to understand as a foundation of generative AI.

What is the difference between GANs and VAEs?

Both are generative models but work differently. VAEs (Variational Autoencoders) explicitly learn a probability distribution over the data and generate by sampling from it — outputs are blurrier but training is stable. GANs learn implicitly through adversarial training — outputs are sharper and more photorealistic, but training is unstable. GANs generally produce higher quality images; VAEs are easier to train and provide a structured latent space useful for interpolation and editing.

Next Steps