GANs — Generative Adversarial Networks




Explained Simply for Engineering Students

Last Updated: March 2026

📌 Key Takeaways

  • Definition: GANs consist of two competing networks — a Generator (creates fake data) and a Discriminator (distinguishes real from fake).
  • Training: Adversarial — Generator tries to fool Discriminator; Discriminator tries to catch Generator. Both improve through competition.
  • Objective: Generator minimises log(1−D(G(z))); Discriminator maximises log(D(x)) + log(1−D(G(z))).
  • Applications: Image synthesis, data augmentation, style transfer, deepfakes, drug discovery.
  • Key challenges: Mode collapse, training instability, vanishing gradients.
  • Introduced by: Ian Goodfellow et al., 2014 — called “the most interesting idea in machine learning in the last 20 years” by Yann LeCun.

1. What is a GAN?

A Generative Adversarial Network (GAN) is a deep learning framework where two neural networks compete against each other to generate increasingly realistic synthetic data.

Introduced by Ian Goodfellow in 2014, GANs represented a fundamentally new paradigm in generative modelling. Before GANs, generating realistic images required handcrafted rules or approximate methods. GANs learn to generate data by having two networks play a game — and through competition, both become better.

Analogy — The Art Forger and the Detective

Imagine an art forger (Generator) who creates fake paintings and tries to pass them off as genuine. A detective (Discriminator) examines paintings and tries to determine which are genuine and which are fakes. As the detective gets better at catching fakes, the forger gets better at making convincing ones. As the forger improves, the detective must get sharper. Through this competition, the forger eventually becomes so skilled that even an expert cannot tell the difference. This is essentially how GAN training works.

2. Generator and Discriminator

The Generator (G)

The Generator takes a random noise vector z (sampled from a simple distribution like Gaussian or uniform) and transforms it into a synthetic data sample — an image, a sound, or text. Its goal is to generate outputs indistinguishable from real data.

G: z → x_fake (where z is random noise and x_fake is the generated sample)

The Generator never sees real data directly — it only receives feedback through the Discriminator’s judgements. It learns by trying to make the Discriminator classify its outputs as real.

The Discriminator (D)

The Discriminator takes a sample (either real from the training data or fake from the Generator) and outputs a probability between 0 and 1 — how likely the sample is to be real. D(x) → probability ∈ [0, 1], where 1 = real and 0 = fake.

The Discriminator is essentially a binary classifier, trained on real samples (label=1) and generated samples (label=0).

|                 | Generator                | Discriminator                |
|-----------------|--------------------------|------------------------------|
| Input           | Random noise vector z    | Real or fake sample x        |
| Output          | Synthetic sample x_fake  | Probability P(real) ∈ [0,1]  |
| Goal            | Fool the Discriminator   | Catch the Generator’s fakes  |
| Loss minimised  | log(1 − D(G(z)))         | Binary cross-entropy         |
| Sees real data? | No                       | Yes                          |

3. Adversarial Training Process

GAN training alternates between updating the Discriminator and the Generator:

  1. Sample real data: Draw a batch of real examples x from the training dataset.
  2. Generate fake data: Sample random noise vectors z, generate fakes: x_fake = G(z).
  3. Train Discriminator: Update D to maximise its ability to distinguish real from fake. D should output high values for real data and low values for fake.
  4. Train Generator: Freeze D, update G to minimise D’s ability to distinguish. G wants D(G(z)) to be close to 1 — fools D into thinking fakes are real.
  5. Repeat for many iterations until G produces convincing outputs.

The ideal equilibrium is a Nash equilibrium — G produces perfectly realistic data and D can do no better than random guessing (D(x) = 0.5 for all inputs). In practice, perfect equilibrium is rarely achieved, and training is inherently unstable.
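The equilibrium claim can be made precise. For a fixed Generator, maximising V pointwise over D(x) yields the optimal Discriminator in closed form (following the derivation in the original 2014 paper):

```latex
\max_{D}\; p_{\text{data}}(x)\,\log D(x) + p_g(x)\,\log\bigl(1 - D(x)\bigr)
\quad\Longrightarrow\quad
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

At the optimum of the game the Generator's distribution matches the data, p_g = p_data, so D*(x) = 1/2 for every input: the Discriminator can do no better than random guessing.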

4. The Minimax Objective

The GAN training objective is a minimax game:

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]

  • The Discriminator maximises V — it wants log D(x) (real classified as real) to be high and log(1−D(G(z))) (fake classified as fake) to be high.
  • The Generator minimises V — it wants D(G(z)) to be close to 1 (fakes classified as real), making log(1−D(G(z))) very negative.

In practice, the Generator’s loss is often changed to maximise log D(G(z)) instead of minimising log(1−D(G(z))). This provides stronger gradients early in training when the Generator is weak and D easily rejects all fakes.
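The difference in gradient signal can be checked numerically. This small sketch (using PyTorch autograd; the value 0.01 is just an illustrative operating point) evaluates both Generator losses at D(G(z)) = 0.01, i.e. a Discriminator that confidently rejects a fake:

```python
import torch

# D(G(z)) early in training, when the Discriminator easily rejects fakes
d_out = torch.tensor(0.01, requires_grad=True)

# Original (saturating) loss: minimise log(1 - D(G(z)))
loss_sat = torch.log(1 - d_out)
grad_sat = torch.autograd.grad(loss_sat, d_out)[0]  # d/dx log(1-x) = -1/(1-x)

# Non-saturating loss: maximise log D(G(z)), i.e. minimise -log D(G(z))
loss_ns = -torch.log(d_out)
grad_ns = torch.autograd.grad(loss_ns, d_out)[0]    # d/dx -log(x) = -1/x

print(f"saturating gradient:     {grad_sat.item():.2f}")  # ≈ -1.01
print(f"non-saturating gradient: {grad_ns.item():.2f}")   # ≈ -100.00
```

The non-saturating loss delivers a gradient roughly 100× larger exactly where the original loss saturates, which is why it became the default in practice.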

5. Training Challenges

Mode Collapse

The Generator finds a small set of outputs that consistently fool the Discriminator and produces only those — ignoring the full diversity of the real data distribution. Example: a GAN trained on handwritten digits produces only one or two digit types. Solutions: Wasserstein GAN (WGAN), minibatch discrimination, unrolled GANs.

Training Instability

GAN training is notoriously unstable — the Generator and Discriminator must improve at a balanced pace. If the Discriminator becomes too strong too quickly, its gradients provide no useful signal to the Generator (everything is classified as fake with near-certainty). If the Generator becomes too strong, the Discriminator never improves. Careful hyperparameter tuning, learning rate scheduling, and architectural choices are essential.

Vanishing Gradients

When the Discriminator is perfect (correctly classifies all samples), the Generator’s gradient becomes near-zero — it receives no learning signal. The original GAN paper’s solution: use the non-saturating loss for the Generator. A more robust solution: use Wasserstein distance (WGAN).
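As a sketch of the WGAN alternative: the critic (WGAN's discriminator) outputs an unbounded score rather than a probability, and the losses are plain differences of means, with no log or sigmoid to saturate. Function names here are illustrative, not from the original text:

```python
import torch

def critic_loss(scores_real, scores_fake):
    # Critic maximises E[D(x)] - E[D(G(z))]; return the negation
    # so it can be minimised with a standard optimiser.
    return scores_fake.mean() - scores_real.mean()

def generator_loss(scores_fake):
    # Generator maximises E[D(G(z))].
    return -scores_fake.mean()

def clip_weights(critic, c=0.01):
    # Original WGAN approximates the required Lipschitz constraint by
    # clipping critic weights after each update; the later WGAN-GP
    # variant replaces clipping with a gradient penalty.
    for p in critic.parameters():
        p.data.clamp_(-c, c)

print(critic_loss(torch.tensor([2.0, 4.0]), torch.tensor([1.0, 1.0])))  # tensor(-2.)
```

Because the critic's score is unbounded, its gradient stays informative even when it separates real from fake perfectly, which is what makes WGAN training more robust to an over-strong critic.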

Evaluation Difficulty

Unlike classification, where accuracy gives an objective score, evaluating GAN output quality is subjective. Common metrics: FID (Fréchet Inception Distance) measures the similarity between the distributions of real and generated images in a deep feature space; IS (Inception Score) measures quality and diversity. Human evaluation remains the gold standard.
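The core of FID can be sketched in a few lines. In practice the means and covariances come from Inception-v3 features of real and generated images; that feature-extraction step is omitted here, and only the Fréchet distance between the two fitted Gaussians is shown:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID between N(mu1, sigma1) and N(mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*(sigma1 @ sigma2)^(1/2))
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean))

# Identical distributions give distance 0; a shifted mean raises it.
print(frechet_distance(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2)))  # ~0.0
print(frechet_distance(np.zeros(2), np.eye(2), np.ones(2), np.eye(2)))   # ~2.0
```

Lower FID means the generated distribution is closer to the real one; 0 would mean the two sets of features are statistically identical.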

6. GAN Variants

| Variant | Key Innovation | Best For |
|---------|----------------|----------|
| DCGAN | Convolutional layers in G and D — stable training | Image generation baseline |
| WGAN | Wasserstein distance instead of JS divergence — mitigates mode collapse | Stable training, any domain |
| Conditional GAN (cGAN) | Condition G and D on class labels — generate specific classes | Class-conditional generation |
| CycleGAN | Unpaired image-to-image translation | Style transfer, domain adaptation |
| StyleGAN2 | Style-based generator — state-of-the-art photorealism | Face generation, artistic images |
| Pix2Pix | Paired image-to-image translation | Sketch→Photo, Map→Satellite |
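Of these, the conditional GAN modification is small enough to sketch. Both networks receive the class label in addition to their usual input, here via an embedding concatenated to the noise vector (layer sizes are illustrative, not prescribed by the variant):

```python
import torch
import torch.nn as nn

N_CLASSES, LATENT_DIM, IMG_SIZE = 10, 100, 28 * 28

class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(N_CLASSES, N_CLASSES)
        self.model = nn.Sequential(
            nn.Linear(LATENT_DIM + N_CLASSES, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, IMG_SIZE),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Condition on the class by concatenating the label embedding to z,
        # so the same noise vector can be steered to any requested digit.
        return self.model(torch.cat([z, self.label_emb(labels)], dim=1))

G = ConditionalGenerator()
z = torch.randn(4, LATENT_DIM)
labels = torch.tensor([0, 1, 2, 3])       # request digits 0-3
print(G(z, labels).shape)                  # torch.Size([4, 784])
```

The Discriminator is conditioned the same way, receiving the label alongside the (real or fake) image, so it learns to reject samples that do not match their claimed class.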

7. Applications

DomainApplication
Computer VisionPhoto-realistic image synthesis, face generation (This Person Does Not Exist)
Data AugmentationGenerate synthetic training data for rare classes or limited datasets
Medical ImagingGenerate synthetic MRI/CT scans to augment small medical datasets
Style TransferConvert photos to paintings, change artistic style, season transfer
Drug DiscoveryGenerate novel molecular structures with desired properties
Super ResolutionEnhance low-resolution images (SRGAN)
EngineeringGenerate new product designs, material structures, topology optimisation

8. Python Code — Simple GAN for MNIST


import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Hyperparameters
LATENT_DIM = 100
IMG_SIZE = 28 * 28  # MNIST
BATCH_SIZE = 64
EPOCHS = 50
LR = 0.0002

# --- Generator ---
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.LeakyReLU(0.2),
            nn.BatchNorm1d(256),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.BatchNorm1d(512),
            nn.Linear(512, IMG_SIZE),
            nn.Tanh()  # Output in [-1, 1]
        )
    def forward(self, z):
        return self.model(z)

# --- Discriminator ---
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(IMG_SIZE, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output: P(real)
        )
    def forward(self, x):
        return self.model(x)

# --- Setup ---
G = Generator()
D = Discriminator()
criterion = nn.BCELoss()
opt_G = optim.Adam(G.parameters(), lr=LR, betas=(0.5, 0.999))
opt_D = optim.Adam(D.parameters(), lr=LR, betas=(0.5, 0.999))

# Load MNIST
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])  # Normalise to [-1, 1]
])
dataloader = DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=BATCH_SIZE, shuffle=True
)

# --- Training Loop ---
for epoch in range(EPOCHS):
    for real_imgs, _ in dataloader:
        batch_size = real_imgs.size(0)
        real_imgs = real_imgs.view(batch_size, -1)  # Flatten

        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # === Train Discriminator ===
        opt_D.zero_grad()
        # Real images
        d_real = D(real_imgs)
        loss_real = criterion(d_real, real_labels)
        # Fake images
        z = torch.randn(batch_size, LATENT_DIM)
        fake_imgs = G(z).detach()  # detach: don't backprop through G
        d_fake = D(fake_imgs)
        loss_fake = criterion(d_fake, fake_labels)
        # Total D loss
        loss_D = (loss_real + loss_fake) / 2
        loss_D.backward()
        opt_D.step()

        # === Train Generator ===
        opt_G.zero_grad()
        z = torch.randn(batch_size, LATENT_DIM)
        fake_imgs = G(z)
        d_output = D(fake_imgs)
        # Generator wants D to think fakes are real
        loss_G = criterion(d_output, real_labels)
        loss_G.backward()
        opt_G.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{EPOCHS}] | D Loss: {loss_D.item():.4f} | G Loss: {loss_G.item():.4f}")
    

9. Frequently Asked Questions

Are GANs still relevant with diffusion models?

GANs dominated image generation from 2014–2021. Diffusion models (DALL-E, Stable Diffusion, Midjourney) now produce higher quality and more diverse outputs and have largely replaced GANs for photorealistic image generation. However, GANs are still widely used for: real-time generation (diffusion is slow), medical data augmentation, video generation, and tasks where training stability is manageable. GANs remain important to understand as a foundation of generative AI.

What is the difference between GANs and VAEs?

Both are generative models but work differently. VAEs (Variational Autoencoders) explicitly learn a probability distribution over the data and generate by sampling from it — outputs are blurrier but training is stable. GANs learn implicitly through adversarial training — outputs are sharper and more photorealistic, but training is unstable. GANs generally produce higher quality images; VAEs are easier to train and provide a structured latent space useful for interpolation and editing.

Next Steps