GANs — Generative Adversarial Networks
Explained Simply for Engineering Students
Last Updated: March 2026
📌 Key Takeaways
- Definition: GANs consist of two competing networks — a Generator (creates fake data) and a Discriminator (distinguishes real from fake).
- Training: Adversarial — Generator tries to fool Discriminator; Discriminator tries to catch Generator. Both improve through competition.
- Objective: Generator minimises log(1−D(G(z))); Discriminator maximises log(D(x)) + log(1−D(G(z))).
- Applications: Image synthesis, data augmentation, style transfer, deepfakes, drug discovery.
- Key challenges: Mode collapse, training instability, vanishing gradients.
- Introduced by: Ian Goodfellow et al., 2014 — called “the most interesting idea in the last 10 years in machine learning” by Yann LeCun.
1. What is a GAN?
A Generative Adversarial Network (GAN) is a deep learning framework where two neural networks compete against each other to generate increasingly realistic synthetic data.
Introduced by Ian Goodfellow in 2014, GANs represented a fundamentally new paradigm in generative modelling. Before GANs, generating realistic images required handcrafted rules or approximate methods. GANs learn to generate data by having two networks play a game — and through competition, both become better.
Analogy — The Art Forger and the Detective
Imagine an art forger (Generator) who creates fake paintings and tries to pass them off as genuine. A detective (Discriminator) examines paintings and tries to determine which are genuine and which are fakes. As the detective gets better at catching fakes, the forger gets better at making convincing ones. As the forger improves, the detective must get sharper. Through this competition, the forger eventually becomes so skilled that even an expert cannot tell the difference. This is exactly how GAN training works.
2. Generator and Discriminator
The Generator (G)
The Generator takes a random noise vector z (sampled from a simple distribution like Gaussian or uniform) and transforms it into a synthetic data sample — an image, a sound, or text. Its goal is to generate outputs indistinguishable from real data.
G: z → x_fake (where z is random noise and x_fake is the generated sample)
The Generator never sees real data directly — it only receives feedback through the Discriminator’s judgements. It learns by trying to make the Discriminator classify its outputs as real.
The Discriminator (D)
The Discriminator takes a sample (either real from the training data or fake from the Generator) and outputs a probability between 0 and 1 — how likely the sample is to be real. D(x) → probability ∈ [0, 1], where 1 = real and 0 = fake.
The Discriminator is essentially a binary classifier, trained on real samples (label=1) and generated samples (label=0).
| Aspect | Generator | Discriminator |
|---|---|---|
| Input | Random noise vector z | Real or fake sample x |
| Output | Synthetic sample x_fake | Probability P(real) ∈ [0,1] |
| Goal | Fool the Discriminator | Catch the Generator’s fakes |
| Loss minimised | log(1 − D(G(z))) (in practice, −log D(G(z))) | Binary cross-entropy on real/fake labels |
| Sees real data? | No | Yes |
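The table above can be made concrete with a toy shape-check. The single-layer G and D below are hypothetical stand-ins (the full models appear in Section 8); the point is only the input/output contract of each network:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100    # dimensionality of the noise vector z
IMG_SIZE = 28 * 28  # flattened MNIST image

# Minimal stand-ins for G and D: single linear layers, just to show the I/O shapes
G = nn.Sequential(nn.Linear(LATENT_DIM, IMG_SIZE), nn.Tanh())  # z -> x_fake in [-1, 1]
D = nn.Sequential(nn.Linear(IMG_SIZE, 1), nn.Sigmoid())        # x -> P(real) in [0, 1]

z = torch.randn(16, LATENT_DIM)  # batch of 16 noise vectors
x_fake = G(z)                    # -> shape (16, 784)
p_real = D(x_fake)               # -> shape (16, 1), each entry in [0, 1]
print(x_fake.shape, p_real.shape)
```

Note how G never touches the dataset: its only connection to real data is the scalar judgement that D produces.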
3. Adversarial Training Process
GAN training alternates between updating the Discriminator and the Generator:
- Sample real data: Draw a batch of real examples x from the training dataset.
- Generate fake data: Sample random noise vectors z, generate fakes: x_fake = G(z).
- Train Discriminator: Update D to maximise its ability to distinguish real from fake. D should output high values for real data and low values for fake.
- Train Generator: Freeze D, update G to minimise D’s ability to distinguish. G wants D(G(z)) to be close to 1 — fools D into thinking fakes are real.
- Repeat for many iterations until G produces convincing outputs.
The ideal equilibrium is a Nash equilibrium — G produces perfectly realistic data and D can do no better than random guessing (D(x) = 0.5 for all inputs). In practice, perfect equilibrium is rarely achieved, and training is inherently unstable.
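A quick numeric check of the equilibrium claim: if D outputs 0.5 on every input, the value of the game works out to log 0.5 + log 0.5 = −2 log 2 ≈ −1.386, which is the global optimum derived in the original GAN paper:

```python
import math

d_real = 0.5  # D's output on real samples at equilibrium
d_fake = 0.5  # D's output on generated samples at equilibrium

# Value of the game: E[log D(x)] + E[log(1 - D(G(z)))], expectations trivial here
V = math.log(d_real) + math.log(1 - d_fake)
print(V)                 # -1.3862... = -2 log 2
print(-2 * math.log(2))  # same value
```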
4. The Minimax Objective
The GAN training objective is a minimax game:
min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]
- The Discriminator maximises V — it wants log D(x) (real classified as real) to be high and log(1−D(G(z))) (fake classified as fake) to be high.
- The Generator minimises V — it wants D(G(z)) to be close to 1 (fakes classified as real), making log(1−D(G(z))) very negative.
In practice, the Generator’s loss is often changed to maximise log D(G(z)) instead of minimising log(1−D(G(z))). This provides stronger gradients early in training when the Generator is weak and D easily rejects all fakes.
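The gradient argument can be verified directly with autograd. Early in training D confidently rejects fakes, so D(G(z)) is near 0; this standalone check (separate from the training script in Section 8) compares the gradient magnitudes of the two Generator losses at that point:

```python
import torch

# Suppose D confidently rejects a fake: D(G(z)) = 0.01
d_out = torch.tensor(0.01, requires_grad=True)

# Saturating loss: minimise log(1 - D(G(z)))
loss_sat = torch.log(1 - d_out)
grad_sat = torch.autograd.grad(loss_sat, d_out)[0]

# Non-saturating loss: maximise log D(G(z)), i.e. minimise -log D(G(z))
d_out2 = torch.tensor(0.01, requires_grad=True)
loss_nonsat = -torch.log(d_out2)
grad_nonsat = torch.autograd.grad(loss_nonsat, d_out2)[0]

print(grad_sat.item())     # -1/(1 - 0.01) ≈ -1.01  -> weak learning signal
print(grad_nonsat.item())  # -1/0.01 = -100          -> strong learning signal
```

The non-saturating gradient is roughly 100× larger exactly where the Generator needs it most, which is why almost every practical implementation uses it.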
5. Training Challenges
Mode Collapse
The Generator finds a small set of outputs that consistently fool the Discriminator and produces only those — ignoring the full diversity of the real data distribution. Example: a GAN trained on handwritten digits produces only one or two digit types. Solutions: Wasserstein GAN (WGAN), minibatch discrimination, unrolled GANs.
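One crude diagnostic for mode collapse, sketched below under the assumption that generated samples are flat vectors: if the average pairwise distance between samples in a batch collapses toward zero, the Generator is likely emitting near-duplicates.

```python
import torch

def mean_pairwise_distance(samples: torch.Tensor) -> float:
    """Average Euclidean distance over all ordered pairs in a (N, D) batch."""
    dists = torch.cdist(samples, samples)  # (N, N) distance matrix
    n = samples.size(0)
    return (dists.sum() / (n * (n - 1))).item()  # exclude the zero diagonal

healthy = torch.randn(64, 784)                 # diverse samples
collapsed = torch.randn(1, 784).repeat(64, 1)  # one mode, repeated
collapsed += 0.01 * torch.randn(64, 784)       # tiny jitter

print(mean_pairwise_distance(healthy))    # large (order of sqrt(2 * 784) ≈ 40)
print(mean_pairwise_distance(collapsed))  # near zero
```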
Training Instability
GAN training is notoriously unstable — the Generator and Discriminator must improve at a balanced pace. If the Discriminator becomes too strong too quickly, its gradients provide no useful signal to the Generator (everything is classified as fake with near-certainty). If the Generator becomes too strong, the Discriminator never improves. Careful hyperparameter tuning, learning rate scheduling, and architectural choices are essential.
Vanishing Gradients
When the Discriminator is perfect (correctly classifies all samples), the Generator’s gradient becomes near-zero — it receives no learning signal. The original GAN paper’s solution: use the non-saturating loss for the Generator. A more robust solution: use Wasserstein distance (WGAN).
Evaluation Difficulty
Unlike classification (where accuracy is obvious), evaluating GAN quality is subjective. Common metrics: FID (Fréchet Inception Distance) — measures similarity between real and generated image distributions. IS (Inception Score) — measures quality and diversity. Human evaluation remains the gold standard.
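FID has a closed form once each feature set is summarised by a Gaussian: FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)). A minimal sketch, using random feature vectors in place of the Inception-v3 activations the real metric requires:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
fake_good = rng.normal(0.0, 1.0, size=(500, 8))  # same distribution -> low FID
fake_bad = rng.normal(3.0, 1.0, size=(500, 8))   # shifted distribution -> high FID
print(fid(real, fake_good))  # close to 0
print(fid(real, fake_bad))   # dominated by the mean shift, roughly 8 * 3^2 = 72
```

Lower is better; FID of 0 means the two Gaussians match exactly.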
6. GAN Variants
| Variant | Key Innovation | Best For |
|---|---|---|
| DCGAN | Convolutional layers in G and D — stable training | Image generation baseline |
| WGAN | Wasserstein distance instead of JS divergence — no mode collapse | Stable training, any domain |
| Conditional GAN (cGAN) | Condition G and D on class labels — generate specific classes | Class-conditional generation |
| CycleGAN | Unpaired image-to-image translation | Style transfer, domain adaptation |
| StyleGAN2 | Style-based generator — state-of-the-art photo realism | Face generation, artistic images |
| Pix2Pix | Paired image-to-image translation | Sketch→Photo, Map→Satellite |
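To illustrate the Conditional GAN row: conditioning usually means concatenating a label embedding onto the input. The sketch below shows only the Generator side with arbitrary layer sizes; in a full cGAN the Discriminator is conditioned the same way:

```python
import torch
import torch.nn as nn

LATENT_DIM, NUM_CLASSES, IMG_SIZE = 100, 10, 28 * 28

class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(NUM_CLASSES, NUM_CLASSES)  # learnable label embedding
        self.model = nn.Sequential(
            nn.Linear(LATENT_DIM + NUM_CLASSES, 256),  # noise and label concatenated
            nn.LeakyReLU(0.2),
            nn.Linear(256, IMG_SIZE),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Condition on the class: append the label embedding to the noise vector
        x = torch.cat([z, self.label_emb(labels)], dim=1)
        return self.model(x)

G = ConditionalGenerator()
z = torch.randn(4, LATENT_DIM)
labels = torch.tensor([0, 3, 3, 7])  # request specific digit classes
imgs = G(z, labels)                  # shape (4, 784)
print(imgs.shape)
```

This is what lets a trained cGAN generate "a 7" on demand instead of a random digit.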
7. Applications
| Domain | Application |
|---|---|
| Computer Vision | Photo-realistic image synthesis, face generation (This Person Does Not Exist) |
| Data Augmentation | Generate synthetic training data for rare classes or limited datasets |
| Medical Imaging | Generate synthetic MRI/CT scans to augment small medical datasets |
| Style Transfer | Convert photos to paintings, change artistic style, season transfer |
| Drug Discovery | Generate novel molecular structures with desired properties |
| Super Resolution | Enhance low-resolution images (SRGAN) |
| Engineering | Generate new product designs, material structures, topology optimisation |
8. Python Code — Simple GAN for MNIST
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Hyperparameters
LATENT_DIM = 100
IMG_SIZE = 28 * 28  # MNIST
BATCH_SIZE = 64
EPOCHS = 50
LR = 0.0002

# --- Generator ---
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.LeakyReLU(0.2),
            nn.BatchNorm1d(256),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.BatchNorm1d(512),
            nn.Linear(512, IMG_SIZE),
            nn.Tanh()  # Output in [-1, 1]
        )

    def forward(self, z):
        return self.model(z)

# --- Discriminator ---
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(IMG_SIZE, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output: P(real)
        )

    def forward(self, x):
        return self.model(x)

# --- Setup ---
G = Generator()
D = Discriminator()
criterion = nn.BCELoss()
opt_G = optim.Adam(G.parameters(), lr=LR, betas=(0.5, 0.999))
opt_D = optim.Adam(D.parameters(), lr=LR, betas=(0.5, 0.999))

# Load MNIST
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])  # Normalise to [-1, 1]
])
dataloader = DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=BATCH_SIZE, shuffle=True
)

# --- Training Loop ---
for epoch in range(EPOCHS):
    for real_imgs, _ in dataloader:
        batch_size = real_imgs.size(0)
        real_imgs = real_imgs.view(batch_size, -1)  # Flatten to (batch, 784)
        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # === Train Discriminator ===
        opt_D.zero_grad()
        # Real images
        d_real = D(real_imgs)
        loss_real = criterion(d_real, real_labels)
        # Fake images
        z = torch.randn(batch_size, LATENT_DIM)
        fake_imgs = G(z).detach()  # detach: don't backprop through G
        d_fake = D(fake_imgs)
        loss_fake = criterion(d_fake, fake_labels)
        # Total D loss
        loss_D = (loss_real + loss_fake) / 2
        loss_D.backward()
        opt_D.step()

        # === Train Generator ===
        opt_G.zero_grad()
        z = torch.randn(batch_size, LATENT_DIM)
        fake_imgs = G(z)
        d_output = D(fake_imgs)
        # Generator wants D to think fakes are real (non-saturating loss)
        loss_G = criterion(d_output, real_labels)
        loss_G.backward()
        opt_G.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{EPOCHS}] | D Loss: {loss_D.item():.4f} | G Loss: {loss_G.item():.4f}")
```
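Once training finishes, sampling is a single forward pass through G. The snippet below uses a hypothetical single-layer stand-in so it runs on its own; in practice you would reuse the trained `Generator` from the script above:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100
# Stand-in for the trained Generator; in practice, reuse the G trained above
G = nn.Sequential(nn.Linear(LATENT_DIM, 28 * 28), nn.Tanh())

G.eval()                             # disable train-time behaviour (e.g. BatchNorm)
with torch.no_grad():
    z = torch.randn(16, LATENT_DIM)  # 16 fresh noise vectors
    samples = G(z)                   # (16, 784), values in [-1, 1] from Tanh
    images = samples.view(16, 1, 28, 28)  # reshape to image tensors
    images = (images + 1) / 2             # rescale [-1, 1] -> [0, 1] for saving
# torchvision.utils.save_image(images, "samples.png", nrow=4)  # optional: save a grid
print(images.shape)
```

Calling `eval()` matters because BatchNorm behaves differently at inference time than during training.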
9. Frequently Asked Questions
Are GANs still relevant with diffusion models?
GANs dominated image generation from 2014–2021. Diffusion models (DALL-E, Stable Diffusion, Midjourney) now produce higher quality and more diverse outputs and have largely replaced GANs for photorealistic image generation. However, GANs are still widely used for: real-time generation (diffusion is slow), medical data augmentation, video generation, and tasks where training stability is manageable. GANs remain important to understand as a foundation of generative AI.
What is the difference between GANs and VAEs?
Both are generative models but work differently. VAEs (Variational Autoencoders) explicitly learn a probability distribution over the data and generate by sampling from it — outputs are blurrier but training is stable. GANs learn implicitly through adversarial training — outputs are sharper and more photorealistic, but training is unstable. GANs generally produce higher quality images; VAEs are easier to train and provide a structured latent space useful for interpolation and editing.