
Probability — Conditional Probability, Bayes’ Theorem & Distributions

Complete Guide for GATE CS — Sample Spaces, Conditional Probability, Independence, Random Variables, Expectation, Variance & Standard Distributions

Last Updated: April 2026

📌 Key Takeaways

  • Conditional Probability: P(A|B) = P(A∩B) / P(B). The probability of A given B has occurred. Always defined when P(B) > 0.
  • Bayes’ Theorem: P(A|B) = P(B|A)×P(A) / P(B). Reverses conditional probabilities. Used with total probability: P(B) = ΣP(B|Aᵢ)P(Aᵢ).
  • Independence: A and B are independent iff P(A∩B) = P(A)×P(B). Mutually exclusive ≠ independent (when both have positive probability).
  • Random Variable: A function from sample space to ℝ. Discrete: probability mass function (PMF). Continuous: probability density function (PDF).
  • Expectation: E[X] = ΣxP(X=x). Linearity: E[aX+b] = aE[X]+b. E[X+Y] = E[X]+E[Y] always.
  • Variance: Var(X) = E[X²] − (E[X])². For independent X,Y: Var(X+Y) = Var(X)+Var(Y).
  • Key distributions: Binomial (n trials, success p), Geometric (trials until first success), Poisson (events in fixed interval), Normal (bell curve). Binomial→Poisson as n→∞, p→0.

1. Sample Spaces, Events & Axioms

A probability space consists of: sample space Ω (set of all outcomes), event space F (subsets of Ω), and probability measure P.

Kolmogorov’s Axioms

1. P(A) ≥ 0 for all events A

2. P(Ω) = 1 (total probability)

3. Countable additivity: if A₁, A₂, … are pairwise mutually exclusive, P(A₁ ∪ A₂ ∪ …) = Σ P(Aᵢ). In particular, for disjoint A and B: P(A ∪ B) = P(A) + P(B)

Derived rules:

P(A’) = 1 − P(A)    P(∅) = 0

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   (inclusion-exclusion)

If A ⊆ B then P(A) ≤ P(B)

1.1 Classical Probability (equally likely outcomes)

P(A) = |A| / |Ω|   — number of favourable outcomes / total outcomes
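The counting definition above can be checked directly by enumerating the sample space; a minimal Python sketch (the two-dice event is an illustrative example, not from the text above):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 7.
A = [(i, j) for (i, j) in omega if i + j == 7]

# Classical probability: |A| / |Omega|.
p = Fraction(len(A), len(omega))
print(p)  # 1/6
```

Using `Fraction` keeps the answer exact, which matches how GATE answers are usually expressed.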

2. Conditional Probability & Bayes’ Theorem

Conditional Probability

P(A | B) = P(A ∩ B) / P(B)  (defined when P(B) > 0)

Multiplication rule: P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A)

Chain rule (n events): P(A₁∩A₂∩…∩Aₙ) = P(A₁)×P(A₂|A₁)×P(A₃|A₁A₂)×…×P(Aₙ|A₁…Aₙ₋₁)

Law of Total Probability

If {B₁, B₂, …, Bₙ} is a partition of Ω (mutually exclusive, exhaustive):

P(A) = Σ P(A | Bᵢ) × P(Bᵢ)

Bayes’ Theorem

P(Bᵢ | A) = P(A | Bᵢ) × P(Bᵢ) / Σⱼ P(A | Bⱼ) × P(Bⱼ)

P(Bᵢ) = prior probability; P(Bᵢ|A) = posterior probability; P(A|Bᵢ) = likelihood
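Bayes' theorem and the law of total probability combine into one small routine; a sketch, where the coin scenario at the end is a hypothetical example chosen for illustration:

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a partition {B_1, ..., B_n}.

    priors[i]      = P(B_i)       (prior)
    likelihoods[i] = P(A | B_i)   (likelihood)
    Returns the posteriors P(B_i | A).
    """
    # Law of total probability: P(A) = sum_j P(A|B_j) P(B_j)
    total = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / total for l, p in zip(likelihoods, priors)]

# Hypothetical example: fair coin vs double-headed coin, equal priors,
# evidence A = "the coin shows heads".
print(posterior([0.5, 0.5], [0.5, 1.0]))  # [0.333..., 0.666...]
```

Observing heads shifts the probability of the double-headed coin from 1/2 to 2/3.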

3. Independence

Independence of Events

A and B are independent iff: P(A ∩ B) = P(A) × P(B)

Equivalently: P(A|B) = P(A) (knowing B doesn’t change probability of A)

Pairwise vs Mutual Independence (n events):

Pairwise: P(Aᵢ∩Aⱼ)=P(Aᵢ)P(Aⱼ) for all i≠j (weaker)

Mutually independent: P(⋂ᵢ∈S Aᵢ) = ∏ᵢ∈S P(Aᵢ) for EVERY subset S of the indices (stronger — this is what is required for full independence)

Mutually exclusive (P(A∩B)=0) ≠ Independent — if both have positive probability, exclusive means knowing B occurred guarantees A didn’t → they ARE dependent.
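Both claims can be verified by exhaustive counting on two dice; a sketch (the specific events are illustrative choices):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice

def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 3   # first die shows 3
B = lambda w: w[1] == 5   # second die shows 5
C = lambda w: w[0] == 5   # first die shows 5 (mutually exclusive with A)

# Independent: P(A ∩ B) equals P(A)P(B)
print(prob(lambda w: A(w) and B(w)) == prob(A) * prob(B))  # True

# Mutually exclusive but NOT independent: P(A ∩ C) = 0 while P(A)P(C) > 0
print(prob(lambda w: A(w) and C(w)), prob(A) * prob(C))    # 0 1/36
```

The second line is exactly the "mutually exclusive ≠ independent" point: the product P(A)P(C) = 1/36 is positive, so the intersection probability 0 cannot equal it.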

4. Random Variables, PMF & CDF

A random variable X is a function X: Ω → ℝ that assigns a real number to each outcome.

|              | Discrete RV                    | Continuous RV                          |
|--------------|--------------------------------|----------------------------------------|
| Described by | PMF: P(X=x) = p(x)             | PDF: f(x), with P(a≤X≤b) = ∫ₐᵇ f(x)dx  |
| CDF          | F(x) = P(X≤x) = Σ p(k) for k≤x | F(x) = ∫₋∞ˣ f(t)dt                     |
| Constraint   | Σ p(x) = 1                     | ∫₋∞^∞ f(x)dx = 1                       |

5. Expectation & Variance

Expectation (Mean)

E[X] = Σ x × P(X=x) (discrete)

E[g(X)] = Σ g(x) × P(X=x) (law of unconscious statistician)

Linearity: E[aX+b] = aE[X]+b    E[X+Y] = E[X]+E[Y] (always)

If X,Y independent: E[XY] = E[X]×E[Y]

Variance & Standard Deviation

Var(X) = E[X²] − (E[X])²   (computational formula)

Var(X) = E[(X−μ)²] where μ = E[X]  (definition)

Var(aX+b) = a²Var(X)

If X,Y independent: Var(X+Y) = Var(X)+Var(Y)

SD(X) = √Var(X)
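The two formulas above translate directly into a small helper; a sketch (`mean_var` is an illustrative name, and the fair-die check is an example not taken from the text):

```python
def mean_var(pmf):
    """E[X] and Var(X) from a discrete PMF given as {x: P(X=x)}."""
    ex = sum(x * p for x, p in pmf.items())        # E[X]
    ex2 = sum(x * x * p for x, p in pmf.items())   # E[X^2]
    return ex, ex2 - ex ** 2                       # Var = E[X^2] - (E[X])^2

# Fair die: E[X] = 3.5, Var(X) = 35/12 ≈ 2.9167
m, v = mean_var({x: 1/6 for x in range(1, 7)})
print(m, v)
```

Note the code computes E[X²] separately rather than squaring E[X] — the exact pitfall called out in Mistake 3 below.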

6. Standard Distributions

| Distribution   | PMF / PDF                        | E[X]    | Var(X)    | Use                                      |
|----------------|----------------------------------|---------|-----------|------------------------------------------|
| Bernoulli(p)   | P(X=1)=p, P(X=0)=1−p             | p       | p(1−p)    | Single trial success/failure             |
| Binomial(n,p)  | C(n,k)p^k(1−p)^(n−k)             | np      | np(1−p)   | k successes in n trials                  |
| Geometric(p)   | (1−p)^(k−1)p, k=1,2,…            | 1/p     | (1−p)/p²  | Trials until first success               |
| Poisson(λ)     | e^(−λ)λ^k/k!, k=0,1,…            | λ       | λ         | Events in fixed interval                 |
| Uniform(a,b)   | 1/(b−a) for x∈[a,b]              | (a+b)/2 | (b−a)²/12 | Equal likelihood over range              |
| Normal(μ,σ²)   | (1/(σ√(2π)))e^(−(x−μ)²/(2σ²))    | μ       | σ²        | Central Limit Theorem, natural phenomena |
| Exponential(λ) | λe^(−λx), x≥0                    | 1/λ     | 1/λ²      | Time between Poisson events              |

Important Relationships

Binomial → Poisson: Bin(n,p) → Poisson(λ=np) as n→∞, p→0 with np fixed

Geometric memoryless property: P(X>m+n | X>m) = P(X>n) — past failures don’t affect future

Exponential memoryless property: P(X>s+t | X>s) = P(X>t)

Central Limit Theorem: for large n, the (suitably standardized) sum of n independent, identically distributed RVs with finite variance is approximately Normal, regardless of the individual distribution
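The Binomial→Poisson limit is easy to see numerically; a sketch that fixes λ = np = 2 and lets n grow:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    """P(X=k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    """P(X=k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Bin(n, 2/n) approaches Poisson(2) as n grows with np = 2 fixed.
for n in (10, 100, 1000):
    print(n, binom_pmf(n, 2 / n, 3), poisson_pmf(2, 3))
```

By n = 1000 the two values agree to about three decimal places, which is why the Poisson is used as a cheap approximation for rare-event Binomials.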

7. Worked Examples (GATE CS Level)

Example 1 — Conditional Probability & Bayes

Problem: A factory has 3 machines producing bolts. Machine A produces 50% of bolts with 2% defective; Machine B produces 30% with 3% defective; Machine C produces 20% with 5% defective. A randomly selected bolt is defective. What is the probability it came from Machine B?

Let D = defective, A, B, C = machine.

P(A)=0.5, P(B)=0.3, P(C)=0.2; P(D|A)=0.02, P(D|B)=0.03, P(D|C)=0.05

Total probability of defective:

P(D) = P(D|A)P(A)+P(D|B)P(B)+P(D|C)P(C) = 0.02×0.5+0.03×0.3+0.05×0.2

= 0.010 + 0.009 + 0.010 = 0.029

P(B|D) = P(D|B)×P(B) / P(D) = 0.009 / 0.029 ≈ 0.310 (31.0%)
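The arithmetic above can be verified in a few lines of Python:

```python
# Verify Example 1 numerically.
priors = {'A': 0.5, 'B': 0.3, 'C': 0.2}     # P(machine)
defect = {'A': 0.02, 'B': 0.03, 'C': 0.05}  # P(D | machine)

p_d = sum(defect[m] * priors[m] for m in priors)  # law of total probability
p_b_given_d = defect['B'] * priors['B'] / p_d     # Bayes' theorem
print(round(p_d, 3), round(p_b_given_d, 3))  # 0.029 0.31
```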

Example 2 — Random Variables & Expectation

Problem: A fair die is rolled. Let X = number shown if it’s odd, and X = 0 if it’s even. (a) Find the PMF of X. (b) Find E[X] and Var(X).

Outcomes: 1,2,3,4,5,6 equally likely (P=1/6 each).

X=0 if outcome ∈ {2,4,6} (prob 3/6=1/2); X=1 if outcome=1 (prob 1/6); X=3 if outcome=3 (prob 1/6); X=5 if outcome=5 (prob 1/6)

PMF:

| x      | 0   | 1   | 3   | 5   |
|--------|-----|-----|-----|-----|
| P(X=x) | 1/2 | 1/6 | 1/6 | 1/6 |

E[X]:

E[X] = 0×(1/2) + 1×(1/6) + 3×(1/6) + 5×(1/6) = (0+1+3+5)/6 = 9/6 = 3/2 = 1.5

E[X²]: 0²×(1/2) + 1²×(1/6) + 3²×(1/6) + 5²×(1/6) = (0+1+9+25)/6 = 35/6

Var(X) = E[X²]−(E[X])² = 35/6 − (3/2)² = 35/6 − 9/4 = 70/12 − 27/12 = 43/12 ≈ 3.58
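The same numbers fall out of an exact computation with `fractions.Fraction`:

```python
from fractions import Fraction

# PMF of X from Example 2: X = face value if odd, else 0.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 6), 3: Fraction(1, 6), 5: Fraction(1, 6)}

ex = sum(x * p for x, p in pmf.items())          # E[X]
ex2 = sum(x * x * p for x, p in pmf.items())     # E[X^2]
var = ex2 - ex ** 2                              # Var(X)
print(ex, ex2, var)  # 3/2 35/6 43/12
```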

Example 3 — Distributions & Variance

Problem: (a) A biased coin with P(H)=0.3 is flipped 10 times. Find the probability of exactly 3 heads and the expected number of heads. (b) A server receives an average of 4 requests per second. What is the probability of receiving exactly 6 requests in a second? (c) In a geometric distribution with p=0.25, what is E[X] and what is the probability that X ≥ 5?

(a) Binomial: X~Bin(10, 0.3), P(X=3) and E[X]:

P(X=3) = C(10,3)×0.3³×0.7⁷ = 120×0.027×0.0823543 ≈ 120×0.002224 ≈ 0.2668 (26.7%)

E[X] = np = 10×0.3 = 3    Var(X) = np(1−p) = 10×0.3×0.7 = 2.1

(b) Poisson: X~Poisson(λ=4), P(X=6):

P(X=6) = e⁻⁴×4⁶/6! = e⁻⁴×4096/720 ≈ 0.01832×5.689 ≈ 0.1042 (10.4%)

E[X] = Var(X) = λ = 4

(c) Geometric: X~Geometric(p=0.25), E[X] and P(X≥5):

E[X] = 1/p = 1/0.25 = 4  (expected 4 trials until first success)

P(X≥5) = P(first 4 trials all fail) = (1−p)⁴ = (0.75)⁴ = 0.3164 ≈ 31.6%

(Tail probability: P(X≥k) = (1−p)^(k−1) = q^(k−1), since X≥k exactly when the first k−1 trials all fail.)
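All three parts can be checked with the standard library:

```python
from math import comb, exp, factorial

# (a) Binomial: P(X=3) for X ~ Bin(10, 0.3)
p_a = comb(10, 3) * 0.3**3 * 0.7**7
# (b) Poisson: P(X=6) for X ~ Poisson(4)
p_b = exp(-4) * 4**6 / factorial(6)
# (c) Geometric: P(X >= 5) = probability the first 4 trials all fail
p_c = 0.75**4
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))  # 0.2668 0.1042 0.3164
```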

8. 5 Common Mistakes in Probability

Mistake 1 — Confusing Mutually Exclusive with Independent

What happens: Students say “since A and B are mutually exclusive, they are independent.”

Root cause: If A and B are mutually exclusive (P(A∩B)=0) and both have positive probability, then P(A∩B)=0 ≠ P(A)×P(B)>0 — so they are actually DEPENDENT.

Correct approach: Mutually exclusive means “can’t happen together.” Independent means “one doesn’t affect the other.” These are opposite concepts when both events are non-trivial.

Mistake 2 — Forgetting to Condition on the Right Event in Bayes

What happens: Students compute P(A|B) but confuse it with P(B|A), leading to the “prosecutor’s fallacy.”

Root cause: P(A|B) and P(B|A) are generally different. P(disease|positive test) ≠ P(positive test|disease).

Correct approach: Always explicitly identify the hypothesis and the evidence. Prior × Likelihood / Total probability = Posterior.

Mistake 3 — Using E[X²] = (E[X])²

What happens: Students write Var(X) = E[X²] − (E[X])² = 0, thinking E[X²]=(E[X])².

Root cause: E[X²] = (E[X])² only when X is a constant. In general, E[X²] ≥ (E[X])² (Jensen’s inequality for convex functions).

Correct approach: Compute E[X²] = Σx²P(X=x) separately, then subtract (E[X])².

Mistake 4 — Applying Binomial When Trials Are Not Independent

What happens: Drawing cards without replacement and using Binomial distribution.

Root cause: Binomial requires n independent trials with constant success probability p. Without replacement changes p at each step — use Hypergeometric distribution instead.

Correct approach: Binomial = with replacement (or infinite population). Hypergeometric = without replacement from finite population.
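The gap between the two models can be made concrete; a sketch using a hypothetical card-drawing setup (3 red cards in a 5-card hand from a standard deck):

```python
from math import comb

# P(exactly k red cards in a 5-card hand from a 52-card deck with 26 red).
n_draw, N, K, k = 5, 52, 26, 3

# Hypergeometric (without replacement) — the correct model here.
hyper = comb(K, k) * comb(N - K, n_draw - k) / comb(N, n_draw)

# Binomial with p = 26/52 — the wrong model, though close for small draws.
binom = comb(n_draw, k) * 0.5**k * 0.5**(n_draw - k)

print(round(hyper, 4), round(binom, 4))  # 0.3251 0.3125
```

The answers differ by about 0.013 here; the discrepancy grows as the hand size becomes a larger fraction of the deck.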

Mistake 5 — Misinterpreting the Geometric Distribution’s Starting Point

What happens: Students compute E[X]=1/p but forget whether X counts trials or failures, leading to off-by-one errors.

Root cause: Two conventions: X = number of trials until first success (X ≥ 1, E[X]=1/p); or X = number of failures before first success (X ≥ 0, E[X]=(1−p)/p). GATE CS typically uses the first convention.

Correct approach: State the convention clearly. If X = trials until first success: P(X=k)=(1−p)^(k−1)p for k=1,2,…; E[X]=1/p. If X = failures before success: E[X]=(1−p)/p.
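The two conventions can be placed side by side numerically; a sketch with p = 0.25 (truncating the infinite sums at a point where the tail is negligible):

```python
# The two geometric conventions, side by side (p = 0.25).
p, q = 0.25, 0.75

# Convention 1: X = number of trials until first success, k = 1, 2, ...
pmf_trials = lambda k: q**(k - 1) * p   # E[X] = 1/p
# Convention 2: Y = number of failures before first success, k = 0, 1, ...
pmf_fails = lambda k: q**k * p          # E[Y] = (1-p)/p

# Truncated expectations (tail beyond k = 1000 is astronomically small).
e_trials = sum(k * pmf_trials(k) for k in range(1, 1000))
e_fails = sum(k * pmf_fails(k) for k in range(0, 1000))
print(round(e_trials, 6), round(e_fails, 6))  # 4.0 3.0
```

The off-by-one is now visible: 1/p = 4 trials versus (1−p)/p = 3 failures.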

9. Frequently Asked Questions

Q1. What is Bayes’ theorem and how is it applied in GATE CS problems?

Bayes’ theorem P(Bᵢ|A) = P(A|Bᵢ)×P(Bᵢ) / ΣP(A|Bⱼ)P(Bⱼ) is the fundamental tool for updating probabilities. In GATE CS, it appears in: (1) Machine learning — Naive Bayes classifier (classify text as spam/not-spam); (2) Reliability — given a system failed, find which component caused it; (3) Database/AI — probabilistic inference in Bayesian networks. The key insight: prior probability P(Bᵢ) gets updated to posterior P(Bᵢ|A) when we observe evidence A. The law of total probability computes the normalizing factor P(A).

Q2. What is the difference between mutually exclusive and independent events?

Mutually exclusive: P(A∩B)=0 — they cannot both occur. Knowing A occurred tells you B definitely did not. Example: rolling a 3 and rolling a 5 on a single die roll. Independent: P(A∩B)=P(A)P(B) — occurrence of one gives no information about the other. Example: rolling a 3 on the first die and 5 on the second. The key difference: mutually exclusive events with positive probability are maximally DEPENDENT (one completely excludes the other). Two events are simultaneously mutually exclusive AND independent only if at least one has probability 0.

Q3. What is the expected value and variance of a Binomial distribution?

For X~Binomial(n,p): E[X]=np and Var(X)=np(1−p). Intuition: n trials each contributing p to the expected count; the variance term (1−p) reflects the spread decreasing as p→0 or p→1 (when outcome is more certain). The Binomial also satisfies the reproductive property: if X~Bin(m,p) and Y~Bin(n,p) are independent, then X+Y~Bin(m+n,p). When n is large and p is small with np=λ fixed, Bin(n,p) ≈ Poisson(λ).

Q4. What is the Poisson distribution and when is it used in CS?

Poisson(λ) models the number of events in a fixed interval given average rate λ: P(X=k)=e^(−λ)λᵏ/k!, with E[X]=Var(X)=λ. In CS, it appears in: (1) Queuing theory — M/M/1 queues model arrivals as a Poisson process; (2) Hashing — expected collisions in a hash table (birthday-problem variant); (3) Network traffic — packet arrivals modelled as Poisson; (4) Randomized algorithms — Poisson approximations in balls-into-bins analyses. The property E[X]=Var(X)=λ is a quick identification test in GATE problems.

Next Steps

Probability connects to machine learning, algorithms, and networks throughout the CS syllabus.
