Relational Algebra and SQL for GATE CS – Operators, Joins and Queries



Relational Algebra and SQL for GATE CS – Operators, Joins and Queries

What You Will Learn

  • Relational algebra: 5 basic + 4 derived operators with examples
  • Join types: theta join, equi-join, natural join, outer joins
  • Relational division: how to express “for all” queries
  • SQL: DDL, DML, DCL, TCL commands
  • SQL joins, aggregate functions, GROUP BY, HAVING, subqueries
  • GATE PYQs with detailed solutions

GATE Weightage: 2–3 marks. SQL and relational algebra together are the highest-yield DBMS topic.

1. Relational Algebra Basics

Relational algebra is a procedural query language used to query relational databases. It forms the theoretical basis for SQL.

Five Basic Operators

OperatorSymbolDescriptionSQL Equivalent
Selectσ_condition(R)Filter rows satisfying conditionWHERE clause
Projectπ_A1,A2(R)Keep only specified columns; removes duplicatesSELECT DISTINCT
UnionR ∪ SAll tuples from R and S (union-compatible)UNION
Set DifferenceR – STuples in R but not in SEXCEPT / MINUS
Cartesian ProductR × SAll combinations of R and S tuplesFROM R, S (no join)
Union Compatibility: R and S must have the same number of attributes, and corresponding attributes must have compatible domains. Attribute names don’t need to match.

Derived Operators

OperatorSymbolDefinition
IntersectionR ∩ SR – (R – S)
Natural JoinR ⋈ SEqui-join on all common attributes, duplicates removed
Theta JoinR ⋈_θ Sσ_θ(R × S)
DivisionR ÷ STuples in R related to ALL tuples in S
Example: Find names of employees in department 5 earning > 50000.
π_Name(σ_(DeptNo=5 ∧ Salary>50000)(EMPLOYEE))

SQL equivalent: SELECT Name FROM EMPLOYEE WHERE DeptNo = 5 AND Salary > 50000;

2. Joins in Relational Algebra

Theta Join (θ-join)

General join with any comparison operator (=, <, >, ≤, ≥, ≠).

R ⋈_(R.A θ S.B) S = σ_(R.A θ S.B)(R × S)

Equi-join

Theta join where condition uses only equality (=). Result has duplicate attribute columns.

Natural Join

Equi-join on ALL common attribute names; duplicate columns removed from result.

EMP(Eid, Name, Did) ⋈ DEPT(Did, DName)
Result attributes: Eid, Name, Did, DName (Did appears once)

Outer Joins

TypeSymbolIncludes
Left Outer JoinR ⟕ SAll R tuples + matching S; NULL for unmatched S
Right Outer JoinR ⟖ SAll S tuples + matching R; NULL for unmatched R
Full Outer JoinR ⟗ SAll tuples from both; NULL where no match
GATE Tip: Natural join result size: if R has r tuples and S has s tuples, natural join result size is between 0 (no matching) and r×s (all match). If join condition never satisfies, result is empty.

3. Division Operator

R(A, B) ÷ S(B) returns all A values such that for every B value in S, (A, B) exists in R. Expresses “for all” queries.

Example:
ENROLL(Sid, Cid): which students are enrolled in which courses.
COURSES(Cid): {C1, C2}

ENROLL ÷ COURSES = Students enrolled in BOTH C1 and C2

SQL equivalent (double negation):
SELECT DISTINCT Sid FROM ENROLL E1 WHERE NOT EXISTS (
  SELECT Cid FROM COURSES C WHERE NOT EXISTS (
    SELECT * FROM ENROLL E2 WHERE E2.Sid = E1.Sid AND E2.Cid = C.Cid
  )
);

Formal definition:
R ÷ S = π_A(R) – π_A((π_A(R) × S) – R)
where A = attributes in R but not in S

4. SQL Overview

SQL Command Categories

CategoryCommandsDescription
DDL (Data Definition)CREATE, ALTER, DROP, TRUNCATEDefine schema structure
DML (Data Manipulation)SELECT, INSERT, UPDATE, DELETEQuery and modify data
DCL (Data Control)GRANT, REVOKEAccess permissions
TCL (Transaction Control)COMMIT, ROLLBACK, SAVEPOINTTransaction management

SQL Query Structure & Execution Order

Written order: SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY

Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

This is why you CANNOT use SELECT aliases in WHERE (alias not yet defined), but CAN use them in ORDER BY.

Key SQL Clauses

  • DISTINCT: Removes duplicate rows from result
  • ALL: Default; keeps duplicates (opposite of DISTINCT)
  • UNION vs UNION ALL: UNION removes duplicates; UNION ALL keeps them
  • NULL handling: Any comparison with NULL returns UNKNOWN (not TRUE or FALSE). Use IS NULL / IS NOT NULL.
  • LIKE: Pattern matching; % = any sequence, _ = single character

5. SQL Joins

Tables:
EMPLOYEE(Eid, Name, Did, Salary)
DEPARTMENT(Did, DName, Location)

INNER JOIN:
SELECT E.Name, D.DName FROM EMPLOYEE E INNER JOIN DEPARTMENT D ON E.Did = D.Did;
— Returns only employees who have a matching department

LEFT OUTER JOIN:
SELECT E.Name, D.DName FROM EMPLOYEE E LEFT JOIN DEPARTMENT D ON E.Did = D.Did;
— Returns ALL employees; NULL for DName if no matching department

CROSS JOIN (Cartesian Product):
SELECT * FROM EMPLOYEE CROSS JOIN DEPARTMENT;
— Returns all combinations: |EMPLOYEE| × |DEPARTMENT| rows

Row count after joins:
INNER JOIN: ≤ min(|R|, |S|) … up to |R| × |S|
LEFT JOIN: exactly |R| rows (at minimum)
FULL JOIN: ≥ max(|R|, |S|) rows
CROSS JOIN: exactly |R| × |S| rows

6. Aggregate Functions & GROUP BY

Aggregate Functions

FunctionDescriptionNULL handling
COUNT(*)Count all rows including NULLsIncludes NULLs
COUNT(A)Count non-NULL values of AIgnores NULLs
SUM(A)Sum of non-NULL valuesIgnores NULLs
AVG(A)Average of non-NULL valuesIgnores NULLs
MAX(A)Maximum non-NULL valueIgnores NULLs
MIN(A)Minimum non-NULL valueIgnores NULLs
Find departments with average salary above 60000:
SELECT Did, AVG(Salary) AS AvgSal
FROM EMPLOYEE
GROUP BY Did
HAVING AVG(Salary) > 60000;

Find total salary per department, only for department with more than 3 employees:
SELECT Did, SUM(Salary)
FROM EMPLOYEE
GROUP BY Did
HAVING COUNT(*) > 3;

GATE Rule: Every non-aggregate column in SELECT must appear in GROUP BY. Violating this is a SQL error.

7. Subqueries

Correlated vs Non-Correlated Subqueries

  • Non-correlated: Inner query executes once and result is used by outer query
  • Correlated: Inner query executes once for EACH row of outer query; references outer query’s columns

Subquery Operators

OperatorUsageReturns
INval IN (subquery)TRUE if val in the subquery result set
NOT INval NOT IN (subquery)TRUE if val not in subquery result; FALSE if any NULL in subquery!
EXISTSEXISTS (subquery)TRUE if subquery returns any rows
NOT EXISTSNOT EXISTS (subquery)TRUE if subquery returns no rows
ALLval > ALL (subquery)TRUE if val compares true with ALL results
ANY / SOMEval > ANY (subquery)TRUE if val compares true with at least one result
Find employees who earn more than ALL employees in department 5:
SELECT Name FROM EMPLOYEE WHERE Salary > ALL (SELECT Salary FROM EMPLOYEE WHERE Did = 5);

Find employees who earn more than SOME employee in department 5:
SELECT Name FROM EMPLOYEE WHERE Salary > ANY (SELECT Salary FROM EMPLOYEE WHERE Did = 5);

NOT IN trap: If the subquery returns any NULL values, NOT IN evaluates to UNKNOWN for ALL rows, returning an empty result. Use NOT EXISTS instead when NULLs are possible.

8. GATE Examples

GATE 2014: Relation R(A, B, C) and S(B, D). How many tuples can the natural join R ⋈ S have if |R| = 5 and |S| = 3?

View Solution

Natural join joins on common attribute B. Maximum tuples = when every B value in R matches every B value in S = 5 × 3 = 15.

Minimum = 0 (no common B values).

Answer: 0 to 15 tuples

GATE 2016: What does the following SQL query return?
SELECT A FROM R WHERE A NOT IN (SELECT A FROM S);
If S contains a NULL value for A, what happens?

View Solution

When S.A contains NULL, NOT IN evaluates as: A NOT IN {v1, v2, …, NULL}.

The comparison A ≠ NULL evaluates to UNKNOWN, so the entire NOT IN evaluates to UNKNOWN for every row.

UNKNOWN rows are excluded from WHERE clause results.

Result: Empty set (no rows returned)

GATE 2018: Consider EMPLOYEE(Eid, Salary, Did). Write a query to find the department with the maximum average salary.

View Solution

SELECT Did FROM EMPLOYEE GROUP BY Did HAVING AVG(Salary) >= ALL (SELECT AVG(Salary) FROM EMPLOYEE GROUP BY Did);

Or: SELECT Did FROM EMPLOYEE GROUP BY Did ORDER BY AVG(Salary) DESC LIMIT 1;

Key: GROUP BY Did, then filter using HAVING with ALL subquery or ORDER + LIMIT.

9. Common Mistakes

  • SELECT alias in WHERE: Cannot use SELECT column aliases in WHERE because WHERE executes before SELECT. Use the original expression in WHERE, or wrap in a subquery.
  • COUNT(*) vs COUNT(A): COUNT(*) counts all rows including NULLs. COUNT(A) counts only non-NULL values of attribute A. They give different results when A has NULLs.
  • HAVING vs WHERE: WHERE filters rows before grouping; HAVING filters groups after. Aggregate functions cannot appear in WHERE.
  • Natural join vs equi-join: Natural join removes duplicate columns; equi-join keeps both copies. If two relations share multiple common attributes, natural join joins on ALL of them (may be unintended).
  • NOT IN with NULLs: If the subquery result in NOT IN contains even one NULL, the entire NOT IN always evaluates to UNKNOWN, returning an empty result. Use NOT EXISTS for NULL-safe queries.

10. Frequently Asked Questions

What are the basic relational algebra operators?

The five basic operators are: Select (σ) for row filtering, Project (π) for column selection with deduplication, Union (∪) for combining compatible relations, Set Difference (–) for subtraction, and Cartesian Product (×) for all-combinations. Derived operators include Natural Join (⋈), Intersection (∩), Division (÷), and Rename (ρ). All SQL queries can be expressed using these operators.

What is the difference between INNER JOIN and OUTER JOIN in SQL?

INNER JOIN returns only rows with a match in both tables. LEFT OUTER JOIN returns all left-table rows; NULL fills unmatched right-table columns. RIGHT OUTER JOIN returns all right-table rows. FULL OUTER JOIN returns all rows from both tables with NULLs where no match. Natural join is an INNER JOIN on all common attribute names that removes duplicate columns from the result.

What is the difference between WHERE and HAVING clause in SQL?

WHERE filters individual rows before grouping — it cannot reference aggregate functions. HAVING filters entire groups after GROUP BY and can use aggregates. Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Use WHERE to reduce data before grouping (more efficient); use HAVING only when the filter requires an aggregate result.

What is relational division in relational algebra?

R(A, B) ÷ S(B) returns all A values related to EVERY B value in S. It answers “for all” queries: “find students enrolled in all courses.” Formal: π_A(R) – π_A((π_A(R) × S) – R). In SQL, expressed using double NOT EXISTS: “there is no course in S for which the student is NOT enrolled in R.”

Leave a Comment