Relational Algebra and SQL for GATE CS – Operators, Joins and Queries
What You Will Learn
- Relational algebra: 5 basic + 4 derived operators with examples
- Join types: theta join, equi-join, natural join, outer joins
- Relational division: how to express “for all” queries
- SQL: DDL, DML, DCL, TCL commands
- SQL joins, aggregate functions, GROUP BY, HAVING, subqueries
- GATE PYQs with detailed solutions
GATE Weightage: 2–3 marks. SQL and relational algebra together are the highest-yield DBMS topic.
1. Relational Algebra Basics
Relational algebra is a procedural query language used to query relational databases. It forms the theoretical basis for SQL.
Five Basic Operators
| Operator | Symbol | Description | SQL Equivalent |
|---|---|---|---|
| Select | σ_condition(R) | Filter rows satisfying condition | WHERE clause |
| Project | π_A1,A2(R) | Keep only specified columns; removes duplicates | SELECT DISTINCT |
| Union | R ∪ S | All tuples from R and S (union-compatible) | UNION |
| Set Difference | R – S | Tuples in R but not in S | EXCEPT / MINUS |
| Cartesian Product | R × S | All combinations of R and S tuples | FROM R, S (no join) |
Derived Operators
| Operator | Symbol | Definition |
|---|---|---|
| Intersection | R ∩ S | R – (R – S) |
| Natural Join | R ⋈ S | Equi-join on all common attributes, duplicates removed |
| Theta Join | R ⋈_θ S | σ_θ(R × S) |
| Division | R ÷ S | Tuples in R related to ALL tuples in S |
π_Name(σ_(DeptNo=5 ∧ Salary>50000)(EMPLOYEE))
SQL equivalent: SELECT Name FROM EMPLOYEE WHERE DeptNo = 5 AND Salary > 50000;
2. Joins in Relational Algebra
Theta Join (θ-join)
General join with any comparison operator (=, <, >, ≤, ≥, ≠).
Equi-join
Theta join where condition uses only equality (=). Result has duplicate attribute columns.
Natural Join
Equi-join on ALL common attribute names; duplicate columns removed from result.
Result attributes: Eid, Name, Did, DName (Did appears once)
Outer Joins
| Type | Symbol | Includes |
|---|---|---|
| Left Outer Join | R ⟕ S | All R tuples + matching S; NULL for unmatched S |
| Right Outer Join | R ⟖ S | All S tuples + matching R; NULL for unmatched R |
| Full Outer Join | R ⟗ S | All tuples from both; NULL where no match |
3. Division Operator
R(A, B) ÷ S(B) returns all A values such that for every B value in S, (A, B) exists in R. Expresses “for all” queries.
ENROLL(Sid, Cid): which students are enrolled in which courses.
COURSES(Cid): {C1, C2}
ENROLL ÷ COURSES = Students enrolled in BOTH C1 and C2
SQL equivalent (double negation):
SELECT DISTINCT Sid FROM ENROLL E1 WHERE NOT EXISTS (
SELECT Cid FROM COURSES C WHERE NOT EXISTS (
SELECT * FROM ENROLL E2 WHERE E2.Sid = E1.Sid AND E2.Cid = C.Cid
)
);
R ÷ S = π_A(R) – π_A((π_A(R) × S) – R)
where A = attributes in R but not in S
4. SQL Overview
SQL Command Categories
| Category | Commands | Description |
|---|---|---|
| DDL (Data Definition) | CREATE, ALTER, DROP, TRUNCATE | Define schema structure |
| DML (Data Manipulation) | SELECT, INSERT, UPDATE, DELETE | Query and modify data |
| DCL (Data Control) | GRANT, REVOKE | Access permissions |
| TCL (Transaction Control) | COMMIT, ROLLBACK, SAVEPOINT | Transaction management |
SQL Query Structure & Execution Order
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
This is why you CANNOT use SELECT aliases in WHERE (alias not yet defined), but CAN use them in ORDER BY.
Key SQL Clauses
- DISTINCT: Removes duplicate rows from result
- ALL: Default; keeps duplicates (opposite of DISTINCT)
- UNION vs UNION ALL: UNION removes duplicates; UNION ALL keeps them
- NULL handling: Any comparison with NULL returns UNKNOWN (not TRUE or FALSE). Use IS NULL / IS NOT NULL.
- LIKE: Pattern matching; % = any sequence, _ = single character
5. SQL Joins
EMPLOYEE(Eid, Name, Did, Salary)
DEPARTMENT(Did, DName, Location)
INNER JOIN:
SELECT E.Name, D.DName FROM EMPLOYEE E INNER JOIN DEPARTMENT D ON E.Did = D.Did;
— Returns only employees who have a matching department
LEFT OUTER JOIN:
SELECT E.Name, D.DName FROM EMPLOYEE E LEFT JOIN DEPARTMENT D ON E.Did = D.Did;
— Returns ALL employees; NULL for DName if no matching department
CROSS JOIN (Cartesian Product):
SELECT * FROM EMPLOYEE CROSS JOIN DEPARTMENT;
— Returns all combinations: |EMPLOYEE| × |DEPARTMENT| rows
INNER JOIN: ≤ min(|R|, |S|) … up to |R| × |S|
LEFT JOIN: exactly |R| rows (at minimum)
FULL JOIN: ≥ max(|R|, |S|) rows
CROSS JOIN: exactly |R| × |S| rows
6. Aggregate Functions & GROUP BY
Aggregate Functions
| Function | Description | NULL handling |
|---|---|---|
| COUNT(*) | Count all rows including NULLs | Includes NULLs |
| COUNT(A) | Count non-NULL values of A | Ignores NULLs |
| SUM(A) | Sum of non-NULL values | Ignores NULLs |
| AVG(A) | Average of non-NULL values | Ignores NULLs |
| MAX(A) | Maximum non-NULL value | Ignores NULLs |
| MIN(A) | Minimum non-NULL value | Ignores NULLs |
SELECT Did, AVG(Salary) AS AvgSal
FROM EMPLOYEE
GROUP BY Did
HAVING AVG(Salary) > 60000;
Find total salary per department, only for department with more than 3 employees:
SELECT Did, SUM(Salary)
FROM EMPLOYEE
GROUP BY Did
HAVING COUNT(*) > 3;
7. Subqueries
Correlated vs Non-Correlated Subqueries
- Non-correlated: Inner query executes once and result is used by outer query
- Correlated: Inner query executes once for EACH row of outer query; references outer query’s columns
Subquery Operators
| Operator | Usage | Returns |
|---|---|---|
| IN | val IN (subquery) | TRUE if val in the subquery result set |
| NOT IN | val NOT IN (subquery) | TRUE if val not in subquery result; FALSE if any NULL in subquery! |
| EXISTS | EXISTS (subquery) | TRUE if subquery returns any rows |
| NOT EXISTS | NOT EXISTS (subquery) | TRUE if subquery returns no rows |
| ALL | val > ALL (subquery) | TRUE if val compares true with ALL results |
| ANY / SOME | val > ANY (subquery) | TRUE if val compares true with at least one result |
SELECT Name FROM EMPLOYEE WHERE Salary > ALL (SELECT Salary FROM EMPLOYEE WHERE Did = 5);
Find employees who earn more than SOME employee in department 5:
SELECT Name FROM EMPLOYEE WHERE Salary > ANY (SELECT Salary FROM EMPLOYEE WHERE Did = 5);
8. GATE Examples
GATE 2014: Relation R(A, B, C) and S(B, D). How many tuples can the natural join R ⋈ S have if |R| = 5 and |S| = 3?
View Solution
Natural join joins on common attribute B. Maximum tuples = when every B value in R matches every B value in S = 5 × 3 = 15.
Minimum = 0 (no common B values).
Answer: 0 to 15 tuples
GATE 2016: What does the following SQL query return?
SELECT A FROM R WHERE A NOT IN (SELECT A FROM S);
If S contains a NULL value for A, what happens?
View Solution
When S.A contains NULL, NOT IN evaluates as: A NOT IN {v1, v2, …, NULL}.
The comparison A ≠ NULL evaluates to UNKNOWN, so the entire NOT IN evaluates to UNKNOWN for every row.
UNKNOWN rows are excluded from WHERE clause results.
Result: Empty set (no rows returned)
GATE 2018: Consider EMPLOYEE(Eid, Salary, Did). Write a query to find the department with the maximum average salary.
View Solution
SELECT Did FROM EMPLOYEE GROUP BY Did HAVING AVG(Salary) >= ALL (SELECT AVG(Salary) FROM EMPLOYEE GROUP BY Did);
Or: SELECT Did FROM EMPLOYEE GROUP BY Did ORDER BY AVG(Salary) DESC LIMIT 1;
Key: GROUP BY Did, then filter using HAVING with ALL subquery or ORDER + LIMIT.
9. Common Mistakes
- SELECT alias in WHERE: Cannot use SELECT column aliases in WHERE because WHERE executes before SELECT. Use the original expression in WHERE, or wrap in a subquery.
- COUNT(*) vs COUNT(A): COUNT(*) counts all rows including NULLs. COUNT(A) counts only non-NULL values of attribute A. They give different results when A has NULLs.
- HAVING vs WHERE: WHERE filters rows before grouping; HAVING filters groups after. Aggregate functions cannot appear in WHERE.
- Natural join vs equi-join: Natural join removes duplicate columns; equi-join keeps both copies. If two relations share multiple common attributes, natural join joins on ALL of them (may be unintended).
- NOT IN with NULLs: If the subquery result in NOT IN contains even one NULL, the entire NOT IN always evaluates to UNKNOWN, returning an empty result. Use NOT EXISTS for NULL-safe queries.
10. Frequently Asked Questions
What are the basic relational algebra operators?
The five basic operators are: Select (σ) for row filtering, Project (π) for column selection with deduplication, Union (∪) for combining compatible relations, Set Difference (–) for subtraction, and Cartesian Product (×) for all-combinations. Derived operators include Natural Join (⋈), Intersection (∩), Division (÷), and Rename (ρ). All SQL queries can be expressed using these operators.
What is the difference between INNER JOIN and OUTER JOIN in SQL?
INNER JOIN returns only rows with a match in both tables. LEFT OUTER JOIN returns all left-table rows; NULL fills unmatched right-table columns. RIGHT OUTER JOIN returns all right-table rows. FULL OUTER JOIN returns all rows from both tables with NULLs where no match. Natural join is an INNER JOIN on all common attribute names that removes duplicate columns from the result.
What is the difference between WHERE and HAVING clause in SQL?
WHERE filters individual rows before grouping — it cannot reference aggregate functions. HAVING filters entire groups after GROUP BY and can use aggregates. Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Use WHERE to reduce data before grouping (more efficient); use HAVING only when the filter requires an aggregate result.
What is relational division in relational algebra?
R(A, B) ÷ S(B) returns all A values related to EVERY B value in S. It answers “for all” queries: “find students enrolled in all courses.” Formal: π_A(R) – π_A((π_A(R) × S) – R). In SQL, expressed using double NOT EXISTS: “there is no course in S for which the student is NOT enrolled in R.”