Indexing and B+ Trees in DBMS – Dense, Sparse Index and B+ Tree Operations
What You Will Learn
- Index types: primary, secondary, clustering, dense, sparse
- Index structure: search key, pointer, index entry
- B tree: structure, properties, and operations
- B+ tree: structure, order calculation, insertion and deletion
- Multilevel indexing and comparison
- GATE PYQs with complete step-by-step solutions
GATE Weightage: 1–2 marks. B+ tree order/height calculations and index type selection are most common question types.
1. Index Types
An index is an auxiliary data structure that speeds up retrieval of records based on an index attribute (search key).
| Index Type | Based On | Description |
|---|---|---|
| Primary Index | Primary key; data file ordered on this key | Sparse index on ordered key field; one entry per block |
| Clustering Index | Non-key field; data file ordered on this field | One entry per distinct value; data physically ordered |
| Secondary Index | Non-ordering field (any field) | Dense index; data NOT physically ordered by this key |
| Dense Index | One entry per record | Fast lookup; more space; can be on unsorted data |
| Sparse Index | One entry per block | Less space; requires data to be sorted on key; extra scan within block |
– Only ONE clustered (primary) index per table
– Multiple secondary (non-clustered) indexes allowed
– Sparse index only valid when data is ordered on the index key
– Secondary indexes are always dense (data is not ordered, so must index every record)
2. Dense vs Sparse Index
| Property | Dense Index | Sparse Index |
|---|---|---|
| Entries | One per record | One per block (or one per distinct key value) |
| Data ordering | Not required | Required (data must be sorted on key) |
| Space | More (one entry per record) | Less (one entry per block) |
| Search speed | Direct — find block then record | May need scan within block |
| Use case | Secondary index, small tables | Primary/clustered index, large sorted tables |
Dense index: 10,000 entries
Sparse index: 1,000 entries (one per block)
If index block holds 100 entries:
Dense index blocks: 10,000 / 100 = 100 blocks
Sparse index blocks: 1,000 / 100 = 10 blocks
3. Multilevel Index
When an index itself becomes too large to fit in memory, a multi-level index is built: an index on the index.
If blocking factor of index = bfr_i (entries per index block):
Level 1 index blocks = ⌈N / bfr_i⌉ (N = number of data blocks for sparse, records for dense)
Level 2 blocks = ⌈Level1 blocks / bfr_i⌉
Continue until 1 block remains
Levels = ⌈log_bfr_i(N)⌉
Disk accesses to find a record:
= Number of index levels + 1 data block access
Level 1 (sparse): 100,000 / 100 = 1,000 blocks
Level 2: 1,000 / 100 = 10 blocks
Level 3: 10 / 100 = 1 block (fits in 1 block)
Levels = 3, accesses = 3 + 1 = 4
4. B Tree Structure
A B tree of order m is a balanced search tree where every node has at most m children (m-1 keys). Data pointers exist at all levels (internal + leaf).
B Tree Properties (order m)
- Root: 1 to m-1 keys; 2 to m children (unless root is a leaf)
- Internal nodes: ⌈m/2⌉ – m children; ⌈m/2⌉-1 to m-1 keys
- All leaves at the same depth
- Data pointers in both internal and leaf nodes
Max keys = m^(h+1) – 1
Min keys:
Min keys = 2⌈m/2⌉^h – 1 (except root which can have just 1 key)
5. B+ Tree Structure
A B+ tree stores all data pointers in leaf nodes. Internal nodes store only keys for routing. Leaf nodes are linked as a doubly or singly linked list for efficient range queries.
B+ Tree Properties
| Node Type | Keys | Pointers/Entries |
|---|---|---|
| Internal node (order n) | n-1 keys | n child pointers |
| Leaf node (order n_leaf) | Up to n_leaf keys | Data pointer per key + 1 next-leaf pointer |
| Root | 1 to n-1 keys | 2 to n children (if not leaf) |
| Non-root internal | ⌈n/2⌉-1 to n-1 keys | ⌈n/2⌉ to n children |
| Non-root leaf | ⌈n_leaf/2⌉ to n_leaf keys | Same + next pointer |
B tree vs B+ tree
| Feature | B Tree | B+ Tree |
|---|---|---|
| Data storage | In all nodes (internal + leaf) | Only in leaf nodes |
| Key duplication | Keys appear once | Keys may appear in both internal and leaf |
| Leaf linking | Not linked | Leaves linked as list |
| Range queries | Slow (in-order traversal) | Fast (scan linked leaves) |
| Tree height | Higher (fewer keys in internal nodes) | Lower (more keys fit in internal nodes) |
| Search time | Variable (may find in internal node) | Uniform — always reach leaf |
6. Order Calculation
Internal Node Order (n)
n × P + (n-1) × K ≤ B
where P = pointer size, K = key size
Solve: n ≤ (B + K) / (P + K)
n = floor((B + K) / (P + K))
Leaf Node Order (n_leaf)
where P_data = data pointer size, P_next = next-leaf pointer
n_leaf = floor((B – P_next) / (K + P_data))
Block size = 1024 B, Key = 10 B, Pointer = 6 B
Internal node order:
n × 6 + (n-1) × 10 ≤ 1024
16n – 10 ≤ 1024
16n ≤ 1034
n ≤ 64.6 → n = 64
Leaf node (data pointer = 8 B, next pointer = 6 B):
n_leaf × (10 + 8) + 6 ≤ 1024
18 × n_leaf ≤ 1018
n_leaf ≤ 56.6 → n_leaf = 56
B+ Tree Height
Number of leaf nodes needed = ⌈N / ⌊n_leaf/2⌋⌉ (minimum occupancy)
Tree height h satisfies: ⌈n/2⌉^(h-1) × 2 ≥ N / n_leaf
Simpler: minimum height = ⌈log_{⌈n/2⌉}(N)⌉ + 1
7. B+ Tree Operations
Search
Start at root. At each internal node, use keys to navigate: go to child pointer where key falls in range. Reach leaf, check for key. Time = O(log_n(N)) disk accesses = tree height.
Insertion
- Find the appropriate leaf node
- If leaf has space: insert key in sorted order
- If leaf is full (overflow): split leaf into two; push middle key UP to parent
- If parent is full: split parent; push middle key UP (chain reaction)
- If root splits: create new root; tree height increases by 1
Insert 50 into a full leaf [30, 40]:
Keys sorted: [30, 40, 50]
Split: Leaf1 = [30, 40], Leaf2 = [50]
Push 50 up to parent as separator key
Parent gets pointer to Leaf1 and Leaf2
Deletion
- Find and remove key from leaf
- If leaf still has ≥ ⌈n/2⌉ keys: done
- If underflow: try to borrow from adjacent sibling (rotate via parent key)
- If sibling has exactly minimum: merge leaves; remove separator from parent
- If parent underflows: repeat up the tree
8. GATE Examples
GATE 2014: A B+ tree of order 5 (max 4 keys per node). What is the minimum number of keys in a non-root internal node?
View Solution
Non-root internal node has minimum ⌈n/2⌉ children = ⌈5/2⌉ = 3 children.
Minimum keys = minimum children – 1 = 3 – 1 = 2 keys
GATE 2016: Block size = 512 B. Key = 8 B. Child pointer = 4 B. Record pointer = 4 B. Calculate the order of a B+ tree internal node and the maximum number of keys in a leaf node.
View Solution
Internal node: n × 4 + (n-1) × 8 ≤ 512 → 12n – 8 ≤ 512 → n ≤ 43.3 → n = 43
Max keys in internal node = 43 – 1 = 42
Leaf node (4 B data pointer + 4 B next pointer):
n_leaf × (8 + 4) + 4 ≤ 512 → 12 × n_leaf ≤ 508 → n_leaf ≤ 42.3 → n_leaf = 42
Answer: Internal order = 43 (42 keys); Leaf max = 42 keys
GATE 2019: A file has 16,000 records. Using a B+ tree with leaf order 8, what is the minimum number of leaf nodes?
View Solution
Minimum occupancy of non-root leaf = ⌈8/2⌉ = 4 keys per leaf.
Minimum leaf nodes = ⌈16,000 / 8⌉ = 2,000 (if all leaves are full, maximum occupancy)
Wait — minimum number of leaves means fewest leaves, which is when each leaf is maximally full (8 keys).
Minimum leaf nodes = ⌈16,000 / 8⌉ = 2,000 leaf nodes
9. Common Mistakes
- B tree order vs B+ tree order: In a B tree of order m, each node has at most m children (m-1 keys). In a B+ tree of order n, an internal node has at most n child pointers and n-1 keys; leaf nodes may have a different order. Always check which order applies to which node type.
- Secondary index must be dense: Secondary indexes are on non-ordering fields, so there’s no guarantee records with the same key value are in adjacent blocks. Each record must have its own index entry — secondary index is always dense.
- Split in insertion: When a leaf overflows, the middle key is COPIED up to the parent (in B+ trees, the key stays in the leaf). When an internal node overflows, the middle key is PUSHED UP and does NOT remain in the node. This difference matters for tracing operations.
- Minimum keys in root: Root is a special case. Root has minimum 1 key (2 children) for a non-leaf root, or 1 key for a leaf root. Non-root internal nodes have minimum ⌈n/2⌉ – 1 keys.
- Height calculation starting from 0 vs 1: GATE problems vary. Clarify whether “height” counts levels starting from 0 (root = level 0) or 1 (root = level 1). Read the problem carefully.
10. Frequently Asked Questions
What is the difference between dense and sparse index in DBMS?
Dense index: one entry per record; works on both sorted and unsorted data; requires more space. Sparse index: one entry per block; requires data to be physically sorted on the index key; much smaller. Dense index enables direct lookup to any record. Sparse index requires finding the right block then scanning within it. Primary indexes are sparse; secondary indexes must be dense.
What is the difference between B tree and B+ tree?
B tree stores data in all nodes including internal nodes; B+ tree stores data only in leaf nodes. B+ tree leaves are linked for range scans; B tree leaves are not linked. B+ tree internal nodes hold more keys (no data pointers) → shorter tree → fewer I/Os. B+ trees dominate in DBMS because of efficient range queries and predictable search depth. B trees can find data in internal nodes for point queries but lack range query efficiency.
How do you calculate the order of a B+ tree?
For internal node order n: n × pointer_size + (n-1) × key_size ≤ block_size. Solve for maximum integer n. For leaf node order: n_leaf × (key_size + data_pointer_size) + next_leaf_pointer ≤ block_size. This gives separate orders for internal and leaf nodes. In GATE, substitute the given sizes and solve — typically 40–100 for common block/key/pointer sizes.
What is a clustered index vs non-clustered index?
A clustered index defines the physical order of records on disk — the table is sorted by the clustered index key. Only one clustered index per table. Efficient for range queries and full scans. Non-clustered index creates a separate index structure with pointers to the actual data rows; data is not physically sorted. Multiple non-clustered indexes allowed. Non-clustered index search requires an extra I/O to follow the pointer to the actual record.