Indexing and B+ Trees in DBMS – Dense Sparse Index and B+ Tree Operations



Indexing and B+ Trees in DBMS – Dense, Sparse Index and B+ Tree Operations

What You Will Learn

  • Index types: primary, secondary, clustering, dense, sparse
  • Index structure: search key, pointer, index entry
  • B tree: structure, properties, and operations
  • B+ tree: structure, order calculation, insertion and deletion
  • Multilevel indexing and comparison
  • GATE PYQs with complete step-by-step solutions

GATE Weightage: 1–2 marks. B+ tree order/height calculations and index type selection are most common question types.

1. Index Types

An index is an auxiliary data structure that speeds up retrieval of records based on an index attribute (search key).

Index TypeBased OnDescription
Primary IndexPrimary key; data file ordered on this keySparse index on ordered key field; one entry per block
Clustering IndexNon-key field; data file ordered on this fieldOne entry per distinct value; data physically ordered
Secondary IndexNon-ordering field (any field)Dense index; data NOT physically ordered by this key
Dense IndexOne entry per recordFast lookup; more space; can be on unsorted data
Sparse IndexOne entry per blockLess space; requires data to be sorted on key; extra scan within block
Key rules:
– Only ONE clustered (primary) index per table
– Multiple secondary (non-clustered) indexes allowed
– Sparse index only valid when data is ordered on the index key
– Secondary indexes are always dense (data is not ordered, so must index every record)

2. Dense vs Sparse Index

PropertyDense IndexSparse Index
EntriesOne per recordOne per block (or one per distinct key value)
Data orderingNot requiredRequired (data must be sorted on key)
SpaceMore (one entry per record)Less (one entry per block)
Search speedDirect — find block then recordMay need scan within block
Use caseSecondary index, small tablesPrimary/clustered index, large sorted tables
Example: File with 10,000 records, 10 records per block = 1,000 blocks
Dense index: 10,000 entries
Sparse index: 1,000 entries (one per block)

If index block holds 100 entries:
Dense index blocks: 10,000 / 100 = 100 blocks
Sparse index blocks: 1,000 / 100 = 10 blocks

3. Multilevel Index

When an index itself becomes too large to fit in memory, a multi-level index is built: an index on the index.

Number of index levels:
If blocking factor of index = bfr_i (entries per index block):
Level 1 index blocks = ⌈N / bfr_i⌉ (N = number of data blocks for sparse, records for dense)
Level 2 blocks = ⌈Level1 blocks / bfr_i⌉
Continue until 1 block remains

Levels = ⌈log_bfr_i(N)⌉

Disk accesses to find a record:
= Number of index levels + 1 data block access

Example: 100,000 records, blocking factor = 100 entries/block
Level 1 (sparse): 100,000 / 100 = 1,000 blocks
Level 2: 1,000 / 100 = 10 blocks
Level 3: 10 / 100 = 1 block (fits in 1 block)
Levels = 3, accesses = 3 + 1 = 4

4. B Tree Structure

A B tree of order m is a balanced search tree where every node has at most m children (m-1 keys). Data pointers exist at all levels (internal + leaf).

B Tree Properties (order m)

  • Root: 1 to m-1 keys; 2 to m children (unless root is a leaf)
  • Internal nodes: ⌈m/2⌉ – m children; ⌈m/2⌉-1 to m-1 keys
  • All leaves at the same depth
  • Data pointers in both internal and leaf nodes
Max keys in B tree of order m and height h:
Max keys = m^(h+1) – 1

Min keys:
Min keys = 2⌈m/2⌉^h – 1 (except root which can have just 1 key)

5. B+ Tree Structure

A B+ tree stores all data pointers in leaf nodes. Internal nodes store only keys for routing. Leaf nodes are linked as a doubly or singly linked list for efficient range queries.

B+ Tree Properties

Node TypeKeysPointers/Entries
Internal node (order n)n-1 keysn child pointers
Leaf node (order n_leaf)Up to n_leaf keysData pointer per key + 1 next-leaf pointer
Root1 to n-1 keys2 to n children (if not leaf)
Non-root internal⌈n/2⌉-1 to n-1 keys⌈n/2⌉ to n children
Non-root leaf⌈n_leaf/2⌉ to n_leaf keysSame + next pointer

B tree vs B+ tree

FeatureB TreeB+ Tree
Data storageIn all nodes (internal + leaf)Only in leaf nodes
Key duplicationKeys appear onceKeys may appear in both internal and leaf
Leaf linkingNot linkedLeaves linked as list
Range queriesSlow (in-order traversal)Fast (scan linked leaves)
Tree heightHigher (fewer keys in internal nodes)Lower (more keys fit in internal nodes)
Search timeVariable (may find in internal node)Uniform — always reach leaf

6. Order Calculation

Internal Node Order (n)

Each internal node: n pointers + (n-1) keys ≤ block size B
n × P + (n-1) × K ≤ B
where P = pointer size, K = key size

Solve: n ≤ (B + K) / (P + K)
n = floor((B + K) / (P + K))

Leaf Node Order (n_leaf)

Each leaf: n_leaf × (K + P_data) + P_next ≤ B
where P_data = data pointer size, P_next = next-leaf pointer

n_leaf = floor((B – P_next) / (K + P_data))

GATE-style Example:
Block size = 1024 B, Key = 10 B, Pointer = 6 B

Internal node order:
n × 6 + (n-1) × 10 ≤ 1024
16n – 10 ≤ 1024
16n ≤ 1034
n ≤ 64.6 → n = 64

Leaf node (data pointer = 8 B, next pointer = 6 B):
n_leaf × (10 + 8) + 6 ≤ 1024
18 × n_leaf ≤ 1018
n_leaf ≤ 56.6 → n_leaf = 56

B+ Tree Height

For N data records and leaf order n_leaf:
Number of leaf nodes needed = ⌈N / ⌊n_leaf/2⌋⌉ (minimum occupancy)

Tree height h satisfies: ⌈n/2⌉^(h-1) × 2 ≥ N / n_leaf

Simpler: minimum height = ⌈log_{⌈n/2⌉}(N)⌉ + 1

7. B+ Tree Operations

Search

Start at root. At each internal node, use keys to navigate: go to child pointer where key falls in range. Reach leaf, check for key. Time = O(log_n(N)) disk accesses = tree height.

Insertion

  1. Find the appropriate leaf node
  2. If leaf has space: insert key in sorted order
  3. If leaf is full (overflow): split leaf into two; push middle key UP to parent
  4. If parent is full: split parent; push middle key UP (chain reaction)
  5. If root splits: create new root; tree height increases by 1
Example (order 3 B+ tree, max 2 keys per leaf):
Insert 50 into a full leaf [30, 40]:
Keys sorted: [30, 40, 50]
Split: Leaf1 = [30, 40], Leaf2 = [50]
Push 50 up to parent as separator key
Parent gets pointer to Leaf1 and Leaf2

Deletion

  1. Find and remove key from leaf
  2. If leaf still has ≥ ⌈n/2⌉ keys: done
  3. If underflow: try to borrow from adjacent sibling (rotate via parent key)
  4. If sibling has exactly minimum: merge leaves; remove separator from parent
  5. If parent underflows: repeat up the tree

8. GATE Examples

GATE 2014: A B+ tree of order 5 (max 4 keys per node). What is the minimum number of keys in a non-root internal node?

View Solution

Non-root internal node has minimum ⌈n/2⌉ children = ⌈5/2⌉ = 3 children.

Minimum keys = minimum children – 1 = 3 – 1 = 2 keys

GATE 2016: Block size = 512 B. Key = 8 B. Child pointer = 4 B. Record pointer = 4 B. Calculate the order of a B+ tree internal node and the maximum number of keys in a leaf node.

View Solution

Internal node: n × 4 + (n-1) × 8 ≤ 512 → 12n – 8 ≤ 512 → n ≤ 43.3 → n = 43

Max keys in internal node = 43 – 1 = 42

Leaf node (4 B data pointer + 4 B next pointer):
n_leaf × (8 + 4) + 4 ≤ 512 → 12 × n_leaf ≤ 508 → n_leaf ≤ 42.3 → n_leaf = 42

Answer: Internal order = 43 (42 keys); Leaf max = 42 keys

GATE 2019: A file has 16,000 records. Using a B+ tree with leaf order 8, what is the minimum number of leaf nodes?

View Solution

Minimum occupancy of non-root leaf = ⌈8/2⌉ = 4 keys per leaf.

Minimum leaf nodes = ⌈16,000 / 8⌉ = 2,000 (if all leaves are full, maximum occupancy)

Wait — minimum number of leaves means fewest leaves, which is when each leaf is maximally full (8 keys).

Minimum leaf nodes = ⌈16,000 / 8⌉ = 2,000 leaf nodes

9. Common Mistakes

  • B tree order vs B+ tree order: In a B tree of order m, each node has at most m children (m-1 keys). In a B+ tree of order n, an internal node has at most n child pointers and n-1 keys; leaf nodes may have a different order. Always check which order applies to which node type.
  • Secondary index must be dense: Secondary indexes are on non-ordering fields, so there’s no guarantee records with the same key value are in adjacent blocks. Each record must have its own index entry — secondary index is always dense.
  • Split in insertion: When a leaf overflows, the middle key is COPIED up to the parent (in B+ trees, the key stays in the leaf). When an internal node overflows, the middle key is PUSHED UP and does NOT remain in the node. This difference matters for tracing operations.
  • Minimum keys in root: Root is a special case. Root has minimum 1 key (2 children) for a non-leaf root, or 1 key for a leaf root. Non-root internal nodes have minimum ⌈n/2⌉ – 1 keys.
  • Height calculation starting from 0 vs 1: GATE problems vary. Clarify whether “height” counts levels starting from 0 (root = level 0) or 1 (root = level 1). Read the problem carefully.

10. Frequently Asked Questions

What is the difference between dense and sparse index in DBMS?

Dense index: one entry per record; works on both sorted and unsorted data; requires more space. Sparse index: one entry per block; requires data to be physically sorted on the index key; much smaller. Dense index enables direct lookup to any record. Sparse index requires finding the right block then scanning within it. Primary indexes are sparse; secondary indexes must be dense.

What is the difference between B tree and B+ tree?

B tree stores data in all nodes including internal nodes; B+ tree stores data only in leaf nodes. B+ tree leaves are linked for range scans; B tree leaves are not linked. B+ tree internal nodes hold more keys (no data pointers) → shorter tree → fewer I/Os. B+ trees dominate in DBMS because of efficient range queries and predictable search depth. B trees can find data in internal nodes for point queries but lack range query efficiency.

How do you calculate the order of a B+ tree?

For internal node order n: n × pointer_size + (n-1) × key_size ≤ block_size. Solve for maximum integer n. For leaf node order: n_leaf × (key_size + data_pointer_size) + next_leaf_pointer ≤ block_size. This gives separate orders for internal and leaf nodes. In GATE, substitute the given sizes and solve — typically 40–100 for common block/key/pointer sizes.

What is a clustered index vs non-clustered index?

A clustered index defines the physical order of records on disk — the table is sorted by the clustered index key. Only one clustered index per table. Efficient for range queries and full scans. Non-clustered index creates a separate index structure with pointers to the actual data rows; data is not physically sorted. Multiple non-clustered indexes allowed. Non-clustered index search requires an extra I/O to follow the pointer to the actual record.