Indexing and B+ Trees in DBMS – Dense, Sparse Index and B+ Tree Operations

Q: What is the difference between dense and sparse index in DBMS?

A dense index has one index entry for every search key value (every record) in the data file. A sparse index has index entries only for some records — typically one per block (for ordered/clustered data). Dense index: faster search (direct lookup), more space. Sparse index: less space, slower (may need sequential scan within a block), only applicable when data is ordered on the search key (clustered index). Primary indexes are typically sparse; secondary indexes must be dense.

Q: What is the difference between B tree and B+ tree?

In a B tree, data pointers can exist in both internal nodes and leaf nodes. In a B+ tree, all data pointers are in leaf nodes only; internal nodes store only keys for routing. B+ tree advantages: (1) leaf nodes are linked, enabling efficient range queries; (2) more keys fit in internal nodes (no data pointers), reducing tree height; (3) all searches reach leaf level, giving uniform search time. B trees may be slightly faster for point queries (data may be found in an internal node), but B+ trees dominate in database systems due to range query efficiency.

Q: How do you calculate the order of a B+ tree?

The order (n) of a B+ tree is the maximum number of pointers in an internal node. For a node with block size B bytes, key size k bytes, pointer size p bytes: the condition is n×p + (n-1)×k ≤ B. Solve for maximum integer n. Leaf node order may differ: leaf stores (key, data-pointer) pairs plus one pointer to next leaf, so: n_leaf × (k + p_data) + p_next ≤ B. GATE typically asks to find the minimum number of levels or maximum/minimum number of keys for a given B+ tree order.

Q: What is a clustered index vs non-clustered index?

A clustered (primary) index means the physical order of data records on disk matches the order of the index key. There can be only ONE clustered index per table. A non-clustered (secondary) index has an index order different from physical record order; it stores pointers (record IDs) to the actual data. Clustered index: efficient for range queries, table scans; requires physically sorted data. Non-clustered index: allows indexing on multiple columns; search requires an extra pointer dereference to find the actual record.

What You Will Learn

Index types: primary, secondary, clustering, dense, sparse
Index structure: search key, pointer, index entry
B tree: structure, properties, and operations
B+ tree: structure, order calculation, insertion and deletion
Multilevel indexing and comparison
GATE PYQs with complete step-by-step solutions

GATE Weightage: 1–2 marks. B+ tree order/height calculations and index type selection are most common question types.

1. Index Types

An index is an auxiliary data structure that speeds up retrieval of records based on an index attribute (search key).

Index Type	Based On	Description
Primary Index	Primary key; data file ordered on this key	Sparse index on ordered key field; one entry per block
Clustering Index	Non-key field; data file ordered on this field	One entry per distinct value; data physically ordered
Secondary Index	Non-ordering field (any field)	Dense index; data NOT physically ordered by this key
Dense Index	One entry per record	Fast lookup; more space; can be on unsorted data
Sparse Index	One entry per block	Less space; requires data to be sorted on key; extra scan within block

Key rules:
– Only ONE clustered (primary) index per table
– Multiple secondary (non-clustered) indexes allowed
– Sparse index only valid when data is ordered on the index key
– Secondary indexes are always dense (data is not ordered, so must index every record)

2. Dense vs Sparse Index

Property	Dense Index	Sparse Index
Entries	One per record	One per block (or one per distinct key value)
Data ordering	Not required	Required (data must be sorted on key)
Space	More (one entry per record)	Less (one entry per block)
Search speed	Direct — find block then record	May need scan within block
Use case	Secondary index, small tables	Primary/clustered index, large sorted tables

Example: File with 10,000 records, 10 records per block = 1,000 blocks
Dense index: 10,000 entries
Sparse index: 1,000 entries (one per block)

If index block holds 100 entries:
Dense index blocks: 10,000 / 100 = 100 blocks
Sparse index blocks: 1,000 / 100 = 10 blocks

3. Multilevel Index

When an index itself becomes too large to fit in memory, a multi-level index is built: an index on the index.

Number of index levels:
If blocking factor of index = bfr_i (entries per index block):
Level 1 index blocks = ⌈N / bfr_i⌉ (N = number of data blocks for sparse, records for dense)
Level 2 blocks = ⌈Level1 blocks / bfr_i⌉
Continue until 1 block remains

Levels = ⌈log_bfr_i(N)⌉

Disk accesses to find a record:
= Number of index levels + 1 data block access

Example: 100,000 records, blocking factor = 100 entries/block
Level 1 (sparse): 100,000 / 100 = 1,000 blocks
Level 2: 1,000 / 100 = 10 blocks
Level 3: 10 / 100 = 1 block (fits in 1 block)
Levels = 3, accesses = 3 + 1 = 4

4. B Tree Structure

A B tree of order m is a balanced search tree where every node has at most m children (m-1 keys). Data pointers exist at all levels (internal + leaf).

B Tree Properties (order m)

Root: 1 to m-1 keys; 2 to m children (unless root is a leaf)
Internal nodes: ⌈m/2⌉ – m children; ⌈m/2⌉-1 to m-1 keys
All leaves at the same depth
Data pointers in both internal and leaf nodes

Max keys in B tree of order m and height h:
Max keys = m^(h+1) – 1

Min keys:
Min keys = 2⌈m/2⌉^h – 1 (except root which can have just 1 key)

5. B+ Tree Structure

A B+ tree stores all data pointers in leaf nodes. Internal nodes store only keys for routing. Leaf nodes are linked as a doubly or singly linked list for efficient range queries.

B+ Tree Properties

Node Type	Keys	Pointers/Entries
Internal node (order n)	n-1 keys	n child pointers
Leaf node (order n_leaf)	Up to n_leaf keys	Data pointer per key + 1 next-leaf pointer
Root	1 to n-1 keys	2 to n children (if not leaf)
Non-root internal	⌈n/2⌉-1 to n-1 keys	⌈n/2⌉ to n children
Non-root leaf	⌈n_leaf/2⌉ to n_leaf keys	Same + next pointer

B tree vs B+ tree

Feature	B Tree	B+ Tree
Data storage	In all nodes (internal + leaf)	Only in leaf nodes
Key duplication	Keys appear once	Keys may appear in both internal and leaf
Leaf linking	Not linked	Leaves linked as list
Range queries	Slow (in-order traversal)	Fast (scan linked leaves)
Tree height	Higher (fewer keys in internal nodes)	Lower (more keys fit in internal nodes)
Search time	Variable (may find in internal node)	Uniform — always reach leaf

6. Order Calculation

Internal Node Order (n)

Each internal node: n pointers + (n-1) keys ≤ block size B
n × P + (n-1) × K ≤ B
where P = pointer size, K = key size

Solve: n ≤ (B + K) / (P + K)
n = floor((B + K) / (P + K))

Leaf Node Order (n_leaf)

Each leaf: n_leaf × (K + P_data) + P_next ≤ B
where P_data = data pointer size, P_next = next-leaf pointer

n_leaf = floor((B – P_next) / (K + P_data))

GATE-style Example:
Block size = 1024 B, Key = 10 B, Pointer = 6 B

Internal node order:
n × 6 + (n-1) × 10 ≤ 1024
16n – 10 ≤ 1024
16n ≤ 1034
n ≤ 64.6 → n = 64

Leaf node (data pointer = 8 B, next pointer = 6 B):
n_leaf × (10 + 8) + 6 ≤ 1024
18 × n_leaf ≤ 1018
n_leaf ≤ 56.6 → n_leaf = 56

B+ Tree Height

For N data records and leaf order n_leaf:
Number of leaf nodes needed = ⌈N / ⌊n_leaf/2⌋⌉ (minimum occupancy)

Tree height h satisfies: ⌈n/2⌉^(h-1) × 2 ≥ N / n_leaf

Simpler: minimum height = ⌈log_{⌈n/2⌉}(N)⌉ + 1

7. B+ Tree Operations

Search

Start at root. At each internal node, use keys to navigate: go to child pointer where key falls in range. Reach leaf, check for key. Time = O(log_n(N)) disk accesses = tree height.

Insertion

Find the appropriate leaf node
If leaf has space: insert key in sorted order
If leaf is full (overflow): split leaf into two; push middle key UP to parent
If parent is full: split parent; push middle key UP (chain reaction)
If root splits: create new root; tree height increases by 1

Example (order 3 B+ tree, max 2 keys per leaf):
Insert 50 into a full leaf [30, 40]:
Keys sorted: [30, 40, 50]
Split: Leaf1 = [30, 40], Leaf2 = [50]
Push 50 up to parent as separator key
Parent gets pointer to Leaf1 and Leaf2

Deletion

Find and remove key from leaf
If leaf still has ≥ ⌈n/2⌉ keys: done
If underflow: try to borrow from adjacent sibling (rotate via parent key)
If sibling has exactly minimum: merge leaves; remove separator from parent
If parent underflows: repeat up the tree

8. GATE Examples

GATE 2014: A B+ tree of order 5 (max 4 keys per node). What is the minimum number of keys in a non-root internal node?

View Solution

Non-root internal node has minimum ⌈n/2⌉ children = ⌈5/2⌉ = 3 children.

Minimum keys = minimum children – 1 = 3 – 1 = 2 keys

GATE 2016: Block size = 512 B. Key = 8 B. Child pointer = 4 B. Record pointer = 4 B. Calculate the order of a B+ tree internal node and the maximum number of keys in a leaf node.

View Solution

Internal node: n × 4 + (n-1) × 8 ≤ 512 → 12n – 8 ≤ 512 → n ≤ 43.3 → n = 43

Max keys in internal node = 43 – 1 = 42

Leaf node (4 B data pointer + 4 B next pointer):
n_leaf × (8 + 4) + 4 ≤ 512 → 12 × n_leaf ≤ 508 → n_leaf ≤ 42.3 → n_leaf = 42

Answer: Internal order = 43 (42 keys); Leaf max = 42 keys

GATE 2019: A file has 16,000 records. Using a B+ tree with leaf order 8, what is the minimum number of leaf nodes?

View Solution

Minimum occupancy of non-root leaf = ⌈8/2⌉ = 4 keys per leaf.

Minimum leaf nodes = ⌈16,000 / 8⌉ = 2,000 (if all leaves are full, maximum occupancy)

Wait — minimum number of leaves means fewest leaves, which is when each leaf is maximally full (8 keys).

Minimum leaf nodes = ⌈16,000 / 8⌉ = 2,000 leaf nodes

9. Common Mistakes

B tree order vs B+ tree order: In a B tree of order m, each node has at most m children (m-1 keys). In a B+ tree of order n, an internal node has at most n child pointers and n-1 keys; leaf nodes may have a different order. Always check which order applies to which node type.
Secondary index must be dense: Secondary indexes are on non-ordering fields, so there’s no guarantee records with the same key value are in adjacent blocks. Each record must have its own index entry — secondary index is always dense.
Split in insertion: When a leaf overflows, the middle key is COPIED up to the parent (in B+ trees, the key stays in the leaf). When an internal node overflows, the middle key is PUSHED UP and does NOT remain in the node. This difference matters for tracing operations.
Minimum keys in root: Root is a special case. Root has minimum 1 key (2 children) for a non-leaf root, or 1 key for a leaf root. Non-root internal nodes have minimum ⌈n/2⌉ – 1 keys.
Height calculation starting from 0 vs 1: GATE problems vary. Clarify whether “height” counts levels starting from 0 (root = level 0) or 1 (root = level 1). Read the problem carefully.

10. Frequently Asked Questions

What is the difference between dense and sparse index in DBMS?

Dense index: one entry per record; works on both sorted and unsorted data; requires more space. Sparse index: one entry per block; requires data to be physically sorted on the index key; much smaller. Dense index enables direct lookup to any record. Sparse index requires finding the right block then scanning within it. Primary indexes are sparse; secondary indexes must be dense.

What is the difference between B tree and B+ tree?

B tree stores data in all nodes including internal nodes; B+ tree stores data only in leaf nodes. B+ tree leaves are linked for range scans; B tree leaves are not linked. B+ tree internal nodes hold more keys (no data pointers) → shorter tree → fewer I/Os. B+ trees dominate in DBMS because of efficient range queries and predictable search depth. B trees can find data in internal nodes for point queries but lack range query efficiency.

How do you calculate the order of a B+ tree?

For internal node order n: n × pointer_size + (n-1) × key_size ≤ block_size. Solve for maximum integer n. For leaf node order: n_leaf × (key_size + data_pointer_size) + next_leaf_pointer ≤ block_size. This gives separate orders for internal and leaf nodes. In GATE, substitute the given sizes and solve — typically 40–100 for common block/key/pointer sizes.

What is a clustered index vs non-clustered index?

A clustered index defines the physical order of records on disk — the table is sorted by the clustered index key. Only one clustered index per table. Efficient for range queries and full scans. Non-clustered index creates a separate index structure with pointers to the actual data rows; data is not physically sorted. Multiple non-clustered indexes allowed. Non-clustered index search requires an extra I/O to follow the pointer to the actual record.

Indexing and B+ Trees in DBMS – Dense Sparse Index and B+ Tree Operations

Indexing and B+ Trees in DBMS – Dense, Sparse Index and B+ Tree Operations

What You Will Learn

1. Index Types

2. Dense vs Sparse Index

3. Multilevel Index

4. B Tree Structure

B Tree Properties (order m)

5. B+ Tree Structure

B+ Tree Properties

B tree vs B+ tree

6. Order Calculation

Internal Node Order (n)

Leaf Node Order (n_leaf)

B+ Tree Height

7. B+ Tree Operations

Search

Insertion

Deletion

8. GATE Examples

9. Common Mistakes

10. Frequently Asked Questions

What is the difference between dense and sparse index in DBMS?

What is the difference between B tree and B+ tree?

How do you calculate the order of a B+ tree?

What is a clustered index vs non-clustered index?

Next Topics

Leave a Comment