Understanding B-Trees – CS @ Kenyon

A B-Tree is a self-balancing search tree that maintains sorted data and allows searches, insertions, deletions, and sequential access in logarithmic time. B-Trees are widely used in databases and file systems because of their ability to handle large amounts of data efficiently.

Key Features of B-Trees

Balanced Tree:
- Every path from the root to a leaf has the same length, ensuring balanced access times.
Multiple Keys per Node:
- Each node can store multiple keys, reducing the height of the tree.
Dynamic Growth:
- Automatically adjusts the number of keys and nodes during insertions and deletions.
Efficient Disk Access:
- B-Trees minimize disk I/O by grouping multiple keys into each node.

Terminology

Order (m):
- The maximum number of children a node can have. A B-Tree of order m can have up to m-1 keys in each node.
Root Node:
- The topmost node of the tree.
Internal Nodes:
- Nodes that have children.
Leaf Nodes:
- Nodes with no children.

Properties of a B-Tree

Each node can hold at most m-1 keys and must hold at least ceil(m/2) - 1 keys (except the root).
The keys within a node are sorted in ascending order.
Internal nodes act as guides, dividing the data space:
- For a key k in a node:
  - Keys in the left child are < k.
  - Keys in the right child are > k.

Basic Operations on B-Trees

1. Search

The search operation in a B-Tree is similar to binary search:

Compare the key with the keys in the current node.
If the key matches one of the keys in the node, the search is successful.
If not, move to the appropriate child node based on the key’s value.

2. Insertion

Insertion involves finding the correct position for the key and ensuring that the B-Tree properties are maintained:

Insert the key into the appropriate node.
If the node overflows (has m keys), split it into two nodes:
- The middle key becomes the parent key in the node’s parent.
- Repeat this process up to the root if necessary.

3. Deletion

Deletion can be more complex, as it may require re-balancing:

If the key is in a leaf node, simply remove it.
If the key is in an internal node:
- Replace it with its predecessor or successor key.
- Recursively delete the predecessor or successor key.
If a node underflows (has fewer than ceil(m/2) - 1 keys):
- Borrow a key from a sibling, or
- Merge with a sibling.

Example of a B-Tree

Inserting Keys into a B-Tree of Order 3

Start with an empty tree:
[]
Insert keys: 10, 20, 30, 40, 50:
- After inserting 10, 20, and 30:
  [10, 20, 30]
- Inserting 40 causes a split:
  [20] [10] [30, 40]
- Inserting 50:
  [20] [10] [30, 40, 50]
Insert 60, causing another split:
[20, 40] [10] [30] [50, 60]

C++ Implementation of a B-Tree

Below is a simplified implementation of a B-Tree:

#include <iostream>
#include <vector>
using namespace std;

// B-Tree Node Structure
class BTreeNode {
public:
    vector<int> keys;
    vector<BTreeNode*> children;
    bool isLeaf;

    BTreeNode(bool leaf) : isLeaf(leaf) {}

    void insertNonFull(int key);
    void splitChild(int i, BTreeNode* child);
    void traverse();

    BTreeNode* search(int key);

    friend class BTree;
};

// B-Tree Structure
class BTree {
    BTreeNode* root;
    int order; // Maximum number of children

public:
    BTree(int t) : root(nullptr), order(t) {}

    void traverse() {
        if (root) root->traverse();
    }

    BTreeNode* search(int key) {
        return (root == nullptr) ? nullptr : root->search(key);
    }

    void insert(int key);
};

void BTreeNode::traverse() {
    int i;
    for (i = 0; i < keys.size(); i++) {
        if (!isLeaf) {
            children[i]->traverse();
        }
        cout << keys[i] << " ";
    }

    if (!isLeaf) {
        children[i]->traverse();
    }
}

BTreeNode* BTreeNode::search(int key) {
    int i = 0;
    while (i < keys.size() && key > keys[i]) {
        i++;
    }

    if (i < keys.size() && keys[i] == key) {
        return this;
    }

    if (isLeaf) {
        return nullptr;
    }

    return children[i]->search(key);
}

void BTree::insert(int key) {
    if (!root) {
        root = new BTreeNode(true);
        root->keys.push_back(key);
    } else {
        if (root->keys.size() == order - 1) {
            BTreeNode* newRoot = new BTreeNode(false);
            newRoot->children.push_back(root);
            newRoot->splitChild(0, root);

            int i = (newRoot->keys[0] < key) ? 1 : 0;
            newRoot->children[i]->insertNonFull(key);

            root = newRoot;
        } else {
            root->insertNonFull(key);
        }
    }
}

void BTreeNode::insertNonFull(int key) {
    int i = keys.size() - 1;

    if (isLeaf) {
        keys.push_back(0);
        while (i >= 0 && keys[i] > key) {
            keys[i + 1] = keys[i];
            i--;
        }
        keys[i + 1] = key;
    } else {
        while (i >= 0 && keys[i] > key) {
            i--;
        }
        i++;

        if (children[i]->keys.size() == order - 1) {
            splitChild(i, children[i]);

            if (keys[i] < key) {
                i++;
            }
        }
        children[i]->insertNonFull(key);
    }
}

void BTreeNode::splitChild(int i, BTreeNode* child) {
    int mid = (order - 1) / 2;
    BTreeNode* sibling = new BTreeNode(child->isLeaf);

    sibling->keys.assign(child->keys.begin() + mid + 1, child->keys.end());
    child->keys.resize(mid);

    if (!child->isLeaf) {
        sibling->children.assign(child->children.begin() + mid + 1, child->children.end());
        child->children.resize(mid + 1);
    }

    children.insert(children.begin() + i + 1, sibling);
    keys.insert(keys.begin() + i, child->keys[mid]);
}

int main() {
    BTree tree(3);

    tree.insert(10);
    tree.insert(20);
    tree.insert(5);
    tree.insert(6);
    tree.insert(12);
    tree.insert(30);
    tree.insert(7);
    tree.insert(17);

    cout << "Traversal of the tree: ";
    tree.traverse();
    cout << endl;

    int key = 6;
    if (tree.search(key)) {
        cout << key << " is found in the tree." << endl;
    } else {
        cout << key << " is not found in the tree." << endl;
    }

    return 0;
}

Output

Traversal of the tree: 5 6 7 10 12 17 20 30
6 is found in the tree.

Applications of B-Trees

Databases: Efficient indexing and retrieval of records.
File Systems: Managing hierarchical file structures.
Search Engines: Storing and retrieving search terms.