≡ Menu

Building a B-Tree In Python: The Backbone of Database Indexing

If you’ve ever wondered how a database can find one specific row out of millions in milliseconds, the answer is likely a B-Tree.

Unlike a standard Binary Search Tree (BST), which can become unbalanced and slow, a B-Tree is a self-balancing search tree designed to handle large amounts of data efficiently.


Why B-Trees for Databases?

In a database, data is often stored on a disk rather than in RAM. Accessing a disk is significantly slower than accessing memory.

  • Minimizing Disk I/O: B-Trees are “fat” and “short.” While a BST might have a height of 20 to find an element, a B-Tree with a high branching factor (order) might only have a height of 3. This means fewer “hops” to find your data.

  • Sorted Storage: Because the keys are kept in order, range queries (e.g., “find all users aged 20 to 30”) are incredibly fast.

  • Predictable Performance: The self-balancing nature ensures that the time complexity for search, insertion, and deletion remains O(log n).


Understanding the B-Tree Logic

Before coding, keep these rules in mind for a B-Tree of degree t:

  1. Every node (except the root) must have at least t-1 keys.

  2. Every node can have at most 2t-1 keys.

  3. A legal node with n keys has n+1 children.

  4. All leaves must be at the same depth.


Python Implementation

Here is a simplified version of a B-Tree insertion algorithm.

We focus on the split mechanism, which is the “magic” that keeps the tree balanced.

class BTreeNode:
    def __init__(self, leaf=False):
        self.leaf = leaf
        self.keys = []
        self.child = []

class BTree:
    def __init__(self, t):
        self.root = BTreeNode(True)
        self.t = t  # Minimum degree

    def insert(self, k):
        root = self.root
        if len(root.keys) == (2 * self.t) - 1:
            # If root is full, the tree grows in height
            temp = BTreeNode()
            self.root = temp
            temp.child.insert(0, root)
            self.split_child(temp, 0)
            self.insert_non_full(temp, k)
        else:
            self.insert_non_full(root, k)

    def split_child(self, x, i):
        t = self.t
        y = x.child[i]
        z = BTreeNode(y.leaf)
        
        # Move the second half of y's keys to z
        x.child.insert(i + 1, z)
        x.keys.insert(i, y.keys[t - 1])
        
        z.keys = y.keys[t: (2 * t) - 1]
        y.keys = y.keys[0: t - 1]
        
        if not y.leaf:
            z.child = y.child[t: 2 * t]
            y.child = y.child[0: t]

    def insert_non_full(self, x, k):
        i = len(x.keys) - 1
        if x.leaf:
            x.keys.append(None)
            while i >= 0 and k < x.keys[i]:
                x.keys[i + 1] = x.keys[i]
                i -= 1
            x.keys[i + 1] = k
        else:
            while i >= 0 and k < x.keys[i]:
                i -= 1
            i += 1
            if len(x.child[i].keys) == (2 * self.t) - 1:
                self.split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            self.insert_non_full(x.child[i], k)

    def print_tree(self, x, l=0):
        print("Level", l, " ", len(x.keys), end=":")
        for i in x.keys:
            print(i, end=" ")
        print()
        l += 1
        if len(x.child) > 0:
            for i in x.child:
                self.print_tree(i, l)

# Usage
db_index = BTree(3)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    db_index.insert(key)

db_index.print_tree(db_index.root)

How this speeds up your DB

When you create an index on a column (like user_id), the database builds this structure in the background.

Instead of a Full Table Scan (checking every single row), the database engine:

  1. Loads the Root node.

  2. Compares your ID to the keys in the node to decide which child pointer to follow.

  3. Repeats until it finds the exact pointer to the data on the disk.

This reduces the search space from N to log N almost instantly.

Useful links below:

Let me & my team build you a money making website/blog for your business https://bit.ly/tnrwebsite_service

Get Bluehost hosting for as little as $1.99/month (save 75%)…https://bit.ly/3C1fZd2

Best email marketing automation solution on the market! http://www.aweber.com/?373860

Build high converting sales funnels with a few simple clicks of your mouse! https://bit.ly/484YV29

Join my Patreon for one-on-one coaching and help with your coding…https://www.patreon.com/c/TyronneRatcliff

Buy me a coffee ☕️https://buymeacoffee.com/tyronneratcliff

{ 0 comments… add one }

Leave a Comment