
If you’ve ever wondered how a database can find one specific row out of millions in milliseconds, the answer is likely a B-Tree.
Unlike a standard Binary Search Tree (BST), which can become unbalanced and slow, a B-Tree is a self-balancing search tree designed to handle large amounts of data efficiently.
Why B-Trees for Databases?
In a database, data is often stored on a disk rather than in RAM. Accessing a disk is significantly slower than accessing memory.
-
Minimizing Disk I/O: B-Trees are “fat” and “short.” While a BST might have a height of 20 to find an element, a B-Tree with a high branching factor (order) might only have a height of 3. This means fewer “hops” to find your data.
-
Sorted Storage: Because the keys are kept in order, range queries (e.g., “find all users aged 20 to 30”) are incredibly fast.
-
Predictable Performance: The self-balancing nature ensures that the time complexity for search, insertion, and deletion remains O(log n).
Understanding the B-Tree Logic
Before coding, keep these rules in mind for a B-Tree of degree t:
-
Every node (except the root) must have at least t-1 keys.
-
Every node can have at most 2t-1 keys.
-
A legal node with n keys has n+1 children.
-
All leaves must be at the same depth.
Python Implementation
Here is a simplified version of a B-Tree insertion algorithm.
We focus on the split mechanism, which is the “magic” that keeps the tree balanced.
class BTreeNode:
def __init__(self, leaf=False):
self.leaf = leaf
self.keys = []
self.child = []
class BTree:
def __init__(self, t):
self.root = BTreeNode(True)
self.t = t # Minimum degree
def insert(self, k):
root = self.root
if len(root.keys) == (2 * self.t) - 1:
# If root is full, the tree grows in height
temp = BTreeNode()
self.root = temp
temp.child.insert(0, root)
self.split_child(temp, 0)
self.insert_non_full(temp, k)
else:
self.insert_non_full(root, k)
def split_child(self, x, i):
t = self.t
y = x.child[i]
z = BTreeNode(y.leaf)
# Move the second half of y's keys to z
x.child.insert(i + 1, z)
x.keys.insert(i, y.keys[t - 1])
z.keys = y.keys[t: (2 * t) - 1]
y.keys = y.keys[0: t - 1]
if not y.leaf:
z.child = y.child[t: 2 * t]
y.child = y.child[0: t]
def insert_non_full(self, x, k):
i = len(x.keys) - 1
if x.leaf:
x.keys.append(None)
while i >= 0 and k < x.keys[i]:
x.keys[i + 1] = x.keys[i]
i -= 1
x.keys[i + 1] = k
else:
while i >= 0 and k < x.keys[i]:
i -= 1
i += 1
if len(x.child[i].keys) == (2 * self.t) - 1:
self.split_child(x, i)
if k > x.keys[i]:
i += 1
self.insert_non_full(x.child[i], k)
def print_tree(self, x, l=0):
print("Level", l, " ", len(x.keys), end=":")
for i in x.keys:
print(i, end=" ")
print()
l += 1
if len(x.child) > 0:
for i in x.child:
self.print_tree(i, l)
# Usage
db_index = BTree(3)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
db_index.insert(key)
db_index.print_tree(db_index.root)
How this speeds up your DB
When you create an index on a column (like user_id), the database builds this structure in the background.
Instead of a Full Table Scan (checking every single row), the database engine:
-
Loads the Root node.
-
Compares your ID to the keys in the node to decide which child pointer to follow.
-
Repeats until it finds the exact pointer to the data on the disk.
This reduces the search space from N to log N almost instantly.
Useful links below:
Let me & my team build you a money making website/blog for your business https://bit.ly/tnrwebsite_service
Get Bluehost hosting for as little as $1.99/month (save 75%)…https://bit.ly/3C1fZd2
Best email marketing automation solution on the market! http://www.aweber.com/?373860
Build high converting sales funnels with a few simple clicks of your mouse! https://bit.ly/484YV29
Join my Patreon for one-on-one coaching and help with your coding…https://www.patreon.com/c/TyronneRatcliff
Buy me a coffee https://buymeacoffee.com/tyronneratcliff




