I built a disk-backed extendible hash table for an open source DBMS
[ posted 2024-04-21 @ 7:36 AM ]
I'm pretty stoked about my latest project: a disk-backed extendible hash table for the BusTub DBMS. This project marks the second assignment from the CMU 15-445 graduate course "Database Systems" at Carnegie Mellon University. Building on the foundation laid by the buffer pool manager project, this assignment delved into the complexities of implementing an extendible hash table, a critical data structure used for indexing and organizing data efficiently.
One of the primary goals of this project was to design a hash table that could handle a large number of key-value pairs efficiently while not having to store the entire table in memory all at once. To achieve this, I implemented the extendible hash table using a combination of in-memory data structures for caching frequently accessed pages and disk-based storage for persisting data across system restarts.
Implementing a disk-backed extendible hash table presented several challenges, particularly in managing the dynamic resizing capability of the hash table and implementing thread-safe page guards with the help of C++ move semantics. I addressed latter challenge by referencing Bjarne Stroustrups book A Tour of C++, which is an excellent source of knowledge from the man who actually created the language. Check out the video to see how I worked my way through the assignment.
This assignment was really fun and I'm definitely hyped for the next one: using the extendible hash table to implement the components that facilitate SQL query execution!