- Backend Weekly
- Posts
- Understanding Database Indexing
Understanding Database Indexing
In this episode, I will explain Database Indexing, its types, indexing architectures, drawbacks, and benefits.
Hello “👋”
Welcome to another week, another opportunity to become a Great Backend Engineer.
Today’s issue is brought to you by BuilderKit → NextJS AI SaaS Boilerplate with Landing pages, Waitlist Pages, Auth pages, Auth, Payments, Database, Mails, Customer Support, Admin Dashboard and 10 Production Ready AI SaaS Apps
Before we get down to the business of today. Part 12 of Understanding System Design.
I have a special announcement for you: You will love this one.
Finally, I’m starting my Advanced Backend Bootcamp 2.0. Since I launched the first one, I have been preparing for this with my team for over a year.
This bootcamp is different. It’s a 4-6 month in-depth backend engineering Bootcamp diving deep into backend engineering and building real-world backend projects.
Not to say much, because I’m fully prepared to turn you into a great backend engineer.
Click here for all the information you need.
Now, back to the business of today.
In the previous edition, I discussed Database Sharding and explored how it works, sharding architectures, benefits, and alternatives to database sharding.
In this episode, I will explain Database Indexing, its types, indexing architectures, drawbacks, and benefits.
Overview of Database Indexing
Database indexing improves the database search for any records when you issue a query.
Imagine that you have a library with thousands of books, and a customer walks in and requests a book by its title. Without organization, you will have to search through all the books (in the worst case) to find the book you’re looking for. You will agree with me that this process is inefficient and time-consuming.
However, locating any book will be easy if the library has a well-designed indexing system. The indexing system will create a catalog listing all the books in alphabetical order (or any order you want) and their corresponding shelf numbers.
That’s exactly how database indexing works.
What is Database Indexing?
A database index is a powerful tool for quickly retrieving records from a database file. It is a data structure that improves the speed of the data retrieval process.
From our illustration above, the library represents a database, the books represent the data stored, and the library index and organization represent database indexing.
Example
Let’s say you have a large database with a user’s table containing information for about 1 million users.
ID | FirstName | LastName | Country | |
---|---|---|---|---|
1 | James | Omollo | Kenya | |
2 | Gbemisola | Oluwasegun | Nigeria | |
3 | Trishna | Pranav | India | |
4 | Esther | Coleman | UK | |
5 | Zara | Shahidi | USA |
Assuming Zara is the last record on our table, querying the database for her details will take a long time since you will have to traverse through all the 1 million records(full table scan).
If the data was ordered alphabetically using the first name, searching could happen faster since we could jump halfway through the data to see if Zara comes before or after that row.
We could then half the remaining rows and make the same comparison using binary search. Indices help us create a sorted list without creating sorted tables, which would take up a lot of space.
Let’s create an index and see how it maps back to the user table using the “FirstName” column. In this example, we will use Postgres Database.
CREATE INDEX users_firstName ON users(FirstName);
FirstName | ID |
---|---|
Esther | 4 |
James | 1 |
Paul | 2 |
Trishna | 3 |
Zara | 5 |
The Index has the user’s first names sorted in alphabetical order. When you perform the SQL Query below to get a user’s record using FirstName
as the filter, the database can swiftly locate the relevant rows rather than scanning the entire table sequentially.
The key will be used to find the rows that match a specific FirstName
quickly. This indexing technique significantly improves query performance, especially when dealing with larger datasets.
SELECT FirstName From Users WHERE FirstName = 'Zara';
Remember, indexing is not limited to a single column. Depending on your specific use cases and query patterns, you can create indexes on multiple columns or even compound indexes that span multiple columns.
Index architectures
1. Clustered Index
Popular SQL databases, such as SQL servers, automatically create a clustered index if a constraint or primary key is defined on a table. Based on the primary key, each table can only have one clustered index.
The PK defines the order in which the data is physically stored on a disk. Use the SQL script below to create a user table and insert data in an SQL Server Database.
CREATE TABLE users (
ID INTEGER PRIMARY KEY NOT NULL,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
Email VARCHAR(100) NOT NULL,
Country VARCHAR(50) NOT NULL
);
INSERT INTO users (ID, FirstName, LastName, Email, Country) VALUES
(2, 'Gbemisola', 'Oluwasegun', '[email protected]', 'Nigeria'),
(3, 'Trishna', 'Pranav', '[email protected]', 'India'),
(1, 'James', 'Omollo', '[email protected]', 'Kenya'),
(5, 'Zara', 'Shahidi', '[email protected]', 'USA'),
(4, 'Esther', 'Coleman', '[email protected]', 'UK')
When you query the database:
SELECT * FROM users
We get the user’s data ascending by the ID(PK). Remember, we did not insert the data ascending by the PK.
The database automatically creates a clustered index. Run the following command to confirm.
Execute sp_helpindex users;
A clustered index determines the physical order of the data rows in a table. However, inserts and updates can be more expensive than non-clustered Indices, as they may require data to be physically rearranged.
2. Non-Clustered Index
Our library analogy is a good example of a non-clustered index. The data is stored in one place, and the index is stored in another place. The index contains pointers to where the data is stored. Since a non-clustered stores data separate from actual data, the table can contain more than one clustered index.
Just like a book in our example above, it can have an index by chapters at the beginning and another index at the end. Inserts and updates on non-clustered Indices are generally faster than on clustered Indices, as they don’t require data rearranging.
Non-clustered Indexes are commonly used to optimize query performance for specific columns that are frequently accessed.
Types of Database Indices Implementations
Databases Indices use data structures and algorithms to access indexed data efficiently. The most popular ones are:
1. Binary-Tree(B-Tree)
This is the most popular database index data structure. It organizes data in a self-balancing tree structure, typically a binary search tree. Balancing the tree and minimizing the number of disk accesses required provide efficient searching, insertion, and deletion operations.
2. Hash
Hash indexes use a hash function to map keys to specific locations in the index. The hash function distributes the data uniformly across the index, allowing for fast lookup operations.
Hashing architectures are efficient for equality searches, but they may suffer from collisions, where multiple keys hash to the same location, requiring additional handling.
Benefits of database Indices
Improves query performance for frequently accessed columns: If certain columns are frequently used in queries, using searching, filtering, or join operations, indexing them is beneficial for retrieving data quickly and improving query performance.
Faster Data Retrieval for Large Tables: Indexing is useful for large tables containing many records. Without using Indices, querying large tables can be very slow and resource-intensive.
Improved Scalability: Indexing improves scalability in a database system. As the amount of data grows, Indices provides a way to access and retrieve data efficiently, ensuring that query performance remains consistent even with large datasets.
Drawbacks of Database Indices
While database Indexing offers numerous benefits, it has some the following drawbacks:
Overhead during Data Modification: When performing data modifications such as creating, updating, or deleting a table that has an index, the Indices need to be updated to reflect the data changes. These operations incur some overhead. The more Indices a database has, the longer it takes to modify data, potentially affecting write operations.
Increased Storage Space: Indices require additional storage space to store the index data structures. Indices can consume a significant amount of disk space, and the space requirement increases proportionally with the number of indexed columns and the size of indexed data.
Database indexing is an important technique you can add to your toolbox to improve query performance and data retrieval in databases. When you create Indices on frequently accessed columns, the database can quickly map and retrieve data based on the specified conditions, resulting in faster execution times.
Indexing is particularly beneficial in large tables, complex queries involving multiple table joins, and frequently accessed tables. It’s important to carefully consider the columns and queries that would benefit most from indexing. Creating Indexes for every column can result in unnecessary overhead and additional costs.
That will be all for this week. I like to keep this newsletter short.
Today, I discussed Database Indexing, its types, indexing architectures, drawbacks, and benefits.
Next week, I will start exploring the CAP Theorem.
Don’t miss it. Share with a friend
Did you learn any new things from this newsletter this week? Please reply to this email and let me know. Feedback like this encourages me to keep going.
See you on Next Week.
Remember to join the Advanced Backend Bootcamp 2.0, which will start soon. You can reply to this email with any questions or concerns.
Top 5 Remote Backend Jobs this week
Here are the top 5 Backend Jobs you can apply to now.
👨‍💻 Deel
✍️ Senior Back-End Engineer (Node.js/TypeSript)
đź“ŤRemote, Nigeria, Africa
đź’° Click on Apply for salary details
Click here to Apply for this role.
👨‍💻 Firsty
✍️ Senior Backend Developer
đź“ŤRemote
đź’° Click on Apply for salary details
Click here to Apply for this role.
👨‍💻 SFOX
✍️ Backend Software Engineer, Trading Infrastructure
đź“ŤRemote
đź’° Click on Apply for salary details
Click here to Apply for this role.
👨‍💻TransferGo
✍️ Backend Engineer - Cards & Integrations
đź“ŤRemote, London, United Kingdom
đź’° Click on Apply for salary details
Click here to Apply for this role.
Want more Remote Backend Jobs? Visit GetBackendJobs.com
Backend Engineering Resources
Whenever you're ready
There are 4 ways I can help you become a great backend engineer:
1. The MB Platform: Join 1000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.
2. ​The MB Academy:​ The “MB Academy” is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.
3. MB Video-Based Courses: Join 1000+ backend engineers who learn from our meticulously crafted courses designed to empower you with the knowledge and skills you need to excel in backend development.
4. GetBackendJobs: Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.
LAST WORD đź‘‹
How am I doing?
I love hearing from readers, and I'm always looking for feedback. How am I doing with The Backend Weekly? Is there anything you'd like to see more or less of? Which aspects of the newsletter do you enjoy the most?
Hit reply and say hello - I'd love to hear from you!
Stay awesome,
Solomon
I moved my newsletter from Substack to Beehiiv, and it's been an amazing journey. Start yours here.
Reply