
Part 5: Understanding System Design - Distributed Caching

You will learn everything you need to know about distributed caching: what caching is, its benefits, caching strategies, cache eviction policies, and caching layers.

Hello 👋

Welcome to another week, another opportunity to become a Great Backend Engineer.

Today’s issue is brought to you by LaraFast, a Laravel SaaS Starter Kit with ready-to-go components for Payments, Auth, Admin, Blog, SEO, and more.

Before we get down to today’s business, Part 5 of Understanding System Design, I have a gift for you. You will love this one.

If you build products with Laravel without a boilerplate, then you will end up repeating yourself on:

  • Auth

  • Payments Integration

  • Admin Dashboard

  • Blog

  • SEO

  • etc

Larafast has already taken care of all these.

Why not get the boilerplate and focus only on the business logic while the boilerplate handles those repetitive tasks?

You can GET THE LARAFAST boilerplate and build products in weeks, not months. Use my link above to get $100 off.

Check this out (Sponsored)

Ride the wave of 23% compounded annual growth

That’s the forecasted growth rate of the smart shades market from 2023 to 2033. And RYSE’s automated window shade tech is positioned to dominate the market. They’ve generated over 20X growth in share price for early shareholders, with significant upside remaining as they launch in over 100 Best Buy stores. Invest in the rapidly growing smart shades market →

Now, back to the business of today.

In the previous edition, I discussed one of the system components, starting with Load Balancers. You learned everything you needed about load balancers: their benefits, load balancing algorithms, Load Balancers vs. Reverse Proxies, and more. Check it out here if you haven’t.

In this episode, I will further elucidate another system component to help you understand System Design.

We will look at Distributed Caching.

You will learn everything you need to know about distributed caching: what caching is, its benefits, caching strategies, cache eviction policies, and caching layers.

What is Caching?

Distributed caching is a subset of caching, so before we can explore distributed caching fully, we need a solid understanding of caching itself.

Caching is a technique for storing and retrieving frequently accessed data or computed results to speed up subsequent requests.

The most important thing about Caching is that it helps large applications scale and respond to requests faster by storing data in easily accessible caches.

Real-world analogy

Let’s say you prepare dinner every day, and you need ingredients for your recipe. When you cook, is it efficient to go to the store every time to buy the ingredients? No. Going to the store every time you want to cook is time-consuming.

Buying ingredients and storing them in the refrigerator/pantry makes more sense. This will save you time. In this case, your refrigerator/pantry acts like a cache.

This is similar to computers. The CPU has a cache in the processor so that it doesn’t have to request data from RAM or disk every time. Also, accessing data from RAM is faster than accessing data from disk.

Caching acts as a temporary local storage for data.

Now that we’re talking about local storage for data, let’s explore the types of caching.

Different Types of Caching

There are different types of caching, and companies often customize them to fit their needs. However, caching falls into two broad categories:

  1. Local Caching

  2. Distributed Caching

What is Local Caching?

In local caching, data is stored on a single machine or within a single application. This is the most common scenario: data is stored, retrieved, and updated within a single machine. The volume of data in local caching is relatively small, and examples include browser caches and application-level caches.
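To make this concrete, here is a minimal sketch of a local (in-process) cache in Python, using the standard library’s functools.lru_cache. The function name and data are made up for illustration; the point is that repeated calls are served from memory instead of hitting the database again.

    from functools import lru_cache

    @lru_cache(maxsize=1024)              # keep at most 1,024 results in this process's memory
    def get_user_profile(user_id: int) -> dict:
        # Stand-in for a slow lookup against a database or remote API.
        print(f"fetching user {user_id} from the database...")
        return {"id": user_id, "name": f"user-{user_id}"}

    get_user_profile(42)   # cache miss: runs the function and stores the result
    get_user_profile(42)   # cache hit: returned instantly, no database call

Because this cache lives inside a single process, it disappears on restart and is invisible to other servers, which is exactly the limitation distributed caching addresses.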

What is Distributed Caching?

In distributed caching, data is stored across multiple machines or nodes, often in a network. This type of caching is essential for applications that need to scale across multiple servers or are themselves distributed.

One essential advantage of distributed caching is that data is stored, distributed, and made available close to where it is needed, even if the original data is stored far away.

Distributed Caching (source: Hazelcast)

Let’s look at the example of local and distributed caching to help us understand better.

Consider a Fintech application with thousands of requests per second. If the application uses only local caching, the data might be stored on the server where the website is hosted.

However, as the website scales, traffic increases and people start accessing the application from different regions of the world. At that point, the local caching approach can become a bottleneck.

A better approach would be to use distributed caching, where the data can be stored across multiple cache servers in different regions. When a user accesses the website, the system retrieves the data from the nearest cache server, ensuring faster response times and a better user experience.
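As a rough sketch (not the fintech application’s actual code), the cache-aside pattern against a distributed cache such as Redis could look like the snippet below. It assumes the redis-py client and a hypothetical load_order_from_db helper; the hostname is made up.

    import json
    import redis  # assumes the redis-py client is installed

    # A cache node close to the user's region (hypothetical address).
    cache = redis.Redis(host="cache.eu-west-1.example.com", port=6379)

    def get_order(order_id: str) -> dict:
        key = f"order:{order_id}"
        cached = cache.get(key)
        if cached is not None:                     # cache hit: skip the database entirely
            return json.loads(cached)
        order = load_order_from_db(order_id)       # cache miss: hypothetical database lookup
        cache.setex(key, 300, json.dumps(order))   # keep the result for 5 minutes
        return order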

If you’re starting your application development journey, the most important thing to understand is “when to use caching.”

When to use caching

You can’t store all your system’s information in a cache, because caching hardware has limited storage and is more expensive than a normal database. Also, lookup time increases significantly when you store tons of data.

Therefore, a cache should contain only the most relevant data, typically the data that is read most often, since there are many more reads to handle than writes.

For instance, Twitter averages about 300,000 read requests per second and only 6,000 writes per second. Caching tweets according to the user’s timeline greatly improves the system’s performance and user experience.

Caching is mostly useful in the following scenarios:

  • Storing the results of requests that are made many times, to minimize data retrieval operations, especially for data that is immutable (does not change often).

  • Storing the results of complex, computationally expensive operations to reduce system latency (a minimal sketch follows this list).
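Here is a minimal sketch of that second scenario in Python: a small time-to-live (TTL) memoization decorator that keeps the result of an expensive computation in memory and recomputes it only after the entry expires. The monthly_report function is a made-up stand-in.

    import time
    from functools import wraps

    def ttl_cache(seconds: int):
        """Memoize a function's results, expiring each entry after `seconds`."""
        def decorator(fn):
            store = {}                              # maps args -> (expires_at, result)
            @wraps(fn)
            def wrapper(*args):
                now = time.monotonic()
                entry = store.get(args)
                if entry and entry[0] > now:        # still fresh: reuse the cached result
                    return entry[1]
                result = fn(*args)                  # missing or expired: recompute
                store[args] = (now + seconds, result)
                return result
            return wrapper
        return decorator

    @ttl_cache(seconds=60)
    def monthly_report(account_id: str) -> dict:
        # Stand-in for a slow aggregation query or heavy computation.
        return {"account": account_id, "total": 12345}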

Benefits of caching

  1. Improved Application Performance - Memory is 50-200 times faster than disk (magnetic or SSD); therefore, reading from an in-memory cache is extremely fast. This fast data access greatly improves the system’s performance.

  2. Reduce latency - Latency is a measure of delay. Modern applications like Amazon may experience high traffic during Black Friday and Christmas. Increased load on the databases results in higher latency in getting data, which makes the overall application slow. This may cost Amazon billions of dollars. Utilizing an in-memory cache can avoid this issue since it greatly improves system performance by reducing latency.

  3. Increase Read Throughput—Besides lower latency, caching greatly increases throughput. Throughput refers to how much data can be processed within a specific period. A single instance cache can serve hundreds of thousands of requests a second, greatly improving system performance and scalability during spikes.

  4. Reduce load on the database—Directing reads to the cache reduces the load on the database and protects it from degraded performance or crashing during spikes.

  5. Reduce Database Cost—A single cache instance can provide hundreds of thousands of Input/Output operations per second, potentially replacing the need for multiple database instances and driving the database cost down.

Key Components of Distributed Caching

You need to look closely at two key components when implementing distributed caching, namely:

  1. Cache Servers

  2. Data Replication and Strategies

Cache Servers

The primary component of a distributed caching system is the Cache Server. It temporarily stores data across multiple nodes and ensures it is available near where it’s needed.

Each cache server can operate independently, and in case of a server failure, the system can reroute requests to another server, ensuring high availability and fault tolerance.
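As a simplified illustration of that rerouting (real distributed caches handle this with clustering and health checks), a client could try each cache node in turn and fall through to the next one when a node is unreachable. This sketch assumes the redis-py client; the node addresses are made up.

    import redis  # assumes the redis-py client; node addresses are hypothetical

    NODES = [
        redis.Redis(host="cache-1.internal", port=6379, socket_timeout=0.2),
        redis.Redis(host="cache-2.internal", port=6379, socket_timeout=0.2),
    ]

    def cache_get(key: str):
        # Try each cache server in turn; if one is down, reroute to the next.
        for node in NODES:
            try:
                return node.get(key)
            except redis.ConnectionError:
                continue        # this node is unavailable, try the next one
        return None             # every node failed; the caller falls back to the database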

Data Replication and Strategies

To ensure data is stored and retrieved efficiently across multiple nodes in distributed caching, data replication is used. There are several strategies for distributing and replicating data, listed below:

  • Consistent hashing → In this strategy, data is evenly distributed across cache servers, minimizing data movement when new servers are added or existing ones are removed.

  • Virtual nodes → In this strategy, virtual nodes are used to handle scenarios where cache servers have varying capacities. They ensure that data distribution remains balanced even if some servers have higher storage capacities than others (a simplified sketch of consistent hashing with virtual nodes follows this list).
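Here is a simplified, illustrative sketch of consistent hashing with virtual nodes in Python. Each server is placed on a hash ring many times (its virtual nodes), and a key is served by the first server found clockwise from the key’s hash, so adding or removing a server only moves a small slice of keys.

    import bisect
    import hashlib

    class HashRing:
        """Toy consistent-hash ring with virtual nodes (illustrative only)."""

        def __init__(self, servers, vnodes=100):
            self.ring = []                              # sorted list of (hash, server)
            for server in servers:
                for i in range(vnodes):                 # each server gets many virtual points
                    self.ring.append((self._hash(f"{server}#{i}"), server))
            self.ring.sort()
            self._hashes = [point for point, _ in self.ring]

        @staticmethod
        def _hash(key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def server_for(self, key: str) -> str:
            idx = bisect.bisect(self._hashes, self._hash(key)) % len(self.ring)
            return self.ring[idx][1]                    # first server clockwise from the key

    ring = HashRing(["cache-1", "cache-2", "cache-3"])
    print(ring.server_for("user:42"))   # the same key always maps to the same server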

Replication is another crucial aspect of distributed caching, as it ensures data availability across different nodes in different regions.

That would be all for this week.

Today, I discussed distributed caching and introduced you to everything you need to know about it, including what caching is, its benefits, and its key components.

Next week, I will cover more topics under distributed caching, such as Caching Strategies, Caching Eviction Policy, and implementing a distributed cache.

Don’t miss it.

DON’T LEARN ALONE. SHARE THIS NEWSLETTER WITH YOUR FRIENDS

Let me know if this guide gives you perspective on Distributed Caching.

That will be all for this one. See you on Saturday.

Don’t forget to get the Larafast Boilerplate. It comes with unmatched benefits for building products in days.


Top 5 Remote Backend Jobs this week

Here are the top 5 Backend Jobs you can apply to now.

👨‍💻 Fincra
✍️ Senior Backend Engineer
📍Remote, Africa
💰 Click on Apply for salary details
Click here to Apply for this role.

👨‍💻 BandLab Technologies
✍️ Senior Backend Engineer, Ads Team
📍Remote, Asia, Africa, Europe
💰 Click on Apply for salary details
Click here to Apply for this role.

👨‍💻 WunderGraph
✍️ Senior Golang Engineer - Remote (EMEA)
📍Remote, Go, Golang
💰 Click on Apply for salary details
Click here to Apply for this role.

👨‍💻 eShaafi
✍️ Backend Developer
📍Remote
💰 Click on Apply for salary details
Click here to Apply for this role.

Want more Remote Backend Jobs? Visit GetBackendJobs.com

Backend Engineering Resources

Whenever you're ready

There are 4 ways I can help you become a great backend engineer:

1. The MB Platform: Join 1000+ backend engineers learning backend engineering on the MB platform. Build real-world backend projects, track your learnings and set schedules, learn from expert-vetted courses and roadmaps, and solve backend engineering tasks, exercises, and challenges.

2. The MB Academy:​ The “MB Academy” is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.

3. MB Video-Based Courses: Join 1000+ backend engineers who learn from our meticulously crafted courses designed to empower you with the knowledge and skills you need to excel in backend development.

4. GetBackendJobs: Access 1000+ tailored backend engineering jobs, manage and track all your job applications, create a job streak, and never miss applying. Lastly, you can hire backend engineers anywhere in the world.

LAST WORD 👋 

How am I doing?

I love hearing from readers, and I'm always looking for feedback. How am I doing with The Backend Weekly? Is there anything you'd like to see more or less of? Which aspects of the newsletter do you enjoy the most?

Hit reply and say hello - I'd love to hear from you!

Stay awesome,
Solomon

I moved my newsletter from Substack to Beehiiv, and it's been an amazing journey. Start yours here.
