Reference Guide · Updated April 2026

System Design Fundamentals: The Complete Reference

Every core concept you need to reason about large-scale systems, in one place. Load balancers, caches, databases, CAP theorem, consistent hashing, microservices, message queues, and everything in between. Each concept explained in plain English, with when you'd use it and why it matters.

By Arslan Ahmad · Founder, Design Gurus · ~15 min read · Bookmark this

System design fundamentals are the building blocks every engineer uses when designing large-scale systems. Think of them as the vocabulary of architecture. You can't reason about a problem you don't have words for, and you can't communicate with senior engineers without this vocabulary.

This page collects the core concepts into one reference. Each one gets a short entry with three parts: what it is, why it matters, and when you'd use it. Read it top to bottom the first time. Come back to specific entries as a refresher when a problem calls for them.

The concepts are grouped into five categories: how systems communicate (networking), how they store data (storage), how they stay consistent at scale (distributed systems), how they are structured (architecture patterns), and how they stay reliable under failure (reliability).

01 · Communication & Networking

How Systems Talk to Each Other

Every request in a distributed system travels across a network. These concepts describe how that traffic is routed, delivered, and managed.

DNS (Domain Name System)

What: DNS translates human-readable domain names like designgurus.io into the numeric IP addresses that computers use to locate each other.

Why: Without it, users would have to memorize IP addresses for every site. DNS is also the first step in every web request, which makes it a quiet performance factor.

When: Everywhere. Interviewers rarely ask you to design DNS, but they expect you to know it is part of the request path.

Load Balancer

What: A component that sits in front of your servers and spreads incoming traffic across them using algorithms like round-robin, least connections, or IP hash.

Why: Without a load balancer, one server would get all the traffic while others sit idle. Load balancers also provide failover when a server goes down.

When: The moment you have more than one server, you have a load balancer, even if it's just DNS round-robin. It is the most common first component added when a system starts to scale.
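Round-robin is the simplest of the algorithms mentioned above. A minimal sketch in Python (the server names are placeholders, not a real pool):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Cycles through the server pool so each server gets an equal share of requests."""

    def __init__(self, servers):
        self._pool = cycle(servers)

    def pick(self):
        # Each call returns the next server in rotation.
        return next(self._pool)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [lb.pick() for _ in range(6)]
# Six requests spread evenly: each server handles exactly two.
```

Real load balancers layer health checks on top of this, skipping servers that fail their heartbeat, and algorithms like least-connections replace the fixed rotation with a live connection count.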

API Gateway

What: A single entry point that sits in front of many backend services. It handles routing to the right service, authentication, rate limiting, and request transformation.

Why: Clients shouldn't know about your internal service layout. An API gateway hides that complexity and gives you a place to enforce cross-cutting concerns like auth and rate limits.

When: Microservices architectures, external APIs consumed by third parties, or any system where you want a single policy-enforcement point.

Proxy and Reverse Proxy

What: A proxy sits between a client and the internet, forwarding requests on behalf of the client. A reverse proxy sits between the internet and your servers, forwarding requests on behalf of the server.

Why: Proxies add caching, security filtering, and logging without changing the client or server. Reverse proxies are also where TLS termination and load balancing often happen.

When: Every production web system has at least one reverse proxy in front of it. Forward proxies show up in corporate networks and content filtering systems.

CDN (Content Delivery Network)

What: A geographically distributed network of servers that cache static content (images, videos, scripts) close to users so requests don't have to travel to your origin server.

Why: The speed of light is a real limit. A user in Tokyo requesting content from a server in Virginia will always be slower than a user in Tokyo getting it from a nearby CDN edge.

When: Any system serving static assets to a global audience. Also increasingly used for dynamic content caching and edge computing.

REST vs RPC

What: Two styles of API design. REST treats APIs as operations on resources using standard HTTP verbs. RPC treats APIs as remote function calls, often with a more efficient binary format like gRPC.

Why: The style you pick affects how clients use your system, how you evolve the API, and how much overhead each call has. REST is simpler and more cache-friendly. RPC is typically faster and better for internal service-to-service calls.

When: REST for public APIs and browser-facing endpoints. gRPC or other RPC for high-throughput internal service communication.

WebSockets

What: A protocol that upgrades an HTTP connection into a persistent, bidirectional channel. Either side can push data at any time without the client needing to poll.

Why: HTTP is request-response only. Anything real-time (chat, live notifications, multiplayer games) needs a push channel, and polling is wasteful.

When: Real-time features like chat, live dashboards, collaborative editing, and presence indicators.

Long Polling vs Server-Sent Events

What: Two lighter-weight alternatives to WebSockets. Long polling keeps a request open until the server has data. Server-Sent Events (SSE) uses a persistent HTTP response to stream one-way updates from server to client.

Why: Both let the server push without the complexity of a full bidirectional WebSocket connection. SSE is especially good when you only need server-to-client updates.

When: Long polling when you need a fallback for environments that block WebSockets. SSE for notification feeds, live scores, or stock tickers.
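The SSE wire format is plain text: each update is a frame of `field: value` lines terminated by a blank line. A small formatter shows the shape (the event name and payload here are made up for illustration):

```python
import json

def sse_frame(event, data):
    """Format one Server-Sent Events frame as it appears on the wire.

    A frame is a set of 'field: value' lines ended by a blank line;
    the browser's EventSource API parses these automatically.
    """
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_frame("score_update", {"home": 2, "away": 1})
```

A server streams these frames over a single long-lived HTTP response with `Content-Type: text/event-stream`, which is all SSE requires; no protocol upgrade, unlike WebSockets.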

02 · Data Storage

How Systems Store and Retrieve Data

Every interesting system has data. These concepts describe how that data is stored, organized, and made fast enough to serve at scale.

SQL Databases

What: Relational databases (MySQL, Postgres, Oracle) that store data in tables with fixed schemas. They support ACID transactions and complex joins.

Why: When your data has clear relationships and you need strong consistency, nothing beats SQL. Transactions guarantee that related changes succeed or fail together.

When: Financial systems, order management, user accounts, any system where consistency matters more than raw scale. Most systems start here.

NoSQL Databases

What: A family of databases (MongoDB, Cassandra, DynamoDB, Redis) designed for scale and schema flexibility. Many use eventual consistency instead of strong consistency.

Why: NoSQL databases scale horizontally more easily than SQL databases. They also handle unstructured or rapidly changing data better.

When: Massive write throughput, flexible schemas, key-value lookups, graph traversals. Also common for caching, sessions, and real-time analytics.

Caching

What: Storing copies of frequently accessed data in faster memory (usually RAM) so repeated requests don't hit the slower backend.

Why: Caches can improve response times by 10x to 100x. They also reduce load on origin systems, which saves money and prevents cascading failures.

When: Read-heavy data that doesn't change often. User profiles, product catalogs, rendered pages, expensive query results. Avoid caching when staleness is unacceptable.
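The most common pattern here is cache-aside: check the cache, fall back to the backend on a miss, then populate the cache for next time. A sketch using an in-memory dict as a stand-in for something like Redis (`db_lookup` simulates the slow backend):

```python
import time

cache = {}  # key -> (value, expires_at); stand-in for a real cache like Redis
TTL_SECONDS = 60

def db_lookup(user_id):
    # Pretend this is an expensive database query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    entry = cache.get(user_id)
    if entry and entry[1] > time.time():
        return entry[0]                               # cache hit
    value = db_lookup(user_id)                        # cache miss: hit the backend
    cache[user_id] = (value, time.time() + TTL_SECONDS)
    return value

first = get_user(42)   # miss: populates the cache
second = get_user(42)  # hit: served from memory
```

The TTL is the simplest invalidation strategy: stale data is tolerated for at most `TTL_SECONDS`, which is exactly the trade-off the entry above warns about.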

Data Partitioning (Sharding)

What: Splitting a large dataset across multiple database servers so each holds only a subset. Horizontal partitioning splits rows. Vertical partitioning splits columns.

Why: A single database has limits on storage and throughput. Sharding is how you get past those limits. Done well, it scales linearly.

When: When your dataset is too large for one machine, or your write throughput exceeds what one server can handle. Be careful: cross-shard queries are expensive.
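The naive way to route a key to a shard is hash-modulo. It works until the shard count changes, at which point most keys remap, which is the problem consistent hashing (covered later) exists to solve. A quick demonstration of the damage:

```python
import hashlib

def shard_for(key, num_shards):
    """Route a key to a shard with hash-mod: simple, but brittle when num_shards changes."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

keys = [f"user:{i}" for i in range(10_000)]
before = {k: shard_for(k, 4) for k in keys}   # 4 shards
after = {k: shard_for(k, 5) for k in keys}    # add a fifth shard
moved = sum(1 for k in keys if before[k] != after[k])
# Roughly 4 out of every 5 keys land on a different shard after the change.
```

That mass remapping means a shard-count change forces a near-total data migration, which is why production systems either fix the shard count up front or use consistent hashing.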

Database Indexes

What: Auxiliary data structures (typically B-trees or hash maps) that let the database find rows matching a query without scanning the whole table.

Why: Without indexes, a lookup on a billion-row table is hopeless. With an index, it's O(log n). The trade-off is that indexes slow down writes, because the index has to be updated too.

When: On columns that appear in WHERE clauses, JOIN conditions, or ORDER BY clauses. Don't index every column. More indexes means slower writes and more storage.

Replication

What: Maintaining multiple copies of the same data across different servers. Writes go to a primary (or to all peers) and propagate to replicas synchronously, asynchronously, or semi-synchronously.

Why: Replication gives you read scaling (serve reads from any replica), failover (promote a replica when the primary fails), and geographic distribution (put replicas closer to users).

When: Almost always, in production. The only question is the replication mode and how you handle failover.

Bloom Filters

What: A space-efficient probabilistic data structure that tells you whether an element is "probably in the set" or "definitely not in the set." Some false positives, zero false negatives.

Why: When your real lookup is expensive (disk read, database query), a Bloom filter lets you skip it cheaply when the element can't possibly be there. Memory cost is tiny.

When: Checking whether a username is taken before querying the database. Checking whether a URL has been crawled. Avoiding unnecessary disk reads in LSM-tree databases like Cassandra.
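The core of a Bloom filter fits in a few lines: set k hashed bit positions per item, and report "probably present" only if all k bits are set. A minimal sketch (sizes and hash count here are arbitrary, not tuned):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: num_hashes bit positions per item in a size_bits array."""

    def __init__(self, size_bits=10_000, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes independent positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits[pos] for pos in self._positions(item))

taken = BloomFilter()
for name in ("alice", "bob", "carol"):
    taken.add(name)
```

In the username example, `might_contain` returning False skips the database entirely; returning True still requires the real lookup, because of the small false-positive chance.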

03 · Distributed Systems

How Systems Stay Consistent at Scale

Once you have many servers working together, the laws of distributed systems start to bite. These are the foundational concepts every senior engineer needs to understand.

CAP Theorem

What: In the presence of a network partition, a distributed system must choose between consistency (all nodes see the same data) and availability (every request gets a response). You can't have both.

Why: It forces a choice you would otherwise avoid. Systems like banking pick consistency. Systems like social media pick availability. Neither is universally correct.

When: Any time you design or evaluate a distributed system. Interviewers often probe this explicitly with questions like "what happens when a network partition occurs?"

PACELC Theorem

What: An extension of CAP. If there is a partition (P), choose between availability and consistency (A / C). Else (E), choose between latency and consistency (L / C).

Why: CAP only tells you what happens during failures. PACELC tells you what happens during normal operation too. Most of the time, systems aren't in a partition, and they are still making trade-offs.

When: Evaluating database choices. DynamoDB, for example, is PA/EL (availability during partitions, latency during normal operation). Spanner is PC/EC (consistency in both cases).

Strong vs Eventual Consistency

What: Strong consistency means any read after a write sees that write. Eventual consistency means reads may see stale data briefly, but all replicas will converge eventually.

Why: Strong consistency is easier to reason about but slower and harder to scale. Eventual consistency is faster and more scalable but forces you to handle stale reads.

When: Use strong consistency for money, inventory, user accounts. Use eventual consistency for social feeds, analytics, non-critical metadata.

Consistent Hashing

What: A hashing technique that places both data and servers on a conceptual ring. When a new server is added or removed, only a small fraction of keys need to move.

Why: With naive modulo hashing, adding or removing a server remaps almost every key, which is catastrophic at scale. Consistent hashing limits the damage.

When: Distributed caches (like Memcached), distributed databases (Cassandra, DynamoDB), and any system where you shard by key across a changing set of nodes.
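A working ring is short enough to sketch. This version uses virtual nodes (multiple ring positions per server) to smooth out the key distribution; node names and counts are illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes (vnodes positions per server)."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"user:{i}" for i in range(10_000)]
ring4 = HashRing(["cache-1", "cache-2", "cache-3", "cache-4"])
ring5 = HashRing(["cache-1", "cache-2", "cache-3", "cache-4", "cache-5"])
moved = sum(1 for k in keys if ring4.node_for(k) != ring5.node_for(k))
# Only about 1/5 of keys move when a fifth node joins, versus ~80% with hash-mod.
```

The keys that move are exactly the ones the new node takes over; nothing shuffles between the existing nodes, which is the whole point.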

Quorum

What: A mechanism where a read or write is considered successful only when it has been acknowledged by a minimum number of replicas, typically a majority.

Why: Quorums give you tunable consistency. By requiring W + R > N (where N is total replicas), you guarantee that every read sees the latest write.

When: Distributed databases (Cassandra, DynamoDB), consensus protocols (Raft, Paxos), and any system where you want strong consistency without sacrificing too much availability.
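The W + R > N rule is just arithmetic, and it's worth seeing the common tunings side by side (N = 3 here, the usual replica count):

```python
def quorum_overlaps(n, w, r):
    """W + R > N guarantees every read quorum intersects every write quorum,
    so at least one replica in any read set holds the latest write."""
    return w + r > n

# Typical tunings for N = 3 replicas (values are illustrative):
strong = quorum_overlaps(n=3, w=2, r=2)      # overlapping quorums: reads see the latest write
fast_reads = quorum_overlaps(n=3, w=3, r=1)  # every replica acks writes; any one serves reads
eventual = quorum_overlaps(n=3, w=1, r=1)    # no guaranteed overlap: stale reads possible
```

Cassandra exposes exactly these knobs per query (`QUORUM`, `ALL`, `ONE`), which is what "tunable consistency" means in practice.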

Leader and Follower

What: A replication pattern where one node (the leader) accepts all writes and propagates them to followers. Followers can serve reads.

Why: It's simple, it gives consistent writes, and it scales reads well. The trade-off is that the leader is a single point of failure and a single point of write throughput.

When: Most managed SQL databases (RDS, Aurora), many NoSQL systems, and most systems where reads vastly outnumber writes.

04 · Architecture Patterns

How Systems Are Structured

Beyond individual components, there are whole-system patterns that shape how services are organized, deployed, and scaled.

Monolith vs Microservices

What: A monolith is a single deployable unit containing all the application's logic. Microservices split the application into many small, independently deployable services that communicate over the network.

Why: Monoliths are simpler to build and deploy but become bottlenecks as the team grows. Microservices enable independent teams and deployment, at the cost of operational complexity.

When: Start with a monolith. Move to microservices when you have clear service boundaries, independent team ownership, and operational maturity to handle distributed tracing, service discovery, and deployment complexity.

Stateful vs Stateless

What: A stateless service doesn't remember anything between requests. Any server can handle any request. A stateful service keeps data in memory that subsequent requests depend on.

Why: Stateless services scale horizontally trivially. You can add servers behind a load balancer and it just works. Stateful services require sticky sessions or shared state, which is harder.

When: Make services stateless by default and push state to a database or cache. Reserve stateful services for cases where in-memory state is a hard requirement (some real-time systems, some ML serving).

Event-Driven Architecture

What: A pattern where components communicate by emitting events (to a queue or event bus) rather than calling each other directly. Consumers subscribe to events they care about.

Why: Decouples producers from consumers. Adding a new consumer doesn't require changing the producer. Also gives you natural replay, auditing, and async processing.

When: Systems with many independent workflows triggered by user actions (order placed, user signed up, file uploaded). Also common in analytics pipelines and real-time notification systems.

Serverless Architecture

What: A model where you deploy individual functions (AWS Lambda, Google Cloud Functions) that execute on demand without you managing servers. The cloud provider handles scaling automatically.

Why: No server management. Pay only for actual execution time. Scales from zero to thousands of concurrent executions without configuration.

When: Event-driven workloads, sporadic traffic, or tasks with unpredictable scale. Poor fit for long-running processes, predictable high-traffic services, or workloads that need persistent connections.

Message Queues

What: A component that lets producers send messages to be processed asynchronously by consumers. Examples: RabbitMQ, Kafka, Amazon SQS, Google Pub/Sub.

Why: Decouples producers from consumers, absorbs traffic spikes, lets slow consumers catch up without dropping messages, and enables retries and dead-letter handling.

When: Background jobs, async processing, event distribution, buffering between fast producers and slow consumers. Essentially any time "this doesn't need to happen right now" is acceptable.
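The producer/consumer decoupling can be sketched in-process with Python's standard library, standing in for a real broker like RabbitMQ or SQS (the message names are made up):

```python
import queue
import threading

jobs = queue.Queue()   # in-process stand-in for a message broker
results = []

def worker():
    """Consumer: pull messages until a None sentinel arrives."""
    while True:
        msg = jobs.get()
        if msg is None:
            break
        results.append(f"processed {msg}")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer enqueues work and moves on; processing happens asynchronously.
for i in range(5):
    jobs.put(f"email-{i}")

jobs.put(None)  # sentinel tells the worker to stop
t.join()
```

The producer never waits on processing, and a burst of puts simply grows the queue, which is the spike-absorbing behavior the entry above describes. Real brokers add durability, acknowledgements, and dead-letter queues on top of this shape.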

Rate Limiting

What: A mechanism that restricts how many requests a given client can make in a given time window. Common algorithms include token bucket, leaky bucket, and sliding window.

Why: Protects your system from abuse and overload, enforces fair usage across users, and lets you sell API access in tiers. Without it, one misbehaving client can take down your service.

When: Any public API, any user-facing endpoint that an attacker might hammer, and any internal service where a bug in one caller could overwhelm others.
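Token bucket, the most common of the algorithms above, is compact enough to sketch: each request spends a token, and tokens refill at a steady rate up to a burst capacity (the limits below are arbitrary examples):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(capacity=5, rate=1.0)  # 5-request burst, then 1 request/second
decisions = [limiter.allow() for _ in range(7)]
# The first 5 calls pass immediately; the next 2 are rejected until tokens refill.
```

In production this state usually lives in a shared store like Redis, keyed per client, so every application server enforces the same limit.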

05 · Reliability & Resilience

How Systems Survive Failure

Everything fails eventually. These concepts describe how well-designed systems detect failure, limit its damage, and recover automatically.

Heartbeat

What: Periodic signals that a server sends to say "I'm still alive." A monitor that stops receiving heartbeats assumes the server is down.

Why: Without heartbeats, you can't tell the difference between "slow" and "dead." Heartbeats give the system a way to trigger failover quickly.

When: Load balancer health checks, leader election, distributed consensus protocols, and any cluster that needs to react to node failures.
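The monitoring side is a table of last-seen timestamps plus a timeout. A minimal sketch (node names and the timeout are illustrative; the stale timestamp is injected to simulate a silent node):

```python
import time

class HeartbeatMonitor:
    """Declares a node dead if no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_seen = {}  # node -> monotonic timestamp of its last heartbeat

    def beat(self, node):
        self.last_seen[node] = time.monotonic()

    def dead_nodes(self):
        now = time.monotonic()
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]

monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("app-1")
monitor.beat("app-2")
# Simulate app-2 having gone silent 10 seconds ago.
monitor.last_seen["app-2"] = time.monotonic() - 10.0

failed = monitor.dead_nodes()
# failed == ["app-2"]: this is the signal that triggers failover.
```

The timeout embodies the "slow vs dead" ambiguity from the entry above: too short and you fail over on a network hiccup, too long and you serve errors while a dead node sits in rotation.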

Checksum

What: A small value computed from a larger block of data. If the data is modified (intentionally or from corruption), the checksum changes.

Why: Data in transit can get corrupted. Data on disk can bit-rot. Checksums let the system detect corruption so it can request a fresh copy or fail loudly instead of silently serving bad data.

When: Network protocols (TCP), file systems (ZFS, Btrfs), backup systems, and distributed storage systems that replicate data across nodes.
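The verify step is a recompute-and-compare. A sketch using SHA-256 (the data blobs are placeholders; real storage systems typically checksum per block):

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 digest of a blob; any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()

original = b"replicated block of user data"
stored_sum = checksum(original)  # stored alongside the data when it is written

# Later, verify a copy read back from disk or a replica before serving it.
clean_copy = b"replicated block of user data"
corrupt_copy = b"replicated block of user dat\x00"  # one corrupted byte

clean_ok = checksum(clean_copy) == stored_sum      # True: safe to serve
corrupt_ok = checksum(corrupt_copy) == stored_sum  # False: fetch a fresh replica
```

On a mismatch, a system like ZFS reads the block from another replica and repairs the bad copy, rather than silently serving corrupted data.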

Circuit Breaker

What: A pattern that stops calling a failing downstream service after too many failures. The breaker "opens" and fails fast until a timeout, then lets a test request through to see if recovery has happened.

Why: Prevents cascading failures. When a downstream service is overwhelmed, continued calls make it worse. Circuit breakers give it room to recover.

When: Any system that calls external services or other internal services. Essential in microservice architectures where failure in one service can ripple.
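The closed/open/half-open state machine is small enough to sketch. This toy version (thresholds and the flaky downstream are illustrative) counts consecutive failures, then fails fast once the breaker opens:

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; retries after `reset_timeout`."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one test request through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit again
        return result

breaker = CircuitBreaker(max_failures=2, reset_timeout=30.0)

def flaky():
    raise ConnectionError("downstream service is down")

outcomes = []
for _ in range(4):
    try:
        breaker.call(flaky)
        outcomes.append("ok")
    except ConnectionError:
        outcomes.append("downstream error")
    except RuntimeError:
        outcomes.append("failed fast")
# The first two calls actually hit the downstream; the next two fail fast without calling it.
```

Failing fast is the point: while the breaker is open, the struggling downstream receives zero traffic from this caller, which is the breathing room it needs to recover.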

Redundancy

What: Duplicating critical components so that if one fails, another takes over. Redundancy can be at the server level, data center level, or region level.

Why: The only way to tolerate a failure is to have a backup already running. Redundancy is the foundation of high availability.

When: Everywhere you care about uptime. Load balancers, databases, file storage, network paths. The question is not whether to be redundant but how much redundancy costs you and how much downtime costs you.

Failover

What: The process of switching from a failed component to a backup. Failover can be automatic (triggered by heartbeat failure) or manual (triggered by an operator).

Why: Redundancy alone isn't useful if you can't actually switch to the backup when something fails. Failover is the mechanism that makes redundancy work.

When: Database primary-to-replica promotion, active-passive load balancer setups, cross-region disaster recovery. Every one of these needs a rehearsed failover plan.

How These Fundamentals Fit Together

The concepts above look like a grab bag until you see them work together in a real request. Walk through what happens when a user loads their Instagram feed, and almost every fundamental shows up:

The journey of one Instagram feed request

User's phone resolves instagram.com via DNS.

Request hits the nearest CDN edge, which serves cached static assets (images, JS, CSS) locally.

API request goes to a load balancer, which picks one of many application servers.

The request passes through an API gateway, which handles authentication and rate limiting.

Feed service checks a cache first. Cache hit? Return immediately. Miss? Keep going.

Service queries a sharded database with replication, reading from a follower replica for speed.

Result goes back through the stack, populating the cache on the way out.

If any step failed, a circuit breaker prevents cascading failure and a heartbeat-triggered failover switches to a backup.

Every fundamental in this guide plays a role in that journey. Learning them in isolation is the first step. Seeing them compose is what makes you effective at system design.

The structured way to see them all in action is through real design problems. Each problem in a course like Grokking the System Design Interview walks through a different combination of these fundamentals applied to a real system: URL shortener, Instagram, Uber, YouTube, and more. If you'd rather get deeper on just the fundamentals first, Grokking System Design Fundamentals is a focused course on the building blocks alone.

Frequently Asked Questions

Common questions about system design fundamentals.

What are the most important system design fundamentals?
The most commonly used building blocks are: load balancers, caches, SQL and NoSQL databases, CDNs, message queues, and API gateways. On the conceptual side, the most important ideas are the CAP theorem, consistency models, sharding, and replication. If you understand these well, you can reason through most system design problems.
What is the difference between system design and system design fundamentals?
System design is the overall practice of planning how a large software product is built. System design fundamentals are the specific concepts and components you use in that practice: load balancers, caches, databases, consistency models, and so on. Fundamentals are the vocabulary. System design is the skill of applying that vocabulary to a problem.
Do I need to memorize every fundamental before doing system design?
No. You need solid understanding of the core building blocks (load balancing, caching, databases, sharding, replication) and the main distributed systems concepts (CAP, consistency models). More specialized concepts like Bloom filters or Merkle trees can be learned on demand when a specific problem calls for them.
What is the difference between a load balancer and an API gateway?
A load balancer distributes traffic across identical backend servers to balance load and provide failover. An API gateway is smarter: it routes different requests to different services, handles authentication, rate limiting, and request transformation. In practice, many systems use both, with the load balancer in front of the gateway and the gateway in front of the services.
When should I use SQL vs NoSQL?
Use SQL when you need strong consistency, complex queries, and transactions: most business systems, financial applications, and anything with relationships between entities. Use NoSQL when you need massive horizontal scale, flexible schemas, or extremely high write throughput, and you can tolerate eventual consistency. Most modern products use both for different parts of the system.
Is caching always a good idea?
No. Caching introduces complexity: you have to decide what to cache, when to invalidate it, and what to do when cached data is stale. Caching is usually a good idea for read-heavy data that doesn't change often. It is usually a bad idea for write-heavy data or data where staleness is unacceptable.
What is the CAP theorem in simple terms?
The CAP theorem says that when a network partition happens in a distributed system, you have to choose between consistency (all nodes see the same data) and availability (every request gets a response). You cannot have both during the partition. In practice, this means you must decide in advance which one your system will prioritize when things go wrong.
How long does it take to learn system design fundamentals?
You can learn the vocabulary and core concepts in 2 to 3 weeks of focused study. Getting comfortable applying them to real design problems takes another 4 to 6 weeks. A structured course like Grokking System Design Fundamentals covers the building blocks efficiently, and Grokking the System Design Interview adds the application layer.

Ready to apply these fundamentals?

Knowing the fundamentals is the foundation. Applying them to real systems is the skill. The most direct path is a structured course that walks you through 15+ real design problems using every concept on this page.