System Design Primer

Typical system design workflow

What are the problem constraints

What's the amount of traffic the system should handle

Num of operation per day * Number of daily active users / 86400 (~100,000)

Peak QPS = Average QPS * 3

What's the amount of data the system should handle

New data written per year: DAU * 365 (~400) * 5

What features the system needs to support

List features

Common features

Abstract design

Front-end layer

Application service layer

Data cache

Data storage

Message queue + notification center

Logs + storage + analytics

Scale

System design evaluation standards

OO design principles

SRP: The Single Responsibility Principle

OCP: The Open-Closed Principle

LSP: The Liskov Substitution Principle

DIP: The Dependency-Inversion Principle

ISP: The Interface-Segregation Principle

DRY: Don't repeat yourself

Distributed system concepts

CAP theorem

Consistency

Update consistency

Read consistency

Replication Consistency

Message queue

Benefits

Components

Routing methods

Protocols

Metrics to decide which message broker to use

Challenges

Networking

TCP vs UDP

TCP UDP
Reliable: TCP is connection-oriented protocol. When a file or message send it will get delivered unless connections fails. If connection lost, the server will request the lost part. There is no corruption while transferring a message. Not Reliable: UDP is connectionless protocol. When you a send a data or message, you don’t know if it’ll get there, it could get lost on the way. There may be corruption while transferring a message.
Ordered: If you send two messages along a connection, one after the other, you know the first message will get there first. You don’t have to worry about data arriving in the wrong order. Not Ordered: If you send two messages out, you don’t know what order they’ll arrive in i.e. no ordered
Heavyweight: – when the low level parts of the TCP “stream” arrive in the wrong order, resend requests have to be sent, and all the out of sequence parts have to be put back together, so requires a bit of work to piece together. Lightweight: No ordering of messages, no tracking connections, etc. It’s just fire and forget! This means it’s a lot quicker, and the network card / OS have to do very little work to translate the data back from the packets.
Streaming: Data is read as a “stream,” with nothing distinguishing where one packet ends and another begins. There may be multiple packets per read call. Datagrams: Packets are sent individually and are guaranteed to be whole if they arrive. One packet per one read call.
Examples: World Wide Web (Apache TCP port 80), e-mail (SMTP TCP port 25 Postfix MTA), File Transfer Protocol (FTP port 21) and Secure Shell (OpenSSH port 22) etc. Examples: Domain Name System (DNS UDP port 53), streaming media applications such as IPTV or movies, Voice over IP (VoIP), Trivial File Transfer Protocol (TFTP) and online multiplayer games

HTTP

Status code

Groups

Status code Meaning Examples
5XX Server error 500 Server Error
4XX Client error 401 Authentication failure; 403 Authorization failure; 404 Resource not found
3XX Redirect 301 Resource moved permanently; 302 Resource moved temporarily
2XX Success 200 OK; 201 Created; 203 Object marked for deletion

HTTP 4XX status codes

Status code Meaning Examples
400 Malformed request Frequently a problem with parameter formatting or missing headers
401 Authentication error The system doesn't know who the request if from. Authentication signature errors or invalid credentials can cause this
403 Authorization error The system knows who you are but you don't have permission for the action you're requesting
404 Page not found The resource doesn't exist
405 Method not allowed Frequently a PUT when it needs a POST, or vice versa. Check the documentation carefully for the correct HTTP method

Verbs

CRUD example with Starbucks

Action System call HTTP verb address Request body Successful response code + Response body
Order iced team Add order to system Post /orders/ {"name" : "iced tea", "size" : "trenta"} 201 Created Location: /orders/1
Update order Update existing order item PUT /orders/1 {"name" : "iced tea", "size" : "trenta", "options" : ["extra ice", "unsweetened"]} 204 No Content or 200 Success
Check order Read order from system GET /orders/1 200 Success { "name" : "iced tea", "size" : "trenta", "options" : ["extra ice", "unsweetened"]}
Cancel order Delete order from system DELETE /orders/1 202 Item Marked for Deletion or 204 No Content

Others

Headers

Request

Header Example value Meaning
Accept Text/html, application/json The client's preferred format for the response body. Browsers tend to prefer text/html, which is a human-friendly format. Applications using an API are likely to request JSON, which is structured in a machine-parseable way. This can be a list, and if so, the list is parsed in priority order: the first entry is the most desired format, all the way down to the last one.
Accept-language en-US The preferred written language for the response. This is most often used by browsers indicating the language the user has specified as a preference
User-agent Mozilla/5.0 This header tells the server what kind of client is making the request. This is an important header because sometimes responses or JavaScript actions are performed differently for different browsers. This is used less frequently for this purpose by API clients, but it's a friendly practice to send a consistent user-agent for the server to use when determining how to send the information back.
Content-length size of the content body When sending a PUT or POST, this can be sent so the server can verify that the request body wasn't truncated on the way to the server.
Content-type application/json When a content body is sent, the client can indicate to the server what the format is for that content in order to help the server respond to the request correctly.

Response

Header Example value Meaning
Content-Type application/json As with the request, when the content body is sent back to the client, the Content-Type is generally set to help the client know how best to process the request. Note that this is tied somewhat indirectly to the Accept header sent by the client. The server will generally do its best to send the first type of content from the list sent by the client but may not always provide the first choice.
Access-Control-Allow-Headers Content-Type, Authorization, Accept This restricts the headers that a client can use for the request to a particular resource
Access-Control-Allow-Methods GET, PUT, POST, DELETE, OPTIONS What HTTP methods are allowed for this resource
Access-Control-Allow-Origin * or http://www.example.com This restricts the locations that can refer requests to the resource

Compression

Parameters

Action System call HTTP verb address Successful response code / Response body
Get order list, only Trenta iced teas Retrieve list with a filter Get /orders?name=iced%20tea&size=trenta [{ "id" : 1, "name" : "iced tea", "size" : "trenta", "options" : ["extra ice", "unsweetened"] }]
Get options and size for the order Retrieve order with a filter specifying which pieces to return Get /orders/1?fields=options,size { "size" : "trenta", "options" : ["extra ice", "unsweetened"]}

HTTP session

Stateless applications

Structure of a session

Category Session Cookie
Location User ID on server User ID on web browser
Safeness Safer because data cannot be viewed or edited by the client A hacker could manipulate cookie data and attack
Amount of data Big Limited
Efficiency Save bandwidth by passing only a reference to the session (sessionID) each pageload. Must pass all data to the webserver each pageload
Scalability Need efforts to scale because requests depend on server state Easier to implement

Store session state in client-side cookies

// Browser request example:

GET /index.html HTTP/1.1
Host: www.example.com

// Example answer from the server:


HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: foo=10
Set-Cookie: bar=20; Expires=Fri, 30 Sep 2011 11:48:00 GMT
... rest  of the response

// Here two cookies foo=10 and bar=20 are stored on the browser. The second one will expire on 30 September. In each subsequent request the browser will send the cookies back to the server.


GET /spec.html HTTP/1.1
Host: www.example.com
Cookie: foo=10; bar=20
Accept: */*

Store session state in server-side

Typical server-side session workflow
  1. Every time an internet user visits a specific website, a new session ID (a unique number that a web site's server assigns a specific user for the duration of that user's visit) is generated. And an entry is created inside server's session table
Columns Type Meaning
sessionID string a global unique hash value
userId Foreign key pointing to user table
expireAt timestamp when does the session expires
  1. Server returns the sessionID as a cookie header to client
  2. Browser sets its cookie with the sessionID
  3. Each time the user sends a request to the server. The cookie for that domain will be automatically attached.
  4. The server validates the sessionID inside the request. If it is valid, then the user has logged in before.
Use a load balancer that supports sticky sessions:

DNS

Design

Initial design

A distributed, hierarchical database

Internals

DNS records

Insert records into DNS DB

DNS query parsing

Types

Round-robin DNS

GeoDNS

Functionality

DNS Caching

Load balancing

Host alias

DNS prefetching

Def

Control prefetching

X-DNS-Prefetch-Control: off

Load balancers

Benefits

Round-robin algorithm

Security

SSL

Definition

How does HTTPS work

How to avoid public key being modified?

How to avoid computation consumption from PKI

NoSQL

NoSQL vs SQL

Database SQL NoSQL
Data uniformness Uniform data. Best visualized as a set of tables. Each table has rows, with each row representing an entity of interest. Each row is described through columns. One row cannot be nested inside another. Non-uniform data. NoSQL databases recognize that often, it is common to operate on data in units that have a more complex structure than a set of rows. This is particularly useful in dealing with nonuniform data and custom fields. NoSQL data model can be put into four categories: key-value, document, column-family and graph.
Schema change Define what table exists, what column exists, what data types are. Although actually relational schemas can be changed at any time with standard SQL commands, it is of hight cost. Changing schema is casual and of low cost. Essentially, a schemaless database shifts the schema into the application code.
Query flexibility Low cost on changing query. It allows you to easily look at the data in different ways. Standard SQL supports things like joins and subqueries. High cost in changing query. It does not allow you to easily look at the data in different ways. NoSQL databases do not have the flexibility of joins or subqueries.
Transactions SQL has ACID transactions (Atomic, Consistent, Isolated, and Durable). It allows you to manipulate any combination of rows from any tables in a single transaction. This operation either succeeds or fails entirely, and concurrent operations are isolated from each other so they cannot see a partial update. Graph database supports ACID transactions. Aggregate-oriented databases do not have ACID transactions that span multiple aggregates. Instead, they support atomic manipulation of a single aggregate at a time. If we need to manipulate multiple aggregates in an atomic way, we have to manage that ourselves in application code. An aggregate structure may help with some data interactions but be an obstacle for others.
Consistency Strong consistency Trade consistency for availability or partition tolerance. Eventual consistency
Scalability elational database use ACID transactions to handle consistency across the whole database. This inherently clashes with a cluster environment Aggregate structure helps greatly with running on a cluster. It we are running on a cluster, we need to minize how many nodes we need to query when we are gathering data. By using aggregates, we give the database important information about which bits of data (an aggregate) will be manipulated together, and thus should live on the same node.
Performance MySQL/PosgreSQL ~ 1k QPS MongoDB/Cassandra ~ 10k QPS. Redis/Memcached ~ 100k ~ 1M QPS
Maturity Over 20 years. Integrate naturally with most web frameworks. For example, Active Record inside Ruby on Rails Usually less than 10 years. Not great support for serialization and secondary index

NoSQL flavors

Key-value

Document

Column-Family

CREATE COLUMN FAMILY visit_counter
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type

// Once a column family is created, you can have arbitrary columns for each page visited within the web application for every user. 
INCR visit_counter['mfowler'][home] BY 1;
INCR visit_counter['mfowler'][products] BY 1;
INCR visit_counter['mfowler'][contactus] BY 1;

// expiring columns
SET Customer['mfowler']['demo_access'] = 'allowed' WITH ttl=2592000;

Graph

Scaling

Functional partitioning

REST best practices

Consistency

Endpoint naming conventions
HTTP verbs and CRUD consistency
Verb Endpoint Description
GET /products Gets a list of products
GET /products/:id Gets a single product by ID
GET /products/:id/parts Gets a list of parts in a single product
PUT /products/:id/parts Inserts a new part for a particular product
DELETE /products/:id Deletes a single product by ID
PUT /products Inserts a new product
HEAD /products/:id Returns whether the product exists through a status code of 200 or 404
PATCH /products/:id Edits an existing product by ID
POST /authentication/login Most other API methods should use POST requests
Versioning
Data transfer format
{
    "data" : {}  // actual response
}
HTTP status codes and error handling
HTTP/1.1 200 OK
{
    "data": {
        "id" : "baeb-b001",
        "name" : "Angry Pirate Plush Toy",
        "description" : "Batteries not included",
        "price" : "$39.99",
        "categories": ["plushies", "kids"]
    }
}
// if input validation fails on a form while attempting to create a product, you could return a response using a 400 bad request status code, as shown in the following listing.
HTTP/1.1 400 Bad Request
{
    "error": {
        "code": "bf-400",
        "message": "Some required fields were invalid.",
        "context": {
            "validation": [
                "The product name must be 6-20 alphanumeric characters",
                "The price cann't be negative",
                "At least one product category should be selected. "
            ]
        }
    }
}

// server side error
{
    "error": {
        "code": "bf-500",
        "message": "An unexpected error occurred while accessing the database",
        "context": {
            "id": "baeb-b001"
        }
    }
}
Paging
 curl 'https://api.github.com/user/repos?page=2&per_page=100'
{
  "results": [ ... actual results ... ],
  "pagination": {
    "count": 2340,
    "page": 4,
    "per_page": 20
  }
}
 Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
   <https://api.github.com/user/repos?page=50&per_page=100>; rel="last"
Name Description
next The link relation for the immediate next page of results.
last The link relation for the last page of results.
first The link relation for the first page of results.
prev The link relation for the immediate previous page of results.

Scaling REST web services

Keeping service machine stateless
Benefits
Common use cases needing share state
Caching service responses
Cache-Control header
Expires
Last-Modified/If-Modified-Since/Max-age

Cache-Control: private, max-age=86400 Last-Modified: Thu, 3 Jul 2014 18:31:12 GMT

If-Modified-Since: Thu, 3 Jul 2014 18:31:12 GMT

ETag

Cache-Control: private, max-age=86400 ETag: "d5jiodjiojiojo"

If-None-Match: "d5jiodjiojiojo"

Vary: Authorization
Functional partitioning

Security

Throttling

X-RateLimit-Limit: 2000 X-RateLimit-Remaining: 1999 X-RateLimit-Reset: 1404429213925

X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1404429213925
{
    "error": {
        "code": "bf-429",
        "message": "Request quota exceeded. Wait 3 minutes and try again.",
        "context": {
            "renewal": 1404429213925
        }
    }
}
Use OAuth2 with HTTPS for authorization, authentication and confidentiality.

Documentation

Others

Data partitioning - Sharding

Sharding benefits

Sharding key

Sharding function

Static sharding

Dynamic sharding

Challenges

Cross-shard joins

Using AUTO_INCREMENT

Distributed transactions

Clones - Replication

Replication purpose

High availability by creating redundancy

Planning for failures

Replication for scaling read

When to use
When not to use

Replication Topology

Master-slave vs peer-to-peer

Types Strengths Weakness
Master-slave Helpful for scaling when you have a read-intensive dataset. Can scale horizontally to handle more read requests by adding more slave nodes and ensuring that all read requests are routed to the slaves.Helpful for read resilience. Should the master fail, the slaves can still handle read requests.Increase availability by reducing the time needed to replace the broken database. Having slaves as replicas of the master does speed up recovery after a failure of the master since a slave can be appointed a new master very quickly. Not a good scheme for datasets with heavy write traffic, although offloading the read traffic will help a little bit with handling the write load. All of your writes need to go through a single machine The failure of the master does eliminate the ability to handle writes until either the master is restored or a new master is appointed.Inconsistency. Different clients reading different slaves will see different values because the changes haven't all propagated to the slaves. In the worst case, that can mean that a client cannot read a write it just made.
p2p: Master-master Faster master failover. In case of master A failure, or anytime you need to perform long-lasting maintainence, your application can be quickly reconfigured to direct all writes to master B.More transparent maintainance. Switch between groups with minimal downtime. Not a viable scalability technique. Need to use auto-increment and UUID() in a specific way to make sure you never end up with the same sequence number being generated on both masters at the same time.Data inconsistency. For example, updating the same row on both masters at the same time is a classic race condition leading to data becoming inconsistent between masters.Both masters have to perform all the writes. Each of the master needs to execute every single write statement either coming from your application or via the replication. To make it worse, each master will need to perform additional I/O to write replicated statements into the relay log. Both masters have the same data set size. Since both masters have the exact same data set, both of them will need more memory to hold ever-growing indexes and to keep enough of the data set in cache.
p2p: Ring-based Chain three or more masters together to create a ring. All masters need to execute all the write statements. Does not help scale writes. Reduced availability and more difficult failure recovery: Ring topology makes it more difficult to replace servers and recover from failures correctly. Increase the replication lag because each write needs to jump from master to master until it makes a full circle.

Master-slave replication

Number of slaves

Peer-to-peer replication

Replication mode

Synchronous and Asynchronous

Synchronous vs Asynchronous

Cache

Why does cache work

Cache hit ratio

How much will cache benefit

Access pattern

Write through cache

Write around cache

Write back cache

How to handle cache failure

Typical caching scenarios

HTTP Cache

Headers

Types

Browser cache
Caching proxies
Reverse proxy
Content delivery networks

Scaling

Application objects cache

Types

Client-side web storage
Caches co-located with code: One located directly on your web servers.
Distributed cache store

Scaling

Caching rules of thumb

Cache priority

Cache reuse

Cache invalidation

Pains

Pain of large data sets - When cache memory is full

Pain of stale data

Pain of loading

Pain of duplication

Thundering herd problem

Def

/* read some data, check cache first, otherwise read from SoR */
public V readSomeData(K key) {
  Element element;
  if ((element = cache.get(key)) != null) {
    return element.getValue();
  }

  // note here you should decide whether your cache
  // will cache 'nulls' or not
  if (value = readDataFromDataStore(key)) != null) {
    cache.put(new Element(key, value));
  }

  return value;
}

Solutions

Scaling Memcached at Facebook

Architecture

Lambda architecture

Building blocks

Load balancer

Hardware vs software

Category Software Hardware
Def Run on standard PC hardware, using applications like Nginx and HAProxy Run on special hardware and contain any software pre-installed and configured by the vendor.
Model Operate on Application Layer Operate on network and transport layer and work with TCP/IP packets. Route traffic to backend servers and possibly handling network address translation
Strength/Weakness More intelligent because can talk HTTP (can perform the compression of resources passing through and routing-based on the presence of cookies) and more flexible for hacking in new features or changes Higher throughput and lower latency. High purchase cost. Hardware load balancer prices start from a few thousand dollars and go as high as over 100,000 dollars per device. Specialized training and harder to find people with the work experience necessary to operate them.

HAProxy vs Nginx

Category Nginx HAProxy
Strengths Can cache HTTP responses from your servers. A little faster than Nginx and a wealth of extra features. It can be configured as either a layer 4 or layer 7 load balancer.

Web server

Apache and Nginx

Apache vs Nginx

Category Apache Nginx
History Invented around 1990s when web traffic is low and web pages are really simple. Apache's heavyweight, monolithic model has its limit. Tunning Apache to cope with real-world traffic efficiently is a complex art. Heavy traffic and web pages. Designed for high concurrency. Provides 12 features including which make them appropriate for microservices.
Architecture One process/threads per connection. Each requests to be handled as a separate child/thread. Asynchronous event-driven model. There is a single master process with one or more worker processes.
Performance To decrease page-rendering time, web browsers routinely open six or more TCP connections to a web server for each user session so that resources can download in parallel. Browsers hold these connections open for a period of time to reduce delay for future requests the user might make during the session. Each open connection exclusively reserves an httpd process, meaning that at busy times, Apache needs to create a large number of processes. Each additional process consumes an extra 4MB or 5MB of memory. Not to mention the overhead involved in creating and destroying child processes. Can handle a huge number of concurrent requests
Easier development Very easy to insert additional code at any point in Apache's web-serving logic. Developers could add code securely in the knowledge that if newly added code is blocked, ran slowly, leaked resources, or even crashed, only the worker process running the code would be affected. Processing of all other connections would continue undisturbed Developing modules for it isn't as simple and easy as with Apache. Nginx module developers need to be very careful to create efficient and accurate code, without any resource leakage, and to interact appropriately with the complex event-driven kernel to avoid blocking operations.

Cache

In-memory cache - Guava cache

Standalone cache

Memcached

Redis

Database

DynamoDB

Cassandra

Queue

ActiveMQ

RabbitMQ

SQS

Kafka

Data Processing

Hadoop

Spark

EMR

Stream Processing

Samza

Storm

References