Software engineering notes

Database

Database scaling

Vertical vs Horizontal

Vertical: increase the capacity of a single server by adding more powerful hardware resources such as CPU, RAM, and storage

Horizontal: distribute the database load across multiple servers by adding more instances to the existing pool of servers

Data consistency

Note that MySQL replication only provides eventual consistency, not strong consistency

Sharding and Partitioning

breaking up large tables into partitions and storing each partition on separate servers is called “sharding”

Common methods of database sharding

The pros and cons of sharding

pros

cons

How to do partitioning and sharding?

Query a partitioned table in mysql

If the query involves joining partitioned tables or combining data from multiple partitions, MySQL might create temporary tables to process the query

How to read data a shard server?

Before sending a query, we need to determine the shard key so that we know which shard we want to query

For example, use the hash function for user_id to get the shard key, then query the appropriate shard based on the shard key

code example

# Determine the shard
shard = hash_function(key)

# Find the server for this shard
server = metadata[shard]

data = connect_and_retrieve(server, key)

How to read data across shard servers?

cross-shard query

There are a few ways to achieve cross-shard query if the database doesn’t support

CAP theorem

It’s impossible for a distributed data store to simultaneously provide more than two out of these three guarantee:

Eventually consistent: The database prioritise Availability and Partition Tolerance over Consistency

ACID

A set of properties that guarantee that database transactions are processed reliably

SQL vs NoSQL

SQL

NoSQL

PostgreSQL vs MySQL (innodb)

Similarity

PostgreSQL

MySQL

AWS Redshift vs MySQL

Redshift features

ref: