Database - MySQL Optimisation

Index

How does index work?

MySQL uses a data structure (B-Tree is default) that stores the value of the indexed column(s) and a reference to the row that contains the rest of the data. This allows mysql to find the values without scanning the whole table.

B-Tree is the most commonly used
B-Tree is used for both clustered and non-clustered indexes
If it’s a multi-column index, the value will be concatenated values of the indexed columns

types of index

B-Tree
- efficient for finding ranges of values and individual values, especially when dealing with =, >, <, >=, <=, BETWEEN, LIKE, etc.
Hash
- used for equality comparisons that use the = operator
- not used for range-based query or sorting (ORDER BY)
Full-text
- used with full-text search functions MATCH(), AGAINST()
- used for complex word-based searches
Spatial
- used for geographic or multi-dimensional data
- used for finding points within a given radius or polygons that overlap
Clustered
- stored as B-tree
- the primary key serves as a clustered index
- mysql can have only one clustered index in a table
- It’s best for retrieving records in a range of values
Secondary (Non-clustered)
- stored as B-tree
- an index that isn’t a table’s primary key index (which is a clustered index)
- contain the indexed columns’ values and a pointer that references the primary key for the row
- best for retrieving individual records

Multi-column index best practices

Most selective

If one column in the index is going to narrow down the result significantly more than the other, it should be placed first in the index

Column order Matters

For example, if there is an index on (uid, gid, pid),
the index will fully come into effect under the following cases:
- uid = “AAA”
- uid = “AAA” AND gid = “BBB”
- uid = “AAA” AND gid = “BBB” AND pid = “CCC”
the index will partially come into effect under the following cases:
- uid = “AAA” AND pid = “CCC”
- only uid comes into effect in this index
the index won’t come into effect under the following cases:
- gid = “BBB”
- pid = “CCC”
- gid = “BBB” AND pid = “CCC”
the reason why the index won’t work is because the index followed by uid that isn’t in the WHERE clause

Does the order matter in “WHERE” clauses?

No, the order of where clauses does not matter.

Query optimizer will look at all the parts of the WHERE clause and figure out the most efficient way to satisfy that query

Can a query choose more than one index?

Yes, a query optimizer can choose to use more than one index in a single query if doing so would make the query more efficient.

It’s worth noting that using multiple indexes in a single query can also have downsides. Each index lookup has a cost, and the database has to combine the results from multiple lookups, which also takes time. Therefore, the query optimizer will only choose to use multiple indexes if it estimates that doing so will be faster than the alternatives. This estimation is based on various factors, such as the size of the table, the selectivity of the indexes, and the complexity of the query.

For example, if there are two indexes, one for ‘uid’ and another for ‘pid’, and both are used for filtering in the WHERE clause, will the query optimizer use both indexes?

It depends. The query optimizer may use both indexes if both ‘uid’ and ‘pid’ columns are used in the WHERE clause and their indexes are highly selective. However, it may also choose to use only one index which is more selective than the other one.

Can an index be used for both `ORDER BY` and `WHERE` at the same time?

Yes, a multi-column index can be used for both WHERE and ORDER BY at the same time.

The index needs to be carefully designed so that the fields used in the WHERE clause and the ORDER BY clause match the prefix of the multi-column index.

For example:

SELECT * FROM employees WHERE last_name = 'Bob' ORDER BY created_at;

index: last_name, created_at

The index can be used for both WHERE, searching the leftmost part of the index (last_name = bob), and the ORDER BY, mainting the order of created_at in the filtered data

What factors does the query optimizer consider when making decisions on index?

Cardinality
- The uniqueness of data in a column.
- If a column has a high cardinality (many unique values), then an index on that column can be very efficient for certain lookups.
- Conversely, if a column has low cardinality (few unique values), then an index might not be as useful.
Selectivity
- This is related to cardinality and refers to the percentage of rows a particular index entry selects.
- If an index entry refers to many rows (low selectivity), the optimizer may choose a table scan or another index.
- But if an index entry refers to few rows (high selectivity), the optimizer may opt to use the index.
- If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index)
Distribution of Values
- If the values in a column are unevenly distributed, this might affect the optimizer’s decision.
- For instance, if a large number of rows have the same value for an indexed column, and a query is selecting rows with that value, a full table scan might be more efficient than using the index.
Size of the Table
- Larger tables are more likely to benefit from indexing than smaller tables.
- In a small table, it might be quicker to perform a full table scan rather than using an index.
Type of Query
- The kind of operation being performed in the query can also influence the decision.
- For instance, indexes are more beneficial in SELECT queries than in INSERT, UPDATE, or DELETE queries, because the latter group of operations require additional time to update the index.
Index Type
- Different DBMS support different types of indexes (B-tree, hash, bitmap, etc.).
- The type of an index can determine its usefulness for certain types of queries.
Statistics
- Modern databases keep statistics about data distribution in tables and indexes.
- The optimizer uses these to make educated decisions about which indexes to use.

What’s covering index?

A covering index in MySQL is a type of index that includes all the fields retrieved by a particular query. if a query can be satisfied entirely using an index without having to perform additional disk I/O operation to fetch the actual data row.

For example:

SELECT first_name, last_name FROM users WHERE email = 'user@example.com';
CREATE INDEX idx_email_first_last ON users(email, first_name, last_name);

This index “covers” the query because the index includes all the fields used in this query.

Any single-column index or multi-column index could act as a covering index, given the right query. Design your indexes and queries to take advantage of this optimization technique wherever possible.

Which clauses in a query can affect the query optimizer’s decision on indexes?

SELECT
- While the SELECT clause doesn’t directly affect index selection, the columns that are being selected can influence the optimizer.
- If all selected columns are part of a single index (a situation known as a “covering index”), the optimizer may choose to scan the index instead of the actual table.
JOIN
- The optimizer takes into account the tables being joined and the join type.
- If there are indexes on the columns being used to join the tables, the optimizer will likely use those indexes to speed up the join operation.
WHERE
- The optimizer looks at the columns and values in the WHERE clause.
- If there are indexes on the columns being used in the WHERE conditions, it can use these indexes to quickly locate the rows that match the condition, instead of scanning the entire table.
GROUP BY: If you’re grouping by a column that’s indexed, the optimizer might use the index to speed up the operation.
ORDER BY: If you’re ordering by a column that’s indexed, the database can use the index to avoid having to sort the result set, since the index already provides a sorted set of values.
LIMIT: The LIMIT clause can significantly influence how the optimizer chooses to use indexes, especially when combined with an ORDER BY clause.

Which clauses are given precedence over others by the query optimizer within a query?

No clause takes precedence over the others.

When deciding on the usage of indexes, the optimizer doesn’t strictly prioritize one clause over the other.

For example, it doesn’t necessarily favor ‘WHERE’ over ‘ORDER BY’. It evaluates the entire query and the available indexes to choose the best indexes to use.

The pros and cons of setting foreign keys

Setting foreign keys in a database comes with several advantages and a few considerations. Here are some pros and cons of setting foreign keys:

Pros

Data Integrity
- ensure that relationships between tables are maintained correctly
- prevent the insertion of invalid or orphaned data by enforcing constraints on the referenced columns
Data Consistency
- automatically update or delete related records when changes are made to the referenced records
- minimise manual errors and inconsistencies
Query Optimization
- optimise JOIN operations through efficient navigation between related tables using indexes.
Documentation and Understanding
- illustrate relationships between tables
- serve as guides, making database structure easier to understand

Cons

Performance Overhead
- Validation Checks
  - For inserts and updates, the database must ensure that the referenced key value exists in the referenced table
  - For deletes, the database must verify if there are any dependent records in the child table
- Index maintenance
  - maintain and update referencing and referenced indexes
- Bulk Data Operations:
  - operations such as concurrent updates and transactions to large-scale tables
- Concurrency and Locking
  - For updates and deletes, system performance and concurrency can be impacted due to lock operations on related tables
Complexity and Maintenance
- For database schema, any modifications to relationships or referenced tables may require updating the foreign keys
- When performing database migrations or large-scale data loading, foreign key constraints can sometimes pose challenges.
  - Workaround: Disabling or temporarily removing foreign key constraints may be necessary to facilitate the migration or data loading process

the benefits of setting foreign keys typically outweigh the drawbacks

How to speed up `LIKE` query by using an index?

For string columns, MySQL indexes the left side of a string

It can speed up like query that has a wildcard on the right side

SELECT * FROM foo WHERE field LIKE "bar%"

It doesn’t work for a query that has a wildcard on the left side

SELECT * FROM foo WHERE field LIKE "%bar%"

EXPLAIN

The columns in the output

id
- The ID of the select operation within the query execution plan. If there are multiple rows, they represent a nested subquery
select_type
- The type of select operation being performed. It indicates the type of query or subquery, such as SIMPLE, PRIMARY, SUBQUERY, DERIVED, etc.
- Values
  - SIMPLE: A simple SELECT statement without any subqueries or UNIONs.
  - PRIMARY: The main select statement in a complex query (for example, outer query in a subquery).
  - SUBQUERY: The first SELECT statement in a subquery, not in the FROM.
    - e.g. SELECT * FROM users WHERE user_id in (SELECT id FROM employees)
    - select_type: SUBQUERY, table: employees
  - DERIVED: MySQL creates a temporary table for the results of the SELECT statement. This is usually in FROM subqueries.
    - e.g. SELECT * FROM ( SELECT column1, column2 FROM table1) as derived_table
    - select_type: DERIVED, table: derived_table
  - UNION: A second or subsequent SELECT statement in a UNION.
    - e.g. show the result of the union of tables 1 and 2
    - e.g. SELECT id FROM table1 UNION SELECT id FROM table2
    - select_type: UNION, table: table2
  - UNION RESULT: The result of a UNION.
    - select_type: UNION RESULT, table: <union1,2>
  - DEPENDENT UNION: A second or subsequent SELECT statement in a UNION, but depending on outer query.
  - DEPENDENT SUBQUERY: A subquery that depends on the outer query.
  - MATERIALIZED: It refers to a materialized subquery, which is a subquery that has been precomputed and stored.
  - UNCACHEABLE SUBQUERY: A subquery result that is not cached and hence is re-evaluated for each row of the outer query.
  - UNCACHEABLE UNION: The result of a UNION is not cached, and hence the UNION operation is re-evaluated for each row of the outer query.
table
- The name of the table involved in the corresponding row of the query execution plan.
partitions
- The partitions of the table that are accessed or searched during the query execution.
type
- The join type or access method used for retrieving rows.
- Values
  - const
    - ideal value
    - This suggests that the query accesses a single row based on a constant value or a primary key.
    - It’s the most efficient access method because it retrieves a specific row directly.
    - e.g. SELECT * FROM employees WHERE employee_id = 1, assuming employee_id is a primary key
    - e.g. SELECT * FROM employees WHERE unique_index_column = 'unique_value', assuming unique_index_column is a column that has a unique index
    - e.g. SELECT * FROM employees WHERE column1 = 'value1' AND column2 = 'value2', assuming column1 and column2 together form a composite unique index
  - ALL
    - This indicates a full table scan where all rows of the table are read.
    - It can be resource-intensive and generally not desirable, especially for large tables.
    - e.g. SELECT * FROM employees WHERE name LIKE '%John%'
  - index
    - MySQL will scan the entire index to find matching rows
    - This happens when all the columns being selected are part of the same index.
    - MySQL can go directly to the index to find the necessary data, instead of searching the entire table.
    - It’s more efficient than a full table scan (ALL type), but not efficient enough because it still reads the entire index
    - It’s worth to note that this can happen even without a WHERE clause
    - e.g. SELECT * FROM employees, mysql may opt to perform a full index scan if that’s cheaper than a full table scan
      - e.g. If there is a covering index that includes all the columns in the table, mysql might instead opt to do a full scan of that index
    - e.g. SELECT employee_name FROM employees, assuming employee_name is an indexed column, mysql can simply scan the employee_name index to retrieve the data, rather than scanning the full table
    - e.g. SELECT * FROM employees ORDER BY employee_name, assuming employee_name is an indexed column, mysql performs an index scan to take advantage of the fact that the index stores rows in order
  - range
    - MySQL uses an index to read rows from a table based on a range
    - this type is used when querying on indexed columns with operators like BETWEEN, <, >, <=, >=, IN, etc.
    - much more efficient than index and ALL types because it only reads rows where the index is within a certain range, rather than reading the entire index or table
    - e.g. SELECT * FROM employees WHERE employee_id BETWEEN 100 AND 200, assuming employee_id is an indexed column
  - index_merge
    - Only appear if the optimizer decides to use the Index Merge optimization
    - Use multiple index scans and merges the results
    - e.g. SELECT * FROM employees WHERE employee_id = 1 OR department_id = 2, assuming employee_id and department_id are seperately indexed columns (not multi-column index)
  - unique_subquery
    - This value suggests that a subquery is used to fetch a single unique row.
  - index_subquery
    - Similar to unique_subquery, but tbl value inside the IN clause is non-unique.
  - ref
    - ideal value
    - This indicates that the query uses a non-unique index to access rows based on a single value from another table.
    - It typically occurs when joining tables using a column that is not unique but has an index.
    - e.g. JOIN on gid = gid (1:N)
    - e.g. SELECT * FROM employees WHERE department_id = 10, assuming department_id is a non-unique indexed column
  - eq_ref
    - ideal value
    - This is similar to ref, but it occurs when the join is performed using a unique index.
    - It’s a more efficient access method compared to ref because only one row is expected to match.
    - e.g. JOIN on gid = gid (1:1)
    - e.g. SELECT * FROM employees JOIN departments ON employees.department_id = departments.department_id, assuming department_id in departments is a unique indexed column
  - ref_or_null
    - This join type is like the ref type, but MySQL does an extra search for rows that contain NULL values.
  - fulltext
    - A full-text search is performed.
    - e.g. find the employee name whose name is ‘John’
    - e.g. SELECT * FROM employees WHERE MATCH(employee_name) AGAINST('John'), aassuming employee_name is a column with a FULLTEXT index
  - system
    - This indicates that the query examines a single row from a system table, which contains database-specific metadata.
- You generally want ALL or index to be as low as possible.
  - If you see ALL, it means a full table scan is happening which can be a performance hit.
  - You may need to add an index to speed this up.
possible_keys
- The indexes that could potentially be used for the query execution.
- If possible_keys is NULL, it means there are no relevant indexes that MySQL can use. You might want to add an index.
key
- The actual key (index) that will be used.
- If key is NULL, it means MySQL isn’t using an index. This could also indicate the need for an index.
key_len
- The length of the index used.
- Shorter is usually better.
- The maximum number of bytes used from the index for key comparisons
- It is measured in bytes, and its calculation depends on the data types of the columns involved in the index.
- For example, 8 means the length of primary key if it’s 8-byte bigint
ref
- It indicates a constant or which column is used against the column present in the key to fetch the data from the table.
- Values
  - Const
    - the most efficient
    - If the column value is matched to a constant, you’ll see const here. This is an ideal scenario where the query can match exactly one row in the table.
    - This usually happens when the query is using a primary key or unique index in a WHERE clause.
  - Column Name(s)
    - the named columns are used in an index lookup and are part of the WHERE clause in the query
    - e.g. SELECT * FROM orders JOIN customers ON orders.customer_id = customers.customer_id
    - e.g. table: orders, type: ref, key: customer_id, ref: customers.customer_id
    - e.g. This indicates that the customer_id column from the customers table is being used in the index customer_id of orders to select rows
  - NULL
    - This shows that no index has been used for this table in the query
    - can be a sign of less efficient queries as they may indicate full table or index scans
  - func
    - The column is compared to the result of a function.
rows
- The rows column indicates the approximately number of rows need to examine to execute the query.
- Lower is Better: Generally, you want the number in this column to be as low as possible. The fewer rows that need to be examined, the faster your query will run.
filtered
- estimates the percentage of table rows that will meet the WHERE condition (before examining any possible JOINs)
- If the value is 100, it indicates means no filtering of rows occurred
- the low number indicates that the query could benefit from an index
- rows column shows the estimated number of rows examined
- rows × filtered / 100 shows the number of rows that will be joined with the table
- For example, rows is 100 and filtered is 20.00 (20%), which means only around 20 rows will remian after filter has been applied.
Extra
- Provide additional information about how MySQL resolves the query
- Values
  - Using index
    - This is generally an ideal value
    - This indicates a ‘Covering Index Scan’ – the entire query can be resolved using only information within the index, without having to do additional table lookups.
    - It suggests that the query can benefit from index optimization.
  - Using where
    - a filtering operation is performed after retrieving rows using an index.
    - It means that the query needs to evaluate additional conditions that are not covered by the index, and it may involve scanning rows that don’t match the conditions.
    - While using where is not necessarily undesirable, it suggests that additional filtering is performed after retrieving rows using an index.
    - If this value appears in a significant number of rows in the EXPLAIN output, it may indicate that the query is not fully utilizing the available indexes or that further optimization is possible.
  - Using temporary
    - the query requires the creation of a temporary table to hold intermediate results.
    - Temporary tables are used when the query involves complex operations or sorting.
    - This often happens in queries with ORDER BY and GROUP BY.
    - Using a temporary table is not inherently undesirable, but it indicates that the query requires the creation of a temporary table for sorting or other intermediate results.
      - If this value appears in a significant number of rows or for large result sets, it may suggest potential performance bottlenecks.
  - Using filesort
    - This is generally considered undesirable, because it’s less efficient than in-memory sorting or sorting using an index. It may indicate a performance concern.
    - the query requires a filesort operation to sort the result set. It occurs when the requested order of rows doesn’t match the order of the index used, or when sorting cannot be performed using an index.
    - You might be able to avoid this by indexing the columns in the ORDER BY clause.
  - Using join buffer (Block Nested Loop, Batched Key Access, etc.)
    - This is not necessarily undesirable
      - However, if these values appear frequently and involve large result sets, it may indicate that the query performance could be improved through better indexing or query optimization.
    - The join is executed by employing a join buffer to hold rows from the table and does not require a full table scan.
    - The join buffer is a temporary storage area used to process JOIN operations efficiently.
  - Range checked for each record: the query is using a range scan to access rows within a specified range, such as using BETWEEN or IN conditions.
  - Full scan on NULL key: the query is scanning the entire index or table to find rows with NULL values for the indexed column.
  - Distinct: the query is removing duplicate rows from the result set using a distinct operation.

For example

EXPLAIN SELECT * FROM orders WHERE product_id = 123 AND order_date > '2023-01-01';
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key        | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | orders | NULL       | ref  | product_index | product_id | 4       | const | 100  |   10.00  | Using where |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-------------+

In this example, the filtered column shows a value of 10.00. This means MySQL estimates that about 10% of the rows selected by the query (specifically, those rows matching the index on product_id) will pass the additional order_date condition in the WHERE clause. which means there will be only 10 rows in the result.

The difference in the `type` value between `ref` and `index`

ref
- Use an index to look up rows in a table.
- It may match multiple rows unlike a unique or primary key (const, eq_ref)
- e.g. SELECT * FROM employees WHERE department_id = 10, assuming department_id is an indexed column
index
- Uses the index to do a full index scan
- All the data needed is available within the index itself
- Usualy the query doesn’t have WHERE to help mysql choose an better index
- e.g. SELECT department_id FROM employees, assuming department_id is an indexed column

What columns in the `EXPLAIN` output we want to pay attention to?

type: This tells you about how data in the database is scanned
possible_keys: This shows which indexes could be used for the query
key: This is the actual index that is choen to use for the query
rows: This is the estimated number of rows that need to be examined to execute the query
Extra: This tells you what else operation needs to be done after filtering out the data

What’s `filesort`?

The filesort operation is used to sort rows in a result set when the requested order cannot be fulfilled using an index or when sorting cannot be performed in memory. It involves writing temporary data to disk and performing an external sorting algorithm to order the rows.

Here’s what filesort operation does:

Sorting Algorithm
- When a filesort is needed, MySQL uses an external sorting algorithm, typically based on the merge sort algorithm.
- This algorithm is designed to efficiently sort data that is too large to fit entirely in memory.
Temporary File
- MySQL creates a temporary file on disk to store intermediate results during the sorting process.
- This file is usually written to the disk, and the data is read back as needed for the sorting operation.
Data Partitioning
- If the result set is larger than the available memory, MySQL divides the data into partitions that can fit in memory.
- It sorts each partition individually and writes the sorted partitions to the temporary file.
Merging Sorted Partitions
- Once all the partitions are sorted individually, MySQL performs a merge operation to combine the sorted partitions into a single sorted result set.
- This merge process continues until all partitions have been merged.
Disk I/O
- The filesort operation involves reading data from disk, writing intermediate results to disk, and performing disk I/O during the merge process.
- This can have an impact on query performance, especially if the disk I/O operations become a bottleneck.

How to improve `filesort`?

Ensure that the query is using appropriate indexes
Review the query and the data model to see if there are ways to simplify or optimize the query to reduce the need for filesort
Consider adjusting the MySQL configuration to increase the available memory for sorting operations, if feasible.
Evaluate the possibility of adding or modifying indexes

SQL Execution order

FROM
JOIN
ON
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT

Independent Subquery vs Dependent Subquery

Independent Subquery

This type of subquery can be run independently of the outer query. It does not rely on information obtained from the outer query. It executes once for the entire main query and passes the result set to the main query. Here’s an example:

SELECT employee_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'Chicago');

In this case, the subquery (SELECT ...) can be run independently of the outer query, and it returns a list of department IDs. This subquery executes once and its result is used by the main query.

more performance-efficient

Dependent Subquery

This type of subquery relies on the outer query for its values. This means that the subquery is executed once for each row processed by the outer query. Here’s an example:

SELECT e.employee_name
FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d WHERE d.manager_id = e.employee_id AND d.location = 'Chicago');

In this case, the subquery (SELECT ...) is a dependent subquery because it uses e.employee_id from the outer query. This subquery executes once for each row in the outer query, checking if an employee is a manager of a department located in ‘Chicago’.

more powerful and flexible because they can use values from the outer query

Table/Row lock (MySQL 5.6 and onwards)

The table or row lock behaviours depend on the engine such as InnoDB, MyISAM, PostgreSQL. The given list below only serves as a reference or an overview.

As of MySQL 5.6 and onwards, the InnoDB storage engine supports online DDL (Data Definition Language) operations. This allows most types of ALTER TABLE operations to be performed online, meaning that the table remains available for ‘read’ and ‘write’ operations during the operation.

Table Locks
- TRUNCATE TABLE: This statement removes all rows from a table, and it acquires a table lock to ensure exclusive access during the truncation process.
- REPAIR TABLE, OPTIMIZE TABLE: Certain table maintenance operations, such as repairing or optimizing a table, may require a table lock to prevent other operations from modifying the table during the maintenance process.
- LOCK TABLES: This statement explicitly locks one or more tables, preventing other connections from accessing or modifying the locked tables.
- DDL Statements: Certain data definition language (DDL) statements, like CREATE TABLE, DROP TABLE, or RENAME TABLE, may acquire table locks during their execution.
Row Locks
- UPDATE: When an UPDATE statement modifies one or more rows, it typically acquires row-level locks on the affected rows to ensure data consistency. The locks are held until the transaction is committed or rolled back.
- DELETE: Similar to the UPDATE statement, a DELETE statement acquires row-level locks on the affected rows to maintain data consistency.
- SELECT ... FOR UPDATE: When using the SELECT ... FOR UPDATE syntax, the selected rows are locked for exclusive access within the transaction. This allows you to perform updates on the selected rows without conflicts.
- INSERT INTO ... SELECT: If an INSERT INTO ... SELECT statement inserts data into a table based on a select query, it may acquire row-level locks on the selected rows during the insertion process.
- Transactions: In a transactional environment, locks can be acquired implicitly during data modification operations to maintain consistency and isolation. The duration and scope of the locks depend on the transaction isolation level and the specific statements executed within the transaction.
No Table or Row lock
- SELECT
- CREATE: such as create a new table, index or view
- DROP: It doesn’t lock on other tables or rows
- GRAN / REVOKE
ALTER TABLE
- At the beginning and end of the operation, brief locks may be required, but for the majority of the operation, normal operations on the table can proceed.
  - Adding a new column.
  - Adding or dropping an index or primary key.
  - Changing a column’s default value.
  - Reordering of columns.
  - Changing the auto_increment value.
  - Creating, dropping, or renaming a column.
- table lock as they require a full table copy
  - Changing the data type of existing columns.
  - Dropping a column or primary key.
  - Adding a auto_increment to an existing column.

Add a new column

Environment: AWS RDS Aurora MySQL 5.6.10a (db.t2.small)

It doesn’t lock table.

For 1.3M rows, it takes 67s and the CPU usage is around 35%. It doesn’t affect INSERT operation.

Modify a column (ALTER TABLE)