Software engineering notes

Database - MySQL Optimisation

Index

How does index work?

MySQL uses a data structure (B-Tree is default) that stores the value of the indexed column(s) and a reference to the row that contains the rest of the data. This allows mysql to find the values without scanning the whole table.

types of index

Multi-column index best practices

Most selective

Column order Matters

Does the order matter in “WHERE” clauses?

No, the order of where clauses does not matter.

Query optimizer will look at all the parts of the WHERE clause and figure out the most efficient way to satisfy that query

Can a query choose more than one index?

Yes, a query optimizer can choose to use more than one index in a single query if doing so would make the query more efficient.

It’s worth noting that using multiple indexes in a single query can also have downsides. Each index lookup has a cost, and the database has to combine the results from multiple lookups, which also takes time. Therefore, the query optimizer will only choose to use multiple indexes if it estimates that doing so will be faster than the alternatives. This estimation is based on various factors, such as the size of the table, the selectivity of the indexes, and the complexity of the query.

For example, if there are two indexes, one for ‘uid’ and another for ‘pid’, and both are used for filtering in the WHERE clause, will the query optimizer use both indexes?

It depends. The query optimizer may use both indexes if both ‘uid’ and ‘pid’ columns are used in the WHERE clause and their indexes are highly selective. However, it may also choose to use only one index which is more selective than the other one.

Can an index be used for both ORDER BY and WHERE at the same time?

Yes, a multi-column index can be used for both WHERE and ORDER BY at the same time.

The index needs to be carefully designed so that the fields used in the WHERE clause and the ORDER BY clause match the prefix of the multi-column index.

For example:

SELECT * FROM employees WHERE last_name = 'Bob' ORDER BY created_at;

index: last_name, created_at

The index can be used for both WHERE, searching the leftmost part of the index (last_name = bob), and the ORDER BY, mainting the order of created_at in the filtered data

What factors does the query optimizer consider when making decisions on index?

What’s covering index?

A covering index in MySQL is a type of index that includes all the fields retrieved by a particular query. if a query can be satisfied entirely using an index without having to perform additional disk I/O operation to fetch the actual data row.

For example:

SELECT first_name, last_name FROM users WHERE email = 'user@example.com';
CREATE INDEX idx_email_first_last ON users(email, first_name, last_name);

This index “covers” the query because the index includes all the fields used in this query.

Any single-column index or multi-column index could act as a covering index, given the right query. Design your indexes and queries to take advantage of this optimization technique wherever possible.

Which clauses in a query can affect the query optimizer’s decision on indexes?

Which clauses are given precedence over others by the query optimizer within a query?

No clause takes precedence over the others.

When deciding on the usage of indexes, the optimizer doesn’t strictly prioritize one clause over the other.

For example, it doesn’t necessarily favor ‘WHERE’ over ‘ORDER BY’. It evaluates the entire query and the available indexes to choose the best indexes to use.

The pros and cons of setting foreign keys

Setting foreign keys in a database comes with several advantages and a few considerations. Here are some pros and cons of setting foreign keys:

Pros

Cons

the benefits of setting foreign keys typically outweigh the drawbacks

How to speed up LIKE query by using an index?

For string columns, MySQL indexes the left side of a string

It can speed up like query that has a wildcard on the right side

SELECT * FROM foo WHERE field LIKE "bar%"

It doesn’t work for a query that has a wildcard on the left side

SELECT * FROM foo WHERE field LIKE "%bar%"

EXPLAIN

The columns in the output

For example

EXPLAIN SELECT * FROM orders WHERE product_id = 123 AND order_date > '2023-01-01';
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key        | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | orders | NULL       | ref  | product_index | product_id | 4       | const | 100  |   10.00  | Using where |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-------------+

In this example, the filtered column shows a value of 10.00. This means MySQL estimates that about 10% of the rows selected by the query (specifically, those rows matching the index on product_id) will pass the additional order_date condition in the WHERE clause. which means there will be only 10 rows in the result.

The difference in the type value between ref and index

What columns in the EXPLAIN output we want to pay attention to?

What’s filesort?

The filesort operation is used to sort rows in a result set when the requested order cannot be fulfilled using an index or when sorting cannot be performed in memory. It involves writing temporary data to disk and performing an external sorting algorithm to order the rows.

Here’s what filesort operation does:

How to improve filesort?

SQL Execution order

  1. FROM
  2. JOIN
  3. ON
  4. WHERE
  5. GROUP BY
  6. HAVING
  7. SELECT
  8. ORDER BY
  9. LIMIT

Independent Subquery vs Dependent Subquery

Independent Subquery

This type of subquery can be run independently of the outer query. It does not rely on information obtained from the outer query. It executes once for the entire main query and passes the result set to the main query. Here’s an example:

SELECT employee_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'Chicago');

In this case, the subquery (SELECT ...) can be run independently of the outer query, and it returns a list of department IDs. This subquery executes once and its result is used by the main query.

more performance-efficient

Dependent Subquery

This type of subquery relies on the outer query for its values. This means that the subquery is executed once for each row processed by the outer query. Here’s an example:

SELECT e.employee_name
FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d WHERE d.manager_id = e.employee_id AND d.location = 'Chicago');

In this case, the subquery (SELECT ...) is a dependent subquery because it uses e.employee_id from the outer query. This subquery executes once for each row in the outer query, checking if an employee is a manager of a department located in ‘Chicago’.

more powerful and flexible because they can use values from the outer query

Table/Row lock (MySQL 5.6 and onwards)

Add a new column

Environment: AWS RDS Aurora MySQL 5.6.10a (db.t2.small)

It doesn’t lock table.

For 1.3M rows, it takes 67s and the CPU usage is around 35%. It doesn’t affect INSERT operation.

Modify a column (ALTER TABLE)

Environment: AWS RDS Aurora MySQL 5.6.10a (db.t2.small)

This operation locks table as it executes copy to tmp table, so the whole table gets stucked.

It takes 30+ mins for a table with 3M rows.

ref: