How to design a MySQL database schema for better performance

03-03-2023

This article introduces how to design a MySQL database schema for better performance, walking through the main design choices with practical examples.

Choosing optimal data types

MySQL supports many data types, and choosing the correct type is critical to achieving high performance. Whatever kind of data you store, keep a few simple principles in mind:

  • Smaller is usually better

  • Simple is better

  • Try to avoid NULL

Integer type

We have the following integer types:

TINYINT: 8 bits

SMALLINT: 16 bits

MEDIUMINT: 24 bits

INT: 32 bits

BIGINT: 64 bits

They can store values from -2^(N-1) to 2^(N-1) - 1, where N is the number of bits.

Integer types have an optional UNSIGNED attribute, which disallows negative values and roughly doubles the upper limit of positive values that can be stored. For example, TINYINT UNSIGNED stores 0 to 255 instead of -128 to 127.
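As a rough sketch (the table and column names here are hypothetical), picking the smallest type that fits and adding UNSIGNED when negatives are impossible might look like this:

CREATE TABLE int_example (
  age        TINYINT UNSIGNED  NOT NULL,  -- 0 to 255
  city_id    SMALLINT UNSIGNED NOT NULL,  -- 0 to 65,535
  population INT UNSIGNED      NOT NULL,  -- 0 to about 4.29 billion
  star_count BIGINT UNSIGNED   NOT NULL   -- for very large counts
);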

Real number type

Real numbers are numbers with a fractional part. However, these types are not only for storing fractional values: DECIMAL can also store integers larger than BIGINT allows.

DECIMAL(18,9) stores nine digits on each side of the decimal point and uses nine bytes in total: four bytes for the digits before the decimal point, four bytes for the digits after it, and one byte for the decimal point itself.

Because of the extra space and computational overhead, you should use DECIMAL only when you need exact arithmetic on fractional numbers, such as when storing financial data. When the volume of data is large, however, you can use BIGINT instead of DECIMAL, multiplying the stored values by the appropriate factor according to the number of decimal places needed.
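For example, a minimal sketch of the BIGINT approach for monetary amounts with two decimal places (the table and column names are hypothetical):

CREATE TABLE account_balance (
  account_id    INT UNSIGNED NOT NULL PRIMARY KEY,
  balance_cents BIGINT NOT NULL           -- 12.34 is stored as 1234
);
INSERT INTO account_balance (account_id, balance_cents) VALUES (1, 1234);
SELECT account_id, balance_cents / 100 AS balance FROM account_balance;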

String type

VARCHAR

VARCHAR stores variable-length strings. If the table is created with ROW_FORMAT=FIXED, however, every row uses fixed-length storage, which wastes space.

VARCHAR needs 1 or 2 extra bytes to record the string's length. For example, a VARCHAR(10) column can need up to 11 bytes of storage.

Because rows are variable-length, an UPDATE can make a row longer than it was, which requires extra work. For example, MyISAM may split the row into fragments, and InnoDB may need to split the page so the row still fits in it.

CHAR

The CHAR type is fixed-length. When a CHAR value is stored, MySQL strips all trailing spaces.

CHAR is suitable for very short strings, or for columns where all values are close to the same length. For example, CHAR is a good fit for storing the MD5 hash of a password.

CHAR is also better than VARCHAR for data that changes frequently, because a fixed-length column is less prone to fragmentation.
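A small illustration of the trailing-space behavior (the table and column names are hypothetical):

CREATE TABLE char_test ( char_col CHAR(10) );
INSERT INTO char_test (char_col) VALUES ('string1'), ('  string2'), ('string3  ');
-- Trailing spaces are gone, leading spaces are kept:
SELECT CONCAT("'", char_col, "'") FROM char_test;
-- 'string1', '  string2', 'string3'

The same values stored in a VARCHAR(10) column would keep the trailing spaces of 'string3  '.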

BLOB and TEXT

These are string data types designed to store large amounts of data; BLOB stores binary data and TEXT stores character data.

MySQL treats each BLOB and TEXT value as a separate object, and storage engines often handle them specially. When a BLOB or TEXT value is too large, InnoDB stores it in a dedicated external storage area: the row then needs 1 to 4 bytes to hold a pointer, and the actual value lives in the external area.

The only difference between the BLOB and TEXT families is that BLOB types store binary data with no collation or character set, while TEXT types have a character set and collation.

Using ENUM instead of a string type

Sometimes you can use an ENUM column instead of a conventional string type. An ENUM column stores a predefined set of distinct strings. MySQL stores enumerations very compactly, packing them into one or two bytes depending on the number of values in the list. Internally, MySQL stores each value as an integer giving its position in the list, and keeps a lookup table of the number-to-string mapping in the table's .frm file.

Let's look at the following example:

CREATE TABLE enum_test ( e ENUM('fish','apple','dog') NOT NULL );
INSERT INTO enum_test (e) VALUES ('fish'), ('dog'), ('apple');

SELECT e + 0 FROM enum_test;

(The result is the internal integer values: 1, 3, 2.)

SELECT e FROM enum_test;

(The result is the string values: fish, dog, apple.)

This duality makes it easy to create confusing enumerations whose values are themselves numbers, for example ENUM('1', '2', '3'), where the stored integers and the string values look alike. Try not to define enumerations that way.

Another surprising thing:

Enumeration columns sort by the internally stored integers, not by the defined strings.
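Continuing the enum_test example above, one way to see this, and to work around it when a string sort is wanted, might be:

-- Sorts by the internal integers, i.e. the definition order: fish, apple, dog
SELECT e FROM enum_test ORDER BY e;
-- Sorts by the string values instead: apple, dog, fish
SELECT e FROM enum_test ORDER BY CAST(e AS CHAR);
SELECT e FROM enum_test ORDER BY FIELD(e, 'apple', 'dog', 'fish');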

The worst property of enumerations is that the list of strings is fixed: adding or removing a value requires an ALTER TABLE. So for a list of strings that is likely to change in the future, ENUM is not a good choice.

Date and time types

DATETIME

This type can store a wide range of values, from the year 1001 to 9999, with a precision of one second. It packs the date and time into an integer in YYYYMMDDHHMMSS format, independent of time zone, and uses 8 bytes of storage space.

TIMESTAMP

As its name suggests, the TIMESTAMP type stores the number of seconds since midnight on January 1, 1970 (GMT), the same as a Unix timestamp. A TIMESTAMP value uses only 4 bytes of storage, so its range is much smaller than DATETIME: it can only represent dates from 1970 up to 2038. MySQL provides the FROM_UNIXTIME() function to convert a Unix timestamp to a date, and the UNIX_TIMESTAMP() function to convert a date to a Unix timestamp.
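A quick sketch of the two conversion functions (the displayed values depend on the session time zone):

SELECT FROM_UNIXTIME(0);                      -- the epoch, e.g. '1970-01-01 00:00:00' in UTC
SELECT UNIX_TIMESTAMP('2023-03-03 00:00:00'); -- seconds since the epoch for that date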

Bit data types

BIT

MySQL treats BIT as a string type, not a numeric type. When you retrieve a BIT(1) value, the result is a string whose contents are the binary value 0 or 1, not the ASCII characters "0" or "1".
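A minimal sketch of that behavior (the table and column names are hypothetical):

CREATE TABLE bit_test ( a BIT(8) );
INSERT INTO bit_test VALUES (b'00111001');
-- a comes back as the string '9' (the character with code 57),
-- while a + 0 forces a numeric context and returns 57
SELECT a, a + 0 FROM bit_test;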

SET

If you need to store many true/false values, consider combining them into a SET data type, which MySQL represents internally as a packed set of bits. This uses space efficiently, and MySQL has functions such as FIND_IN_SET() and FIELD() that make SET columns convenient to use in queries.
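For instance, a small sketch of a SET column holding several permission flags (the table and value names are hypothetical):

CREATE TABLE acl ( perms SET('CAN_READ','CAN_WRITE','CAN_DELETE') NOT NULL );
INSERT INTO acl (perms) VALUES ('CAN_READ,CAN_DELETE');
-- FIND_IN_SET() makes it easy to test whether a flag is present
SELECT perms FROM acl WHERE FIND_IN_SET('CAN_READ', perms);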

The main disadvantage of SET columns is that changing the column definition is expensive: it requires an ALTER TABLE, which is very costly on large tables.

Choosing identifiers
Special types of data

Defects in MySQL schema design

Too many columns

MySQL's storage engine API works by copying rows between the server layer and the storage engine layer in a row-buffer format; the server layer then decodes the buffered contents into columns. Converting the encoded columns from the row buffer into the row data structure is costly, and the cost grows with the number of columns. While investigating a case of very high CPU usage, we found the customer was using an extremely wide table while actually using only a small subset of its columns; in that situation the conversion cost was very high.

MySQL limits every join to at most 61 tables. As a rule of thumb, if you want queries to execute quickly with good concurrency, it is best to keep a single query to no more than about 12 table joins.

The all-powerful enumeration

Beware of overusing enumerations. For example, don't turn a complete list of numbers into an ENUM; values like that belong in an ordinary integer column or a lookup table.

Disguised enumeration

An ENUM column allows a single value from a defined set to be stored in the column, while a SET column allows one or more values from a defined set to be stored.

For example

CREATE TABLE ... ( is_default SET('Y','N') NOT NULL DEFAULT 'N' );

Notice that 'Y' and 'N' can never be true at the same time here, so this column should undoubtedly be an ENUM rather than a SET.
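The same column written as an ENUM might look like this (the table name is a placeholder):

CREATE TABLE settings_example ( is_default ENUM('Y','N') NOT NULL DEFAULT 'N' );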

NULL not invented here

Earlier we described the benefits of avoiding NULL and recommended considering alternatives where possible. For example, you can use 0 or some special value in place of NULL.

But don't take this principle to extremes. Don't be afraid to use NULL when you really need to represent an unknown value.

Normalization and denormalization

Normal forms:

Normal forms are levels of requirements that a relational schema can satisfy. Relations in a relational database must meet certain requirements, and schemas that satisfy different levels of requirements are in different normal forms.

First Normal Form (1NF)

In any relational database, first normal form (1NF) is the basic requirement for a relational schema; a database that does not satisfy 1NF is not a relational database.

Second Normal Form (2NF)

Second normal form (2NF) builds on first normal form: to satisfy 2NF, a table must first satisfy 1NF. In addition, 2NF requires that every instance (row) in a table be uniquely distinguishable. To achieve this, a column is usually added to store a unique identifier for each instance; this column is called the primary key.

Pros and cons of normalization

Pros:

  • Normalized updates are usually faster than denormalized updates

  • When data is well normalized there is little or no duplicated data, so an update has to modify less data

  • Normalized tables are usually smaller and fit better in memory, so operations execute faster.

  • Little duplicated data means fewer DISTINCT or GROUP BY clauses are needed when querying


Cons:

  • Queries typically require joins

Advantages and disadvantages of denormalization

A denormalized schema avoids joins because all the data is in one table.

Mixing normalization and denormalization

The most common way to denormalize data is to copy or cache selected columns from one table in another table. Triggers can be used to keep the cached values up to date, which makes such schemes much easier to implement.
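As a sketch of that idea (all table, column, and trigger names here are hypothetical), a trigger could keep a copied author_name column in sync when the source row changes:

CREATE TABLE author ( id INT UNSIGNED NOT NULL PRIMARY KEY, name VARCHAR(100) NOT NULL );
CREATE TABLE message (
  id          INT UNSIGNED NOT NULL PRIMARY KEY,
  author_id   INT UNSIGNED NOT NULL,
  author_name VARCHAR(100) NOT NULL,   -- denormalized copy of author.name
  body        TEXT
);

DELIMITER //
CREATE TRIGGER author_name_sync AFTER UPDATE ON author
FOR EACH ROW
BEGIN
  -- keep the cached copy consistent with the source table
  UPDATE message SET author_name = NEW.name WHERE author_id = NEW.id;
END//
DELIMITER ;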

Cache tables and summary tables

Sometimes the best way to improve performance is to keep derived, redundant data in the same table. However, sometimes it is necessary to build a completely separate summary table or cache table.

We use the term cache table for a table that stores data that can also be obtained from other tables in the schema, and the term summary table for a table that holds aggregated data computed with GROUP BY queries.

Using a summary table is far more efficient than scanning all the rows of the underlying table.
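For example, a sketch of a summary table of message counts per day (the message table and its posted_at column are assumptions for illustration):

CREATE TABLE msg_per_day (
  day DATE NOT NULL PRIMARY KEY,
  cnt INT UNSIGNED NOT NULL
);
-- Rebuilt periodically with a single GROUP BY over the source table
INSERT INTO msg_per_day (day, cnt)
SELECT DATE(posted_at), COUNT(*) FROM message GROUP BY DATE(posted_at);

Queries then read the small summary table instead of scanning the message table.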

Cache tables, by contrast, are useful for optimizing search and retrieval queries. Such queries often need a special table and index structure; for example, many different index combinations might be needed to speed up various kinds of queries. These conflicting requirements sometimes call for a cache table that contains only some of the columns of the main table. A useful trick is to use a different storage engine for the cache table: if the main table uses InnoDB, a MyISAM cache table can give a smaller index footprint and allow full-text searches.

When using cache tables and summary tables, you have to decide whether to maintain their data in real time or rebuild them periodically. Which is better depends on the application, but periodic rebuilds not only save resources, they also keep the tables from becoming fragmented and keep their indexes fully sequentially organized.

Of course, for safety, we usually use a shadow table when rebuilding these tables, so that the data remains available during the operation.
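A common pattern for such a rebuild (the table names here are placeholders) is to fill a shadow table and then swap it in with an atomic rename:

CREATE TABLE my_summary_new LIKE my_summary;
-- ... repopulate my_summary_new here ...
RENAME TABLE my_summary TO my_summary_old, my_summary_new TO my_summary;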

Materialized views
Counter tables

Counter tables come up a lot; keeping counters in a separate table helps avoid invalidating the query cache.

Now we are going to show some more advanced techniques:

Suppose, for example, that we have a counter table recording the number of clicks on a website. Every update locks that single row of counter data, so the updating transactions can only execute serially. To get better performance, we can spread the counter across multiple rows and update a randomly chosen row each time. The counter table then looks like this:

CREATE TABLE hit_counter (
  slot TINYINT UNSIGNED NOT NULL PRIMARY KEY,
  cnt INT UNSIGNED NOT NULL
) ENGINE = InnoDB;

We add 100 rows of data to the table in advance, and select a random slot to update:

UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = RAND() * 100;

To count the results, we use the following aggregation query:

SELECT SUM(cnt) FROM hit_counter;

A common need is to start a new counter periodically (for example, every day), so we modify the table like this:

CREATE TABLE daily_hit_counter (
  day DATE NOT NULL,
  slot TINYINT UNSIGNED NOT NULL,
  cnt INT UNSIGNED NOT NULL,
  PRIMARY KEY (day, slot)
) ENGINE = InnoDB;

In this case we don't need to pre-generate the rows; instead we use ON DUPLICATE KEY UPDATE (update the row if it exists, insert it if it doesn't):

INSERT INTO daily_hit_counter(day,slot,cnt) VALUES (CURRENT_DATE,RAND()*100,1) ON DUPLICATE KEY UPDATE cnt = cnt + 1;

If you want to reduce the number of rows and keep the table from growing too large, you can write a periodic job that merges all results into slot 0 and deletes every other slot:

UPDATE daily_hit_counter AS c
  INNER JOIN (
    SELECT day, SUM(cnt) AS cnt, MIN(slot) AS mslot
    FROM daily_hit_counter
    GROUP BY day
  ) AS x USING(day)
SET c.cnt  = IF(c.slot = x.mslot, x.cnt, 0),
    c.slot = IF(c.slot = x.mslot, 0, c.slot);

DELETE FROM daily_hit_counter WHERE slot <> 0 AND cnt = 0;

Speeding up ALTER TABLE operations

ALTER TABLE on large tables has long been a pain point in MySQL. For most operations that change a table's structure, MySQL creates an empty table with the new structure and then copies the data from the old table into the new one.

For the general case, there are only two workarounds:

  • Execute the ALTER TABLE first on a replica that does not serve traffic, then swap it with the primary that does

  • Shadow copy: create a new table with the desired structure, unrelated to the source table, then swap the two tables by renaming them and drop the old one

Not every ALTER TABLE causes the table to be rebuilt. For example, there are two ways to change or drop a column's default value: one slow, one fast.

The slow way:

ALTER TABLE sakila.film MODIFY COLUMN rental_duration TINYINT(3) NOT NULL DEFAULT 5;

This method is relatively slow because MODIFY COLUMN causes the table to be rebuilt.

The fast way:

ALTER TABLE sakila.film ALTER COLUMN rental_duration SET DEFAULT 5;

This ALTER is very fast because it modifies the .frm file directly without touching the table data.

