This article introduces how to design a MySQL database schema for better performance, walking through the main data-type and schema-design decisions with concrete examples.
MySQL supports many data types, and choosing the correct type is critical to achieving high performance. Regardless of what kind of data you store, keep a few simple principles in mind:
Smaller is usually better
Simple is better
Try to avoid null
MySQL provides the following integer types: TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, which use 8, 16, 24, 32, and 64 bits of storage respectively.
Their storage range is from -2^(N-1) to 2^(N-1) - 1, where N is the number of bits of storage space.
The integer types have an optional UNSIGNED attribute, which disallows negative values and roughly doubles the upper limit of positive values.
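A quick sketch of the effect (the table and column names here are made up for illustration):

```sql
-- Signed vs. unsigned: the same 8 bits of TINYINT cover different ranges.
-- (int_demo is a hypothetical table name.)
CREATE TABLE int_demo (
  signed_col   TINYINT,          -- range -128 .. 127
  unsigned_col TINYINT UNSIGNED  -- range 0 .. 255
);
```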
Real numbers are numbers with a fractional part. However, the DECIMAL type is not just for fractional values; you can also use DECIMAL to store integers larger than BIGINT allows.
DECIMAL(18,9) stores 9 digits on each side of the decimal point and uses 9 bytes in total: 4 bytes for the digits before the decimal point, 4 bytes for the digits after it, and 1 byte for the decimal point itself.
Because DECIMAL requires additional space and computational overhead, you should use it only when you need exact arithmetic on fractional numbers, such as when storing financial data. When the volume of data is large, you can use BIGINT instead of DECIMAL by multiplying the stored values by the appropriate power of ten according to the number of decimal places.
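A minimal sketch of the BIGINT-instead-of-DECIMAL trick, assuming a hypothetical account_balance table and amounts with two decimal places:

```sql
-- Store currency as integer cents (multiply by 100 on the way in),
-- avoiding DECIMAL's extra space and computation.
CREATE TABLE account_balance (
  account_id    INT UNSIGNED NOT NULL PRIMARY KEY,
  balance_cents BIGINT NOT NULL        -- e.g. $12.34 stored as 1234
);
INSERT INTO account_balance VALUES (1, 1234);

-- Convert back only when presenting the value:
SELECT account_id, balance_cents / 100 AS balance FROM account_balance;
```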
VARCHAR is used to store variable-length strings. Note that if MySQL creates the table with ROW_FORMAT=FIXED, each row uses fixed-length storage, which wastes space.
VARCHAR needs 1 or 2 extra bytes to record the string's length. For example, a VARCHAR(10) column can require up to 11 bytes of storage.
But because rows are variable-length, an UPDATE may make a row longer than it was, which leads to extra work. For example, MyISAM may split the row into fragments for storage, and InnoDB may need to split the page so that the row fits.
The CHAR type is fixed-length. When storing a CHAR value, MySQL removes all trailing spaces.
CHAR is suitable for very short strings, or for columns where all values are close to the same length. For example, CHAR is a good choice for storing the MD5 hash of a password.
For data that changes frequently, CHAR is also better than VARCHAR, because the fixed-length CHAR type is not prone to fragmentation.
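The trailing-space behavior is easy to demonstrate (char_test is a throwaway table name for illustration):

```sql
CREATE TABLE char_test (c CHAR(10));
INSERT INTO char_test VALUES ('string1'), ('  string2'), ('string3  ');

-- CHAR strips trailing (but not leading) spaces when the value is stored:
SELECT CONCAT("'", c, "'") FROM char_test;
-- 'string1', '  string2', 'string3'
```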
BLOB and TEXT are string data types designed to store large amounts of data, in binary and character form respectively.
MySQL treats each BLOB and TEXT value as a separate object, and storage engines usually handle them specially. When a BLOB or TEXT value is too large, InnoDB uses a dedicated external storage area: each value then needs 1 to 4 bytes in the row to store a pointer, and the actual value is stored in the external area.
The only difference between the BLOB and TEXT families is that BLOB types store binary data and have no collation or character set, while TEXT types have a character set and collation.
Use the ENUM enumeration type instead of a string type
Sometimes you can use an ENUM column instead of the usual string types. An ENUM column can store a predefined set of distinct strings. MySQL stores enumerations very compactly, packing them into one or two bytes depending on the number of values in the list. Internally, MySQL stores each value's position in the list as an integer, and keeps a lookup table of the number-to-string mapping in the table's .frm file.
Let's look at the following example:
CREATE TABLE enum_test(
  e ENUM('fish','apple','dog') NOT NULL
);
INSERT INTO enum_test(e) VALUES ('fish'),('dog'),('apple');
SELECT e + 0 FROM enum_test;  -- returns the internal integers: 1, 3, 2
SELECT e FROM enum_test;      -- returns the strings: fish, dog, apple
This duality makes it very easy to cause confusion if you use numbers as the enumeration constants, for example ENUM('1', '2', '3'). Try to avoid doing that.
Another surprising thing:
Enumeration columns sort by the internally stored integers, not by the defined strings.
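Using the enum_test table created above, the sort order follows the internal integers (fish = 1, apple = 2, dog = 3):

```sql
SELECT e FROM enum_test ORDER BY e;
-- fish, apple, dog  (ordered by internal number, not alphabetically)

-- If you need a specific string order, force it explicitly:
SELECT e FROM enum_test ORDER BY FIELD(e, 'apple', 'dog', 'fish');
```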
The worst property of ENUM is that the list of strings is fixed: adding or removing a string requires ALTER TABLE. So for a set of strings that is likely to change in the future, ENUM is not a good idea.
DATETIME can store a wide range of values, from the year 1001 to 9999, with one-second precision. It packs the date and time into an integer in YYYYMMDDHHMMSS format, independent of time zone, and uses 8 bytes of storage space.
As its name suggests, the TIMESTAMP type stores the number of seconds elapsed since midnight on January 1, 1970 (GMT), the same as a Unix timestamp. A TIMESTAMP value uses only 4 bytes of storage, so its range is much smaller than DATETIME's: only from 1970 to 2038. MySQL provides the FROM_UNIXTIME() function to convert a Unix timestamp to a date, and the UNIX_TIMESTAMP() function to convert a date to a Unix timestamp.
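For example (the exact values displayed depend on the session time zone):

```sql
SELECT FROM_UNIXTIME(0);                       -- the start of the epoch
SELECT UNIX_TIMESTAMP('2008-01-01 00:00:00');  -- seconds since the epoch
```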
MySQL treats BIT as a string type, not a numeric type. When you retrieve a BIT(1) value, the result is a string containing the binary value 0 or 1, not the ASCII characters '0' or '1'.
If you need to store many true/false values, consider combining the columns into a single SET data type, which MySQL represents internally as a packed set of bits. This uses space efficiently, and MySQL provides functions such as FIND_IN_SET() and FIELD() that make SET columns easy to use in queries.
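A sketch of packing several flags into one SET column (the acl table and the permission names are assumptions for illustration):

```sql
CREATE TABLE acl (
  perms SET('CAN_READ','CAN_WRITE','CAN_DELETE') NOT NULL DEFAULT ''
);
INSERT INTO acl(perms) VALUES ('CAN_READ,CAN_DELETE');

-- FIND_IN_SET returns the 1-based position of the member, or 0 if absent,
-- so it works directly as a boolean test:
SELECT * FROM acl WHERE FIND_IN_SET('CAN_READ', perms);
```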
Its main disadvantage is that changing the column's definition is expensive: it requires ALTER TABLE, which is a very costly operation on large tables.
Too many columns
MySQL's storage engine API works by copying rows between the server layer and the storage engine layer in a row buffer format; the server layer then decodes the buffer into columns. Converting the encoded columns in the row buffer into the row data structure can be very expensive, and the cost grows with the number of columns. In one case of very high CPU usage we studied, the customer was using an extremely wide table while actually using only a small fraction of its columns, and this conversion cost was the culprit.
MySQL limits each join to at most 61 tables. As a rule of thumb, if you want fast query execution and good concurrency, it is best to keep a single query to no more than about 12 joined tables.
Beware of overused enumerations
Don't turn an ENUM into a catch-all list of every possible value; when the list of values is likely to grow, a lookup table joined by a foreign key is usually a better choice.
An ENUM column stores a single value chosen from a defined set of values; a SET column can store one or more values from a defined set in a single column.
CREATE TABLE ... (
  is_default SET('Y','N') NOT NULL DEFAULT 'N'
)
Notice that 'Y' and 'N' can never be true at the same time here, so we should undoubtedly use an ENUM instead of a SET.
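The corrected column definition, as a fragment (the table name is elided, as in the example above):

```sql
-- 'Y' and 'N' are mutually exclusive, so ENUM states the intent exactly:
is_default ENUM('Y','N') NOT NULL DEFAULT 'N'
```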
Not-Invented-Here NULL
Earlier we described the benefits of avoiding NULL, and recommended considering alternatives where possible. For example, you can use 0 or some special sentinel value in place of NULL.
But don't take this principle to extremes. Don't be afraid to use NULL when you really need to represent an unknown value.
A normal form is a set of requirements that a relational schema must meet. Relations in a relational database must satisfy certain requirements, and schemas that satisfy different levels of requirements belong to different normal forms.
First Normal Form (1NF)
In any relational database, first normal form (1NF) is the basic requirement for a relational schema; a database that does not satisfy 1NF is not a relational database.
Second Normal Form (2NF)
Second normal form (2NF) builds on first normal form: to satisfy 2NF, a schema must first satisfy 1NF. 2NF requires that every instance (row) in a database table be uniquely distinguishable. To achieve this, a column is usually added to store a unique identifier for each instance; this unique attribute column is called the primary key.
Update operations on a normalized schema are usually faster than on a denormalized one.
When the data is well normalized, there is little or no duplicated data, so only a small amount of data needs to be modified.
Normalized tables are usually smaller and fit better in memory, so operations execute faster.
Less duplicated data means fewer DISTINCT or GROUP BY clauses are needed in queries.
A denormalized schema avoids joins nicely, because all the data is in one table.
The most common way to denormalize data is to copy or cache selected columns in different tables. You can also use triggers to keep the cached values up to date, which makes such schemes much simpler to implement.
Sometimes the best way to improve performance is to keep redundant derived data in the same table; at other times it is better to create a completely separate summary or cache table.
We use the term "cache table" for a table that stores data that can also be obtained from other tables in the schema, and the term "summary table" for a table that holds data aggregated with GROUP BY queries.
Using a summary table is far more efficient than scanning all the rows of the underlying table each time.
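A sketch of a summary table, assuming a hypothetical message table with a posted timestamp column:

```sql
CREATE TABLE msg_per_hr (
  hr  DATETIME NOT NULL PRIMARY KEY,
  cnt INT UNSIGNED NOT NULL
);

-- Rebuild the summary periodically from the main table:
INSERT INTO msg_per_hr
SELECT DATE_FORMAT(posted, '%Y-%m-%d %H:00:00') AS hr, COUNT(*)
FROM message
GROUP BY hr;
```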
Cache tables, by contrast, are useful for optimizing search and retrieval queries, which often need special table and index structures; for example, many different index combinations may be required to speed up various kinds of queries. These conflicting requirements sometimes call for a cache table that contains only some of the columns of the main table. A useful trick is to use a different storage engine for the cache table: for example, if the main table uses InnoDB, a MyISAM cache table gives a smaller index footprint and supports full-text search.
When using cache and summary tables, you must decide whether to maintain the data in real time or rebuild it periodically. Which is better depends on the application, but periodic rebuilds not only save resources, they also keep the table from becoming fragmented and leave the indexes fully sequentially organized.
For safety, when rebuilding such tables we use a "shadow table", so that the data remains available during the operation.
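A minimal sketch of the shadow-table swap (my_summary is a hypothetical table name); RENAME TABLE swaps the tables atomically, so readers always see a complete copy:

```sql
DROP TABLE IF EXISTS my_summary_new, my_summary_old;
CREATE TABLE my_summary_new LIKE my_summary;
-- ... repopulate my_summary_new here ...
RENAME TABLE my_summary TO my_summary_old,
             my_summary_new TO my_summary;
```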
Counter tables are very common. Using a separate table for counters helps avoid invalidating the query cache for the larger tables.
Now we are going to show some more advanced techniques:
For example, suppose we have a counter table that records the number of hits on a website. The problem is that every transaction that updates the counter needs an exclusive lock on that single row, so these transactions can only execute serially. To get better concurrency, we can keep the counter in multiple rows and randomly pick one row to update each time. The counter table looks like this:
CREATE TABLE hit_counter(
  slot TINYINT UNSIGNED NOT NULL PRIMARY KEY,
  cnt  INT UNSIGNED NOT NULL
) ENGINE = InnoDB;
We pre-populate the table with 100 rows (slots 0 to 99), and update a randomly chosen slot:
UPDATE hit_counter SET cnt = cnt +1 WHERE slot = RAND()*100;
To count the results, we use the following aggregation query:
SELECT SUM(cnt) FROM hit_counter;
A common requirement is to start a new counter every so often (for example, daily), so we change the table like this:
CREATE TABLE daily_hit_counter(
  day  DATE NOT NULL,
  slot TINYINT UNSIGNED NOT NULL,
  cnt  INT UNSIGNED NOT NULL,
  PRIMARY KEY (day, slot)
) ENGINE = InnoDB;
In this case, we don't need to pre-generate the rows; instead we use ON DUPLICATE KEY UPDATE (insert the row if it doesn't exist, otherwise update it):
INSERT INTO daily_hit_counter(day,slot,cnt) VALUES (CURRENT_DATE,RAND()*100,1) ON DUPLICATE KEY UPDATE cnt = cnt + 1;
If you want to reduce the number of rows and keep the table from growing too large, write a periodic job that merges all the results into slot 0 and deletes every other slot:
UPDATE daily_hit_counter AS c
  INNER JOIN (
    SELECT day, SUM(cnt) AS cnt, MIN(slot) AS mslot
    FROM daily_hit_counter
    GROUP BY day
  ) AS x USING(day)
SET c.cnt  = IF(c.slot = x.mslot, x.cnt, 0),
    c.slot = IF(c.slot = x.mslot, 0, c.slot);
DELETE FROM daily_hit_counter WHERE slot <> 0 AND cnt = 0;
ALTER TABLE on large tables has always been a pain point in MySQL. For most operations that modify a table's structure, MySQL's approach is to create a new empty table with the desired structure, insert all the data from the old table into the new one, and then delete the old table.
For the general case, there are only two techniques:
First run the ALTER TABLE on a server that is not serving traffic, then swap it with the primary that is serving queries.
Shadow copy: create a new table with the desired structure, unrelated to the source table, then swap the two tables by renaming, and drop the old one.
Not every ALTER TABLE causes a table rebuild. For example, there are two ways to change or drop a column's default value: one fast, one slow.
The slow way:
ALTER TABLE sakila.film MODIFY COLUMN rental_duration TINYINT(3) NOT NULL DEFAULT 5;
This method is relatively slow, because MODIFY COLUMN causes the table to be rebuilt.
ALTER TABLE sakila.film ALTER COLUMN rental_duration SET DEFAULT 5;
This ALTER is very fast, because it directly modifies the .frm file without touching the table data.