MySQL Statistics: A Study Summary


Statistics Concepts

 

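MySQL statistics are information about tables and indexes that the database gathers through sampling and counting, for example the number of rows in a table, the number of pages in the clustered index, and the cardinality of columns. When MySQL generates an execution plan, it uses the index statistics to estimate costs and pick the plan with the lowest cost. MySQL supports only a limited set of index statistics, and how they are collected depends on the storage engine. The official MySQL documentation says almost nothing about statistics as a concept, but anyone who has worked with other databases should have no trouble with the idea. Unlike in some other databases, MySQL statistics cannot be deleted manually. Versions before MySQL 8.0 have no histograms.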

 

Statistics Parameters

 

The InnoDB storage engine has 7 statistics parameters (8 in some versions), as shown below:

 

MySQL 5.6.41 has 8 parameters:

 

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_sample_pages            | 8           |
| innodb_stats_transient_sample_pages  | 8           |
+--------------------------------------+-------------+
8 rows in set (0.00 sec)

 

MySQL 8.0.18 has 7 parameters:

 

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_transient_sample_pages  | 8           |
+--------------------------------------+-------------+

 

The following is a rough summary of what these parameters do.

 

 

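Parameter                               Meaning
--------------------------------------  ------------------------------------------------------------
innodb_stats_auto_recalc                Whether statistics are recalculated automatically. A recalculation is triggered once more than 10% of the rows have been modified.
innodb_stats_include_delete_marked      Whether delete-marked records are taken into account when statistics are recalculated.
innodb_stats_method                     How NULL values are counted.
innodb_stats_on_metadata                Whether metadata operations trigger a statistics update.
innodb_stats_persistent                 Whether statistics are persisted.
innodb_stats_sample_pages               Deprecated; replaced by innodb_stats_transient_sample_pages.
innodb_stats_persistent_sample_pages    Number of index pages sampled for persistent statistics.
innodb_stats_transient_sample_pages     Number of index pages sampled for non-persistent (transient) statistics.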

 

 

Parameter innodb_stats_auto_recalc

 

 

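The innodb_stats_auto_recalc parameter controls whether statistics are recalculated automatically. When more than 10% of the rows in a table have been modified, the statistics are recalculated. (Note that the recalculation runs asynchronously in the background, so it may be delayed rather than triggered immediately; see the details below.) If innodb_stats_auto_recalc is disabled, you need to run ANALYZE TABLE to keep the statistics accurate. Regardless of the global setting, even with innodb_stats_auto_recalc=OFF, adding a new index to a table recalculates the statistics for all of its indexes and writes them to the innodb_index_stats table.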

 

 

 

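The following test verifies that, with the system variable innodb_stats_auto_recalc=OFF, creating an index triggers a recalculation of the statistics for all indexes on the table.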

 

mysql> set global innodb_stats_auto_recalc=off;
Query OK, 0 rows affected (0.00 sec)
 
mysql> show variables like 'innodb_stats_auto_recalc%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_stats_auto_recalc | OFF   |
+--------------------------+-------+
1 row in set (0.00 sec)
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_diff_pfx01 |          2 |           1 | DB_ROW_ID                         |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
 
mysql> create index ix_test_name on test(name);
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_diff_pfx01 |          2 |           1 | DB_ROW_ID                         |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | size         |          1 |        NULL | Number of pages in the index      |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_diff_pfx01 |          1 |           1 | name                              |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_diff_pfx02 |          2 |           1 | name,DB_ROW_ID                    |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)

 

 

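Here is another test of mine: with the global variable innodb_stats_auto_recalc=ON, I set the table option STATS_AUTO_RECALC=0 and then created a new index. The test confirms that the statistics for all indexes are recalculated in this case too.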

mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.01 sec)
 
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
 
mysql> CREATE INDEX ix_test_name ON test(name);
Query OK, 0 rows affected (1.41 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name   | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | size         |          1 |        NULL | Number of pages in the index      |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx01 |        999 |          17 | name                              |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx02 |        999 |          17 | name,id                           |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_leaf_pages |         17 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | size         |         18 |        NULL | Number of pages in the index      |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)
 
mysql> 

 

The official documentation describes the recalculation delay as follows:

 

Because of the asynchronous nature of automatic statistics recalculation, which occurs in the background, statistics may not be recalculated instantly after running a DML operation that affects more than 10% of a table, even when innodb_stats_auto_recalc is enabled. Statistics recalculation can be delayed by few seconds in some cases. If up-to-date statistics are required immediately, run ANALYZE TABLE to initiate a synchronous (foreground) recalculation of statistics
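When up-to-date statistics are needed right away, the synchronous route mentioned in the quote can be sketched as follows. This is a minimal sketch that assumes the MyDB.test table from the tests above and persistent statistics enabled for it; after the ANALYZE TABLE run, last_update should reflect the time of that run.

```sql
-- Recalculate statistics synchronously (in the foreground) instead of
-- waiting for the background recalculation.
ANALYZE TABLE MyDB.test;

-- Check when the persistent table-level statistics were last refreshed.
SELECT table_name, n_rows, last_update
FROM mysql.innodb_table_stats
WHERE database_name = 'MyDB'
  AND table_name = 'test';
```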

 

 

Parameter innodb_stats_include_delete_marked

 

Controls whether delete-marked records are taken into account when statistics are recalculated.

innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

 

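The following recommendation about innodb_stats_include_delete_marked circulates online. I lack the experience to judge whether it is sound; personally I would keep the default (OFF) unless a specific scenario really calls for it.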

 

·         The recommendation is to enable innodb_stats_include_delete_marked, so that rows deleted by uncommitted transactions are still counted when statistics are collected.

 

 

By default, InnoDB reads uncommitted data when calculating statistics. In the case of an uncommitted transaction that deletes rows from a table, delete-marked records are excluded when calculating row estimates and index statistics, which can lead to non-optimal execution plans for other transactions that are operating on the table concurrently using a transaction isolation level other than READ UNCOMMITTED. To avoid this scenario, innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

When innodb_stats_include_delete_marked is enabled, ANALYZE TABLE considers delete-marked records when recalculating statistics. innodb_stats_include_delete_marked is a global setting that affects all InnoDB tables, and it is only applicable to persistent optimizer statistics. innodb_stats_include_delete_marked was introduced in MySQL 5.6.34.

   

 

 

 

Parameter innodb_stats_method

 

Specifies how InnoDB index statistics collection code should treat NULLs. Possible values are NULLS_EQUAL (default), NULLS_UNEQUAL and NULLS_IGNORED

 

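·         With nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).

·         With nulls_unequal, NULL values are not treated as identical. Instead, each NULL value forms a separate value group of size 1.

·         With nulls_ignored, NULL values are ignored.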
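To see how the three settings affect the estimates, one could switch the method and regenerate the statistics. The statements below are a minimal sketch, assuming the MyDB.test table used elsewhere in this article has an index on a nullable column; note that the variable is global, so the change affects every InnoDB table.

```sql
-- Show the current NULL-handling method for InnoDB index statistics.
SHOW VARIABLES LIKE 'innodb_stats_method';

-- Treat every NULL as its own value group (global, dynamic variable).
SET GLOBAL innodb_stats_method = 'nulls_unequal';

-- Regenerate the statistics so the new method is applied, then compare
-- the Cardinality column of the index on the nullable column.
ANALYZE TABLE MyDB.test;
SHOW INDEX FROM MyDB.test;

-- Restore the default afterwards.
SET GLOBAL innodb_stats_method = 'nulls_equal';
```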

 

 

 

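For more details, see "InnoDB and MyISAM Index Statistics Collection" in the official documentation. There is also a system variable, myisam_stats_method, that controls how NULL values are counted for MyISAM tables.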

 

 

mysql> show variables like 'myisam_stat%';
+---------------------+---------------+
| Variable_name       | Value         |
+---------------------+---------------+
| myisam_stats_method | nulls_unequal |
+---------------------+---------------+
1 row in set (0.00 sec)

 

 

 

Parameter innodb_stats_on_metadata

 

 

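Before MySQL 5.6.6, innodb_stats_on_metadata was enabled by default (default value ON). Whenever you queried tables in the information_schema metadata database (for example information_schema.TABLES, information_schema.TABLE_CONSTRAINTS, ...) or ran statements such as SHOW TABLE STATUS or SHOW INDEX, InnoDB would also randomly sample some index pages of every table in the other databases, update information_schema.STATISTICS, and only then return the result of the query. With many large tables this takes a long time, and it pulls a lot of rarely accessed data into the InnoDB buffer pool, polluting it. Disabling the parameter speeds up access to the schema tables and also makes execution plans for queries on InnoDB tables more stable. For these reasons the parameter defaults to OFF starting with MySQL 5.6.6.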

 

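Note that this option only takes effect when optimizer statistics are configured to be non-persistent. When it is enabled, InnoDB updates the non-persistent statistics.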

 

 

The official documentation describes it as follows:

 

innodb_stats_on_metadata

Property               Value
---------------------  -------------------------------------
Command-Line Format    --innodb-stats-on-metadata[={OFF|ON}]
System Variable        innodb_stats_on_metadata
Scope                  Global
Dynamic                Yes
Type                   Boolean
Default Value          OFF

 

This option only applies when optimizer statistics are configured to be non-persistent. Optimizer statistics are not persisted to disk when innodb_stats_persistent is disabled or when individual tables are created or altered with STATS_PERSISTENT=0. For more information, see Section 14.8.11.2, “Configuring Non-Persistent Optimizer Statistics Parameters”.

 

When innodb_stats_on_metadata is enabled, InnoDB updates non-persistent statistics when metadata statements such as SHOW TABLE STATUS or when accessing the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables. (These updates are similar to what happens for ANALYZE TABLE.) When disabled,InnoDB does not update statistics during these operations. Leaving the setting disabled can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

To change the setting, issue the statement SET GLOBAL innodb_stats_on_metadata=mode, where mode is either ON or OFF (or 1 or 0). Changing the setting requires privileges sufficient to set global system variables (see Section 5.1.8.1, “System Variable Privileges”) and immediately affects the operation of all connections

 

 

Parameter innodb_stats_persistent

 

 

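This parameter controls whether statistics are persisted. When it is enabled, statistics are saved to the innodb_table_stats and innodb_index_stats tables in the mysql database. Starting with MySQL 5.6.6, MySQL uses persistent statistics by default, that is, innodb_stats_persistent=ON ("Persistent optimizer statistics were introduced in MySQL 5.6.2 and were made the default in MySQL 5.6.6"). With this parameter set, statistics no longer need to be collected on the fly; collecting them on the fly can hurt performance under high concurrency and can cause execution plans to vary.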

 

 

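  In addition, table options (the STATS_PERSISTENT, STATS_AUTO_RECALC, and STATS_SAMPLE_PAGES clauses) can override the values of the system variables. These options can be specified in CREATE TABLE or ALTER TABLE statements. Options set on a table take precedence over the global variables. For example: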

 

 
mysql> ALTER TABLE test STATS_PERSISTENT=1;
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0

 

Persistent statistics are stored in mysql.innodb_index_stats and mysql.innodb_table_stats. The columns of these two tables are as follows:

 

 

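innodb_table_stats

Column name                 Description
--------------------------  ------------------------------------------------------
database_name               Database name
table_name                  Table name, partition name, or subpartition name
last_update                 Timestamp of the last statistics update
n_rows                      Number of rows in the table
clustered_index_size        Size of the clustered index, in pages
sum_of_other_index_sizes    Total size of the other (non-clustered) indexes, in pages

innodb_index_stats

Column name         Description
------------------  ------------------------------------------------------
database_name       Database name
table_name          Table name, partition name, or subpartition name
index_name          Index name
last_update         Timestamp of the last update of this row
stat_name           Name of the statistic
stat_value          Value of the statistic named in stat_name (for n_diff_pfxNN, the number of distinct values)
sample_size         Number of pages sampled for the estimate
stat_description    Description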
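Because the size-related statistics are expressed in pages, they can be converted to bytes by multiplying by the page size. The query below is a minimal sketch of that idea, assuming the MyDB.test table used in this article; @@innodb_page_size is the server's InnoDB page size.

```sql
-- Table-level persistent statistics: row estimate and index sizes in pages.
SELECT n_rows, clustered_index_size, sum_of_other_index_sizes, last_update
FROM mysql.innodb_table_stats
WHERE database_name = 'MyDB'
  AND table_name = 'test';

-- Per-index size, converted from pages to bytes.
SELECT index_name,
       stat_value AS pages,
       stat_value * @@innodb_page_size AS size_bytes
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB'
  AND table_name = 'test'
  AND stat_name = 'size';
```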

 

 

 

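Non-persistent optimizer statistics are stored in memory and are lost when the server shuts down. They are also updated periodically by certain operations and under certain conditions. But where exactly is "in memory"?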

 

Optimizer statistics are not persisted to disk when innodb_stats_persistent=OFF or when individual tables are created or altered with STATS_PERSISTENT=0. Instead, statistics are stored in memory, and are lost when the server is shut down. Statistics are also updated periodically by certain operations and under certain conditions.

 

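My understanding is that this refers to in-memory tables (MEMORY tables), although the official documentation does not say so explicitly; there is a brief note on this below.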

 

 

 

Parameter innodb_stats_persistent_sample_pages

 

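When innodb_stats_persistent is ON, this parameter sets the number of index pages sampled each time ANALYZE TABLE updates the Cardinality values. The default is 20 pages. Setting innodb_stats_persistent_sample_pages too low makes the statistics less accurate; setting it too high makes the analysis slow.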
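The trade-off can be observed by raising the sample size and comparing the estimates. This is a minimal sketch assuming the MyDB.test table from the earlier tests; the value 64 is an arbitrary illustration, and since the variable is global the change applies to all tables that do not set STATS_SAMPLE_PAGES themselves.

```sql
-- Sample more pages per index for persistent statistics.
SET GLOBAL innodb_stats_persistent_sample_pages = 64;

-- Recalculate, then look at the distinct-value estimates together with
-- the number of pages that were actually sampled.
ANALYZE TABLE MyDB.test;

SELECT index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB'
  AND table_name = 'test'
  AND stat_name LIKE 'n_diff_pfx%';

-- Restore the default of 20 pages.
SET GLOBAL innodb_stats_persistent_sample_pages = 20;
```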

 

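When creating a table, we can specify per table the number of sample pages, whether statistics are persisted to disk, and whether they are recalculated automatically, as shown below: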

 

CREATE TABLE `test` (
`id` int(8) NOT NULL auto_increment,
`data` varchar(255),
`date` datetime,
PRIMARY KEY  (`id`),
INDEX `DATE_IX` (`date`)
) ENGINE=InnoDB,
  STATS_PERSISTENT=1,
  STATS_AUTO_RECALC=1,
  STATS_SAMPLE_PAGES=25;

 

 

Parameter innodb_stats_sample_pages

 

 

Deprecated. Replaced by innodb_stats_transient_sample_pages.

 

 

Parameter innodb_stats_transient_sample_pages

 

 

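innodb_stats_transient_sample_pages controls the number of pages sampled and defaults to 8. It can be changed at runtime.

innodb_stats_transient_sample_pages affects sampling when innodb_stats_persistent=0. Points to note:

1. If the value is too small, the estimates will be inaccurate.

2. If the value is too large, disk reads increase.

3. Different values can produce very different execution plans, because the statistics differ.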
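A way to try this out is to switch a single table to non-persistent statistics and vary the sample size. The statements below are a sketch under the assumption that MyDB.test exists as in the earlier tests; the value 16 is only an illustration.

```sql
-- Use non-persistent (transient) statistics for this table only.
ALTER TABLE MyDB.test STATS_PERSISTENT = 0;

-- Change the transient sample size at runtime (global, dynamic variable).
SET GLOBAL innodb_stats_transient_sample_pages = 16;

-- Recalculate and inspect the resulting Cardinality estimates.
ANALYZE TABLE MyDB.test;
SHOW INDEX FROM MyDB.test;

-- Restore the default sample size and go back to persistent statistics.
SET GLOBAL innodb_stats_transient_sample_pages = 8;
ALTER TABLE MyDB.test STATS_PERSISTENT = 1;
```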

 

 

There is one more parameter, information_schema_stats_expiry. It works as follows:

 

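·         In 8.0, the information in the STATISTICS and TABLES tables under INFORMATION_SCHEMA is cached to improve query performance. The information_schema_stats_expiry parameter sets how long the cached data stays valid; the default is 86400 seconds. A query against these two tables first looks in the cache; if there is no cached data, or the cached data has expired, the query fetches the latest data from the storage engine. To get the latest data, set information_schema_stats_expiry to 0 or run ANALYZE TABLE.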
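Bypassing the cache for a single session might look like the sketch below. It applies to MySQL 8.0 and assumes the MyDB.test table used throughout this article.

```sql
-- Disable the INFORMATION_SCHEMA statistics cache for this session only,
-- so the next query reads current values from the storage engine.
SET SESSION information_schema_stats_expiry = 0;

SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'MyDB'
  AND TABLE_NAME = 'test';

-- Return the session to the global setting (86400 seconds by default).
SET SESSION information_schema_stats_expiry = DEFAULT;
```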

 

 

 

Viewing Statistics

 

 

Statistics are either persistent (PERSISTENT) or non-persistent (TRANSIENT). Where is each kind stored?

 

 

·         Persistent statistics

  Stored in mysql.innodb_index_stats and mysql.innodb_table_stats.

·         Non-persistent statistics

 

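      Before MySQL 8.0, they are exposed through information_schema.STATISTICS and information_schema.TABLES. From MySQL 8.0 on, they appear in INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.STATISTICS, and INNODB_INDEXES. The official documentation says that non-persistent statistics are kept in memory; as I understand it, that means in in-memory tables (MEMORY tables).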

 

 

 

 

The script below shows the persistent statistics. To read the data in mysql.innodb_index_stats, you need to understand what stat_name and stat_value mean:

 

 

select * from mysql.innodb_index_stats 
where table_name = 'test';
 
 
select * from mysql.innodb_index_stats 
where database_name='MyDB' and table_name = 'test';

 

 

 

 

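When stat_name = size, stat_value is the total number of pages in the index (Number of pages in the index).

When stat_name = n_leaf_pages, stat_value is the number of leaf pages in the index (Number of leaf pages in the index).

When stat_name = n_diff_pfxNN, stat_value is the number of distinct values in the index columns, as explained below:

  * In n_diff_pfxNN, NN is a number (for example 01, 02). When stat_name is n_diff_pfxNN, stat_value shows the number of distinct values in the leading NN columns of the index, counted in the order of the index definition. For example, when NN is 01, stat_value is the number of distinct values in the first column of the index; when NN is 02, stat_value is the number of distinct combinations of the first and second columns, and so on. In addition, for stat_name = n_diff_pfxNN, the stat_description column shows a comma-separated list of the index columns that were counted.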
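One practical use of the n_diff_pfxNN rows is a rough selectivity figure per index prefix: distinct values divided by the estimated row count. The query below is a sketch of that calculation, assuming the MyDB.test table from the earlier tests; both inputs are sampled estimates, so the ratio is approximate and can exceed 1.

```sql
-- Estimated selectivity of each index prefix:
-- distinct values in the prefix / estimated rows in the table.
SELECT i.index_name,
       i.stat_name,
       i.stat_description AS columns_covered,
       i.stat_value AS distinct_values,
       t.n_rows,
       ROUND(i.stat_value / NULLIF(t.n_rows, 0), 4) AS selectivity
FROM mysql.innodb_index_stats AS i
JOIN mysql.innodb_table_stats AS t
  ON t.database_name = i.database_name
 AND t.table_name = i.table_name
WHERE i.database_name = 'MyDB'
  AND i.table_name = 'test'
  AND i.stat_name LIKE 'n_diff_pfx%'
ORDER BY i.index_name, i.stat_name;
```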

 

 

 

MySQL Histograms

 

 

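MySQL 8.0 introduced histograms. Histogram data is exposed through the system table information_schema.column_statistics; each row holds the histogram of one column, stored in JSON format. A new parameter, histogram_generation_max_mem_size, sets the amount of memory available for building a histogram.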

 

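In general, a histogram is an accurate representation of the distribution of numerical data. In an RDBMS, a histogram is an approximation of the data distribution within a particular column.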

 

 

mysql> show variables like 'histogram_generation_max_mem_size';
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| histogram_generation_max_mem_size | 20000000 |
+-----------------------------------+----------+
1 row in set (0.01 sec)
 
mysql> 
 
mysql> desc information_schema.column_statistics;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| SCHEMA_NAME | varchar(64) | NO   |     | NULL    |       |
| TABLE_NAME  | varchar(64) | NO   |     | NULL    |       |
| COLUMN_NAME | varchar(64) | NO   |     | NULL    |       |
| HISTOGRAM   | json        | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
 
mysql> 

 

 

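MySQL has two kinds of histograms: singleton histograms and equi-height histograms. In a singleton histogram, each bucket stores one value and its cumulative frequency. In an equi-height histogram, each bucket stores the lower and upper bounds, the cumulative frequency, and the number of distinct values. MySQL chooses the histogram type automatically: if the column has no more distinct values than the specified number of buckets, it builds a singleton histogram; otherwise it builds an equi-height histogram. You can therefore sometimes influence the type by choosing the number of buckets.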

 

 

 

Creating and Dropping Histograms

 

 

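Is histogram data generated automatically? No. MySQL histograms are somewhat unusual: they are not generated when an index is created. You have to run a statement of the form ANALYZE TABLE [table] UPDATE HISTOGRAM ... manually to build histograms for the columns of a table. By default, this statement is also replicated to replicas.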

 

 

 

ANALYZE TABLE tbl_name UPDATE HISTOGRAM ON col_name [, col_name] [WITH N BUCKETS];

ANALYZE TABLE tbl_name DROP HISTOGRAM ON col_name [, col_name];

 

ANALYZE TABLE test UPDATE HISTOGRAM ON create_date,name WITH 16 BUCKETS;

 

 

Note: the number of BUCKETS may be specified or omitted. Its valid range is 1 to 1024; if omitted, the default is 100.

 

 

The test goes as follows: first drop all existing histogram data, then generate a histogram with the SQL below.

 

 

ANALYZE TABLE test UPDATE HISTOGRAM ON name;
 
SELECT SCHEMA_NAME
      ,TABLE_NAME
      ,COLUMN_NAME
      ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
      ,HISTOGRAM->>'$."sampling-rate"'  AS SAMPLING_RATE
      ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
      ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
      ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE  TABLE_NAME = 'test';

 

 

[Image: output of the query above]

 

 

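In fact, the number of buckets is not always 100. As shown below, after I deleted rows until only 49 remained and then created the histogram, the number of buckets was 49, so this value also depends on the data in the table (more precisely, on how many distinct values the column holds). With a larger amount of data, the default of 100 applies.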

 

 

[Image: histogram metadata for the table with 49 rows]

 

 

Also, as the test below shows, specifying more than 1024 BUCKETS fails with ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'.

 

 

mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1024 BUCKETS;
+-----------+-----------+----------+-------------------------------------------------+
| Table     | Op        | Msg_type | Msg_text                                        |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status   | Histogram statistics created for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.13 sec)
 
mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1025 BUCKETS;
ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'
mysql> 

 

 


 

 

 

 

Dropping Histograms

 

 

 

-- Drop the histogram statistics on a column

ANALYZE TABLE test DROP HISTOGRAM ON create_date;

 

 

mysql> ANALYZE TABLE test DROP HISTOGRAM ON name;
+-----------+-----------+----------+-------------------------------------------------+
| Table     | Op        | Msg_type | Msg_text                                        |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status   | Histogram statistics removed for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.10 sec)

 

 

Viewing Histogram Information

 

 

    Histogram data is stored in JSON format, and the raw JSON is hard to read. The following SQL statements make it easier to inspect.

 

 

SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) 
FROM information_schema.column_statistics 
WHERE TABLE_NAME='test'\G
 
 
SELECT SCHEMA_NAME
     ,TABLE_NAME
     ,COLUMN_NAME
     ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
     ,HISTOGRAM->>'$."sampling-rate"'  AS SAMPLING_RATE
     ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
     ,HISTOGRAM->>'$."histogram-type"' AS HISTOGRAM_TYPE
     ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
     ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE  TABLE_NAME = 'test';
 
 
SELECT FROM_BASE64(SUBSTRING_INDEX(v, ':', -1)) value, concat(round(c*100,1),'%') cumulfreq, 
       CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq  
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
WHERE schema_name  = 'MyDB' and table_name = 'test' and column_name = 'name';
 
 
 
SELECT v value, concat(round(c*100,1),'%') cumulfreq, 
       CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq  
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
WHERE schema_name  = 'MyDB' and table_name = 'test' and column_name = 'name';

 

 

 

 

Updating Statistics

 

Non-persistent statistics are also updated automatically. According to the official documentation, this happens in the following cases:

 

Non-persistent optimizer statistics are updated when:
 
Running ANALYZE TABLE.
 
Running SHOW TABLE STATUS, SHOW INDEX, or querying the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables with the innodb_stats_on_metadata option enabled.
The default setting for innodb_stats_on_metadata is OFF. Enabling innodb_stats_on_metadata may reduce access speed for schemas that have a large number of tables or indexes, and reduce stability of execution plans for queries that involve InnoDB tables. innodb_stats_on_metadata is configured globally using a SET statement.
SET GLOBAL innodb_stats_on_metadata=ON
Note
innodb_stats_on_metadata only applies when optimizer statistics are configured to be non-persistent (when innodb_stats_persistent is disabled).

Starting a mysql client with the --auto-rehash option enabled, which is the default. The auto-rehash option causes all InnoDB tables to be opened, and the open table operations cause statistics to be recalculated.
To improve the start up time of the mysql client and to updating statistics, you can turn off auto-rehash using the --disable-auto-rehash option. The auto-rehash feature enables automatic name completion of database, table, and column names for interactive users.
 
A table is first opened.
 
InnoDB detects that 1 / 16 of table has been modified since the last time statistics were updated.

 

 

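 A brief summary:

1 ANALYZE TABLE is run

2 With innodb_stats_on_metadata=ON, running SHOW TABLE STATUS or SHOW INDEX, or querying the TABLES or STATISTICS tables under INFORMATION_SCHEMA

3 Logging in with the mysql client while --auto-rehash is enabled

4 A table is opened for the first time

5 1/16 of the table's data has been modified since the statistics were last updated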

 

 

Automatic updates of persistent statistics were covered above. The other approach is to update statistics manually:

 

 

 

1. Update statistics manually. Note that a read lock is taken while the statement runs:

 

ANALYZE TABLE TABLE_NAME;

 

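2. If the statistics are still inaccurate after the update, consider increasing the number of pages sampled for the table. There are two ways to change it:

1) The global variable innodb_stats_persistent_sample_pages, which defaults to 20;

2) A per-table setting for the number of sample pages: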

ALTER TABLE TABLE_NAME STATS_SAMPLE_PAGES=100;

 

In my tests, the maximum value of STATS_SAMPLE_PAGES here is 65535; larger values produce an error.

 

mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65535;
 
Query OK, 0 rows affected (0.12 sec)
 
Records: 0  Duplicates: 0  Warnings: 0
 
 
 
mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65536;
 
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '65536' at line 1
 
mysql>
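After changing a per-table option, it can be worth confirming that the setting took effect and that the statistics were rebuilt with it. This is a minimal sketch assuming the MyDB.test table from the tests above; the value 100 is only an illustration.

```sql
-- Set a per-table sample size; this overrides
-- innodb_stats_persistent_sample_pages for this table.
ALTER TABLE MyDB.test STATS_SAMPLE_PAGES = 100;

-- The table options (STATS_PERSISTENT, STATS_AUTO_RECALC,
-- STATS_SAMPLE_PAGES) appear at the end of the CREATE TABLE statement.
SHOW CREATE TABLE MyDB.test;

-- Rebuild the statistics with the new sample size and check sample_size.
ANALYZE TABLE MyDB.test;

SELECT index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB'
  AND table_name = 'test';
```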

 

 

 

References:

 

https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html

https://dev.mysql.com/doc/refman/8.0/en/index-statistics.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-performance-optimizer-statistics.html

https://www.percona.com/blog/2019/10/29/column-histograms-on-percona-server-and-mysql-8-0/  (key reference)

http://chinaunix.net/uid-31396856-id-5787793.html

https://mysqlserverteam.com/histogram-statistics-in-mysql/

https://mp.weixin.qq.com/s/698g5lm9CWqbU0B_p0nLMw

