23. Secondary Index

Reposted article · 2018-12-22 14:54 · MySQL

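1. Secondary Index
1.1. Introduction to Secondary Indexes

• Clustered Index
    ◦ Leaf nodes store the complete records (all row data)
• Secondary Index
    ◦ Also called a non-clustered index
    ◦ Leaf nodes store the indexed key and the primary key
    ◦ After finding the key in the index, the engine takes the matching primary key and goes back to the clustered index to fetch the record (row data) for that primary key
        ◾ This is a Bookmark Lookup
        ◾ Informally, "going back to the table"
        ◾ A bookmark lookup costs more than one extra IO
        ◾ It costs N extra IOs (N = height of the clustered index tree)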

1.2. Bookmark Lookups on a Secondary Index

create table userinfo (
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid),
unique key idx_username(username),
key idx_registdate(registdate)
);

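1. Suppose we look up the username Tom. MySQL first searches the secondary index idx_username, finds the key Tom, and reads the primary key stored with it: userid_a.
2. With userid_a in hand, it then searches the clustered index for the record (row data) of userid_a.
3. This process of reaching the row data through a secondary index is a bookmark lookup.
4. MySQL performs all of these steps for you automatically.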
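
The two-step lookup described in the list above can be sketched in a few lines of Python. This is only a toy model: plain dicts stand in for the B+ trees, and the names clustered_index, idx_username and lookup_by_username, as well as the second row (Jim), are made up for illustration.

```python
# Toy model of a bookmark lookup; dicts stand in for B+ trees.
# All names and the second row are hypothetical, for illustration only.

# Clustered index: primary key (userid) -> full row.
clustered_index = {
    1: {"userid": 1, "username": "Tom", "registdate": "1990-01-01", "email": "tom@123.com"},
    2: {"userid": 2, "username": "Jim", "registdate": "1991-05-20", "email": "jim@123.com"},
}

# Secondary index: indexed key (username) -> primary key only.
idx_username = {row["username"]: pk for pk, row in clustered_index.items()}


def lookup_by_username(name):
    """Return the full row for a username, or None if it does not exist."""
    pk = idx_username.get(name)      # step 1: search the secondary index
    if pk is None:
        return None
    return clustered_index[pk]       # step 2: go back to the clustered index


print(lookup_by_username("Tom")["email"])  # tom@123.com
```

In a real engine each of the two steps is a descent through a B+ tree, which is why the second step costs about as many IOs as the tree is high.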

The userinfo table above can also be split by hand so that the bookmark lookup is done manually. The split looks like this:

-- Table 1: userinfo keeps only the primary key on userid; the former secondary indexes are split by hand into separate tables
create table userinfo(
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid)
);
-- Table 2: idx_username, with userid and username as columns and a composite primary key (stands in for the original idx_username index)
create table idx_username(
userid int not null,
username varchar(30),
primary key(username, userid)
);
-- Table 3: idx_registdate, with userid and registdate as columns and a composite primary key (stands in for the original idx_registdate index)
create table idx_registdate(
userid int not null,
registdate datetime,
primary key(registdate, userid)
);
-- Table 4: constraint table (keeps username unique, as the original unique key did)
create table idx_username_constraint(
username varchar(30),
primary key(username)
);
-- Insert the data inside a transaction: either every insert succeeds or none does
start transaction;
insert into userinfo values(1, 'Tom', '1990-01-01', 'tom@123.com');
insert into idx_username_constraint values('Tom');
insert into idx_username values(1, 'Tom');
insert into idx_registdate values(1, '1990-01-01');
commit;

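• Suppose we want Tom's email:

1. First look up the userid for Tom in the idx_username table (this corresponds to searching the idx_username index for Tom).
2. With the userid, go to the userinfo table and fetch the email column by userid (this corresponds to searching the clustered index for the record with that userid).
3. These two lookups together are a manual bookmark lookup.

After the split, developers have to implement the bookmark lookup themselves; with the original single table, MySQL does it automatically.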

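1.3. Secondary Indexes on Heap Tables
1. A heap table has no clustered index; every index is a secondary index.
2. Index leaf nodes store the key and a pointer to the record in the heap (its physical location).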

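1.4. Secondary Indexes: Heap Tables vs. IOT Tables

1. A secondary index lookup on a heap table needs no bookmark lookup and is as fast as a primary key lookup, because the leaf nodes of both indexes store pointers to the data. On an IOT (index-organized table), by contrast, a secondary index lookup does need a bookmark lookup.
2. When a record (row data) in a heap table is updated and cannot be updated in place, its physical location changes, and the pointers to it in all indexes must be updated, which is expensive. In an IOT, when a record is updated and its primary key does not change, no secondary index needs its row reference updated (primary keys are normally not updated).
◦ In real database designs, when a heap record cannot be updated in place but its page still has free space, the old slot is not released; it keeps a pointer to the record's new location, so none of the record's index entries need to change.
◦ If the page has no free space, all of the indexes still have to be updated.
3. An IOT is ordered within each page and across pages, so range queries are fast.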
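
The update rules in point 2 can be written down as a small decision sketch. It only encodes the cases listed above; the function names and parameters are hypothetical and do not correspond to any database API.

```python
# Count how many index entries must be rewritten when a single row is updated.
# Both helpers are hypothetical and only encode the rules stated in the text.

def heap_entries_to_update(num_indexes, row_moved, forwarding_pointer):
    """Heap table: every index stores the physical location of the row."""
    if not row_moved:
        return 0              # updated in place, pointers stay valid
    if forwarding_pointer:
        return 0              # old slot points to the new location
    return num_indexes        # every index pointed at the old location


def iot_entries_to_update(num_secondary_indexes, primary_key_changed):
    """IOT: secondary indexes reference rows by primary key, not by location."""
    return num_secondary_indexes if primary_key_changed else 0


print(heap_entries_to_update(3, row_moved=True, forwarding_pointer=False))  # 3
print(heap_entries_to_update(3, row_moved=True, forwarding_pointer=True))   # 0
print(iot_entries_to_update(2, primary_key_changed=False))                  # 0
```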

1.5. Index with Included Columns
In the userinfo example above, finding a user's email requires a bookmark lookup. How can the query avoid it?

1. Option 1: composite index
-- Table definition
create table userinfo (
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid),
unique key idx_username(username, email), -- email is in the index, so it can be read directly without a bookmark lookup
key idx_registdate(registdate)
);

-- Query
select email from userinfo where username='Tom';
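This option gets the user's email from a single index search, but a composite index has to sort on both username and email. Note that the unique constraint now applies to the pair (username, email) rather than to username alone.
An index with included columns, by contrast, sorts on username only; email is not sorted and is simply carried in the index so it can be read from there.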
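
The difference in sort order can be seen in a small Python sketch. The entries are made up, and the duplicate username is assumed purely so that the ordering difference becomes visible; the variable names are illustrative.

```python
# Illustrative (username, email) index entries; the data is hypothetical.
entries = [
    ("Tom", "tom_b@123.com"),
    ("Amy", "amy@123.com"),
    ("Tom", "tom_a@123.com"),
]

# Composite index on (username, email): both columns take part in the sort.
composite_index = sorted(entries)

# Included column: only username is a sort key. Python's sort is stable, so
# emails under the same username simply keep their arrival order.
included_index = sorted(entries, key=lambda entry: entry[0])

print(composite_index)
print(included_index)
```

Both structures answer "email for a given username" without touching the table; the included-column form just skips the work of keeping email ordered.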

2. Option 2: split the table
create table userinfo (
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid),
key idx_registdate(registdate)
);

create table idx_username_include_email (
userid int not null,
username varchar(30),
email varchar(50),
primary key(username, userid),
unique key(username)
);

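-- Consistency between the two tables can be guaranteed with transactions

With the split, a query against idx_username_include_email can get the email from the username directly. Developers have to be told, though, that when they want the email for a username, querying this table is faster than querying userinfo.

For an IOT with several indexes, the indexes can be split into separate tables to speed up queries.
In practice, for this particular example, a composite index would not cost much either.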

2. Multi-Range Read (MRR)
2.1. The Cost of Bookmark Lookups

mysql> alter table employees add index idx_date (hire_date); -- add an index to employees


mysql> show create table employees\G
*************************** 1. row ***************************
Table: employees
Create Table: CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`),
KEY `idx_date` (`hire_date`) -- the newly added index
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

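-- Query 1
mysql> select * from employees where emp_no between 10000 and 20000; -- primary key lookup of about 10,000 rows

-- Query 2
mysql> select * from employees where hire_date >= '1990-01-01' limit 10000; -- select *, so every matching row needs a bookmark lookup
1. For Query 1, assuming a page holds 100 records, only about 100 IOs are needed.
2. For Query 2, assume the clustered index and the hire_date index (secondary index) both have height 3 and that 10,000 rows are fetched (assume more than 10,000 match). The number of IOs is then (3 + N) + 30,000:
　　◦ 3 is the number of IOs to find the (secondary index) page where hire_date >= 1990-01-01 first matches
　　◦ N is the number of IOs to read pages forward from that first page (secondary index leaf pages are also in order, so there is no need to search from the root again)
　　　　◾ So 3 + N is the number of IOs spent in the hire_date (secondary) index
　　◦ 30,000 is the number of IOs for bookmark lookups into the IOT: 10,000 lookups × 3 levels each
3. Before MySQL 5.6, the optimizer in practice might simply choose a full table scan rather than perform that many bookmark lookups.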
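
The estimate above is simple arithmetic, and a short Python sketch makes the two cases easy to compare. The function names are hypothetical, and N (the number of secondary index leaf pages read) is not given in the text, so the value 20 below is an assumption chosen only to produce a concrete number.

```python
import math


def clustered_range_scan_io(rows, rows_per_page):
    """Leaf pages read when scanning `rows` consecutive rows by primary key."""
    return math.ceil(rows / rows_per_page)


def secondary_range_with_lookups_io(rows, secondary_height,
                                    secondary_leaf_pages, clustered_height):
    """(height + N) IOs in the secondary index plus one full descent per row."""
    secondary_io = secondary_height + secondary_leaf_pages
    lookup_io = rows * clustered_height
    return secondary_io + lookup_io


# Query 1: 10,000 rows, 100 rows per page.
print(clustered_range_scan_io(10_000, 100))                # 100

# Query 2: heights of 3, N = 20 assumed for illustration.
print(secondary_range_with_lookups_io(10_000, 3, 20, 3))   # 30023
```

Whatever N is, the 30,000 lookup IOs dominate, which is why the optimizer may prefer a full table scan.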

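2.2. Introduction to MRR
MRR targets physical access: it turns random reads into sequential reads, trading space for time.

1. Allocate a block of memory as a cache
　　◦ It is 32 MB on this instance (the stock MySQL default for read_rnd_buffer_size is 256 KB); note that it is allocated per thread, so setting it very large is not recommended.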

mysql> show variables like "%read_rnd%";
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| read_rnd_buffer_size | 33554432 | -- 32M
+----------------------+----------+
1 row in set (0.00 sec)

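2. Put the primary keys that need a bookmark lookup into that memory (space for time); once it is full, sort them (random to sequential).
3. Perform the bookmark lookups for the sorted primary keys together to improve performance.
　　◦ For IO-bound SQL, using MRR can improve performance by close to 10x compared with not using it (the slower the disk, the bigger the gain).
　　◦ If the data is already in memory, MRR helps little: there is no random read to avoid (random reads are mainly a matter of physical access).
The feature should still be enabled on SSDs. Random reads are indeed fast there under many threads, but the operation here is a single SQL statement running in a single thread, so sequential access is still faster than random access.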
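
The buffer-then-sort idea can be simulated in Python to see why it cuts down page reads. This is a rough sketch, not how InnoDB implements MRR: it assumes 100 rows per page, a cache that holds only the most recently read page, and randomly generated primary keys; all function names are made up.

```python
import random

ROWS_PER_PAGE = 100  # assumed, matching the estimate used for the cost example


def page_of(pk):
    """Clustered index page that holds a given primary key (toy layout)."""
    return pk // ROWS_PER_PAGE


def count_page_reads(primary_keys):
    """Count page fetches, assuming only the last page read stays cached."""
    reads = 0
    current = None
    for pk in primary_keys:
        page = page_of(pk)
        if page != current:
            reads += 1
            current = page
    return reads


def mrr_order(primary_keys, buffer_size):
    """Fill a buffer with primary keys, sort it when full, then emit it."""
    ordered = []
    for start in range(0, len(primary_keys), buffer_size):
        ordered.extend(sorted(primary_keys[start:start + buffer_size]))
    return ordered


random.seed(0)
# Primary keys in the order the secondary index returns them (random here).
pks = random.sample(range(100_000), 10_000)

without_mrr = count_page_reads(pks)
with_mrr = count_page_reads(mrr_order(pks, buffer_size=2_000))
print(without_mrr, with_mrr)
```

A larger buffer sorts more keys at once and so revisits fewer pages, which is the space-for-time trade described above.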

mysql> show variables like 'optimizer_switch'\G
*************************** 1. row ***************************
Variable_name: optimizer_switch
Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on
1 row in set (0.00 sec)

-- MRR is on by default (mrr=on); turning it off is not recommended
mysql> explain select * from employees where hire_date >= '1990-01-01';
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_date      | NULL | NULL    | NULL | 298124 |    50.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)


-- mrr=on is set, but MRR was not used
mysql> set optimizer_switch='mrr_cost_based=off'; -- turn this off so MySQL skips the cost calculation for MRR (forces MRR)
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from employees where hire_date >= '1990-01-01';
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------+
| id | select_type | table     | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra                            |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_date      | idx_date | 3       | NULL | 149062 |   100.00 | Using index condition; Using MRR |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)
-- MRR is used

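3. Finding the Height of a B+ Tree
The Page Header of every page contains a PAGE_LEVEL field giving the page's level in the B+ tree; leaf pages have PAGE_LEVEL 0.
The height of the tree is therefore the root page's PAGE_LEVEL + 1.

3.3. PAGE_LEVEL
PAGE_LEVEL is the 2-byte value that starts at byte offset 64 of a page.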

3.4. Finding the Root Page
mysql> use information_schema;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> desc INNODB_SYS_INDEXES;
+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| INDEX_ID        | bigint(21) unsigned | NO   |     | 0       |       |
| NAME            | varchar(193)        | NO   |     |         |       |
| TABLE_ID        | bigint(21) unsigned | NO   |     | 0       |       |
| TYPE            | int(11)             | NO   |     | 0       |       |
| N_FIELDS        | int(11)             | NO   |     | 0       |       |
| PAGE_NO         | int(11)             | NO   |     | 0       |       |
| SPACE           | int(11)             | NO   |     | 0       |       |
| MERGE_THRESHOLD | int(11)             | NO   |     | 0       |       |
+-----------------+---------------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

mysql> select * from INNODB_SYS_INDEXES where space<>0 limit 1\G
*************************** 1. row ***************************
INDEX_ID: 18
NAME: PRIMARY
TABLE_ID: 16
TYPE: 3
N_FIELDS: 1
PAGE_NO: 3 -- per the official documentation, this field is the PAGE_NO of the B+ tree root page
SPACE: 5
MERGE_THRESHOLD: 50
1 row in set (0.01 sec)

-- There is no table name here, only the table ID

mysql> select b.name, a.name, index_id, type, a.space, a.PAGE_NO
    -> from INNODB_SYS_INDEXES as a, INNODB_SYS_TABLES as b
    -> where a.table_id = b.table_id
    -> and a.space <> 0 and b.name like "dbt3/%"; -- join the two tables to get the table names
+----------------------+-----------------------+----------+------+-------+---------+
| name                 | name                  | index_id | type | space | PAGE_NO |
+----------------------+-----------------------+----------+------+-------+---------+
| dbt3/customer        | PRIMARY               |       64 |    3 |    43 |       3 |
| dbt3/customer        | i_c_nationkey         |       65 |    0 |    43 |       4 |
| dbt3/lineitem        | PRIMARY               |       66 |    3 |    44 |       3 |
| dbt3/lineitem        | i_l_shipdate          |       67 |    0 |    44 |       4 |
| dbt3/lineitem        | i_l_suppkey_partkey   |       68 |    0 |    44 |       5 |
| dbt3/lineitem        | i_l_partkey           |       69 |    0 |    44 |       6 |
| dbt3/lineitem        | i_l_suppkey           |       70 |    0 |    44 |       7 |
| dbt3/lineitem        | i_l_receiptdate       |       71 |    0 |    44 |       8 |
| dbt3/lineitem        | i_l_orderkey          |       72 |    0 |    44 |       9 |
| dbt3/lineitem        | i_l_orderkey_quantity |       73 |    0 |    44 |      10 |
| dbt3/lineitem        | i_l_commitdate        |       74 |    0 |    44 |      11 |
| dbt3/nation          | PRIMARY               |       75 |    3 |    45 |       3 |
| dbt3/nation          | i_n_regionkey         |       76 |    0 |    45 |       4 |
| dbt3/orders          | PRIMARY               |       77 |    3 |    46 |       3 |
| dbt3/orders          | i_o_orderdate         |       78 |    0 |    46 |       4 |
| dbt3/orders          | i_o_custkey           |       79 |    0 |    46 |       5 |
| dbt3/part            | PRIMARY               |       80 |    3 |    47 |       3 |
| dbt3/partsupp        | PRIMARY               |       81 |    3 |    48 |       3 |
| dbt3/partsupp        | i_ps_partkey          |       82 |    0 |    48 |       4 |
| dbt3/partsupp        | i_ps_suppkey          |       83 |    0 |    48 |       5 |
| dbt3/region          | PRIMARY               |       84 |    3 |    49 |       3 |
| dbt3/supplier        | PRIMARY               |       85 |    3 |    50 |       3 |
| dbt3/supplier        | i_s_nationkey         |       86 |    0 |    50 |       4 |
| dbt3/time_statistics | GEN_CLUST_INDEX       |       87 |    1 |    51 |       3 |
+----------------------+-----------------------+----------+------+-------+---------+
24 rows in set (0.00 sec)

-- The root page of a clustered index usually has PAGE_NO 3

3.5. Reading PAGE_LEVEL

mysql> select count(*) from dbt3.lineitem;
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
1 row in set (5.68 sec)

shell> hexdump -h
hexdump: invalid option -- 'h'
hexdump: [-bcCdovx] [-e fmt] [-f fmt_file] [-n length] [-s skip] [file ...]

shell> hexdump -s 24640 -n 2 -Cv lineitem.ibd
00006040 00 02 |..|
00006042


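1. 24640 = 8192 * 3 + 64
　　◦ 8192 is the page size on my instance
　　◦ The root page has PAGE_NO 3, meaning it is the 4th page, so the 3 pages before it must be skipped to reach the root page; hence the * 3
　　◦ Adding the 64-byte offset then lands on PAGE_LEVEL
2. -n 2 is the number of bytes to read; reading 2 bytes yields PAGE_LEVEL

According to the hexdump output above, the root page's PAGE_LEVEL is 2, so the index has height 3 (levels are counted from 0).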
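
The same offset arithmetic and 2-byte read can be sketched in Python. The function names are hypothetical, and the demo runs against a synthetic file built in a temp directory rather than a real .ibd file; the value is decoded as big-endian, which matches the `00 02` bytes in the hexdump output above.

```python
import os
import struct
import tempfile

PAGE_LEVEL_OFFSET = 64  # byte offset of PAGE_LEVEL inside a page, per the text


def page_level_offset(page_no, page_size):
    """File offset of PAGE_LEVEL for the page numbered page_no."""
    return page_no * page_size + PAGE_LEVEL_OFFSET


def read_page_level(path, page_no, page_size):
    """Read the 2-byte PAGE_LEVEL of one page from a tablespace file."""
    with open(path, "rb") as f:
        f.seek(page_level_offset(page_no, page_size))
        raw = f.read(2)
    if len(raw) != 2:
        raise ValueError("file is too short for the requested page")
    return struct.unpack(">H", raw)[0]  # big-endian unsigned 16-bit


def tree_height(path, root_page_no, page_size):
    """Height = PAGE_LEVEL of the root page + 1."""
    return read_page_level(path, root_page_no, page_size) + 1


# Demo on a synthetic 4-page file whose page 3 carries PAGE_LEVEL = 2.
data = bytearray(8192 * 4)
data[24640:24642] = b"\x00\x02"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
height = tree_height(tmp.name, root_page_no=3, page_size=8192)
os.remove(tmp.name)

print(page_level_offset(3, 8192))  # 24640
print(height)                      # 3
```

To use it on a real tablespace, pass the root PAGE_NO reported by INNODB_SYS_INDEXES and the instance's actual page size.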
