MySQL参坑--参数innodb_thread_concurrency问题


基础信息

MySQL版本:5.7.27-30-log Percona Server (GPL), wsrep_31.39

MySQL参数:

innodb_purge_threads        = 4
innodb_read_io_threads      = 8
innodb_write_io_threads     = 8
innodb_thread_concurrency   = 24
innodb_thread_sleep_delay   = 148358
innodb_concurrency_tickets  = 5000
gtid_mode                   = ON
thread_handling             = one-thread-per-connection

涉及SQL:

SELECT
`iast_agent_method_pool`.`id`, `iast_agent_method_pool`.`agent_id`, 
`iast_agent_method_pool`.`url`, `iast_agent_method_pool`.`uri`, 
`iast_agent_method_pool`.`http_method`, `iast_agent_method_pool`.`http_scheme`,
`iast_agent_method_pool`.`http_protocol`, `iast_agent_method_pool`.`req_header`,
`iast_agent_method_pool`.`req_params`, `iast_agent_method_pool`.`req_data`, 
`iast_agent_method_pool`.`res_header`, `iast_agent_method_pool`.`res_body`, 
`iast_agent_method_pool`.`req_header_for_search`, `iast_agent_method_pool`.`context_path`, 
`iast_agent_method_pool`.`method_pool`, `iast_agent_method_pool`.`pool_sign`,
`iast_agent_method_pool`.`clent_ip`, `iast_agent_method_pool`.`create_time`,
 `iast_agent_method_pool`.`update_time`, `iast_agent_method_pool`.`uri_sha1` 
FROM `iast_agent_method_pool` FORCE INDEX(PRIMARY)
WHERE (
	`iast_agent_method_pool`.`agent_id` IN (
		SELECT U0.`id` 
		FROM `iast_agent` U0 
		WHERE ((U0.`project_name` IS NULL 
		OR U0.`bind_project_id` = 806) 
		AND U0.`online` = 1 
		AND U0.`project_version_id` = 806 
		AND U0.`user_id` = 1)
	) 
	AND `iast_agent_method_pool`.`pool_sign` = 'd94e7aaee21470b45ceef899dc35c2f46e4dbd24'
) 
ORDER BY `iast_agent_method_pool`.`id` ASC LIMIT 1;

涉及表结构:

CREATE TABLE `iast_agent_method_pool` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `agent_id` int(11) DEFAULT NULL COMMENT 'Agent',
  `url` varchar(2000) DEFAULT NULL COMMENT 'URL',
  `uri` varchar(2000) DEFAULT NULL COMMENT 'URI',
  `http_method` varchar(10) DEFAULT NULL COMMENT 'HTTP请求方法',
  `http_scheme` varchar(20) DEFAULT NULL COMMENT '协议',
  `http_protocol` varchar(255) DEFAULT NULL COMMENT 'HTTP协议',
  `req_header` varchar(2000) DEFAULT NULL COMMENT '请求头',
  `req_params` varchar(2000) DEFAULT NULL COMMENT '请求参数',
  `req_data` varchar(4000) DEFAULT NULL COMMENT '请求体',
  `res_header` varchar(1000) DEFAULT NULL COMMENT '响应头',
  `res_body` varchar(1000) DEFAULT NULL COMMENT '响应体',
  `context_path` varchar(255) DEFAULT NULL COMMENT '应用上下文',
  `method_pool` json DEFAULT NULL COMMENT '方法池',
  `clent_ip` varchar(255) DEFAULT NULL COMMENT '客户端IP',
  `create_time` int(11) DEFAULT NULL COMMENT '创建时间',
  `update_time` int(11) DEFAULT NULL COMMENT '修改时间',
  `pool_sign` varchar(40) DEFAULT NULL COMMENT '方法池签名',
  `req_header_for_search` text COMMENT '用于搜索功能的请求头,请求方法、uri、参数、协议、请求头的拼接',
  `uri_sha1` varchar(40) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `pool_sign` (`pool_sign`,`agent_id`) USING BTREE,
  KEY `agent_id` (`agent_id`) USING BTREE,
  KEY `update_time` (`update_time`) USING BTREE,
  KEY `iast_agent__uri_sha_aac6d7_idx` (`uri_sha1`,`http_method`,`agent_id`),
  KEY `iast_agent_method_pool_uri_sha1_a64012e4` (`uri_sha1`),
  KEY `iast_agent_method_pool_agent_id_30df78eb` (`agent_id`)
) ENGINE=InnoDB AUTO_INCREMENT=106703 DEFAULT CHARSET=utf8mb4;

 CREATE TABLE `iast_agent` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `token` varchar(255) DEFAULT NULL COMMENT 'agent唯一标识',
  `version` varchar(255) DEFAULT NULL COMMENT '版本',
  `latest_time` int(11) DEFAULT NULL COMMENT '更新时间',
  `user_id` int(11) NOT NULL COMMENT '用户ID',
  `server_id` int(11) DEFAULT NULL COMMENT '服务器ID',
  `is_running` int(1) DEFAULT NULL COMMENT 'agent运行状态',
  `control` int(1) DEFAULT NULL COMMENT 'agent控制位,1-安装、2-卸载、0-无控制',
  `is_control` int(1) DEFAULT NULL COMMENT '是否正处于控制中,0-否,1-是',
  `bind_project_id` int(11) DEFAULT '0' COMMENT '捆绑项目ID,存在则为已捆绑',
  `project_name` varchar(255) DEFAULT NULL COMMENT '项目名称,用于先启动agent后创建项目',
  `project_version_id` int(11) DEFAULT '0' COMMENT '项目版本ID',
  `online` tinyint(4) DEFAULT '0' COMMENT '1在线运行,0未运行,同token,仅一个online',
  `is_core_running` int(11) DEFAULT NULL COMMENT '核心引擎是否启动',
  `language` varchar(10) DEFAULT 'JAVA' COMMENT '语言,JAVA、PYTHON',
  `startup_time` int(11) NOT NULL DEFAULT '0' COMMENT '启动时间',
  `alias` varchar(255) NOT NULL DEFAULT '' COMMENT 'agent别名',
  `register_time` int(11) NOT NULL DEFAULT '0' COMMENT '注册时间',
  `is_audit` int(11) NOT NULL DEFAULT '0' COMMENT '0:未审核;1:已审核',
  PRIMARY KEY (`id`) USING BTREE,
  KEY `user_id` (`user_id`) USING BTREE,
  KEY `server_id` (`server_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=29519 DEFAULT CHARSET=utf8mb4;

存储引擎信息

通过SHOW ENGINE INNODB STATUS输出可发现,大量进程处于"not started sleeping before entering InnoDB"

---TRANSACTION 421116931020976, not started sleeping before entering InnoDB
mysql tables in use 2, locked 0
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421116931082032, not started sleeping before entering InnoDB
mysql tables in use 2, locked 0
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421116931076272, not started sleeping before entering InnoDB
mysql tables in use 2, locked 0
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421116931034800, not started sleeping before entering InnoDB
mysql tables in use 2, locked 0

只有少量进程(innodb_thread_concurrency)进入的INNODB存储引擎层进行执行:

--------------
ROW OPERATIONS
--------------
24 queries inside InnoDB, 1182 queries in queue
25 read views open inside InnoDB
0 RW transactions active inside InnoDB

实例堆栈信息

从堆栈信息可以看到,上千个线程在使用srv_conc_enter_innodb方法:

Thread 539 (Thread 0x7f9036907700 (LWP 378610)):
#0  0x00007f9506517e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000010af8d0 in os_thread_sleep(unsigned long) ()
#2  0x000000000114a0b8 in srv_conc_enter_innodb(row_prebuilt_t*) ()
#3  0x000000000102306c in ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function) ()
#4  0x000000000084e918 in handler::index_read_idx_map(unsigned char*, unsigned int, unsigned char const*, unsigned long, ha_rkey_function) ()
#5  0x0000000000851e5a in handler::ha_index_read_idx_map(unsigned char*, unsigned int, unsigned char const*, unsigned long, ha_rkey_function) ()
#6  0x0000000000d0e993 in read_const(TABLE*, st_table_ref*) ()
#7  0x0000000000d0ead0 in join_read_const_table(JOIN_TAB*, st_position*) ()
#8  0x0000000000d23f38 in JOIN::extract_func_dependent_tables() ()
#9  0x0000000000d2f00e in JOIN::make_join_plan() ()
#10 0x0000000000d332a6 in JOIN::optimize() ()
#11 0x0000000000d7e322 in st_select_lex::optimize(THD*) ()
#12 0x0000000000d7e5cf in handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long) ()
#13 0x0000000000d39e33 in execute_sqlcom_select(THD*, TABLE_LIST*) ()
#14 0x0000000000d40f38 in mysql_execute_command(THD*, bool) ()
#15 0x0000000000d419c5 in mysql_parse(THD*, Parser_state*, bool) ()
#16 0x0000000000d41b10 in wsrep_mysql_parse(THD*, char const*, unsigned int, Parser_state*, bool) [clone .clone.0] ()
#17 0x0000000000d43d2f in dispatch_command(THD*, COM_DATA const*, enum_server_command) ()
#18 0x0000000000d44efb in do_command(THD*) ()
#19 0x0000000000e24ce4 in handle_connection ()
#20 0x0000000000f9ebe4 in pfs_spawn_thread ()
#21 0x00007f9506510dd5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f9504db7ead in clone () from /lib64/libc.so.6

有24个线程在使用btr_copy_blob_prefix方法:

Thread 1081 (Thread 0x7f8fd756d700 (LWP 41431)):
#0  0x00007f9506514965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000010ae7fb in os_event::wait_low(long) ()
#2  0x000000000115d6e9 in sync_array_wait_event(sync_array_t*, sync_cell_t*&) ()
#3  0x000000000115fb98 in rw_lock_s_lock_spin(rw_lock_t*, unsigned long, char const*, unsigned long) ()
#4  0x00000000011ea9d3 in buf_page_get_gen(page_id_t const&, page_size_t const&, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, bool, dberr_t*) ()
#5  0x00000000011c63fe in btr_copy_blob_prefix(unsigned char*, unsigned long, unsigned long, unsigned long, unsigned long) ()
#6  0x00000000011c6a9a in btr_copy_externally_stored_field(unsigned long*, unsigned char const*, page_size_t const&, unsigned long, mem_block_info_t*) ()
#7  0x0000000001124900 in row_sel_store_mysql_field_func(unsigned char*, row_prebuilt_t*, unsigned char const*, unsigned long const*, unsigned long, mysql_row_templ_t const*, unsigned long) ()
#8  0x0000000001124d9b in row_sel_store_mysql_rec(unsigned char*, row_prebuilt_t*, unsigned char const*, dtuple_t const*, unsigned long, dict_index_t const*, unsigned long const*, bool) ()
#9  0x000000000112b18f in row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long) ()
#10 0x0000000001022889 in ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int) ()
#11 0x0000000000851bbb in handler::ha_index_next(unsigned char*) ()
#12 0x0000000000d0a6fc in join_read_next(READ_RECORD*) ()
#13 0x0000000000d0f0c6 in sub_select(JOIN*, QEP_TAB*, bool) ()
#14 0x0000000000d0e3aa in JOIN::exec() ()
#15 0x0000000000d7e600 in handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long) ()
#16 0x0000000000d39e33 in execute_sqlcom_select(THD*, TABLE_LIST*) ()
#17 0x0000000000d40f38 in mysql_execute_command(THD*, bool) ()
#18 0x0000000000d419c5 in mysql_parse(THD*, Parser_state*, bool) ()
#19 0x0000000000d41b10 in wsrep_mysql_parse(THD*, char const*, unsigned int, Parser_state*, bool) [clone .clone.0] ()
#20 0x0000000000d43d2f in dispatch_command(THD*, COM_DATA const*, enum_server_command) ()
#21 0x0000000000d44efb in do_command(THD*) ()
#22 0x0000000000e24ce4 in handle_connection ()
#23 0x0000000000f9ebe4 in pfs_spawn_thread ()
#24 0x00007f9506510dd5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f9504db7ead in clone () from /lib64/libc.so.6

实例监控信息

在异常阶段,数据库实例监控信息如下:

1、Thread Running爆涨至1200,同时QPS下滑接近为0:

2、大量数根读入Buffer Pool中:

3、Buffer Poll使用内存快速上涨:

4、故障时产生大量的s_os_wait和s_spin_waits等待

5、数据库服务器CPU使用率超过30%

6、数据库服务器磁盘使用率达到100%

7、数据库服务器磁盘读操作吞吐超过500MB/s,磁盘读操作IO超过35k/s,平均每次IO读16KB

异常原因分析

由于表iast_agent_method_pool上存在唯一索引pool_sign(pool_sign,agent_id),子查询返回的agent_id 数量有限,虽然表上存在诸多大字段和JSON,甚至部分记录总长度超过28MB,但异常期间出现的SQL正常情况下均能在50ms内返回。在异常期间,大量查询甚至按主键查询的SQL执行时间超过10分钟,结合各种监控数据分析,怀疑该SQL执行计划出现异常,由于查询中包含ORDER BY iast_agent_method_pool.id ASC LIMIT 1 语句,查询优化器可能选择使用主键索引扫描的方式来执行,导致查询需要扫描大量数据才能找到满足WHERE条件的第一条记录,由于所需数据未缓存于buffer pool中,需要从磁盘读取并加载到buffer pool,因此产生大量的磁盘读操作。

由于参数innodb_thread_concurrency的限制,每次只有24个请求能进入到INNODB存储引擎层进行处理,这些执行计划异常导致查询耗时严重的SQL长期在INNODB存储引擎层占用,其他SQL无法获得进入INNODB存储引擎层执行的机会。

问题优化建议

1、尽量避免使用TEXT或BLOB还是JSON等大字段列。

2、尽量避免使用ORDER BY语句,避免在不需要排序的场景中使用ORDER BY语句。

3、尽量避免使用子查询,查询优化器很难准确评估子查询的返回记录,很容易引发执行计划异常。

4、合理设置参数innodb_thread_concurrency的值,推荐使用默认值0(即不限制并发),当使用>0的值时,需考虑超慢查询对集群的影响。


免责声明!

本站转载的文章为个人学习借鉴使用,本站对版权不负任何法律责任。如果侵犯了您的隐私权益,请联系本站邮箱yoyou2525@163.com删除。



 
粤ICP备18138465号  © 2018-2025 CODEPRJ.COM