Pitfalls We've Hit Over the Years: ClickHouse

Reposted from the original article, 2022-03-30 17:10


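1. GROUP BY memory limit

Error message:

Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 9.37 GiB (attempt to allocate chunk of 134217760 bytes), maximum: 9.31 GiB.

Tracing it down, the SQL in this query used GROUP BY, and the default configuration sets no memory threshold for GROUP BY, so the aggregation is never spilled to disk.

Solution:

Before running the SQL, apply the following settings in the client: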

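set max_memory_usage=32000000000;
set max_bytes_before_external_group_by=16000000000;
-- your SQL goes here

Once GROUP BY memory usage reaches max_bytes_before_external_group_by, ClickHouse starts writing intermediate data to disk (a disk-based GROUP BY costs much less performance than a disk-based ORDER BY). max_bytes_before_external_group_by is usually set to max_memory_usage / 2, because aggregation in ClickHouse has two stages: (1) reading the data and building intermediate data, and (2) merging the intermediate data. Spilling to disk can only happen in stage 1, and if nothing was spilled, stage 2 may need as much memory as stage 1.
https://clickhouse.tech/docs/en/sql-reference/statements/select/group-by/#select-group-by-in-external-memory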
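The "half of max_memory_usage" rule of thumb above can be captured in a small helper. This is a minimal Python sketch; the function name `group_by_spill_settings` is hypothetical, and it only builds the two statements from the text by halving the memory limit to get the spill threshold.

```python
from typing import List


def group_by_spill_settings(max_memory_usage: int) -> List[str]:
    """Return the two settings statements that let GROUP BY spill to disk.

    Rule of thumb from the text: spill threshold = max_memory_usage / 2,
    leaving headroom for the merge stage when nothing was spilled.
    """
    if max_memory_usage <= 0:
        raise ValueError("max_memory_usage must be positive")
    spill_threshold = max_memory_usage // 2
    return [
        f"set max_memory_usage={max_memory_usage};",
        f"set max_bytes_before_external_group_by={spill_threshold};",
    ]


for stmt in group_by_spill_settings(32_000_000_000):
    print(stmt)
# set max_memory_usage=32000000000;
# set max_bytes_before_external_group_by=16000000000;
```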

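2. Insert failures

1. Error message:

Too many parts (300). Merges are processing significantly slower than inserts...

We used Flink to consume Kafka data in real time and sink it into ClickHouse, inserting rows one at a time. After the job had been running for a while, ClickHouse could no longer keep up with the insert load (MergeTree could not merge data parts as fast as new parts were being created) and reported the error above.

Solution:

Rework the Flink ClickHouse sink so that it is triggered by both time and data volume: an insert runs as soon as either threshold is reached.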
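The "time or data volume" trigger can be sketched as a small buffer. This is a minimal Python sketch, not the actual Flink sink: `BatchBuffer` and its `flush_fn` callback (which would run one batched INSERT) are hypothetical, and the clock is injectable so the logic can be checked without waiting.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Collect rows and flush them as one batch when EITHER limit is hit:
    max_rows rows are buffered, or the oldest buffered row is max_age_s old.
    """

    def __init__(self, flush_fn: Callable[[List[tuple]], None],
                 max_rows: int = 10_000, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_fn = flush_fn  # hypothetical: runs one batched INSERT
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock
        self.rows: List[tuple] = []
        self.first_row_at: Optional[float] = None

    def add(self, row: tuple) -> None:
        if not self.rows:
            self.first_row_at = self.clock()
        self.rows.append(row)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the batch is full or too old; a timer should call this too."""
        if not self.rows:
            return
        full = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.first_row_at >= self.max_age_s
        if full or too_old:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []
            self.first_row_at = None


# Usage with a fake clock: 7 rows, batches of at most 3 rows or 5 seconds.
now = [0.0]
batches: List[List[tuple]] = []
buf = BatchBuffer(batches.append, max_rows=3, max_age_s=5.0, clock=lambda: now[0])
for i in range(7):
    buf.add((i,))
now[0] = 6.0       # time passes without new rows
buf.maybe_flush()  # the age trigger flushes the last, partial batch
print(batches)  # [[(0,), (1,), (2,)], [(3,), (4,), (5,)], [(6,)]]
```

In a real sink something like a periodic timer also has to call `maybe_flush()`; otherwise the time trigger only fires when a new row arrives.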

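2. Error message:

Code: 252, e.displayText() = DB::Exception: Too many partitions for single INSERT block (more than 100).

In other words, a single insert touches too many partitions, more than the default limit of 100.

Solution:

1. Choose the partition key sensibly.
2. Raise the max_partitions_per_insert_block setting.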

3. Failed to drop data

Error message:

Code: 359, e.displayText() = DB::Exception: Table or Partition in xxx was not dropped.
Reason:
1. Size (158.40 GB) is greater than max_[table/partition]_size_to_drop (50.00 GB)
2. File '/data/clickhouse/clickhouse-server/flags/force_drop_table' intended to force DROP doesn't exist

Reason 1 shows that the data being dropped is larger than the configured limit. Reason 2 shows that this check can be skipped with a forced drop, but because the file /data/clickhouse/clickhouse-server/flags/force_drop_table was not found, the check cannot be skipped, i.e. the drop cannot be forced.

Following reason 2, run this on the affected node:

sudo touch '/data/clickhouse/clickhouse-server/flags/force_drop_table' && sudo chmod 666 '/data/clickhouse/clickhouse-server/flags/force_drop_table'

Then run the drop again.

Note that the flag file works only once: it disappears after the drop completes.

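4. Misusing JOIN

When joining two tables, ClickHouse fills the columns of unmatched rows with the default value of the column's type, not with NULL.

This behavior is controlled by the join_use_nulls setting, which you can find in the system.settings table.

This differs from what we are used to in MySQL or Hive. To get the same behavior, set join_use_nulls to 1.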

Prepare the data

-- Create table 1
create table st_center.test_join_1 (id String, name String) engine = MergeTree() order by tuple() SETTINGS index_granularity = 8192;
-- Create table 2
create table st_center.test_join_2 (id String, name String) engine = MergeTree() order by tuple() SETTINGS index_granularity = 8192;
-- Insert test data
insert into st_center.test_join_1(id, name) values ('1','大数据学习指南');
insert into st_center.test_join_1(id, name) values ('2','大数据进阶之路');
insert into st_center.test_join_2(id, name) values ('1','大数据学习指南');

With the data in place, let's run the test.

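select * from st_center.test_join_1 as t1
all left join st_center.test_join_2 as t2
on t1.id = t2.id

In the result, the columns of unmatched rows are filled with default values: an empty string for String types and 0 for numeric types.

To change this, append settings join_use_nulls = 1 to the end of the SQL:

select * from st_center.test_join_1 as t1
all left join st_center.test_join_2 as t2
on t1.id = t2.id
settings join_use_nulls = 1

Now the columns of unmatched rows are NULL, the same as what we are used to in MySQL and similar databases.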
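To make the difference concrete, here is a small pure-Python model of the two behaviors for the String-only test tables above. It is an illustration, not ClickHouse's join implementation; `left_join` is a hypothetical helper, and it assumes only String columns are involved, so the type default is the empty string.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str]  # (id, name), both String as in the test tables


def left_join(left: List[Row], right: List[Row],
              use_nulls: bool = False) -> List[Tuple[Optional[str], ...]]:
    """Model ALL LEFT JOIN ... ON t1.id = t2.id for String-only tables.

    use_nulls=False mimics join_use_nulls = 0: unmatched right columns get
    the String default ''. use_nulls=True mimics join_use_nulls = 1: they
    become NULL (None here).
    """
    right_by_id: Dict[str, List[Row]] = {}
    for row in right:
        right_by_id.setdefault(row[0], []).append(row)

    filler = (None, None) if use_nulls else ("", "")
    result = []
    for lrow in left:
        # ALL semantics: one output row per matching right row.
        for rrow in right_by_id.get(lrow[0], [filler]):
            result.append(lrow + rrow)
    return result


t1 = [("1", "大数据学习指南"), ("2", "大数据进阶之路")]
t2 = [("1", "大数据学习指南")]
print(left_join(t1, t2))                  # unmatched row ends with '', ''
print(left_join(t1, t2, use_nulls=True))  # unmatched row ends with None, None
```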


I. Exceptions

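1) DB::Exception: Nested type Array(String) cannot be inside Nullable type (version 20.4.6.53 (official build))
Cause: the column type is Nullable(String), and some string functions such as splitByString do not support Nullable arguments (the result would be Nullable(Array(String)), which is not allowed), so the value has to be converted to String.
Solution: cast the column type with cast: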

select splitByString(',',cast(col as String)) col from test

2) DB::Exception: Cannot convert NULL value to non-Nullable type: while converting source column second_channel to destination column second_channel (version 20.4.6.53 (official build))
Cause: the column has a non-Nullable type, so inserting a NULL into second_channel fails.
Solution: change the column type to Nullable(String), but note that Nullable columns cannot be used in the table's ORDER BY (sorting key).

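3) DB::Exception: Memory limit (total) exceeded: would use 113.20 GiB (attempt to allocate chunk of 134200512 bytes), maximum: 113.14 GiB: While executing CreatingSetsTransform. (version 20.4.6.53 (official build))
Cause: the data produced by a single query is larger than the remaining memory on one machine.
Solution: narrow the query, for example by adding a condition that takes a remainder (modulo) of a key so the query runs in several parts; alternatively, free up memory or add physical memory to the machine.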
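The remainder condition suggested above can be generated mechanically: split one big query into N smaller ones, each filtered by `key % N = i`. This is a minimal Python sketch; the table `events`, the column `user_id`, the use of `cityHash64` as the sharding key and the helper name `sharded_queries` are all assumptions for illustration.

```python
from typing import List


def sharded_queries(template: str, key_expr: str, shards: int) -> List[str]:
    """Expand a template containing {shard_filter} into `shards` queries.

    Query i keeps only rows where key_expr % shards = i, so together the
    queries cover every row exactly once (assuming key_expr is an integer
    expression). The template must not contain any other braces.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return [template.format(shard_filter=f"{key_expr} % {shards} = {i}")
            for i in range(shards)]


template = ("SELECT user_id, count() FROM events "
            "WHERE {shard_filter} GROUP BY user_id")
for q in sharded_queries(template, "cityHash64(user_id)", 4):
    print(q)
```

When the sharding key is the same as the GROUP BY key, as here, each group lands in exactly one sub-query, so the partial results can simply be concatenated.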

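5) DB::Exception: Table columns structure in ZooKeeper is different from local table structure (version 20.12.3.3 (official build))
Cause: a Replicated table was dropped and recreated, but removing the table structure from ZooKeeper happens asynchronously, with a default delay of five minutes.
Solution: restart ClickHouse on that node, or wait a few minutes before recreating the table.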

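6) Too many parts (300). Merges are processing significantly slower than inserts...

Cause: we used Flink to consume Kafka data in real time and sink it into ClickHouse, inserting rows one at a time. After the job had been running for a while, ClickHouse could no longer keep up with the insert load (MergeTree could not merge data parts as fast as new parts were being created).

Solution: rework the Flink ClickHouse sink so that it is triggered by both time and data volume: an insert runs as soon as either threshold is reached.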

7) Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5915c8a5

Cause: insufficient disk space.

8) spark sql unsupported type array

Cause: the array type of the data source does not match Spark SQL's array type; you can convert the array to a string.

Solution:

select toString(field) from tablet

II. Error codes

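1) Code: 48

Received exception from server (version 21.1.2.15):
Code: 48. DB::Exception: Received from localhost:9000, ::1. DB::Exception: Mutations are not supported by storage Distributed.

Cause: Distributed tables cannot be mutated; ALTER TABLE UPDATE/DELETE is not supported by the Distributed table engine.

Solution: run the update/delete against the local tables instead, either manually on each node or with ALTER TABLE ... ON CLUSTER on the local table.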

2) Code: 1002
2021-02-22 07:31:31,656 ERROR [main] execute clickhouse Query Error
ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, code: 1002, host: xxxx, port: 8123; xxxx:8123 failed to respond
Cause: the JDBC client and the server disagree on the HTTP connection `keep-alive` header.

Solution: upgrade the clickhouse-jdbc driver jar, or the dependency version in your POM, to 0.2.6.

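3) Code: 159, read timeout

Cause: the query timed out.
Solution: some SQL statements run long enough that the client eventually fails with a read timeout. By default the client waits at most 30 s for a single SQL statement; for slow queries, increase the socket_timeout parameter in the JDBC URL (note that it is in milliseconds; the default is 30000).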
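Because socket_timeout is in milliseconds, it is easy to pass seconds by mistake. This is a minimal Python sketch that builds the JDBC URL from a timeout given in seconds; the helper name `clickhouse_jdbc_url` and the 600-second example value are assumptions.

```python
from urllib.parse import urlencode


def clickhouse_jdbc_url(host: str, port: int = 8123, database: str = "default",
                        timeout_s: float = 30) -> str:
    """Build a ClickHouse JDBC URL from a timeout given in seconds.

    socket_timeout itself is in milliseconds, so the conversion happens here once.
    """
    socket_timeout_ms = int(timeout_s * 1000)
    query = urlencode({"socket_timeout": socket_timeout_ms})
    return f"jdbc:clickhouse://{host}:{port}/{database}?{query}"


print(clickhouse_jdbc_url("127.0.0.1", timeout_s=600))
# jdbc:clickhouse://127.0.0.1:8123/default?socket_timeout=600000
```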

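4) Code: 62, Max query size exceeded

Cause: a SELECT statement using an IN clause failed.

Solution:
This happens because the query text is very large, while the default max_query_size is only 256 KiB. Open /etc/clickhouse-server/users.xml (only a few commonly used users are configured there); a setting such as max_query_size has to be changed in its profiles section.

Note that the unit is bytes. I set it to 1024*1024*1024 = 1,073,741,824, which solved the problem. If the user was created with SQL, change its settings through SQL instead; see https://www.cnblogs.com/MrYang-11-GetKnow/p/15896355.html for how.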
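Raising max_query_size is the fix used above. An alternative, not from the original text, is to split a huge IN list into several queries that each stay under the limit. This is a minimal Python sketch assuming integer IDs (so no quoting or escaping is needed) and the 256 KiB default; `chunked_in_queries` is a hypothetical helper.

```python
from typing import List


def chunked_in_queries(prefix: str, ids: List[int],
                       max_bytes: int = 262_144) -> List[str]:
    """Split `prefix (id1, id2, ...)` into queries of at most max_bytes each.

    262_144 bytes is 256 KiB, the default max_query_size.
    Assumes integer ids, so no quoting is needed.
    """
    base = len(prefix.encode()) + len(" ()")  # fixed overhead per query
    queries: List[str] = []
    current: List[str] = []
    size = base
    for value in ids:
        lit = str(int(value))
        extra = len(lit) + (2 if current else 0)  # ", " before every value but the first
        if current and size + extra > max_bytes:
            queries.append(f"{prefix} ({', '.join(current)})")
            current, size = [], base
            extra = len(lit)
        if base + len(lit) > max_bytes:
            raise ValueError("a single value does not fit into max_bytes")
        current.append(lit)
        size += extra
    if current:
        queries.append(f"{prefix} ({', '.join(current)})")
    return queries


qs = chunked_in_queries("SELECT * FROM t WHERE id IN", list(range(1000)), max_bytes=200)
print(len(qs), max(len(q) for q in qs))
```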

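5) Code: 168, AST is too big, Maximum: 50000

Cause: the query's AST (abstract syntax tree) is too large.

Solution:
Add the corresponding settings to users.xml, or change them through SQL (see the documentation on modifying permissions for the detailed steps).

<max_ast_elements>10000000</max_ast_elements>
<max_expanded_ast_elements>10000000</max_expanded_ast_elements>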

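6) Code: 221, db::exception: no interserver io endpoint named…
Replicating data between replicas fails, so the data cannot be synchronized. The error shown directly in the err.log file is: auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Poco::Exception. Code: 1000, e.code() = 111, Connection refused

Cause: the interserver_http_host parameter was not set. The ClickHouse configuration file describes it roughly as follows: it is the hostname that other replicas use to request data from this server; if it is not set, it is determined the same way as the `hostname -f` command; and it can be used to move replication to a different network interface (a server may be connected to several networks through several addresses). Without it, the server ends up trying to connect to itself, and when nothing is serving the corresponding port, you get the Connection refused error.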

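7) Code: 253, Replica /clickhouse/tables/XXX/XXX/replicas/dba07 already exists
Cause: when a replicated table (ReplicatedMergeTree) lives in a database with the Atomic engine, recreating it right after dropping it raises this error. On drop, ClickHouse removes the table's data from ZooKeeper in an asynchronous background thread; if you recreate the table immediately, that thread may not have run yet. If you don't want to change anything, just wait a while and run the CREATE statement again. Alternatively, set the following parameter to control the delay before the ZooKeeper data is cleaned up:

<!-- Set database_atomic_delay_before_drop_table_sec = 0 so that a dropped replicated table can be recreated immediately -->
<database_atomic_delay_before_drop_table_sec>0</database_atomic_delay_before_drop_table_sec>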

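8) Code: 252
Code: 252, e.displayText() = DB::Exception: Too many partitions for single INSERT block (more than 100).
Cause: a single insert touches too many partitions, more than the default limit of 100.
Solution:
1. Choose the partition key sensibly.
2. Raise the max_partitions_per_insert_block setting.
3. Avoid writing data that spans too many partitions in a single batch.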
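Option 3 can be done on the client by grouping rows by partition before writing. This is a minimal Python sketch; `split_by_partition` and the `partition_of` function (returning each row's partition key, for example the day of an event date) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

Row = TypeVar("Row")


def split_by_partition(rows: List[Row],
                       partition_of: Callable[[Row], Hashable],
                       max_partitions: int = 100) -> List[List[Row]]:
    """Group rows into insert batches that each touch at most max_partitions partitions."""
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")
    by_partition: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        by_partition[partition_of(row)].append(row)

    keys = list(by_partition)  # dicts keep first-seen order
    batches: List[List[Row]] = []
    for start in range(0, len(keys), max_partitions):
        batch: List[Row] = []
        for key in keys[start:start + max_partitions]:
            batch.extend(by_partition[key])
        batches.append(batch)
    return batches


# 250 distinct days, 2 rows each -> 3 inserts of at most 100 partitions.
rows = [(day, i) for day in range(250) for i in range(2)]
batches = split_by_partition(rows, partition_of=lambda r: r[0])
print([len({r[0] for r in b}) for b in batches])  # [100, 100, 50]
```

Each resulting batch would then be sent as its own INSERT.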

9) Code: 359
Code: 359, e.displayText() = DB::Exception: Table or Partition in xxx was not dropped.
Reason:
1. Size (158.40 GB) is greater than max_[table/partition]_size_to_drop (50.00 GB)
2. File '/data/clickhouse/clickhouse-server/flags/force_drop_table' intended to force DROP doesn't exist
Cause:
1) The data being dropped is larger than the configured limit.
2) The check can be skipped with a forced drop, but because the file /data/clickhouse/clickhouse-server/flags/force_drop_table was not found, the check cannot be skipped, i.e. the drop cannot be forced.
Solution:
Following reason 2, run this on the affected node:

sudo touch '/data/clickhouse/clickhouse-server/flags/force_drop_table' && sudo chmod 666 '/data/clickhouse/clickhouse-server/flags/force_drop_table'

Then run the drop again. Note that the flag file works only once: it disappears after the drop completes.

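10) Code: 117

Code: 117, e.displayText() = DB::Exception: Unexpected NULL value of not Nullable type String (version 20.8.3.18)
Cause: NULL values. Hive stores NULL as \N in its underlying files, and ClickHouse handles NULL differently, so this has to be handled explicitly when creating the table.

Solution: see the documentation on handling NULL values.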
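A minimal Python sketch of cleaning Hive text data before writing it to ClickHouse, assuming rows arrive as lists of strings and that you know which target columns are Nullable. The helper `normalize_hive_nulls` is hypothetical, and mapping `\N` to the empty string for non-Nullable String columns is only one possible policy, chosen here for illustration.

```python
from typing import List, Optional, Sequence

HIVE_NULL = "\\N"  # Hive text files write NULL as a backslash followed by N


def normalize_hive_nulls(row: Sequence[str],
                         nullable: Sequence[bool]) -> List[Optional[str]]:
    """Replace Hive's \\N marker before inserting into ClickHouse.

    Nullable target columns get None (sent as NULL); non-Nullable String
    columns get '' (an assumed policy: the String default value).
    """
    if len(row) != len(nullable):
        raise ValueError("row and nullable flags must have the same length")
    out: List[Optional[str]] = []
    for value, is_nullable in zip(row, nullable):
        if value == HIVE_NULL:
            out.append(None if is_nullable else "")
        else:
            out.append(value)
    return out


print(normalize_hive_nulls(["1", "\\N", "\\N"], [False, True, False]))
# ['1', None, '']
```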

11) Code: 62

ERROR ApplicationMaster: User class threw exception: ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 62, host: 127.0.0.1, port: 8123; Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 1432 (end of query): . Expected one of: ENGINE, storage definition (version 20.8.3.18)
Cause: the target table does not exist. Most likely the Spark JDBC writer then tried to create it itself with a generated CREATE TABLE statement that has no ENGINE clause, which ClickHouse rejects as a syntax error.

Solution: create the table in ClickHouse first.

12) Code: 241

Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 4223048 bytes), maximum: 9.31 GiB: While executing MergeTreeThread: While executing CreatingSetsTransform.

Cause: memory usage exceeded the limit; the default maximum is 10G.

Solution: raise the per-query memory limit in SQL, or adjust the user's settings (via SQL or in users.xml), e.g. max_memory_usage = 20000000000.
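Why does the error report 9.31 GiB when the default limit is described as 10G? The limit is 10,000,000,000 bytes (decimal units), while the error message reports binary GiB (2^30 bytes). A quick Python check; the helper name `to_gib` is just for illustration:

```python
def to_gib(n_bytes: int) -> float:
    """Convert a byte count to binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


default_limit = 10_000_000_000  # "10G" as configured, in decimal bytes
print(f"{to_gib(default_limit):.2f} GiB")  # 9.31 GiB
```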

13) Code: 202

ClickHouse exception, code: 202, host: xxxxx, port: 8123; Code: 202, e.displayText() = DB::Exception: Too many simultaneous queries. Maximum: 100
Cause: the maximum number of concurrent queries is 100.

Solution: raise max_concurrent_queries in config.xml: <max_concurrent_queries>100</max_concurrent_queries> (replace 100 with a larger value).
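Besides raising the server limit, a client that fires many queries at once can throttle itself. This is a swapped-in client-side technique, not the fix described above: a minimal Python sketch using a bounded semaphore, where `QueryLimiter` is hypothetical and `fake_query` stands in for a real ClickHouse call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class QueryLimiter:
    """Cap how many queries this client runs at the same time."""

    def __init__(self, max_in_flight: int) -> None:
        self._sem = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for checking

    def run(self, query_fn, *args):
        with self._sem:  # blocks while max_in_flight queries are running
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_query(i: int) -> int:
    time.sleep(0.01)  # stands in for a real ClickHouse call
    return i


limiter = QueryLimiter(max_in_flight=3)
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda i: limiter.run(fake_query, i), range(20)))
print(results == list(range(20)), limiter.peak <= 3)  # True True
```

Keeping each client's limit well below the server's max_concurrent_queries leaves room for other clients.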

14) Code: 252

ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 252, host: xxxx, port: 8123; Code: 252, e.displayText() = DB::Exception: Too many parts (308). Merges are processing significantly slower than inserts. (version 20.8.3.18)
Cause: data is inserted too fast, and ClickHouse cannot merge parts quickly enough.

Solution: lower the write parallelism and write fewer, larger batches.

15) Code: 159

Code: 159. DB::Exception: Received from localhost:9000. DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000002 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 3 unfinished hosts (0 of them are currently active), they are going to execute the query in background.
Cause: the ClickHouse port may be configured incorrectly.
Solution: check the ClickHouse port in metrika.xml.
