Personal study notes, please do not repost!!!
Original: https://www.cnblogs.com/wshenjin/p/13143929.html
ZooKeeper + ReplicatedMergeTree (replicated tables) + Distributed (distributed tables)
Node IPs
- 192.168.31.101
- 192.168.31.102
Since I don't have enough machines on hand, I run two instances on each of the two machines to form a cluster with two shards and two replicas.
Basic layout
|  | Replica 01 | Replica 02 |
|---|---|---|
| Shard 01 | 192.168.31.101:9100 | 192.168.31.102:9200 |
| Shard 02 | 192.168.31.102:9100 | 192.168.31.101:9200 |
ZK cluster deployment
Skipped here.
Configuration:
Per-instance differences in each config0*.xml (the * is the instance number, 1 or 2):
<log>/var/log/clickhouse-server/clickhouse-server0*.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server0*.err.log</errorlog>
<http_port>8*23</http_port>
<tcp_port>9*00</tcp_port>
<mysql_port>9*04</mysql_port>
<interserver_http_port>9*09</interserver_http_port>
<path>/data/database/clickhouse0*/</path>
<tmp_path>/data/database/clickhouse0*/tmp/</tmp_path>
<user_files_path>/data/database/clickhouse0*/user_files/</user_files_path>
<format_schema_path>/data/database/clickhouse0*/format_schemas/</format_schema_path>
<include_from>/etc/clickhouse-server/metrika0*.xml</include_from>
Configuration shared by every instance's metrika0*.xml:
<!-- Cluster configuration -->
<clickhouse_remote_servers>
<!-- Custom cluster name: ckcluster_2shards_2replicas -->
<ckcluster_2shards_2replicas>
<!-- Shard 1 -->
<shard>
<internal_replication>true</internal_replication>
<!-- Replica 1 -->
<replica>
<host>192.168.31.101</host>
<port>9100</port>
</replica>
<!-- Replica 2 -->
<replica>
<host>192.168.31.102</host>
<port>9200</port>
</replica>
</shard>
<!-- Shard 2 -->
<shard>
<internal_replication>true</internal_replication>
<!-- Replica 1 -->
<replica>
<host>192.168.31.102</host>
<port>9100</port>
</replica>
<!-- Replica 2 -->
<replica>
<host>192.168.31.101</host>
<port>9200</port>
</replica>
</shard>
</ckcluster_2shards_2replicas>
</clickhouse_remote_servers>
<!-- ZooKeeper configuration -->
<zookeeper-servers>
<node index="1">
<host>192.168.31.101</host>
<port>2181</port>
</node>
<node index="2">
<host>192.168.31.102</host>
<port>2181</port>
</node>
</zookeeper-servers>
<!-- Compression settings -->
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>
Replica identifier (macros) configuration in each instance's metrika0*.xml:
# 192.168.31.101 9100 metrika01.xml
<macros>
<shard>01</shard>
<replica>ckcluster-01-01</replica>
</macros>
# 192.168.31.101 9200 metrika02.xml
<macros>
<shard>02</shard>
<replica>ckcluster-02-02</replica>
</macros>
# 192.168.31.102 9100 metrika01.xml
<macros>
<shard>02</shard>
<replica>ckcluster-02-01</replica>
</macros>
# 192.168.31.102 9200 metrika02.xml
<macros>
<shard>01</shard>
<replica>ckcluster-01-02</replica>
</macros>
The replica identifier, also called the macros configuration, uniquely names a replica; every instance must define it and every value must be unique (a quick way to verify it is shown below).
- shard is the shard number
- replica is the replica identifier
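To double-check that an instance picked up its macros, the system.macros table can be queried; on 192.168.31.101:9100, for example, it should show shard = 01 and replica = ckcluster-01-01, matching metrika01.xml above:
:) select * from system.macros;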
Starting the instances
192.168.31.101:
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
--pid-file=/var/run/clickhouse-server/clickhouse-server01.pid \
--config-file=/etc/clickhouse-server/config01.xml"
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
--pid-file=/var/run/clickhouse-server/clickhouse-server02.pid \
--config-file=/etc/clickhouse-server/config02.xml"
192.168.31.102:
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
--pid-file=/var/run/clickhouse-server/clickhouse-server01.pid \
--config-file=/etc/clickhouse-server/config01.xml"
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
--pid-file=/var/run/clickhouse-server/clickhouse-server02.pid \
--config-file=/etc/clickhouse-server/config02.xml"
Check the cluster status on each node:
:) SELECT * FROM system.clusters;
┌─cluster─────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ ckcluster_2shards_2replicas │ 1 │ 1 │ 1 │ 192.168.31.101 │ 192.168.31.101 │ 9100 │ 0 │ default │ │ 0 │ 0 │
│ ckcluster_2shards_2replicas │ 1 │ 1 │ 2 │ 192.168.31.102 │ 192.168.31.102 │ 9200 │ 0 │ default │ │ 0 │ 0 │
│ ckcluster_2shards_2replicas │ 2 │ 1 │ 1 │ 192.168.31.102 │ 192.168.31.102 │ 9100 │ 0 │ default │ │ 0 │ 0 │
│ ckcluster_2shards_2replicas │ 2 │ 1 │ 2 │ 192.168.31.101 │ 192.168.31.101 │ 9200 │ 1 │ default │ │ 0 │ 0 │
└─────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
Creating databases and tables
Create the database on every instance:
:) create database testdb ;
On 192.168.31.101:9100, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) \
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/person_local','ckcluster-01-01',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local \
ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
On 192.168.31.101:9200, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) \
ENGINE = ReplicatedMergeTree('/clickhouse/tables/02/person_local','ckcluster-02-02',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local \
ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
On 192.168.31.102:9100, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) \
ENGINE = ReplicatedMergeTree('/clickhouse/tables/02/person_local','ckcluster-02-01',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local \
ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
On 192.168.31.102:9200, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) \
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/person_local','ckcluster-01-02',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local \
ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
Local table creation syntax:
create table person_local(ID Int8, Name String, BirthDate Date) \
ENGINE = ReplicatedMergeTree('/clickhouse/tables/${shard}/person_local','${replica}',BirthDate, (Name, BirthDate), 8192);
- /clickhouse/tables/${shard}/person_local is the table's path in ZooKeeper. Replicas belonging to the same shard must use the same path, and different shards must use different paths; ${shard} comes from the instance's metrika.xml macros.
- ${replica} is the replica name; it must be different on every instance and also comes from the instance's metrika.xml macros (see the check below).
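After a table is created, each instance can confirm the ZooKeeper path and replica name it registered by querying system.replicas; on 192.168.31.101:9100, for example, zookeeper_path should be /clickhouse/tables/01/person_local and replica_name should be ckcluster-01-01:
:) select database, table, zookeeper_path, replica_name from system.replicas where table = 'person_local';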
Distributed table syntax:
:) create table person_all as person_local \
ENGINE = Distributed(${cluster_name}, ${db_name}, ${local_table_name}, rand());
- ${cluster_name} is the cluster name
- ${db_name} is the database name
- ${local_table_name} is the local table name
- rand() is the sharding expression used to spread writes across shards
A distributed table is only a query engine and stores no data itself (see the check below). A query against it is sent to every shard in the cluster, processed there, and the partial results are aggregated before being returned to the client. ClickHouse therefore requires the aggregated result to fit in the memory of the node hosting the distributed table, which is rarely a problem under normal conditions.
The distributed table can be created on every instance or only on some of them, matching the instances your application actually queries. Creating it on several instances is recommended, so that when one node goes down the table can still be queried on the others.
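One way to see that the distributed table stores nothing itself is to look at system.parts once some data has been loaded (see the import further down): only person_local owns data parts, while person_all never shows up there:
:) select table, sum(rows) as total_rows from system.parts where database = 'testdb' and active group by table;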
Creating tables with distributed DDL (ON CLUSTER)
Local table:
:) create table people_local ON CLUSTER ckcluster_2shards_2replicas (ID Int8, Name String, BirthDate Date) \
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/people_local','{replica}') PARTITION BY toYYYYMMDD(BirthDate) ORDER BY (Name, BirthDate) SETTINGS index_granularity = 8192;
Distributed table:
:) create table people_all ON CLUSTER ckcluster_2shards_2replicas (ID Int8, Name String, BirthDate Date) \
ENGINE = Distributed(ckcluster_2shards_2replicas,testdb,people_local, rand());
With this approach there is no need to create the tables on every node one by one; the DDL is propagated to all nodes automatically. The only prerequisite is that the database has already been created on every node.
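The same ON CLUSTER clause works for other DDL as well; for example, dropping these tables on every node later can be done in one statement per table:
:) drop table if exists testdb.people_all ON CLUSTER ckcluster_2shards_2replicas;
:) drop table if exists testdb.people_local ON CLUSTER ckcluster_2shards_2replicas;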
Data test
Import a CSV file with 10,000 rows:
[root@ ~]# clickhouse-client --host 127.0.0.1 --port 9200 --database testdb --query="insert into person_all FORMAT CSV" < /tmp/a.csv
Check the row count on each instance:
192.168.31.101:9100 :) select count(*) from testdb.person_local ;
┌─count()─┐
│ 4932 │
└─────────┘
192.168.31.101:9200 :) select count(*) from testdb.person_local ;
┌─count()─┐
│ 5068 │
└─────────┘
192.168.31.102:9100 :) select count(*) from testdb.person_local ;
┌─count()─┐
│ 5068 │
└─────────┘
192.168.31.102:9200 :) select count(*) from testdb.person_local ;
┌─count()─┐
│ 4932 │
└─────────┘
As expected, the two replicas of each shard hold matching row counts.
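Querying through the distributed table on any instance should therefore return the full data set, 4932 + 5068 = 10000 rows:
:) select count(*) from testdb.person_all;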
Taking a node down
Here the 192.168.31.102:9100 instance is killed:
[root@ ~]# ps -ef | grep click
clickho+ 3485 1 2 17:33 ? 00:01:25 /usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server02.pid --config-file=/etc/clickhouse-server/config02.xml
clickho+ 3547 1 2 17:34 ? 00:01:34 /usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid --config-file=/etc/clickhouse-server/config01.xml
root 12650 12503 0 18:42 pts/0 00:00:00 grep --col click
[root@ ~]# kill -SIGTERM 3547
Keep importing data and run read/write tests; the cluster continues to work normally.
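The exact test is not shown here; a minimal read/write check along these lines (the row values are made up for illustration) keeps working while the node is down, because shard 02 still has its surviving replica on 192.168.31.101:9200:
:) insert into testdb.person_all (ID, Name, BirthDate) values (99, 'failover-test', '2020-01-02');
:) select count(*) from testdb.person_all;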
Check the data on the replica that pairs with the downed node:
192.168.31.101:9200 :) select * from person_local where BirthDate='2020-01-01' ;
┌─ID─┬─Name─────────────────────────────┬──BirthDate─┐
│ 2 │ 26ab0db90d72e28ad0ba1e22ee510510 │ 2020-01-01 │
│ 10 │ 31d30eea8d0968d6458e0ad0027c9f80 │ 2020-01-01 │
│ 4 │ 48a24b70a0b376535542b996af517398 │ 2020-01-01 │
│ 9 │ 7c5aba41f53293b712fd86d08ed5b36e │ 2020-01-01 │
│ 7 │ 84bc3da1b3e33a18e8d5e1bdd7a18d7a │ 2020-01-01 │
│ 6 │ 9ae0ea9e3c9c6e1b9b6252c8395efdc1 │ 2020-01-01 │
│ 1 │ b026324c6904b2a9cb4b88d6d61c81d1 │ 2020-01-01 │
└────┴──────────────────────────────────┴────────────┘
Now restart the 192.168.31.102:9100 instance and check its data:
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid --config-file=/etc/clickhouse-server/config01.xml"
[root@ ~]# clickhouse-client --port 9100
192.168.31.102:9100 :) use testdb
192.168.31.102:9100 :) select * from person_local where BirthDate='2020-01-01' ;
┌─ID─┬─Name─────────────────────────────┬──BirthDate─┐
│ 2 │ 26ab0db90d72e28ad0ba1e22ee510510 │ 2020-01-01 │
│ 10 │ 31d30eea8d0968d6458e0ad0027c9f80 │ 2020-01-01 │
│ 4 │ 48a24b70a0b376535542b996af517398 │ 2020-01-01 │
│ 9 │ 7c5aba41f53293b712fd86d08ed5b36e │ 2020-01-01 │
│ 7 │ 84bc3da1b3e33a18e8d5e1bdd7a18d7a │ 2020-01-01 │
│ 6 │ 9ae0ea9e3c9c6e1b9b6252c8395efdc1 │ 2020-01-01 │
│ 1 │ b026324c6904b2a9cb4b88d6d61c81d1 │ 2020-01-01 │
└────┴──────────────────────────────────┴────────────┘
After the killed node is restarted, its data catches up with its replica and the cluster works normally again.
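To confirm that the restarted replica has fully caught up, its replication queue and delay can be checked in system.replicas; once it is back in sync both should be 0:
:) select table, is_readonly, queue_size, absolute_delay from system.replicas where database = 'testdb';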