ClickHouse with Three Shards and Two Replicas
Installation
- Install Docker
- Install docker-compose
- docker pull yandex/clickhouse-server
- docker pull zookeeper
Server Layout
- s1r1 (shard 1, replica 1) and s3r2 (shard 3, replica 2) run on server s1
- s2r1 (shard 2, replica 1) and s1r2 (shard 1, replica 2) run on server s2
- s3r1 (shard 3, replica 1) and s2r2 (shard 2, replica 2) run on server s3
- zoo1 (on s1), zoo2 (on s2), and zoo3 (on s3) form the ZooKeeper ensemble
The /etc/hosts mappings are listed below; replace the [S1]/[S2]/[S3] placeholders with your own server IPs when deploying. This article adds the hosts entries only on server s1; unless a server is named explicitly, commands below are run on s1:
[S1] s1
[S2] s2
[S3] s3
[S1] s1r1
[S2] s1r2
[S2] s2r1
[S3] s2r2
[S3] s3r1
[S1] s3r2
[S1] zoo1
[S2] zoo2
[S3] zoo3
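One way to apply these mappings on s1 (a sketch; substitute your real IPs for the [S1]/[S2]/[S3] placeholders, and note that /etc/hosts accepts several names per IP line):
cat >> /etc/hosts <<'EOF'
[S1] s1 s1r1 s3r2 zoo1
[S2] s2 s1r2 s2r1 zoo2
[S3] s3 s2r2 s3r1 zoo3
EOF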
ZooKeeper Setup
Deployment uses docker-compose; the docker-compose.yaml files are as follows.
zoo1 (server s1):
version: '3.1'
services:
  zoo1:
    image: zookeeper
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
    extra_hosts:
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
zoo2 (server s2):
version: '3.1'
services:
  zoo2:
    image: zookeeper
    restart: always
    hostname: zoo2
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181
    extra_hosts:
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
zoo3 (server s3):
version: '3.1'
services:
  zoo3:
    image: zookeeper
    restart: always
    hostname: zoo3
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
    extra_hosts:
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
Start ZooKeeper on s1, s2, and s3 respectively:
docker-compose up -d
Stop containers: docker-compose stop
Remove containers: docker-compose rm
List containers: docker-compose ps
Restart containers: docker-compose restart
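Before moving on, confirm that the ensemble actually formed; a quick check from any of the three servers (zkServer.sh ships in the official zookeeper image, and docker-compose exec runs it inside the service container):
docker-compose exec zoo1 zkServer.sh status
One node should report Mode: leader and the other two Mode: follower (use the matching service name, zoo2 or zoo3, on the other servers).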
Database Configuration
Exporting the configuration templates
First start a temporary container to obtain the stock config.xml and users.xml:
docker run -itd --rm --name db yandex/clickhouse-server
Copy out config.xml and users.xml:
docker cp db:/etc/clickhouse-server/config.xml ./config.xml
docker cp db:/etc/clickhouse-server/users.xml ./users.xml
Stop the container (the --rm flag removes it automatically):
docker stop db
Each of the three servers needs two copies of config.xml (one per replica) and a single users.xml (both replicas share that configuration).
Adding users and passwords
users.xml contains detailed comments explaining how to add users, passwords, and permissions. Here we set a password for the default user and create a new user tom. Since passwords are stored in a plain file, storing the SHA-256 hash rather than the plaintext is recommended:
root@mq-228 ~/i/chdb# echo -n "default" | sha256sum | tr -d '-'
37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f
root@mq-228 ~/i/chdb# echo -n "tom-password" | sha256sum | tr -d '-'
d8c862690b30f6f2add244327715cb08ac926c7c2fb4fcbb7694650bfde5b672
With default's password set to default and tom's to tom-password, add them to users.xml: under users/default, remove the password element and add:
<password_sha256_hex>37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f</password_sha256_hex>
Then add under users/:
<tom>
    <password_sha256_hex>d8c862690b30f6f2add244327715cb08ac926c7c2fb4fcbb7694650bfde5b672</password_sha256_hex>
    <profile>default</profile>
    <quota>default</quota>
    <networks incl="networks" replace="replace">
        <ip>::/0</ip>
    </networks>
</tom>
隨后即可通過以下命令登陸數據庫:
clickhouse-client -u default --password default --port 9001
clickhouse-client -u tom --password tom-password --port 9001
Setting the listen address
Uncomment the listen_host line in config.xml:
<!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
<listen_host>::</listen_host>
The database configuration consists of /etc/clickhouse-server/users.xml, /etc/clickhouse-server/config.xml, and /etc/metrika.xml. metrika.xml is created by the user and mainly defines the shards and replicas; its contents are substituted into the matching sections of config.xml. Its name and path can be customized in config.xml via the include_from element; when that element is absent, ClickHouse falls back to /etc/metrika.xml, which is why the files below are mounted at that path.
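If you would rather keep the substitutions file elsewhere, declare it explicitly; a minimal sketch using sed against the exported template (the root element of this config generation is <yandex>):
# optional: declare the substitutions file explicitly in config.xml;
# without this element ClickHouse falls back to /etc/metrika.xml
sed -i 's|</yandex>|    <include_from>/etc/metrika.xml</include_from>\n</yandex>|' config.xml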
Server s1
Reference file listing for the working directory:
root@mq-227 ~/i/chdb# tree .
.
├── check_table_p.sh
├── config-s1r1.xml
├── config-s3r2.xml
├── create_table.sh
├── data_p.csv
├── delete_table_p.sh
├── docker-compose.yaml
├── metrika-s1r1.xml
├── metrika-s3r2.xml
├── metrika.xml
├── query_table.sh
└── users.xml
Contents of metrika-s1r1.xml (docker-compose.yaml, shown below, maps it to /etc/metrika.xml inside the container):
<yandex>
    <!-- cluster configuration -->
    <clickhouse_remote_servers>
        <!-- 3 shards, 2 replicas; perftest_3shards_2replicas is the cluster's unique ID -->
        <perftest_3shards_2replicas>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>s1r1</host>
                    <port>9001</port>
                    <user>default</user>
                    <password>default</password>
                </replica>
                <replica>
                    <host>s1r2</host>
                    <port>9002</port>
                    <user>default</user>
                    <password>default</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>s2r1</host>
                    <port>9001</port>
                    <user>default</user>
                    <password>default</password>
                </replica>
                <replica>
                    <host>s2r2</host>
                    <port>9002</port>
                    <user>default</user>
                    <password>default</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>s3r1</host>
                    <port>9001</port>
                    <user>default</user>
                    <password>default</password>
                </replica>
                <replica>
                    <host>s3r2</host>
                    <port>9002</port>
                    <user>default</user>
                    <password>default</password>
                </replica>
            </shard>
        </perftest_3shards_2replicas>
    </clickhouse_remote_servers>
    <!-- ZooKeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>zoo1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>zoo2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>zoo3</host>
            <port>2181</port>
        </node>
        <session_timeout_ms>30000</session_timeout_ms>
        <operation_timeout_ms>10000</operation_timeout_ms>
    </zookeeper-servers>
    <!-- macros: each instance substitutes its own shard/replica IDs here;
         referencing them in CREATE TABLE keeps the DDL identical on every instance -->
    <macros>
        <shard>s1</shard>
        <replica>r1</replica>
    </macros>
</yandex>
Copy the config.xml exported earlier and rename it config-s1r1.xml (likewise for the other instances below), then change its TCP port and interserver replication port:
<tcp_port>9001</tcp_port>
<interserver_http_port>9011</interserver_http_port>
metrika-s3r2.xml is identical to metrika-s1r1.xml except for the macros section:
<macros>
    <shard>s3</shard>
    <replica>r2</replica>
</macros>
In config-s3r2.xml, change the TCP port and interserver replication port:
<tcp_port>9002</tcp_port>
<interserver_http_port>9013</interserver_http_port>
docker-compose.yaml contents:
version: '3.1'
services:
  chdb-s1r1:
    image: yandex/clickhouse-server:latest
    hostname: s1r1
    ports:
      - 9001:9001
      - 9011:9011
    volumes:
      - /root/iot/chdb/users.xml:/etc/clickhouse-server/users.xml
      - /root/iot/chdb/config-s1r1.xml:/etc/clickhouse-server/config.xml
      - /root/iot/chdb/metrika-s1r1.xml:/etc/metrika.xml
      - /root/iot/chdb/s1r1:/var/lib/clickhouse
    extra_hosts:
      - "s1r1:[S1]"
      - "s1r2:[S2]"
      - "s2r1:[S2]"
      - "s2r2:[S3]"
      - "s3r1:[S3]"
      - "s3r2:[S1]"
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
  chdb-s3r2:
    image: yandex/clickhouse-server:latest
    hostname: s3r2
    ports:
      - 9002:9002
      - 9013:9013
    volumes:
      - /root/iot/chdb/users.xml:/etc/clickhouse-server/users.xml
      - /root/iot/chdb/config-s3r2.xml:/etc/clickhouse-server/config.xml
      - /root/iot/chdb/metrika-s3r2.xml:/etc/metrika.xml
      - /root/iot/chdb/s3r2:/var/lib/clickhouse
    extra_hosts:
      - "s1r1:[S1]"
      - "s1r2:[S2]"
      - "s2r1:[S2]"
      - "s2r2:[S3]"
      - "s3r1:[S3]"
      - "s3r2:[S1]"
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
The host paths in the volumes entries must be adjusted to your actual deployment paths (likewise below). The last mount persists the database data on the host; its location can be changed as needed.
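With the config, metrika, and users files in place, bring both instances up (run the same command on s2 and s3 once their files are ready):
docker-compose up -d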
Server s2
metrika-s2r1.xml mirrors metrika-s1r1.xml; only the macros section changes:
<macros>
    <shard>s2</shard>
    <replica>r1</replica>
</macros>
In config-s2r1.xml, change the TCP port and interserver replication port:
<tcp_port>9001</tcp_port>
<interserver_http_port>9012</interserver_http_port>
metrika-s1r2.xml likewise differs only in the macros section:
<macros>
    <shard>s1</shard>
    <replica>r2</replica>
</macros>
In config-s1r2.xml, change the TCP port and interserver replication port:
<tcp_port>9002</tcp_port>
<interserver_http_port>9011</interserver_http_port>
docker-compose.yaml contents:
version: '3.1'
services:
  chdb-s2r1:
    image: yandex/clickhouse-server:latest
    hostname: s2r1
    ports:
      - 9001:9001
      - 9012:9012
    volumes:
      - /root/iot/chdb/users.xml:/etc/clickhouse-server/users.xml
      - /root/iot/chdb/config-s2r1.xml:/etc/clickhouse-server/config.xml
      - /root/iot/chdb/metrika-s2r1.xml:/etc/metrika.xml
      - /root/iot/chdb/s2r1:/var/lib/clickhouse
    extra_hosts:
      - "s1r1:[S1]"
      - "s1r2:[S2]"
      - "s2r1:[S2]"
      - "s2r2:[S3]"
      - "s3r1:[S3]"
      - "s3r2:[S1]"
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
  chdb-s1r2:
    image: yandex/clickhouse-server:latest
    hostname: s1r2
    ports:
      - 9002:9002
      - 9011:9011
    volumes:
      - /root/iot/chdb/users.xml:/etc/clickhouse-server/users.xml
      - /root/iot/chdb/config-s1r2.xml:/etc/clickhouse-server/config.xml
      - /root/iot/chdb/metrika-s1r2.xml:/etc/metrika.xml
      - /root/iot/chdb/s1r2:/var/lib/clickhouse
    extra_hosts:
      - "s1r1:[S1]"
      - "s1r2:[S2]"
      - "s2r1:[S2]"
      - "s2r2:[S3]"
      - "s3r1:[S3]"
      - "s3r2:[S1]"
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
Server s3
In metrika-s3r1.xml, change the macros section:
<macros>
    <shard>s3</shard>
    <replica>r1</replica>
</macros>
In config-s3r1.xml, change the TCP port and interserver replication port:
<tcp_port>9001</tcp_port>
<interserver_http_port>9013</interserver_http_port>
In metrika-s2r2.xml, change the macros section:
<macros>
    <shard>s2</shard>
    <replica>r2</replica>
</macros>
In config-s2r2.xml, change the TCP port and interserver replication port:
<tcp_port>9002</tcp_port>
<interserver_http_port>9012</interserver_http_port>
docker-compose.yaml contents:
version: '3.1'
services:
  chdb-s3r1:
    image: yandex/clickhouse-server:latest
    hostname: s3r1
    ports:
      - 9001:9001
      - 9013:9013
    volumes:
      - /root/iot/chdb/users.xml:/etc/clickhouse-server/users.xml
      - /root/iot/chdb/config-s3r1.xml:/etc/clickhouse-server/config.xml
      - /root/iot/chdb/metrika-s3r1.xml:/etc/metrika.xml
      - /root/iot/chdb/s3r1:/var/lib/clickhouse
    extra_hosts:
      - "s1r1:[S1]"
      - "s1r2:[S2]"
      - "s2r1:[S2]"
      - "s2r2:[S3]"
      - "s3r1:[S3]"
      - "s3r2:[S1]"
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
  chdb-s2r2:
    image: yandex/clickhouse-server:latest
    hostname: s2r2
    ports:
      - 9002:9002
      - 9012:9012
    volumes:
      - /root/iot/chdb/users.xml:/etc/clickhouse-server/users.xml
      - /root/iot/chdb/config-s2r2.xml:/etc/clickhouse-server/config.xml
      - /root/iot/chdb/metrika-s2r2.xml:/etc/metrika.xml
      - /root/iot/chdb/s2r2:/var/lib/clickhouse
    extra_hosts:
      - "s1r1:[S1]"
      - "s1r2:[S2]"
      - "s2r1:[S2]"
      - "s2r2:[S3]"
      - "s3r1:[S3]"
      - "s3r2:[S1]"
      - "zoo1:[S1]"
      - "zoo2:[S2]"
      - "zoo3:[S3]"
Verification
Connect to any instance and query the cluster information:
clickhouse-client -u default --password default --host s1r1 --port 9001
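For example, the cluster defined in metrika.xml can be inspected through the built-in system.clusters table:
clickhouse-client -u default --password default --host s1r1 --port 9001 \
    --query "SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters"
All six replicas of perftest_3shards_2replicas should be listed; if any are missing, recheck the metrika-*.xml files and the extra_hosts mappings.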
Example
ClickHouse replication operates at the table level and requires a Replicated* table engine. Note that a ClickHouse cluster does not propagate CREATE, DROP, ATTACH, DETACH, or RENAME statements, so the table must be created on every database instance individually.
Run the s1/create_table_p.sh script on server s1 to create the table:
ports="9001 9002"
hosts="s1 s2 s3"
for port in $ports
do
    for host in $hosts
    do
        echo "Creating table on $host:$port"
        clickhouse-client -u default --password default --host $host --port $port --query \
        "CREATE TABLE p (
            ozone Int8,
            particullate_matter Int8,
            carbon_monoxide Int8,
            sulfure_dioxide Int8,
            nitrogen_dioxide Int8,
            longitude Float64,
            latitude Float64,
            timestamp DateTime
        ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/p/{shard}','{replica}')
        ORDER BY timestamp
        PRIMARY KEY timestamp"
    done
done
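Before loading real data, a quick replication smoke test is worthwhile (a sketch: insert one row into s1r1's local table, then count it on the sibling replica s1r2; ReplicatedMergeTree should copy it over almost immediately):
clickhouse-client -u default --password default --host s1r1 --port 9001 \
    --query "INSERT INTO p VALUES (1, 2, 3, 4, 5, 116.40, 39.90, now())"
clickhouse-client -u default --password default --host s1r2 --port 9002 \
    --query "SELECT count(*) FROM p"
Remember to remove the test row before importing real data.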
Sharding requires a single virtual entry table, called a Distributed table in ClickHouse, which can be thought of as a routing table for the data. Create the distributed table:
clickhouse-client --host s1r1 -u default --password default --port 9001 --query "CREATE TABLE p_all AS p
ENGINE = Distributed(perftest_3shards_2replicas, default, p, rand())"
Here perftest_3shards_2replicas is the cluster ID defined earlier. Then import the test data:
clickhouse-client --host s1r1 -u default --password default --port 9001 --query "INSERT INTO p_all FORMAT CSV" < data_p.csv
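The contents of data_p.csv are not reproduced here. If you need sample input, a hypothetical generator matching the schema of p might look like this (random Int8 readings, random coordinates, timestamps during 2020; generate_data_p.sh is not part of the original files):
# generate_data_p.sh -- hypothetical helper: writes N random CSV rows
# in the column order of table p (requires GNU date for the -d "@epoch" form)
N=${1:-1000}
for i in $(seq 1 "$N")
do
    ts=$(date -d "@$((1577836800 + RANDOM * 900))" '+%Y-%m-%d %H:%M:%S')
    lon=$(awk -v s=$RANDOM 'BEGIN { printf "%.6f", s / 32767 * 360 - 180 }')
    lat=$(awk -v s=$RANDOM 'BEGIN { printf "%.6f", s / 32767 * 180 - 90 }')
    echo "$((RANDOM % 128)),$((RANDOM % 128)),$((RANDOM % 128)),$((RANDOM % 128)),$((RANDOM % 128)),$lon,$lat,$ts"
done > data_p.csv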
Check how the data is distributed across the shards with s1/check_table_p.sh:
ports="9001 9002"
hosts="s1 s2 s3"
for port in $ports
do
    for host in $hosts
    do
        echo "Data from $host:$port"
        clickhouse-client -u default --password default --host $host --port $port --query "select count(*) from p"
    done
done
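query_table.sh from the earlier file listing is not shown above; a minimal version might simply count rows through the distributed entry table:
# total row count as seen through the distributed table p_all
clickhouse-client -u default --password default --host s1r1 --port 9001 \
    --query "SELECT count(*) FROM p_all"
Because internal_replication is false and each shard has two replicas, the per-instance counts printed by check_table_p.sh should sum to roughly twice the p_all count.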