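ClickHouse currently offers the following backup methods:
- Exporting and importing text files
- Table snapshots
- ALTER TABLE…FREEZE
- The backup tool Clickhouse-Backup
- Clickhouse-Copier
Let's try them one by one.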
# Data backup overview
https://clickhouse.tech/docs/en/operations/backup/
1. Exporting and importing text files
# Test data
-- Test table: 8.99 million rows
-- Source data in MySQL: 6.70 GB
0 rows in set. Elapsed: 71.482 sec. Processed 8.99 million rows, 6.70 GB (125.77 thousand rows/s., 93.71 MB/s.)
# Export
clickhouse-client --query="select * from caihao.ch_test_customer" > /data/clickhouse/tmp/caihao.ch_test_customer.tsv
# Import (the format name after FORMAT must be upper case); for multiple files you can use a glob such as ch_test*
cat /data/clickhouse/tmp/caihao.ch_test_customer.tsv | clickhouse-client --query="insert into caihao.ch_test_customer FORMAT TSV"
Speed: the import took a little over 20 seconds.
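The export above writes the full plain-text TSV to disk before it is compressed. A minimal sketch that streams through gzip instead, assuming bash and gzip are available; `export_table_gz`, `import_table_gz` and the `CH_CLIENT` variable are hypothetical names introduced here (`CH_CLIENT` defaults to the real clickhouse-client and only exists so the functions can be tried against a stub):

```shell
#!/bin/bash
# Hypothetical helpers: stream a table to/from gzip-compressed TSV so the
# uncompressed text never lands on disk. CH_CLIENT is an assumed override
# hook that defaults to the real clickhouse-client.
set -euo pipefail
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# export_table_gz DB.TABLE OUT_FILE
export_table_gz() {
  local table=$1 out=$2
  # FORMAT TSV makes the output format explicit instead of relying on the default.
  $CH_CLIENT --query="SELECT * FROM ${table} FORMAT TSV" | gzip > "$out"
}

# import_table_gz DB.TABLE IN_FILE (the target table must already exist)
import_table_gz() {
  local table=$1 in=$2
  gzip -dc "$in" | $CH_CLIENT --query="INSERT INTO ${table} FORMAT TSV"
}
```

Usage, with the names from this test: `export_table_gz caihao.ch_test_customer /data/clickhouse/tmp/caihao.ch_test_customer.tsv.gz`, then `import_table_gz caihao.ch_test_customer /data/clickhouse/tmp/caihao.ch_test_customer.tsv.gz`.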
# Disk usage of the table in ClickHouse: 368 MB
368 ch_test_customer
# Backup file: 3.5 GB, 139 MB after compression
[root@clickhouse-01 tmp]# du -hsm *
3539 caihao.ch_test_customer.tsv
[root@clickhouse-01 tmp]# gzip caihao.ch_test_customer.tsv
[root@clickhouse-01 tmp]# du -hsm *
139 caihao.ch_test_customer.tsv.gz
# Space usage compared:
- MySQL: 6.7 GB
- ClickHouse: 368 MB
- Exported text: 3.5 GB
- Compressed text: 139 MB
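The three `du -m` figures above share one unit (MiB), so their ratios can be compared directly; the MySQL figure comes from a different measurement and is left out:

```latex
\begin{aligned}
\text{gzip ratio on the exported TSV} &= \frac{3539}{139} \approx 25.5 \\
\text{exported TSV vs.\ ClickHouse on disk} &= \frac{3539}{368} \approx 9.6 \\
\text{ClickHouse on disk vs.\ gzipped TSV} &= \frac{368}{139} \approx 2.6
\end{aligned}
```

In this test the gzipped text export is therefore about 2.6 times smaller than the table's own storage in ClickHouse.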
2. CTAS table snapshots
# 1 Copy a table locally
clickhouse-01 :) create table ch1 as ch_test_customer ;
CREATE TABLE ch1 AS ch_test_customer
Ok.
0 rows in set. Elapsed: 0.006 sec.
clickhouse-01 :) insert into table ch1 select * from ch_test_customer ;
INSERT INTO ch1 SELECT *
FROM ch_test_customer
Ok.
0 rows in set. Elapsed: 18.863 sec. Processed 8.99 million rows, 6.70 GB (476.59 thousand rows/s., 355.13 MB/s.)
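The two statements above can be wrapped so that each snapshot gets a dated name. A minimal sketch; `snapshot_table`, the `_snap_YYYYMMDD` naming scheme and the `CH_CLIENT` override are hypothetical choices, not part of ClickHouse:

```shell
#!/bin/bash
# Hypothetical helper: snapshot a table on the same server with
# CREATE TABLE ... AS followed by INSERT ... SELECT.
set -euo pipefail
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# snapshot_table DB.TABLE [SUFFIX]  -> prints the name of the new table
snapshot_table() {
  local src=$1 suffix=${2:-$(date +%Y%m%d)}
  local dst="${src}_snap_${suffix}"
  # Same columns and engine as the source table, but no rows yet.
  $CH_CLIENT --query="CREATE TABLE ${dst} AS ${src}"
  # Copy all rows; this reads the whole source table once.
  $CH_CLIENT --query="INSERT INTO ${dst} SELECT * FROM ${src}"
  echo "$dst"
}
```

For example, `snapshot_table caihao.ch_test_customer` run on 2020-10-16 would create `caihao.ch_test_customer_snap_20201016`.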
# 2 Copy a table from a remote server
https://clickhouse.tech/docs/en/sql-reference/table-functions/remote/
# Syntax
remote('addresses_expr', db, table[, 'user'[, 'password']])
remote('addresses_expr', db.table[, 'user'[, 'password']])
# Example:
dba-docker :) insert into table ch1 select * from remote ('10.222.2.222','caihao.ch_test_customer','ch_app','qwerty_123');
INSERT INTO ch1 SELECT *
FROM remote('10.222.2.222', 'caihao.ch_test_customer', 'ch_app', 'qwerty_123')
Ok.
0 rows in set. Elapsed: 17.914 sec. Processed 8.99 million rows, 6.70 GB (501.85 thousand rows/s., 373.95 MB/s.)
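In the example the password is typed directly into the query. A minimal sketch that reads it from an environment variable instead; `pull_remote_table`, `CH_REMOTE_PASSWORD` and `CH_CLIENT` are hypothetical names, and the destination table must already exist (for example created with CREATE TABLE ... AS as in the previous step):

```shell
#!/bin/bash
# Hypothetical helper: pull one table from another ClickHouse server with remote().
set -euo pipefail
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# pull_remote_table HOST SRC_DB.TABLE DST_DB.TABLE USER
# The password comes from CH_REMOTE_PASSWORD (assumed variable name), which keeps
# it out of the shell history; it is still embedded in the query text that
# clickhouse-client receives.
pull_remote_table() {
  local host=$1 src=$2 dst=$3 user=$4
  local pass=${CH_REMOTE_PASSWORD:?set CH_REMOTE_PASSWORD first}
  $CH_CLIENT --query="INSERT INTO ${dst} SELECT * FROM remote('${host}', '${src}', '${user}', '${pass}')"
}
```

Usage matching the example above: `CH_REMOTE_PASSWORD=... pull_remote_table 10.222.2.222 caihao.ch_test_customer caihao.ch1 ch_app`.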
3. ALTER TABLE…FREEZE
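Syntax:
ALTER TABLE table_name FREEZE [PARTITION partition_expr]
This operation creates a local backup of the specified partition.
If the PARTITION clause is omitted, it backs up all partitions at once. The server does not need to be stopped during the backup.
Note: FREEZE PARTITION copies only the data, not the metadata. By default the metadata is stored in the file /var/lib/clickhouse/metadata/database/table.sql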
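1. Backup steps:
# Make sure the shadow directory is empty
(default location: /var/lib/clickhouse/shadow/)
# OPTIMIZE TABLE merges newly written data parts into the existing parts of each partition
OPTIMIZE TABLE caihao.ch_test_customer PARTITION 202010 FINAL;
or
OPTIMIZE TABLE caihao.ch_test_customer FINAL;
# Have ClickHouse freeze the table: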
echo -n 'alter table caihao.ch_test_customer freeze' | clickhouse-client
# The files produced by the backup
[root@clickhouse-01 shadow]# ll /data/clickhouse/data/shadow/
total 8
drwxr-x--- 3 clickhouse clickhouse 4096 Oct 16 15:34 1
-rw-r----- 1 clickhouse clickhouse 2 Oct 16 15:34 increment.txt
[root@clickhouse-01 shadow]# du -hsm *
309 1
1 increment.txt
# Save the backup in a directory named after the date:
mkdir -p /data/clickhouse/data/backup/20201016/
cp -r /data/clickhouse/data/shadow/ /data/clickhouse/data/backup/20201016/
# Finally, clean up the shadow directory for the next backup:
rm -rf /data/clickhouse/data/shadow/*
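The backup steps above can be collected into one function. A minimal sketch, assuming `DATA_PATH` is the server's data directory (the `<path>` setting in config.xml, `/data/clickhouse/data` in this article), that the script runs as a user allowed to read and write it, and that nothing else uses `shadow/`; `freeze_backup` and `CH_CLIENT` are hypothetical names. Because FREEZE copies only data, the sketch also keeps the table's `.sql` metadata file:

```shell
#!/bin/bash
# Hypothetical sketch of the FREEZE-based backup steps for one table.
set -euo pipefail
CH_CLIENT=${CH_CLIENT:-clickhouse-client}
DATA_PATH=${DATA_PATH:-/data/clickhouse/data}

# freeze_backup DB TABLE [YYYYMMDD]
freeze_backup() {
  local db=$1 table=$2 day=${3:-$(date +%Y%m%d)}
  local shadow="$DATA_PATH/shadow" dest="$DATA_PATH/backup/$day"

  # Step 1: refuse to run if shadow/ still holds an earlier freeze.
  if [ -n "$(ls -A "$shadow" 2>/dev/null)" ]; then
    echo "shadow directory $shadow is not empty" >&2
    return 1
  fi

  # Step 2: let ClickHouse create hard-linked copies of all parts under shadow/.
  $CH_CLIENT --query="ALTER TABLE ${db}.${table} FREEZE"

  # Step 3: copy shadow/ into the dated directory, plus the table definition.
  mkdir -p "$dest/metadata/$db"
  cp -r "$shadow" "$dest/"
  cp "$DATA_PATH/metadata/$db/$table.sql" "$dest/metadata/$db/"

  # Step 4: clear shadow/ for the next run.
  rm -rf "${shadow:?}"/*
}
```

For example, `freeze_backup caihao ch_test_customer` produces `backup/<date>/shadow/...` as in the manual steps, plus `backup/<date>/metadata/caihao/ch_test_customer.sql`.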
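2. Manual restore
To restore data from a backup, proceed as follows:
- If the table does not exist, create it first: take the statement from the table's .sql metadata file (replace ATTACH with CREATE).
- Copy the data from the backup's data/database/table/ directory into /var/lib/clickhouse/data/database/table/detached/
- Run ALTER TABLE t ATTACH PARTITION to add the data to the table.
Test: restore the data into a new table, test_restore_tab
# 1 Get the table definition: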
cat /data/clickhouse/data/metadata/caihao/ch_test_customer.sql
Then, in the DDL statement, change ATTACH TABLE to CREATE TABLE and the table name to test_restore_tab, and run it.
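The edit to the DDL can be scripted with sed. A minimal sketch, assuming an Ordinary database whose metadata file starts with `ATTACH TABLE <name>` on the first line (in an Atomic database the file starts with `ATTACH TABLE _ UUID '...'`, and the UUID clause would also have to be removed); `ddl_for_restore` is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: turn a table's metadata .sql file into a CREATE TABLE
# statement for a (possibly differently named) target table.
set -euo pipefail

# ddl_for_restore METADATA_SQL_FILE TARGET_DB.TABLE
ddl_for_restore() {
  local sql_file=$1 target=$2
  # Only the first line is rewritten; the column list and engine stay as they are.
  sed -E "1s/^ATTACH TABLE [^ ]+/CREATE TABLE ${target}/" "$sql_file"
}
```

The output can be piped straight into the client: `ddl_for_restore /data/clickhouse/data/metadata/caihao/ch_test_customer.sql caihao.test_restore_tab | clickhouse-client`.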
# 2 Copy the backup into the table's detached directory:
cp -rl /data/clickhouse/data/backup/20201016/shadow/1/data/caihao/ch_test_customer/* /data/clickhouse/data/data/caihao/test_restore_tab/detached/
chown clickhouse:clickhouse -R /data/clickhouse/data/data/caihao/test_restore_tab/detached/*
# 3 Add the data to the table with ATTACH PARTITION
echo 'alter table caihao.test_restore_tab attach partition 202010 ' | clickhouse-client
echo 'alter table caihao.test_restore_tab attach partition 202009 ' | clickhouse-client
Repeat this for every partition. In the end, all parts under the detached directory have been moved up into the table directory.
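Instead of typing one statement per partition, the partition IDs can be read from the part directory names in detached/. A minimal sketch, assuming part names of the form `<partition_id>_<min_block>_<max_block>_<level>` with no underscore inside the partition ID (true for IDs such as 202010); it uses `ATTACH PARTITION ID '...'`, which names a partition by its ID regardless of the partition key expression. `attach_all_partitions`, `DATA_PATH` and `CH_CLIENT` are hypothetical names:

```shell
#!/bin/bash
# Hypothetical helper: attach every partition found in a table's detached/ directory.
set -euo pipefail
CH_CLIENT=${CH_CLIENT:-clickhouse-client}
DATA_PATH=${DATA_PATH:-/data/clickhouse/data}

# attach_all_partitions DB TABLE
attach_all_partitions() {
  local db=$1 table=$2
  local detached="$DATA_PATH/data/$db/$table/detached"
  local part id
  # Collect the distinct partition IDs from the part directory names.
  for part in "$detached"/*/; do
    [ -d "$part" ] || continue
    part=$(basename "$part")
    echo "${part%%_*}"
  done | sort -u | while read -r id; do
    # One ATTACH per partition moves all of its detached parts into the table.
    $CH_CLIENT --query="ALTER TABLE ${db}.${table} ATTACH PARTITION ID '${id}'"
  done
}
```

For the test above this would be `attach_all_partitions caihao test_restore_tab`, which issues one ATTACH for 202009 and one for 202010.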
# 4 Confirm the data has been restored:
echo 'select count() from caihao.test_restore_tab' | clickhouse-client
clickhouse-01 :) select count(*) from test_restore_tab;
SELECT count(*)
FROM test_restore_tab
┌─count()─┐
│ 8990020 │
└─────────┘
1 rows in set. Elapsed: 0.002 sec.
clickhouse-01 :) select count(*) from ch_test_customer;
SELECT count(*)
FROM ch_test_customer
┌─count()─┐
│ 8990020 │
└─────────┘
1 rows in set. Elapsed: 0.002 sec.
The row count matches the original table.
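The two count queries can be combined into a single check whose exit status says whether the counts match. A minimal sketch; `compare_counts` and `CH_CLIENT` are hypothetical names, and matching row counts are a quick sanity check rather than proof that every row is identical:

```shell
#!/bin/bash
# Hypothetical helper: compare the row counts of two tables.
set -euo pipefail
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# compare_counts DB.TABLE_A DB.TABLE_B -> exit status 0 only if the counts match
compare_counts() {
  local a=$1 b=$2 ca cb
  ca=$($CH_CLIENT --query="SELECT count() FROM ${a}")
  cb=$($CH_CLIENT --query="SELECT count() FROM ${b}")
  echo "${a}: ${ca}, ${b}: ${cb}"
  [ "$ca" = "$cb" ]
}
```

For example, `compare_counts caihao.test_restore_tab caihao.ch_test_customer` prints both counts and can be used directly in an `if`.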
4. Clickhouse-Backup
# About Clickhouse-Backup
https://github.com/AlexAkulov/clickhouse-backup
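# Limitations:
- Supports ClickHouse 1.1.54390 and later
- MergeTree-family table engines only
- Tiered storage and storage_policy cannot be backed up
- The maximum backup size on cloud storage is 5 TB
- The maximum number of parts on AWS S3 is 10,000
-- Installation option 1: binary
# Download clickhouse-backup: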
wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v0.6.0/clickhouse-backup.tar.gz
# Extract and use directly
tar -xf clickhouse-backup.tar.gz
cd clickhouse-backup/
sudo cp clickhouse-backup /usr/local/bin
-- Installation option 2: RPM
wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v0.6.0/clickhouse-backup-0.6.0-1.x86_64.rpm
rpm -ivh clickhouse-backup-0.6.0-1.x86_64.rpm
# Check the version
[root@clickhouse-01 clickhouse-backup]# clickhouse-backup -v
Version: 0.6.0
Git Commit: 7d7df1e36575f0d94d330c7bfe00aef7a2100276
Build Date: 2020-10-02
# Edit the configuration file:
mkdir -p /etc/clickhouse-backup/
vi /etc/clickhouse-backup/config.yml
Add some basic settings:
general:
  remote_storage: none
  backups_to_keep_local: 7
  backups_to_keep_remote: 31
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
  data_path: "/data/clickhouse/data"
# Show all default configuration options
clickhouse-backup default-config
# List the tables that can be backed up
clickhouse-backup tables
# Create a backup
1. Full backup
clickhouse-backup create
Backups are stored under $data_path/backup. By default a backup is named after its timestamp, but you can also specify a name, for example:
clickhouse-backup create ch_bk_20201020
A backup contains two directories:
- 'metadata': the DDL SQL needed to recreate the tables
- 'shadow': the data produced by an ALTER TABLE ... FREEZE operation
2. Single-table backup
Syntax:
clickhouse-backup create [-t, --tables=<db>.<table>] <backup_name>
Back up the table caihao.ch_test_customer:
clickhouse-backup create -t caihao.ch_test_customer ch_test_customer
3. Back up multiple tables
clickhouse-backup create -t caihao.test_restore_tab,caihao.ch1 ch_bak_2tab
# List backups
[root@clickhouse-01 backup]# clickhouse-backup list
Local backups:
- 'test20201019' (created at 20-10-2020 14:18:40)
- 'ch_bk_20201020' (created at 20-10-2020 14:20:35)
- '2020-10-20T06-27-08' (created at 20-10-2020 14:27:08)
- 'ch_test_customer' (created at 20-10-2020 15:17:13)
- 'ch_bak_2tab' (created at 20-10-2020 15:33:41)
# Delete a backup
[root@clickhouse-01 backup]# clickhouse-backup delete local test20201019
[root@clickhouse-01 backup]#
[root@clickhouse-01 backup]# clickhouse-backup list
Local backups:
- 'ch_bk_20201020' (created at 20-10-2020 14:20:35)
- '2020-10-20T06-27-08' (created at 20-10-2020 14:27:08)
- 'ch_test_customer' (created at 20-10-2020 15:17:13)
- 'ch_bak_2tab' (created at 20-10-2020 15:33:41)
# Clean up the temporary backup files under shadow
[root@clickhouse-01 shadow]# clickhouse-backup clean
2020/10/20 14:19:13 Clean /data/clickhouse/data/shadow
# Restore data
Syntax:
clickhouse-backup restore <backup_name>
[root@clickhouse-01 ~]# clickhouse-backup restore -help
NAME:
clickhouse-backup restore - Create schema and restore data from backup
USAGE:
clickhouse-backup restore [--schema] [--data] [-t, --tables=<db>.<table>] <backup_name>
OPTIONS:
--config FILE, -c FILE Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
--table value, --tables value, -t value
--schema, -s Restore schema only
--data, -d Restore data only
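Some options:
- --table: restore only specific tables; a pattern can be used,
  e.g. for one database: --table=dbname.*
- --schema: restore only the table schema
- --data: restore only the data
# Back up to remote storage
Clickhouse-Backup can upload backups to and download them from remote object storage such as S3, GCS, or IBM COS.
For AWS S3, for example, edit the configuration file /etc/clickhouse-backup/config.yml: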
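s3:
  access_key: <AWS access key>
  secret_key: <AWS secret key>
  bucket: <bucket name>
  region: us-east-1
  path: "/some/path/in/bucket"  # backup path inside the bucket
Also set remote_storage under general to s3 instead of none.
Then you can upload a backup: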
$ clickhouse-backup upload 2020-07-06T20-13-02
2020/07/07 15:22:32 Upload backup '2020-07-06T20-13-02'
2020/07/07 15:22:49 Done.
Or download a backup:
$ sudo clickhouse-backup download 2020-07-06T20-13-02
2020/07/07 15:27:16 Done.
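# Backup retention policy
Two parameters under general: control backup retention:
- backups_to_keep_local: 0  # number of local backups to keep
- backups_to_keep_remote: 0  # number of remote backups to keep
Both default to 0, which means old backups are never cleaned up automatically.
They can be set, for example, to:
- backups_to_keep_local: 7
- backups_to_keep_remote: 31
When uploading with clickhouse-backup upload, you can pass --diff-from: the files are compared with an earlier local backup and only new or changed files are uploaded.
The earlier backup must be kept, because restoring from the new backup depends on it.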
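A minimal sketch of an incremental schedule built on `--diff-from`: each run creates a backup and uploads it as a diff against the previously uploaded one. `backup_and_upload`, `CH_BACKUP` and the state file that remembers the last uploaded name (`STATE_FILE`, default path hypothetical) are assumptions of this sketch; the previous backup must still exist locally, so `backups_to_keep_local` has to be large enough:

```shell
#!/bin/bash
# Hypothetical sketch: create a backup and upload it incrementally
# against the backup uploaded in the previous run.
set -euo pipefail
CH_BACKUP=${CH_BACKUP:-clickhouse-backup}
STATE_FILE=${STATE_FILE:-/var/lib/clickhouse-backup-last-upload}

# backup_and_upload [BACKUP_NAME]
backup_and_upload() {
  local name=${1:-CH_Backup_$(date +%Y-%m-%dT%H-%M-%S)}
  local prev=""
  if [ -s "$STATE_FILE" ]; then
    prev=$(cat "$STATE_FILE")
  fi

  $CH_BACKUP create "$name"
  if [ -n "$prev" ]; then
    # Upload only files that are new or changed since the previous backup.
    $CH_BACKUP upload --diff-from="$prev" "$name"
  else
    # First run: nothing to diff against, so upload everything.
    $CH_BACKUP upload "$name"
  fi
  # Remember this backup as the base for the next run.
  echo "$name" > "$STATE_FILE"
}
```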
# Backup and restore test:
The test database has 3 tables with the same number of rows:
dba-docker :) show tables;
SHOW TABLES
┌─name─┐
│ ch1 │ # rows: 8990020
│ ch2 │ # rows: 8990020
│ ch3 │ # rows: 8990020
└──────┘
Take a backup: bk_3_tab
clickhouse-backup create bk_3_tab
Damage the data:
truncate table ch1;
insert into ch2 select * from ch3;
drop table ch3;
Row counts at this point:
dba-docker :) show tables;
SHOW TABLES
┌─name─┐
│ ch1 │ # rows: 0
│ ch2 │ # rows: 8990020*2 = 17980040
└──────┘ # ch3 was dropped
Restore only the schema of ch3 with --schema:
clickhouse-backup restore --table caihao.ch3 --schema bk_3_tab
Only the schema is restored, no data:
dba-docker :) select count(*) from ch3;
SELECT count(*)
FROM ch3
┌─count()─┐
│ 0 │
└─────────┘
Restore the data of ch3 with --data
(Note: the data is attached with ATTACH PART, so running this twice doubles the data.)
clickhouse-backup restore --table caihao.ch3 --data bk_3_tab
The data has been imported:
dba-docker :) select count(*) from ch3;
SELECT count(*)
FROM ch3
┌─count()─┐
│ 8990020 │
└─────────┘
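Because running `--data` twice doubles the rows, a guard that only restores into an empty table is a cheap safety net. A minimal sketch; `restore_data_if_empty`, `CH_CLIENT` and `CH_BACKUP` are hypothetical names, and an empty table is used as a proxy for "not yet restored":

```shell
#!/bin/bash
# Hypothetical guard: run "clickhouse-backup restore --data" only into an empty table.
set -euo pipefail
CH_CLIENT=${CH_CLIENT:-clickhouse-client}
CH_BACKUP=${CH_BACKUP:-clickhouse-backup}

# restore_data_if_empty BACKUP_NAME DB.TABLE
restore_data_if_empty() {
  local backup=$1 table=$2 rows
  rows=$($CH_CLIENT --query="SELECT count() FROM ${table}")
  if [ "$rows" != "0" ]; then
    # Attaching the parts again would duplicate the existing rows.
    echo "${table} already has ${rows} rows; skipping --data restore" >&2
    return 1
  fi
  $CH_BACKUP restore --data --table "$table" "$backup"
}
```

For the test above: `restore_data_if_empty bk_3_tab caihao.ch3`.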
Restore the other tables:
[root@dba-docker ~]# clickhouse-backup restore bk_3_tab
2020/10/20 17:42:37 Create table 'caihao.ch1'
2020/10/20 17:42:37 can't create table 'caihao.ch1': code: 57, message: Table caihao.ch1 already exists.
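Because the restore has to create the tables, a full restore only works after the existing tables are dropped.
So drop the database, then run a full restore:
clickhouse-backup restore bk_3_tab
Verification shows that all the data has been restored: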
dba-docker :) show tables;
SHOW TABLES
┌─name─┐
│ ch1 │
│ ch2 │
│ ch3 │
└──────┘
dba-docker :) select count(*) from ch1;
SELECT count(*)
FROM ch1
┌─count()─┐
│ 8990020 │
└─────────┘
# Add it to a daily backup job:
mkdir -p /data/clickhouse/scripts
vi /data/clickhouse/scripts/CH_Full_Backup.sh
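#!/bin/bash
set -euo pipefail
BACKUP_NAME=CH_Full_Backup_$(date +%Y-%m-%dT%H-%M-%S)
# Adjust the path to where clickhouse-backup is installed (the binary install above copies it to /usr/local/bin)
/usr/bin/clickhouse-backup create "$BACKUP_NAME"
# /usr/bin/clickhouse-backup upload "$BACKUP_NAME"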
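When the script runs from cron (for example with a crontab line such as `0 2 * * * /data/clickhouse/scripts/CH_Full_Backup.sh`), a slow backup could overlap with the next run, and its output is lost. A minimal sketch of a variant with a lock and a log file, assuming Linux with `flock` from util-linux; `run_daily_backup`, `LOCK_FILE`, `LOG_FILE` and their default paths are hypothetical:

```shell
#!/bin/bash
# Hypothetical cron-friendly variant of CH_Full_Backup.sh: a lock so two runs
# never overlap, plus a log file. flock(1) comes from util-linux.
set -euo pipefail
CH_BACKUP=${CH_BACKUP:-/usr/bin/clickhouse-backup}
LOCK_FILE=${LOCK_FILE:-/tmp/ch_full_backup.lock}
LOG_FILE=${LOG_FILE:-/data/clickhouse/scripts/CH_Full_Backup.log}

run_daily_backup() {
  local name
  name=CH_Full_Backup_$(date +%Y-%m-%dT%H-%M-%S)
  (
    # Fail fast if a previous run still holds the lock on file descriptor 9.
    flock -n 9 || { echo "$(date) another backup is still running" >> "$LOG_FILE"; exit 1; }
    echo "$(date) creating $name" >> "$LOG_FILE"
    $CH_BACKUP create "$name" >> "$LOG_FILE" 2>&1
    # $CH_BACKUP upload "$name" >> "$LOG_FILE" 2>&1
    echo "$(date) done $name" >> "$LOG_FILE"
  ) 9>"$LOCK_FILE"
}
```

The script would end with a call to `run_daily_backup`; the lock is released automatically when the subshell exits, even if the backup fails.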
Original article: https://mp.weixin.qq.com/s?src=11&timestamp=1629098841&ver=3255&signature=FwALsxuhUf4-kQ9K-WOCyFrKbyZ6FZ2BZiSghswGuU6UBjRRQkL8kM99k3Tb*IQuh-Px2KtwmGISneW02qSt4I8-J*dVLX3bJF9Z8jK5W-GTfetzpUIizMO8XygA-Ph5&new=1
