【ClickHouse】Setting up ClickHouse container environments for each deployment mode


For easy learning and testing, when no real servers are available, containers let us spin up an environment quickly. Depending on your needs you can start a single node, a replicated setup, or a sharded cluster. The three modes covered below are:

  • Single-node mode
  • Replica mode
  • Shard mode

You could also bring a ClickHouse cluster up quickly with docker-compose (see the sketch below), but to learn the deployment process we will go step by step.
The ClickHouse version used in this walkthrough is 22.3.2.1.
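For reference, a minimal docker-compose.yml for a single node might look like this (the image tag, ports and host path are assumptions; adjust them to your environment):

version: "3.7"
services:
  clickhouse:
    image: clickhouse/clickhouse-server:22.3.2.1   # assumed tag; any recent tag works
    ulimits:
      nofile:                # same nofile limit as the docker run commands below
        soft: 262144
        hard: 262144
    ports:
      - "8123:8123"          # HTTP interface
      - "9000:9000"          # native TCP interface
    volumes:
      - ./data:/var/lib/clickhouse   # persist data on the host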

1. Single-node mode

Single-node mode is straightforward: https://hub.docker.com/r/clickhouse/clickhouse-server

1.1 Starting the server container

The image used to be yandex/clickhouse-server; it is now clickhouse/clickhouse-server.
Pick whichever of the following startup commands fits your needs.

Option 1: the simplest command

docker run -d --name single-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server

Option 2: with port mappings

docker run -d --name single-clickhouse-server --ulimit nofile=262144:262144 -p 8123:8123 -p 9000:9000 -p 9009:9009 clickhouse/clickhouse-server

Option 3: mount the data directory

## Data directory inside the container: /var/lib/clickhouse/ 
$ mkdir -p /Users/momo/studySpace/docker/clickhouse/single-volume/data
$ docker run -d --name single-clickhouse-server \
--ulimit nofile=262144:262144 \
-p 8123:8123 -p 9000:9000 -p 9009:9009 \
--volume=/Users/momo/studySpace/docker/clickhouse/single-volume/data:/var/lib/clickhouse \
--platform linux/amd64 \
clickhouse/clickhouse-server 

If you see the following warning and you are on an Apple Silicon (M1) Mac, try adding --platform linux/amd64 to the docker run command (official explanation: https://docs.docker.com/desktop/mac/apple-silicon/):
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

Option 4: mount the configuration

To make it easy to edit the configuration from the host, mount the config directory as well.

  • Config directory inside the container: /etc/clickhouse-server
  • The configuration must be placed in the mounted directory beforehand, otherwise startup fails with: Configuration file '/etc/clickhouse-server/config.xml' isn't readable by user with id '101'
  • Check the ClickHouse version with select version(); here it is 22.3.2.1
# Create the config (and logs) directories under the host mount path
$ mkdir -p /Users/momo/studySpace/docker/clickhouse/single-volume/{config,logs}
# You can first start a container as above, then copy its config out: docker cp <container id>:<container path> <host path>
$ docker cp b723e2da1368:/etc/clickhouse-server /Users/momo/studySpace/docker/clickhouse/single-volume/config

$ docker run -d --name single-clickhouse-server \
--ulimit nofile=262144:262144 \
-p 8123:8123 -p 9000:9000 -p 9009:9009 \
--volume=/Users/momo/studySpace/docker/clickhouse/single-volume/data:/var/lib/clickhouse \
--volume=/Users/momo/studySpace/docker/clickhouse/single-volume/config/clickhouse-server:/etc/clickhouse-server \
--volume=/Users/momo/studySpace/docker/clickhouse/single-volume/logs:/var/log/clickhouse-server \
--platform linux/amd64 \
clickhouse/clickhouse-server

Deleting the container

## Stop and remove the container
docker stop single-clickhouse-server && docker rm single-clickhouse-server

1.2 Connecting with the client

docker run -it --rm --link single-clickhouse-server:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server
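If port 8123 was published to the host (options 2-4 above), the server can also be sanity-checked from the host through the HTTP interface, for example:

# Assumes the 8123:8123 port mapping from the docker run commands above
curl 'http://localhost:8123/' --data-binary 'SELECT version()'
# Should print the server version, e.g. 22.3.2.1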

2. 副本模式

副本的目的主要是保障數據的高可用性,即使一台 ClickHouse 節點宕機,那么也可以從 其他服務器獲得相同的數據。https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/

zookeeper
host1
host2

2.1 zookeeper

Start ZooKeeper in standalone mode; detailed usage: https://hub.docker.com/_/zookeeper

# Create a dedicated network; the zk and clickhouse containers all join it
docker network create clickhouse

# Option 1: quick start
$ docker run --name zk-standalone --network clickhouse --restart always -d zookeeper

# Option 2: mount directories so the data survives restarts
## Data directory /data, log directory /datalog
$ mkdir -p /Users/momo/studySpace/docker/zookeeper/data
$ mkdir -p /Users/momo/studySpace/docker/zookeeper/datalog
$ docker run --name zk-standalone \
--network clickhouse \
--restart always \
-v /Users/momo/studySpace/docker/zookeeper/data:/data \
-v /Users/momo/studySpace/docker/zookeeper/datalog:/datalog \
-d zookeeper

2.2 clickhouse

2.2.1 Preparing the configuration

Create the local mount directories:

mkdir -p /Users/momo/studySpace/docker/clickhouse/replica-volume1/config
mkdir -p /Users/momo/studySpace/docker/clickhouse/replica-volume2/config

# You can first start a single-node container, then copy its config out: docker cp <container id>:<container path> <host path>
$ docker cp b723e2da1368:/etc/clickhouse-server /Users/momo/studySpace/docker/clickhouse/replica-volume1/config
$ docker cp b723e2da1368:/etc/clickhouse-server /Users/momo/studySpace/docker/clickhouse/replica-volume2/config

$ tree replica-volume1 
replica-volume1
└── config
    └── clickhouse-server
        ├── config.d
        │   └── docker_related_config.xml
        ├── config.xml
        ├── users.d
        └── users.xml

4 directories, 3 files

Add the following to clickhouse-server/config.xml:

<zookeeper>
  <node>
    <host>zookeeper</host>
    <port>2181</port>
  </node>
</zookeeper>

Modify docker_related_config.xml:

<clickhouse>
     <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
    <listen_host>::</listen_host> <!-- remove this line; keep it only if you use IPv6 -->
    <listen_host>0.0.0.0</listen_host> <!-- we use IPv4 here, so keep this line -->
    <listen_try>1</listen_try>

    <!--
    <logger>
        <console>1</console>
    </logger>
    -->
</clickhouse>

Sync the same configuration to both replica-volume1 and replica-volume2.

2.2.2 Starting the clickhouse servers

# Create the corresponding mount directories
mkdir -p /Users/momo/studySpace/docker/clickhouse/replica-volume1/data
mkdir -p /Users/momo/studySpace/docker/clickhouse/replica-volume2/data

mkdir -p /Users/momo/studySpace/docker/clickhouse/replica-volume1/logs
mkdir -p /Users/momo/studySpace/docker/clickhouse/replica-volume2/logs


$ docker run -d --name replica-host1 \
-h replica-host1 \
--ulimit nofile=262144:262144 \
--volume=/Users/momo/studySpace/docker/clickhouse/replica-volume1/data:/var/lib/clickhouse \
--volume=/Users/momo/studySpace/docker/clickhouse/replica-volume1/logs:/var/log/clickhouse-server \
--volume=/Users/momo/studySpace/docker/clickhouse/replica-volume1/config/clickhouse-server:/etc/clickhouse-server \
--link zk-standalone:zookeeper \
--network clickhouse \
--platform linux/amd64 \
clickhouse/clickhouse-server

$ docker run -d --name replica-host2 \
-h replica-host2 \
--ulimit nofile=262144:262144 \
--volume=/Users/momo/studySpace/docker/clickhouse/replica-volume2/data:/var/lib/clickhouse \
--volume=/Users/momo/studySpace/docker/clickhouse/replica-volume2/logs:/var/log/clickhouse-server \
--volume=/Users/momo/studySpace/docker/clickhouse/replica-volume2/config/clickhouse-server:/etc/clickhouse-server \
--link zk-standalone:zookeeper \
--network clickhouse \
--platform linux/amd64 \
clickhouse/clickhouse-server

2.2.3 Starting the client

Connect with the client.
Replication only syncs data, not table schemas, so the table has to be created manually on each node.

### 1. Connect to replica-host1
$ docker run -it --rm --network clickhouse --link replica-host1:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server

### 1. Create the table on replica-host1
replica-host1 :) create table t_order_rep2 (
	id UInt32,
	sku_id String,
	total_amount Decimal(16,2), 
  create_time Datetime
 ) engine =ReplicatedMergeTree('/clickhouse/tables/01/t_order_rep','rep_001')
  partition by toYYYYMMDD(create_time)
  primary key (id)
  order by (id,sku_id);

### 2. Connect to replica-host2
$ docker run -it --rm --network clickhouse --link replica-host2:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server
### 2. At this point replica-host2 has no tables yet
replica-host2 :) show tables

SHOW TABLES

Query id: e688ae0a-d457-4446-a014-c2f4fdd0a9e4

Ok.

0 rows in set. Elapsed: 0.017 sec. 
### 2. Create the table on replica-host2
replica-host2 :)  create table t_order_rep2 (
  id UInt32,
  sku_id String,
  total_amount Decimal(16,2), 
  create_time Datetime
) engine =ReplicatedMergeTree('/clickhouse/tables/01/t_order_rep','rep_002')
  partition by toYYYYMMDD(create_time)
  primary key (id)
  order by (id,sku_id);
  
  
### 3. Now check whether the metadata is present in ZooKeeper; if it is not, check the zookeeper-related configuration
  $ docker exec -it zk-standalone /bin/bash
  $ ./bin/zkCli.sh 
  [zk: localhost:2181(CONNECTED) 12] ls /clickhouse
tables        task_queue   
[zk: localhost:2181(CONNECTED) 12] ls /clickhouse/tables 
[01]
[zk: localhost:2181(CONNECTED) 13] ls /clickhouse/tables/01
[t_order_rep]
[zk: localhost:2181(CONNECTED) 14] ls /clickhouse/tables/01/t_order_rep 
[alter_partition_version, block_numbers, blocks, columns, leader_election, log, metadata, mutations, nonincrement_block_numbers, part_moves_shard, pinned_part_uuids, quorum, replicas, table_shared_id, temp, zero_copy_hdfs, zero_copy_s3]

### 3. The same check can be done with a SQL query from the clickhouse client
replica-host1 :) select * from system.zookeeper where path='/'

SELECT *
FROM system.zookeeper
WHERE path = '/'

Query id: 30f82647-2d59-42b7-b48b-5c46eba16d72

┌─name───────┬─value─┬─czxid─┬─mzxid─┬───────────────ctime─┬───────────────mtime─┬─version─┬─cversion─┬─aversion─┬─ephemeralOwner─┬─dataLength─┬─numChildren─┬─pzxid─┬─path─┐
│ zookeeper  │       │     0 │     0 │ 1970-01-01 00:00:00 │ 1970-01-01 00:00:00 │       0 │       -2 │        0 │              0 │          0 │           2 │     0 │ /    │
│ clickhouse │       │     2 │     2 │ 2022-03-31 03:36:02 │ 2022-03-31 03:36:02 │       0 │        2 │        0 │              0 │          0 │           2 │    11 │ /    │
└────────────┴───────┴───────┴───────┴─────────────────────┴─────────────────────┴─────────┴──────────┴──────────┴────────────────┴────────────┴─────────────┴───────┴──────┘

2 rows in set. Elapsed: 0.028 sec.

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
Parameter description:
For ReplicatedMergeTree:

  • The first parameter is the shard's zk_path, usually in the form /clickhouse/tables/{shard}/{table_name}; with a single shard, just use 01.
  • The second parameter is the replica name; replicas of the same shard must use different names.

Insert some data to verify that it gets replicated:

# 1. Insert data on replica-host1
insert into t_order_rep2 values
(101,'sku_001',1000.00,'2022-03-01 12:00:00'),
(102,'sku_002',2000.00,'2022-03-01 12:00:00'),
(103,'sku_004',2500.00,'2022-03-01 12:00:00'),
(104,'sku_002',2000.00,'2022-03-01 12:00:00'),
(105,'sku_003',600.00,'2022-03-02 12:00:00');

# 2. Query on replica-host1: the rows are there
replica-host1 :) select * from t_order_rep2

SELECT *
FROM t_order_rep2

Query id: b9dc6dc0-891b-4b68-8d6b-44cd835da82a

┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 105 │ sku_003 │          600 │ 2022-03-02 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 101 │ sku_001 │         1000 │ 2022-03-01 12:00:00 │
│ 102 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
│ 103 │ sku_004 │         2500 │ 2022-03-01 12:00:00 │
│ 104 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

5 rows in set. Elapsed: 0.059 sec.

# 3. Query on replica-host2: the data was replicated automatically
replica-host2 :) select * from t_order_rep2

SELECT *
FROM t_order_rep2

Query id: a2f6a208-5986-48d8-b50b-15779d4e3da4

┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 101 │ sku_001 │         1000 │ 2022-03-01 12:00:00 │
│ 102 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
│ 103 │ sku_004 │         2500 │ 2022-03-01 12:00:00 │
│ 104 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 105 │ sku_003 │          600 │ 2022-03-02 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

5 rows in set. Elapsed: 0.095 sec. 
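As an additional check, each replica exposes its replication state in the system.replicas table; a quick sketch (run it on either replica):

-- total_replicas and active_replicas should both be 2 for this setup
SELECT database, table, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE table = 't_order_rep2';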

2.2.4 Stopping and removing the containers

# Stop
docker stop replica-host1
docker stop replica-host2

# Restart
docker restart replica-host1
docker restart replica-host2

# Stop and remove
docker stop replica-host1 && docker rm replica-host1
docker stop replica-host2 && docker rm replica-host2

2.2.5 Troubleshooting

No interserver IO endpoint named DataPartsExchange

2022.03.31 03:40:52.215028 [ 121 ] {} <Error> InterserverIOHTTPHandler: Code: 221. DB::Exception: No interserver IO endpoint named DataPartsExchange:/clickhouse/tables/01/t_order_rep/replicas/rep_001. (
NO_SUCH_INTERSERVER_IO_ENDPOINT), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa4dde1a in /usr/bin/clickhouse
1. DB::InterserverIOHandler::getEndpoint(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x151e36db in /usr/bin/clickhouse
2. DB::InterserverIOHTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTTPServerResponse&, DB::InterserverIOHTTPHandler::Output&) @ 0x151e2f29 in /usr/bin/clickhouse
3. DB::InterserverIOHTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0x151e3a51 in /usr/bin/clickhouse
4. DB::HTTPServerConnection::run() @ 0x1545979b in /usr/bin/clickhouse
5. Poco::Net::TCPServerConnection::start() @ 0x164b264f in /usr/bin/clickhouse
6. Poco::Net::TCPServerDispatcher::run() @ 0x164b4aa1 in /usr/bin/clickhouse
7. Poco::PooledThread::run() @ 0x16671e49 in /usr/bin/clickhouse
8. Poco::ThreadImpl::runnableEntry(void*) @ 0x1666f1a0 in /usr/bin/clickhouse
9. ? @ 0x400086b609 in ?
10. clone @ 0x40009a5163 in ?

Fix:
The interserver_http_host setting in config.xml is not appropriate here; removing the following line resolves the error:
<interserver_http_host>0.0.0.0</interserver_http_host>
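Alternatively, instead of removing it, interserver_http_host can be set on each node to a hostname that the other replicas can actually reach; a sketch using the container hostnames from this setup:

<!-- on replica-host1 -->
<interserver_http_host>replica-host1</interserver_http_host>
<!-- on replica-host2 -->
<interserver_http_host>replica-host2</interserver_http_host>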

CertificateReloader: Cannot obtain modification time for certificate file

2022.03.31 03:36:01.021444 [ 1 ] {} <Error> CertificateReloader: Cannot obtain modification time for certificate file /etc/clickhouse-server/server.crt, skipping update. errno: 2, strerror: No such file
 or directory
2022.03.31 03:36:01.022164 [ 1 ] {} <Error> CertificateReloader: Cannot obtain modification time for key file /etc/clickhouse-server/server.key, skipping update. errno: 2, strerror: No such file or dire
ctory
2022.03.31 03:36:01.041271 [ 1 ] {} <Error> CertificateReloader: Poco::Exception. Code: 1000, e.code() = 0, SSL context exception: Error loading private key from file /etc/clickhouse-server/server.key:
error:02000002:system library:OPENSSL_internal:No such file or directory (version 22.3.2.1)

Fix:
Add a config file /etc/clickhouse-server/config.d/ssl.xml with the following content:

<?xml version="1.0"?>
<clickhouse>
    <openSSL>
      <server>
        <certificateFile remove="1"/>
        <privateKeyFile remove="1"/>
      </server>
   </openSSL>
</clickhouse>

Application: Listen [::1]:8123 failed

2022.03.31 06:41:25.734263 [ 702 ] {} <Information> Application: It looks like the process has no CAP_SYS_NICE capability, the setting 'os_thread_priority' will have no effect. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_sys_nice=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.
2022.03.31 06:41:25.739990 [ 702 ] {} <Warning> Application: Listen [::1]:8123 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address: [::1]:8123 (version 22.3.2.1). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.03.31 06:41:25.740849 [ 702 ] {} <Warning> Application: Listen [::1]:9000 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address: [::1]:9000 (version 22.3.2.1). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.03.31 06:41:25.741669 [ 702 ] {} <Warning> Application: Listen [::1]:9004 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address: [::1]:9004 (version 22.3.2.1). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>

Fix:
Modify docker_related_config.xml.
Default:

<listen_host>::</listen_host>
<listen_host>0.0.0.0</listen_host>
<listen_try>1</listen_try>

Change it to keep only <listen_host>0.0.0.0</listen_host>, since this setup uses IPv4; the :: entry is for IPv6.

Data is not synced to the replica after an insert

For example, with the two replica nodes replica-host1 and replica-host2: after creating the table on both nodes and inserting on replica-host1, the rows show up on replica-host1 but not on replica-host2.
The replica-host2 log shows:

2022.03.31 07:00:42.687816 [ 156 ] {} <Error> DNSResolver: Cannot resolve host (replica-host1), error 0: replica-host1.
2022.03.31 07:00:42.695884 [ 156 ] {} <Error> default.t_order_rep2 (908dbbab-99fe-4e7e-9901-c74dcbdc94ac): auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Code: 198. DB::Exception: Not found address of host: replica-host1. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):

Fix:
In a container environment the nodes could not find each other by hostname, i.e. a DNS problem.
Adding a DNS server did not help in my case.
What finally worked was creating a network with docker network create clickhouse and putting the zk and all clickhouse instances on it, so they can reach each other by container hostname.
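If the containers were already created without the network option, they can be attached to it afterwards instead of being recreated, for example:

# Attach the already-running containers to the dedicated network
docker network connect clickhouse zk-standalone
docker network connect clickhouse replica-host1
docker network connect clickhouse replica-host2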

3. Shard mode

  • Replication improves data availability and lowers the risk of loss, but every server still has to hold the complete data set, so it does not provide horizontal scaling.
  • To split data horizontally we need sharding: the full data set is cut into shards stored on different nodes, and a Distributed table engine stitches the shards back together for querying.
  • The Distributed engine itself stores no data; a bit like MyCat in front of MySQL, it acts as middleware, writing, distributing and routing operations across the shards on multiple nodes through a distributed logical table.

Note: a ClickHouse cluster is defined at the table level. In practice most companies set up high availability but skip sharding, to avoid hurting query performance and complicating cluster operations.

Steps on physical machines:

  1. Install ClickHouse server on all nodes
  2. Update the cluster configuration on all nodes
  3. Create the local table on every node (replication only syncs data, not schemas, so the table must be created manually on each machine)
  4. Create a Distributed table (the Distributed engine stores no data itself; it stitches the shard data together)

Steps with containers:

  1. Start three clickhouse server container instances
  2. Mount the appropriate configuration into each instance
  3. Connect to each instance with clickhouse client and create the local table
  4. Create the Distributed table

The principle is zookeeper + ReplicatedMergeTree (replicated tables) + Distributed (distributed tables).

Official reference for Distributed tables: https://clickhouse.com/docs/en/engines/table-engines/special/distributed/

3.1 zookeeper

Start ZooKeeper in standalone mode; detailed usage: https://hub.docker.com/_/zookeeper

# Create a dedicated network; the zk and clickhouse containers all join it
docker network create clickhouse

# Option 1: quick start
$ docker run --name zk-standalone --network clickhouse --restart always -d zookeeper

# Option 2: mount directories so the data survives restarts
## Data directory /data, log directory /datalog
$ mkdir -p /Users/momo/studySpace/docker/zookeeper/data
$ mkdir -p /Users/momo/studySpace/docker/zookeeper/datalog
$ docker run --name zk-standalone \
--network clickhouse \
--restart always \
-v /Users/momo/studySpace/docker/zookeeper/data:/data \
-v /Users/momo/studySpace/docker/zookeeper/datalog:/datalog \
-d zookeeper

3.2 clickhouse (one shard with 2 replicas + one shard with 1 replica, 3 nodes)

3.2.1 Preparing the configuration

Create the local mount directories:

# Create the corresponding mount directories
for ((i=1;i<=3;i++));do echo ch-host$i;mkdir -p /Users/momo/studySpace/docker/clickhouse/ch-volume$i/{data,logs,config};done

# You can first start a single-node container, then copy its config out: docker cp <container id>:<container path> <host path>
$ docker cp b723e2da1368:/etc/clickhouse-server /Users/momo/studySpace/docker/clickhouse/ch-volume1/config

$ tree ch-volume1
ch-volume1
├── config
│   └── clickhouse-server
│       ├── config.d
│       │   ├── docker_related_config.xml
│       │   ├── metrika-shard.xml   (contents below)
│       │   └── ssl.xml             (see the second error fix in 2.2.5)
│       ├── config.xml
│       ├── users.d
│       └── users.xml
├── data
└── logs

6 directories, 5 files

/etc/clickhouse-server/config.d/metrika-shard.xml (this could also be put directly into config.xml):

<?xml version="1.0"?>
<clickhouse>
    <remote_servers>
        <jiazz_cluster>
            <!-- cluster name -->
            <shard>
                <!-- first shard of the cluster -->
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- first replica of this shard -->
                    <host>ch-host1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <!-- second replica of this shard -->
                    <host>ch-host2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <!-- second shard of the cluster -->
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- first (and only) replica of this shard -->
                    <host>ch-host3</host>
                    <port>9000</port>
                </replica>
            </shard>
        </jiazz_cluster>
    </remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>zookeeper</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <shard>01</shard>
        <!-- shard number: differs per node -->
        <replica>rep_1_1</replica>
        <!-- replica name: differs per node -->
    </macros>
</clickhouse>

Add the following to /etc/clickhouse-server/config.xml:

<zookeeper incl="zookeeper-servers" optional="true" />
<include_from>/etc/clickhouse-server/config.d/metrika-shard.xml</include_from>

Sync the configuration (config.xml and the files under config.d) to every node, and adjust the macros in metrika-shard.xml on each node; see the sketch below.
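For example, with the layout above (ch-host1 and ch-host2 are the two replicas of shard 01, ch-host3 is the only replica of shard 02), the macros section on the other two nodes could look like this (the names are illustrative; they only have to be unique within a shard):

<!-- ch-host2: second replica of shard 01 -->
<macros>
    <shard>01</shard>
    <replica>rep_1_2</replica>
</macros>

<!-- ch-host3: first replica of shard 02 -->
<macros>
    <shard>02</shard>
    <replica>rep_2_1</replica>
</macros>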

3.2.2 Starting the clickhouse servers

# Start
for ((i=1;i<=3;i++));do \
echo "start ch-host$i";\
docker run -d --name ch-host$i \
-h ch-host$i \
--ulimit nofile=262144:262144 \
--volume=/Users/momo/studySpace/docker/clickhouse/ch-volume$i/data:/var/lib/clickhouse \
--volume=/Users/momo/studySpace/docker/clickhouse/ch-volume$i/logs:/var/log/clickhouse-server \
--volume=/Users/momo/studySpace/docker/clickhouse/ch-volume$i/config/clickhouse-server:/etc/clickhouse-server \
--link zk-standalone:zookeeper \
--network clickhouse \
--platform linux/amd64 \
clickhouse/clickhouse-server; \
done 

# Stop
for ((i=1;i<=3;i++));do echo "stop ch-host$i"; docker stop ch-host$i;done

# Restart
for ((i=1;i<=3;i++));do echo "restart ch-host$i"; docker restart ch-host$i;done

# Remove
for ((i=1;i<=3;i++));do echo "rm ch-host$i"; docker stop ch-host$i && docker rm ch-host$i;done

# Clear the mounted data and log directories
for ((i=1;i<=3;i++));do echo "clear ch-host$i"; rm -rf /Users/momo/studySpace/docker/clickhouse/ch-volume$i/{data,logs}/*;done

3.2.3 Starting the client

3.2.3.1 Connecting a client to a specific clickhouse server

# Connect to ch-host1
docker run -it --rm --network clickhouse --link ch-host1:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server

# Connect to ch-host2
docker run -it --rm --network clickhouse --link ch-host2:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server

# Connect to ch-host3
docker run -it --rm --network clickhouse --link ch-host3:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server

3.2.3.2 Creating the table on ch-host1

  • The table is created automatically on ch-host2 and ch-host3 as well
  • The cluster name must match the one in the configuration file
  • The shard and replica names are taken from the macros defined in the configuration
# Create the table on one node
ch-host1 :) create table st_order_mt on cluster jiazz_cluster (
              id UInt32,
              sku_id String,
              total_amount Decimal(16,2),
              create_time Datetime
            ) engine=ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt','{replica}')
            partition by toYYYYMMDD(create_time)
            primary key (id)
            order by (id,sku_id);

CREATE TABLE st_order_mt ON CLUSTER jiazz_cluster
(
    `id` UInt32,
    `sku_id` String,
    `total_amount` Decimal(16, 2),
    `create_time` Datetime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt', '{replica}')
PARTITION BY toYYYYMMDD(create_time)
PRIMARY KEY id
ORDER BY (id, sku_id)

Query id: 366f51ae-f5eb-4773-ba5e-beb4df56cf1a

┌─host─────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ ch-host1 │ 9000 │      0 │       │                   2 │                0 │
│ ch-host2 │ 9000 │      0 │       │                   1 │                0 │
│ ch-host3 │ 9000 │      0 │       │                   0 │                0 │
└──────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

3 rows in set. Elapsed: 0.292 sec.

# ch-host2 and ch-host3 now have the table too
ch-host2 :) show tables

SHOW TABLES

Query id: e4c9d46c-476c-423a-ba0f-ab2c3984b6b4

┌─name────────┐
│ st_order_mt │
└─────────────┘

1 rows in set. Elapsed: 0.037 sec.

3.2.3.3 Creating the Distributed table on ch-host1

Parameter meaning:
Distributed(cluster name, database, local table name, sharding key)
The sharding key must be an integer, so we convert it with the hiveHash function; rand() also works.

ch-host1 :) create table st_order_mt_all on cluster jiazz_cluster(
              id UInt32,
              sku_id String,
              total_amount Decimal(16,2),
              create_time Datetime
            )engine = Distributed(jiazz_cluster,default,st_order_mt,hiveHash(sku_id));

CREATE TABLE st_order_mt_all ON CLUSTER jiazz_cluster
(
    `id` UInt32,
    `sku_id` String,
    `total_amount` Decimal(16, 2),
    `create_time` Datetime
)
ENGINE = Distributed(jiazz_cluster, default, st_order_mt, hiveHash(sku_id))

Query id: 655b7f5d-ea2c-42c0-bfde-358ef50192bd

┌─host─────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ ch-host1 │ 9000 │      0 │       │                   2 │                0 │
│ ch-host2 │ 9000 │      0 │       │                   1 │                0 │
│ ch-host3 │ 9000 │      0 │       │                   0 │                0 │
└──────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

3 rows in set. Elapsed: 0.155 sec.

# Confirm on the other nodes that the table was created
ch-host2 :) show tables

SHOW TABLES

Query id: 0c5d7ee9-80a0-46e6-aee7-7b06afb17729

┌─name────────────┐
│ st_order_mt     │
│ st_order_mt_all │
└─────────────────┘

2 rows in set. Elapsed: 0.044 sec.

3.2.3.4 Inserting test data on ch-host1

ch-host1 :) insert into st_order_mt_all values
            (201,'sku_001',1000.00,'2022-03-01 12:00:00') ,
            (202,'sku_002',2000.00,'2022-03-01 12:00:00'),
            (203,'sku_004',2500.00,'2022-03-01 12:00:00'),
            (204,'sku_002',2000.00,'2022-03-01 12:00:00'),
            (205,'sku_003',600.00,'2022-03-02 12:00:00');

INSERT INTO st_order_mt_all FORMAT Values

Query id: 0c886dc1-066c-402a-804e-0cbb9d004b5d

Ok.

5 rows in set. Elapsed: 0.156 sec.

3.2.3.5 Comparing the output of the Distributed table and the local tables

# Query the Distributed table: every node returns the same result
ch-host2 :) SELECT * FROM st_order_mt_all;

SELECT *
FROM st_order_mt_all

Query id: 72a08fec-5477-4c9c-941b-6bc471400467

┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 202 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
│ 203 │ sku_004 │         2500 │ 2022-03-01 12:00:00 │
│ 204 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 201 │ sku_001 │         1000 │ 2022-03-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 205 │ sku_003 │          600 │ 2022-03-02 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

5 rows in set. Elapsed: 0.099 sec.

# Query the local tables: ch-host1 and ch-host2 return the same rows because they are replicas of the first shard; ch-host3 returns the contents of the second shard
ch-host1 :) select * from st_order_mt;

SELECT *
FROM st_order_mt

Query id: f728b8b0-46ec-4b02-8eca-825173c82dab

┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 202 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
│ 203 │ sku_004 │         2500 │ 2022-03-01 12:00:00 │
│ 204 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

3 rows in set. Elapsed: 0.041 sec.

ch-host2 :) select * from st_order_mt;

SELECT *
FROM st_order_mt

Query id: e67512ea-1b43-498d-97b0-5bc57e5172be

┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 202 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
│ 203 │ sku_004 │         2500 │ 2022-03-01 12:00:00 │
│ 204 │ sku_002 │         2000 │ 2022-03-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

3 rows in set. Elapsed: 0.041 sec.

ch-host3 :) select * from st_order_mt;

SELECT *
FROM st_order_mt

Query id: f097753a-4d28-46d8-8d52-82a46908d9af

┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 205 │ sku_003 │          600 │ 2022-03-02 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 201 │ sku_001 │         1000 │ 2022-03-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

2 rows in set. Elapsed: 0.021 sec.

3.3 clickhouse (3 shards × 2 replicas, 6 replicas in total, placed on 3 nodes via circular replication)

The 3-shard, 2-replica deployment follows https://zhuanlan.zhihu.com/p/461792873
This setup is only for proof of concept and is not recommended for production.
ClickHouse supports both sharding and replication:

  • Data distribution means splitting a very large data set into shards (smaller parts of the data set) stored on different servers. ClickHouse splits the data set into shards according to a sharding key; each shard stores and processes part of the data, and the query results from the shards are combined into the final result.

  • Data replication means keeping a copy of the data on other server nodes so that it stays available when a node fails. It can also improve performance by letting several servers process queries in parallel.

  • Circular replication, as used here, distributes the data over 3 shards and keeps 2 copies of each.

  • This requires placing 2 different shards on each node. Since every shard uses the same table name, ClickHouse cannot tell them apart when two shards/replicas live on the same server.

  • The way around this is to:

    • put each shard into its own database
    • set a default database for each shard in the cluster definition
    • leave the database argument empty in the Distributed table DDL, so that at query time the default database routes each shard's queries to the right database
    • set internal_replication=true for every shard

  • In production this topology normally uses 6 server nodes, so that each server stores only one shard and the separate-database workaround is not needed.

3.3.1 Preparing the configuration

  • Every .xml file under config.d and users.d is loaded automatically, so the configuration can be split into separate files with descriptive names, e.g. zookeeper.xml for the ZooKeeper settings.
  • Although config.xml and users.xml are the defaults, editing them directly is not recommended, because:
    • they may be overwritten by the next upgrade
    • when one file holds everything, finding a particular setting is hard
    • a user profile can be duplicated simply by copying its XML file
    • small files are easier to copy between nodes

3.3.1.1 Creating the local mount directories

# Create the corresponding mount directories
for ((i=1;i<=3;i++));do echo ch2-host$i;mkdir -p /Users/momo/studySpace/docker/clickhouse/ch2-volume$i/{data,logs,config};done

# You can first start a single-node container, then copy its config out: docker cp <container id>:<container path> <host path>
$ docker cp b723e2da1368:/etc/clickhouse-server /Users/momo/studySpace/docker/clickhouse/ch2-volume1/config

$ tree ch2-volume1
ch2-volume1
├── config
│   └── clickhouse-server
│       ├── config.d
│       │   ├── docker_related_config.xml
│       │   ├── cluster.xml     (see 3.3.1.2)
│       │   ├── zookeeper.xml   (see 3.3.1.3)
│       │   ├── macro.xml       (see 3.3.1.4)
│       │   └── ssl.xml         (see the second error fix in 2.2.5)
│       ├── config.xml
│       ├── users.d
│       └── users.xml
├── data
└── logs

6 directories, 7 files

3.3.1.2 cluster.xml

/etc/clickhouse-server/config.d/cluster.xml

<?xml version="1.0"?>
<clickhouse>
    <remote_servers>
        <jiazz_cluster_3s_2r>
            <!-- cluster name -->
            <shard>
                <!-- first shard of the cluster -->
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- first replica of this shard -->
                    <default_database>dwh01</default_database>
                    <host>ch-host1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <!-- second replica of this shard -->
                    <default_database>dwh01</default_database>
                    <host>ch-host2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <!-- second shard of the cluster -->
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- first replica of this shard -->
                    <default_database>dwh02</default_database>
                    <host>ch-host2</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <!-- second replica of this shard -->
                    <default_database>dwh02</default_database>
                    <host>ch-host3</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <!-- third shard of the cluster -->
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- first replica of this shard -->
                    <default_database>dwh03</default_database>
                    <host>ch-host3</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <!-- second replica of this shard -->
                    <default_database>dwh03</default_database>
                    <host>ch-host1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </jiazz_cluster_3s_2r>
    </remote_servers>
</clickhouse>

3.3.1.3 zookeeper.xml

/etc/clickhouse-server/config.d/zookeeper.xml

<?xml version="1.0"?>
<clickhouse>
    <zookeeper>
        <node index="1">
            <host>zookeeper</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>

3.3.1.4 macro.xml

/etc/clickhouse-server/config.d/macro.xml (shown here for ch-host1; see the per-node sketch below)

<?xml version="1.0"?>
<clickhouse>
    <macros>
        <cluster01>jiazz_cluster_3s_2r</cluster01>
        <!-- shard numbers: differ per node -->
        <shard01>s1</shard01>
        <shard02>s3</shard02>
        <!-- replica names: differ per node -->
        <replica01>ch1_s1_r1</replica01>
        <replica02>ch1_s3_r1</replica02>
    </macros>
</clickhouse>
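The macro.xml above is for ch-host1, which hosts shard s1 (first) and shard s3 (second). Based on cluster.xml, the other two nodes would use macros along the following lines; the exact replica names are illustrative and only need to be unique within each shard:

<!-- ch-host2: first shard s2, second shard s1 -->
<macros>
    <cluster01>jiazz_cluster_3s_2r</cluster01>
    <shard01>s2</shard01>
    <shard02>s1</shard02>
    <replica01>ch2_s2_r1</replica01>
    <replica02>ch2_s1_r2</replica02>
</macros>

<!-- ch-host3: first shard s3, second shard s2 -->
<macros>
    <cluster01>jiazz_cluster_3s_2r</cluster01>
    <shard01>s3</shard01>
    <shard02>s2</shard02>
    <replica01>ch3_s3_r1</replica01>
    <replica02>ch3_s2_r2</replica02>
</macros>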

3.3.2 Starting the clickhouse servers

# Start
for ((i=1;i<=3;i++));do \
echo "start ch-host$i";\
docker run -d --name ch-host$i \
-h ch-host$i \
--ulimit nofile=262144:262144 \
--volume=/Users/momo/studySpace/docker/clickhouse/ch2-volume$i/data:/var/lib/clickhouse \
--volume=/Users/momo/studySpace/docker/clickhouse/ch2-volume$i/logs:/var/log/clickhouse-server \
--volume=/Users/momo/studySpace/docker/clickhouse/ch2-volume$i/config/clickhouse-server:/etc/clickhouse-server \
--link zk-standalone:zookeeper \
--network clickhouse \
--platform linux/amd64 \
clickhouse/clickhouse-server; \
done 

# Stop
for ((i=1;i<=3;i++));do echo "stop ch-host$i"; docker stop ch-host$i;done

# Restart
for ((i=1;i<=3;i++));do echo "restart ch-host$i"; docker restart ch-host$i;done

# Remove
for ((i=1;i<=3;i++));do echo "rm ch-host$i"; docker stop ch-host$i && docker rm ch-host$i;done

# Clear the mounted data and log directories
for ((i=1;i<=3;i++));do echo "clear ch-host$i"; rm -rf /Users/momo/studySpace/docker/clickhouse/ch2-volume$i/{data,logs}/*;done

3.3.3 Starting the client

3.3.3.1 Connecting a client to a specific clickhouse server

# Connect to ch-host1
docker run -it --rm --network clickhouse --link ch-host1:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server

# Connect to ch-host2
docker run -it --rm --network clickhouse --link ch-host2:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server

# Connect to ch-host3
docker run -it --rm --network clickhouse --link ch-host3:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server

Check the configuration:
SELECT * FROM system.macros m;
SELECT * FROM system.clusters c WHERE cluster = 'jiazz_cluster_3s_2r';

3.3.3.2 Creating the databases

Each shard gets its own database, e.g. shard 1 uses dwh01:

--ch-host1
CREATE DATABASE IF NOT EXISTS dwh
CREATE DATABASE IF NOT EXISTS dwh01
CREATE DATABASE IF NOT EXISTS dwh03

--ch-host2
CREATE DATABASE IF NOT EXISTS dwh
CREATE DATABASE IF NOT EXISTS dwh02
CREATE DATABASE IF NOT EXISTS dwh01

--ch-host3
CREATE DATABASE IF NOT EXISTS dwh
CREATE DATABASE IF NOT EXISTS dwh03
CREATE DATABASE IF NOT EXISTS dwh02

show databases

3.3.3.3 Creating the local tables

Use the DDL below as a template and change the database name for each node; every node creates two tables. The two full statements shown next correspond to ch-host3 (dwh03.hits_shard for its first shard and dwh02.hits_shard for its second); the abbreviated per-node variants follow after them.

CREATE TABLE dwh03.hits_shard
(
 `WatchID` UInt64,
 `JavaEnable` UInt8,
 `Title` String,
 `GoodEvent` Int16,
 `EventTime` DateTime,
 `EventDate` Date,
 `CounterID` UInt32,
 `ClientIP` UInt32,
 `ClientIP6` FixedString(16),
 `RegionID` UInt32,
 `UserID` UInt64,
 `CounterClass` Int8,
 `OS` UInt8,
 `UserAgent` UInt8,
 `URL` String,
 `Referer` String,
 `URLDomain` String,
 `RefererDomain` String,
 `Refresh` UInt8,
 `IsRobot` UInt8,
 `RefererCategories` Array(UInt16),
 `URLCategories` Array(UInt16),
 `URLRegions` Array(UInt32),
 `RefererRegions` Array(UInt32),
 `ResolutionWidth` UInt16,
 `ResolutionHeight` UInt16,
 `ResolutionDepth` UInt8,
 `FlashMajor` UInt8,
 `FlashMinor` UInt8,
 `FlashMinor2` String,
 `NetMajor` UInt8,
 `NetMinor` UInt8,
 `UserAgentMajor` UInt16,
 `UserAgentMinor` FixedString(2),
 `CookieEnable` UInt8,
 `JavascriptEnable` UInt8,
 `IsMobile` UInt8,
 `MobilePhone` UInt8,
 `MobilePhoneModel` String,
 `Params` String,
 `IPNetworkID` UInt32,
 `TraficSourceID` Int8,
 `SearchEngineID` UInt16,
 `SearchPhrase` String,
 `AdvEngineID` UInt8,
 `IsArtifical` UInt8,
 `WindowClientWidth` UInt16,
 `WindowClientHeight` UInt16,
 `ClientTimeZone` Int16,
 `ClientEventTime` DateTime,
 `SilverlightVersion1` UInt8,
 `SilverlightVersion2` UInt8,
 `SilverlightVersion3` UInt32,
 `SilverlightVersion4` UInt16,
 `PageCharset` String,
 `CodeVersion` UInt32,
 `IsLink` UInt8,
 `IsDownload` UInt8,
 `IsNotBounce` UInt8,
 `FUniqID` UInt64,
 `HID` UInt32,
 `IsOldCounter` UInt8,
 `IsEvent` UInt8,
 `IsParameter` UInt8,
 `DontCountHits` UInt8,
 `WithHash` UInt8,
 `HitColor` FixedString(1),
 `UTCEventTime` DateTime,
 `Age` UInt8,
 `Sex` UInt8,
 `Income` UInt8,
 `Interests` UInt16,
 `Robotness` UInt8,
 `GeneralInterests` Array(UInt16),
 `RemoteIP` UInt32,
 `RemoteIP6` FixedString(16),
 `WindowName` Int32,
 `OpenerName` Int32,
 `HistoryLength` Int16,
 `BrowserLanguage` FixedString(2),
 `BrowserCountry` FixedString(2),
 `SocialNetwork` String,
 `SocialAction` String,
 `HTTPError` UInt16,
 `SendTiming` Int32,
 `DNSTiming` Int32,
 `ConnectTiming` Int32,
 `ResponseStartTiming` Int32,
 `ResponseEndTiming` Int32,
 `FetchTiming` Int32,
 `RedirectTiming` Int32,
 `DOMInteractiveTiming` Int32,
 `DOMContentLoadedTiming` Int32,
 `DOMCompleteTiming` Int32,
 `LoadEventStartTiming` Int32,
 `LoadEventEndTiming` Int32,
 `NSToDOMContentLoadedTiming` Int32,
 `FirstPaintTiming` Int32,
 `RedirectCount` Int8,
 `SocialSourceNetworkID` UInt8,
 `SocialSourcePage` String,
 `ParamPrice` Int64,
 `ParamOrderID` String,
 `ParamCurrency` FixedString(3),
 `ParamCurrencyID` UInt16,
 `GoalsReached` Array(UInt32),
 `OpenstatServiceName` String,
 `OpenstatCampaignID` String,
 `OpenstatAdID` String,
 `OpenstatSourceID` String,
 `UTMSource` String,
 `UTMMedium` String,
 `UTMCampaign` String,
 `UTMContent` String,
 `UTMTerm` String,
 `FromTag` String,
 `HasGCLID` UInt8,
 `RefererHash` UInt64,
 `URLHash` UInt64,
 `CLID` UInt32,
 `YCLID` UInt64,
 `ShareService` String,
 `ShareURL` String,
 `ShareTitle` String,
 `ParsedParams` Nested(
 Key1 String,
 Key2 String,
 Key3 String,
 Key4 String,
 Key5 String,
 ValueDouble Float64),
 `IslandID` FixedString(16),
 `RequestNum` UInt32,
 `RequestTry` UInt8
)
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard01}/tables/hits', '{replica01}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);



CREATE TABLE dwh02.hits_shard
(
 `WatchID` UInt64,
 `JavaEnable` UInt8,
 `Title` String,
 `GoodEvent` Int16,
 `EventTime` DateTime,
 `EventDate` Date,
 `CounterID` UInt32,
 `ClientIP` UInt32,
 `ClientIP6` FixedString(16),
 `RegionID` UInt32,
 `UserID` UInt64,
 `CounterClass` Int8,
 `OS` UInt8,
 `UserAgent` UInt8,
 `URL` String,
 `Referer` String,
 `URLDomain` String,
 `RefererDomain` String,
 `Refresh` UInt8,
 `IsRobot` UInt8,
 `RefererCategories` Array(UInt16),
 `URLCategories` Array(UInt16),
 `URLRegions` Array(UInt32),
 `RefererRegions` Array(UInt32),
 `ResolutionWidth` UInt16,
 `ResolutionHeight` UInt16,
 `ResolutionDepth` UInt8,
 `FlashMajor` UInt8,
 `FlashMinor` UInt8,
 `FlashMinor2` String,
 `NetMajor` UInt8,
 `NetMinor` UInt8,
 `UserAgentMajor` UInt16,
 `UserAgentMinor` FixedString(2),
 `CookieEnable` UInt8,
 `JavascriptEnable` UInt8,
 `IsMobile` UInt8,
 `MobilePhone` UInt8,
 `MobilePhoneModel` String,
 `Params` String,
 `IPNetworkID` UInt32,
 `TraficSourceID` Int8,
 `SearchEngineID` UInt16,
 `SearchPhrase` String,
 `AdvEngineID` UInt8,
 `IsArtifical` UInt8,
 `WindowClientWidth` UInt16,
 `WindowClientHeight` UInt16,
 `ClientTimeZone` Int16,
 `ClientEventTime` DateTime,
 `SilverlightVersion1` UInt8,
 `SilverlightVersion2` UInt8,
 `SilverlightVersion3` UInt32,
 `SilverlightVersion4` UInt16,
 `PageCharset` String,
 `CodeVersion` UInt32,
 `IsLink` UInt8,
 `IsDownload` UInt8,
 `IsNotBounce` UInt8,
 `FUniqID` UInt64,
 `HID` UInt32,
 `IsOldCounter` UInt8,
 `IsEvent` UInt8,
 `IsParameter` UInt8,
 `DontCountHits` UInt8,
 `WithHash` UInt8,
 `HitColor` FixedString(1),
 `UTCEventTime` DateTime,
 `Age` UInt8,
 `Sex` UInt8,
 `Income` UInt8,
 `Interests` UInt16,
 `Robotness` UInt8,
 `GeneralInterests` Array(UInt16),
 `RemoteIP` UInt32,
 `RemoteIP6` FixedString(16),
 `WindowName` Int32,
 `OpenerName` Int32,
 `HistoryLength` Int16,
 `BrowserLanguage` FixedString(2),
 `BrowserCountry` FixedString(2),
 `SocialNetwork` String,
 `SocialAction` String,
 `HTTPError` UInt16,
 `SendTiming` Int32,
 `DNSTiming` Int32,
 `ConnectTiming` Int32,
 `ResponseStartTiming` Int32,
 `ResponseEndTiming` Int32,
 `FetchTiming` Int32,
 `RedirectTiming` Int32,
 `DOMInteractiveTiming` Int32,
 `DOMContentLoadedTiming` Int32,
 `DOMCompleteTiming` Int32,
 `LoadEventStartTiming` Int32,
 `LoadEventEndTiming` Int32,
 `NSToDOMContentLoadedTiming` Int32,
 `FirstPaintTiming` Int32,
 `RedirectCount` Int8,
 `SocialSourceNetworkID` UInt8,
 `SocialSourcePage` String,
 `ParamPrice` Int64,
 `ParamOrderID` String,
 `ParamCurrency` FixedString(3),
 `ParamCurrencyID` UInt16,
 `GoalsReached` Array(UInt32),
 `OpenstatServiceName` String,
 `OpenstatCampaignID` String,
 `OpenstatAdID` String,
 `OpenstatSourceID` String,
 `UTMSource` String,
 `UTMMedium` String,
 `UTMCampaign` String,
 `UTMContent` String,
 `UTMTerm` String,
 `FromTag` String,
 `HasGCLID` UInt8,
 `RefererHash` UInt64,
 `URLHash` UInt64,
 `CLID` UInt32,
 `YCLID` UInt64,
 `ShareService` String,
 `ShareURL` String,
 `ShareTitle` String,
 `ParsedParams` Nested(
 Key1 String,
 Key2 String,
 Key3 String,
 Key4 String,
 Key5 String,
 ValueDouble Float64),
 `IslandID` FixedString(16),
 `RequestNum` UInt32,
 `RequestTry` UInt8
)
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard02}/tables/hits', '{replica02}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);

-- ch-host1
CREATE TABLE dwh01.hits_shard
<!----- skipped, copy detail from previous ----->
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard01}/tables/hits', '{replica01}')
<!----- skipped, copy detail from previous ----->

CREATE TABLE dwh03.hits_shard
<!----- skipped, copy detail from previous ----->
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard02}/tables/hits', '{replica02}')
<!----- skipped, copy detail from previous ----->
-- ch-host2
CREATE TABLE dwh02.hits_shard
<!----- skipped, copy detail from previous ----->
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard01}/tables/hits', '{replica01}')
<!----- skipped, copy detail from previous ----->

CREATE TABLE dwh01.hits_shard
<!----- skipped, copy detail from previous ----->
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard02}/tables/hits', '{replica02}')
<!----- skipped, copy detail from previous ----->
-- ch-host3
CREATE TABLE dwh03.hits_shard
<!----- skipped, copy detail from previous ----->
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard01}/tables/hits', '{replica01}')
<!----- skipped, copy detail from previous ----->

CREATE TABLE dwh02.hits_shard
<!----- skipped, copy detail from previous ----->
ENGINE=ReplicatedMergeTree('/clickhouse/{cluster01}/{shard02}/tables/hits', '{replica02}')
<!----- skipped, copy detail from previous ----->

Check that the replica configuration of the tables is correct:
ch-host1 :) SELECT * FROM system.replicas r ;


SELECT *
FROM system.replicas AS r

Query id: b1549b41-67c9-44fe-8bfd-ea05e8205992

┌─database─┬─table──────┬─engine──────────────┬─is_leader─┬─can_become_leader─┬─is_readonly─┬─is_session_expired─┬─future_parts─┬─parts_to_check─┬─zookeeper_path─────────────────────────────────┬─replica_name─┬─replica_path──────────────────────────────────────────────────────┬─columns_version─┬─queue_size─┬─inserts_in_queue─┬─merges_in_queue─┬─part_mutations_in_queue─┬───queue_oldest_time─┬─inserts_oldest_time─┬──merges_oldest_time─┬─part_mutations_oldest_time─┬─oldest_part_to_get─┬─oldest_part_to_merge_to─┬─oldest_part_to_mutate_to─┬─log_max_index─┬─log_pointer─┬───last_queue_update─┬─absolute_delay─┬─total_replicas─┬─active_replicas─┬─last_queue_update_exception─┬─zookeeper_exception─┬─replica_is_active─────────────┐
│ dwh01    │ hits_shard │ ReplicatedMergeTree │         1 │                 1 │           0 │                  0 │            0 │              0 │ /clickhouse/jiazz_cluster_3s_2r/s1/tables/hits │ ch1_s1_r1    │ /clickhouse/jiazz_cluster_3s_2r/s1/tables/hits/replicas/ch1_s1_r1 │              -1 │          0 │                0 │               0 │                       0 │ 1970-01-01 00:00:00 │ 1970-01-01 00:00:00 │ 1970-01-01 00:00:00 │        1970-01-01 00:00:00 │                    │                         │                          │             0 │           0 │ 1970-01-01 00:00:00 │              0 │              2 │               2 │                             │                     │ {'ch2_s1_r2':1,'ch1_s1_r1':1} │
│ dwh03    │ hits_shard │ ReplicatedMergeTree │         1 │                 1 │           0 │                  0 │            0 │              0 │ /clickhouse/jiazz_cluster_3s_2r/s3/tables/hits │ ch1_s3_r1    │ /clickhouse/jiazz_cluster_3s_2r/s3/tables/hits/replicas/ch1_s3_r1 │              -1 │          0 │                0 │               0 │                       0 │ 1970-01-01 00:00:00 │ 1970-01-01 00:00:00 │ 1970-01-01 00:00:00 │        1970-01-01 00:00:00 │                    │                         │                          │             0 │           0 │ 1970-01-01 00:00:00 │              0 │              2 │               2 │                             │                     │ {'ch1_s3_r1':1,'ch3_s3_r1':1} │
└──────────┴────────────┴─────────────────────┴───────────┴───────────────────┴─────────────┴────────────────────┴──────────────┴────────────────┴────────────────────────────────────────────────┴──────────────┴───────────────────────────────────────────────────────────────────┴─────────────────┴────────────┴──────────────────┴─────────────────┴─────────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┴────────────────────────────┴────────────────────┴─────────────────────────┴──────────────────────────┴───────────────┴─────────────┴─────────────────────┴────────────────┴────────────────┴─────────────────┴─────────────────────────────┴─────────────────────┴───────────────────────────────┘

2 rows in set. Elapsed: 0.036 sec.

# Select only the key columns to check that each node looks correct
SELECT database, table, zookeeper_path, replica_name,  replica_path, total_replicas, active_replicas, replica_is_active FROM system.replicas r ;

(The corresponding output for ch-host1, ch-host2 and ch-host3 is not reproduced here.)

3.3.3.4 Creating the Distributed table

Note: the database argument is the empty string '', which makes ClickHouse fall back to each shard's default database (dwh01, dwh02, dwh03) to find the right table.
In a normal configuration (rather than this circular replication), the shard-local tables and the Distributed table live in the same database.

-- execute the follow DDL on ALL 3 nodes (ch-host1, ch-host2, ch-host3)

CREATE TABLE dwh.hits_distributed
(
 `WatchID` UInt64,
 `JavaEnable` UInt8,
 `Title` String,
 `GoodEvent` Int16,
 `EventTime` DateTime,
 `EventDate` Date,
 `CounterID` UInt32,
 `ClientIP` UInt32,
 `ClientIP6` FixedString(16),
 `RegionID` UInt32,
 `UserID` UInt64,
 `CounterClass` Int8,
 `OS` UInt8,
 `UserAgent` UInt8,
 `URL` String,
 `Referer` String,
 `URLDomain` String,
 `RefererDomain` String,
 `Refresh` UInt8,
 `IsRobot` UInt8,
 `RefererCategories` Array(UInt16),
 `URLCategories` Array(UInt16),
 `URLRegions` Array(UInt32),
 `RefererRegions` Array(UInt32),
 `ResolutionWidth` UInt16,
 `ResolutionHeight` UInt16,
 `ResolutionDepth` UInt8,
 `FlashMajor` UInt8,
 `FlashMinor` UInt8,
 `FlashMinor2` String,
 `NetMajor` UInt8,
 `NetMinor` UInt8,
 `UserAgentMajor` UInt16,
 `UserAgentMinor` FixedString(2),
 `CookieEnable` UInt8,
 `JavascriptEnable` UInt8,
 `IsMobile` UInt8,
 `MobilePhone` UInt8,
 `MobilePhoneModel` String,
 `Params` String,
 `IPNetworkID` UInt32,
 `TraficSourceID` Int8,
 `SearchEngineID` UInt16,
 `SearchPhrase` String,
 `AdvEngineID` UInt8,
 `IsArtifical` UInt8,
 `WindowClientWidth` UInt16,
 `WindowClientHeight` UInt16,
 `ClientTimeZone` Int16,
 `ClientEventTime` DateTime,
 `SilverlightVersion1` UInt8,
 `SilverlightVersion2` UInt8,
 `SilverlightVersion3` UInt32,
 `SilverlightVersion4` UInt16,
 `PageCharset` String,
 `CodeVersion` UInt32,
 `IsLink` UInt8,
 `IsDownload` UInt8,
 `IsNotBounce` UInt8,
 `FUniqID` UInt64,
 `HID` UInt32,
 `IsOldCounter` UInt8,
 `IsEvent` UInt8,
 `IsParameter` UInt8,
 `DontCountHits` UInt8,
 `WithHash` UInt8,
 `HitColor` FixedString(1),
 `UTCEventTime` DateTime,
 `Age` UInt8,
 `Sex` UInt8,
 `Income` UInt8,
 `Interests` UInt16,
 `Robotness` UInt8,
 `GeneralInterests` Array(UInt16),
 `RemoteIP` UInt32,
 `RemoteIP6` FixedString(16),
 `WindowName` Int32,
 `OpenerName` Int32,
 `HistoryLength` Int16,
 `BrowserLanguage` FixedString(2),
 `BrowserCountry` FixedString(2),
 `SocialNetwork` String,
 `SocialAction` String,
 `HTTPError` UInt16,
 `SendTiming` Int32,
 `DNSTiming` Int32,
 `ConnectTiming` Int32,
 `ResponseStartTiming` Int32,
 `ResponseEndTiming` Int32,
 `FetchTiming` Int32,
 `RedirectTiming` Int32,
 `DOMInteractiveTiming` Int32,
 `DOMContentLoadedTiming` Int32,
 `DOMCompleteTiming` Int32,
 `LoadEventStartTiming` Int32,
 `LoadEventEndTiming` Int32,
 `NSToDOMContentLoadedTiming` Int32,
  `FirstPaintTiming` Int32,
 `RedirectCount` Int8,
 `SocialSourceNetworkID` UInt8,
 `SocialSourcePage` String,
 `ParamPrice` Int64,
 `ParamOrderID` String,
 `ParamCurrency` FixedString(3),
 `ParamCurrencyID` UInt16,
 `GoalsReached` Array(UInt32),
 `OpenstatServiceName` String,
 `OpenstatCampaignID` String,
 `OpenstatAdID` String,
 `OpenstatSourceID` String,
 `UTMSource` String,
 `UTMMedium` String,
 `UTMCampaign` String,
 `UTMContent` String,
 `UTMTerm` String,
 `FromTag` String,
 `HasGCLID` UInt8,
 `RefererHash` UInt64,
 `URLHash` UInt64,
 `CLID` UInt32,
 `YCLID` UInt64,
 `ShareService` String,
 `ShareURL` String,
 `ShareTitle` String,
 `ParsedParams.Key1` Array(String),
 `ParsedParams.Key2` Array(String),
 `ParsedParams.Key3` Array(String),
 `ParsedParams.Key4` Array(String),
 `ParsedParams.Key5` Array(String),
 `ParsedParams.ValueDouble` Array(Float64),
 `IslandID` FixedString(16),
 `RequestNum` UInt32,
 `RequestTry` UInt8
)
ENGINE = Distributed('jiazz_cluster_3s_2r', '', 'hits_shard', rand());

3.3.3.5 Loading data

Data set: https://clickhouse.com/docs/en/getting-started/tutorial/

  • rand() is the sharding key, so rows are distributed randomly across the shards
  • Because internal_replication=true, once the data is written to the first replica of a shard, ClickHouse automatically copies it to the second replica
# Download the file into ch-host1's mounted logs directory so it is visible inside the container
cd ~/studySpace/docker/clickhouse/ch2-volume1/logs
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv

docker exec -it ch-host1 /bin/bash
root@ch-host1:/#cd /var/log/clickhouse-server
# The data file is large, so the import takes a while
root@ch-host1:/var/log/clickhouse-server# clickhouse-client --query "INSERT INTO dwh.hits_distributed FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv

Query the data:

ch-host1 :) select * from dwh.hits_distributed limit 1;

SELECT *
FROM dwh.hits_distributed
LIMIT 1

Query id: 126b3803-659e-48fb-a12f-9e0a261de1ca

┌─────────────WatchID─┬─JavaEnable─┬─Title───────────────────────────────────────────────────────────┬─GoodEvent─┬───────────EventTime─┬──EventDate─┬─CounterID─┬───ClientIP─┬─ClientIP6─┬─RegionID─┬─────────────UserID─┬─CounterClass─┬─OS─┬─UserAgent─┬─URL────────────────────────────────────────────────────────────┬─Referer────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─URLDomain─────────┬─RefererDomain───────┬─Refresh─┬─IsRobot─┬─RefererCategories─┬─URLCategories────┬─URLRegions─┬─RefererRegions─┬─ResolutionWidth─┬─ResolutionHeight─┬─ResolutionDepth─┬─FlashMajor─┬─FlashMinor─┬─FlashMinor2─┬─NetMajor─┬─NetMinor─┬─UserAgentMajor─┬─UserAgentMinor─┬─CookieEnable─┬─JavascriptEnable─┬─IsMobile─┬─MobilePhone─┬─MobilePhoneModel─┬─Params─┬─IPNetworkID─┬─TraficSourceID─┬─SearchEngineID─┬─SearchPhrase─┬─AdvEngineID─┬─IsArtifical─┬─WindowClientWidth─┬─WindowClientHeight─┬─ClientTimeZone─┬─────ClientEventTime─┬─SilverlightVersion1─┬─SilverlightVersion2─┬─SilverlightVersion3─┬─SilverlightVersion4─┬─PageCharset─┬─CodeVersion─┬─IsLink─┬─IsDownload─┬─IsNotBounce─┬─────────────FUniqID─┬───────HID─┬─IsOldCounter─┬─IsEvent─┬─IsParameter─┬─DontCountHits─┬─WithHash─┬─HitColor─┬────────UTCEventTime─┬─Age─┬─Sex─┬─Income─┬─Interests─┬─Robotness─┬─GeneralInterests───────────────────────┬───RemoteIP─┬─RemoteIP6─┬─WindowName─┬─OpenerName─┬─HistoryLength─┬─BrowserLanguage─┬─BrowserCountry─┬─SocialNetwork─┬─SocialAction─┬─HTTPError─┬─SendTiming─┬─DNSTiming─┬─ConnectTiming─┬─ResponseStartTiming─┬─ResponseEndTiming─┬─FetchTiming─┬─RedirectTiming─┬─DOMInteractiveTiming─┬─DOMContentLoadedTiming─┬─DOMCompleteTiming─┬─LoadEventStartTiming─┬─LoadEventEndTiming─┬─NSToDOMContentLoadedTiming─┬─FirstPaintTiming─┬─RedirectCount─┬─SocialSourceNetworkID─┬─SocialSourcePage─┬─ParamPrice─┬─ParamOrderID─┬─ParamCurrency─┬─ParamCurrencyID─┬─GoalsReached─┬─OpenstatServiceName─┬─OpenstatCampaignID─┬─OpenstatAdID─┬─OpenstatSourceID─┬─UTMSource─┬─UTMMedium─┬─UTMCampaign─┬─UTMContent─┬─UTMTerm─┬─FromTag─┬─HasGCLID─┬─────────RefererHash─┬────────────URLHash─┬─CLID─┬─YCLID─┬─ShareService─┬─ShareURL─┬─ShareTitle─┬─ParsedParams.Key1─┬─ParsedParams.Key2─┬─ParsedParams.Key3─┬─ParsedParams.Key4─┬─ParsedParams.Key5─┬─ParsedParams.ValueDouble─┬─IslandID─┬─RequestNum─┬─RequestTry─┐
│ 4944118417295196513 │          1 │ вышивка в Москве - Образовать фото мочия - Почта Mail.Ru: Силва │         1 │ 2014-03-17 13:19:26 │ 2014-03-17 │        57 │ 1581336367 │ ��:�[�Uc��m���     │       54 │ 610708775678702928 │            0 │  1 │         3 │ http://hurpasspx?EntityType=images.jpg,http:%2F%2Fweb%2Fimages │ http://iconnId=140Z1BmWE9JVEdoQ2Zud01aa0f8b72a2cb141ad2fbb6bc0488a293f1c0b7bbfe6e0921396325_7860c5b30e0216&mb_url=http://svit/realesta.com/?do=news.ru/yandex.php?thread%2FEl0TTQQZIClEGYFcJG1F4XldSeWtvVkFkf38xIAEsQVgWCVtUe15_d34cemhbU0dIfji-RM │ hurpass.uaprod.hu │ incityadspix.com.vn │       0 │       0 │ [6,98,456,8586]   │ [5,92,469,13425] │ [348,1010] │ [267,694]      │            1846 │              952 │              29 │          8 │          0 │ 0.          │        0 │        0 │             24 │ s�              │            1 │                1 │        0 │           0 │                  │        │     3223479 │             -1 │              0 │              │           0 │           1 │              1246 │                906 │            157 │ 2014-03-17 16:32:09 │                   5 │                   1 │               19832 │                   0 │ utf-8       │         291 │      0 │          0 │           0 │ 5970711023083247428 │ 117479153 │            0 │       0 │           0 │             1 │        0 │ E        │ 2014-03-17 13:03:08 │  55 │   1 │      2 │     16014 │         8 │ [5,2,14,925,4,3993,72,6,9,27,1,3,1019] │ 2024189568 │ ���_Y����5Ӵ.    │       8687 │         -1 │             1 │ nD              │ Tp             │               │              │         0 │         -1 │        -1 │            -1 │                  -1 │                -1 │          -1 │             -1 │                   -1 │                     -1 │                -1 │                   -1 │                 -1 │                       7557 │               -1 │            -1 │                     0 │                  │          0 │              │ �              │               0 │ [834016]     │                     │                    │              │                  │           │           │             │            │         │         │        0 │ 7613536979754271517 │ 966296541083783832 │    0 │     0 │              │          │            │ []                │ []                │ []                │ []                │ []                │ []                       │ ���+������bKQ9    │       1324 │          1 │
└─────────────────────┴────────────┴─────────────────────────────────────────────────────────────────┴───────────┴─────────────────────┴────────────┴───────────┴────────────┴───────────┴──────────┴────────────────────┴──────────────┴────┴───────────┴────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────┴─────────────────────┴─────────┴─────────┴───────────────────┴──────────────────┴────────────┴────────────────┴─────────────────┴──────────────────┴─────────────────┴────────────┴────────────┴─────────────┴──────────┴──────────┴────────────────┴────────────────┴──────────────┴──────────────────┴──────────┴─────────────┴──────────────────┴────────┴─────────────┴────────────────┴────────────────┴──────────────┴─────────────┴─────────────┴───────────────────┴────────────────────┴────────────────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────┴─────────────┴────────┴────────────┴─────────────┴─────────────────────┴───────────┴──────────────┴─────────┴─────────────┴───────────────┴──────────┴──────────┴─────────────────────┴─────┴─────┴────────┴───────────┴───────────┴────────────────────────────────────────┴────────────┴───────────┴────────────┴────────────┴───────────────┴─────────────────┴────────────────┴───────────────┴──────────────┴───────────┴────────────┴───────────┴───────────────┴─────────────────────┴───────────────────┴─────────────┴────────────────┴──────────────────────┴────────────────────────┴───────────────────┴──────────────────────┴────────────────────┴────────────────────────────┴──────────────────┴───────────────┴───────────────────────┴──────────────────┴────────────┴──────────────┴───────────────┴─────────────────┴──────────────┴─────────────────────┴────────────────────┴──────────────┴──────────────────┴───────────┴───────────┴─────────────┴────────────┴─────────┴─────────┴──────────┴─────────────────────┴────────────────────┴──────┴───────┴──────────────┴──────────┴────────────┴───────────────────┴───────────────────┴───────────────────┴───────────────────┴───────────────────┴──────────────────────────┴──────────┴────────────┴────────────┘

1 rows in set. Elapsed: 28.719 sec.
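To check how the rows were distributed and replicated, a quick sketch is to compare the row count of the Distributed table with the counts of the shard-local tables (run each local-table query on a node that hosts that database):

-- Total rows seen through the distributed table
SELECT count() FROM dwh.hits_distributed;

-- Per-shard counts: the three local counts should add up to the total,
-- and both replicas of the same shard should report the same count
SELECT count() FROM dwh01.hits_shard;
SELECT count() FROM dwh02.hits_shard;
SELECT count() FROM dwh03.hits_shard;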

If anything above went wrong, drop the databases and start over:

DROP DATABASE IF EXISTS dwh
DROP DATABASE IF EXISTS dwh01
DROP DATABASE IF EXISTS dwh02
DROP DATABASE IF EXISTS dwh03

