This guide has two main parts: 1. ZooKeeper installation; 2. ClickHouse (CK) cluster configuration.
=========================== Part 1: ZooKeeper Installation ========================
1. Download the installation package
https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.6.1/apache-zookeeper-3.6.1-bin.tar.gz
2. Extract and configure
a) tar -zxvf apache-zookeeper-3.6.1-bin.tar.gz -C /usr/java
   (the paths below assume the extracted directory has been renamed to zookeeper36)
b) cp zoo_sample.cfg zoo.cfg (in the conf directory)
c) vim zoo.cfg: set the dataDir directory and add the following three server entries:
   dataDir=/usr/java/zookeeper36/data
   server.1=hadoop101:2888:3888
   server.2=hadoop102:2888:3888
   server.3=hadoop103:2888:3888
d) On every machine, create the myid file; its value must match that host's server.x entry (see the sketch after this list). For example, on hadoop101:
   echo 1 > /usr/java/zookeeper36/data/myid
e) scp -r zookeeper36 hadoop102:/usr/java (repeat for hadoop103)
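A minimal sketch of step d) run from one machine, assuming passwordless ssh between the hosts and the same dataDir on all three:

# each myid value must match that host's server.x line in zoo.cfg
ssh hadoop101 'mkdir -p /usr/java/zookeeper36/data && echo 1 > /usr/java/zookeeper36/data/myid'
ssh hadoop102 'mkdir -p /usr/java/zookeeper36/data && echo 2 > /usr/java/zookeeper36/data/myid'
ssh hadoop103 'mkdir -p /usr/java/zookeeper36/data && echo 3 > /usr/java/zookeeper36/data/myid'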

3. Start ZooKeeper on every machine
a) On every machine, run ./bin/zkServer.sh start
b) Check each node's status with ./bin/zkServer.sh status
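To check all three nodes in one pass, a sketch assuming passwordless ssh and the install path used above:

for h in hadoop101 hadoop102 hadoop103; do
  ssh $h /usr/java/zookeeper36/bin/zkServer.sh status
done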


If one node reports Mode: leader and the other two report Mode: follower, the installation succeeded.
=========================== Part 2: ClickHouse Installation ========================
1. First do a standalone installation on each machine; see the link below for the detailed steps.
2. Edit the configuration file /etc/clickhouse-server/config.xml:
listen_host controls which addresses the server accepts connections on; :: means any host may connect: <listen_host>::</listen_host>
3. Sync the modified configuration file to the other nodes
scp /etc/clickhouse-server/config.xml hadoop102:/etc/clickhouse-server/
scp /etc/clickhouse-server/config.xml hadoop103:/etc/clickhouse-server/
4. Create the configuration file with vim /etc/metrika.xml, changing the hostnames and ports to match your own environment.
Note: port 9000 is the tcp_port defined in /etc/clickhouse-server/config.xml. (By default, ClickHouse reads substitution values from /etc/metrika.xml, which is why this path is used.)
<yandex>
    <clickhouse_remote_servers>
        <!-- 3 shards, 1 replica each -->
        <nx_clickhouse_3shards_1replicas>
            <shard>
                <!-- data is replicated automatically -->
                <internal_replication>true</internal_replication>
                <replica>
                    <host>hadoop101</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>hadoop102</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>hadoop103</host>
                    <port>9000</port>
                </replica>
            </shard>
        </nx_clickhouse_3shards_1replicas>
    </clickhouse_remote_servers>

    <!-- ZooKeeper ensemble used for replication -->
    <zookeeper-servers>
        <node index="1">
            <host>hadoop101</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>hadoop102</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>hadoop103</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

    <!-- If macros is omitted, the ZooKeeper path and replica name must be given
         explicitly when creating each replicated table (same path within a shard,
         different replica names). If it is kept, each node needs its own value;
         set it to that node's own hostname. -->
    <macros>
        <replica>hadoop102</replica>
    </macros>

    <networks>
        <ip>::/0</ip>
    </networks>

    <!-- compression settings -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
5. Sync the new configuration file to the other nodes (the macros value must then be changed to each node's own hostname)
scp /etc/metrika.xml hadoop102:/etc/
scp /etc/metrika.xml hadoop103:/etc/
6. Start clickhouse-server on each of the three machines
service clickhouse-server start
7. Open the client and check the cluster configuration
a) clickhouse-client -m
b) select * from system.clusters;

If the nx_clickhouse_3shards_1replicas cluster appears with its three shards, the setup succeeded; the same query can be run on the other machines to check their configuration as well.
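To show only the cluster defined in /etc/metrika.xml, the output can be filtered (these are standard system.clusters columns):

select cluster, shard_num, replica_num, host_name, port
from system.clusters
where cluster = 'nx_clickhouse_3shards_1replicas';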
8. Check the error logs under:
/var/log/clickhouse-server
=========================== Part 3: ClickHouse Cluster Test ========================
1. Create the local table cluster3s1r_local on each of the three nodes
CREATE TABLE default.cluster3s1r_local
(
`id` Int32,
`website` String,
`wechat` String,
`FlightDate` Date,
`Year` UInt16
)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
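Note: MergeTree(FlightDate, (Year, FlightDate), 8192) is the legacy engine signature, which partitions by month of the date column; it still works on the 20.8 build used here but is deprecated in newer releases. A roughly equivalent definition in the current syntax, for reference:

CREATE TABLE default.cluster3s1r_local
(
`id` Int32,
`website` String,
`wechat` String,
`FlightDate` Date,
`Year` UInt16
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Year, FlightDate)
SETTINGS index_granularity = 8192;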
2. Create the distributed table on the first node. The Distributed() arguments are, in order: the cluster name from /etc/metrika.xml, the database, the local table, and the sharding key.
CREATE TABLE default.cluster3s1r_all AS cluster3s1r_local
ENGINE = Distributed(nx_clickhouse_3shards_1replicas, default, cluster3s1r_local, rand());

3. Insert rows through the distributed table cluster3s1r_all; because the sharding key is rand(), each row is written to the local table (cluster3s1r_local) of a randomly chosen node.
INSERT INTO default.cluster3s1r_all (id, website, wechat, FlightDate, Year) VALUES (1, 'https://niocoder.com/', 'java干貨', '2020-11-28', 2020);
INSERT INTO default.cluster3s1r_all (id, website, wechat, FlightDate, Year) VALUES (2, 'http://www.merryyou.cn/', 'javaganhuo', '2020-11-28', 2020);
INSERT INTO default.cluster3s1r_all (id, website, wechat, FlightDate, Year) VALUES (3, 'http://www.xxxxx.cn/', 'xxxxx', '2020-11-28', 2020);
4. Check the data in each node's local table and in the distributed table
# query the distributed table for the complete data set
select * from cluster3s1r_all;
# query each node's local table
select * from cluster3s1r_local;
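A quick consistency check: the distributed table should return all three rows inserted above, while each node's local table returns only its own share:

# all rows, via the distributed table (run on any node)
select count() from cluster3s1r_all;
# this node's share (run on each node; the three counts should sum to 3)
select count() from cluster3s1r_local;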



FAQ
If the following error appears:
2021.01.26 11:00:38.392034 [ 6633 ] {} <Trace> Application: The memory map of clickhouse executable has been mlock'ed
2021.01.26 11:00:38.392351 [ 6633 ] {} <Error> Application: DB::Exception: Effective user of the process (root) does not match the owner of the data (clickhouse). Run under 'sudo -u clickhouse'.
2021.01.26 11:00:38.392383 [ 6633 ] {} <Information> Application: shutting down
2021.01.26 11:00:38.392389 [ 6633 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2021.01.26 11:00:38.462977 [ 6636 ] {} <Trace> BaseDaemon: Received signal -2
2021.01.26 11:00:38.463026 [ 6636 ] {} <Information> BaseDaemon: Stop SignalListener thread
2021.01.26 11:02:00.469399 [ 6777 ] {} <Information> SentryWriter: Sending crash reports is disabled
2021.01.26 11:02:00.470907 [ 6777 ] {} <Trace> Pipe: Pipe capacity is 1.00 MiB
2021.01.26 11:02:00.509282 [ 6777 ] {} <Information> : Starting ClickHouse 20.8.3.18 with revision 54438, no build id, PID 6777
2021.01.26 11:02:00.509359 [ 6777 ] {} <Information> Application: starting up
2021.01.26 11:02:00.512996 [ 6777 ] {} <Trace> Application: Will mlockall to prevent executable memory from being paged out. It may take a few seconds.
2021.01.26 11:02:00.633075 [ 6777 ] {} <Trace> Application: The memory map of clickhouse executable has been mlock'ed
2021.01.26 11:02:00.633349 [ 6777 ] {} <Error> Application: DB::Exception: Effective user of the process (root) does not match the owner of the data (clickhouse). Run under 'sudo -u clickhouse'.
2021.01.26 11:02:00.633365 [ 6777 ] {} <Information> Application: shutting down
2021.01.26 11:02:00.633368 [ 6777 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2021.01.26 11:02:00.682722 [ 6780 ] {} <Trace> BaseDaemon: Received signal -2
2021.01.26 11:02:00.682755 [ 6780 ] {} <Information> BaseDaemon: Stop SignalListener thread
Solutions:
1. The brute-force approach:
Uninstall ClickHouse from the affected machine, then reinstall it and redo that machine's cluster configuration.
For the detailed uninstall steps, see https://www.cnblogs.com/ywjfx/p/14305405.html
2. Start the server as the clickhouse user (I tried this and it did not work in my case, which is why I fell back on the brute-force approach):
sudo -u clickhouse clickhouse-server --config-file=/etc/clickhouse-server/config.xml
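Since the error message complains about file ownership, another fix sometimes suggested (not verified in this setup; the paths below are the package defaults) is to give the data and log directories back to the clickhouse user and then start the service normally:

# assumption: data and logs live in the default package locations
chown -R clickhouse:clickhouse /var/lib/clickhouse /var/log/clickhouse-server
service clickhouse-server start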
