I. Cluster Overview
ZooKeeper should be deployed on separate machines, so that a failing ClickHouse node cannot take ZooKeeper down with it.
0. High-availability principle: ZooKeeper + ReplicatedMergeTree (replicated tables) + Distributed (distributed tables)
1. Prerequisites: disable the firewall or open the required ports on all nodes; establish mutual trust (passwordless SSH) between all nodes; the hosts file and hostnames must be configured correctly and consistently across the cluster, because ZooKeeper returns hostnames, and replicated tables will fail if the mapping is wrong or missing.
4 ClickHouse test nodes: 10.0.0.236 cdhserver1 (clickhouse1), 10.0.0.237 cdhserver2 (clickhouse2), 10.0.0.238 cdhserver3 (clickhouse3), 10.0.0.239 cdhserver4 (clickhouse4)
3 ZooKeeper test nodes: 10.0.0.237 cdhserver2 (zookeeper), 10.0.0.238 cdhserver3 (zookeeper), 10.0.0.239 cdhserver4 (zookeeper)
Configuration plan: one ClickHouse instance on each of the four nodes, with cdhserver2-4 each hosting one shard:
cdhserver1: instance 1, ports: tcp 9006, http 8123, sync port 9006, role: master node
cdhserver2: instance 2, ports: tcp 9006, http 8123, sync port 9006, role: shard 1, replica 1
cdhserver3: instance 3, ports: tcp 9006, http 8123, sync port 9006, role: shard 2, replica 1
cdhserver4: instance 4, ports: tcp 9006, http 8123, sync port 9006, role: shard 3, replica 1
II. Environment Preparation
1) Host configuration (adjust to your environment)
10.0.0.236 cdhserver1 CentOS 7.1, 32 GB RAM, 200 GB disk
10.0.0.237 cdhserver2 CentOS 7.1, 32 GB RAM, 200 GB disk
10.0.0.238 cdhserver3 CentOS 7.1, 32 GB RAM, 200 GB disk
10.0.0.239 cdhserver4 CentOS 7.1, 32 GB RAM, 200 GB disk
2) hosts file and hostnames
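All nodes must share identical name resolution. For example, every node's /etc/hosts should contain the mappings used throughout this guide:
[root@cdhserver1 ~]# cat >> /etc/hosts << 'EOF'
10.0.0.236 cdhserver1
10.0.0.237 cdhserver2
10.0.0.238 cdhserver3
10.0.0.239 cdhserver4
EOF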
3) Disable the firewall or open the required ports on all nodes
# 1. Stop the firewall (the commands below are for CentOS 6; on CentOS 7 use systemctl stop firewalld && systemctl disable firewalld)
service iptables stop
chkconfig iptables off
chkconfig ip6tables off
# 2. Disable SELinux
Set SELINUX=disabled in /etc/selinux/config, then reboot:
[root@cdhserver1 ~]# vim /etc/selinux/config
SELINUX=disabled
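To turn SELinux off immediately, without waiting for the reboot, you can also run:
[root@cdhserver1 ~]# setenforce 0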
4) Raise the open-file limits on all nodes
Append the following to the end of both /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf:
[root@cdhserver1 software]# vim /etc/security/limits.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
[root@cdhserver1 software]# vim /etc/security/limits.d/90-nproc.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
Reboot the server, then check the result with ulimit -n or ulimit -a:
[root@cdhserver1 ~]# ulimit -n
65536
5) Establish mutual trust (passwordless SSH) between all nodes
1. Generate an RSA key pair on each node with ssh-keygen:
ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
2. Collect every node's public key into a single authorized_keys file. On cdhserver1, run the following commands one line at a time:
ssh cdhserver1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver4 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Set the permissions of the authorized_keys file. On cdhserver1, run:
chmod 600 ~/.ssh/authorized_keys
4. Distribute the authorized_keys file to all servers, one line at a time:
scp ~/.ssh/authorized_keys cdhserver1:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver2:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver3:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver4:~/.ssh/
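A quick check that trust works (a minimal loop over the four hostnames above); each hostname should print without a password prompt:
for h in cdhserver1 cdhserver2 cdhserver3 cdhserver4; do ssh $h hostname; done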
6) Check whether the system supports SSE 4.2
[root@cdhserver1 ~]# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
III. Distributed Cluster Installation
1. Installation
1) RPM installation
Notes:
1) Package sources:
https://packagecloud.io/Altinity/clickhouse
https://repo.yandex.ru/clickhouse/rpm/
2) Install ClickHouse on all nodes (CentOS 7.1 in this example)
3) Perform the following installation and configuration on each server
Install ClickHouse:
Install libicu first (place libicu-4.2.1-14.el6.x86_64.rpm in the directory beforehand):
mkdir -p /usr/local/icu/
cd /usr/local/icu/
rpm -ivh libicu-4.2.1-14.el6.x86_64.rpm
Install the ClickHouse packages:
rpm -ivh clickhouse-server-common-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-compressor-1.1.54336-3.el6.x86_64.rpm
rpm -ivh clickhouse-common-static-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-client-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-test-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-debuginfo-18.14.12-1.el6.x86_64.rpm
or simply: rpm -ivh clickhouse-*.el6.x86_64.rpm
2) yum installation
# CentOS / RedHat
sudo yum install yum-utils
sudo rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
sudo yum install clickhouse-server clickhouse-client
# Alternatively, install the yum repo via the packagecloud script
curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash
# Install server and client with yum
sudo yum install -y clickhouse-server clickhouse-client
# Verify the installation
sudo yum list installed 'clickhouse*'
3) Upgrading
Upgrading ClickHouse is also straightforward: download the new RPM packages and install them with the command below (the ClickHouse service does not need to be stopped). Existing configuration files such as config.xml are preserved during the upgrade; see the official documentation for other upgrade methods.
# Check the current version
clickhouse-server --version
# Upgrade. As the output shows, new config files are written with a .rpmnew suffix and the old config files are kept.
[root@cdh2 software]# rpm -Uvh clickhouse-*-20.5.4.40-1.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-server-common-20.5.4.4
warning: /etc/clickhouse-server/config.xml created as /etc/clickhouse-server/config.xml.rpmnew
################################# [ 13%]
warning: /etc/clickhouse-server/users.xml created as /etc/clickhouse-server/users.xml.rpmnew
   2:clickhouse-common-static-20.5.4.4################################# [ 25%]
   3:clickhouse-server-20.5.4.40-1.el7################################# [ 38%]
Create user clickhouse.clickhouse with datadir /var/lib/clickhouse
   4:clickhouse-client-20.5.4.40-1.el7################################# [ 50%]
Create user clickhouse.clickhouse with datadir /var/lib/clickhouse
Cleaning up / removing...
   5:clickhouse-client-19.16.3.6-1.el7################################# [ 63%]
   6:clickhouse-server-19.16.3.6-1.el7################################# [ 75%]
   7:clickhouse-server-common-19.16.3.################################# [ 88%]
   8:clickhouse-common-static-19.16.3.################################# [100%]
3. Directory Layout
/etc/clickhouse-server: server configuration directory, containing the global config config.xml and the user config users.xml.
/var/lib/clickhouse: default data directory; in production, move it to a mount point with ample space by editing the <path>, <tmp_path> and <user_files_path> tags in /etc/clickhouse-server/config.xml.
/var/log/clickhouse-server: default log directory; likewise configurable via the <log> and <errorlog> tags in /etc/clickhouse-server/config.xml.
/etc/cron.d/clickhouse-server: a cron entry for the ClickHouse server, used to restart the service if the process dies abnormally.
~/.clickhouse-client-history: history of SQL statements executed by the client.
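For example, relocating the data directory to the /data/clickhouse path used later in this guide comes down to these tags in config.xml (a sketch; remember to create the directory and hand it to the service user, e.g. mkdir -p /data/clickhouse && chown -R clickhouse:clickhouse /data/clickhouse):
<path>/data/clickhouse/</path>
<tmp_path>/data/clickhouse/tmp/</tmp_path>
<user_files_path>/data/clickhouse/user_files/</user_files_path>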
4. Configuration Changes
1) Edit config.xml on all four nodes
Because the cluster nodes must access each other's services, the ClickHouse server needs to listen on a reachable IP and port. On all four nodes, edit /etc/clickhouse-server/config.xml and enable the <listen_host> tag under <yandex> (around lines 69-70). The configuration is as follows:
<?xml version="1.0"?>
<yandex>
    <logger>
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
    </logger>
    <http_port>8123</http_port>
    <tcp_port>9006</tcp_port>
    <openSSL>
        <server>
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>
        <client>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <invalidCertificateHandler>
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>
    <interserver_http_port>9009</interserver_http_port>
    <!-- Set this to the node's own hostname (cdhserver1 on node 1, and so on) -->
    <interserver_http_host>cdhserver1</interserver_http_host>
    <listen_host>0.0.0.0</listen_host>
    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>
    <max_concurrent_queries>100</max_concurrent_queries>
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>
    <mark_cache_size>5368709120</mark_cache_size>
    <path>/data/clickhouse/</path>
    <tmp_path>/data/clickhouse/tmp/</tmp_path>
    <user_files_path>/data/clickhouse/user_files/</user_files_path>
    <users_config>users.xml</users_config>
    <default_profile>default</default_profile>
    <default_database>default</default_database>
    <mlock_executable>false</mlock_executable>
    <!-- Path to the extension config file (around line 229) -->
    <include_from>/etc/clickhouse-server/metrika.xml</include_from>
    <remote_servers incl="clickhouse_remote_servers" >
    </remote_servers>
    <zookeeper incl="zookeeper-servers" optional="true" />
    <macros incl="macros" optional="true" />
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>
    <max_session_timeout>3600</max_session_timeout>
    <default_session_timeout>60</default_session_timeout>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
    <trace_log>
        <database>system</database>
        <table>trace_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </trace_log>
    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>
    <dictionaries_config>*_dictionary.xml</dictionaries_config>
    <compression incl="clickhouse_compression">
    </compression>
    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>
    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>
</yandex>
Supplement:
[root@cdhserver1 ~]# vim /etc/clickhouse-server/config.xml
<http_port>8123</http_port>
<tcp_port>9006</tcp_port>
<listen_host>::</listen_host>
<!-- <listen_host>::1</listen_host> -->
<!-- <listen_host>127.0.0.1</listen_host> -->
<!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
<!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->
<!-- Set the time zone to UTC+8 (around line 144) -->
<timezone>Asia/Shanghai</timezone>
<!-- Path to the extension config file; add this around line 226 -->
<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
-->
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
2) Edit users.xml on all four nodes to add a user
Set up user authentication. A password can be configured in two ways: plaintext, or a sha256 hash; the official recommendation is the hashed form.
[root@cdhserver1 ~]# vim /etc/clickhouse-server/users.xml
Edit /etc/clickhouse-server/users.xml and add the following above <!-- Example of user with readonly access. --> (the hash below is the sha256 of the password 123456, which matches the password used in metrika.xml later):
<ck>
    <password_sha256_hex>8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92</password_sha256_hex>
    <networks incl="networks" replace="replace">
        <ip>::/0</ip>
    </networks>
    <profile>default</profile>
    <quota>default</quota>
</ck>
To generate a sha256 hash, run the command below (first line). It prints two more lines: the second line is the plaintext password (what clients log in with), the third is the hash (what goes into the config file):
[root@cdhserver1 clickhouse-server]# PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
zjisCtGb
2451b6a1bd72c59fd96972ae8dd1c4597257679033794d4c65e6a4dd49997dd6
3) On all four nodes, create metrika.xml with the cluster sharding configuration under /etc/clickhouse-server
[root@cdhserver1 ~]# vim /etc/clickhouse-server/metrika.xml
Add the following content:
<?xml version="1.0"?>
<yandex>
    <!-- ClickHouse cluster nodes -->
    <clickhouse_remote_servers>
        <!-- the cluster name defined here is idc_cluster -->
        <idc_cluster>
            <!-- shard 1 -->
            <shard>
                <!-- Shard weight, default 1. The official advice is not to set this too high: the higher a shard's weight, the more data is written to it -->
                <weight>1</weight>
                <replica>
                    <host>cdhserver2</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
            <!-- shard 2 -->
            <shard>
                <weight>1</weight>
                <replica>
                    <host>cdhserver3</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
            <!-- shard 3 -->
            <shard>
                <weight>1</weight>
                <replica>
                    <host>cdhserver4</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
        </idc_cluster>
    </clickhouse_remote_servers>
    <!-- ZooKeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>cdhserver2</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>cdhserver3</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>cdhserver4</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- shard/replica identity of this node -->
    <macros>
        <replica>cdhserver1</replica>
    </macros>
    <!-- allow remote access -->
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- compression settings -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method> <!-- lz4 compresses faster than zstd but uses more disk -->
        </case>
    </clickhouse_compression>
</yandex>
Note: the <macros> section must be adjusted on each node; it identifies which shard and replica that node hosts:
cdhserver2, shard 1, replica 1:
<macros> <layer>01</layer> <shard>01</shard> <replica>cdhserver2</replica> </macros>
cdhserver3, shard 2, replica 1:
<macros> <layer>01</layer> <shard>02</shard> <replica>cdhserver3</replica> </macros>
cdhserver4, shard 3, replica 1:
<macros> <layer>01</layer> <shard>03</shard> <replica>cdhserver4</replica> </macros>
Supplement: a multi-shard, multi-replica metrika.xml for a ClickHouse cluster (for example: 3 shards, 2 replicas each, on 6 instances)
ck1: instance 1, ports: tcp 9006, http 8123, sync port 9006, role: shard 1, replica 1
ck2: instance 2, ports: tcp 9006, http 8123, sync port 9006, role: shard 2, replica 1
ck3: instance 3, ports: tcp 9006, http 8123, sync port 9006, role: shard 3, replica 1
ck4: instance 4, ports: tcp 9006, http 8123, sync port 9006, role: shard 1, replica 2 (replica of ck1)
ck5: instance 5, ports: tcp 9006, http 8123, sync port 9006, role: shard 2, replica 2 (replica of ck2)
ck6: instance 6, ports: tcp 9006, http 8123, sync port 9006, role: shard 3, replica 2 (replica of ck3)
[root@ck1~]# vim /etc/clickhouse-server/metrika.xml
Add the following content:
<!-- All instances use this same cluster configuration; nothing is per-instance except <macros> -->
<yandex>
    <!-- Cluster configuration: clickhouse_remote_servers is identical on all instances -->
    <clickhouse_remote_servers>
        <!-- the cluster name defined here is idc_cluster -->
        <idc_cluster>
            <!-- data shard 1 -->
            <shard>
                <!-- For a distributed table, only one suitable replica within this shard is chosen for writes. If the local tables use the ReplicatedMergeTree engine, replication between replicas is handled by the engine itself -->
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck1</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck4</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- data shard 2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck2</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck5</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- data shard 3 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck3</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck6</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
        </idc_cluster>
    </clickhouse_remote_servers>
    <!-- ZooKeeper: zookeeper-servers is identical on all instances -->
    <zookeeper-servers>
        <node index="1">
            <host>192.168.10.66</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>192.168.10.57</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>192.168.10.17</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- macros: shard/replica identity, different on every instance. Here: shard 1, replica 1 -->
    <macros>
        <layer>01</layer>
        <shard>01</shard>
        <replica>cluster01-01</replica>
    </macros>
    <!-- allow remote access -->
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- data compression -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method> <!-- lz4 compresses faster than zstd but uses more disk -->
        </case>
    </clickhouse_compression>
</yandex>
Note: the <macros> section must be adjusted on each instance:
ck2, shard 2, replica 1:
<macros> <layer>01</layer> <shard>02</shard> <replica>cluster02-01</replica> </macros>
ck3, shard 3, replica 1:
<macros> <layer>01</layer> <shard>03</shard> <replica>cluster03-01</replica> </macros>
ck4, shard 1, replica 2:
<macros> <layer>01</layer> <shard>01</shard> <replica>cluster01-02</replica> </macros>
ck5, shard 2, replica 2:
<macros> <layer>01</layer> <shard>02</shard> <replica>cluster02-02</replica> </macros>
ck6, shard 3, replica 2:
<macros> <layer>01</layer> <shard>03</shard> <replica>cluster03-02</replica> </macros>
5. Service Start/Stop
The ClickHouse server service is controlled as follows:
# 1 Start (logs can be inspected under /var/log/clickhouse-server/)
# sudo /etc/init.d/clickhouse-server start
systemctl start clickhouse-server
# 2 Check status
systemctl status clickhouse-server
# 3 Restart
systemctl restart clickhouse-server
# 4 Stop
systemctl stop clickhouse-server
Foreground start:
[root@cdhserver1 software]# clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Background start:
[root@cdhserver1 software]# nohup clickhouse-server --config-file=/etc/clickhouse-server/config.xml >/dev/null 2>&1 &
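Once the service is up, a quick smoke test against the custom TCP port (using the ck user configured above) confirms the node is answering:
clickhouse-client -h 127.0.0.1 --port 9006 -u ck --password 123456 -q "SELECT version(), now()"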
================================================= ClickHouse: multiple instances on a single node =======================================
1) Copy /etc/clickhouse-server/config.xml under a new name
[root@ck1 clickhouse-server]# cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config9002.xml
2) Edit /etc/clickhouse-server/config9002.xml, changing the following values to keep the two services apart
Original values in config9002.xml:
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<mysql_port>9004</mysql_port>
<interserver_http_port>9009</interserver_http_port>
<path>/data/clickhouse/</path>
<tmp_path>/data/clickhouse/tmp/</tmp_path>
<user_files_path>/data/clickhouse/user_files/</user_files_path>
<access_control_path>/data/clickhouse/access/</access_control_path>
<include_from>/etc/clickhouse-server/metrika.xml</include_from> <!-- cluster config file -->
Adjusted values in config9002.xml:
<log>/var/log/clickhouse-server/clickhouse-server-9002.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server-9002.err.log</errorlog>
<http_port>8124</http_port>
<tcp_port>9002</tcp_port>
<mysql_port>9005</mysql_port>
<interserver_http_port>9010</interserver_http_port>
<path>/data/clickhouse9002/</path>
<tmp_path>/data/clickhouse9002/tmp/</tmp_path>
<user_files_path>/data/clickhouse9002/user_files/</user_files_path>
<access_control_path>/data/clickhouse9002/access/</access_control_path>
<include_from>/etc/clickhouse-server/metrika9002.xml</include_from>
3) Create the corresponding directories
[root@ck1 clickhouse-server]# mkdir -p /data/clickhouse9002
[root@ck1 clickhouse-server]# chown -R clickhouse:clickhouse /data/clickhouse9002
4) Add a service start script for the new instance
[root@ck1 init.d]# cp /etc/init.d/clickhouse-server /etc/init.d/clickhouse-server9002
[root@ck1 init.d]# vim /etc/init.d/clickhouse-server9002
Before:
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config.xml
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM.pid"
After:
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config9002.xml
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM-9002.pid"
5) Start the high-availability ClickHouse cluster
[root@ck1 init.d]# systemctl start clickhouse-server
[root@ck1 init.d]# systemctl start clickhouse-server9002
Note: for the remaining configuration, follow the installation steps above.
==============================================================================================================
V. Client Tools
5.1 clickhouse-client
# 1 When no password is set
clickhouse-client
# 2 Specify user and password
clickhouse-client -h 127.0.0.1 -u ck --password 123456
clickhouse-client -h 127.0.0.1 --port 9006 -u ck --password 123456 --multiline
# Run a single SQL statement
clickhouse-client -h 127.0.0.1 --port 9006 -u ck --password 123456 --multiline -q "SELECT now()"
-- View cluster information (sample output; the cluster name and port reflect the environment it was captured in)
cdhserver1 :) SELECT * FROM system.clusters;

SELECT * FROM system.clusters

┌─cluster─────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──┬─host_address─┬─port─┬─is_local─┬─user─┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ test_ck_cluster │ 1 │ 1 │ 1 │ cdhserver2 │ 10.0.0.237 │ 9000 │ 0 │ ck │ │ 0 │ 0 │
│ test_ck_cluster │ 1 │ 1 │ 2 │ cdhserver3 │ 10.0.0.238 │ 9000 │ 0 │ ck │ │ 0 │ 0 │
│ test_ck_cluster │ 2 │ 1 │ 1 │ cdhserver3 │ 10.0.0.238 │ 9000 │ 0 │ ck │ │ 0 │ 0 │
│ test_ck_cluster │ 2 │ 1 │ 2 │ cdhserver4 │ 10.0.0.239 │ 9000 │ 0 │ ck │ │ 0 │ 0 │
│ test_ck_cluster │ 3 │ 1 │ 1 │ cdhserver4 │ 10.0.0.239 │ 9000 │ 0 │ ck │ │ 0 │ 0 │
│ test_ck_cluster │ 3 │ 1 │ 2 │ cdhserver2 │ 10.0.0.237 │ 9000 │ 0 │ ck │ │ 0 │ 0 │
└─────────────────┴───────────┴──────────────┴─────────────┴────────────┴──────────────┴──────┴──────────┴──────┴──────────────────┴──────────────┴─────────────────────────┘

6 rows in set. Elapsed: 2.683 sec.
5.2 DBeaver
Create a new connection:
Under All (or Analytical), choose ClickHouse, then Next.
The default port is 8123; for the host, use any ClickHouse server node (with a cluster, any node will do). Fill in the user name and password under authentication.
Test the connection; when prompted to download the driver, confirm the download.
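DBeaver talks to ClickHouse through the JDBC driver, so the resulting connection URL takes the usual JDBC form; for example, against this guide's first node:
jdbc:clickhouse://10.0.0.236:8123/default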
VI. Distributed DDL Operations
The key is the ON CLUSTER clause followed by the cluster name from the configuration: ClickHouse then executes the DDL statement on every node according to the cluster configuration. Besides distributed tables, ON CLUSTER can also be used when creating local tables (*_local), so that creating the table on one node creates the same local table on all other nodes of the cluster.
1. Add a column to a table
alter TABLE idc.web_initial ON CLUSTER idc_cluster add COLUMN tracerouteip String AFTER jitter;
2. Change a column's type
alter TABLE idc.web_initial ON CLUSTER idc_cluster modify column tracerouteip UInt16;
3. Drop a column
alter TABLE idc.web_initial ON CLUSTER idc_cluster drop column tracerouteip;
4. Drop the same table on multiple cluster nodes
drop table tabl on cluster clickhouse_cluster;
drop TABLE if exists idc.web_initial on CLUSTER idc_cluster ;
5. Truncate a table's data across the cluster
truncate table lmmbase.user_label_uid on cluster crm_4shards_1replicas;
6. Create a database on the cluster
CREATE DATABASE IF NOT EXISTS yhw ON CLUSTER idc_cluster;
7. Delete time-partitioned data across the cluster
Time-based partitioning:
toYYYYMM(EventDate): partition by month
toMonday(EventDate): partition by week
toDate(EventDate): partition by day
Method 1:
ALTER TABLE baip.speed_sdk_info ON CLUSTER idc_cluster DELETE WHERE toDate(report_time)='2020-08-03';
Method 2:
ALTER TABLE baip.speed_sdk_info ON CLUSTER idc_cluster DELETE WHERE report_time<=1596470399;
Method 3 (when the first two methods fail to remove the partition data):
ALTER TABLE baip.speed_sdk_info ON CLUSTER idc_cluster DROP PARTITION '2020-08-03';
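Before dropping, you can list the table's active parts to confirm which partitions will be affected (a small check against the same table):
SELECT partition, name, rows FROM system.parts WHERE database = 'baip' AND table = 'speed_sdk_info' AND active;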
8. Create a distributed table on the cluster
1. Create the local table
drop TABLE if exists idc.web_element_detail_dist on CLUSTER idc_cluster;
drop TABLE if exists idc.web_element_detail on CLUSTER idc_cluster;
CREATE TABLE if not exists idc.web_element_detail on CLUSTER idc_cluster (
    `task_id` UInt64 COMMENT 'dial-test task id',
    `target` String COMMENT 'domain/URL',
    `target_name` String COMMENT 'site name',
    `element` String COMMENT 'element name',
    `report_time` DateTime COMMENT 'report time',
    `net_type` String COMMENT 'network access type',
    `probe_id` String COMMENT 'probe id',
    `opt_type` String COMMENT 'carrier type',
    `opt_name` String COMMENT 'carrier name',
    `province_id` UInt32 COMMENT 'province code',
    `province_name` String COMMENT 'province name',
    `city_id` UInt32 COMMENT 'city code',
    `city_name` String COMMENT 'city name',
    `area_id` UInt32 COMMENT 'county code',
    `area_name` String COMMENT 'county name',
    `busi_type` String COMMENT 'business type',
    `element_num` String COMMENT 'element count',
    `idc_ip` String COMMENT 'target ip address',
    `idc_delay` Float32 COMMENT 'idc latency',
    `idc_size` Float32 COMMENT 'idc size',
    `ip_opt_type` String COMMENT 'target carrier type',
    `ip_opt_name` String COMMENT 'target carrier name',
    `ip_province_id` UInt32 COMMENT 'target IP province code',
    `ip_province_name` String COMMENT 'target IP province name',
    `ip_city_id` UInt32 COMMENT 'target IP city code',
    `ip_city_name` String COMMENT 'target IP city name',
    `ip_area_id` UInt32 COMMENT 'target IP county code',
    `ip_area_name` String COMMENT 'target IP county name',
    `five_min` UInt32,
    `ten_min` UInt32,
    `half_hour` UInt32,
    `one_hour` UInt32,
    `four_hour` UInt32,
    `half_day` UInt32
) ENGINE = MergeTree()
PARTITION BY (task_id, toYYYYMMDD(report_time))
ORDER BY (target, report_time)
SETTINGS index_granularity = 8192;
2. Create the distributed table
CREATE TABLE idc.web_element_detail_dist on CLUSTER idc_cluster AS idc.web_element_detail
ENGINE = Distributed(idc_cluster, idc, web_element_detail, rand());
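A quick way to see the routing in action (a minimal sketch; the inserted values are made up): write through the distributed table, then compare the merged count with each node's local count.
INSERT INTO idc.web_element_detail_dist (task_id, target, report_time) VALUES (1, 'www.example.com', now());
SELECT count() FROM idc.web_element_detail_dist; -- merged result across all shards
SELECT count() FROM idc.web_element_detail;      -- run per node: only that shard's rows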
9. Importing and Exporting Data
Importing data:
The SQL syntax is INSERT INTO <table> FORMAT <format>; input formats mirror the output formats. CSV and JSON are shown here as examples, other formats work the same way.
--max_insert_block_size=100000     # batch block size for the import
--format_csv_delimiter=$'\001'     # field delimiter
--input_format_allow_errors_num    # number of errors tolerated
--input_format_allow_errors_ratio  # error ratio tolerated, in the range [0-1]
cat ~/csv_fileName.csv | clickhouse-client --host=ip --port=19000 --user=username --password=pwd --max_insert_block_size=100000 --format_csv_delimiter=$'\001' --query="INSERT INTO table_name FORMAT CSVWithNames"
cat ~/csv_fileName.csv | clickhouse-client --host=ip --port=19000 --user=username --password=pwd --format_csv_delimiter=$'\001' --query="INSERT INTO table_name FORMAT CSVWithNames"
cat tablename.csv | clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --max_insert_block_size=10000 --query="INSERT INTO yhw.speed_sdk_info FORMAT CSVWithNames"
cat /web/tablename.csv | clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --format_csv_delimiter=$'\001' --query="INSERT INTO yhw.speed_sdk_info FORMAT CSV"
clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --query='INSERT INTO yhw.speed_sdk_info FORMAT CSV' < /web/tablename.csv
clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --query "INSERT INTO yhw.speed_sdk_info FORMAT JSONEachRow" < /web/tablename.json
Exporting data:
-- 1 Create the supplier table
CREATE TABLE supplier(
    S_SUPPKEY UInt32,
    S_NAME String,
    S_ADDRESS String,
    S_CITY LowCardinality(String),
    S_NATION LowCardinality(String),
    S_REGION LowCardinality(String),
    S_PHONE String
) ENGINE = MergeTree ORDER BY S_SUPPKEY;
-- 2 Load data into the supplier table
clickhouse-client -h 127.0.0.1 --port 9000 -u default --password KavrqeN1 --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl
# 1 Data export
# --query="SQL"; the SQL takes the form SELECT * FROM <table> FORMAT <output format>
# 1.1 CSV format, exporting selected columns
clickhouse-client -h 10.0.0.239 --port 9006 -u ck --password 123456 --query "SELECT S_SUPPKEY, S_NAME, S_ADDRESS, S_CITY, S_NATION, S_REGION, S_PHONE FROM supplier FORMAT CSV" > /opt/supplier.tb0.csv
# 1.2 CSV format, exporting all columns
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT CSV" > /opt/supplier.tb1.csv
# Inspect the exported file
head -n 5 supplier.tb1.csv
1,"Supplier#000000001","sdrGnXCDRcfriBvY0KL,i","PERU 0","PERU","AMERICA","27-989-741-2988"
2,"Supplier#000000002","TRMhVHz3XiFu","ETHIOPIA 1","ETHIOPIA","AFRICA","15-768-687-3665"
3,"Supplier#000000003","BZ0kXcHUcHjx62L7CjZS","ARGENTINA7","ARGENTINA","AMERICA","11-719-748-3364"
4,"Supplier#000000004","qGTQJXogS83a7MB","MOROCCO 4","MOROCCO","AFRICA","25-128-190-5944"
5,"Supplier#000000005","lONEYAh9sF","IRAQ 5","IRAQ","MIDDLE EAST","21-750-942-6364"
# 1.3 CSV with a header row, all columns
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT CSVWithNames" > /opt/supplier.tb2.csv
# Inspect the exported file
head -n 5 supplier.tb2.csv
"S_SUPPKEY","S_NAME","S_ADDRESS","S_CITY","S_NATION","S_REGION","S_PHONE"
1,"Supplier#000000001","sdrGnXCDRcfriBvY0KL,i","PERU 0","PERU","AMERICA","27-989-741-2988"
2,"Supplier#000000002","TRMhVHz3XiFu","ETHIOPIA 1","ETHIOPIA","AFRICA","15-768-687-3665"
3,"Supplier#000000003","BZ0kXcHUcHjx62L7CjZS","ARGENTINA7","ARGENTINA","AMERICA","11-719-748-3364"
4,"Supplier#000000004","qGTQJXogS83a7MB","MOROCCO 4","MOROCCO","AFRICA","25-128-190-5944"
# 1.4 Tab-separated export
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TabSeparated" > /opt/supplier.tb3.txt
# Inspect the exported file
head -n 5 supplier.tb3.txt
1 Supplier#000000001 sdrGnXCDRcfriBvY0KL,i PERU 0 PERU AMERICA 27-989-741-2988
2 Supplier#000000002 TRMhVHz3XiFu ETHIOPIA 1 ETHIOPIA AFRICA 15-768-687-3665
3 Supplier#000000003 BZ0kXcHUcHjx62L7CjZS ARGENTINA7 ARGENTINA AMERICA 11-719-748-3364
4 Supplier#000000004 qGTQJXogS83a7MB MOROCCO 4 MOROCCO AFRICA 25-128-190-5944
5 Supplier#000000005 lONEYAh9sF IRAQ 5 IRAQ MIDDLE EAST 21-750-942-6364
# 1.5 Tab-separated with a header row. TabSeparatedWithNames is equivalent to TSVWithNames
# When such a file is parsed, the first row is skipped entirely
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TSVWithNames" > /opt/supplier.tb4.txt
# Inspect the exported file
head -n 5 supplier.tb4.txt
S_SUPPKEY S_NAME S_ADDRESS S_CITY S_NATION S_REGION S_PHONE
1 Supplier#000000001 sdrGnXCDRcfriBvY0KL,i PERU 0 PERU AMERICA 27-989-741-2988
2 Supplier#000000002 TRMhVHz3XiFu ETHIOPIA 1 ETHIOPIA AFRICA 15-768-687-3665
3 Supplier#000000003 BZ0kXcHUcHjx62L7CjZS ARGENTINA7 ARGENTINA AMERICA 11-719-748-3364
4 Supplier#000000004 qGTQJXogS83a7MB MOROCCO 4 MOROCCO AFRICA 25-128-190-5944
# 1.6 Tab-separated with header and type rows. TabSeparatedWithNamesAndTypes is equivalent to TSVWithNamesAndTypes
# When such a file is parsed, the first two rows are skipped entirely
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TabSeparatedWithNamesAndTypes" > /opt/supplier.tb5.txt
# Inspect the exported file
head -n 5 supplier.tb5.txt
S_SUPPKEY S_NAME S_ADDRESS S_CITY S_NATION S_REGION S_PHONE
UInt32 String String LowCardinality(String) LowCardinality(String) LowCardinality(String) String
1 Supplier#000000001 sdrGnXCDRcfriBvY0KL,i PERU 0 PERU AMERICA 27-989-741-2988
2 Supplier#000000002 TRMhVHz3XiFu ETHIOPIA 1 ETHIOPIA AFRICA 15-768-687-3665
3 Supplier#000000003 BZ0kXcHUcHjx62L7CjZS ARGENTINA7 ARGENTINA AMERICA 11-719-748-3364
# 1.7 KV output per row, like TabSeparated but in name=value form
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT TSKV" > /opt/supplier.tb6.txt
# Inspect the exported file
head -n 5 supplier.tb6.txt
S_SUPPKEY=1 S_NAME=Supplier#000000001 S_ADDRESS=sdrGnXCDRcfriBvY0KL,i S_CITY=PERU 0 S_NATION=PERU S_REGION=AMERICA S_PHONE=27-989-741-2988
S_SUPPKEY=2 S_NAME=Supplier#000000002 S_ADDRESS=TRMhVHz3XiFu S_CITY=ETHIOPIA 1 S_NATION=ETHIOPIA S_REGION=AFRICA S_PHONE=15-768-687-3665
S_SUPPKEY=3 S_NAME=Supplier#000000003 S_ADDRESS=BZ0kXcHUcHjx62L7CjZS S_CITY=ARGENTINA7 S_NATION=ARGENTINA S_REGION=AMERICA S_PHONE=11-719-748-3364
S_SUPPKEY=4 S_NAME=Supplier#000000004 S_ADDRESS=qGTQJXogS83a7MB S_CITY=MOROCCO 4 S_NATION=MOROCCO S_REGION=AFRICA S_PHONE=25-128-190-5944
S_SUPPKEY=5 S_NAME=Supplier#000000005 S_ADDRESS=lONEYAh9sF S_CITY=IRAQ 5 S_NATION=IRAQ S_REGION=MIDDLE EAST S_PHONE=21-750-942-6364
# 1.8 Each row printed as a tuple, tuples separated by commas
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier LIMIT 3 FORMAT Values" > /opt/supplier.tb7.txt
# Inspect the exported file
head supplier.tb7.txt
(1,'Supplier#000000001','sdrGnXCDRcfriBvY0KL,i','PERU 0','PERU','AMERICA','27-989-741-2988'),(2,'Supplier#000000002','TRMhVHz3XiFu','ETHIOPIA 1','ETHIOPIA','AFRICA','15-768-687-3665'),(3,'Supplier#000000003','BZ0kXcHUcHjx62L7CjZS','ARGENTINA7','ARGENTINA','AMERICA','11-719-748-3364')
# 1.9 Each row printed as JSON
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT JSONEachRow" > /opt/supplier.tb8.json
# Inspect the exported file
head -n 5 supplier.tb8.json
{"S_SUPPKEY":1,"S_NAME":"Supplier#000000001","S_ADDRESS":"sdrGnXCDRcfriBvY0KL,i","S_CITY":"PERU 0","S_NATION":"PERU","S_REGION":"AMERICA","S_PHONE":"27-989-741-2988"}
{"S_SUPPKEY":2,"S_NAME":"Supplier#000000002","S_ADDRESS":"TRMhVHz3XiFu","S_CITY":"ETHIOPIA 1","S_NATION":"ETHIOPIA","S_REGION":"AFRICA","S_PHONE":"15-768-687-3665"}
{"S_SUPPKEY":3,"S_NAME":"Supplier#000000003","S_ADDRESS":"BZ0kXcHUcHjx62L7CjZS","S_CITY":"ARGENTINA7","S_NATION":"ARGENTINA","S_REGION":"AMERICA","S_PHONE":"11-719-748-3364"}
{"S_SUPPKEY":4,"S_NAME":"Supplier#000000004","S_ADDRESS":"qGTQJXogS83a7MB","S_CITY":"MOROCCO 4","S_NATION":"MOROCCO","S_REGION":"AFRICA","S_PHONE":"25-128-190-5944"}
{"S_SUPPKEY":5,"S_NAME":"Supplier#000000005","S_ADDRESS":"lONEYAh9sF","S_CITY":"IRAQ 5","S_NATION":"IRAQ","S_REGION":"MIDDLE EAST","S_PHONE":"21-750-942-6364"}
# 1.10 Row-by-row binary format
clickhouse-client -h 10.0.0.239 --port 9006 -u default --password KavrqeN1 --query "SELECT * FROM supplier FORMAT RowBinary" > /opt/supplier.tb9.dat
VII. Testing: Creating Local Replicated Tables and Distributed Tables
1. Shards and Replicas
Before creating distributed tables, let's take a closer look at ClickHouse shards and replicas. In general, one replica corresponds to one host, a shard can be configured with multiple replicas, and a cluster can define multiple shards; ClickHouse can even be set up with multiple sub-clusters, so the configuration is quite flexible. Our configuration file contains:
<?xml version="1.0"?>
<yandex>
    <!-- ClickHouse cluster nodes -->
    <clickhouse_remote_servers>
        <!-- the cluster name defined here is idc_cluster -->
        <idc_cluster>
            <!-- For a distributed table, only one suitable replica within a shard is chosen for writes. If the local tables use the ReplicatedMergeTree engine, replication between replicas is handled by the engine itself -->
            <shard>
                <!-- Shard weight, default 1. The official advice is not to set this too high: the higher a shard's weight, the more data is written to it -->
                <weight>1</weight>
                <replica>
                    <host>cdhserver2</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
            <!-- shard 2 -->
            <shard>
                <weight>1</weight>
                <replica>
                    <host>cdhserver3</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
            <!-- shard 3 -->
            <shard>
                <weight>1</weight>
                <replica>
                    <host>cdhserver4</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
        </idc_cluster>
    </clickhouse_remote_servers>
</yandex>
ClickHouse shards are expressed through replicas: if a shard defines a single replica, that replica effectively is the shard; if a shard defines multiple replicas, one of them is elected leader (via ZooKeeper) and the other replicas of the same shard synchronize from it. A table is split horizontally into multiple shards written to the disks of multiple nodes, which provides horizontal scaling and partition tolerance. Data replication between replicas is handled by the ReplicatedMergeTree engine.
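As a sketch of how the <macros> defined in metrika.xml come into play (the table and columns here are illustrative, borrowed from the idc.web_initial table used in the DDL examples above): {layer}/{shard}/{replica} are substituted per node, so all replicas of a shard share one ZooKeeper path while each replica keeps a unique name.
CREATE TABLE idc.web_initial_local ON CLUSTER idc_cluster (
    report_time DateTime,
    target String,
    jitter Float32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/web_initial', '{replica}')
PARTITION BY toYYYYMMDD(report_time)
ORDER BY (target, report_time);

CREATE TABLE idc.web_initial_all ON CLUSTER idc_cluster AS idc.web_initial_local
ENGINE = Distributed(idc_cluster, idc, web_initial_local, rand());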
2. Understanding Distributed Tables
A Distributed table stores no data itself; it acts as a router. When creating one you specify the cluster name, database name, table name and sharding key (here rand(), meaning random sharding). Queries against a distributed table are routed to the concrete local tables according to the cluster configuration, and the results are merged. At creation time, no consistency check is done against the local table schema; as with Hive, the data is validated against the schema only on read, and an error is raised on mismatch.
Naming conventions for distributed tables:
Local table: usually suffixed _local. Local tables carry the actual data and may use any non-Distributed table engine; in a distributed table, each local table corresponds to one data shard.
Distributed table: usually suffixed _all. It may only use the Distributed engine and maps one-to-many onto the local tables; once created, data operations can go through the distributed table to the underlying local tables.
3. Sharding Rules
The syntax for creating a distributed table is as follows:
-- clusterName: the cluster name, i.e. the tag name (such as <perftest_3shards_1replicas>), freely chosen in the config
-- databases: the database name
-- table: the table name
-- sharding_key: the sharding key, optional
Distributed(clusterName, databases, table[, sharding_key[, policy_name]])
As the definition shows, we can specify a sharding key, which determines the sharding rule. The sharding key must return an integer value (of the Int or UInt families):
Default: if no sharding key is declared, the distributed table can contain only one shard, i.e. it can map only a single local table; otherwise writes will fail.
Shard by a column value: Distributed(cluster, databases, table, userId)
Shard randomly: Distributed(cluster, databases, table, rand())
Shard by a column's hash: Distributed(cluster, databases, table, intHash64(userId))
Distribution is also affected by the shard weight (weight), set when defining the cluster shards (default weight=1). The weight skews how data lands on the shards: the higher a shard's weight, the more data is written to it, as sketched below.
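For example (a sketch assuming two shards, with the replica blocks abbreviated to host and port only): with the weights below, shard 1 receives roughly 2/3 of the rows and shard 2 roughly 1/3, since rows are assigned in proportion to weight.
<shard>
    <weight>2</weight>
    <replica><host>cdhserver2</host><port>9006</port></replica>
</shard>
<shard>
    <weight>1</weight>
    <replica><host>cdhserver3</host><port>9006</port></replica>
</shard>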
VIII. Optimization
1. max_table_size_to_drop
By default, ClickHouse refuses to drop a partition or table larger than 50 GB. The limit can be changed permanently in the server config file, or bypassed temporarily without restarting the service.
Permanent configuration
sudo vim /etc/clickhouse-server/config.xml
Uncomment the following two lines:
<max_table_size_to_drop>0</max_table_size_to_drop>
<max_partition_size_to_drop>0</max_partition_size_to_drop>
0 means no limit; alternatively, set them to the maximum size you want to allow.
Temporary override
Create a flag file (the flags directory lives under the data path configured in <path>, e.g. /data/clickhouse/flags/ in this guide):
sudo touch '/data/clickhouse/flags/force_drop_table' && sudo chmod 666 '/data/clickhouse/flags/force_drop_table'
Once the file exists, the DROP of the oversized partition or table can be executed.
2. max_memory_usage
This parameter lives in /etc/clickhouse-server/users.xml and caps the memory a single query may use; a query that exceeds it fails. If resources allow, raise it as far as is reasonable:
<max_memory_usage>25000000000</max_memory_usage>
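The same limit can also be overridden per session without editing users.xml, for example:
SET max_memory_usage = 25000000000;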
3. Dropping the same table on multiple nodes
drop table tabl on cluster clickhouse_cluster;