ClickHouse cluster deployment: three shards, one replica each

Reposted article, dated 2020-06-16 11:54. Category: Clickhouse

Personal study notes; please do not repost.
Original: https://www.cnblogs.com/wshenjin/p/13140191.html

A simple distributed setup using MergeTree + Distributed, with three shards and one replica per shard.

Node IPs

192.168.31.101
192.168.31.102
192.168.31.103

Deploying the ClickHouse cluster

Installing ClickHouse on the three nodes is not covered here.

Relevant settings in config.xml:

    <!-- Path to data directory, with trailing slash. -->
    <path>/data/database/clickhouse/</path>
    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/data/database/clickhouse/tmp/</tmp_path>
    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/data/database/clickhouse/user_files/</user_files_path>
    <!-- Directory in <clickhouse-path> containing schema files for various input formats.The directory will be created if it doesn't exist.-->
    <format_schema_path>/data/database/clickhouse/format_schemas/</format_schema_path>
    <!-- Same for hosts with disabled ipv6.-->
    <listen_host>0.0.0.0</listen_host> 
    <timezone>Asia/Shanghai</timezone> 
    <!-- Cluster settings can be supplied from an external include file, so this element is left empty here -->
    <remote_servers incl="clickhouse_remote_servers" > </remote_servers>
    <!-- External include file -->
    <include_from>/etc/clickhouse-server/metrika.xml</include_from>

Configure metrika.xml:

<yandex>
    <!-- Cluster configuration -->
    <clickhouse_remote_servers>
        <!-- Cluster name (three shards, one replica each); the name can be chosen freely -->
        <ckcluster_3shards_1replicas>
            <!-- Data shard 1 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.31.101</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!-- Data shard 2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.31.102</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!-- Data shard 3 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.31.103</host>
                    <port>9000</port>
                </replica>
            </shard>
        </ckcluster_3shards_1replicas>
    </clickhouse_remote_servers>
    <!-- Compression settings -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>

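Notes:

clickhouse_remote_servers matches the value of the incl attribute in config.xml.
ckcluster_3shards_1replicas is the cluster name; it can be chosen freely.
shard defines one data shard.
internal_replication controls how a Distributed table writes to a shard that has several replicas. With true, the Distributed table writes each block to a single replica and leaves replication to the replicated table engine; with false, the Distributed table writes to every replica itself. With one replica per shard, as here, the setting makes no practical difference. For production, where replicated tables are the norm, set it to true; other combinations are neither recommended nor needed.
clickhouse_compression configures data compression.
When there are no replicas, the node tag can be used instead of the shard tag to define a node.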
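
As a quick sanity check of the layout described above, the following sketch parses a metrika.xml-style document and reports, for each shard, its replicas and whether internal_replication sits directly under shard rather than inside replica. It is only an illustration: the helper name describe_clusters and the inlined SAMPLE string are assumptions made for this example, not part of ClickHouse.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the cluster section of metrika.xml above.
SAMPLE = """
<yandex>
    <clickhouse_remote_servers>
        <ckcluster_3shards_1replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica><host>192.168.31.101</host><port>9000</port></replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica><host>192.168.31.102</host><port>9000</port></replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica><host>192.168.31.103</host><port>9000</port></replica>
            </shard>
        </ckcluster_3shards_1replicas>
    </clickhouse_remote_servers>
</yandex>
"""


def describe_clusters(xml_text):
    """Return {cluster_name: [shard_info, ...]} for a metrika.xml-style string."""
    root = ET.fromstring(xml_text.strip())
    servers = root.find("clickhouse_remote_servers")
    clusters = {}
    for cluster in servers:
        shards = []
        for shard in cluster.findall("shard"):
            replicas = shard.findall("replica")
            shards.append({
                # Value of the flag when it is a direct child of <shard>, else None.
                "internal_replication": shard.findtext("internal_replication"),
                "replicas": [(r.findtext("host"), int(r.findtext("port")))
                             for r in replicas],
                # True if the flag was nested inside <replica> by mistake.
                "misplaced_flag": any(r.find("internal_replication") is not None
                                      for r in replicas),
            })
        clusters[cluster.tag] = shards
    return clusters


if __name__ == "__main__":
    for name, shards in describe_clusters(SAMPLE).items():
        print(name)
        for number, info in enumerate(shards, start=1):
            print(" shard", number, info)
```

To check a real file, read /etc/clickhouse-server/metrika.xml and pass its text to describe_clusters instead of SAMPLE.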

After restarting every node, log in and check the cluster:

SELECT *
FROM system.clusters

┌─cluster─────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ ckcluster_3shards_1replicas │         1 │            1 │           1 │ 192.168.31.101 │ 192.168.31.101 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ ckcluster_3shards_1replicas │         2 │            1 │           1 │ 192.168.31.102 │ 192.168.31.102 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ ckcluster_3shards_1replicas │         3 │            1 │           1 │ 192.168.31.103 │ 192.168.31.103 │ 9000 │        0 │ default │                  │            0 │                       0 │
└─────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

The cluster is now set up.

Creating the database and tables

Create the database and the local table on every node:

:) create database testdb ;
:) create table testdb.person_local (ID Int8, Name String, BirthDate Date) ENGINE = MergeTree PARTITION BY toYYYYMM(BirthDate) ORDER BY (Name, BirthDate) SETTINGS index_granularity = 8192;

Create the distributed table on every node:

:) create table testdb.person_all as testdb.person_local ENGINE = Distributed(ckcluster_3shards_1replicas, testdb, person_local, rand());

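A Distributed table stores no data itself; it acts as a router. It takes the cluster name, the database name, the table name, and a sharding key.
Here the sharding key is rand(), so rows are assigned to shards at random.
A query against the distributed table is routed to the underlying local tables according to the cluster configuration, and the partial results are then merged.

person_local is the local table; it holds only the data stored on its own node.
person_all is the distributed table; a query against it is computed across the whole cluster and the combined result is returned.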
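
The routing idea can be sketched in a few lines of Python. This is a simplified model, not ClickHouse code: it assumes a row goes to the shard whose weight interval contains the sharding key value modulo the total shard weight, and it stands in for rand() with a random 32-bit integer. The function names pick_shard and simulate are made up for this illustration.

```python
import random
from collections import Counter


def pick_shard(key_value, weights):
    """Map a sharding key value to a 1-based shard number using shard weights."""
    remainder = key_value % sum(weights)
    upper = 0
    for shard_num, weight in enumerate(weights, start=1):
        upper += weight
        # Shard i owns the half-open interval [sum of earlier weights, upper).
        if remainder < upper:
            return shard_num
    raise AssertionError("unreachable: remainder is always below the total weight")


def simulate(rows, weights, seed=0):
    """Count how many rows land on each shard when the key is a random UInt32."""
    rng = random.Random(seed)
    return Counter(pick_shard(rng.getrandbits(32), weights) for _ in range(rows))


if __name__ == "__main__":
    # Three shards with shard_weight = 1 each, as reported by system.clusters.
    counts = simulate(30000, [1, 1, 1])
    for shard_num in sorted(counts):
        print("shard", shard_num, "rows", counts[shard_num])
```

With equal weights each shard should receive roughly a third of the rows, though never exactly a third, which is the behaviour to expect from a rand() sharding key.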

Insert data, then compare the row counts:

# Import 30,000 rows
[root ~]# wc -l /tmp/a.csv  
30000 /tmp/a.csv
[root ~]# clickhouse-client  --host 127.0.0.1 --database testdb  --query="insert into person_all FORMAT CSV"  < /tmp/a.csv 

# Compare the row counts of the distributed table and the local table
ck1 :) select count(*) from person_all ;
┌─count()─┐
│   30000 │
└─────────┘
ck1 :) select count(*) from person_local ;
┌─count()─┐
│   10092 │
└─────────┘
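
The local count of 10092 is close to, but not exactly, 30000 / 3. A rough check, assuming each row independently lands on any of the three shards with probability 1/3 (a binomial model, which is an assumption about rand() rather than something measured here), shows that this deviation is ordinary. The helper shard_count_stats is hypothetical.

```python
import math


def shard_count_stats(total_rows, shards):
    """Mean and standard deviation of one shard's row count under a binomial model."""
    p = 1 / shards
    mean = total_rows / shards
    std = math.sqrt(total_rows * p * (1 - p))
    return mean, std


if __name__ == "__main__":
    mean, std = shard_count_stats(30000, 3)
    observed = 10092  # count(*) on person_local from the output above
    z = (observed - mean) / std
    print(f"expected {mean:.0f} rows, std dev {std:.1f}, observed z-score {z:.2f}")
```

The observed count sits a little over one standard deviation above the mean, well within normal random variation.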
