[ClickHouse] 5: ClickHouse cluster deployment


Background:

Three CentOS 7 servers have ClickHouse installed:

HostName                IP             Installed packages                     Port
centf8118.sharding1.db  192.168.81.18  clickhouse-server, clickhouse-client   9000
centf8119.sharding2.db  192.168.81.19  clickhouse-server, clickhouse-client   9000
centf8120.sharding3.db  192.168.81.20  clickhouse-server, clickhouse-client   9000

 

 

 

 

Brief steps for deploying a ClickHouse cluster:

  1. Install ClickHouse server on all machines of the cluster
  2. Set up cluster configs in configuration files
  3. Create local tables on each instance
  4. Create a Distributed table

A Distributed table is essentially a kind of "view" over the local tables of a ClickHouse cluster. A SELECT query against a Distributed table executes using the resources of all the cluster's shards. You can specify configs for several clusters and create several Distributed tables that provide views over different clusters.

 

Step 1: Install ClickHouse server on all machines of the cluster

Reference: [ClickHouse] 1: ClickHouse installation (CentOS 7)

 

Step 2: Set up the cluster configuration in the configuration files

2.1: First, add the snippet below to /etc/clickhouse-server/config.xml. If it is not configured, the substitutions file defaults to /etc/metrika.xml; here I move it to the /etc/clickhouse-server/ directory.
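The snippet itself is not reproduced as text in this post; a minimal sketch of what it typically looks like (assuming the stock config.xml, which already references the incl name clickhouse_remote_servers) is:

<!-- /etc/clickhouse-server/config.xml -->
<yandex>
    <!-- point the substitutions file at /etc/clickhouse-server/metrika.xml
         instead of the default /etc/metrika.xml -->
    <include_from>/etc/clickhouse-server/metrika.xml</include_from>
    <!-- the cluster definition is pulled in through the incl attribute -->
    <remote_servers incl="clickhouse_remote_servers" />
</yandex>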

 

2.2: Then configure the /etc/clickhouse-server/metrika.xml file as follows, and after configuring it, sync it to the other two machines.

<yandex>
<!-- Cluster configuration -->
<clickhouse_remote_servers>
    <!-- 3 shards, 1 replica each -->
    <cluster_3shards_1replicas>
        <!-- Data shard 1 -->
        <shard>
            <replica>
                <host>centf8118.sharding1.db</host>
                <port>9000</port>
            </replica>
        </shard>
        <!-- Data shard 2 -->
        <shard>
            <replica>
                <host>centf8119.sharding2.db</host>
                <port>9000</port>
            </replica>
        </shard>
        <!-- Data shard 3 -->
        <shard>
            <replica>
                <host>centf8120.sharding3.db</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster_3shards_1replicas>
</clickhouse_remote_servers>
</yandex>

 

Notes:

  • clickhouse_remote_servers corresponds to the value of the incl attribute in config.xml;
  • cluster_3shards_1replicas is the cluster name and can be chosen freely;
  • Three shards are defined in total, each with a single replica.

Open clickhouse-client and inspect the cluster:
centf8118.sharding1.db :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster───────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────────────┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ cluster_3shards_1replicas         │         1 │            1 │           1 │ centf8118.sharding1.db │ 192.168.81.18 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ cluster_3shards_1replicas         │         2 │            1 │           1 │ centf8119.sharding2.db │ 192.168.81.19 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ cluster_3shards_1replicas         │         3 │            1 │           1 │ centf8120.sharding3.db │ 192.168.81.20 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards           │         1 │            1 │           1 │ 127.0.0.1              │ 127.0.0.1     │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards           │         2 │            1 │           1 │ 127.0.0.2              │ 127.0.0.2     │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_localhost │         1 │            1 │           1 │ localhost              │ 127.0.0.1     │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_localhost │         2 │            1 │           1 │ localhost              │ 127.0.0.1     │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_shard_localhost              │         1 │            1 │           1 │ localhost              │ 127.0.0.1     │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_shard_localhost_secure       │         1 │            1 │           1 │ localhost              │ 127.0.0.1     │ 9440 │        0 │ default │                  │            0 │                       0 │
│ test_unavailable_shard            │         1 │            1 │           1 │ localhost              │ 127.0.0.1     │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_unavailable_shard            │         2 │            1 │           1 │ localhost              │ 127.0.0.1     │    1 │        0 │ default │                  │            0 │                       0 │
└───────────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

11 rows in set. Elapsed: 0.004 sec. 

 

You can see that cluster_3shards_1replicas is the cluster we defined, with three shards in total and one copy of the data per shard.

 

 

Step 3: Create local tables on each instance

3.1: Data preparation

Prerequisite: the official OnTime dataset is used here. Download it first and create the table as described in the documentation.

Tutorial: https://clickhouse.yandex/docs/en/single/?query=internal_replication#ontime

There are two ways:

  • import from raw data
  • download of prepared partitions

Method 1: import from raw data

First, download the data:

# cd /data/clickhouse/tmp

for s in `seq 1987 2018`
do
for m in `seq 1 12`
do
wget https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_${s}_${m}.zip
done
done

 

Then create the ontime table:

CREATE TABLE `ontime` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = MergeTree
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;

 

Finally, load the data:

# cd /data/clickhouse/tmp
for i in *.zip; do echo $i; unzip -cq $i '*.csv' | sed 's/\.00//g' | clickhouse-client --host=centf8118.sharding1.db --query="INSERT INTO ontime FORMAT CSVWithNames"; done

  

Method 2: download prepared partitions

In my testing, the second method downloads a bit faster, so I used it here; I did not verify the first one.

 

$ cd /data/clickhouse/tmp/
$ curl -O https://clickhouse-datasets.s3.yandex.net/ontime/partitions/ontime.tar
$ tar xvf ontime.tar -C /data/clickhouse # path to ClickHouse data directory
$ # check permissions of unpacked data, fix if required
$ sudo service clickhouse-server restart
$ clickhouse-client --query "select count(*) from datasets.ontime"

 

After the tar archive is unpacked, the datasets directory is owned by the root user and group. Change the owner and group to clickhouse, then restart the service.

cd /data/clickhouse/data
chown -R clickhouse:clickhouse ./datasets

# The same must be done below; not changing this directory's ownership at first caused the problem described later.
cd /data/clickhouse/metadata
chown -R clickhouse:clickhouse ./datasets 

 

After this runs, a new database is created automatically: datasets, which contains one table: ontime.

Check the total row count:

[root@centf8119 data]# clickhouse-client --query "select count(*) from datasets.ontime"
183953732

  

3.2: Create the data table: ontime_local

The raw data is now in place, in datasets.ontime on 81.19.

Next, create the datasets database on the other two servers (81.18 and 81.20), for example:
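A minimal sketch (the exact statement used was not recorded in the original post):

CREATE DATABASE IF NOT EXISTS datasets;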

Finally, create the table ontime_local in the datasets database on all three servers. Its structure is exactly the same as ontime.

CREATE TABLE `ontime_local` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)

 

The databases and tables on 18 and 20 were created without problems, but creating the ontime_local table on 19 failed with the following error:

Received exception from server (version 20.6.4):
Code: 76. DB::Exception: Received from localhost:9000. DB::Exception: Cannot open file /data/clickhouse/metadata/datasets/ontime_local.sql.tmp, errno: 13, strerror: Permission denied. 

0 rows in set. Elapsed: 0.007 sec. 

 

This is because the files unpacked from the downloaded tar archive are owned by the root user and group, so the clickhouse account has no permission to operate on them.

[root@centf8119 metadata]# pwd
/data/clickhouse/metadata
[root@centf8119 metadata]# ll
total 8
drwxr-xr-x 2 root       root        24 Aug 27 18:25 datasets
drwxr-x--- 2 clickhouse clickhouse  65 Aug 28 10:01 default
-rw-r----- 1 clickhouse clickhouse  42 Aug 26 09:51 default.sql
drwxr-x--- 2 clickhouse clickhouse 133 Aug 26 13:56 system
drwxr-x--- 2 clickhouse clickhouse  30 Aug 26 13:59 testdb
-rw-r----- 1 clickhouse clickhouse  41 Aug 26 13:59 testdb.sql

  

The fix is as follows; restart the service after running it.

cd /data/clickhouse/metadata
chown -R clickhouse:clickhouse ./datasets

  

To avoid this problem entirely, create the datasets database before importing the data; then the issue never comes up.

When a table is created here, its data goes into the /data/clickhouse/data/datasets/ directory, and the CREATE TABLE SQL is saved into the /data/clickhouse/metadata/datasets/ directory. The error above happened at the step of saving the CREATE TABLE SQL, because of the missing permissions.

However, the data files already exist under /data/clickhouse/data/datasets/. If you want to re-create the table that failed earlier, you must first delete the corresponding table directory there; otherwise it complains that the table's data directory already exists:

Code: 57. DB::Exception: Received from localhost:9000. DB::Exception: Directory for table data data/datasets/ontime_local/ already exists. 
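For example, assuming the data path used in this post, the leftover directory could be removed like this before re-running the CREATE TABLE:

# remove the orphaned table directory left behind by the failed CREATE TABLE
rm -rf /data/clickhouse/data/datasets/ontime_local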

  

3.3: Create the distributed table: ontime_all

CREATE TABLE ontime_all AS ontime_local
ENGINE = Distributed(cluster_3shards_1replicas, datasets, ontime_local, rand());

 

A Distributed table does not store any data itself; it works like a router. You have to specify the cluster name, database name, table name, and sharding key. Here the sharding key is the rand() function, which means rows are sharded randomly.

When you query the Distributed table, the query is routed to the underlying data tables according to the cluster configuration, and the results are then merged.
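For illustration only, a deterministic sharding key could be used instead of rand(), for example by hashing a column (a sketch with a hypothetical table name, not something used in this post):

CREATE TABLE ontime_all_by_airline AS ontime_local
ENGINE = Distributed(cluster_3shards_1replicas, datasets, ontime_local, intHash32(AirlineID));

With such a key, all rows of one airline land on the same shard, which can help queries that filter or group by that column.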

ontime_all is on the same node as ontime, which makes inserting data convenient. It only needs to be created in the datasets database on 81.19.

centf8119.sharding2.db :) show tables;

SHOW TABLES

┌─name─────────┐
│ ontime       │
│ ontime_all   │
│ ontime_local │
└──────────────┘

3 rows in set. Elapsed: 0.004 sec. 

  

3.4: Insert the data

INSERT INTO ontime_all SELECT * FROM ontime;

 

This inserts the data from ontime into ontime_all, and ontime_all distributes it randomly across ontime_local on the three nodes.

After the insert completes, check the total row count:

centf8119.sharding2.db :) INSERT INTO ontime_all SELECT * FROM ontime;

INSERT INTO ontime_all SELECT *
FROM ontime

Ok.

0 rows in set. Elapsed: 486.390 sec. Processed 183.95 million rows, 133.65 GB (378.20 thousand rows/s., 274.78 MB/s.) 

centf8119.sharding2.db :) select count(*) from ontime_all;

SELECT count(*)
FROM ontime_all

┌───count()─┐
│ 183953732 │
└───────────┘

1 rows in set. Elapsed: 0.014 sec. 

 

 

Check the row count on each node: 61322750 + 61311299 + 61319683 = 183953732

As you can see, each node holds roughly one third of the data.
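One way to check the per-node counts through the Distributed table is to group by hostName(), a built-in function evaluated on the server each block of rows comes from (a sketch, not part of the original session):

SELECT hostName() AS host, count() AS cnt
FROM ontime_all
GROUP BY host;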

 

 

 

3.5: Performance comparison

Compare the performance with and without sharding.

Not sharded: (the first run took 3.561 sec, the second 1.214 sec; the second run is probably faster because of caching)

centf8119.sharding2.db :) select Carrier, count() as c, round(quantileTDigest(0.99)(DepDelay), 2) as q from ontime group by Carrier order by q desc limit 5;

SELECT 
    Carrier,
    count() AS c,
    round(quantileTDigest(0.99)(DepDelay), 2) AS q
FROM ontime
GROUP BY Carrier
ORDER BY q DESC
LIMIT 5

┌─Carrier─┬───────c─┬──────q─┐
│ G4      │   80908 │ 268.21 │
│ NK      │  559143 │ 194.38 │
│ B6      │ 3246853 │ 194.14 │
│ EV      │ 6396753 │ 187.97 │
│ YV      │ 1882273 │ 179.92 │
└─────────┴─────────┴────────┘

5 rows in set. Elapsed: 3.561 sec. Processed 183.95 million rows, 1.10 GB (51.66 million rows/s., 309.97 MB/s.) 

centf8119.sharding2.db :) SELECT     Carrier,     count() AS c,     round(quantileTDigest(0.99)(DepDelay), 2) AS q FROM ontime GROUP BY Carrier ORDER BY q DESC LIMIT 5;

SELECT 
    Carrier,
    count() AS c,
    round(quantileTDigest(0.99)(DepDelay), 2) AS q
FROM ontime
GROUP BY Carrier
ORDER BY q DESC
LIMIT 5

┌─Carrier─┬───────c─┬──────q─┐
│ G4      │   80908 │ 268.28 │
│ NK      │  559143 │ 194.43 │
│ B6      │ 3246853 │ 194.06 │
│ EV      │ 6396753 │ 188.05 │
│ YV      │ 1882273 │ 179.95 │
└─────────┴─────────┴────────┘

5 rows in set. Elapsed: 1.214 sec. Processed 183.95 million rows, 1.10 GB (151.48 million rows/s., 908.92 MB/s.) 

 


Sharded: (the first run took 17.254 sec, the second 0.892 sec.)

centf8119.sharding2.db :) select Carrier, count() as c, round(quantileTDigest(0.99)(DepDelay), 2) as q from ontime_all group by Carrier order by q desc limit 5;

SELECT 
    Carrier,
    count() AS c,
    round(quantileTDigest(0.99)(DepDelay), 2) AS q
FROM ontime_all
GROUP BY Carrier
ORDER BY q DESC
LIMIT 5

┌─Carrier─┬───────c─┬──────q─┐
│ G4      │   80908 │ 268.15 │
│ NK      │  559143 │ 194.49 │
│ B6      │ 3246853 │ 193.96 │
│ EV      │ 6396753 │ 187.98 │
│ YV      │ 1882273 │ 179.98 │
└─────────┴─────────┴────────┘

5 rows in set. Elapsed: 17.254 sec. Processed 183.95 million rows, 1.10 GB (10.66 million rows/s., 63.97 MB/s.) 

centf8119.sharding2.db :) select Carrier, count() as c, round(quantileTDigest(0.99)(DepDelay), 2) as q from ontime_all group by Carrier order by q desc limit 5;

SELECT 
    Carrier,
    count() AS c,
    round(quantileTDigest(0.99)(DepDelay), 2) AS q
FROM ontime_all
GROUP BY Carrier
ORDER BY q DESC
LIMIT 5

┌─Carrier─┬───────c─┬──────q─┐
│ G4      │   80908 │  268.5 │
│ NK      │  559143 │ 194.41 │
│ B6      │ 3246853 │ 194.04 │
│ EV      │ 6396753 │ 188.07 │
│ YV      │ 1882273 │ 179.88 │
└─────────┴─────────┴────────┘

5 rows in set. Elapsed: 0.892 sec. Processed 183.95 million rows, 1.10 GB (206.26 million rows/s., 1.24 GB/s.) 

I am not sure why the first run took so long.


Now, what happens if we stop one node? Stop clickhouse-server on 20.

[root@centf8120 metadata]# service clickhouse-server stop;
Stop clickhouse-server service: DONE

 

Then query on 81.19:

centf8119.sharding2.db :) select Carrier, count() as c, round(quantileTDigest(0.99)(DepDelay), 2) as q from ontime_all group by Carrier order by q desc limit 5;

SELECT 
    Carrier,
    count() AS c,
    round(quantileTDigest(0.99)(DepDelay), 2) AS q
FROM ontime_all
GROUP BY Carrier
ORDER BY q DESC
LIMIT 5

↖ Progress: 71.35 million rows, 428.13 MB (129.17 million rows/s., 775.05 MB/s.)  53%
Received exception from server (version 20.6.4):
Code: 279. DB::Exception: Received from localhost:9000. DB::Exception: All connection tries failed. Log: 

Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 20.6.4.44 (official build))
Code: 210, e.displayText() = DB::NetException: Connection refused (centf8120.sharding3.db:9000) (version 20.6.4.44 (official build))
Code: 210, e.displayText() = DB::NetException: Connection refused (centf8120.sharding3.db:9000) (version 20.6.4.44 (official build))

: While executing Remote. 

0 rows in set. Elapsed: 0.655 sec. Processed 71.35 million rows, 428.13 MB (108.91 million rows/s., 653.48 MB/s.) 

 

It errors out. ClickHouse is strict here: if one shard is unavailable, the whole distributed table becomes unavailable.

Of course, querying the local table ontime_local still works at this point.

So how do we solve this? That is the stability issue mentioned earlier, and the solution is: data replication!

Step 4: Data replication

To be clear, replication has no inherent connection with sharding; they address two different problems. But in ClickHouse replicas hang off shards, so to use multiple replicas you must first define a shard.

The simplest case: 1 shard with multiple replicas.

 

4.1: Add a cluster

As before, configure another cluster, called cluster_1shards_2replicas, meaning 1 shard with 2 replicas. The configuration is as follows:

 vim /etc/clickhouse-server/metrika.xml

<yandex>
        <!-- 1 shard, 2 replicas -->
        <cluster_1shards_2replicas>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>centf8118.sharding1.db</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>centf8119.sharding2.db</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1shards_2replicas>
</yandex>

 

Note: if the configuration file is correct there is no need to restart clickhouse-server; the change is picked up automatically!
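You can confirm that the new cluster has been loaded by querying system.clusters again, for example:

SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'cluster_1shards_2replicas';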

 

 

4.2: Create the local data table

Create a new data table named ontime_local_2 and run the statement on each of the three machines.

CREATE TABLE `ontime_local_2` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)

 

4.3: Create the distributed table

Create a new distributed table named ontime_all_2; creating it on the 81.19 machine is enough.

CREATE TABLE ontime_all_2 AS ontime_local_2
ENGINE = Distributed(cluster_1shards_2replicas, datasets, ontime_local_2, rand());

 

4.4: Load the data

INSERT INTO ontime_all_2 SELECT * FROM ontime

 

4.5: Query the data

 

Query ontime_local_2: both nodes hold the full data set.

Shut down one of the servers and the full data set can still be queried: the replica is doing its job.
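To compare the two replicas directly from one client, the remote() table function can be used (hostnames as assumed from the cluster configuration above):

SELECT count() FROM remote('centf8118.sharding1.db', datasets, ontime_local_2);
SELECT count() FROM remote('centf8119.sharding2.db', datasets, ontime_local_2);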

 

4.6: A look at data consistency

Since there are multiple replicas, there is a consistency question: what happens if a machine goes down while data is being written?

Let's simulate it:

1: Stop the service on centf8118.sharding1.db.

service clickhouse-server stop

 

2: On centf8119.sharding2.db, insert a few rows into the ontime_all_2 distributed table.

insert into ontime_all_2 select * from ontime limit 10;

 

3: Start the service on centf8118.sharding1.db again.

service clickhouse-server start

 

4: Check the data to verify.

 

 

Looking at ontime_local_2 and ontime_all_2 on both machines, the total row count has grown by the 10 inserted rows on each, which shows that in this situation the cluster nodes synchronize automatically.

 

Now simulate a slightly more complex scenario (below, the three machines are called sharding1, sharding2, and sharding3):

  1. Stop sharding1;
  2. On sharding2, insert 10 rows into ontime_all_2;
  3. Query ontime_all_2: the row count has grown by 10;
  4. Stop sharding2;
  5. Start sharding1; at this point the whole cluster is unavailable;
  6. Query ontime_all_2: the cluster is usable again, but the data is 10 rows short;
  7. Start sharding2, then query ontime_all_2 and ontime_local_2: the data synchronizes automatically.

Everything above inserted data through the ontime_all_2 table. If data is written through ontime_local_2 instead, does it still synchronize?

  1. On sharding1, insert 10 rows into ontime_local_2;
  2. Query on sharding2: the ontime_local_2 data has not been synchronized.

In summary, data written through the distributed table is synchronized automatically, while data written through the local data table is not; under normal circumstances this is not a big problem.

 

I could not reproduce anything more complex, but data inconsistency can still occur. The official documentation describes it as follows:

Each shard can have the 'internal_replication' parameter defined in the config file.

If this parameter is set to 'true', the write operation selects the first healthy replica and writes data to it. Use this alternative if the Distributed table "looks at" replicated tables. In other words, if the table where data will be written is going to replicate them itself.

If it is set to 'false' (the default), data is written to all replicas. In essence, this means that the Distributed table replicates data itself. This is worse than using replicated tables, because the consistency of replicas is not checked, and over time they will contain slightly different data.

In other words:

A shard can have the internal_replication parameter set to true or false; the default is false.

If it is set to true, then when data is written it always goes to one healthy replica, and the table itself takes care of replication afterwards; this requires the local table to be able to replicate itself.

If it is set to false, data is written to all replicas, and in that case consistency cannot be guaranteed.

 

An example: a row is inserted into ontime_all_2 and, after rand(), it is supposed to end up in the ontime_local_2 table on sharding1, and ontime_local_2 is configured with two replicas.
If internal_replication is false, the row is inserted into each of the two replicas separately. Note: separate inserts means one may succeed while the other fails, and the results are not checked, which leads to inconsistency.
If internal_replication is true, the row is written to only one replica, and the other replicas are synchronized by ontime_local_2 itself, which solves the write-consistency problem.

Although I did not manage to reproduce data inconsistency, it can happen in practice, so the official recommendation is to use the table's own synchronization, i.e. set internal_replication to true.

How to use that in practice is described below.



5: Automatic data replication

Automatic data replication is a behavior of the table: tables with the ReplicatedXXX engines synchronize automatically.

The Replicated prefix is only used with the MergeTree family (MergeTree is the most commonly used engine), so ClickHouse supports the following self-replicating engines:

ReplicatedMergeTree
ReplicatedSummingMergeTree
ReplicatedReplacingMergeTree
ReplicatedAggregatingMergeTree
ReplicatedCollapsingMergeTree
ReplicatedGraphiteMergeTree

To emphasize again: Replicated-table synchronization is different from the cluster synchronization described earlier. It is a behavior of the table, has nothing to do with the clickhouse_remote_servers configuration, and only requires a zookeeper configuration.


To illustrate this, let's not configure clickhouse_remote_servers for now and only add the zookeeper configuration:
vim /etc/clickhouse-server/metrika.xml
<yandex>
    <zookeeper-servers>
        <node index="1">
            <host>centf8118.sharding1.db</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>centf8119.sharding2.db</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>centf8120.sharding3.db</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
</yandex>

 

Create the data tables:
sharding1:
CREATE TABLE `ontime_replica` (
  ...
) ENGINE = ReplicatedMergeTree('/data/clickhouse/tables/ontime', 'replica1', FlightDate, (Year, FlightDate), 8192);

sharding2:
CREATE TABLE `ontime_replica` (
  ...
) ENGINE = ReplicatedMergeTree('/data/clickhouse/tables/ontime', 'replica2', FlightDate, (Year, FlightDate), 8192);

sharding3:
CREATE TABLE `ontime_replica` (
  ...
) ENGINE = ReplicatedMergeTree('/data/clickhouse/tables/ontime', 'replica3', FlightDate, (Year, FlightDate), 8192);

 

Full CREATE TABLE SQL for sharding1:

CREATE TABLE `ontime_replica` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = ReplicatedMergeTree('/data/clickhouse/tables/ontime', 'replica1', FlightDate, (Year, FlightDate), 8192)

 

Full CREATE TABLE SQL for sharding2:

CREATE TABLE `ontime_replica` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = ReplicatedMergeTree('/data/clickhouse/tables/ontime', 'replica2', FlightDate, (Year, FlightDate), 8192)

 

Full CREATE TABLE SQL for sharding3:

CREATE TABLE `ontime_replica` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = ReplicatedMergeTree('/data/clickhouse/tables/ontime', 'replica3', FlightDate, (Year, FlightDate), 8192)

 

Check the information in zookeeper:

[zk: localhost:2181(CONNECTED) 0] ls /data/clickhouse/tables/ontime/replicas
[replica2, replica3, replica1]

  

You can see that the corresponding path and replica information now exist in zookeeper.
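The same can also be checked from inside ClickHouse via the system.zookeeper table, which requires a path filter, for example:

SELECT name FROM system.zookeeper
WHERE path = '/data/clickhouse/tables/ontime/replicas';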

 

Insert data:

Note: run the insert on only one machine. (The ontime data is on sharding2, so I ran it on sharding2.)

# run on sharding2
INSERT INTO ontime_replica SELECT * FROM ontime
centf8119.sharding2.db :) INSERT INTO ontime_replica SELECT * FROM ontime;

INSERT INTO ontime_replica SELECT *
FROM ontime

Ok.

0 rows in set. Elapsed: 618.798 sec. Processed 183.95 million rows, 133.65 GB (297.28 thousand rows/s., 215.98 MB/s.) 

 

Check the data:

Query ontime_replica on each of the three machines: they all have the data, and the data is exactly the same.

This shows how this approach differs from the previous one: you write directly to one node and the other nodes synchronize automatically; it is entirely a behavior of the table.

With the previous approach, you had to create a Distributed table and write through it for the data to be synchronized (so far we have not created a Distributed table for ontime_replica).

 

5.1: Configure the cluster

        <!-- 1 shard, 3 replicas -->
        <cluster_1shards_3replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>centf8118.sharding1.db</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>centf8119.sharding2.db</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>centf8120.sharding3.db</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1shards_3replicas>

 

The cluster is named cluster_1shards_3replicas: 1 shard with 3 replicas. Unlike before, internal_replication is now set to true, meaning the table's own replication is used instead of the cluster's replication.

 

5.2: Create the distributed table

CREATE TABLE ontime_replica_all AS ontime_replica
ENGINE = Distributed(cluster_1shards_3replicas, datasets, ontime_replica, rand())

The table is named ontime_replica_all; it uses the cluster_1shards_3replicas cluster, and the underlying data table is ontime_replica.

 

Query the distributed table:

centf8119.sharding2.db :) select count(*) from ontime_replica;

SELECT count(*)
FROM ontime_replica

┌───count()─┐
│ 183953732 │
└───────────┘

1 rows in set. Elapsed: 0.007 sec. 

 

5.3: Writing through the distributed table

As mentioned above, when ontime_replica on one node is written to, the other nodes synchronize automatically. So what happens if data is written through the distributed table ontime_replica_all?

As already explained, with internal_replication set to true, a write through the distributed table automatically picks the "healthiest" replica to write to, and the other replicas then catch up through the table's own replication, so the data eventually becomes consistent.
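A minimal sketch of such a write through the Distributed table (note that the session captured below actually inserts into ontime_replica directly):

INSERT INTO ontime_replica_all SELECT * FROM ontime LIMIT 68;
SELECT count() FROM ontime_replica_all;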

centf8119.sharding2.db :) select count(*) from ontime_replica;

SELECT count(*)
FROM ontime_replica

┌───count()─┐
│ 183953732 │
└───────────┘

1 rows in set. Elapsed: 0.007 sec. 

centf8119.sharding2.db :) show tables;

SHOW TABLES

┌─name───────────────┐
│ ontime             │
│ ontime_all         │
│ ontime_all_2       │
│ ontime_local       │
│ ontime_local_2     │
│ ontime_replica     │
│ ontime_replica_all │
└────────────────────┘

7 rows in set. Elapsed: 0.004 sec. 

centf8119.sharding2.db :) insert into ontime_replica select * from ontime limit 68;

INSERT INTO ontime_replica SELECT *
FROM ontime
LIMIT 68

Ok.

0 rows in set. Elapsed: 0.575 sec. 

centf8119.sharding2.db :) select count(*) from ontime_replica;

SELECT count(*)
FROM ontime_replica

┌───count()─┐
│ 183953800 │
└───────────┘

1 rows in set. Elapsed: 0.008 sec. 

 

6: ClickHouse sharding + replication

Sharding exists to break through the limits of a single machine (storage, compute, and so on); replication exists for high availability.

Sharding alone improves performance, but if one shard goes down the whole service becomes unavailable. Replication alone gives high availability, but the setup is still bounded by the single-machine bottleneck (1000 replicas are no better than 2 replicas, apart from wasting more machines).

So in production both are needed at the same time.

In fact, you simply combine the sharding and replication configurations above. For example, a 3-shard, 2-replica setup looks like this:

...
        <!-- 3 shards, 2 replicas: using table-level replication -->
        <cluster_3shards_2replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>centf8118.sharding1.db</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>centf8119.sharding2.db</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>centf8120.sharding3.db</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>centf8121.sharding4.db</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>centf8122.sharding5.db</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>centf8123.sharding6.db</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_3shards_2replicas>
...

 

This requires creating 6 data tables in total:
 
CREATE TABLE `ontime_replica` (
  ...
) ENGINE = ReplicatedMergeTree('/data/clickhouse/tables/ontime/{shard}', '{replica}', FlightDate, (Year, FlightDate), 8192);

Here {shard} and {replica} are macros (similar to environment variables). Modify the configuration file:


# vi /etc/clickhouse-server/metrika.xml

...
    <macros>
        <shard>01</shard>
        <replica>01</replica>
    </macros>
...

 

Set the shard and replica values on each machine according to its role. For example, for 3 shards with 2 replicas the mapping would be:

centf8118.sharding1.db: shard=01, replica=01
centf8119.sharding2.db: shard=01, replica=02
centf8120.sharding3.db: shard=02, replica=01
centf8121.sharding4.db: shard=02, replica=02
centf8122.sharding5.db: shard=03, replica=01
centf8123.sharding6.db: shard=03, replica=02
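To verify the macros on a given node, you can query the system.macros table, for example:

SELECT * FROM system.macros;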

 

Using macros just makes table creation convenient (every machine can run the same CREATE TABLE statement); it is not mandatory. All that matters is that ReplicatedMergeTree is given the zookeeper path and the replica value.

Due to limited resources, I did not run this experiment.

One reminder: each clickhouse-server instance can hold only one replica of one shard, so 3 shards with 2 replicas require 6 machines (6 separate clickhouse-server instances).

To save resources, I originally planned to reuse machines round-robin: put the two replicas of shard1 on sharding1 and sharding2, the two replicas of shard2 on sharding2 and sharding3, and the two replicas of shard3 on sharding3 and sharding1. That does not work.

The reason is that each shard+replica maps to one data table, and a Distributed query picks one replica from each shard and merges the results.

With the layout above, a query might resolve to:
replica1 of shard1, which is ontime_replica on sharding1;
replica2 of shard2, which is ontime_replica on sharding3;
replica2 of shard3, which is ontime_replica on sharding1;

The result would then be sharding1's ontime_replica queried twice plus sharding3's ontime_replica queried once, which is not correct.



Reference: https://www.jianshu.com/p/20639fdfdc99

 

