ClickHouse Installation and Usage Guide


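Introduction to ClickHouse

What is ClickHouse

1. An open-source, column-oriented database management system

2. Scales linearly

3. Simple and convenient to use

4. Highly reliable

5. Fault tolerant (supports asynchronous multi-master replication and can be deployed across multiple data centers; downtime of a single node or of an entire data center does not affect the system's availability for reads and writes)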

 

ClickHouse architecture and storage

Details of the ClickHouse architecture had not been made public at the time of writing.

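ClickHouse characteristics

ClickHouse is meant for analytics over clean, well-structured, immutable events or logs. It is recommended to put each such stream into a single wide fact table with pre-joined dimensions.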

 

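ClickHouse use cases

Some examples of suitable applications:

  • Web and app analytics
  • Advertising networks and RTB
  • Telecommunications
  • E-commerce and finance
  • Information security
  • Monitoring and telemetry
  • Time series
  • Business intelligence
  • Online games
  • Internet of Things

Unsuitable use cases

  • Transactional workloads (OLTP)
  • Key-value access with a high request rate
  • Blob or document storage
  • Over-normalized data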

 

 

Installing ClickHouse

Single-node installation

Check whether the system supports ClickHouse

Run the command:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

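If the output is "SSE 4.2 supported", you can continue with the installation. If it is "SSE 4.2 not supported", the CPU lacks the SSE 4.2 instruction set, and you will need another machine or another way to run ClickHouse.

Set up the yum repository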

curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash

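Alternatively, create the repo file by hand (yum reads repo files from /etc/yum.repos.d/):

altinity_clickhouse.repo

For CentOS 6, insert the following content: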

[altinity_clickhouse]

name=altinity_clickhouse

baseurl=https://packagecloud.io/altinity/clickhouse/el/6/$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt

metadata_expire=300

 

[altinity_clickhouse-source]

name=altinity_clickhouse-source

baseurl=https://packagecloud.io/altinity/clickhouse/el/6/SRPMS

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt

metadata_expire=300

 

For CentOS 7, use this content instead:

 

[altinity_clickhouse]

name=altinity_clickhouse

baseurl=https://packagecloud.io/altinity/clickhouse/el/7/$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt

metadata_expire=300

 

[altinity_clickhouse-source]

name=altinity_clickhouse-source

baseurl=https://packagecloud.io/altinity/clickhouse/el/7/SRPMS

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt

metadata_expire=300
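
The CentOS 6 and CentOS 7 repo files above differ only in the EL major version inside baseurl, so the file could also be generated. The following is a minimal sketch, not part of the original procedure: the function name write_altinity_repo and its two arguments are hypothetical, and the target directory is passed explicitly so the helper can be tried in a scratch directory first.

```shell
# Hypothetical helper: write altinity_clickhouse.repo for one EL major version.
# Usage: write_altinity_repo <el_version> <target_dir>
write_altinity_repo() {
    local el="$1" dir="$2"
    local base="https://packagecloud.io/altinity/clickhouse"
    local name suffix
    # Start from an empty file so reruns do not duplicate sections.
    : > "$dir/altinity_clickhouse.repo"
    for name in altinity_clickhouse altinity_clickhouse-source; do
        if [ "$name" = "altinity_clickhouse" ]; then
            suffix='$basearch'   # kept literal: yum expands $basearch itself
        else
            suffix='SRPMS'
        fi
        cat >> "$dir/altinity_clickhouse.repo" <<EOF
[$name]
name=$name
baseurl=$base/el/$el/$suffix
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=$base/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300

EOF
    done
}
```

Run as root, `write_altinity_repo 7 /etc/yum.repos.d` would produce the CentOS 7 variant shown above, and `write_altinity_repo 6 /etc/yum.repos.d` the CentOS 6 one.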

 

List the available packages, then install them:

yum list 'clickhouse*'

yum -y install 'clickhouse*'

 

 

Multi-node installation

Install ClickHouse on every machine, then make the following changes on each of them.

Edit the hosts file (/etc/hosts)

 

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.3.251 host1

192.168.3.252 host2

192.168.3.247 host3

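
Because the same three entries have to be added on every machine, the edit can be scripted so that running it twice does not duplicate lines. This is only a sketch: the function name add_host_entry and its optional third argument (the file to edit, defaulting to /etc/hosts) are assumptions made for illustration.

```shell
# Hypothetical helper: append "ip name" to a hosts file unless the name
# already appears as a host field on a non-comment line.
# Usage: add_host_entry <ip> <name> [hosts_file]
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage on each node (as root), with the addresses from the listing above:
#   add_host_entry 192.168.3.251 host1
#   add_host_entry 192.168.3.252 host2
#   add_host_entry 192.168.3.247 host3
```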

Create the file metrika.xml

Create the file under /etc:

cd /etc

vi metrika.xml

Adjust the following content for your environment and paste it into metrika.xml:

<yandex>
<clickhouse_remote_servers>
    <perftest_3shards_1replicas>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>192.168.3.247</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>192.168.3.252</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>192.168.3.251</host>
                <port>9000</port>
            </replica>
        </shard>
    </perftest_3shards_1replicas>
</clickhouse_remote_servers>

<zookeeper-servers>
  <node index="1">
    <host>192.168.3.251</host>
    <port>2181</port>
  </node>
</zookeeper-servers>

<macros>
    <!-- Per-node value: use the address of the node this file is installed on. -->
    <replica>192.168.3.252</replica>
</macros>

<networks>
   <ip>::/0</ip>
</networks>

<clickhouse_compression>
<case>
  <min_part_size>10000000000</min_part_size>
  <min_part_size_ratio>0.01</min_part_size_ratio>
  <method>lz4</method>
</case>
</clickhouse_compression>

</yandex>

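Edit config.xml under /etc/clickhouse-server so that the server listens on the node's own address (replace 192.168.3.252 with the IP of the node being configured):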

  <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->

    <!-- <listen_host>::</listen_host> -->

    <listen_host>::1</listen_host>

    <listen_host>192.168.3.252</listen_host>

 

Using ClickHouse

Basic usage

Start the server

 /etc/init.d/clickhouse-server start

 

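Command line: clickhouse-client --host <host> --user <user> --password <password>

The defaults are usually enough: run clickhouse-client with no arguments to enter the client.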

 

DML (data manipulation language)

These statements use the funtest table defined in the DDL section below; create it first.

insert into funtest values(3,'xiaoming',22,'2017-11-09')

insert into funtest values(32,'xiaolan',33,'2017-11-08')

insert into funtest values(35,'xiaotong',33,'2017-11-07')

insert into funtest values(4,'xiaohuang',33,'2017-11-08')

insert into funtest values(44,'xiaolvas',34,'2017-11-05')

insert into funtest values(6,'xiaohuanasg',32,'2017-11-28')

select *  from funtest

select *  from funtest order by id

select * from funtest order by  id desc

select avg(age)  from funtest

select count(name) from funtest

select age from funtest group by age

select round(age/3) FROM funtest

select cast('2015-12-22' as date) from funtest

select cast('2015-12-22' as date)+30 from funtest

select stddev_samp(age) FROM funtest

select upper('hhh') from funtest

select upper(name) from funtest

select abs(-1) from funtest

select * FROM funtest where times =cast('2015-12-22' as date)

select max(age) from funtest

select case when name ='xiaoming' then concat(name,'dddd') else 'ddddfdfdfdf' end  from funtest

select substring(name,1,3) from funtest

select rand() from funtest

 

 

 

DDL (data definition language)

create table funtest(id UInt32, name String ,age UInt32,times Date)ENGINE=Log

drop table funtest

alter table ontime_all add COLUMN name String;

 

 

Performance test

The code used for the performance test is as follows.

Get the data

for s in `seq 1987 2017`

do

for m in `seq 1 12`

do

echo http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip >> a.lst

done

done
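
The nested loop above emits one URL per month for the years 1987 through 2017, that is 31 × 12 = 372 lines, and appends them to a.lst. Because it appends, running it twice doubles the list. A sketch of the same list wrapped in a function, followed by a download step, is shown here; the function name list_ontime_urls and the use of wget to fetch the list are assumptions, since the original does not show how the files were downloaded.

```shell
# Hypothetical helper: print one archive URL per (year, month) pair.
# Usage: list_ontime_urls [first_year] [last_year]
list_ontime_urls() {
    local first="${1:-1987}" last="${2:-2017}" y m
    for y in $(seq "$first" "$last"); do
        for m in $(seq 1 12); do
            echo "http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${y}_${m}.zip"
        done
    done
}

# Usage: overwrite the list instead of appending, then fetch every file.
# --continue lets an interrupted download resume where it stopped.
#   list_ontime_urls > a.lst
#   wget --continue --input-file=a.lst
```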

Unzip the archives and load them into ClickHouse

for i in *.zip; do echo $i; unzip -cq $i '*.csv' | sed 's/\.00//g' | clickhouse-client  --query="INSERT INTO ontime_test FORMAT CSVWithNames"; done

Create the Hive table

CREATE TABLE ontime

(

    Year int,

    Quarter int,

    Month int,

    DayofMonth int,

    DayOfWeek int,

    FlightDate Date,

    UniqueCarrier String,

    AirlineID int,

    Carrier String,

    TailNum String,

    FlightNum String,

    OriginAirportID int,

    OriginAirportSeqID int,

    OriginCityMarketID int,

    Origin String,

    OriginCityName String,

    OriginState String,

    OriginStateFips String,

    OriginStateName String,

    OriginWac int,

    DestAirportID int,

    DestAirportSeqID int,

    DestCityMarketID int,

    Dest String,

    DestCityName String,

    DestState String,

    DestStateFips String,

    DestStateName String,

    DestWac int,

    CRSDepTime int,

    DepTime int,

    DepDelay int,

    DepDelayMinutes int,

    DepDel15 int,

    DepartureDelayGroups String,

    DepTimeBlk String,

    TaxiOut int,

    WheelsOff int,

    WheelsOn int,

    TaxiIn int,

    CRSArrTime int,

    ArrTime int,

    ArrDelay int,

    ArrDelayMinutes int,

    ArrDel15 int,

    ArrivalDelayGroups int,

    ArrTimeBlk String,

    Cancelled int,

    CancellationCode String,

    Diverted int,

    CRSElapsedTime int,

    ActualElapsedTime int,

    AirTime int,

    Flights int,

    Distance int,

    DistanceGroup int,

    CarrierDelay int,

    WeatherDelay int,

    NASDelay int,

    SecurityDelay int,

    LateAircraftDelay int,

    FirstDepTime String,

    TotalAddGTime String,

    LongestAddGTime String,

    DivAirportLandings String,

    DivReachedDest String,

    DivActualElapsedTime String,

    DivArrDelay String,

    DivDistance String,

    Div1Airport String,

    Div1AirportID int,

    Div1AirportSeqID int,

    Div1WheelsOn String,

    Div1TotalGTime String,

    Div1LongestGTime String,

    Div1WheelsOff String,

    Div1TailNum String,

    Div2Airport String,

    Div2AirportID int,

    Div2AirportSeqID int,

    Div2WheelsOn String,

    Div2TotalGTime String,

    Div2LongestGTime String,

    Div2WheelsOff String,

    Div2TailNum String,

    Div3Airport String,

    Div3AirportID int,

    Div3AirportSeqID int,

    Div3WheelsOn String,

    Div3TotalGTime String,

    Div3LongestGTime String,

    Div3WheelsOff String,

    Div3TailNum String,

    Div4Airport String,

    Div4AirportID int,

    Div4AirportSeqID int,

    Div4WheelsOn String,

    Div4TotalGTime String,

    Div4LongestGTime String,

    Div4WheelsOff String,

    Div4TailNum String,

    Div5Airport String,

    Div5AirportID int,

    Div5AirportSeqID int,

    Div5WheelsOn String,

    Div5TotalGTime String,

    Div5LongestGTime String,

    Div5WheelsOff String,

    Div5TailNum String

)row format delimited

fields terminated by ','

stored as textfile;

load data inpath '/data' into table ontime;

Change the Hive storage format to ORC

Comparison test with Spark

 

 

 

Create the ClickHouse local table

CREATE TABLE ontime

(

    Year UInt16,

    Quarter UInt8,

    Month UInt8,

    DayofMonth UInt8,

    DayOfWeek UInt8,

    FlightDate Date,

    UniqueCarrier FixedString(7),

    AirlineID Int32,

    Carrier FixedString(2),

    TailNum String,

    FlightNum String,

    OriginAirportID Int32,

    OriginAirportSeqID Int32,

    OriginCityMarketID Int32,

    Origin FixedString(5),

    OriginCityName String,

    OriginState FixedString(2),

    OriginStateFips String,

    OriginStateName String,

    OriginWac Int32,

    DestAirportID Int32,

    DestAirportSeqID Int32,

    DestCityMarketID Int32,

    Dest FixedString(5),

    DestCityName String,

    DestState FixedString(2),

    DestStateFips String,

    DestStateName String,

    DestWac Int32,

    CRSDepTime Int32,

    DepTime Int32,

    DepDelay Int32,

    DepDelayMinutes Int32,

    DepDel15 Int32,

    DepartureDelayGroups String,

    DepTimeBlk String,

    TaxiOut Int32,

    WheelsOff Int32,

    WheelsOn Int32,

    TaxiIn Int32,

    CRSArrTime Int32,

    ArrTime Int32,

    ArrDelay Int32,

    ArrDelayMinutes Int32,

    ArrDel15 Int32,

    ArrivalDelayGroups Int32,

    ArrTimeBlk String,

    Cancelled UInt8,

    CancellationCode FixedString(1),

    Diverted UInt8,

    CRSElapsedTime Int32,

    ActualElapsedTime Int32,

    AirTime Int32,

    Flights Int32,

    Distance Int32,

    DistanceGroup UInt8,

    CarrierDelay Int32,

    WeatherDelay Int32,

    NASDelay Int32,

    SecurityDelay Int32,

    LateAircraftDelay Int32,

    FirstDepTime String,

    TotalAddGTime String,

    LongestAddGTime String,

    DivAirportLandings String,

    DivReachedDest String,

    DivActualElapsedTime String,

    DivArrDelay String,

    DivDistance String,

    Div1Airport String,

    Div1AirportID Int32,

    Div1AirportSeqID Int32,

    Div1WheelsOn String,

    Div1TotalGTime String,

    Div1LongestGTime String,

    Div1WheelsOff String,

    Div1TailNum String,

    Div2Airport String,

    Div2AirportID Int32,

    Div2AirportSeqID Int32,

    Div2WheelsOn String,

    Div2TotalGTime String,

    Div2LongestGTime String,

    Div2WheelsOff String,

    Div2TailNum String,

    Div3Airport String,

    Div3AirportID Int32,

    Div3AirportSeqID Int32,

    Div3WheelsOn String,

    Div3TotalGTime String,

    Div3LongestGTime String,

    Div3WheelsOff String,

    Div3TailNum String,

    Div4Airport String,

    Div4AirportID Int32,

    Div4AirportSeqID Int32,

    Div4WheelsOn String,

    Div4TotalGTime String,

    Div4LongestGTime String,

    Div4WheelsOff String,

    Div4TailNum String,

    Div5Airport String,

    Div5AirportID Int32,

    Div5AirportSeqID Int32,

    Div5WheelsOn String,

    Div5TotalGTime String,

    Div5LongestGTime String,

    Div5WheelsOff String,

    Div5TailNum String

) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)

 

Create the distributed table

CREATE TABLE ontimetest AS ontime ENGINE = Distributed(perftest_3shards_1replicas, default, ontime, rand())

Note:

Create both the local table and the distributed table on every node.
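
Since each statement has to be executed once per node, a small loop saves repeating it by hand. The sketch below is an illustration only: the function name run_on_all_nodes and the RUNNER variable are hypothetical, and it assumes clickhouse-client can reach each node by the names host1, host2 and host3 from the hosts file above.

```shell
# Hypothetical helper: run one SQL statement on every listed node.
# RUNNER defaults to clickhouse-client; set RUNNER=echo for a dry run
# that only prints the arguments that would be passed.
# Usage: run_on_all_nodes "<sql>" <host>...
run_on_all_nodes() {
    local sql="$1"; shift
    local host
    for host in "$@"; do
        "${RUNNER:-clickhouse-client}" --host "$host" --query "$sql"
    done
}

# Usage: create the distributed table on all three nodes.
#   run_on_all_nodes "CREATE TABLE ontimetest AS ontime ENGINE = Distributed(perftest_3shards_1replicas, default, ontime, rand())" host1 host2 host3
```

The local ontime table must already exist on a node before the distributed table is created there, because `AS ontime` copies its column definitions.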

