1. Environment prerequisites
JDK 1.8
2. Cluster plan

| IP address | Hostname | Roles |
| --- | --- | --- |
| 192.168.1.101 | palo101 | hadoop namenode, hadoop datanode, yarn nodemanager, zookeeper, hive, hbase master, hbase region server |
| 192.168.1.102 | palo102 | hadoop secondary namenode, hadoop datanode, yarn nodemanager, yarn resourcemanager, zookeeper, hive, hbase master, hbase region server |
| 192.168.1.103 | palo103 | hadoop datanode, yarn nodemanager, zookeeper, hive, hbase region server, mysql |
3. Download Kylin 2.6
wget http://mirrors.tuna.tsinghua.edu.cn/apache/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz  # download the Kylin 2.6.0 binary package
tar -xzvf apache-kylin-2.6.0-bin-hbase1x.tar.gz   # unpack the Kylin 2.6.0 tarball
mv apache-kylin-2.6.0-bin apache-kylin-2.6.0      # rename the unpacked directory (drop the trailing -bin)
mkdir /usr/local/kylin/                           # create the target directory
mv apache-kylin-2.6.0 /usr/local/kylin/           # move the Kylin 2.6.0 directory under /usr/local/kylin
4. Add system environment variables
vim /etc/profile
Append at the end of the file:
#kylin
export KYLIN_HOME=/usr/local/kylin/apache-kylin-2.6.0
export KYLIN_CONF_HOME=$KYLIN_HOME/conf
export PATH=$PATH:$KYLIN_HOME/bin:$CATALINA_HOME/bin
export tomcat_root=$KYLIN_HOME/tomcat   # note: lowercase variable name
export hive_dependency=$HIVE_HOME/conf:$HIVE_HOME/lib/*:$HCAT_HOME/share/hcatalog/hive-hcatalog-core-2.3.4.jar   # note: lowercase variable name
Save and quit with :wq, then run source /etc/profile to make the variables take effect.
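The effect of the profile additions can be sanity-checked in a throwaway shell. A minimal sketch, using a temp file in place of /etc/profile so it is runnable anywhere; the path matches this install:

```shell
# Simulate the /etc/profile additions with a temp fragment, then confirm
# the derived variable resolves as expected after sourcing.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export KYLIN_HOME=/usr/local/kylin/apache-kylin-2.6.0
export KYLIN_CONF_HOME=$KYLIN_HOME/conf
EOF
. "$profile"
echo "KYLIN_CONF_HOME=$KYLIN_CONF_HOME"
rm -f "$profile"
```

The same `echo` against the real environment is a quick way to confirm `source /etc/profile` worked on each node.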
5. Configure Kylin
5.1 Configure $KYLIN_HOME/bin/kylin.sh
vim $KYLIN_HOME/bin/kylin.sh
Add at the beginning of the file:
export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX
This puts $hive_dependency on the HBase classpath and fixes two later problems, both caused by missing hive dependencies:
a) Loading hive tables from the Kylin web UI fails.
b) The second step of a cube build fails with an org/apache/hadoop/hive/conf/HiveConf error.
5.2 Hadoop compression configuration
Regarding snappy compression: supporting it requires rebuilding the Hadoop source beforehand so that the native libraries include snappy. Snappy achieves a reasonable compression ratio, letting both the intermediate and the final results of a job occupy less storage.
The Hadoop build in this example does not support snappy, which would make later cube builds fail, so compression is turned off.
vim $KYLIN_HOME/conf/kylin_job_conf.xml
In this file, set the properties mapreduce.map.output.compress and mapreduce.output.fileoutputformat.compress to false:
<property>
    <name>mapreduce.map.output.compress</name>
    <value>false</value>
    <description>Compress map outputs</description>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>false</value>
    <description>Compress the output of a MapReduce job</description>
</property>
One more compression setting needs to be changed:
vim $KYLIN_HOME/conf/kylin.properties
Set kylin.storage.hbase.compression-codec to none, or comment the line out:
#kylin.storage.hbase.compression-codec=none
5.3 Main configuration: $KYLIN_HOME/conf/kylin.properties
vim $KYLIN_HOME/conf/kylin.properties
Edit it as follows:
## The metadata store in hbase (Kylin metadata lives in HBase)
kylin.metadata.url=kylin_metadata@hbase

## metadata cache sync retry times
kylin.metadata.sync-retries=3

## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
## (Kylin's working directory on HDFS)
kylin.env.hdfs-working-dir=/kylin

## kylin zk base path
kylin.env.zookeeper-base-path=/kylin

## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
#kylin.env=DEV

## Kylin server mode, valid value [all, query, job]
## (mode of the Kylin master node; the query nodes use query instead -- that is the only difference)
kylin.server.mode=all

## List of web servers in use, this enables one web server instance to sync up with other servers.
## (cluster information sync)
kylin.server.cluster-servers=192.168.1.101:7070,192.168.1.102:7070,192.168.1.103:7070

## Display timezone on UI, format like [GMT+N or GMT-N]
## (set to China time)
kylin.web.timezone=GMT+8

## Timeout value for the queries submitted through the Web UI, in milliseconds
kylin.web.query-timeout=300000

## Max count of concurrent jobs running
kylin.job.max-concurrent-jobs=10

#### ENGINE ###
## Time interval to check hadoop job status, in seconds
kylin.engine.mr.yarn-check-interval-seconds=10

## Hive database name for putting the intermediate flat tables
## (database that holds the intermediate hive tables produced by cube builds)
kylin.source.hive.database-for-flat-table=kylin_flat_db

## The percentage of the sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100

## Max job retry on error, default 0: no retry
kylin.job.retry=0

## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
## (no compression)
kylin.storage.hbase.compression-codec=none

## The cut size for hbase region, in GB.
kylin.storage.hbase.region-cut-gb=5

## The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
## Set 0 to disable this optimization.
kylin.storage.hbase.hfile-size-gb=2

## The storage for final cube file in hbase
kylin.storage.url=hbase

## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_

## The namespace for hbase storage
kylin.storage.hbase.namespace=default

## Job jar for Kylin's MR jobs and the HBase coprocessor jar, used to improve performance (added entries)
kylin.job.jar=/usr/local/kylin/apache-kylin-2.6.0/lib/kylin-job-2.6.0.jar
kylin.coprocessor.local.jar=/usr/local/kylin/apache-kylin-2.6.0/lib/kylin-coprocessor-2.6.0.jar
5.4 Copy the configured Kylin to the other two machines
scp -r /usr/local/kylin/ 192.168.1.102:/usr/local
scp -r /usr/local/kylin/ 192.168.1.103:/usr/local
5.5 On 192.168.1.102 and 192.168.1.103, change kylin.server.mode to query
vim $KYLIN_HOME/conf/kylin.properties
Change this entry:
kylin.server.mode=query   # mode of the query nodes; only the master runs in all mode -- that is the only difference
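The mode flip can also be scripted with sed instead of editing by hand on each node. A sketch, applied here to a temp copy so it is runnable anywhere; on the real nodes the target file would be $KYLIN_HOME/conf/kylin.properties:

```shell
# Flip kylin.server.mode from all to query, as required on the query-only nodes.
f=$(mktemp)
echo 'kylin.server.mode=all' > "$f"   # stand-in for the copied kylin.properties
sed -i.bak 's/^kylin.server.mode=.*/kylin.server.mode=query/' "$f"
mode=$(grep '^kylin.server.mode=' "$f")
echo "$mode"
rm -f "$f" "$f.bak"
```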
6. Start Kylin
6.1 Prerequisite: start the dependent services first
a) Start zookeeper; run on every node:
$ZOO_KEEPER_HOME/bin/zkServer.sh start
b) Start hadoop; run on the master node:
$HADOOP_HOME/sbin/start-all.sh
c) Start the JobHistoryServer service; run on the master node:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
d) Start the hive metastore service:
nohup $HIVE_HOME/bin/hive --service metastore > /dev/null 2>&1 &
e) Start the hbase cluster; run on the master node:
$HBASE_HOME/bin/start-hbase.sh
The processes after startup are:
192.168.1.101
[root@palo101 apache-kylin-2.6.0]# jps
62403 NameNode         # hdfs namenode
31013 NodeManager      # yarn nodemanager
22325 Kafka
54217 QuorumPeerMain   # zookeeper
7274 Jps
62589 DataNode         # hadoop datanode
28895 HRegionServer    # hbase region server
8440 HMaster           # hbase master
192.168.1.102
[root@palo102 ~]# jps
47474 QuorumPeerMain    # zookeeper
15203 NodeManager       # yarn nodemanager
15061 ResourceManager   # yarn resourcemanager
49877 Jps
6694 HRegionServer      # hbase region server
7673 Kafka
37517 SecondaryNameNode # hdfs secondary namenode
37359 DataNode          # hadoop datanode
192.168.1.103
[root@palo103 ~]# jps
1185 RunJar             # hive metastore
62404 NodeManager       # yarn nodemanager
47365 HRegionServer     # hbase region server
62342 QuorumPeerMain    # zookeeper
20952 ManagerBootStrap
52440 Kafka
31801 RunJar            # hive thrift server
47901 DataNode          # hadoop datanode
36494 Jps
6.2 Check that the configuration is correct
$KYLIN_HOME/bin/check-env.sh
[root@palo101 bin]# $KYLIN_HOME/bin/check-env.sh
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin/apache-kylin-2.6.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
To check only the hive dependencies, run find-hive-dependency.sh; for the hbase dependencies, run find-hbase-dependency.sh.
check-env.sh runs all of the dependency checks at once.
6.3 Run the following on every node to start Kylin
$KYLIN_HOME/bin/kylin.sh start
If the following error appears at startup:
Failed to find metadata store by url: kylin_metadata@hbase
fix it as follows:
1) Make the hbase.rootdir property in $HBASE_HOME/conf/hbase-site.xml consistent with the fs.defaultFS property in $HADOOP_HOME/etc/hadoop/core-site.xml.
2) Open zkCli from zookeeper's bin directory, delete /hbase, then restart hbase.
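Step 1) can be checked mechanically: the hdfs:// URI in hbase.rootdir must begin with the exact scheme://host:port given by fs.defaultFS. A self-contained sketch of that check; the temp files and the palo101:9000 address are stand-ins for the real core-site.xml and hbase-site.xml:

```shell
# Extract the hdfs URI from each (stand-in) config file and compare prefixes.
core=$(mktemp); hb=$(mktemp)
echo '<property><name>fs.defaultFS</name><value>hdfs://palo101:9000</value></property>' > "$core"
echo '<property><name>hbase.rootdir</name><value>hdfs://palo101:9000/hbase</value></property>' > "$hb"
fs=$(sed -n 's|.*<value>\(hdfs://[^<]*\)</value>.*|\1|p' "$core")
root=$(sed -n 's|.*<value>\(hdfs://[^<]*\)</value>.*|\1|p' "$hb")
case "$root" in
  "$fs"/*) result=consistent ;;
  *)       result=mismatch ;;
esac
echo "$result"
rm -f "$core" "$hb"
```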
6.4 Log in to Kylin
Open http://192.168.1.101:7070/kylin; the other nodes work as well -- just switch to the corresponding IP.
The default username/password is admin/KYLIN.
The home page after login:
7 FAQ
7.1 If you hit an error like the following:
WARNING: Failed to process JAR
[jar:file:/home/hadoop-2.7.3/contrib/capacity-scheduler/.jar!/] for
This is just a minor bug. Edit ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh and comment out the loop below:
vim ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
#for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
#  if [ "$HADOOP_CLASSPATH" ]; then
#    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
#  else
#    export HADOOP_CLASSPATH=$f
#  fi
#done
7.2 If you hit Caused by: java.lang.ClassCastException: com.fasterxml.jackson.datatype.joda.JodaModule cannot be cast to com.fasterxml.jackson.databind.Module
The cause is a jar version conflict: hive ships jackson-datatype-joda-2.4.6.jar while kylin uses jackson-databind-2.9.5.jar.
The fix is:
mv $HIVE_HOME/lib/jackson-datatype-joda-2.4.6.jar $HIVE_HOME/lib/jackson-datatype-joda-2.4.6.jarback
That is, stop using hive's copy of this jar. For details see https://issues.apache.org/jira/browse/KYLIN-3129
7.3 If you hit Failed to load keystore type JKS with path conf/.keystore due to (No such file or directory)
The fix is:
Open apache-kylin-2.6.0/tomcat/conf/server.xml and delete (or comment out) the https connector:
<!--
<Connector port="7443" protocol="org.apache.coyote.http11.Http11Protocol"
           maxThreads="150" SSLEnabled="true"
           scheme="https" secure="true"
           keystoreFile="conf/.keystore" keystorePass="changeit"
           clientAuth="false" sslProtocol="TLS" />
-->
8. Quick start
8.1 Load the officially released sample data
$KYLIN_HOME/bin/sample.sh
If you see Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect, the sample cube was created successfully, as shown:
8.2 Restart Kylin, or reload the metadata, to make the data take effect
This example reloads the metadata, as shown in the figure
8.3 Enter hive and look at the kylin cube table structures
$HIVE_HOME/bin/hive        # enter the hive shell client
hive> show databases;      # list the databases in hive
hive> use kylin_flat_db;   # switch to kylin's hive database
hive> show tables;         # list all tables in kylin's hive database
The output looks like:
[druid@palo101 kafka_2.12-2.1.0]$ $HIVE_HOME/bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/workspace/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/workspace/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/home/workspace/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
dw_sales
kylin_flat_db
ods_sales
Time taken: 1.609 seconds, Fetched: 4 row(s)
hive> use kylin_flat_db;
OK
Time taken: 0.036 seconds
hive> show tables;
OK
kylin_account
kylin_cal_dt
kylin_category_groupings
kylin_country
kylin_sales
Time taken: 0.321 seconds, Fetched: 5 row(s)
hive>
Now look at hbase:
[druid@palo101 kafka_2.12-2.1.0]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.3, rfd0d55b1e5ef54eb9bf60cce1f0a8e4c1da073ef, Sat Nov 17 21:43:34 CST 2018

hbase(main):001:0> list
TABLE
dev
kylin_metadata
test
3 row(s) in 0.3180 seconds

=> ["dev", "kylin_metadata", "test"]
hbase now contains a table named kylin_metadata, which means the cube from the official sample data was created successfully.
8.4 Build the cube
Refresh http://192.168.1.101:7070/kylin and a new project, learn_kylin, appears
Select kylin_sales_model and build it
The build progress can be followed under Monitor
After a successful build, the model shows storage information that was not there before (the corresponding table can be found in hbase), and the cube status changes to Ready, meaning it is queryable.
8.5 Query inside Kylin
With that, the Kylin cluster deployment is complete.