I. Introduction to Hue

Hue is an open-source Apache Hadoop UI system. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. With Hue we can interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example operating on data in HDFS or running MapReduce jobs. I had long heard how convenient and powerful Hue is but had never tried it myself. To get a sense of what it supports, here is the feature set from the official site, translated from the original:
- Session data, user authentication, and authorization are managed in a lightweight SQLite database by default; this can be switched to MySQL, PostgreSQL, or Oracle
- File Browser for accessing HDFS
- Hive editor for developing and running Hive queries
- Solr-based search applications with visualized data views and dashboards
- Interactive query applications based on Impala
- Spark editor and dashboard
- Pig editor with script submission
- Oozie editor, with dashboards for submitting and monitoring Workflows, Coordinators, and Bundles
- HBase browser for visualizing, querying, and modifying HBase tables
- Metastore browser for accessing Hive metadata and HCatalog
- Job browser for accessing MapReduce jobs (MR1/MR2-YARN)
- Job designer for creating MapReduce/Streaming/Java jobs
- Sqoop 2 editor and dashboard
- ZooKeeper browser and editor
- Query editors for MySQL, PostgreSQL, SQLite, and Oracle
II. Hue Architecture

III. Installation and Deployment

1. Download

- Hue website: http://gethue.com/
- Configuration manual: http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6/manual.html#_install_hue
- Source code: https://github.com/cloudera/hue

Here we download Hue directly: http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6.tar.gz
2. Install system packages

```shell
yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel sqlite-devel openssl-devel mysql-devel gmp-devel
```

Problem encountered while installing the system packages: sqlite-devel could not be fetched from the mirror, so I downloaded the tarball manually and built it from source. Download: http://www.sqlite.org/sqlite-autoconf-3070500.tar.gz

```shell
tar zxf sqlite-autoconf-3070500.tar.gz
cd sqlite-autoconf-3070500
./configure
make
sudo make install
```
3. Build Hue

```shell
tar -zxvf hue-3.7.0-cdh5.3.6.tar.gz
mv hue-3.7.0-cdh5.3.6 hue
cd hue
make apps
```
Problems encountered while building Hue:

a. pyOpenSSL fails to compile against the system OpenSSL headers:

```
OpenSSL/crypto/crl.c:6:23: error: static declaration of ‘X509_REVOKED_dup’ follows non-static declaration
 static X509_REVOKED * X509_REVOKED_dup(X509_REVOKED *orig) {
                       ^
In file included from /usr/include/openssl/ssl.h:156:0,
                 from OpenSSL/crypto/x509.h:17,
                 from OpenSSL/crypto/crypto.h:30,
                 from OpenSSL/crypto/crl.c:3:
/usr/include/openssl/x509.h:751:15: note: previous declaration of ‘X509_REVOKED_dup’ was here
 X509_REVOKED *X509_REVOKED_dup(X509_REVOKED *rev);
               ^
error: command 'gcc' failed with exit status 1
make[2]: *** [/mnt/hue/desktop/core/build/pyopenssl/egg.stamp] Error 1
make[2]: Leaving directory `/mnt/hue/desktop/core'
make[1]: *** [.recursive-env-install/core] Error 2
make[1]: Leaving directory `/mnt/hue/desktop'
make: *** [desktop] Error 2
```
Solution: in /usr/include/openssl/x509.h, delete the following two lines. They must actually be deleted; merely commenting them out does not work:

```c
X509_REVOKED *X509_REVOKED_dup(X509_REVOKED *rev);
X509_REQ *X509_REQ_dup(X509_REQ *req);
```
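The same edit can be scripted with sed instead of opening the header by hand. The sketch below demos the deletion on a stand-in file; on the real system you would point it at /usr/include/openssl/x509.h (as root) after taking a backup.

```shell
# Demo of stripping the two conflicting prototypes with sed.
# x509-demo.h is a stand-in file; on the real box, back up and edit
# /usr/include/openssl/x509.h instead (requires root).
cat > x509-demo.h <<'EOF'
X509_REVOKED *X509_REVOKED_dup(X509_REVOKED *rev);
X509_REQ *X509_REQ_dup(X509_REQ *req);
int unrelated_declaration(void);
EOF
# delete any line containing either prototype name
sed -i '/X509_REVOKED_dup/d;/X509_REQ_dup/d' x509-demo.h
cat x509-demo.h    # only the unrelated declaration remains
```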
4. Configure hue.ini

```shell
cd /mnt/hue/desktop/conf
```

Edit hue.ini:

```ini
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o

# Webserver listens on this address and port
http_host=master
http_port=8888

# Time zone name
time_zone=Asia/Shanghai
```
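The secret_key can be any long random string; one way to generate a candidate (assuming a Linux box with /dev/urandom; sticking to alphanumerics avoids ini-escaping surprises) is:

```shell
# Generate a 60-character random secret_key candidate for hue.ini.
# Alphanumeric-only, so it pastes safely into the ini file.
SECRET=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 60)
echo "secret_key=$SECRET"
```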
5. Start Hue

```shell
cd /mnt/hue/build/env/bin
./supervisor
```
Problem encountered on startup:

```
Couldn't get user id for user hue
```

This happens when Hue was installed as the root user and build/env/bin/supervisor is then also run as root: the supervisor tries to drop privileges to the hue user, which does not exist yet.
Solution:

a. Create a regular user and set its password:

```shell
[root@master bin]# useradd hue
[root@master bin]# passwd hue
```

b. Change the owner of the extracted hue directory with chown -R <user> <path>:

```shell
[root@master bin]# chown -R hue /mnt/hue
```

Finally, switch to the new user with su and run the supervisor command from the hue directory.
Then log in to the page at 192.168.200.100:8888 and enter the username and password:
IV. Integrating Hue with HDFS, MySQL, Hive, and ZooKeeper

Integrating Hue with ZooKeeper:

Go to /mnt/hue/desktop/conf and configure hue.ini:

```ini
[zookeeper]
  [[clusters]]
    [[[default]]]
      # Zookeeper ensemble. Comma separated list of Host/Port.
      # e.g. localhost:2181,localhost:2182,localhost:2183
      host_ports=master:2181,slave01:2181,slave02:2181

      # The URL of the REST contrib service (required for znode browsing)
      ## rest_url=http://localhost:9998
```
A. Start ZooKeeper (on master, slave01, and slave02):

```shell
zkServer.sh start
```
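Once the ensemble is up, each member can be probed with ZooKeeper's four-letter ruok command; a healthy server answers imok. A small sketch (hostnames are this cluster's; having nc installed is an assumption):

```shell
# Probe each ZooKeeper ensemble member; healthy servers reply 'imok',
# unreachable ones print 'no response'.
for h in master slave01 slave02; do
  reply=$( (echo ruok | nc -w 2 "$h" 2181) 2>/dev/null ) || true
  echo "$h: ${reply:-no response}"
done
```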
B. Start Hue from /mnt/hue/build/env/bin:

```shell
cd /mnt/hue/build/env/bin
./supervisor
```

C. Open the page at 192.168.200.100:8888.
Integrating Hue with MySQL

Go to /mnt/hue/desktop/conf and configure hue.ini:

```ini
# mysql, oracle, or postgresql configuration.
[[[mysql]]]
  # Name to show in the UI.
  nice_name="My SQL DB"

  # For MySQL and PostgreSQL, name is the name of the database.
  # For Oracle, Name is instance of the Oracle server. For express edition
  # this is 'xe' by default.
  ## name=mysqldb

  # Database backend to use. This can be:
  # 1. mysql
  # 2. postgresql
  # 3. oracle
  engine=mysql

  # IP or hostname of the database to connect to.
  host=master

  # Port the database server is listening to. Defaults are:
  # 1. MySQL: 3306
  # 2. PostgreSQL: 5432
  # 3. Oracle Express Edition: 1521
  port=3306

  # Username to authenticate with when connecting to the database.
  user=root

  # Password matching the username to authenticate with when
  # connecting to the database.
  password=010209

  # Database options to send to the server when connecting.
  # https://docs.djangoproject.com/en/1.4/ref/databases/
  ## options={}
```
Start Hue, then compare what the query editor shows against the MySQL database:
Integrating Hue with Hive

A. Go to /mnt/hue/desktop/conf and configure hue.ini:

```ini
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=master

# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000

# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/mnt/hive/conf

# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120

# Choose whether Hue uses the GetLog() thrift call to retrieve Hive logs.
# If false, Hue will use the FetchResults() thrift call instead.
## use_get_log_api=true
```
B. Configure the HiveServer2 parameters that the Hue/Hive integration needs (hive-site.xml):

```xml
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>master</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://192.168.200.100:9083</value>
</property>
```
C. Start everything

1. Start HDFS before starting Hive: start-dfs.sh

2. Start the Hive services:

```shell
hive --service metastore &
hive --service hiveserver2 &
```
3. Start Hue

The commands above work as-is once the environment variables are configured; if they are not, run them from the corresponding bin directories.
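"Configured environment variables" here just means Hive's bin directory is on PATH. A minimal sketch, assuming this walkthrough's /mnt/hive install location:

```shell
# Put Hive on PATH so `hive --service ...` runs from any directory.
# /mnt/hive matches this walkthrough's layout; adjust for your system.
export HIVE_HOME=/mnt/hive
export PATH="$HIVE_HOME/bin:$PATH"
echo "$PATH" | cut -d: -f1    # first PATH entry is now /mnt/hive/bin
```

Adding the two export lines to ~/.bashrc (or /etc/profile) makes the setting persistent across logins.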
4. Open the Hue page

Each time a query completes successfully, hiveserver2 prints OK.
D. Problems encountered integrating Hue with Hive:

After starting Hive and Hue, the query page kept timing out when connecting to the Hive database. Relevant error message:

```
Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found
```
Solution: check whether any of the cyrus-sasl-plain, cyrus-sasl-devel, or cyrus-sasl-gssapi RPMs are missing, and install whichever one is absent:

```shell
yum install cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi
```
Integrating Hue with HDFS

A. Go to /mnt/hue/desktop/conf and configure hue.ini:

```ini
# HA support by using HttpFs
[[[default]]]
  # Enter the filesystem uri
  fs_defaultfs=hdfs://master:8020

  # NameNode logical name.
  ## logical_name=

  # Use WebHdfs/HttpFs as the communication mechanism.
  # Domain should be the NameNode or HttpFs host.
  # Default port is 14000 for HttpFs.
  webhdfs_url=http://master:50070/webhdfs/v1

  hadoop_hdfs_home=/mnt/hadoop
  hadoop_bin=/mnt/hadoop/bin

  # Change this if your HDFS cluster is Kerberos-secured
  ## security_enabled=false

  # Default umask for file and directory creation, specified in an octal value.
  ## umask=022

  # Directory of the Hadoop configuration
  ## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
  hadoop_conf_dir=/mnt/hadoop/etc/hadoop
```
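On the Hadoop side, the Hue install manual also calls for enabling WebHDFS and allowing the hue user to impersonate other users. A sketch of the corresponding fragments (the wildcard values are permissive and can be tightened to specific hosts/groups):

```xml
<!-- hdfs-site.xml: enable WebHDFS -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: let the hue user proxy other users -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
```

Restart the NameNode after changing these so the proxyuser settings take effect.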
B. Start HDFS and Hue, then open the page:

Through Hue we can operate on files in HDFS (delete them, and so on) and view file contents directly. Click sparktest.txt, for example: