Installing and Deploying Hue
Introduction to Hue
Hue is an open-source Apache Hadoop UI system. It evolved from Cloudera Desktop and was contributed by Cloudera to the open-source community, and it is implemented on the Python web framework Django. With Hue we can interact with a Hadoop cluster from a browser-based web console to analyze and process data, for example operating on data in HDFS or running MapReduce jobs. I had long heard how convenient and powerful Hue is but had never tried it myself; the feature list below, translated from the official site, gives a quick overview of what Hue supports:
Manages session data, user authentication, and authorization in a lightweight SQLite database by default; this can be switched to MySQL, PostgreSQL, or Oracle
Accesses HDFS through a File Browser
Develops and runs Hive queries with a Hive editor
Supports Solr-based search applications, with visual data views and dashboards
Supports interactive queries through Impala-based applications
Supports a Spark editor and dashboard
Supports a Pig editor and can submit script jobs
Supports an Oozie editor; Workflows, Coordinators, and Bundles can be submitted and monitored from a dashboard
Supports an HBase browser that can visualize data, query data, and modify HBase tables
Supports a Metastore browser for accessing Hive metadata and HCatalog
Supports a Job browser for accessing MapReduce jobs (MR1/MR2-YARN)
Supports a Job designer for creating MapReduce/Streaming/Java jobs
Supports a Sqoop 2 editor and dashboard
Supports a ZooKeeper browser and editor
Supports query editors for MySQL, PostgreSQL, SQLite, and Oracle databases
Hue's architecture:
Hue website: http://gethue.com/
Configuration docs: http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6/manual.html#_install_hue
Source code: https://github.com/cloudera/hue
Here we download Hue directly: http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6.tar.gz
Building Hue
- An Internet connection is required; adjust the virtual machine's network configuration accordingly.
- Install the system packages:
yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel sqlite-devel openssl-devel mysql-devel gmp-devel
During the actual installation, sqlite-devel could not be downloaded from the mirror, so I fetched the tarball manually and built it from source:
Download: http://www.sqlite.org/sqlite-autoconf-3070500.tar.gz
tar zxf sqlite-autoconf-3070500.tar.gz
cd sqlite-autoconf-3070500
./configure
make
sudo make install
- Build Hue:
tar zxf hue-3.7.0-cdh5.3.6.tar.gz -C /opt/cdh5/
cd /opt/cdh5/hue-3.7.0-cdh5.3.6/
make apps
- Configure Hue in hue.ini:
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
# Webserver listens on this address and port
http_host=hadoop
http_port=8888
# Time zone name
time_zone=Asia/Shanghai
- Start Hue:
${HUE_HOME}/build/env/bin/supervisor
- Open the Hue page in a browser: hadoop:8888
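Before opening the browser, it can help to confirm that the Hue web server is actually listening. A minimal sketch in Python (the hostname `hadoop` and port 8888 are taken from the configuration above; adapt them to your environment):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Check the Hue web server configured above.
    print("Hue reachable:", port_open("hadoop", 8888))
```

The same check works for any of the service ports that appear later (8020, 8088, 9083, 10000).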
Integrating Hue with HDFS and YARN
- When integrating Hue with Hadoop, WebHDFS must be enabled in HDFS; add the following to hdfs-site.xml:
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
- In addition, the HDFS user permissions for Hue need to be configured in core-site.xml:
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>
After completing the configuration above, restart HDFS.
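Once WebHDFS is enabled, it can be exercised over HTTP directly, independent of Hue. A sketch of building a LISTSTATUS request and parsing its JSON response (the NameNode HTTP address `hadoop:50070` is an assumption for this Hadoop 2.x setup, and the sample payload is abbreviated; the JSON shape follows the WebHDFS REST API):

```python
import json
from urllib.parse import urlencode

def liststatus_url(namenode: str, path: str, user: str = "hue") -> str:
    """Build a WebHDFS LISTSTATUS URL for the given HDFS path."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{namenode}/webhdfs/v1{path}?{query}"

def parse_liststatus(body: str) -> list:
    """Extract entry names from a WebHDFS LISTSTATUS response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses]

# Abbreviated sample of what the NameNode returns for LISTSTATUS:
sample = ('{"FileStatuses":{"FileStatus":['
          '{"pathSuffix":"user","type":"DIRECTORY"},'
          '{"pathSuffix":"tmp","type":"DIRECTORY"}]}}')
print(parse_liststatus(sample))  # ['user', 'tmp']
```

A real call would fetch `liststatus_url("hadoop:50070", "/")` with `urllib.request.urlopen` against the running NameNode.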
- Configure Hue:
[[hdfs_clusters]]
  # HA support by using HttpFs
  [[[default]]]
    fs_defaultfs=hdfs://hadoop:8020
    # Directory of the Hadoop configuration
    hadoop_conf_dir=/opt/cdh5/hadoop-2.5.0-cdh5.3.6/etc/hadoop
    # This is the home of your Hadoop HDFS installation.
    hadoop_hdfs_home=/opt/cdh5/hadoop-2.5.0-cdh5.3.6
    # Use this as the HDFS Hadoop launcher script
    hadoop_bin=/opt/cdh5/hadoop-2.5.0-cdh5.3.6/bin

# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]
  [[[default]]]
    # Enter the host on which you are running the ResourceManager
    resourcemanager_host=hadoop
    # The port where the ResourceManager IPC listens on
    resourcemanager_port=8032
    # Whether to submit jobs to this cluster
    submit_to=True
    # URL of the ResourceManager API
    resourcemanager_api_url=http://hadoop:8088
    # URL of the ProxyServer API
    proxy_api_url=http://hadoop:8088
    # URL of the HistoryServer API
    history_server_api_url=http://hadoop:19888
Restart the Hue service. We can then run Hive from a remote terminal and watch the job status in Hue.
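The ResourceManager API URL configured above can also be queried directly to verify that YARN is up. A small sketch parsing the cluster-info response (the JSON shape follows the YARN ResourceManager REST API at `/ws/v1/cluster/info`; the sample payload is abbreviated and illustrative):

```python
import json

def cluster_state(body: str) -> str:
    """Pull the cluster state out of a /ws/v1/cluster/info response."""
    return json.loads(body)["clusterInfo"]["state"]

# Abbreviated example of what http://hadoop:8088/ws/v1/cluster/info returns:
sample = '{"clusterInfo":{"id":1324053971963,"state":"STARTED","resourceManagerVersion":"2.5.0"}}'
print(cluster_state(sample))  # STARTED
```

If this reports STARTED, the `resourcemanager_api_url` entry in hue.ini points at a live ResourceManager.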
Integrating Hue with Hive
- hive-site.xml:
Note: the metastore should be started as a standalone service that clients connect to in order to read the data in the MySQL database; see "Setting Up MetaStore" under the Administrator Documentation on the Hive website.
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
After configuring, start the services:
nohup ${HIVE_HOME}/bin/hive --service metastore &
nohup ${HIVE_HOME}/bin/hiveserver2 &
- hue.ini:
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=hadoop
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/cdh5/hive-0.13.1-cdh5.3.6/conf
# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120
Note: after restarting Hive and Hue, running SQL in Hue may fail with a permission error, because the user logged into Hue differs from the user that created the tables on HDFS. In that case, change the permissions from the command line with Hadoop:
bin/hdfs dfs -chmod -R o+x /xx
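The `o+x` in the command above adds the execute (traverse) bit for "others", so the Hue user can descend into directories owned by another user. In octal terms (a small illustration of the bit arithmetic, not Hue-specific):

```python
def add_other_execute(mode: int) -> int:
    # o+x sets the lowest permission bit (0o001) on the mode.
    return mode | 0o001

# A directory created as 750 becomes 751, so others may traverse it:
print(oct(add_other_execute(0o750)))  # 0o751
```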
Integrating Hue with an RDBMS
- Configure Hue's own SQLite database in hue.ini:
[[[sqlite]]]
  # Name to show in the UI.
  nice_name=SQLite
  # For SQLite, name defines the path to the database.
  name=/opt/cdh5/hue-3.7.0-cdh5.3.6/desktop/desktop.db
  # Database backend to use.
  engine=sqlite
- Configure a MySQL database in hue.ini:
[[[mysql]]]
  # Name to show in the UI.
  nice_name="My SQL DB"
  ## nice_name=MySqlDB
  # For MySQL and PostgreSQL, name is the name of the database.
  # For Oracle, Name is instance of the Oracle server. For express edition
  # this is 'xe' by default.
  ## name=db_track
  # Database backend to use. This can be:
  # 1. mysql
  # 2. postgresql
  # 3. oracle
  engine=mysql
  # IP or hostname of the database to connect to.
  host=hadoop
  # Port the database server is listening to. Defaults are:
  # 1. MySQL: 3306
  # 2. PostgreSQL: 5432
  # 3. Oracle Express Edition: 1521
  port=3306
  # Username to authenticate with when connecting to the database.
  user=root
  # Password matching the username to authenticate with when
  # connecting to the database.
  password=123456
Restart the Hue service, and the configured databases can be seen in the page.
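hue.ini uses nested `[[...]]`/`[[[...]]]` section markers, which the standard library's configparser does not understand, so a quick sanity check of a section is easiest with a tiny hand-rolled scan. A minimal sketch (a hypothetical helper: it ignores nesting depth and comments, and only matches section names by stripping brackets):

```python
def section_options(ini_text: str, section: str) -> dict:
    """Collect key=value pairs under a [[[section]]]-style header."""
    options, in_section = {}, False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Strip any number of surrounding brackets to get the name.
            in_section = line.strip("[]") == section
        elif in_section and "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            options[key.strip()] = value.strip()
    return options

sample = """
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"
engine=mysql
host=hadoop
port=3306
"""
print(section_options(sample, "mysql")["engine"])  # mysql
```

Reading the real file and checking, say, that `engine` and `port` match the database you actually started catches most copy-paste mistakes before restarting Hue.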