A record of the process of configuring the Presto data query engine.


Preparation:

1. Four virtual machines running CentOS 6.4 (master, secondary, node1, node2)

2. Installation packages

    hadoop-cdh4.4.0, hive-cdh4.4.0, presto, discovery-server, hbase, JDK 7.0+ (64-bit), Python 2.4+, postgresql

3. Layout plan
    Master: 192.168.69.180 master (hadoop, hbase, discovery-server, hive, presto, postgresql)
    Secondary: 192.168.69.181 secondary (hadoop, hbase, presto)
    Nodes: 192.168.69.182 node1 (hadoop, hbase, presto), 192.168.69.183 node2 (hadoop, hbase, presto)
 
Configuration steps:
1. Install the Java JDK and Python on every virtual machine; a minimal install sketch follows below.
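A sketch of step 1, assuming the Oracle JDK RPM has already been downloaded to each machine (the RPM file name is an assumption; the resulting install directory matches the JAVA_HOME used in step 5):
[root@master ~]# rpm -ivh jdk-7u45-linux-x64.rpm    # installs to /usr/java/jdk1.7.0_45
[root@master ~]# java -version                      # verify the JDK
[root@master ~]# python -V                          # CentOS 6.4 ships Python 2.6, which satisfies the 2.4+ requirement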
2. Add the cluster host names to /etc/hosts on every virtual machine:
[root@master ~]# cat /etc/hosts
127.0.0.1   localhost
::1         localhost 
192.168.69.180     master
192.168.69.181     secondary
192.168.69.182     node1
192.168.69.183     node2
3. Disable the firewall on all virtual machines:
[root@master ~]# service iptables stop
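To keep the firewall from coming back after a reboot, also disable the iptables service permanently (run on every VM):
[root@master ~]# chkconfig iptables off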
4. Configure passwordless SSH between the master and the node machines (master and secondary are shown as an example; the text in parentheses is explanation and is not typed):
master:
[root@master:~]$mkdir .ssh
[root@master:~]$cd .ssh
[root@master:.ssh]$ssh-keygen -t rsa    (when prompted for a file name, enter "master" for convenience; leave the passphrase empty so that the login is truly password-free)
[root@master:.ssh]$cp master.pub authorized_keys    (add the public key to authorized_keys)
[root@master:.ssh]$scp master.pub secondary:/root/.ssh/    (copy master.pub to /root/.ssh/ on secondary, then run cp master.pub authorized_keys on secondary; copy master.pub to node1 and node2 and run the same command there as well)
secondary:
[root@secondary:~]$mkdir .ssh
[root@secondary:~]$cd .ssh
[root@secondary:.ssh]$cp master.pub authorized_keys    (master.pub was copied over from master in the previous step)
[root@secondary:.ssh]$ssh-keygen -t rsa    (when prompted for a file name, enter "secondary" for convenience; leave the passphrase empty)
[root@secondary:.ssh]$cat secondary.pub >> authorized_keys    (append the public key to authorized_keys)
[root@secondary:.ssh]$scp secondary.pub master:/root/.ssh/    (copy secondary.pub to /root/.ssh/ on master, then on master run cat secondary.pub >> authorized_keys to append it)
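As an alternative to the manual scp/cat steps, the ssh-copy-id helper shipped with openssh-clients appends a public key to the remote authorized_keys in one step; a sketch from master, assuming the key generated above is /root/.ssh/master.pub:
[root@master ~]# for host in secondary node1 node2; do ssh-copy-id -i /root/.ssh/master.pub root@$host; done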

Note: SSH permission requirements:
The home directory must be 755 or 700; it must not be 77x.
The .ssh directory must be 755 (700 also works).
The public key (id_rsa.pub, here master.pub / secondary.pub) and authorized_keys must be 644.
The private key (id_rsa, here master / secondary) must be 600.
Finally, test from master: ssh master date, ssh secondary date, ssh node1 date, ssh node2 date. If none of these prompts for a password, the setup succeeded.
If ssh secondary, ssh node1, or ssh node2 is slow to connect, set GSSAPIAuthentication no in /etc/ssh/ssh_config.

To allow root to log in over SSH, edit /etc/ssh/sshd_config and change PermitRootLogin no to PermitRootLogin yes.
Restart the sshd service: /etc/init.d/sshd restart

5. Configure environment variables

[root@master~]# gedit .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs
export JAVA_HOME=/usr/java/jdk1.7.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=./:$JAVA_HOME/lib:$JRE_HOME/lib:$JRE_HOME/lib/tools.jar:/usr/presto/server/lib:/usr/discovery-server/lib

export HADOOP_HOME=/usr/hadoop
export HIVE_HOME=/usr/hive
export HBASE_HOME=/usr/hbase

export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin
After the environment variables are configured on master, configure secondary, node1, and node2 the same way; the file can simply be pushed to them with scp, as sketched below.
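A sketch of pushing the profile from master to the other machines:
[root@master ~]# for host in secondary node1 node2; do scp /root/.bash_profile root@$host:/root/; done
The variables take effect on the next login shell, or run source /root/.bash_profile on each machine.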

6. Configure hadoop

a. Download and unpack hadoop-2.0.0-cdh4.4.0.tar.gz (the Hadoop 2 tarball that ships with CDH 4.4.0) and move the unpacked directory to /usr, i.e. /usr/hadoop
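A sketch of step a, assuming the tarball named in the step above was downloaded to /root on master:
[root@master ~]# tar -zxf hadoop-2.0.0-cdh4.4.0.tar.gz
[root@master ~]# mv hadoop-2.0.0-cdh4.4.0 /usr/hadoop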
b. core-site.xml (this and the following files live under /usr/hadoop/etc/hadoop, i.e. $HADOOP_CONF_DIR)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!--fs.default.name for MRV1 ,fs.defaultFS for MRV2(yarn) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>10080</value>
</property>
</configuration>

c. hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/data/hadoop-${user.name}</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>secondary:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

d. masters (create this file if it does not exist)

master
secondary

e. slaves (create this file if it does not exist)
node1
node2
 
f. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>
<property>
 <name>mapreduce.jobhistory.address</name>
 <value>master:10020</value>
</property>
<property>
 <name>mapreduce.jobhistory.webapp.address</name>
 <value>master:19888</value>
</property>
</configuration>

g. yarn-site.xml

<?xml version="1.0"?>
<configuration>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/yarn/*,$YARN_HOME/share/hadoop/yarn/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/data/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
 </property>
</configuration>

h. Copy the hadoop directory to secondary, node1, and node2, as sketched below.
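A sketch of the copy; the mkdir also pre-creates the /opt/data directory referenced by hdfs-site.xml and yarn-site.xml (assuming it does not exist yet on the other machines):
[root@master ~]# for host in secondary node1 node2; do scp -r /usr/hadoop root@$host:/usr/; ssh root@$host 'mkdir -p /opt/data'; done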

i. Before hadoop runs for the first time, the NameNode must be formatted: [root@master hadoop]# hadoop namenode -format

j. Start hadoop: [root@master ~]# start-all.sh

k. If HDFS remains in safe mode after startup, leave it manually: hdfs dfsadmin -safemode leave
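Once start-all.sh finishes, a quick sanity check is to list the Java daemons and the HDFS cluster report; master should show NameNode and ResourceManager, and the nodes DataNode and NodeManager:
[root@master ~]# jps
[root@master ~]# hdfs dfsadmin -report    # should list node1 and node2 as live datanodes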

7. Install hbase
a. Unpack the hbase tarball and move it to /usr, i.e. /usr/hbase
b. regionservers

master
secondary
node1
node2

c. hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master/hbase-${user.name}</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/opt/data/hbase-${user.name}</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,secondary,node1,node2</value>
</property>
</configuration>

d. Sync the hbase directory to secondary, node1, and node2

e. Start hbase:

[root@master:~]# start-hbase.sh
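After start-hbase.sh, HMaster should appear in jps on master and HRegionServer on each region server; the cluster state can also be checked from the HBase shell:
[root@master ~]# hbase shell
hbase(main):001:0> status
hbase(main):002:0> exit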

 

8. Install hive

a. Download the hive tarball and unpack it to /usr, i.e. /usr/hive

b. hive-site.xml

<?xml version="1.0"?>                
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://master/testdb</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>redhat</value>
  <description>password to use against metastore database</description>
</property>
<property>
 <name>mapred.job.tracker</name>
 <value>master:8031</value>
</property>
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/hive/lib/zookeeper-3.4.5-cdh4.4.0.jar,
    file:///usr/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar,
    file:///usr/hive/lib/hbase-0.94.2-cdh4.4.0.jar,
    file:///usr/hive/lib/guava-11.0.2.jar</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/opt/data/warehouse-${user.name}</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/opt/data/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/opt/data/querylog-${user.name}</value>
  <description>
    Location of Hive run time structured log file
  </description>
</property>
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value>node1</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>master</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.10.0-cdh4.2.0.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
</configuration>

9. Install postgresql (used as the Hive metastore database)

a. Download and install postgresql
b. Use pgAdmin to create a database user for the metastore; note that hive-site.xml above connects as hiveuser with password redhat, so either create that user or change the hive-site.xml values to match
c. Use pgAdmin to create the database testdb and make that user its owner
d. Edit pg_hba.conf to allow connections from the cluster hosts
e. In postgresql.conf set
     standard_conforming_strings = off
f. Copy the PostgreSQL JDBC driver into /usr/hive/lib
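If pgAdmin is not available, steps b and c can also be done from psql on master; the user name, password, and database below are taken from the hive-site.xml values above:
[root@master ~]# su - postgres -c psql
postgres=# CREATE USER hiveuser WITH PASSWORD 'redhat';
postgres=# CREATE DATABASE testdb OWNER hiveuser;
postgres=# \q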

10. Install presto
a. Download and unpack presto to /usr, i.e. /usr/presto
b. Create an etc directory inside the presto directory and add the following configuration files to it

1)node.properties

node.environment=production
node.id=F25B16CB-5D5B-50FD-A30D-B2221D71C882
node.data-dir=/var/presto/data
Note: node.id must be unique on every server; a way to generate one is sketched below.
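A simple way to generate a unique node.id is the uuidgen utility (part of util-linux-ng on CentOS 6); run it once on each host and paste the output into that host's node.properties:
[root@master ~]# uuidgen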
2)jvm.config
-server
-Xmx16G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:PermSize=150M
-XX:MaxPermSize=150M
-XX:ReservedCodeCacheSize=150M
-Xbootclasspath/p:/var/presto/installation/lib/floatingdecimal-0.1.jar
Download floatingdecimal-0.1.jar and place it in /var/presto/installation/lib/
3)config.properties
coordinator=true
datasources=jmx
http-server.http.port=8080
presto-metastore.db.type=h2
presto-metastore.db.filename=var/db/MetaStore
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://master:8411
The above is the coordinator configuration on master; on secondary, node1, and node2 change coordinator=true to coordinator=false and remove the discovery-server.enabled=true line, as shown below.
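For reference, after those two changes the config.properties on secondary, node1, and node2 looks like this (the remaining lines are identical to the master file):
coordinator=false
datasources=jmx
http-server.http.port=8080
presto-metastore.db.type=h2
presto-metastore.db.filename=var/db/MetaStore
task.max-memory=1GB
discovery.uri=http://master:8411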
4)log.properties
com.facebook.presto=DEBUG
5) Create a catalog directory under /usr/presto/etc and add the following configuration files to it
jmx.properties
connector.name=jmx
hive.properties
connector.name=hive-cdh4
hive.metastore.uri=thrift://master:9083

11. Install discovery-server
a. Download and unpack the discovery-server tarball to /usr, i.e. /usr/discovery-server
b. As with presto, create an etc directory under /usr/discovery-server and add the following configuration files to it

1)node.properties
node.environment=production
node.id=D28C24CF-78A1-CD09-C693-7BDE66A51EFD
node.data-dir=/var/discovery/data
2)jvm.config
-server
-Xmx1G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
3)config.properties
http-server.http.port=8411

Running:

On master, run the following:

start-all.sh        (starts hadoop across the cluster)

start-hbase.sh      (starts hbase across the cluster)

Change to /usr/discovery-server/bin and start discovery-server:

launcher start      // start as a background daemon
launcher run        // or run in the foreground
Change to /usr/hive/bin and start hive:
./hive --service hiveserver -p 9083    // thrift mode
On master and on every node machine:
Change to /usr/presto/server/bin and start presto:
launcher start      // start as a background daemon
launcher run        // or run in the foreground
 
Command summary:

1. hadoop commands:
hadoop namenode -format            // format the NameNode (first run only)
start-all.sh                       // start the cluster
hadoop dfsadmin -safemode leave    // leave safe mode
hdfs dfsadmin -safemode leave      // newer form of the previous command
2. hive commands:
./hive
./hive --service hiveserver -p 9083    // thrift mode
3. hbase commands:
./start-hbase.sh
4. discovery-server commands:
launcher start    // start
launcher run      // run in the foreground
launcher stop     // stop
5. presto commands:
launcher start    // start
launcher run      // run in the foreground
launcher stop     // stop
6. Start the presto client:
./presto --server localhost:8080 --catalog hive --schema default

Testing:
Run the presto client on master.
Change to /usr/presto/client and start the client:
./presto --server localhost:8080 --catalog hive --schema default
After the client starts, run show tables; to verify that everything works.
Test results:
The test VMs ran 64-bit CentOS; three had 2 GB of RAM and one had 1 GB. The test table contained about 610,000 rows.
 
Nodes     SQL statement                                                                           Execution time (s)
4 nodes   select Count(*) from mytable;                                                           10
4 nodes   select Count(*), num from mytable group by num;                                         10
4 nodes   select num from mytable group by num having count(*) > 1000;                            10
4 nodes   select min(num) from mytable group by num;                                              9
4 nodes   select min(num) from mytable;                                                           9
4 nodes   select max(num) from mytable;                                                           9
4 nodes   select min(num) from mytable group by num;                                              9
4 nodes   select row_number() over(partition by name order by num) as row_index from mytable;     16

 
 

