Presto in Practice (Repost)


Original URL: https://my.oschina.net/guol/blog/891156

 

 

Introduction

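        Presto is an open-source distributed SQL query engine for interactive analytic queries over data sizes ranging from gigabytes to petabytes. It was designed and written from the ground up to solve the problem of interactive analysis and processing speed for commercial data warehouses at the scale of Facebook's. Presto queries data where it lives, including Hive, Cassandra, relational databases such as MySQL, and proprietary data stores. Systems such as Redis, MongoDB, and Kafka can also be queried with SQL. A single Presto query can combine data from multiple sources, which allows analysis across an entire organization.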

        When I first used Presto it was at version 0.150; it has since reached 0.174, so development is clearly active and the community is in good shape.

Dependencies

        Mac OS X or Linux
        Java 8 Update 92 or higher (8u92+), 64-bit
        Maven 3.3.9+ (for building)
        Python 2.4+ (for running with the launcher script)

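Architecture

        coordinator

            The coordinator in Presto mainly controls the worker nodes: it accepts queries, plans them, and schedules the work onto workers, so it is usually called the scheduling node.

        worker

            The workers in Presto are the nodes that do the work: queries are actually executed on the worker nodes.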

Deployment

    Download

cd /opt/programs
wget 'https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.174/presto-server-0.174.tar.gz'

    Coordinator deployment

# Enter the download directory
cd /opt/programs
# Extract into presto_174, dropping the archive's top-level directory
mkdir -p presto_174
tar xzvf presto-server-0.174.tar.gz -C presto_174 --strip-components=1
# Enter the installation directory
cd presto_174
# Create the configuration directory
mkdir etc
# The base configuration files under etc/ must all be created by hand

## jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

## log.properties
com.facebook.presto=INFO

## node.properties
node.environment=production
node.id=109
node.data-dir=/tmp/presto/data

## config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=9999
query.max-memory=20GB
query.max-memory-per-node=4GB
discovery-server.enabled=true
discovery.uri=http://192.168.1.109:9999

    Worker deployment

# Enter the download directory
cd /opt/programs
# Extract into presto_174, dropping the archive's top-level directory
mkdir -p presto_174
tar xzvf presto-server-0.174.tar.gz -C presto_174 --strip-components=1
# Enter the installation directory
cd presto_174
# Create the configuration directory
mkdir etc
# The base configuration files under etc/ must all be created by hand

## jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

## log.properties
com.facebook.presto=INFO

## node.properties
node.environment=production
node.id=135
node.data-dir=/tmp/presto/data

## config.properties
coordinator=false
http-server.http.port=9999
query.max-memory=20GB
query.max-memory-per-node=4GB
discovery.uri=http://192.168.1.109:9999
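
The coordinator and worker config.properties files differ in only a few keys (the coordinator flag, the scheduler setting, and the embedded discovery server). A minimal sketch of generating either variant from one function follows; the render_config name and the DISCOVERY_URI variable are illustrative assumptions, not part of Presto or of the original article.

```shell
# Sketch: write etc/config.properties for one node.
# Usage: render_config <coordinator|worker> <etc-dir>
# DISCOVERY_URI (hypothetical variable) defaults to the coordinator address used in this article.
DISCOVERY_URI="${DISCOVERY_URI:-http://192.168.1.109:9999}"

render_config() {
    local role="$1"
    local etc_dir="$2"
    mkdir -p "$etc_dir" || return 1
    case "$role" in
        coordinator)
            # The coordinator runs the embedded discovery server and takes no query work.
            cat > "$etc_dir/config.properties" <<EOF
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=9999
query.max-memory=20GB
query.max-memory-per-node=4GB
discovery-server.enabled=true
discovery.uri=$DISCOVERY_URI
EOF
            ;;
        worker)
            # Workers only point at the coordinator's discovery URI.
            cat > "$etc_dir/config.properties" <<EOF
coordinator=false
http-server.http.port=9999
query.max-memory=20GB
query.max-memory-per-node=4GB
discovery.uri=$DISCOVERY_URI
EOF
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}
```

For example, `render_config worker /opt/programs/presto_174/etc` would write the worker variant. node.properties still has to be written per node, because node.id must be unique.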

    Presto CLI deployment

# Enter the bin directory
cd /opt/programs/presto_174/bin
# Download
wget 'https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.174/presto-cli-0.174-executable.jar'
mv presto-cli-0.174-executable.jar presto-cli
chmod 755 presto-cli

    Presto UI

        http://192.168.1.109:9999

    Presto init.d

#!/bin/bash
#
#Author: dalu, Date: 2016/12/8
#
###################################
# chkconfig read settings
# chkconfig: - 99 50
# description: presto start script
# processname: presto server
###################################
export PATH=/opt/programs/jdk1.8.0_111/bin/:$PATH

#java home
JAVA_HOME="/opt/programs/jdk1.8.0_111"
#app home
APP_HOME='/opt/programs/presto_174/'
#app name
APP_NAME='PrestoServer'
#app log dir
APP_LOG="$APP_HOME/var/logs/"
#java main function
APP_MAINCLASS="PrestoServer"
#start cmd
CMD="/opt/programs/presto_174/bin/launcher"

###################################
#check PrestoServer app is running
###################################
psid=0
checkpid(){
    javaps=`$JAVA_HOME/bin/jps -l | grep $APP_MAINCLASS`
    if [ -n "$javaps" ];then
        psid=`echo $javaps | awk '{print $1}'`
    else
        psid=0
    fi
}

###################################
# start PrestoServer app
##################################
start(){
    checkpid
    if [ $psid -ne 0 ];then
        echo "================================"
        echo "warn: $APP_NAME already started! (pid=$psid)"
        echo "================================"
    else
        echo -n "Starting $APP_NAME ..."
        $CMD --pid-file=$APP_LOG/launcher.pid --launcher-log-file=$APP_LOG/launcher.log --server-log-file=$APP_LOG/server.log start 2>&1
        checkpid
        if [ $psid -ne 0 ];then
            echo "(pid=$psid) [OK]"
        else
            echo "[Failed]"
        fi
    fi
}

##################################
# stop PrestoServer app
##################################
stop(){
    checkpid
    if [ $psid -ne 0 ];then
        echo -n "Stopping $APP_NAME ...(pid=$psid) "
        presto_pid=`$JAVA_HOME/bin/jps -l | grep $APP_MAINCLASS | awk '{print $1}'`
        kill $presto_pid
        if [ $? -eq 0 ];then
            echo "[OK]"
        else
            kill -9 $presto_pid
            sleep 5
            checkpid
            if [ $psid -ne 0 ];then
                echo -n "[Failed]"
            fi
        fi
    else
        echo "================================"
        echo "warn: $APP_NAME is not running"
        echo "================================"
    fi
}

case "$1" in
    'start')
        start
        ;;
    'stop')
        stop
        ;;
    'restart')
        stop
        start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
esac
exit 0

Redis access test

    Configuration

# Create the catalog directory
mkdir -p /opt/programs/presto_174/etc/catalog
# Enter the catalog directory
cd /opt/programs/presto_174/etc/catalog
# Create redis.properties
connector.name=redis
redis.table-names=antnest
redis.nodes=192.168.1.109:6379
redis.password=dalu
redis.default-schema=redis
redis.database-index=0
redis.table-description-dir=etc/redis
redis.hide-internal-columns=false

    Create the Redis mapping

# Create the Redis mapping directory
mkdir -p /opt/programs/presto_174/etc/redis
# Enter the mapping directory
cd /opt/programs/presto_174/etc/redis
# Create the mapping file redis.json
{
    "tableName": "antnest",
    "schemaName": "redis",
    "key": {
        "dataFormat": "raw",
        "fields": [
            {
                "name": "key",
                "type": "VARCHAR"
            }
        ]
    },
    "value": {
        "dataFormat": "raw",
        "fields": [
            {
                "name": "value",
                "type": "VARCHAR"
            }
        ]
    }
}

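        Note: all of the configuration files above must be copied to both the coordinator and the workers; then restart the Presto process on the coordinator and on every worker.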
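
The copy-and-restart step can be scripted from the coordinator. The sketch below is only one possible approach and rests on several assumptions: the worker address 192.168.1.135 is hypothetical, passwordless ssh and rsync are available, and the init script from this article is installed as /etc/init.d/presto on every node. By default it only prints the commands it would run.

```shell
# Hypothetical worker list: replace with your own worker addresses.
PRESTO_WORKERS="${PRESTO_WORKERS:-192.168.1.135}"
PRESTO_HOME="${PRESTO_HOME:-/opt/programs/presto_174}"
# Assumption: the init script from this article is installed at this path on every node.
PRESTO_INIT="${PRESTO_INIT:-/etc/init.d/presto}"
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the coordinator: push catalog and table-description files, then restart.
sync_and_restart() {
    local host
    for host in $PRESTO_WORKERS; do
        # node.properties is deliberately not copied: node.id differs per node.
        run rsync -a "$PRESTO_HOME/etc/catalog" "$PRESTO_HOME/etc/redis" \
            "$host:$PRESTO_HOME/etc/" || return 1
        run ssh "$host" "$PRESTO_INIT" restart || return 1
    done
    # Finally restart the local (coordinator) process.
    run "$PRESTO_INIT" restart
}
```

Calling `sync_and_restart` with the defaults prints the rsync and ssh commands for review; `DRY_RUN=0` has to be set before the call to actually execute them.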

    Test

cd /opt/programs/presto_174/bin
./presto-cli --server 192.168.1.109:9999 --catalog redis
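
Besides the interactive session, the CLI can run a single statement non-interactively with --execute, which is convenient for scripted checks. A minimal sketch, assuming the CLI was installed at the path used above; the presto_query helper name and the choice of CSV_HEADER output are illustrative, not from the original article.

```shell
# Defaults match the paths and address used in this article; override via the environment.
PRESTO_CLI="${PRESTO_CLI:-/opt/programs/presto_174/bin/presto-cli}"
PRESTO_SERVER="${PRESTO_SERVER:-192.168.1.109:9999}"

# Run one SQL statement and print the result as CSV with a header row.
# Usage: presto_query <catalog> <schema> <sql>
presto_query() {
    "$PRESTO_CLI" --server "$PRESTO_SERVER" \
        --catalog "$1" --schema "$2" \
        --output-format CSV_HEADER \
        --execute "$3"
}
```

With the Redis catalog above, `presto_query redis redis 'SHOW TABLES'` should list antnest, and `presto_query redis redis 'SELECT * FROM antnest LIMIT 10'` returns a sample of keys and values.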

        

MongoDB access test

    Configuration: multiple catalog files can be created, e.g. mongoUser.properties, xxxx.properties

# Create the MongoDB configuration file mongodb.properties in the catalog directory
connector.name=mongodb
mongodb.seeds=192.168.1.109:12001
mongodb.credentials=admin:dalu@admin
mongodb.socket-keep-alive=true
mongodb.schema-collection=admin
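
Presto names each catalog after its properties file, so the files in etc/catalog determine which catalogs can be passed to --catalog. The following sketch lists the catalog names a node would load; the list_catalogs helper is a hypothetical name introduced here for illustration.

```shell
# Print one catalog name per *.properties file in the catalog directory.
# Usage: list_catalogs [catalog-dir]
list_catalogs() {
    local dir="${1:-/opt/programs/presto_174/etc/catalog}"
    local f
    for f in "$dir"/*.properties; do
        # Skip the unexpanded pattern when the directory has no .properties files.
        [ -e "$f" ] || continue
        basename "$f" .properties
    done
}
```

Running `list_catalogs` on each node is a quick way to confirm that the coordinator and the workers carry the same set of catalog files.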

    Test

Kafka access test

    Configuration

# Create the Kafka configuration file in the catalog directory
connector.name=kafka
kafka.table-names=json_data
kafka.nodes=192.168.1.109:9092
kafka.hide-internal-columns=false
kafka.table-description-dir=etc/kafka
kafka.default-schema=kafka

    Configure the mapping

# Create the Kafka mapping directory
mkdir -p /opt/programs/presto_174/etc/kafka
# Create the Kafka mapping file json_data.json
{
    "tableName": "json_data",
    "schemaName": "kafka",
    "topicName": "json_data",
    "key": {
        "dataFormat": "raw",
        "fields": [
            {
                "name": "kafka_key",
                "type": "BIGINT",
                "dataFormat": "LONG",
                "hidden": "false"
            }
        ]
    },
    "message": {
        "dataFormat": "json",
        "fields": [
            {
                "name": "name",
                "mapping": "name",
                "type": "VARCHAR"
            },
            {
                "name": "phone",
                "mapping": "phone",
                "type": "VARCHAR"
            }
        ]
    }
}

    Test
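
To have something to query, the json_data topic needs messages whose JSON fields match the mapping (name and phone). A small sketch for producing such test messages follows; the make_message helper and the sample values are made up for illustration, and the function does no JSON escaping.

```shell
# Emit one JSON message matching the "message" fields of json_data.json.
# Usage: make_message <name> <phone>
# Assumes neither value contains double quotes or backslashes (no escaping is done).
make_message() {
    printf '{"name":"%s","phone":"%s"}\n' "$1" "$2"
}
```

The output can be piped into Kafka's console producer, for example `make_message alice 13800000000 | kafka-console-producer.sh --broker-list 192.168.1.109:9092 --topic json_data`, assuming the Kafka command-line tools are on the PATH. Afterwards a statement such as `SELECT name, phone FROM json_data` against the kafka catalog and schema should return the row.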

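Third-party web UIs

    airpal

        airpal is not recommended: it has not been updated for a long time and its version is still stuck at 0.1. Its advantages are built-in authentication (Apache Shiro) and a decent UI.

    yanagishima

        The 1.x releases were rather crude, but the author is diligent and keeps updating it. It has now reached version 3.0, and its UI looks better than airpal's.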

        Download

cd /opt/programs
wget 'https://bintray.com/artifact/download/wyukawa/generic/yanagishima-3.0.zip'
unzip yanagishima-3.0.zip

        Configuration

cd /opt/programs/yanagishima-3.0/conf
# Configuration file yanagishima.properties
jetty.port=8088
presto.query.max-run-time-seconds=1800
presto.max-result-file-byte-size=1073741824
presto.datasources=presto1
presto.coordinator.server.presto1=http://192.168.1.109:9999
catalog.presto1=mongodb
schema.presto1=webspider
select.limit=500

        Access

            Home page

            Selection

            Results

 

