ETL is short for Extraction-Transformation-Loading: extracting data from a source system, transforming it, and loading it into a target.
In this case we extract data from Oracle, analyze and transform it with Hive, and finally load the result back into Oracle.
This case study is a pure demo, intended for practice.
1. Requirements
Analyze the data in the emp and dept tables and store the final result in a result table.
Both emp and dept ship with Oracle (the SCOTT sample schema). Their structures are as follows:
emp table
EMPNO    | NUMBER(4)
ENAME    | VARCHAR2(10)
JOB      | VARCHAR2(9)
MGR      | NUMBER(4)
HIREDATE | DATE
SAL      | NUMBER(7,2)
COMM     | NUMBER(7,2)
DEPTNO   | NUMBER(2)
dept table
DEPTNO | NUMBER(2)
DNAME  | VARCHAR2(14)
LOC    | VARCHAR2(13)
result table
EMPNO | employee number
ENAME | employee name
COMM  | commission (allowance)
DNAME | department name
2. Data Preparation
Create the Hive tables:
create table emp_etl (
  empno    int,
  ename    string,
  job      string,
  mgr      int,
  hiredate string,
  sal      double,
  comm     double,
  deptno   int
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile;

create table dept_etl (
  deptno int,
  dname  string,
  loc    string
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile;

create table tmp_result_etl (
  empno int,
  ename string,
  comm  double,
  dname string
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile;

create table result_etl (
  empno int,
  ename string,
  comm  double,
  dname string
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
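As an optional sanity check (a minimal sketch, assuming the hive CLI is on the PATH), confirm the tables exist and inspect one schema:

hive -e "SHOW TABLES LIKE '*_etl'; DESCRIBE emp_etl;"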
Import the Oracle data into Hive. The --null-string and --null-non-string options below control what Sqoop writes into the text files when a source column is NULL:
sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table EMP \
  --hive-overwrite --hive-import --hive-table emp_etl \
  --null-string '' --null-non-string '0' \
  --fields-terminated-by '\t' --lines-terminated-by '\n' -m 3

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table DEPT \
  --hive-overwrite --hive-import --hive-table dept_etl \
  --null-string '' --null-non-string '0' \
  --fields-terminated-by '\t' --lines-terminated-by '\n' -m 3
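Before moving on, it is worth verifying the row counts (the stock SCOTT schema ships with 14 rows in EMP and 4 in DEPT; adjust the expectation if your source differs):

hive -e "SELECT COUNT(*) FROM emp_etl; SELECT COUNT(*) FROM dept_etl;"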
3. Implementation
Analyze and process the data in Hive, export the result to HDFS, then use Sqoop to load the HDFS result into the database.
1) Extract: pull the data from Oracle into Hive. See the two import steps above.
2) Transform: insert the query result into a Hive table:
INSERT OVERWRITE TABLE result_etl
SELECT a.empno, a.ename, a.comm, b.dname
FROM emp_etl a
JOIN dept_etl b ON (a.deptno = b.deptno);
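To eyeball the transformed rows (optional; because this is an inner join, only employees whose deptno matches a dept_etl row appear):

hive -e "SELECT * FROM result_etl LIMIT 5;"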
3) Transform: write the data out to the HDFS file system. Note that a relative path in INSERT OVERWRITE DIRECTORY resolves under the current user's HDFS home directory (here /user/hadoop/RESULT_ETL_HIVE), and the rows are written with Hive's default '\001' field delimiter:
INSERT OVERWRITE DIRECTORY 'RESULT_ETL_HIVE'
SELECT * FROM result_etl;
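The exported directory can be inspected directly in HDFS (the 000000_0 file name below is typical but depends on the job; the '\001' delimiter is non-printing, so fields appear run together in the terminal):

hadoop fs -ls RESULT_ETL_HIVE
hadoop fs -cat RESULT_ETL_HIVE/000000_0 | head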
4) Load: load the data from HDFS into Oracle (the result table must be created by hand).
Create the Oracle table that will hold the ETL result:
CREATE TABLE RESULT_ETL2 (
  EMPNO NUMBER(4),
  ENAME VARCHAR2(10),
  COMM  NUMBER(7,2),
  DNAME VARCHAR2(14)
);
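If sqlplus is available on the client (an assumption; any Oracle client works), a quick DESC confirms the table before exporting. The EZConnect string mirrors the JDBC URL used by Sqoop:

sqlplus -S SCOTT/TIGER@//192.168.1.107:1521/ORCL <<'EOF'
DESC RESULT_ETL2
EXIT
EOF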
sqoop export --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table RESULT_ETL2 \
  --export-dir /user/hadoop/RESULT_ETL_HIVE \
  --fields-terminated-by '\001' \
  -m 2
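To verify the export without leaving the Hadoop client, sqoop eval can run an ad-hoc query against Oracle, reusing the same connection options as the export:

sqoop eval --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --query "SELECT COUNT(*) FROM RESULT_ETL2"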
Alternatively, put all of the steps (everything except the Oracle table creation) into a shell script and run them in one go. Both Hive statements belong in the script, since the Sqoop export reads from the directory that the second statement refreshes:
#!/bin/sh
. /etc/profile
set -x

hql="INSERT OVERWRITE TABLE result_etl
SELECT a.empno, a.ename, a.comm, b.dname
FROM emp_etl a
JOIN dept_etl b ON (a.deptno = b.deptno);
INSERT OVERWRITE DIRECTORY 'RESULT_ETL_HIVE'
SELECT * FROM result_etl;"

hive -e "$hql"

sqoop export --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table RESULT_ETL2 \
  --export-dir /user/hadoop/RESULT_ETL_HIVE \
  --fields-terminated-by '\001' \
  -m 2
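A typical way to run it (the etl.sh name and paths below are placeholders; pick whatever fits your environment), including an illustrative nightly cron entry:

chmod +x etl.sh
./etl.sh
# crontab entry: run the pipeline at 02:00 every day
0 2 * * * /home/hadoop/etl.sh >> /home/hadoop/etl.log 2>&1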