1 sqlldr
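Conventional path: sqlldr loads the data for us using SQL INSERT statements.
Direct path load: sqlldr does not use SQL; it formats data blocks directly, bypassing undo and, where possible, redo. The fastest method is a parallel direct path load.
sqlldr is only a command-line tool, not an API, so it cannot be called from PL/SQL.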
2 sqlldr architecture
2.1 Control file
LOAD DATA
INFILE *
INTO TABLE BONUS
INSERT -- the default
FIELDS TERMINATED BY ","
(ENAME,JOB,SAL)
BEGINDATA
SMITH,CLERK,3904
ALLEN,SALESMAN,2891
WARD,SALESMAN,3128
KING,PRESIDENT,2523
--
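LOAD DATA: tells sqlldr what to do. CONTINUE_LOAD can be used instead to continue an interrupted load.
INFILE *: the location of the data file. * means the data is in the control file (.ctl) itself; a path means the data is kept separately from the control file.
INTO TABLE BONUS: load into table BONUS, which must already exist before the sqlldr command runs.
INTO TABLE also accepts a load method:
INSERT: insert rows into the table, which must be empty. This is the default.
APPEND: add rows to the table whether or not it already holds data.
REPLACE: replace the table's data, equivalent to a DELETE followed by an INSERT.
TRUNCATE: TRUNCATE the table first, then INSERT.
FIELDS TERMINATED BY ',': the field separator in the data is a comma; any other visible character can be used instead.
(ENAME,JOB,SAL): the names of the table columns to load.
BEGINDATA: everything after this line is data to load; only valid when INFILE is *.
OPTIONALLY ENCLOSED BY: specifies the enclosure character.
2.2 Log files
By default sqlldr writes a log file with the same name as the control file and a .log extension. It records statistics about the load.
Bad file (.bad): written when records are rejected because the data does not conform. By default it is named after the data file, or after the control file when the data is inline.
Discard file (.dsc): not created by default.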
3 Loading data and common problems
sqlldr userid=/ control=demo1.ctl
sqlldr userid/987064@orcl control=demo1.ctl
3.1 Loading an Excel file
Save the Excel file as a CSV file.
Then, in the control file:
LOAD DATA
INFILE 'F:\sqlldr\1024TEST.csv' -- the data file name
BADFILE 'F:\sqlldr\1024TEST.bad'
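3.2 The file is not comma-delimited
1 Edit the data file and replace the other separator with a comma, or
2 Edit the control file and change the "," in FIELDS TERMINATED BY "," to the separator actually used.
3.3 The data contains the delimiter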
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(DEPTNO, DNAME, LOC )
BEGINDATA
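10,Sales,"Virginia,USA"
"Virginia,USA" is loaded as a single field, so the row becomes 10 Sales Virginia,USA
A doubled enclosure character inside an enclosed field loads as one literal quote, giving a row such as 20 Accounting Va, "USA"
OPTIONALLY ENCLOSED BY specifies the enclosure character.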
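The splitting rule in this section can be tried outside the database. The sketch below uses Python's csv module, which follows the same convention (comma separator, optional double-quote enclosure, a doubled quote read as one literal quote). It only illustrates the rule and says nothing about how sqlldr is implemented; the second input row is an assumed input chosen to produce the Va, "USA" result.

```python
import csv
import io

# Two sample records: the first has a comma inside an enclosed field,
# the second (assumed input) has doubled quotes inside an enclosed field.
data = '10,Sales,"Virginia,USA"\n20,Accounting,"Va, ""USA"""\n'

# csv.reader splits on commas but keeps enclosed text as a single field.
rows = list(csv.reader(io.StringIO(data), delimiter=",", quotechar='"'))

for row in rows:
    print(row)
```

Each row comes back with three fields, so the comma inside the quotes never splits the LOC value.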
3.4 The data file has no delimiters
When the data file has no delimiters, give the field positions in the control file:
(
ENAME position(1:5),
JOB position(7:15),
SAL position(17:20)
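)
The POSITION keyword gives each column's start and end position. For example, JOB position(7:15) starts at character 7 and ends at character 15.
POSITION can also be relative, as in position(*+2:15): * stands for the point where the previous field ended, and +2 is an offset from it.
A length can be given in place of an end position: position(*) char(9)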
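POSITION(start:end) counts characters from 1 and includes both ends, which is easy to get wrong by one. A minimal sketch of the same arithmetic in Python follows; the helper name position and the sample record are hypothetical, and trailing blanks are stripped here only to make the result readable.

```python
def position(record, start, end):
    """Return the characters at 1-based positions start..end, inclusive."""
    return record[start - 1:end]

# Hypothetical fixed-width record: ENAME in 1-5, JOB in 7-15, SAL in 17-20.
record = "SMITH".ljust(6) + "CLERK".ljust(10) + "3904"

ename = position(record, 1, 5)
job = position(record, 7, 15).rstrip()  # drop the padding after the value
sal = position(record, 17, 20)
print(ename, job, sal)
```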
3.5 The data file has fewer columns than the table
SQL> desc dept
Name Null? Type
----------------------------------------- -------- ---------------
DEPTNO NOT NULL NUMBER(2)
DNAME VARCHAR2(14)
LOC VARCHAR2(13)
Table dept has 3 columns.
Control file:
LOAD DATA
INFILE ldr_case3.dat
TRUNCATE INTO TABLE dept
(
ENAME position(1:5),
JOB position(7:15),
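SAL "0")
The control file adds a column that is not in the data file and fills it from a SQL expression, either a constant as above or, for example:
SAL "substr(:job,1,1)"
3.6 The data file has more columns than the table
Use FILLER in the control file to mark a field that should be skipped.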
(
ENAME position(1:6),
TCOL FILLER position(8:11),
JOB position(13:21),
SAL position(23:26))
If the data file is delimited rather than fixed-width:
FIELDS TERMINATED BY ","
(ENAME,TCOL FILLER,JOB,SAL)
This skips the second field of the data file.
3.7 Loading several data files into the same table
LOAD DATA
INFILE ldr_case8_1.dat
INFILE ldr_case8_2.dat
INFILE ldr_case8_3.dat
3.8 Loading one data file into different tables
LOAD DATA
INFILE ldr_case9.dat
DISCARDFILE ldr_case9.dsc
TRUNCATE
INTO TABLE BONUS
WHEN TAB='BON'
(TAB FILLER POSITION(1:3),
ENAME POSITION(5:9) ,
JOB POSITION(*+1:18),
SAL POSITION(*+1)
)
INTO TABLE MANAGER
WHEN TAB = 'MGR'
(TAB FILLER POSITION(1:3),
MGRNO POSITION(4:5) ,
MNAME POSITION(7:13),
JOB POSITION(*+1))
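Each INTO TABLE has a WHEN clause. WHEN conditions can only be combined with AND; OR is not supported.
A WHEN clause also cannot test ranges (greater than or less than), and there is no IS NULL and so on.
3.9 Skipping the first N rows of the data file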
sqlldr userid/987064@orcl control=demo1.ctl skip=N
sqlldr userid/987064@orcl control=demo1.ctl skip=4 LOAD=6
This skips the first 4 records and loads the next 6, that is, records 5 through 10.
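The combined effect of skip and load can be checked with a small simulation. The sketch below is plain Python, not sqlldr; select_records and the 12 sample records are hypothetical names used only for illustration.

```python
from itertools import islice

def select_records(records, skip=0, load=None):
    """Skip the first `skip` records, then keep at most `load` records."""
    stop = None if load is None else skip + load
    return list(islice(records, skip, stop))

records = [f"record {n}" for n in range(1, 13)]  # 12 hypothetical records
picked = select_records(records, skip=4, load=6)
print(picked[0], "...", picked[-1])
```

With skip=4 and load=6 the selection starts at the fifth record and ends at the tenth.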
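3.10 The data to load contains line breaks
On Windows a line break is carriage return plus line feed, chr(13)+chr(10); on Linux it is chr(10).
1 Mark the line break manually (a literal \n in the data is replaced with chr(10))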
LOAD DATA
INFILE ldr_case11_1.dat
TRUNCATE INTO TABLE MANAGER
FIELDS TERMINATED BY ","
(MGRNO,
MNAME,
JOB,
REMARK "replace(:remark,'\\n',chr(10))"
)
2 Use the FIX attribute to handle line breaks (fixed-length data only)
10,SMITH,SALES MANAGER,This is SMITH.\nHe is a Sales Manager.
Control file:
LOAD DATA
INFILE ldr_case11_2.dat "fix 68" -- each record is 68 characters long, line break included
TRUNCATE INTO TABLE MANAGER
(
MGRNO position(1:2),
MNAME position(*+1:10),
JOB position(*+1:24),
REMARK position(*+1:65)
)
3 Use VAR to handle line breaks
LOAD DATA
INFILE ldr_case11_3.dat "var 3" -- the first 3 characters of each record give that record's length
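With "var 3" every record carries its own length in a 3-character prefix, so a line break inside the record is just another character. The sketch below shows the idea in Python; read_var_records and the sample content are hypothetical, and the code is an illustration of the record format rather than sqlldr's own parser.

```python
def read_var_records(data, length_digits=3):
    """Split data into records, each prefixed by a fixed-width length field."""
    records = []
    pos = 0
    while pos < len(data):
        size = int(data[pos:pos + length_digits])  # length prefix
        pos += length_digits
        records.append(data[pos:pos + size])       # record body
        pos += size
    return records

# Hypothetical file content: two records, the second contains a line break.
data = "005hello" + "009two\nlines"
records = read_var_records(data)
print(records)
```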
4 Use the STR attribute to handle line breaks
10,SMITH,SALES MANAGER,This is SMITH.
He is a Sales Manager.|
INFILE ldr_case11_4.dat "str '|\r\n'"
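The terminator can also be written in hex. For '|' followed by chr(10), select utl_raw.cast_to_raw( '|'||chr(10) ) from dual; returns 7C0A (on Windows, where lines end in chr(13)||chr(10), the value is 7C0D0A).
Control file: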
INFILE demo.dat "str X'7C0A'"
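The hex values used with str, and the way a custom terminator keeps an embedded line break inside one record, can be reproduced in a few lines of Python. This is only a sketch of the idea: the second sample record is made up, and splitting a string is a stand-in for what sqlldr does when it reads the file.

```python
terminator = "|\n"  # '|' followed by chr(10), as on Linux

# Hex form of the terminator, as written in "str X'...'"
hex_form = terminator.encode("ascii").hex().upper()
print(hex_form)

# The Windows terminator adds a carriage return before the line feed.
windows_hex = "|\r\n".encode("ascii").hex().upper()
print(windows_hex)

# Two records, each with a line break inside the last field.
# The second record is a hypothetical example.
data = (
    "10,SMITH,SALES MANAGER,This is SMITH.\nHe is a Sales Manager.|\n"
    "20,ALLEN,ADMIN MANAGER,This is ALLEN.\nHe is an Admin Manager.|\n"
)
records = [r for r in data.split(terminator) if r]
print(len(records))
```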
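3.11 Loading LOB columns
1 LOB data stored in the control file
First change the column type to CLOB.
(MGRNO, MNAME, JOB, REMARK char(100000))
The field length must be given explicitly, because a character field defaults to 255 bytes.
2 LOB data stored in separate files
create table lob_demo
( owner varchar2(255),
  time_stamp date,
  filename varchar2(255),
  data blob
)
Control file: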
LOAD DATA
INFILE *
REPLACE
INTO TABLE LOB_DEMO
( owner position(17:25),
time_stamp position(44:55) date "Mon DD HH24:MI",
filename position(57:100),
data LOBFILE(filename) TERMINATED BY EOF
)
BEGINDATA
-rw-r--r-- 1 tkyte tkyte 1220342 Jun 17 15:26 classes12.zip
3.12 Errors when some fields are null
FIELDS TERMINATED BY "," TRAILING NULLCOLS
When a record has no value for a trailing column, sqlldr sets that column to NULL instead of rejecting the record.
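The effect of TRAILING NULLCOLS can be pictured as padding a short record with nulls up to the number of columns in the field list. A minimal Python sketch of that idea follows; the function name and sample rows are hypothetical, and None stands in for NULL.

```python
def split_with_trailing_nullcols(line, column_count, sep=","):
    """Split a record and pad missing trailing fields with None (NULL)."""
    fields = line.split(sep)
    fields += [None] * (column_count - len(fields))
    return fields

# A complete record and a record that stops after the second field.
full = split_with_trailing_nullcols("10,Sales,Virginia,1/5/2000", 4)
short = split_with_trailing_nullcols("20,Accounting", 4)
print(full)
print(short)
```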
3.13 Loading dates
LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
(DEPTNO,
DNAME,
LOC,
LAST_UPDATED date 'dd/mm/yyyy'
)
BEGINDATA
10,Sales,Virginia,1/5/2000
20,Accounting,Virginia,21/6/1999
A mask that includes the time works the same way:
LAST_UPDATED date 'yyyy-mm-dd hh24:mi:ss'
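A date mask decides how an ambiguous value such as 1/5/2000 is read. The Python sketch below parses the sample values from this section with format strings that play the role of the Oracle masks; the mapping from 'dd/mm/yyyy' to %d/%m/%Y is an assumption made for illustration, and the timestamp value is made up.

```python
from datetime import datetime

# 'dd/mm/yyyy' reads day first, so 1/5/2000 is 1 May, not 5 January.
first = datetime.strptime("1/5/2000", "%d/%m/%Y")
second = datetime.strptime("21/6/1999", "%d/%m/%Y")

# Counterpart of 'yyyy-mm-dd hh24:mi:ss' for a hypothetical value.
stamp = datetime.strptime("2000-05-01 13:45:00", "%Y-%m-%d %H:%M:%S")

print(first.date(), second.date(), stamp)
```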
3.14 Using functions when loading data
FIELDS TERMINATED BY ',' TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)", -- LOC "222" would instead set every value in this column to 222
LAST_UPDATED
"case
when length(:last_updated) > 9
then to_date(:last_updated,'hh24:mi:ss dd/mm/yyyy')
when instr(:last_updated,':') > 0
then to_date(:last_updated,'hh24:mi:ss')
else to_date(:last_updated,'dd/mm/yyyy')
end"
)
append
INTO TABLE BULK_NUMBERS
FIELDS TERMINATED BY ','
Optionally enclosed by '"'
trailing nullcols
(id ,
a "substr(upper(:a),1,2)", -- or, for a string column: a "replace(:a,:a,'000111111')"
b "replace(:b,:b,111111)", -- or a constant: b "222"
c ,
DATE1 date "MM-DD-YYYY HH24:MI:SS")
A string constant goes in single quotes inside the double quotes; note the space between the double and single quotes: a " '000222' "
LOAD DATA
INFILE Book1.csv
APPEND INTO TABLE ruoxitest
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(ENAME,
JOB "lower(:job)",
sal "to_number(:sal)"
)
LOAD DATA
INFILE *
APPEND INTO TABLE RUOXITEST
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
(ENAME,JOB,SAL)
BEGINDATA
SMITH,CLERK,3904
ALLEN,SALESMAN,2891
WARD,SALESMAN,3128
KING,PRESIDENT,2523
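3.15 sqlldr fails with SQL*Loader-704 and ORA-12514
Check the tnsnames.ora file in F:\oracle\product\10.2.0\client_2\NETWORK\ADMIN: the entry for the service name must give the listener address that serves that service.
A separate failure: loading a Windows-format file on Linux can leave a carriage return on the last field, so a numeric column is rejected until the file is converted with dos2unix:
Record 3: Rejected - Error on table RUOXITEST, column TEST.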
ORA-01722: invalid number
[root@localhost oracle]# dos2unix Book1.csv
dos2unix: converting file Book1.csv to UNIX format ...
[oracle@localhost oracle]$ sqlldr scott/987064@grs control=case1.ctl
The load then succeeds.
4 Loading large volumes of data
4.1 Add the errors parameter
>sqlldr scott/cxxxx@orcl control=xxxxx.ctl errors=10
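This explicitly stops the load once the number of rejected records passes 10 (the default limit is 50).
4.2 Set the rows parameter
A conventional path load commits every 64 rows by default; rows can be raised as appropriate:
rows=640
The bind array needed for that many rows may be larger than bindsize, which defaults to 256000 bytes (about 256K), so raise bindsize as well: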
>sqlldr scott/cxxxx@orcl control=xxxxx.ctl errors=10 rows=5000 bindsize=10485760
bindsize of 10 MB: 1024*1024*10 = 10485760
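The interaction between rows and bindsize comes down to simple arithmetic: the bind array has to hold rows times the row size. The sketch below is a rough estimate only; the 1000-byte row size is an assumed figure, and the real calculation also counts per-field overhead that is ignored here.

```python
def rows_that_fit(rows, bindsize, row_bytes):
    """Rough estimate of how many rows one bind array can hold."""
    return min(rows, bindsize // row_bytes)

DEFAULT_BINDSIZE = 256000       # default bindsize in bytes
TEN_MB = 10 * 1024 * 1024       # 10485760, the value used on the command line
ROW_BYTES = 1000                # assumed size of one row, for illustration

print(TEN_MB)
print(rows_that_fit(5000, DEFAULT_BINDSIZE, ROW_BYTES))  # 256
print(rows_that_fit(5000, TEN_MB, ROW_BYTES))            # 5000
```

Under these assumptions rows=5000 only takes full effect once bindsize is raised, which is why the two parameters are set together.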
4.3 Use direct path load (direct)
>sqlldr scott/cxxxx@orcl control=xxxxx.ctl direct=true
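A direct path load reads all records by default and does not need the rows parameter.
Two parameters matter most for direct path:
streamsize: the size of the stream buffer that the data read is placed in
streamsize of 10 MB: 1024*1024*10 = 10485760
date_cache: the size, in entries, of the cache of converted date values; the default is 1000. It is worth raising when date columns are loaded.
date_cache=5000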
>sqlldr scott/cxxxx@orcl control=xxxxx.ctl direct=true Streamsize=10485760 date_cache=5000
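5 Loading data with external tables
5.1 Creating an external table
Restriction on loading through an external table: the data file must be on the database server, or at least readable from the server.
Several users working concurrently with the same external table on different input files is a problem in its own right; see 5.4.
sqlldr scott/xx@orcl control=demo1.ctl external_table=generate_only
external_table takes one of 3 values:
NOT_USED: the default.
EXECUTE: sqlldr does not generate and run ordinary SQL INSERT statements; it creates an external table and loads with a single bulk SQL statement.
GENERATE_ONLY: sqlldr loads no data; it only generates the SQL DDL and DML statements it would run and writes them to its log file.
5.1.1 Creating the external table manually
1 Create a directory
conn / as sysdba
create or replace directory xxx as 'f:\sqlldr\';
grant read,write on directory xxx to scott;
2 Create the external table
CREATE TABLE "SYS_SQLLDR_X_EXT_BULK_NUMBERS"
( "ID" NUMBER,
  "A" VARCHAR2(20 CHAR),
  "B" NUMBER,
  "C" VARCHAR(255),
  "DATE1" DATE
)
ORGANIZATION external
( TYPE oracle_loader
  DEFAULT DIRECTORY xxx
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    SKIP 4
    FIELDS TERMINATED BY ","
    (ID,A,B,C,DATE1)
  )
  LOCATION ( 'DEOMT.CTL' )
)
5.1.2 Creating it with SQLLDR
direct=true overrides external_table=generate_only.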
:\sqlldr>sqlldr scott/987064@orcl control=1024TEST.CTL external_table=generate_only
sqlldr writes a log file based on the control file.
A First it creates a directory:
CREATE DIRECTORY statements needed for files
------------------------------------------------------------------------
CREATE DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000 AS 'F:\sqlldr\'
B Then it creates the external table:
CREATE TABLE statement for external table:
------------------------------------------------------------------------
CREATE TABLE "SYS_SQLLDR_X_EXT_BULK_NUMBERS"
(
  "ID" NUMBER,
  "A" VARCHAR2(20 CHAR),
  "B" NUMBER,
  "C" VARCHAR(255),
  "DATE1" DATE
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET ZHS16GBK
    BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'1024TEST.bad'
    DISCARDFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'1024TEST.dsc'
    LOGFILE '1024TEST.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' LDRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (
      "ID" CHAR(255) TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "A" CHAR(255) TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "B" CHAR(255) TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "C" CHAR(255) TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "DATE1" CHAR(255) TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' DATE_FORMAT DATE MASK "MM-DD-YYYY HH24:MI:SS"
    )
  )
  location
  (
    '1024TEST.csv'
  )
)REJECT LIMIT UNLIMITED
1 TYPE: oracle_loader is the conventional loader driver;
oracle_datapump is the Data Pump driver.
2 DEFAULT DIRECTORY: the name of the directory object that maps to the path holding the data files.
3 RECORDS: specifies how records end; the default is RECORDS DELIMITED BY NEWLINE.
4 BADFILE: the name and location of the bad file.
5 LOGFILE: the name of the log file.
6 READSIZE: the size of the buffer Oracle uses to read the input data file; READSIZE 1048576 = 1 MB.
7 SKIP: the number of records to skip.
8 FIELDS TERMINATED BY ",": the field separator.
9 REJECT ROWS WITH ALL NULL FIELDS: a row whose fields are all null is rejected and not loaded.
10 LOCATION: names the source data file.
11 REJECT LIMIT UNLIMITED: the number of errors tolerated when the table is queried. The default is 0; UNLIMITED removes the limit.
INSERT statements used to load internal tables:
------------------------------------------------------------------------
INSERT /*+ append */ INTO BULK_NUMBERS ( ID, A, B, C, DATE1 ) SELECT "ID", "A", "B", 0, "DATE1" FROM "SYS_SQLLDR_X_EXT_BULK_NUMBERS"
Then run the SQL manually.
5.2 Specifying the load log
alter table xxxx access parameters ( RECORDS DELIMITED BY NEWLINE SKIP 5 LOGFILE '1024TEST.log_xt' FIELDS TERMINATED BY "," (ID,A,B,C,DATE1) );
The log or the bad file can itself be read through an external table:
create table et_bad
( text1 varchar2(4000),
  text2 varchar2(4000),
  text3 varchar2(4000)
)
organization external
( type oracle_loader
  default directory SYS_SQLLDR_XT_TMPDIR_00000
  access parameters
  ( records delimited by newline
    fields missing field values are null
    ( text1 position(1:4000),
      text2 position(4001:8000),
      text3 position(8001:12000)
    )
  )
  location ('demo1.bad')
);
5.3 Using an external table to load different files
alter table xxxx location('xxxx.ctl','xxxx.dat');
5.4 The multiuser problem
alter table xxxx location('xxxx1.dat','xxxx.dat');
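5.5 External table load performance
Three things matter: CPU, cache and I/O.
CPU: as long as CPU is idle, Oracle will use it.
I/O: needs careful planning by the DBA, for example whether archiving is enabled and whether the load runs in parallel. The most common adjustments for I/O are:
PARALLEL: set the degree of parallelism.
Specify NOLOGFILE, NOBADFILE and NODISCARDFILE explicitly in the access parameters to reduce disk I/O.
Cache: two access parameters apply, READSIZE and DATE_CACHE.
6 Unloading with Data Pump (10g and later)
1 First create a directory
create or replace directory xxx as 'f:\mydb\';
create directory tmp as 'f:\mydb\';
2 Then write a simple SELECT statement that unloads data into this directory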
create table all_objects_unload organization external ( type oracle_datapump default directory xxx location( 'allobjects.dat' ) ) as select * from all_objects;
create table all_objects_unload organization external ( type oracle_datapump default directory tmp location( 'allobjects.dat' ) ) as select * from bulk_numbers;
3 Move the data file allobjects.dat to the other server, then extract the DDL:
select dbms_metadata.get_ddl( 'TABLE', 'ALL_OBJECTS_UNLOAD' ) from dual;
CREATE TABLE "SCOTT"."ALL_OBJECTS_UNLOAD"
( "ID" NUMBER,
  "A" VARCHAR2(20 CHAR),
  "B" NUMBER,
  "C" NUMBER,
  "DATE1" DATE
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_DATAPUMP
  DEFAULT DIRECTORY "TMP"
  LOCATION ( 'allobjects.dat' )
)
Then load from it: insert /*+ append */ into some_table select * from all_objects_unload;
CREATE TABLE xxx ( ID NUMBER, A VARCHAR2(20 CHAR), B NUMBER, C NUMBER, DATE1 DATE ); insert /*+ append */ into xxx select * from all_objects_unload;