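In SAS, the SET statement concatenates (stacks) data sets vertically, and the MERGE statement combines them side by side (horizontally):
DATA new_dataset;
  SET dataset_1 ... dataset_n;
RUN;
DATA new_dataset;
  MERGE dataset_1 ... dataset_n;
  BY variable_list;
RUN;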
After concatenation, the number of observations in new_dataset equals the sum of the observations in all input data sets. If one of the data sets has a variable that the other data sets do not contain, the observations from those other data sets have missing values for that variable.
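A minimal sketch of both rules, using two small hypothetical data sets (one and two; the names and values are made up for illustration): one has Score, two has Grade, and both share ID.

```sas
/* Hypothetical input: one has ID and Score */
DATA one;
  INPUT ID Score;
  DATALINES;
1 90
2 85
;
RUN;

/* Hypothetical input: two has ID and a character variable Grade */
DATA two;
  INPUT ID Grade $;
  DATALINES;
3 A
4 B
5 C
;
RUN;

/* Stack them: stacked has 2 + 3 = 5 observations.
   Score is missing (.) for the rows read from two,
   and Grade is missing (blank) for the rows read from one. */
DATA stacked;
  SET one two;
RUN;

PROC PRINT DATA = stacked;
RUN;
```

PROC PRINT should list five observations, with Score missing for IDs 3 to 5 and Grade blank for IDs 1 and 2.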
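In a merge, the BY variable list consists of variables that all the data sets have in common, and every data set must be sorted by those variables.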
1. Concatenating data sets (SET)
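Example 1: Concatenate the two data sets southentrance and northentrance. The observations keep their original order: all observations from southentrance come first, followed by those from northentrance.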
DATA both;
  SET southentrance northentrance;
  IF Age = . THEN AmountPaid = .;
  ELSE IF Age < 3 THEN AmountPaid = 0;
  ELSE IF Age < 65 THEN AmountPaid = 35;
  ELSE AmountPaid = 27;
RUN;
PROC PRINT DATA = both;
  TITLE 'Both Entrances';
RUN;
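Example 2: Interleave the data sets so that the combined observations are ordered by PassNumber. Both northentrance and southentrance must already be sorted by PassNumber.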
DATA interleave;
  SET northentrance southentrance;
  BY PassNumber;
RUN;
PROC PRINT DATA = interleave;
  TITLE 'Both Entrances, By Pass Number';
RUN;
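To see the interleaving effect in isolation, here is a sketch with two tiny hypothetical data sets, north and south (names and values are made up). Each is sorted by PassNumber with PROC SORT first, since SET with a BY statement requires sorted input.

```sas
/* Hypothetical gate records, deliberately out of order */
DATA north;
  INPUT PassNumber Gate $;
  DATALINES;
104 N
101 N
;
RUN;

DATA south;
  INPUT PassNumber Gate $;
  DATALINES;
103 S
102 S
;
RUN;

/* SET with BY needs each input sorted by the BY variable */
PROC SORT DATA = north; BY PassNumber; RUN;
PROC SORT DATA = south; BY PassNumber; RUN;

/* Interleave: rows come out in PassNumber order 101, 102, 103, 104 */
DATA interleaved;
  SET north south;
  BY PassNumber;
RUN;
```

Without the BY statement, the same SET would simply stack all of north before all of south.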
2. Merging data sets (MERGE)
Example 1: Merge the two data sets sales and descriptions. The merged data set contains all observations from both data sets, which corresponds to a full (outer) join.
/* Both data sets must be sorted by the BY variables before MERGE */
DATA chocolates;
  MERGE sales descriptions;
  BY CodeNum;
RUN;
PROC PRINT DATA = chocolates;
  TITLE "Today's Chocolate Sales";
RUN;
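The merged data set chocolates contains every observation from both data sets. When an observation has no match in the other data set, the variables that come only from that other data set are set to missing.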
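A self-contained sketch of the same full-join behavior with hypothetical data (sales_demo, descr_demo, Pieces, and Name are made-up names): CodeNum 1 exists only in sales_demo and CodeNum 3 only in descr_demo. It also includes the PROC SORT step that the comment in the example above calls for.

```sas
/* Hypothetical inputs, deliberately unsorted */
DATA sales_demo;
  INPUT CodeNum Pieces;
  DATALINES;
2 5
1 3
;
RUN;

DATA descr_demo;
  INPUT CodeNum Name $;
  DATALINES;
3 Truffle
2 Praline
;
RUN;

/* MERGE with BY requires both inputs sorted by the BY variable */
PROC SORT DATA = sales_demo; BY CodeNum; RUN;
PROC SORT DATA = descr_demo; BY CodeNum; RUN;

/* Full-join behavior: 3 observations (CodeNum 1, 2, 3).
   CodeNum 1 has a missing Name; CodeNum 3 has a missing Pieces. */
DATA merged;
  MERGE sales_demo descr_demo;
  BY CodeNum;
RUN;
```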
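Example 2: A one-to-many merge, which is still a full join. The values from the data set that has one observation per ExerciseType are repeated for every matching observation in the other data set.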
/* shoes and discount must both be sorted by ExerciseType */
DATA prices;
  MERGE shoes discount;
  BY ExerciseType;
  NewPrice = ROUND(RegularPrice - (RegularPrice * Adjustment), .01);
RUN;
PROC PRINT DATA = prices;
  TITLE 'Price List for May';
RUN;
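A small hypothetical version (shoes_demo, discount_demo, Style, and all values are made up) makes the one-to-many behavior visible: both "run" shoes pick up the same Adjustment from the single matching discount observation.

```sas
/* Hypothetical data: several shoes per ExerciseType */
DATA shoes_demo;
  INPUT Style $ ExerciseType $ RegularPrice;
  DATALINES;
A run 100
B run 80
C walk 60
;
RUN;

/* Hypothetical data: one discount per ExerciseType */
DATA discount_demo;
  INPUT ExerciseType $ Adjustment;
  DATALINES;
run 0.1
walk 0.25
;
RUN;

/* Both inputs are already in ExerciseType order here; sort first in general */
DATA prices_demo;
  MERGE shoes_demo discount_demo;
  BY ExerciseType;
  NewPrice = ROUND(RegularPrice - (RegularPrice * Adjustment), .01);
RUN;
```

The resulting NewPrice values are 90 and 72 for the two run shoes and 45 for the walk shoe.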
Example 3: MERGE vs. left join / right join / inner join
Suppose there are two data sets, ICF and DM, that share the key variables cn and dn and are both sorted by them.
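Left join:
data New;
  merge ICF(in=a) DM(in=b);
  by cn dn;
  if a;    /* keep every observation that comes from ICF */
run;
This is roughly equivalent to the following PROC SQL step. The results match when no cn/dn combination is repeated in both data sets (MERGE does not build a Cartesian product the way SQL does) and ICF and DM share no variables other than cn and dn:
proc sql;
  create table New as
  select * from ICF a
    left join DM b on a.cn = b.cn and a.dn = b.dn;
quit;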
Right join:
data New;
  merge ICF(in=a) DM(in=b);
  by cn dn;
  if b;    /* keep every observation that comes from DM */
run;
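Inner join:
data New;
  merge ICF(in=a) DM(in=b);
  by cn dn;
  if a and b;    /* keep only keys found in both ICF and DM */
run;
This is roughly equivalent to:
proc sql;
  create table New as
  select * from ICF a
    inner join DM b on a.cn = b.cn and a.dn = b.dn;
quit;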
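To compare the three variants on concrete data, here is a sketch with hypothetical stand-ins for ICF and DM (icf_demo, dm_demo, icf_val, and dm_val are made-up names and values). It also adds a PROC SQL version of the right join, which the text above does not give. There, cn and dn should be taken from DM explicitly: with SELECT *, SAS keeps the first copy of each duplicate column name (the ICF keys), and those are missing for rows that exist only in DM.

```sas
/* Hypothetical data, both sorted by cn dn.
   Keys: ICF has (1,1) (1,2) (2,1); DM has (1,2) (2,1) (3,1). */
DATA icf_demo;
  INPUT cn dn icf_val;
  DATALINES;
1 1 10
1 2 20
2 1 30
;
RUN;

DATA dm_demo;
  INPUT cn dn dm_val;
  DATALINES;
1 2 40
2 1 50
3 1 60
;
RUN;

DATA left_j;                         /* 3 obs: every ICF key */
  MERGE icf_demo(IN=a) dm_demo(IN=b);
  BY cn dn;
  IF a;
RUN;

DATA right_j;                        /* 3 obs: every DM key */
  MERGE icf_demo(IN=a) dm_demo(IN=b);
  BY cn dn;
  IF b;
RUN;

DATA inner_j;                        /* 2 obs: (1,2) and (2,1) */
  MERGE icf_demo(IN=a) dm_demo(IN=b);
  BY cn dn;
  IF a AND b;
RUN;

/* SQL right join: take the keys from b (dm_demo) so that the
   DM-only row (3,1) keeps non-missing cn and dn */
PROC SQL;
  CREATE TABLE right_sql AS
  SELECT b.cn, b.dn, a.icf_val, b.dm_val
  FROM icf_demo AS a
    RIGHT JOIN dm_demo AS b
    ON a.cn = b.cn AND a.dn = b.dn;
QUIT;
```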
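The IN= data set option creates a temporary variable (it is not written to the output data set) that equals 1 when the current observation includes data from that data set, and 0 otherwise. "if a;" is equivalent to "if a = 1;". "if a = 0;" selects observations that are not in data set a, that is, observations found only in data set b. The option can be used whenever data sets are combined, with SET, MERGE, or UPDATE.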
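A sketch of two IN= uses with hypothetical data (a_demo, b_demo, and from_a are made-up names): "IF a = 0" keeps only the keys that appear in the second data set, and IN= on a SET statement marks where each row came from. Because IN= variables are not saved, the flag is copied into an ordinary variable.

```sas
/* Hypothetical inputs, sorted by cn */
DATA a_demo;
  INPUT cn;
  DATALINES;
1
2
;
RUN;

DATA b_demo;
  INPUT cn;
  DATALINES;
2
3
;
RUN;

/* IF a = 0 keeps keys found only in b_demo: here just cn = 3 */
DATA only_b;
  MERGE a_demo(IN=a) b_demo(IN=b);
  BY cn;
  IF a = 0;
RUN;

/* IN= with SET: a is 1 while reading a_demo and 0 while reading b_demo.
   Copy it to from_a, since the IN= variable itself is dropped. */
DATA stacked_src;
  SET a_demo(IN=a) b_demo;
  from_a = a;
RUN;
```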