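The SET statement:
What is the SET statement for?
Suppose you want to add a column to a data set (a constant column or a computed column), add a new variable, or create a subset.
Below are the DATA step and PROC SQL ways to create a computed column and add a constant column.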
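data me(keep=name newVariable total);
   set sashelp.class;
   if sex='M';            /* subsetting IF; sex in sashelp.class is coded 'M'/'F' */
   newVariable=.;         /* constant column (numeric missing) */
   total=height+weight;   /* computed column */
run;
proc print noobs; run;

proc sql;
   /* . (not '.') keeps newVariable numeric, matching the DATA step */
   select name, . as newVariable, height+weight as total
   from sashelp.class
   where sex='M';
quit;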
Set statement
Type: Executable
Syntax
SET <SAS-data-set(s) <(data-set-option(s))>> <options>;
Without Arguments
When you do not specify an argument, the SET statement reads an observation from the most recently created data set.
Arguments
SAS-data-set(s): specifies a one-level name, a two-level name, or one of the special SAS data set names.
(data-set-options): specifies actions SAS is to take when it reads variables or observations into the program data vector for processing.
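DROP= KEEP= RENAME= (order of application: DROP= and KEEP= are applied before RENAME=, so DROP= and KEEP= must refer to the original variable names)
FIRSTOBS= (first observation to be read)
OBS= (last observation to be read) IN= WHERE=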
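The ordering of KEEP= and RENAME= can be seen in a minimal sketch. The output data set name renamed and the new variable names student and name_length are hypothetical, chosen only for illustration.

```sas
data renamed;
   /* KEEP= is applied first, so it must use the original name "name". */
   /* RENAME= is applied afterwards; inside the DATA step the variable is "student". */
   set sashelp.class(keep=name age rename=(name=student));
   name_length = length(student);
run;
```

Writing keep=student here would fail to keep the column, because student does not exist yet when KEEP= is processed.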
Options
END=: creates and names a temporary variable that contains an end-of-file indicator. The variable, which is initialized to zero, is set to 1 when SET reads the last observation of the last data set listed. This variable is not added to any new data set.
NOBS=: creates and names a temporary variable whose value is usually the total number of observations in the input data set or data sets. If more than one data set is listed in the SET statement, NOBS= is the total number of observations in the data sets that are listed. The number of observations includes those observations that are marked for deletion but are not yet deleted.
POINT=: specifies a temporary variable whose numeric value determines which observation is read. POINT= causes the SET statement to use random (direct) access to read a SAS data set.
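As a small illustration of END=, the sketch below accumulates two running totals and writes a single summary row once the last observation has been read. The names class_summary, is_last, n_students and total_height are hypothetical.

```sas
data class_summary(keep=n_students total_height);
   set sashelp.class end=is_last;   /* is_last = 1 only on the last observation */
   n_students + 1;                  /* sum statement: value is retained across iterations */
   total_height + height;
   if is_last then output;          /* write one row, at end of file */
run;
```

Because is_last is a temporary variable, it does not appear in class_summary even without the KEEP= option.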
Details
What SET Does
Each time the SET statement is executed, SAS reads one observation into the program data vector. SET reads all variables and all observations from the input data sets unless you tell SAS to do otherwise. A SET statement can contain multiple data sets; a DATA step can contain multiple SET statements.
Data sets 1 through n are stacked vertically (concatenated) in the order in which they are listed.
Internally, SAS executes the program as follows:
1: Compilation phase.
2:SAS reads the first observation from the first data set into the program data vector. It processes the first observation and executes other statements in the DATA step. It then writes the contents of the program data vector to the new data set.
3:SAS continues to read one observation at a time from the first data set until it finds an end-of-file indicator. The values of the variables in the program data vector are then set to missing, and SAS begins reading observations from the second data set, and so on, until it reads all observations from all data sets
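The steps above can be sketched with two tiny data sets. The names first_ds, second_ds and stacked, and the values in them, are made up for illustration.

```sas
data first_ds;
   input id score;
   datalines;
1 90
2 85
;
run;

data second_ds;
   input id grade $;
   datalines;
3 A
;
run;

/* Concatenate: all rows of first_ds, then all rows of second_ds. */
data stacked;
   set first_ds second_ds;
run;
```

stacked has three observations. grade is missing for the rows that came from first_ds, and score is missing for the row that came from second_ds, because the PDV is reset to missing when SAS moves on to the next data set.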
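For set data1-datan with a BY statement:
1: In addition to the steps described above, SAS creates the FIRST.variable and LAST.variable for each variable listed in the BY statement.
2: Variables are reset to missing on a different schedule: The values of the variables in the program data vector are set to missing each time SAS starts to read a new data set and when the BY group changes.
The observations are then written in BY-variable order (interleaved). Each input data set must already be sorted by the BY variables.
For two data sets that are already sorted, there are two ways to combine them so that the result is still sorted:
Method 1: set data1 data2; followed by PROC SORT.
Method 2: set data1 data2; by variable; This is more efficient than method 1. The book says so without explaining why; my guess is that it comes down to how many times the data is read: method 2 reads it once, method 1 reads it twice.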
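A minimal sketch of interleaving (SET with BY) on two already sorted inputs. The data set names east, west and interleaved, the variables region, sales, is_first and is_last, and the values are all hypothetical.

```sas
data east;
   input region $ sales;
   datalines;
A 10
C 30
;
run;

data west;
   input region $ sales;
   datalines;
B 20
C 40
;
run;

/* Both inputs are sorted by region, so no PROC SORT is needed. */
data interleaved;
   set east west;
   by region;
   is_first = first.region;   /* copy the temporary FIRST./LAST. flags */
   is_last  = last.region;    /* so they are visible in the output     */
run;
```

The output is ordered A, B, C, C. Within the C group the row from east comes before the row from west, because east is listed first on the SET statement.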
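The SET statement reads observations from one or more SAS data sets and concatenates them vertically. Each time a SET statement executes, SAS reads one observation into the PDV. A DATA step can contain several SET statements, and each SET statement can list several SAS data sets. Each SET statement has its own data pointer.
SET reads all observations and all variables from the input data sets unless you tell it otherwise.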
SET <SAS-data-set(s) <(data-set-option(s))>> <options>;
(data-set-options) specifies actions SAS is to take when it reads variables or observations into the program data vector for processing.
Tip: Data set options that apply to a data set list apply to all of the data sets in the list. Data set options specify actions that apply only to the SAS data set with which they appear. They let you perform the following operations:
The four main uses are listed below, with examples:
renaming variables, e.g. set sashelp.class(rename=(name=name_new));
selecting only the first or last n observations for processing, e.g. sashelp.class(firstobs=3 obs=6); observations can also be filtered by a condition, e.g. sashelp.class(where=(sex='M')); the values of WHERE= and RENAME= must be enclosed in parentheses
dropping variables from processing or from the output data set, e.g. sashelp.class(drop=name sex); sashelp.class(keep=name sex);
specifying a password for a data set
Writing two output data sets:
data d1(keep = name) d2(keep = name sex);
set sashelp.class(keep = name sex);
run;
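Using the IN= option
The variable named by IN= is a temporary variable. SAS sets it automatically, so you do not create it with an assignment statement, and it is not written to the output data set. Its main use is to identify which data set an observation came from.
data one;
input x y$;
cards;
1 a
2 b
;
run;
data two;
input x z$;
cards;
3 c
2 d
;
run;
data me;
set one(in=ina) two(in=inb);
if ina=1 then flag=1; else flag=0;
run;
Result: the two observations read from one get flag=1, and the two read from two get flag=0.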
data me;
set sashelp.class(firstobs=3 obs=6); /* read observations 3 through 6 */
run;
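* Get the total number of observations in a data set;
data me(keep = total);
set sashelp.Slkwxl nobs=total_obs; /* Improved form: if 0 then set sashelp.Slkwxl nobs=total_obs; SAS compiles before it executes, so the SET never has to run. The information gathered at compile time is enough. */
total = total_obs;
output;
stop; /* STOP is used because we only need to read the first observation as a token step, write the total variable, and then end the step. */
run;
The flow of SET is as follows: SAS reads the first observation, executes total=total_obs; and continues. If it reaches STOP, the step ends. Otherwise, provided no error occurs, control returns to the top of the DATA step and SET reads the second observation. So if you omit the STOP statement, the output contains as many observations as the input data set, and every one of them has total equal to the observation count.
1: At compile time SAS first processes the NOBS= option. The observation count is stored in the data set header (descriptor), and nobs=total_obs passes it to the temporary variable total_obs.
2: SAS reads the data set descriptor and places all of its variables in the PDV.
... (remaining steps omitted)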
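The "if 0 then set" variant mentioned in the comment can be written out as a complete step. This is a sketch that uses sashelp.class as a stand-in input; the output name obs_count is hypothetical.

```sas
data obs_count(keep=total);
   /* The condition is never true, so no observation is read.   */
   /* NOBS= is still resolved when the step is compiled.        */
   if 0 then set sashelp.class nobs=total_obs;
   total = total_obs;
   output;
   stop;
run;
```

The step writes exactly one observation without reading any data from the input.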
The POINT= option: reading a specified observation
data me;
n=3;
set sashelp.class point=n;
output;
do n=3,6,10;
set sashelp.class point=n;
output; *read several specified observations;
end;
set sashelp.class nobs = last point=last;
output; *read the last observation;
stop;
run;
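point=n takes a variable; a literal number cannot be given directly. Omitting STOP sends the step into an infinite loop: with direct access SAS never encounters an end-of-file indicator, so it cannot tell whether the data pointer has reached the last observation. Without OUTPUT the resulting data set has no observations, because STOP ends the step before the implicit output. POINT= and STOP are generally used together.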
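A short sketch that combines NOBS=, POINT=, OUTPUT and STOP to read every third observation. The output name every_third and the loop variable i are hypothetical.

```sas
data every_third;
   /* total is filled in from NOBS= at compile time, so the loop bound is known. */
   do i = 1 to total by 3;
      set sashelp.class nobs=total point=i;   /* direct access to row i */
      output;
   end;
   stop;   /* required: direct access never reaches end of file */
run;
```

With the 19 rows of sashelp.class this reads rows 1, 4, 7, 10, 13, 16 and 19.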
Using _N_
data d1 d2;
set sashelp.class;
if _n_ le 10 then output d1;
else output d2;
run;
Notes on reading data set lists with SET:
set goods1:; *reads every data set whose name begins with goods1, e.g. goods12 goods13;
set sales1-sales4;
set sales1 sales2 sales3 sales4; *these two statements are equivalent;
/*
set sales1-sales99;   * valid;
set sales001-sales99; * not valid: if the first suffix has leading zeros, the last suffix must have at least as many digits as the first;
*/
set cost1-cost4 cost2: cost33-37; *works;
/* these two lines are the same */
set sales1 - sales4;
set 'sales1'n - 'sales4'n;
/* blanks in these statements will cause errors */
set sales 1 - sales 4;
set 'sales 1'n - 'sales 4'n;
/* trailing blanks in this statement will be ignored */
set 'sales1 'n - 'sales4 'n;
set data1 data2;
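Execution order: SAS reads data1 through its last observation, then reads data2, and concatenates the two vertically.
Two SET statements
set data1;set data2;
data a;
input x $ @@;
cards;
a1 a2 a3
;
run;
data b;
input y $ @@;
cards;
b1 b2
;
run;
/* After compilation there are two data pointers in memory, one for a and one for b, and a single PDV. */
/* The first observation of a is read into the PDV, then the first observation of b, and the result is written out. Control returns to the top of the DATA step and the cycle repeats. When the third observation of a is read, the pointer for b is already at end of file, so the DATA step ends. */
data ab; set a; set b; run;
data ba; set b; set a; run;
data c; set a b; run;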
Widgets:
1. Get the count of observations.
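%macro Get_Obs_Cnt(dsName);
   data _null_;
      /* 'G' stores n_obs in the global symbol table so it is visible after the macro ends */
      call symputx('n_obs', last, 'G');
      /* never executed; NOBS= is resolved at compile time */
      if 0 then set &dsName nobs=last;
      stop;
   run;
%mend Get_Obs_Cnt;
/* example call; any data set name can be passed */
%Get_Obs_Cnt(sashelp.class)
%put n_obs=&n_obs;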
2. select random observations.
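%macro Generate_Random_Obs(inData, outData, num);
   data &outData;
      do i=1 to &num;
         /* draw a new row number on every iteration (sampling with replacement); */
         /* totalObs is known at compile time and is also used as the seed here   */
         rand_num = ceil(totalObs*ranuni(totalObs));
         set &inData nobs=totalObs point=rand_num;
         if _error_ then abort;
         output;
      end;
      stop;
   run;
%mend Generate_Random_Obs;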