Understanding Joins
1. Joins combine tables horizontally (side by side) by combining rows. The tables being joined do not need to have the same number of rows or columns.
2. When any type of join is processed, PROC SQL conceptually starts by generating a Cartesian product, which contains all possible combinations of rows from all tables. It then eliminates the rows that do not meet the subsetting criteria you specified. Columns that appear in both tables are not merged automatically in a join; you have to combine them yourself.
3. A join can include at most 32 tables. Views are not counted as tables; instead, the tables underlying each view are counted.
4. The columns used to join the tables must have the same data type, but they do not need to have the same name.

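2: The simplest join. Without a WHERE clause to select a subset, PROC SQL returns the basic Cartesian product: every possible combination of rows from the two tables.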
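As a rough illustration (a Python model, not SAS code), the sketch below pairs every row of one table with every row of another, the way a join with no WHERE clause does. The tables `one` and `two` and the helper `cartesian` are hypothetical. Column names are prefixed with the table name so that both copies of `x` survive, mirroring the fact that a join does not overlay same-named columns.

```python
from itertools import product

# Hypothetical tables, modeled as lists of dicts (one dict per row).
one = [{"x": 1, "a": "a"}, {"x": 2, "a": "b"}, {"x": 3, "a": "c"}]
two = [{"x": 1, "b": "x"}, {"x": 2, "b": "y"}]


def cartesian(left, right, lname="one", rname="two"):
    """Pair every row of `left` with every row of `right` (no WHERE clause)."""
    rows = []
    for l, r in product(left, right):
        # Prefix column names with the table name so one.x and two.x are both kept.
        row = {f"{lname}.{k}": v for k, v in l.items()}
        row.update({f"{rname}.{k}": v for k, v in r.items()})
        rows.append(row)
    return rows


result = cartesian(one, two)
print(len(result))  # 3 rows x 2 rows = 6 combinations
```

The row count of a Cartesian product depends only on the table sizes (here 3 x 2), never on the data values.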

Understanding the join process (this applies to every join). PROC SQL:
- builds a Cartesian product of rows from the indicated tables;
- evaluates each row in the Cartesian product against the join conditions specified in the WHERE clause (along with any other subsetting conditions) and removes the rows that do not meet them;
- if summary functions are specified, summarizes the applicable rows;
- returns the rows that are to be displayed in the output.
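With this process in mind, the results of one-to-many, many-to-many, and many-to-one joins are easy to predict: in every case, a Cartesian product of all rows is generated first and then filtered by the conditions.
The Cartesian product is built by pairing (scanning) each row of the main table with each row of the secondary table, one by one. The actual processing is more complex than this; for example, it involves reading the data in blocks.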
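The four steps can be mimicked in Python to check the result of a one-to-many join by hand. This is only a conceptual model under simplified assumptions: `sql_join` is a hypothetical helper, the WHERE condition is passed as a function, and `len` stands in for a summary function such as COUNT(*). Real PROC SQL is free to optimize and does not have to materialize the whole product.

```python
from itertools import product

# Hypothetical tables with a one-to-many relationship on x.
one = [{"x": 1, "a": "a"}, {"x": 2, "a": "b"}]
two = [{"x": 1, "b": "p"}, {"x": 1, "b": "q"}, {"x": 2, "b": "r"}, {"x": 3, "b": "s"}]


def sql_join(left, right, where, summarize=None):
    """Hypothetical helper that follows the four join-processing steps."""
    # Step 1: build the Cartesian product as (left_row, right_row) pairs.
    pairs = list(product(left, right))
    # Step 2: keep only the pairs that satisfy the WHERE condition.
    kept = [(l, r) for l, r in pairs if where(l, r)]
    # Step 3: if a summary function is given, summarize the kept rows.
    if summarize is not None:
        return summarize(kept)
    # Step 4: return the rows to be displayed.
    return kept


matches = sql_join(one, two, where=lambda l, r: l["x"] == r["x"])
print([(l["a"], r["b"]) for l, r in matches])  # [('a', 'p'), ('a', 'q'), ('b', 'r')]

# len() plays the role of COUNT(*) here.
print(sql_join(one, two, where=lambda l, r: l["x"] == r["x"], summarize=len))  # 3
```

The row with x = 1 in `one` appears twice in the output because it pairs with two rows of `two`; the row with x = 3 in `two` is filtered out in step 2.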
3: Inner Join
Definition: An inner join combines and displays only the rows from the first table that match rows from the second table, based on the matching criteria. In an inner join, a WHERE clause is added to restrict the rows of the Cartesian product that will be displayed in the output.
proc sql;
   select one.x, a, b   /* alternative: select one.*, b   (one.* selects every column of table one) */
      from one, two
      where one.x = two.x;
quit;
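3.1: In a standard inner join, when both tables contain duplicate values of the join column, the inner join pairs every matching row of one table with every matching row of the other, i.e. it forms the Cartesian product of the matching rows.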
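A small Python model (hypothetical tables, not SAS) shows the effect: the value x = 1 appears twice in each table, so the inner join returns 2 x 2 = 4 rows for it, plus one row for x = 2.

```python
from itertools import product

# Hypothetical tables; x = 1 is duplicated in both.
one = [{"x": 1, "a": "a1"}, {"x": 1, "a": "a2"}, {"x": 2, "a": "a3"}]
two = [{"x": 1, "b": "b1"}, {"x": 1, "b": "b2"}, {"x": 2, "b": "b3"}]

# Models: SELECT one.x, a, b FROM one, two WHERE one.x = two.x
rows = [
    (l["x"], l["a"], r["b"])
    for l, r in product(one, two)  # Cartesian product: 3 x 3 = 9 pairs
    if l["x"] == r["x"]            # keep only the matching pairs
]

for row in rows:
    print(row)
```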


4: Outer Join
You can think of an outer join as an augmentation of an inner join: an outer join returns all rows generated by an inner join, plus additional (nonmatching) rows. In other words, besides the intersection, it also contains some or all of the remaining rows of the union.


4.1: Using a Left Outer Join (the order of the left table's columns is kept)
A left outer join retrieves all rows that match across tables, based on the specified matching criteria (the join conditions in the ON clause), plus nonmatching rows from the left table (the first table specified in the FROM clause). In all three types of outer joins (left, right, and full), the columns in the result (combined) row that come from the table with no matching row are set to missing values; in a left join, those are the right table's columns.
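A minimal Python sketch of a left join, built the way this section describes: the inner-join rows from the filtered Cartesian product, plus every unmatched left row with its right-hand columns set to missing. The tables are hypothetical, `None` stands in for a SAS missing value, and rows are sorted by the left key only to make the output easy to read (this is not a claim about PROC SQL's output order).

```python
from itertools import product

one = [(1, "a"), (2, "b"), (4, "d")]   # hypothetical table one: columns x, a
two = [(2, "p"), (3, "q"), (4, "r")]   # hypothetical table two: columns x, b


def left_join(left, right):
    """Return rows (one.x, a, two.x, b); None models a SAS missing value."""
    # Matching rows: the inner join, taken from the Cartesian product.
    inner = [l + r for l, r in product(left, right) if l[0] == r[0]]
    # Left-table rows that matched nothing, padded with missing right columns.
    matched_left = {l for l, r in product(left, right) if l[0] == r[0]}
    unmatched = [l + (None, None) for l in left if l not in matched_left]
    # Sort by one.x, which is never missing in a left join.
    return sorted(inner + unmatched, key=lambda row: row[0])


for row in left_join(one, two):
    print(row)
```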

4.2: Using a Right Outer Join (the order of the right table's columns is kept)
A right outer join retrieves all rows that match across tables, based on the specified matching criteria (join conditions), plus nonmatching rows from the right table (the second table specified in the FROM clause).
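The right join is the mirror image. In the same kind of hypothetical Python model, the unmatched rows now come from the right table, and the left-hand columns are set to missing (`None`).

```python
from itertools import product

one = [(1, "a"), (2, "b"), (4, "d")]   # hypothetical table one: columns x, a
two = [(2, "p"), (3, "q"), (4, "r")]   # hypothetical table two: columns x, b


def right_join(left, right):
    """Return rows (one.x, a, two.x, b); None models a SAS missing value."""
    # Matching rows: the inner join, taken from the Cartesian product.
    inner = [l + r for l, r in product(left, right) if l[0] == r[0]]
    # Right-table rows that matched nothing, padded with missing left columns.
    matched_right = {r for l, r in product(left, right) if l[0] == r[0]}
    unmatched = [(None, None) + r for r in right if r not in matched_right]
    # Sort by two.x, which is never missing in a right join.
    return sorted(inner + unmatched, key=lambda row: row[2])


for row in right_join(one, two):
    print(row)
```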

4.3: Using a Full Outer Join
A full outer join retrieves both matching rows and nonmatching rows from both tables.
To get the same result as a DATA step MERGE, use the COALESCE function to overlay the key columns from the two tables into a single column.
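The role of COALESCE can be seen in a hypothetical Python model of a full join. Without it, the key appears as two columns, and one of them is missing on every unmatched row. The `coalesce` helper here is written for the example and returns its first non-missing argument, which is how the SQL function behaves; the tables are made up.

```python
from itertools import product

one = [(1, "a"), (2, "b"), (4, "d")]   # hypothetical table one: columns x, a
two = [(2, "p"), (3, "q"), (4, "r")]   # hypothetical table two: columns x, b


def coalesce(*values):
    """Return the first argument that is not missing (None), else None."""
    return next((v for v in values if v is not None), None)


def full_join(left, right):
    """Return rows (one.x, a, two.x, b); None models a SAS missing value."""
    inner = [l + r for l, r in product(left, right) if l[0] == r[0]]
    left_keys = {l[0] for l in left}
    right_keys = {r[0] for r in right}
    # Nonmatching rows from both sides, padded with missing values.
    left_only = [l + (None, None) for l in left if l[0] not in right_keys]
    right_only = [(None, None) + r for r in right if r[0] not in left_keys]
    return inner + left_only + right_only


# Models: SELECT COALESCE(one.x, two.x) AS x, a, b FROM one FULL JOIN two ON one.x = two.x
merged = sorted(
    ((coalesce(lx, rx), a, b) for lx, a, rx, b in full_join(one, two)),
    key=lambda row: row[0],
)
for row in merged:
    print(row)
```

After COALESCE, every row has a non-missing x, which is what a DATA step MERGE with a BY variable produces for the same data.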

5: Using In-Line Views (nested queries)
An in-line view is a nested query that is specified in the outer query's FROM clause. Unlike other queries, an in-line view cannot contain an ORDER BY clause; apart from that restriction, it is written just like an ordinary SELECT statement.
Difference from a subquery: a subquery returns values and is placed after WHERE, whereas an in-line view returns a temporary table and is placed after FROM.
Advantage: in some situations, using an in-line view can be more efficient.
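To make the "returns a temporary table" point concrete, here is a hypothetical Python model: the nested query produces a whole table of per-key totals, and the outer query then joins that table exactly as it would join a stored one. The tables `orders` and `names`, the helper `inline_view`, and the SUM/GROUP BY query it imitates are all made up for illustration.

```python
from itertools import product

# Hypothetical tables.
orders = [(1, 10), (1, 5), (2, 7)]     # columns: x, amount
names = [(1, "alpha"), (2, "beta")]    # columns: x, name


def inline_view(rows):
    """Models the nested query (SELECT x, SUM(amount) AS total FROM orders GROUP BY x).

    Its result is a temporary table (a list of rows), not a single value.
    """
    totals = {}
    for x, amount in rows:
        totals[x] = totals.get(x, 0) + amount
    return list(totals.items())


t = inline_view(orders)

# Outer query: the temporary table is joined in FROM like any other table.
# Models: SELECT name, total FROM names, (...) AS t WHERE names.x = t.x
result = [(n[1], v[1]) for n, v in product(names, t) if n[0] == v[0]]
print(result)  # [('alpha', 15), ('beta', 7)]
```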

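6: Comparing MERGE and Join
Unlike a DATA step MERGE, a join does not require the tables to be sorted, does not require the key columns to have the same name, and is not limited to equality conditions.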
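A hypothetical Python model of a join that a BY-group MERGE could not express directly: the tables share no column name, neither is sorted, and the condition is a range test rather than an equality. The tables `scores` and `grades` and the BETWEEN query they imitate are made up for the example.

```python
from itertools import product

# Hypothetical tables: no shared column name, and neither table is sorted.
scores = [{"id": 3, "score": 72}, {"id": 1, "score": 95}, {"id": 2, "score": 58}]
grades = [
    {"low": 90, "high": 100, "grade": "A"},
    {"low": 70, "high": 89, "grade": "B"},
    {"low": 0, "high": 69, "grade": "C"},
]

# Models: SELECT id, grade FROM scores, grades WHERE score BETWEEN low AND high
# The join condition is a range test, not an equality.
result = [
    (s["id"], g["grade"])
    for s, g in product(scores, grades)
    if g["low"] <= s["score"] <= g["high"]
]
print(result)  # [(3, 'B'), (1, 'A'), (2, 'C')]
```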
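When the relationship is one-to-one and all key values match, the merge produces the same result as an inner join:
data merged;
   /* MERGE with BY requires both tables to be sorted (or indexed) by x */
   merge one two;
   by x;
run;

proc print data=merged noobs;
   title 'Table Merged';
run;

proc sql;
   title 'Table Merged';
   select one.x, a, b
      from one, two
      where one.x = two.x
      order by x;
quit;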
When only some of the key values match, the merge corresponds to a full outer join, with COALESCE overlaying the two key columns:
data merged;
   merge three four;
   by x;
run;

proc print data=merged noobs;
   title 'Table Merged';
run;

proc sql;
   title 'Table Merged';
   select coalesce(three.x, four.x) as X, a, b
      from three full join four
      on three.x = four.x;
quit;
