An explanation of the JOIN types supported by ClickHouse


According to the comments in Join.h, ClickHouse supports 14 kinds of JOIN, listed below:

  * JOIN-s could be of these types:
  * - ALL × LEFT/INNER/RIGHT/FULL
  * - ANY × LEFT/INNER/RIGHT
  * - SEMI/ANTI x LEFT/RIGHT
  * - ASOF x LEFT/INNER
  * - CROSS

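The difference between ALL and ANY, as described in the official documentation:

ANY and ALL

When a JOIN is qualified with ALL and the right table contains several rows that match a row of the left table, every matching right-table row is returned in the result. This is the same as standard SQL JOIN behavior.
When a JOIN is qualified with ANY and the right table contains several rows that match a row of the left table, only the first match found is returned. If the left and right tables correspond one to one, with no extra rows, ANY and ALL give the same result.

Take INNER JOIN as an example of the difference between ANY and ALL. First, prepare the data: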

1. Create the join_test database

create database join_test engine=Ordinary;

2. Create the left_t1 and right_t1 tables

create table left_t1(a UInt16,b UInt16,create_date date)Engine=MergeTree(create_date,a,8192);

create table right_t1(a UInt16,b UInt16,create_date date)Engine=MergeTree(create_date,a,8192);

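3. Insert data

insert into left_t1 values(1,11,'2020-03-20');

insert into left_t1 values(2,22,'2020-03-20');

insert into left_t1 values(3,22,'2020-03-20');

insert into right_t1 values(1,111,'2020-03-20');

insert into right_t1 values(2,222,'2020-03-20');

insert into right_t1 values(2,2222,'2020-03-20');

Note: the original run wrote the date unquoted as 2020-3-20, which is why the outputs below show create_date as 1975-06-21 instead of 2020-03-20.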
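The query outputs in this post show create_date as 1975-06-21 rather than 2020-03-20. A plausible explanation is that a date written without quotes, 2020-3-20, is read as integer subtraction and the result is then interpreted as a day count. The Python sketch below only reproduces that arithmetic; the assumption that a Date is a day count starting at 1970-01-01 is mine and is marked in the code.

```python
from datetime import date, timedelta

# Unquoted 2020-3-20 is plain integer arithmetic, not a date literal.
days_since_epoch = 2020 - 3 - 20

# Assumption: a Date value is a number of days counted from 1970-01-01.
EPOCH = date(1970, 1, 1)
stored_date = EPOCH + timedelta(days=days_since_epoch)

print(days_since_epoch)  # 1997
print(stored_date)       # 1975-06-21
```

Quoting the literal, as in '2020-03-20', avoids the arithmetic altogether.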

4. Compare how ANY and ALL affect the output of INNER JOIN

ALL INNER JOIN

select * from left_t1 all inner join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ALL INNER JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
│ 2 │ 22 │  1975-06-21 │          2 │        222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

3 rows in set. Elapsed: 0.019 sec.

The right table right_t1 contains two rows that match left_t1 (a = 2), and both are returned.

ANY INNER JOIN

select * from left_t1 any inner join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ANY INNER JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

2 rows in set. Elapsed: 0.023 sec.

The right table right_t1 contains two rows that match left_t1 (a = 2), but only one of them is returned.
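The row counts of the two runs above (3 for ALL, 2 for ANY) can be reproduced with a small Python model. This is only a sketch of the semantics, not ClickHouse code: the function name inner_join is made up for illustration, the tables keep only columns a and b, and the model keeps the first match in list order, whereas the ClickHouse run above happened to return the 2222 row.

```python
left_t1 = [(1, 11), (2, 22), (3, 22)]        # (a, b) rows from the post
right_t1 = [(1, 111), (2, 222), (2, 2222)]

def inner_join(left, right, strictness):
    """Inner join on column a; strictness is "ALL" or "ANY"."""
    result = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if strictness == "ANY":
            matches = matches[:1]  # keep a single match per left row
        result.extend(l + r for r in matches)
    return result

all_rows = inner_join(left_t1, right_t1, "ALL")
any_rows = inner_join(left_t1, right_t1, "ANY")
print(len(all_rows), len(any_rows))  # 3 2
```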

INNER JOIN

An inner join connects every pair of rows from left_t1 and right_t1 that satisfies left_t1.a=right_t1.a, as shown below:

select * from left_t1 all inner join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ALL INNER JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
│ 2 │ 22 │  1975-06-21 │          2 │        222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

3 rows in set. Elapsed: 0.134 sec.

LEFT JOIN

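A left join builds on the inner join: rows of left_t1 that have no matching row in right_t1 are also returned, with the right-table columns filled with empty values or 0, as shown below: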

select * from left_t1 all left join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ALL LEFT JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 3 │ 22 │  1975-06-21 │          0 │          0 │           0000-00-00 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
│ 2 │ 22 │  1975-06-21 │          2 │        222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

4 rows in set. Elapsed: 0.013 sec.

RIGHT JOIN

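A right join builds on the inner join: rows of right_t1 that have no matching row in left_t1 are also returned, with the left-table columns filled with empty values or 0. In this data set every right_t1 row has a match, so the result equals the inner join, as shown below: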

select * from left_t1 all right join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ALL RIGHT JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
│ 2 │ 22 │  1975-06-21 │          2 │        222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

3 rows in set. Elapsed: 0.021 sec.

FULL JOIN

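A full join builds on the inner join: rows of right_t1 with no match in left_t1 and rows of left_t1 with no match in right_t1 are both returned, with the missing side filled with empty values or 0, as shown below: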

select * from left_t1 all full join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ALL FULL OUTER JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
│ 2 │ 22 │  1975-06-21 │          2 │        222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 3 │ 22 │  1975-06-21 │          0 │          0 │           0000-00-00 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

4 rows in set. Elapsed: 0.046 sec.

SEMI LEFT JOIN and SEMI RIGHT JOIN, ANTI LEFT JOIN and ANTI RIGHT JOIN

Join.h explains them as follows:

  * SEMI JOIN filter left table by keys that are present in right table for LEFT JOIN, and filter right table by keys from left table
  * for RIGHT JOIN. In other words SEMI JOIN returns only rows which joining keys present in another table.
  * ANTI JOIN is the same as SEMI JOIN but returns rows with joining keys that are NOT present in another table.
  * SEMI/ANTI JOINs allow to get values from both tables. For filter table it gets any row with joining same key. For ANTI JOIN it returns
  * defaults other table columns.

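That is:

With SEMI LEFT JOIN, the left table is filtered by the keys present in the right table: a left-table row is output if its key also exists in the right table.

With SEMI RIGHT JOIN, the right table is filtered by the keys present in the left table: a right-table row is output if its key also exists in the left table.

In other words, SEMI JOIN returns the rows whose key exists in the other table.

ANTI JOIN is the opposite of SEMI JOIN: it returns the rows whose key does not exist in the other table.

Both SEMI JOIN and ANTI JOIN allow values to be taken from both tables. From the table used as the filter, any one row with the same key is taken. For ANTI JOIN, the other table's columns are filled with default values, such as empty values or 0.

select * from left_t1 semi left join right_t1 on left_t1.a=right_t1.a;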

SELECT *
FROM left_t1
SEMI LEFT JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘
┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

2 rows in set. Elapsed: 0.052 sec.

 

select * from left_t1 semi right join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
SEMI RIGHT JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 1 │ 11 │  1975-06-21 │          1 │        111 │           1975-06-21 │
│ 2 │ 22 │  1975-06-21 │          2 │        222 │           1975-06-21 │
│ 2 │ 22 │  1975-06-21 │          2 │       2222 │           1975-06-21 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

3 rows in set. Elapsed: 1.327 sec.

 

select * from left_t1 anti left join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ANTI LEFT JOIN right_t1 ON left_t1.a = right_t1.a

┌─a─┬──b─┬─create_date─┬─right_t1.a─┬─right_t1.b─┬─right_t1.create_date─┐
│ 3 │ 22 │  1975-06-21 │          3 │          0 │           0000-00-00 │
└───┴────┴─────────────┴────────────┴────────────┴──────────────────────┘

1 rows in set. Elapsed: 0.061 sec.

 

select * from left_t1 anti right join right_t1 on left_t1.a=right_t1.a;

SELECT *
FROM left_t1
ANTI RIGHT JOIN right_t1 ON left_t1.a = right_t1.a

Ok.

0 rows in set. Elapsed: 0.024 sec.
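The four results above (2, 3, 1 and 0 rows) are consistent with the following Python model of SEMI and ANTI joins. It is a sketch of the semantics described in Join.h, not ClickHouse code; the function names are made up for illustration and the tables keep only columns a and b. One simplification: the model pads ANTI rows with zeros only, while the ANTI LEFT run above shows right_t1.a as 3.

```python
left_t1 = [(1, 11), (2, 22), (3, 22)]        # (a, b) rows from the post
right_t1 = [(1, 111), (2, 222), (2, 2222)]
DEFAULT = (0, 0)                             # defaults for two UInt16 columns

def one_row_per_key(rows):
    """Keep a single row for every key, as the filter table does."""
    first = {}
    for row in rows:
        first.setdefault(row[0], row)
    return first

def semi_left(left, right):
    lookup = one_row_per_key(right)
    return [l + lookup[l[0]] for l in left if l[0] in lookup]

def semi_right(left, right):
    lookup = one_row_per_key(left)
    return [lookup[r[0]] + r for r in right if r[0] in lookup]

def anti_left(left, right):
    keys = {r[0] for r in right}
    return [l + DEFAULT for l in left if l[0] not in keys]

def anti_right(left, right):
    keys = {l[0] for l in left}
    return [DEFAULT + r for r in right if r[0] not in keys]

counts = [len(f(left_t1, right_t1))
          for f in (semi_left, semi_right, anti_left, anti_right)]
print(counts)  # [2, 3, 1, 0]
```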

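I could not find a concrete syntax for ASOF LEFT and ASOF INNER. I had hoped to learn more from the execution plan, but with the command below I could not see which join kind was chosen, so for now I do not know how to reach the code paths for these two types.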

clickhouse-client --send_logs_level=trace <<< 'select * from join_test.left_t1,join_test.right_t1 where join_test.left_t1.a<>1 and join_test.right_t1.a<>1' > /dev/null
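For reference, the ClickHouse documentation describes an ASOF JOIN form whose ON clause combines equality conditions with one inequality on an ordered column, for example ASOF LEFT JOIN ... ON t1.k = t2.k AND t1.ts >= t2.ts. The Python sketch below models the "nearest value" behavior that Join.h mentions. The tables trades and quotes, the column layout, and the function name asof_join are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical rows: (key, ts, name) on the left, (key, ts, price) on the right.
trades = [("A", 10, "t1"), ("A", 25, "t2"), ("B", 5, "t3")]
quotes = [("A", 8, 100), ("A", 20, 101), ("A", 30, 102)]

def asof_join(left, right, keep_unmatched):
    """For each left row pick the right row with the same key and the
    greatest ts that is <= the left ts (inequality left.ts >= right.ts)."""
    by_key = {}
    for key, ts, price in sorted(right, key=lambda r: (r[0], r[1])):
        by_key.setdefault(key, []).append((ts, price))
    out = []
    for key, ts, name in left:
        candidates = by_key.get(key, [])
        pos = bisect_right([c[0] for c in candidates], ts)
        if pos > 0:
            out.append((key, ts, name) + candidates[pos - 1])
        elif keep_unmatched:               # ASOF LEFT: fill with defaults
            out.append((key, ts, name, 0, 0))
    return out

asof_inner = asof_join(trades, quotes, keep_unmatched=False)
asof_left = asof_join(trades, quotes, keep_unmatched=True)
print(asof_inner)
print(asof_left)
```

With keep_unmatched=False the model behaves like ASOF INNER and drops the "B" row; with keep_unmatched=True it behaves like ASOF LEFT and pads that row with zeros.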


Below is the description from Join.h:

  * ASOF JOIN is not-equi join. For one key column it finds nearest value to join according to join inequality.
  * It's expected that ANY|SEMI LEFT JOIN is more efficient that ALL one.
  *
  * If INNER is specified - leave only rows that have matching rows from "right" table.
  * If LEFT is specified - in case when there is no matching row in "right" table, fill it with default values instead.
  * If RIGHT is specified - first process as INNER, but track what rows from the right table was joined,
  *  and at the end, add rows from right table that was not joined and substitute default values for columns of left table.
  * If FULL is specified - first process as LEFT, but track what rows from the right table was joined,
  *  and at the end, add rows from right table that was not joined and substitute default values for columns of left table.
  *
  * Thus, LEFT and RIGHT JOINs are not symmetric in terms of implementation.
  *
  * All JOINs (except CROSS) are done by equality condition on keys (equijoin).
  * Non-equality and other conditions are not supported.

That is, only joins on equality conditions are supported; joins on non-equality or other conditions are not.
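The INNER, LEFT, RIGHT and FULL rules quoted above can be sketched in Python as follows. This is a simplified model under my own assumptions, not the ClickHouse implementation: the function name outer_join is made up, matching is done by a linear scan, and the used list stands in for the tracking of joined right-table rows.

```python
left_t1 = [(1, 11), (2, 22), (3, 22)]        # (a, b) rows from the post
right_t1 = [(1, 111), (2, 222), (2, 2222)]
DEFAULT = (0, 0)                             # defaults for two UInt16 columns

def outer_join(left, right, kind):
    """kind is "INNER", "LEFT", "RIGHT" or "FULL"; the key is column a."""
    keep_left = kind in ("LEFT", "FULL")
    keep_right = kind in ("RIGHT", "FULL")
    used = [False] * len(right)              # which right rows were joined
    out = []
    for l in left:
        hits = [i for i, r in enumerate(right) if r[0] == l[0]]
        for i in hits:
            used[i] = True
            out.append(l + right[i])
        if not hits and keep_left:
            out.append(l + DEFAULT)          # no match: default right columns
    if keep_right:
        # At the end, add right rows that were never joined.
        out.extend(DEFAULT + r for i, r in enumerate(right) if not used[i])
    return out

sizes = {k: len(outer_join(left_t1, right_t1, k))
         for k in ("INNER", "LEFT", "RIGHT", "FULL")}
print(sizes)  # {'INNER': 3, 'LEFT': 4, 'RIGHT': 3, 'FULL': 4}
```

The sizes agree with the row counts of the ALL INNER, ALL LEFT, ALL RIGHT and ALL FULL runs earlier in the post.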
  *
  * Implementation:
  *
  * 1. Build hash table in memory from "right" table.
  * This hash table is in form of keys -> row in case of ANY or keys -> [rows...] in case of ALL.
  * This is done in insertFromBlock method.
  * 2. Process "left" table and join corresponding rows from "right" table by lookups in the map.
  * This is done in joinBlock methods.
  * In case of ANY LEFT JOIN - form new columns with found values or default values.
  * This is the most simple. Number of rows in left table does not change.
  * In case of ANY INNER JOIN - form new columns with found values,
  *  and also build a filter - in what rows nothing was found.
  * Then filter columns of "left" table.
  * In case of ALL ... JOIN - form new columns with all found rows,
  *  and also fill 'offsets' array, describing how many times we need to replicate values of "left" table.
  * Then replicate columns of "left" table.

Notes on the implementation:

1. The smaller table is usually used as the right table. The hash table is built in memory from the right table, in insertFromBlock.
2. The left table is then scanned, and matching rows are joined through lookups in the in-memory map, in joinBlock.

With ANY LEFT JOIN the number of left-table rows does not change; the new columns are filled with the matched values or with default values.
ANY INNER JOIN builds the new columns from the found values, builds a filter marking the rows where nothing was found, and then filters the left table with it.
ALL ... JOIN builds the new columns from all found rows, fills the offsets array describing how many times each left-table value must be replicated, and then replicates the left-table columns.
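The build and probe steps can be sketched in Python. This is a rough model under stated assumptions, not the ClickHouse code: the function names are hypothetical (they are not insertFromBlock or joinBlock), rows are reduced to (key, value) pairs, and the offsets are modeled as cumulative row counts.

```python
left_t1 = [(1, 11), (2, 22), (3, 22)]        # (a, b) rows from the post
right_t1 = [(1, 111), (2, 222), (2, 2222)]

def build_hash_table(right_rows, strictness):
    """Step 1: keys -> row for ANY, keys -> [rows...] for ALL."""
    table = {}
    for key, value in right_rows:
        if strictness == "ANY":
            table.setdefault(key, value)
        else:
            table.setdefault(key, []).append(value)
    return table

def probe_any_inner(left_rows, table):
    """Step 2 for ANY INNER: a filter marks left rows that found a match."""
    row_filter = [key in table for key, _ in left_rows]
    found = [table[key] for key, _ in left_rows if key in table]
    return row_filter, found

def probe_all(left_rows, table):
    """Step 2 for ALL: collect every found row and fill the offsets."""
    found, offsets = [], []
    for key, _ in left_rows:
        found.extend(table.get(key, []))
        offsets.append(len(found))           # assumption: cumulative counts
    return found, offsets

def replicate(left_rows, offsets):
    """Repeat each left row as many times as it found matches."""
    out, prev = [], 0
    for row, end in zip(left_rows, offsets):
        out.extend([row] * (end - prev))
        prev = end
    return out

table_any = build_hash_table(right_t1, "ANY")
table_all = build_hash_table(right_t1, "ALL")
row_filter, any_found = probe_any_inner(left_t1, table_any)
all_found, offsets = probe_all(left_t1, table_all)
print(row_filter, offsets, replicate(left_t1, offsets))
```

For the sample data the filter is [True, True, False] and the offsets are [1, 3, 3], so the row with a = 2 is replicated twice and the row with a = 3 disappears from the ALL INNER result.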
  * How Nullable keys are processed:
  * NULLs never join to anything, even to each other.
  * During building of map, we just skip keys with NULL value of any component.
  * During joining, we simply treat rows with any NULLs in key as non joined.

In short: a NULL key never joins with any value, not even with another NULL. Keys that contain a NULL are skipped while the hash table is built, and during the join, rows with a NULL in the key are treated as not joined.
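A minimal Python sketch of this NULL handling, using None in place of NULL. The data and function names are hypothetical, and only a single-column key is modeled.

```python
def build_map(right_rows):
    """Build keys -> [values...], skipping rows whose key is NULL (None)."""
    table = {}
    for key, value in right_rows:
        if key is None:
            continue
        table.setdefault(key, []).append(value)
    return table

def inner_join_skip_nulls(left_rows, right_rows):
    table = build_map(right_rows)
    out = []
    for key, left_value in left_rows:
        if key is None:      # a NULL key is treated as not joined
            continue
        for right_value in table.get(key, []):
            out.append((key, left_value, right_value))
    return out

# Hypothetical data with a NULL key on both sides.
left_rows = [(1, "l1"), (None, "l2")]
right_rows = [(1, "r1"), (None, "r2")]
joined = inner_join_skip_nulls(left_rows, right_rows)
print(joined)  # [(1, 'l1', 'r1')]
```

The two rows with a NULL key do not join to each other, which differs from how a plain Python dict would treat None as an ordinary key.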
  * Default values for outer joins (LEFT, RIGHT, FULL):
  * Behaviour is controlled by 'join_use_nulls' settings.
  * If it is false, we substitute (global) default value for the data type, for non-joined rows
  *  (zero, empty string, etc. and NULL for Nullable data types).
  * If it is true, we always generate Nullable column and substitute NULLs for non-joined rows,
  *  as in standard SQL.

In short, the behavior is controlled by the join_use_nulls setting, with two cases: when join_use_nulls is false, non-joined rows are filled with the data type's default value; when join_use_nulls is true, they are filled with NULL.
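The effect of join_use_nulls on a LEFT JOIN can be sketched in Python. This is a simplified model: the TYPE_DEFAULTS table, the function names and the sample rows are hypothetical, and only two column types are covered.

```python
# Assumption: per-type defaults for non-Nullable columns (zero, empty string).
TYPE_DEFAULTS = {"UInt16": 0, "String": ""}

def fill_unmatched(column_types, join_use_nulls):
    """Values used for the right-table columns of a non-joined row."""
    if join_use_nulls:
        return tuple(None for _ in column_types)     # NULL, as in standard SQL
    return tuple(TYPE_DEFAULTS[t] for t in column_types)

def left_join(left, right, right_types, join_use_nulls):
    out = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + fill_unmatched(right_types, join_use_nulls))
    return out

# Hypothetical rows: (a, label) on the left, (a, b, name) on the right.
left_rows = [(1, "x"), (3, "z")]
right_rows = [(1, 111, "one")]
right_types = ("UInt16", "UInt16", "String")

with_defaults = left_join(left_rows, right_rows, right_types, join_use_nulls=False)
with_nulls = left_join(left_rows, right_rows, right_types, join_use_nulls=True)
print(with_defaults[-1])  # (3, 'z', 0, 0, '')
print(with_nulls[-1])     # (3, 'z', None, None, None)
```

The zeros in the LEFT JOIN and FULL JOIN outputs earlier in the post correspond to the first case.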


