舉個例子來解釋一下查詢時,數據庫到底做了哪些操作。

create database TestDB;

CREATE TABLE table1 ( customer_id VARCHAR(10) NOT NULL, city VARCHAR(10) NOT NULL, PRIMARY KEY(customer_id) )ENGINE=INNODB DEFAULT CHARSET=UTF8; CREATE TABLE table2 ( order_id INT NOT NULL auto_increment, customer_id VARCHAR(10), PRIMARY KEY(order_id) )ENGINE=INNODB DEFAULT CHARSET=UTF8;

INSERT INTO table1(customer_id,city) VALUES('163','hangzhou'); INSERT INTO table1(customer_id,city) VALUES('9you','shanghai'); INSERT INTO table1(customer_id,city) VALUES('tx','hangzhou'); INSERT INTO table1(customer_id,city) VALUES('baidu','hangzhou'); INSERT INTO table2(customer_id) VALUES('163'); INSERT INTO table2(customer_id) VALUES('163'); INSERT INTO table2(customer_id) VALUES('9you'); INSERT INTO table2(customer_id) VALUES('9you'); INSERT INTO table2(customer_id) VALUES('9you'); INSERT INTO table2(customer_id) VALUES('tx'); INSERT INTO table2(customer_id) VALUES(NULL);

mysql> select * from table1; +-------------+----------+ | customer_id | city | +-------------+----------+ | 163 | hangzhou | | 9you | shanghai | | baidu | hangzhou | | tx | hangzhou | +-------------+----------+ 4 rows in set (0.00 sec) mysql> select * from table2; +----------+-------------+ | order_id | customer_id | +----------+-------------+ | 1 | 163 | | 2 | 163 | | 3 | 9you | | 4 | 9you | | 5 | 9you | | 6 | tx | | 7 | NULL | +----------+-------------+ 7 rows in set (0.00 sec)

SELECT a.customer_id, COUNT(b.order_id) as total_orders FROM table1 AS a LEFT JOIN table2 AS b ON a.customer_id = b.customer_id WHERE a.city = 'hangzhou' GROUP BY a.customer_id HAVING count(b.order_id) < 2 ORDER BY total_orders DESC;
我們得知道這個sql語句做了那些事情。
首先執行的是from,因為我們需要知道最開始從哪個表開始的,這就是FROM
告訴我們的。left join的連表操作導致了笛卡爾積的產生所以我們查到的表是這樣一個笛卡兒積的表:
+-------------+----------+----------+-------------+ | customer_id | city | order_id | customer_id | +-------------+----------+----------+-------------+ | 163 | hangzhou | 1 | 163 | | 9you | shanghai | 1 | 163 | | baidu | hangzhou | 1 | 163 | | tx | hangzhou | 1 | 163 | | 163 | hangzhou | 2 | 163 | | 9you | shanghai | 2 | 163 | | baidu | hangzhou | 2 | 163 | | tx | hangzhou | 2 | 163 | | 163 | hangzhou | 3 | 9you | | 9you | shanghai | 3 | 9you | | baidu | hangzhou | 3 | 9you | | tx | hangzhou | 3 | 9you | | 163 | hangzhou | 4 | 9you | | 9you | shanghai | 4 | 9you | | baidu | hangzhou | 4 | 9you | | tx | hangzhou | 4 | 9you | | 163 | hangzhou | 5 | 9you | | 9you | shanghai | 5 | 9you | | baidu | hangzhou | 5 | 9you | | tx | hangzhou | 5 | 9you | | 163 | hangzhou | 6 | tx | | 9you | shanghai | 6 | tx | | baidu | hangzhou | 6 | tx | | tx | hangzhou | 6 | tx | | 163 | hangzhou | 7 | NULL | | 9you | shanghai | 7 | NULL | | baidu | hangzhou | 7 | NULL | | tx | hangzhou | 7 | NULL | +-------------+----------+----------+-------------+
接着sql語句執行到on,ON a.customer_id = b.customer_id條件過濾,根據ON
中指定的條件,去掉那些不符合條件的數據,得到的臨時表就是from查詢的表。
+-------------+----------+----------+-------------+ | customer_id | city | order_id | customer_id | +-------------+----------+----------+-------------+ | 163 | hangzhou | 1 | 163 | | 163 | hangzhou | 2 | 163 | | 9you | shanghai | 3 | 9you | | 9you | shanghai | 4 | 9you | | 9you | shanghai | 5 | 9you | | tx | hangzhou | 6 | tx | +-------------+----------+----------+-------------+
接着執行where進行過濾,在使用WHERE子句時,需要注意以下兩點:
由於數據還沒有分組,因此現在還不能在WHERE中使用where_condition=MIN(col)
這類對分組統計的過濾;
由於還沒有進行列的選取操作,因此在SELECT中使用列的別名也是不被允許的,如:SELECT city as c FROM t WHERE c='shanghai';
是不允許出現的。
執行group by進行分組,GROU BY子句主要是對使用WHERE
子句得到的虛擬表進行分組操作。我們執行測試語句中的GROUP BY a.customer
_id
,
+-------------+----------+----------+-------------+ | customer_id | city | order_id | customer_id | +-------------+----------+----------+-------------+ | 163 | hangzhou | 1 | 163 | | baidu | hangzhou | NULL | NULL | | tx | hangzhou | 6 | tx | +-------------+----------+----------+-------------+
下一步是進行having過濾函數,HAVING子句主要和GROUP BY
子句配合使用,執行測試語句中的HAVING count(b.order_id) < 2
時,將得到以下內容:
+-------------+----------+----------+-------------+ | customer_id | city | order_id | customer_id | +-------------+----------+----------+-------------+ | baidu | hangzhou | NULL | NULL | | tx | hangzhou | 6 | tx | +-------------+----------+----------+-------------+
最后一步執行到select查的動作,其實select雖然寫在一開始但是卻是最后執行的。select查詢上面得到的表進行匹配查找,上面的臨時表就是我們幾番周折所要查找的。行測試語句中的SELECT a.customer_id, COUNT(b.order_id) as total_orders
,得到結果為:
+-------------+--------------+ | customer_id | total_orders | +-------------+--------------+ | baidu | 0 | | tx | 1 | +-------------+--------------+
得到要查詢的表以后才會執行order by進行排序,然后返回一個新的虛擬表,我們執行測試SQL語句中的ORDER BY total_orders DESC
,就會得到以下內容:
+-------------+--------------+ | customer_id | total_orders | +-------------+--------------+ | tx | 1 | | baidu | 0 | +-------------+--------------+
優先級的底端是limit子句,對於沒有應用ORDER BY的LIMIT子句,得到的結果同樣是無序的,所以,很多時候,我們都會看到LIMIT子句會和ORDER BY子句一起使用。
注:我們通常用limit語句來解決分頁問題。對於小數據,使用LIMIT子句沒有任何問題,當數據量非常大的時候,使用LIMIT n, m
是非常低效的。因為LIMIT的機制是每次都是從頭開始掃描,如果需要從第60萬行開始,讀取3條數據,就需要先掃描定位到60萬行,然后再進行讀取,而掃描的過程是一個非常低效的過程。所以,對於大數據處理時,是非常有必要在應用層建立一定的緩存機制(現在的大數據處理,大都使用緩存)。
通過這個示例簡單的梳理了一下select查詢時的優先級問題。
所以可以簡單的看作執行順序為:
(7) SELECT (8) DISTINCT <select_list> (1) FROM <left_table> (3) <join_type> JOIN <right_table> (2) ON <join_condition> (4) WHERE <where_condition> (5) GROUP BY <group_by_list> (6) HAVING <having_condition> (9) ORDER BY <order_by_condition> (10) LIMIT <limit_number>