PostgreSQL SQL Optimization: The NOT IN Problem

Reposted article. 2019-07-26 14:46. Category: PostgreSQL

When writing SQL, we often need to exclude certain rows. The usual first attempt is id <> xxx and id <> xxx, which is then tidied up into id not in (xxx, xxx);
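As a small illustration of the two spellings, here is a sketch against the test table used later in this article. The ids 50 and 70 are placeholders chosen only for the example.

```sql
-- Chained inequality form: one comparison per excluded id.
select * from test where id <> 50 and id <> 70;

-- Equivalent, more compact form with a literal list.
select * from test where id not in (50, 70);
```

With a short literal list like this, the two queries return the same rows; the trouble described next starts when the list is replaced by a subquery.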

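There is nothing wrong with this, and it makes the SQL shorter. In some extreme cases, however, not in causes a severe performance penalty, for example: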

select * from test where id not in (select id from test_back) and info like '%test%';

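Here select id from test_back stays a subquery: PostgreSQL does not turn not in into a join, so it cannot use an index on test_back.id to look up individual ids. If the subquery result fits in work_mem, it is scanned once and hashed (the hashed SubPlan in the plan below). If it does not fit, the subplan is evaluated again for each candidate row, and that is where the cost explodes.

Every row that satisfies info like '%test%' is checked against the subquery result to decide whether its id is absent from it. The actual execution plans are shown in the example below.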

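Ways to improve it:

1) Joining test and test_back on id <> id clearly does not work: the condition only compares the two ids within each joined pair of rows, so it cannot exclude an id that appears somewhere in test_back.

2) The right approach is not exists, with the condition pushed inside as a correlated predicate. The planner can then execute it as an anti join instead of a SubPlan filter: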

select * from test t1 where info like '%test%' and not exists (select 1 from test_back t2 where t2.id = t1.id);
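To make point 1 concrete, the following sketch counts how many test_back rows each test row pairs with under an id <> id join. It assumes the test and test_back tables built in the psql session in this article (test_back holding ids 50 to 70); the column alias pairs is only illustrative.

```sql
-- Every row of test is paired with every test_back row whose id differs.
-- Ids 50..70 still show up (each pairs with the 20 other test_back rows),
-- and all remaining ids pair with all 21 test_back rows.
-- Nothing is excluded; rows are only multiplied.
select t1.id, count(*) as pairs
from test t1
join test_back t2 on t1.id <> t2.id
group by t1.id
order by t1.id;
```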

apple=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 info   | text    |           |          |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

apple=# truncate test;
TRUNCATE TABLE
apple=# insert into test select generate_series(1, 100), 'test'||round(random()*10000)::text;
INSERT 0 100
apple=# select * from test limit 1;
 id |   info
----+----------
  1 | test9526
(1 row)

apple=# insert into test select generate_series(101, 200), 'tes'||round(random()*10000)::text;
INSERT 0 100                            
apple=# create table test_back as  select * from test where id between 50 and 70;
SELECT 21
apple=# explain select * from test where id not in (select id from test_back) and info like '%test%';
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on test  (cost=25.88..30.88 rows=49 width=12)
   Filter: ((NOT (hashed SubPlan 1)) AND (info ~~ '%test%'::text))
   SubPlan 1
     ->  Seq Scan on test_back  (cost=0.00..22.70 rows=1270 width=4)
(4 rows)

apple=# explain select * from test t1 where info like '%test%' and not exists (select 1 from test_back t2 where t2.id = t1.id);
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Anti Join  (cost=1.47..7.13 rows=89 width=12)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on test t1  (cost=0.00..4.50 rows=99 width=12)
         Filter: (info ~~ '%test%'::text)
   ->  Hash  (cost=1.21..1.21 rows=21 width=4)
         ->  Seq Scan on test_back t2  (cost=0.00..1.21 rows=21 width=4)
(6 rows)
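
One caveat when swapping the two forms, which the session above does not exercise: not in and not exists only return the same rows when the subquery column contains no NULLs. test_back was built with create table as, so its id column is nullable. The sketch below uses two hypothetical temporary tables, a and b, to show the difference.

```sql
-- Hypothetical scratch tables, not part of the original session.
create temp table a (id int);
create temp table b (id int);
insert into a values (1), (2);
insert into b values (1), (null);

-- NOT IN: for id = 2 the comparison against NULL yields NULL rather than true,
-- so the row is filtered out and the query returns no rows at all.
select * from a where id not in (select id from b);

-- NOT EXISTS: only a real match excludes a row, so the row with id = 2 is returned.
select * from a where not exists (select 1 from b where b.id = a.id);
```

This NULL behaviour is also the reason the planner cannot simply treat not in as an anti join.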

The example does not create an index on test_back.id. With an index in place, this rewrite can pay off even more.
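
A minimal sketch of adding such an index and re-checking the plan. The index name test_back_id_idx is hypothetical; the original session did not create one.

```sql
-- Index the column used in the correlated predicate.
create index test_back_id_idx on test_back (id);

-- Refresh planner statistics so the row estimates reflect the real table size.
analyze test_back;

-- Re-run the plan for the NOT EXISTS form.
explain
select * from test t1
where info like '%test%'
  and not exists (select 1 from test_back t2 where t2.id = t1.id);
```

On a table as small as this 21-row test_back, the planner may well keep the sequential scan and hash; the index is more likely to matter once test_back is large.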

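To extend this a little further:

1) != is not standard SQL; <> is. The two are equivalent in PostgreSQL.

2) Conceptually, exists and not exists apply the condition row by row as a filter on the outer query, whereas a join is defined as taking the Cartesian product of the tables and then keeping the combined rows whose columns satisfy the join condition. In practice the planner rarely materializes the full product.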
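
For comparison, the same exclusion can also be written with a join, using the left join ... is null pattern. This is a sketch against the test and test_back tables from the session above; PostgreSQL usually plans it as an anti join as well, though that is worth confirming with explain on your own version.

```sql
-- Rows of test with no matching test_back row get NULL for t2.id,
-- and the where clause keeps only those unmatched rows.
select t1.*
from test t1
left join test_back t2 on t2.id = t1.id
where t2.id is null
  and t1.info like '%test%';
```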
