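In awk scripts, arrays are a powerful tool; many problems can be solved with them.
In Python, the list data type plays the role of an array.
Oracle is not very friendly to arrays, and they feel like an afterthought there. PostgreSQL, on the other hand, has extensive array support that applies to many scenarios. Let's go through them one by one.
1. any(array) as a replacement for in(table)
-- Case 1
-- Create table A and load 1000 distinct records, each repeated 4 times (4000 rows in total)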
postgres=# create table A (id int, info text);
CREATE TABLE
postgres=#
postgres=# insert into A select generate_series(1,1000), 'lottu';
INSERT 0 1000
postgres=#
postgres=# insert into A select generate_series(1,1000), 'lottu';
INSERT 0 1000
postgres=# insert into A select * from A;
INSERT 0 2000
-- Remove the duplicate rows using not in
postgres=# begin;
BEGIN
postgres=# explain (analyze, costs, timing) delete from A where ctid not in (select min(ctid) from A group by id, info);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Delete on a (cost=74.38..131.31 rows=1397 width=6) (actual time=12.619..12.619 rows=0 loops=1)
-> Seq Scan on a (cost=74.38..131.31 rows=1397 width=6) (actual time=5.146..7.129 rows=3000 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 1000
SubPlan 1
-> HashAggregate (cost=70.89..73.69 rows=279 width=42) (actual time=3.762..4.155 rows=1000 loops=1)
Group Key: a_1.id, a_1.info
-> Seq Scan on a a_1 (cost=0.00..49.94 rows=2794 width=42) (actual time=0.017..1.158 rows=4000 loops=1)
Planning Time: 1.923 ms
Execution Time: 44.130 ms
(10 rows)
-- Remove the duplicate rows using any(array)
postgres=# explain (analyze, costs, timing) delete from A
postgres-# where ctid = any(array (select ctid
postgres(# from (select "row_number"() over(partition by id, info) as rn,
postgres(# ctid
postgres(# from A) as ad
postgres(# where ad.rn > 1));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Delete on a (cost=300.69..340.79 rows=10 width=6) (actual time=17.686..17.686 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Subquery Scan on ad (cost=209.87..300.68 rows=931 width=6) (actual time=3.995..9.503 rows=3000 loops=1)
Filter: (ad.rn > 1)
Rows Removed by Filter: 1000
-> WindowAgg (cost=209.87..265.75 rows=2794 width=50) (actual time=3.986..8.570 rows=4000 loops=1)
-> Sort (cost=209.87..216.86 rows=2794 width=42) (actual time=3.974..4.577 rows=4000 loops=1)
Sort Key: a_1.id, a_1.info
Sort Method: quicksort Memory: 284kB
-> Seq Scan on a a_1 (cost=0.00..49.94 rows=2794 width=42) (actual time=0.015..1.486 rows=4000 loops=1)
-> Tid Scan on a (cost=0.01..40.11 rows=10 width=6) (actual time=11.130..12.945 rows=3000 loops=1)
TID Cond: (ctid = ANY ($0))
Planning Time: 0.619 ms
Execution Time: 17.808 ms
(14 rows)
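The any(array) delete above relies on the inner row_number() subquery to pick the rows to remove. As a rough sketch, that subquery can be run on its own to preview the candidates, and a group-by check can confirm afterwards that no duplicates remain. The `order by ctid` inside the window is an assumption added here and is not part of the original statement; without it, which row in each group receives rn = 1 is not defined.

```sql
-- Preview the physical rows that would be removed, without deleting anything.
-- "order by ctid" is an addition for illustration: it makes rn = 1 always the
-- row with the smallest ctid in each (id, info) group, matching min(ctid).
select ctid, id, info
from (select row_number() over (partition by id, info order by ctid) as rn,
             ctid, id, info
      from A) as ad
where ad.rn > 1;

-- After the delete has run, this should return zero rows
-- if every (id, info) pair is now unique.
select id, info, count(*)
from A
group by id, info
having count(*) > 1;
```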
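Conclusions:
1. Performance improves noticeably: in this test the reported execution time dropped from 44.130 ms to 17.808 ms. The gain should grow with the data volume; any(array) performs at least as well as in.
2. There are two ways to test the elements of an array: any (some is a synonym for any) and all.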
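A minimal sketch of how any, some and all behave against an array literal. The values and column aliases are chosen only for illustration.

```sql
select 2 = any(array[1, 2, 3])  as any_match,   -- true: at least one element equals 2
       2 = some(array[1, 2, 3]) as some_match,  -- true: some is a synonym for any
       2 = all(array[2, 2, 2])  as all_match,   -- true: every element equals 2
       2 <> all(array[1, 3, 5]) as none_match;  -- true: behaves like not in (1, 3, 5)
```

`= any(...)` corresponds to `in`, while `<> all(...)` corresponds to `not in`.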
2. Array-related functions
-- Converting between strings and arrays
-- Function array_to_string
select array_to_string(array[1, 2, 3], '~^~');
array_to_string
-----------------
1~^~2~^~3
-- Function string_to_array
select string_to_array('1~^~2~^~3','~^~');
string_to_array
-----------------
{1,2,3}
-- Function regexp_split_to_array; similar to string_to_array, but the delimiter is a regular expression
select regexp_split_to_array('1~^~2~^~3','\~\^\~');
regexp_split_to_array
-----------------------
{1,2,3}
-- Function unnest; expands an array into a set of rows
select unnest(array['a', 'b', 'c']);
unnest
--------
a
b
c
-- It can also be combined with with ordinality to add a row number
select * from unnest(array['a', 'b', 'c']) with ordinality;
unnest | ordinality
--------+------------
a | 1
b | 2
c | 3
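The functions above compose naturally. The following sketch, with an illustrative input string and hypothetical column aliases `elem` and `idx`, splits a delimited string into numbered rows by feeding string_to_array into unnest with ordinality.

```sql
-- Split a comma-separated string into one row per element,
-- keeping each element's position in the original string.
select t.elem, t.idx
from unnest(string_to_array('a,b,c', ',')) with ordinality as t(elem, idx);
```

This should return the rows (a, 1), (b, 2) and (c, 3), in the same shape as the unnest example above.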
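3. Array columns support indexes
PostgreSQL also supports indexes on array columns. That is about as array-friendly as it gets; as far as I can tell, other databases show no sign of this yet.
The index type that supports array columns in PostgreSQL is GIN, commonly known as an "inverted index". It is typically used on multi-valued data, for example jsonb, array types, multiple columns, and full-text search, and it can efficiently check whether a given value is present.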
postgres=> explain analyze SELECT * FROM tbl_contacts WHERE phone @> array['18800001921'::varchar(32)];
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_contacts (cost=29.69..2298.29 rows=1250 width=57) (actual time=0.031..0.031 rows=1 loops=1)
Recheck Cond: (phone @> '{18800001921}'::character varying(32)[])
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_contacts_phone (cost=0.00..29.37 rows=1250 width=0) (actual time=0.023..0.023 rows=1 loops=1)
Index Cond: (phone @> '{18800001921}'::character varying(32)[])
Planning Time: 0.097 ms
Execution Time: 0.055 ms
(7 rows)
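The plan above uses an index named idx_contacts_phone, but the table and index definitions are not shown. A plausible setup might look like the following sketch. The `id` and `name` columns are assumptions; only the table name, the `phone` column with its `varchar(32)[]` type, and the index name appear in the plan.

```sql
-- Hypothetical table definition (id and name are assumed columns).
create table tbl_contacts (
    id    serial primary key,
    name  varchar(64),
    phone varchar(32)[]
);

-- GIN index on the array column, using the default array operator class.
create index idx_contacts_phone on tbl_contacts using gin (phone);

-- Besides @> (contains), the same index can serve && (overlap):
-- rows whose phone array shares at least one element with the given array.
select * from tbl_contacts
where phone && array['18800001921', '18800001922']::varchar(32)[];
```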