PostgreSQL: fetching a random row from a table

Reposted article, 2019-05-30 10:37. Tags: postgresql, random fetch

How to fetch a random row from a table efficiently in PostgreSQL

select C_BH from db_scld.t_scld_cprscjl  order by random() LIMIT 1;
select  c_jdrybm from db_scld.t_jdry 
      where c_bmbm = v_scdd and c_sfyx ='1' and c_ryzszt not in ('05','12','11','07','09','13')  order by random() limit 1;
      
 db_jdsjpt=# explain analyze select C_BH from db_scld.t_scld_cprscjl  order by random() LIMIT 1;
                                                             QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------
 Limit  (cost=61029.94..61029.94 rows=1 width=41) (actual time=587.193..587.193 rows=1 loops=1)
   ->  Sort  (cost=61029.94..63172.22 rows=856911 width=41) (actual time=587.185..587.185 rows=1 loops=1)
         Sort Key: (random())
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on t_scld_cprscjl  (cost=0.00..56745.39 rows=856911 width=41) (actual time=0.019..380.139 rows=854682 loops=1)
 Planning time: 1.179 ms
 Execution time: 587.242 ms
(7 rows)
-- total number of rows in the table
 db_jdsjpt=# select count(*) from db_scld.t_scld_cprscjl;
 count  
--------
 854682
(1 row)

Fetching a random row with random()

Time with random(): 389.818 ms

-- time to fetch one random row
db_jdsjpt=# select C_BH from db_scld.t_scld_cprscjl  order by random() LIMIT 1;
               c_bh               
----------------------------------
 6d861b011c854040bf5b731f49d40b48
(1 row)

Time: 389.818 ms

Rewrite 1

Time with OFFSET: 60.022 ms

-- OFFSET can use an index and avoids the sort (854682 is the table's row count)
db_jdsjpt=# select C_BH from db_scld.t_scld_cprscjl  offset floor(random()*854682) LIMIT 1;
               c_bh               
----------------------------------
 f90301bd8ac2485196ffae32ee70345c
(1 row)

Time: 60.022 ms

db_jdsjpt=# explain analyze select C_BH from db_scld.t_scld_cprscjl  offset floor(random()*854682) LIMIT 1;
                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------
 Limit  (cost=3747.64..3747.68 rows=1 width=33) (actual time=30.758..30.759 rows=1 loops=1)
   ->  Index Only Scan using i_corscjl_cprscbh_ on t_scld_cprscjl  (cost=0.42..37472.65 rows=854682 width=33) (actual time=0.047..25.808 rows=81993 loops=1)
         Heap Fetches: 0
 Planning time: 0.228 ms
 Execution time: 30.802 ms
(5 rows)

Time: 31.779 ms
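The query above hard-codes the row count 854682, which goes stale as soon as rows are inserted or deleted. A minimal sketch that computes the count at run time instead, assuming PostgreSQL 9.5 or later; the temporary table `demo_cprscjl` and its data are hypothetical stand-ins for `db_scld.t_scld_cprscjl` so the block runs on its own:

```sql
-- Hypothetical stand-in for db_scld.t_scld_cprscjl (100000 rows of md5 strings).
CREATE TEMP TABLE demo_cprscjl AS
SELECT md5(i::text) AS c_bh
FROM generate_series(1, 100000) AS i;

-- The offset is drawn from [0, count - 1]. The scalar subquery and random()
-- are each evaluated once per execution, so one row comes back.
SELECT c_bh
FROM demo_cprscjl
OFFSET floor(random() * (SELECT count(*) FROM demo_cprscjl))
LIMIT 1;
```

Note that `count(*)` is itself a full scan, so on a large table this adds a second pass per call. Where that matters, the count can be computed once and reused (which is what the hard-coded constant does), or taken from the planner estimate in `pg_class.reltuples`, accepting that an estimate above the real count can produce an empty result.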

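Rewrite 2

Since version 9.5, PostgreSQL supports table sampling with the TABLESAMPLE clause.

When sampling with TABLESAMPLE, the percentage must not be too low, or the sample may contain no rows at all. Filter conditions also combine poorly with it: the WHERE clause is applied to the rows that were already sampled, so a selective filter can easily leave nothing.

Time with SYSTEM: 0.639 ms

SYSTEM: less random, but very fast

-- time after the rewrite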
db_jdsjpt=# select c_bh from db_scld.t_scld_cprscjl  tablesample system(0.1) limit 1;
               c_bh               
----------------------------------
 e2fce25399db42f0bf49faf8e7214d5f
(1 row)

Time: 0.639 ms

-- SYSTEM samples whole blocks (pages), which is why it is faster
db_jdsjpt=# explain analyze  select c_bh from db_scld.t_scld_cprscjl  tablesample system(0.1) limit 1;
                                                      QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.23 rows=1 width=33) (actual time=0.105..0.105 rows=1 loops=1)
   ->  Sample Scan on t_scld_cprscjl  (cost=0.00..192.55 rows=855 width=33) (actual time=0.102..0.102 rows=1 loops=1)
         Sampling: system ('0.1'::real)
 Planning time: 0.190 ms
 Execution time: 0.134 ms
(5 rows)

Time: 1.182 ms

Rewrite 3

BERNOULLI: better randomness, but slower than SYSTEM

Time with BERNOULLI: 0.822 ms

db_jdsjpt=# select c_bh from db_scld.t_scld_cprscjl  tablesample bernoulli(0.1) limit 1;
               c_bh               
----------------------------------
 7ec30761ffd04bd9ad77797a33645a84
(1 row)

Time: 0.822 ms

-- BERNOULLI samples row by row, so it is somewhat slower than SYSTEM
db_jdsjpt=# explain analyze select c_bh from db_scld.t_scld_cprscjl  tablesample bernoulli(0.1) limit 1;
                                                       QUERY PLAN                                                       
---------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..53.85 rows=1 width=33) (actual time=1.410..1.411 rows=1 loops=1)
   ->  Sample Scan on t_scld_cprscjl  (cost=0.00..46042.55 rows=855 width=33) (actual time=1.408..1.408 rows=1 loops=1)
         Sampling: bernoulli ('0.1'::real)
 Planning time: 0.446 ms
 Execution time: 1.436 ms
(5 rows)

Time: 25.770 ms
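One caveat with `TABLESAMPLE ... LIMIT 1`: sampled rows come back in scan order and the scan stops at the first one, so rows that come early in the scan are far more likely to be returned than rows later in the table (the 1.4 ms BERNOULLI plan above, versus 86 ms for a full sample in the Notes section, shows the early stop). A minimal sketch of a common variant, not the author's method: draw the sample first, then shuffle only the sampled rows with `ORDER BY random()`. The demo table `demo_sample` is hypothetical, and 5 percent is used only so that the small demo table reliably yields a non-empty sample:

```sql
-- Hypothetical demo table with about 100000 short rows.
CREATE TEMP TABLE demo_sample AS
SELECT md5(i::text) AS c_bh
FROM generate_series(1, 100000) AS i;
ANALYZE demo_sample;

-- SYSTEM: read roughly 5% of the pages, then shuffle only the sampled rows.
SELECT c_bh
FROM demo_sample TABLESAMPLE SYSTEM (5)
ORDER BY random()
LIMIT 1;

-- BERNOULLI: every page is visited and each row is kept with 5% probability;
-- the sampled rows are then shuffled the same way.
SELECT c_bh
FROM demo_sample TABLESAMPLE BERNOULLI (5)
ORDER BY random()
LIMIT 1;
```

The shuffle is cheap for SYSTEM because only the sampled pages are read. For BERNOULLI it removes the early stop, so the whole table is scanned on every call. The result is much less biased toward the start of the scan, though block sampling is still not guaranteed to be a perfectly uniform draw.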

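The other query can be rewritten the same way, with an index added on the c_bmbm column.

When the query has filter conditions, use OFFSET; the offset value can also be passed in from a FOR loop.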

db_jdsjpt=# select count(*) from db_scld.t_jdry;
 count  
--------
 214819
(1 row)

db_jdsjpt=# select  c_jdrybm from db_scld.t_jdry  where c_bmbm = '4402222804' and c_sfyx ='1' and c_ryzszt not in ('05','12','11','07','09','13') offset floor(random()*214819) limit 1; 
 c_jdrybm 
----------
(0 rows)

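Time: 1.924 ms

This query returned 0 rows. With OFFSET, the random offset must be drawn from the number of rows that satisfy the filter, not from the whole-table count (214819): whenever the offset is at or beyond the number of matching rows, nothing is returned.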
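A minimal PL/pgSQL sketch of the filtered case, assuming the filter value is passed in as a parameter (as `v_scdd` is in the original query): count the matching rows first, then draw the offset from that count, so the offset cannot overshoot the filtered result the way it did in the 0-row result above. The function name `pick_random_jdry` and the temporary table `demo_jdry` are hypothetical; the columns mirror those of `db_scld.t_jdry`.

```sql
-- Hypothetical demo table mirroring the columns used from db_scld.t_jdry.
CREATE TEMP TABLE demo_jdry AS
SELECT 'P' || i AS c_jdrybm,
       CASE WHEN i % 10 = 0 THEN '4402222804' ELSE '4400000000' END AS c_bmbm,
       '1'::text AS c_sfyx,
       CASE WHEN i % 3 = 0 THEN '05' ELSE '01' END AS c_ryzszt
FROM generate_series(1, 1000) AS i;

CREATE INDEX ON demo_jdry (c_bmbm);  -- the index recommended on c_bmbm

CREATE OR REPLACE FUNCTION pick_random_jdry(p_bmbm text)
RETURNS text
LANGUAGE plpgsql AS $$
DECLARE
  v_cnt    bigint;
  v_result text;
BEGIN
  -- 1. Count only the rows that satisfy the filter.
  SELECT count(*) INTO v_cnt
  FROM demo_jdry
  WHERE c_bmbm = p_bmbm AND c_sfyx = '1'
    AND c_ryzszt NOT IN ('05','12','11','07','09','13');

  IF v_cnt = 0 THEN
    RETURN NULL;  -- nothing matches the filter
  END IF;

  -- 2. Skip a random number of matching rows in [0, v_cnt - 1].
  SELECT c_jdrybm INTO v_result
  FROM demo_jdry
  WHERE c_bmbm = p_bmbm AND c_sfyx = '1'
    AND c_ryzszt NOT IN ('05','12','11','07','09','13')
  OFFSET floor(random() * v_cnt)
  LIMIT 1;

  RETURN v_result;
END;
$$;

SELECT pick_random_jdry('4402222804');
```

The count and the fetch are separate statements, so if matching rows are deleted in between, the function can occasionally return NULL; callers that cannot tolerate that can simply retry.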

Comparison

Method	Time
order by random()	389.818 ms
offset n	60.022 ms to 240 ms
system()	0.639 ms
bernoulli()	0.822 ms

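With OFFSET, the cost depends on n: the larger n is, the more index blocks must be scanned and the longer the query takes, but it is still faster than ORDER BY random().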

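Notes

system(0.1) means 0.1 percent, i.e. a sampling rate of one in a thousand: 854682 × 0.001 ≈ 854, so roughly 854 rows are sampled each time. The actual count varies from run to run, as the results below show.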
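A worked version of this estimate, plus the spread to expect around it: BERNOULLI keeps each row independently with probability p = 0.001, so its sample size follows a binomial distribution over N = 854682 rows. SYSTEM keeps each page with probability p and returns every row on a kept page, so its count moves in whole-page steps and scatters more widely.

```latex
\begin{aligned}
E[\text{rows}] &= N p = 854682 \times 0.001 \approx 854.7 \\
\sigma_{\text{BERNOULLI}} &= \sqrt{N p (1 - p)} = \sqrt{854.682 \times 0.999} \approx 29.2
\end{aligned}
```

The BERNOULLI count of 840 below lies within one standard deviation of the mean, while the SYSTEM count of 592 lies much further out, which fits page-level sampling.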

--system
db_jdsjpt=#  select count(*) from db_scld.t_scld_cprscjl  tablesample system(0.1) ;
 count 
-------
   592
(1 row)

Time: 1.499 ms

--bernoulli
db_jdsjpt=#  select count(*) from db_scld.t_scld_cprscjl  tablesample bernoulli(0.1) ;
 count 
-------
   840
(1 row)

Time: 86.037 ms
This shows that BERNOULLI is less efficient than SYSTEM: counting a full sample took 86.037 ms, versus 1.499 ms for SYSTEM.

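Conclusion

1. When fetching one random row, ORDER BY random() makes little noticeable difference on small tables, but on large tables it is expensive: every call scans the whole table and sorts it by random() (a top-N heapsort when LIMIT 1 is used).

2. To fetch one random row, use LIMIT 1 OFFSET n-1, or use random sampling (TABLESAMPLE).

3. Both LIMIT 1 OFFSET n and TABLESAMPLE depend on knowing the table's row count: the offset must stay below it, and the sampling percentage must be large enough relative to it that the sample is not empty.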
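The two approaches in the conclusion can be combined. A minimal PL/pgSQL sketch, assuming a hypothetical helper `pick_random_c_bh` and a hypothetical temporary table `demo_fallback` (the table name is hard-coded for brevity): sample first because it is cheap, and fall back to OFFSET only when the sample comes back empty, as a low percentage on a small table often does.

```sql
-- Hypothetical demo table, deliberately tiny so a 0.1% SYSTEM sample is usually empty.
CREATE TEMP TABLE demo_fallback AS
SELECT md5(i::text) AS c_bh
FROM generate_series(1, 500) AS i;

CREATE OR REPLACE FUNCTION pick_random_c_bh(p_percent real DEFAULT 0.1)
RETURNS text
LANGUAGE plpgsql AS $$
DECLARE
  v_result text;
  v_cnt    bigint;
BEGIN
  -- Fast path: sample a fraction of the pages, then shuffle the sampled rows.
  -- p_percent is a real, so formatting it with %s cannot inject SQL.
  EXECUTE format(
    'SELECT c_bh FROM demo_fallback TABLESAMPLE SYSTEM (%s) ORDER BY random() LIMIT 1',
    p_percent)
  INTO v_result;  -- v_result stays NULL when the sample is empty

  IF v_result IS NOT NULL THEN
    RETURN v_result;
  END IF;

  -- Fallback: the sample was empty, so skip a random number of rows instead.
  SELECT count(*) INTO v_cnt FROM demo_fallback;
  SELECT c_bh INTO v_result
  FROM demo_fallback
  OFFSET floor(random() * v_cnt)
  LIMIT 1;

  RETURN v_result;  -- NULL only when the table itself is empty
END;
$$;

SELECT pick_random_c_bh();
```

The design keeps the common case fast: the `count(*)` of the OFFSET path is paid only when sampling fails.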
