PostgreSQL: Inserting / Updating Millions of Rows in a Single Table

This article is a repost. 2019-08-13 14:44 Postgresql / Python Learning

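I. Inserting data

When it comes to inserting data, the first thing that comes to mind is:

insert into A values(*******************)

For multiple rows, the most one usually thinks of is a multi-row statement like this:

insert into A values(**********),(*************),(*****************)

But with millions of rows, both are too slow.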

1. Using a script

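#!/bin/bash
# Build one giant multi-row INSERT by string concatenation, starting from the first row.
strsql="insert into tbl_devaccess8021x (uidrecordid, dtaccesstime, strmac, strusername, strswitchip, strifname, iisauthsuc,iisantipolicy,iisaccessed,strmachinecode,strrandomcode,iaccesstype,straccessfailedcode,uidroleid ,struserdes) values('d71803axxx1','2019-08-02 20:37:35', '1:2:3:4:5:6', 'criss0', '192.168.2.146','FastEthernet0/1',0,0,1,'000000000020A0B01','020A0B01',1,0,'研發','crissxu10')"

for ((i=1; i <=3000000; i++))
do
    # dtaccesstime is written as a quoted timestamp literal, in the same format as the first row
    # (assumes the column is a timestamp; a bare epoch integer would not match that format)
    strsql=$strsql",('d71803axxx$i','$(date '+%Y-%m-%d %H:%M:%S')', '1:2:3:4:5:$i', 'criss$i', '192.168.2.$i','FastEthernet0/1',0,0,1,'000000000020A0B01','020A0B01',1,0,'研發','crissxu10')"

done
# Quote the variable so whitespace inside the statement is preserved
echo "$strsql"
#psql -d xxx -U xxx -c "$strsql"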

This approach is acceptable for small data volumes, but it becomes very time-consuming for large ones.
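A variant of the same idea, sketched in Python: instead of concatenating one giant statement, the rows are sent in fixed-size batches, each as a parameterized multi-row INSERT. The helper names (chunked, build_multirow_insert, insert_in_batches) and the batch size of 1000 are assumptions made for illustration, not part of the original article. Table and column names are interpolated directly into the SQL, so they must come from trusted code.

```python
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_multirow_insert(table, columns, n_rows):
    """Return an INSERT with n_rows groups of %s placeholders (psycopg2 style)."""
    group = "(" + ",".join(["%s"] * len(columns)) + ")"
    return "insert into {0} ({1}) values {2}".format(
        table, ",".join(columns), ",".join([group] * n_rows))


def insert_in_batches(conn, table, columns, rows, batch_size=1000):
    """Send one multi-row INSERT per batch and commit once at the end.

    `conn` is assumed to be an open psycopg2 connection.
    """
    with conn.cursor() as cur:
        for batch in chunked(rows, batch_size):
            sql = build_multirow_insert(table, columns, len(batch))
            # Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders
            flat = [value for row in batch for value in row]
            cur.execute(sql, flat)
    conn.commit()
```

Committing once at the end keeps all batches in a single transaction; whether this is fast enough for millions of rows depends on the table and the server, so it is worth measuring against the COPY approach.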

2. PostgreSQL provides the COPY command for bulk-loading data; psycopg2 exposes it as copy_from.

Parameters of copy_from: copy_from(file, table, sep='\t', null='\\N', size=8192, columns=None)

# -*- coding: utf-8 -*-
import sys
import psycopg2
if sys.version_info.major == 2:
    import StringIO as io  # Python 2: provides StringIO.StringIO
else:
    import io
from datetime import datetime

if __name__ == '__main__':
    start_time = datetime.now()
    # Build tab-separated lines, one per record, in the column order passed to copy_from
    lines = []
    for i in range(0, 10):  # 10 rows here; raise the bound for a real bulk load
        str_i = str(i)
        temp = "d71803axxx{0}\t{1}\t1:2:3:4:5:{2}\tcriss{3}\t192.168.2.{4}\tFastEthernet0/1\t0\t0\t1\t000000000020A0B01\t020A0B01\t1\t0\t研發\tcrissxu10\n".format(str_i, datetime.now(), str_i, str_i, str_i)
        lines.append(temp)
    s = "".join(lines)
    conn = psycopg2.connect(host='127.0.0.1', user="xxx", password="xxx", database="xxx")
    cur = conn.cursor()
    # Load the whole in-memory buffer with a single COPY
    cur.copy_from(io.StringIO(s), 'tbl_devaccess8021x', columns=('uidrecordid', 'dtaccesstime', 'strmac', 'strusername', 'strswitchip', 'strifname', 'iisauthsuc', 'iisantipolicy', 'iisaccessed', 'strmachinecode', 'strrandomcode', 'iaccesstype', 'straccessfailedcode', 'uidroleid', 'struserdes'))
    conn.commit()
    cur.close()
    conn.close()
    end_time = datetime.now()
    print('done. time:{0}'.format(end_time - start_time))

With copy_from, loading three million rows took roughly 7 minutes.
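The script above formats each line by hand, which only works while no value contains a tab, newline, or backslash and no value is NULL. A minimal sketch of a buffer builder that follows the escaping rules of COPY's text format is given below; copy_escape and rows_to_copy_buffer are hypothetical helper names, and the separator is fixed to the tab that copy_from uses by default.

```python
import io


def copy_escape(value, null="\\N"):
    """Render one value for COPY text format; None becomes the NULL marker."""
    if value is None:
        return null
    text = str(value)
    # Backslash must be escaped first, otherwise later escapes get doubled
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_copy_buffer(rows):
    """Return a file-like object with one tab-separated line per row."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(v) for v in row))
        buf.write("\n")
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf
```

The returned buffer can be passed as the first argument of cur.copy_from in place of io.StringIO(s). For very large loads the whole buffer still sits in memory, so splitting the rows into several buffers may be preferable.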

3. Insert into a temporary table first, then sync

insert into source_table select * from temporary_table
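One way the staging step could be wired together with psycopg2 is sketched below. The staging table name tmp_staging, the helper names, and the choice of "like ... including defaults" with "on commit drop" are assumptions for illustration; the article itself only gives the final insert ... select. The buffer passed in must contain every column of the target table, in table order.

```python
def staging_statements(target, staging="tmp_staging"):
    """Return (create, sync) SQL for loading `target` through a temp table."""
    create_sql = ("create temp table {0} (like {1} including defaults) "
                  "on commit drop").format(staging, target)
    sync_sql = "insert into {1} select * from {0}".format(staging, target)
    return [create_sql, sync_sql]


def load_via_staging(conn, target, buffer, staging="tmp_staging"):
    """COPY `buffer` into a temp table, then copy its rows into `target`.

    `conn` is assumed to be an open psycopg2 connection with default
    (non-autocommit) settings, so all three steps share one transaction.
    """
    create_sql, sync_sql = staging_statements(target, staging)
    with conn.cursor() as cur:
        cur.execute(create_sql)
        cur.copy_from(buffer, staging)   # bulk load into the staging table
        cur.execute(sync_sql)            # move rows into the real table
    conn.commit()                        # temp table is dropped here
```

A staging table is mainly useful when the rows need to be filtered, deduplicated, or transformed before they reach the real table; the select in the sync statement is the place for that logic.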

II. Updating data

update table set col = value where col_condition=value;

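An update first finds the rows that match col_condition and then modifies them. With little data the lookup is fast; once the table reaches a certain size, lookup performance degrades, and update throughput drops with it.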
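The cost of the lookup step can be made visible by asking the database how it plans to find the rows. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; PostgreSQL's planner and its EXPLAIN output are different, and the table and index names are made up for the example.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("create table target_table (c1 integer, c2 integer)")
conn.executemany("insert into target_table values (?, ?)",
                 ((i, i) for i in range(1000)))

update_sql = "update target_table set c2 = ? where c1 = ?"

# Without an index the planner has to scan the whole table to find matches
plan_without_index = plan_details(conn, update_sql, (0, 1))

conn.execute("create index idx_target_c1 on target_table (c1)")

# With an index on the condition column the planner can search the index
plan_with_index = plan_details(conn, update_sql, (0, 1))

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan lines varies between SQLite versions, but the first plan reports a table scan and the second names the index.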

Solutions:

1. Add an index on the column(s) used in the query condition.

2. Merge many updates into a single SQL statement

update target_table set c2 = t.c2 from (values(1,1),(2,2),(3,3),…(2000,2000)) as t(c1,c2) where target_table.c1=t.c1
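A statement of this shape can be generated rather than typed out. The following is a minimal sketch assuming psycopg2-style %s placeholders; build_batch_update and flatten_pairs are hypothetical helpers, and the identifiers are interpolated directly, so they must not come from user input.

```python
def build_batch_update(table, key_col, value_col, n_rows):
    """Return an UPDATE ... FROM (VALUES ...) with n_rows placeholder pairs."""
    values = ",".join(["(%s,%s)"] * n_rows)
    return ("update {t} set {v} = t.{v} from (values {vals}) as t({k},{v}) "
            "where {t}.{k} = t.{k}").format(
                t=table, k=key_col, v=value_col, vals=values)


def flatten_pairs(pairs):
    """Turn [(k1, v1), (k2, v2)] into [k1, v1, k2, v2] for cursor.execute."""
    return [item for pair in pairs for item in pair]


def update_in_one_statement(conn, table, key_col, value_col, pairs):
    """Apply all (key, new_value) pairs with a single UPDATE.

    `conn` is assumed to be an open psycopg2 connection.
    """
    pairs = list(pairs)
    if not pairs:
        return
    sql = build_batch_update(table, key_col, value_col, len(pairs))
    with conn.cursor() as cur:
        cur.execute(sql, flatten_pairs(pairs))
    conn.commit()
```

psycopg2 substitutes parameters on the client side, so the values arrive as typed literals. With a driver that binds parameters on the server, explicit casts in the values list may be needed.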

Reference:

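【1】 http://www.voidcn.com/article/p-stwpqgta-bdq.html

"Later I read class monitor Ge's log: using Python, he inserted 1 million rows into SQLite in only 4 seconds. The reason is that Python optimized these 1 million insert statements by putting all of the inserts into a single transaction, which greatly reduces the time spent starting and ending transactions, and it is exactly this part that consumes a large amount of time."

This probably explains why method 2 is fast.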

【2】http://www.voidcn.com/article/p-vvuwvbyw-yu.html

【3】https://help.aliyun.com/knowledge_detail/59076.html
