PostgreSQL: migrating data between different databases and schemas

Reposted article, 2019-04-10 11:17. Tags: postgresql, database

Writing this took some effort, so if you repost it, please credit the source. Thank you.

Data migration

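When I was running a crawler earlier, part of the data was never uploaded to the server and stayed on my local machine. I was already using PostgreSQL, though I had not been using it for long, and I ran into quite a few problems during the migration. My first attempt was to back up and migrate the data with pg_dump xx > filename, but the restored data ended up under the same schema as in the original database, so I dropped that approach.

I then thought of exporting the data from the original database with pandas and writing it into the new database. It is a little more work, but I found it quite convenient. The rest of this post describes that approach.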

Getting the names of all tables in the schema to migrate

import pandas as pd
import psycopg2

# Connect to the source database
conn = psycopg2.connect(database='58TC',
                        user='postgres',
                        password='123456',
                        host='127.0.0.1',
                        port=5432)

# Get the names of all tables in the schema
tables = pd.read_sql_query("select * from pg_tables where schemaname='2019_3_11'",con=conn)
tables.head()

All tables in the current schema

table_list = tables['tablename']

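Writing a DataFrame to PostgreSQL

I borrowed this method from somewhere online. I have forgotten where the original is, but I am grateful to the author for sharing it and will add the link once I find it again. Using df.to_sql on its own was too slow for my fairly large data set; combining sqlalchemy with a COPY statement greatly improves write throughput.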
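The COPY approach mentioned above works by streaming CSV text from an in-memory buffer to PostgreSQL. The sketch below isolates just the buffer step, using the standard-library csv module as a stand-in for df.to_csv. The helper names rows_to_csv_buffer and strip_header and the sample rows are made up for illustration. It also shows one detail that is easy to get wrong: PostgreSQL's COPY ... CSV HEADER ignores the first line it receives, so the header should be handled exactly once, either left in the buffer and paired with HEADER, or stripped and paired with a COPY that has no HEADER option.

```python
import csv
import io


def rows_to_csv_buffer(columns, rows, sep="|"):
    """Serialize a header line plus data rows into an in-memory CSV buffer."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n")
    writer.writerow(columns)   # header line
    writer.writerows(rows)     # data lines; fields containing sep get quoted
    buf.seek(0)                # rewind so a reader starts at the header
    return buf


def strip_header(buf):
    """Position the buffer at the first data row by consuming the header line."""
    buf.seek(0)
    buf.readline()
    return buf


# Hypothetical sample data
columns = ["city", "price"]
rows = [["beijing", 100], ["a|b", 200]]

# Variant 1: header kept, to be paired with COPY ... CSV HEADER
with_header = rows_to_csv_buffer(columns, rows)
header_text = with_header.getvalue()

# Variant 2: header stripped, to be paired with COPY ... CSV (no HEADER)
without_header = strip_header(rows_to_csv_buffer(columns, rows))
remaining = without_header.read()
```

Mixing the two variants (stripping the header and also passing HEADER) would make COPY skip the first data row.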

# Write a DataFrame to the database

import io
import pandas as pd
from sqlalchemy import create_engine

def write_to_table(df, table_name, if_exists='fail'):
    db_engine = create_engine('postgresql://postgres:xxxxxx@XXXXX/***')  # initialize the engine
    # db_engine = create_engine('postgresql://user:password@host/database')  # URL format
    string_data_io = io.StringIO()      # in-memory buffer for reading and writing
    df.to_csv(string_data_io, sep='|', index=False)
    # pandasSQL_builder and SQLTable are pandas internals and may change between versions
    pd_sql_engine = pd.io.sql.pandasSQL_builder(db_engine)
    table = pd.io.sql.SQLTable(table_name, pd_sql_engine, frame=df,
                               index=False, if_exists=if_exists,
                               schema='2019-3-11-particulars')
    table.create()
    # Rewind the buffer. The header line stays in it, because COPY ... HEADER
    # below skips the first line; stripping it here as well would drop a data row.
    string_data_io.seek(0)

    # Connect to the database
    with db_engine.connect() as connection:
        with connection.connection.cursor() as cursor:  # raw psycopg2 cursor
            # COPY statement; "2019-3-11-particulars" is the schema name in the new database
            copy_cmd = '''COPY "2019-3-11-particulars"."%s" FROM STDIN HEADER DELIMITER '|' CSV''' % table_name
            print(copy_cmd)
            cursor.copy_expert(copy_cmd, string_data_io)     # run the COPY
        connection.connection.commit()

pd.io.sql.pandasSQL_builder() returns an instance of a PandasSQL subclass
pd.io.sql.SQLTable() maps a pandas DataFrame to a SQL table

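Parameters:

table_name: the table name

pd_sql_engine: the pandas SQL engine wrapper

frame: the DataFrame (df)

index: whether to write the DataFrame index as a column

if_exists: what to do when the table already exists; one of
append (insert into the existing table), fail (raise a ValueError; the loop below catches it, so that table is effectively skipped), replace (drop the table and recreate it)

schema: the schema name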
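The three if_exists modes can be summarized as a small decision function. This is only a sketch of the behavior documented for pandas' DataFrame.to_sql, not pandas' actual code; the function name plan_table_action and the returned action labels are made up for illustration.

```python
def plan_table_action(table_exists, if_exists="fail"):
    """Return the action implied by if_exists (hypothetical labels)."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError("'%s' is not valid for if_exists" % if_exists)
    if not table_exists:
        return "create"            # nothing to conflict with
    if if_exists == "fail":
        raise ValueError("table already exists")
    if if_exists == "replace":
        return "drop_and_create"   # old rows are discarded
    return "append"                # keep the table, add new rows


new_table = plan_table_action(False, "fail")
replaced = plan_table_action(True, "replace")
appended = plan_table_action(True, "append")

try:
    plan_table_action(True, "fail")
    fail_raised = False
except ValueError:
    fail_raised = True
```

Because fail raises instead of silently skipping, a caller that wants to skip existing tables has to catch the error itself.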

At this point the groundwork is done. The last step is to call the function and run the migration.

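# city_list: names of the tables (one per city) to migrate,
# for example the table_list obtained above
for city_table in city_list:
    df = pd.read_sql_query('select * from "2019_3_12"."%s"' % city_table, con=conn)

    try:
        write_to_table(df, city_table)
    except Exception as e:
        print('City:', city_table, 'error:', e)
    else:
        print(city_table, 'import finished')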
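The queries above splice schema and table names into SQL with string formatting, which works here because the names are known in advance. For names that might contain unusual characters, a minimal sketch of identifier quoting follows. It relies on the PostgreSQL rule that a quoted identifier is wrapped in double quotes and any embedded double quote is doubled. The helpers quote_ident and build_select are hypothetical names, not part of pandas or psycopg2.

```python
def quote_ident(name):
    """Double-quote a PostgreSQL identifier, doubling embedded double quotes."""
    if "\x00" in name:
        raise ValueError("identifiers cannot contain NUL characters")
    return '"' + name.replace('"', '""') + '"'


def build_select(schema, table):
    """Build a SELECT over schema.table with both parts safely quoted."""
    return "select * from {}.{}".format(quote_ident(schema), quote_ident(table))


# Schema names that start with a digit or contain hyphens must be quoted
plain = build_select("2019_3_12", "beijing")
hyphenated = build_select("2019-3-11-particulars", "beijing")
tricky = quote_ident('we"ird')
```

When queries are executed through a psycopg2 cursor directly, the psycopg2.sql module (sql.SQL and sql.Identifier) serves the same purpose and is probably the better choice.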

Original link: https://www.cnblogs.com/liqk/p/10682274.html
Please credit the source when reposting.
