Python pandas runs out of memory when reading/writing: MemoryError

Reposted article · 2019-11-19 17:04 · Python data analysis

Chunked reading in the pandas read_xxx functions

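pandas seems to have been designed with this problem in mind. Its read functions support chunked reading: instead of loading all the data into memory at once, they read it in chunks that you process one after another. You can still concatenate the chunks into one complete DataFrame afterwards, but then the whole dataset is back in memory. For data that does not fit, handle each chunk and let it go.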
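Here is a minimal sketch of that trade-off using read_csv. A small in-memory CSV (io.StringIO) stands in for a large file, and the column names are made up. Concatenating the chunks reproduces a single full read, which is also why concatenating them defeats the purpose when memory is the problem.

```python
import io

import pandas as pd

# A tiny CSV standing in for a file too large to load at once (made-up data)
CSV_TEXT = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# With chunksize set, read_csv returns an iterator of DataFrames, not one DataFrame
chunks = list(pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2))

# Reassembling the chunks gives the same result as one full read...
combined = pd.concat(chunks, ignore_index=True)
full = pd.read_csv(io.StringIO(CSV_TEXT))
print(combined.equals(full))  # True
# ...but at this point the whole table is held in memory again
```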

def read_sql_table(table_name, con, schema=None, index_col=None,
                   coerce_float=True, parse_dates=None, columns=None,
                   chunksize=None):

1. chunksize is the number of rows in each chunk.

2. When chunksize is not None, read_xxx returns an iterator of DataFrames instead of a single DataFrame.
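Both points can be checked with a small in-memory SQLite table. read_sql_table itself needs an SQLAlchemy connectable, so this sketch uses read_sql_query with a standard-library sqlite3 connection instead; it accepts the same chunksize parameter. The table "demo" and its rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical 5-row table "demo"
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (id INTEGER, name TEXT)")
con.executemany("INSERT INTO demo VALUES (?, ?)", [(i, f"row{i}") for i in range(5)])
con.commit()

# chunksize=2 -> an iterator yielding DataFrames of at most 2 rows each
reader = pd.read_sql_query("SELECT * FROM demo", con, chunksize=2)
sizes = [len(chunk) for chunk in reader]
print(sizes)  # [2, 2, 1]
```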

For example, here is the code I wrote for a full data sync:

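import pandas as pd

# sync_table, database, data_from_engine_dict and data_to_engine come from the surrounding sync script
target_table = database + "_" + sync_table
gtr = pd.read_sql_table(sync_table, data_from_engine_dict[database], chunksize=20000)
for count, df in enumerate(gtr):
    # the first chunk replaces the target table; every later chunk is appended
    if_exists = "replace" if count == 0 else "append"
    df.to_sql(target_table, data_to_engine, if_exists=if_exists, index=False)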
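Below is a self-contained version of the same replace-then-append loop, with several stand-ins. Two in-memory SQLite databases replace the original engines. read_sql_query replaces read_sql_table, because read_sql_table needs an SQLAlchemy connectable. The names full_sync, orders and shop_orders are hypothetical.

```python
import sqlite3

import pandas as pd


def full_sync(src_con, dst_con, table, target, chunksize=20000):
    """Copy `table` from src_con into `target` on dst_con one chunk at a time."""
    # table names are trusted here; they are interpolated into the SQL directly
    reader = pd.read_sql_query(f"SELECT * FROM {table}", src_con, chunksize=chunksize)
    for i, df in enumerate(reader):
        # the first chunk (re)creates the target table, later chunks are appended
        df.to_sql(target, dst_con, if_exists="replace" if i == 0 else "append", index=False)


# Usage with two in-memory SQLite databases and made-up data
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(7)])
src.commit()

dst = sqlite3.connect(":memory:")
full_sync(src, dst, "orders", "shop_orders", chunksize=3)
print(dst.execute("SELECT COUNT(*) FROM shop_orders").fetchone()[0])  # 7
```

With chunksize=3 the 7 source rows arrive as chunks of 3, 3 and 1. Only the first chunk uses "replace".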

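I found that this changes the table in the target database: with if_exists="replace", to_sql drops the existing table and recreates it from the DataFrame, so the original table definition is lost. Today I made the following upgrade: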
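The upgrade itself is not shown here. The sketch below is one possible approach, an assumption and not necessarily the author's fix. It keeps the target table's definition by emptying the table with DELETE and then only ever appending, so to_sql never drops it. As before, it uses sqlite3 connections and read_sql_query instead of the original engines and read_sql_table. The function and table names are hypothetical.

```python
import sqlite3

import pandas as pd


def sync_keep_schema(src_con, dst_con, table, target, chunksize=20000):
    """Hypothetical variant: empty the existing target table, then only append.

    Assumes `target` already exists on dst_con with the desired definition,
    so to_sql never drops or recreates it.
    """
    # table names are trusted here; they are interpolated into the SQL directly
    dst_con.execute(f"DELETE FROM {target}")
    dst_con.commit()
    for df in pd.read_sql_query(f"SELECT * FROM {table}", src_con, chunksize=chunksize):
        df.to_sql(target, dst_con, if_exists="append", index=False)


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(7)])
src.commit()

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE shop_orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
dst.execute("INSERT INTO shop_orders VALUES (100, 0.0)")  # stale row from an earlier sync
dst.commit()

sync_keep_schema(src, dst, "orders", "shop_orders", chunksize=3)
print(dst.execute("SELECT COUNT(*) FROM shop_orders").fetchone()[0])  # 7
# The stored CREATE TABLE statement still has the primary key and NOT NULL constraint
print(dst.execute("SELECT sql FROM sqlite_master WHERE name = 'shop_orders'").fetchone()[0])
```

One trade-off: the target table is incomplete between the DELETE and the last append. If other readers must never see that state, loading into a staging table first would be safer.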

The other read_xxx functions have similar parameters, for example read_csv:

pandas.read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None, header='infer', names=None, index_col=None, 
                usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, 
                skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, 
                skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True,
                iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, 
                escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, 
                memory_map=False, float_precision=None)
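Here is a sketch of how chunksize is typically used with read_csv. It computes a per-group sum over a CSV while parsing only one chunk at a time. io.StringIO stands in for a large file, and the columns and values are made up. Each chunk is reduced to a small partial sum, and only those partial results are combined at the end.

```python
import io

import pandas as pd

# Made-up sales data; imagine millions of rows instead of six
CSV_TEXT = "city,sales\nA,1\nB,2\nA,3\nC,4\nB,5\nA,6\n"

# Aggregate each 4-row chunk on its own, keeping only the small per-city sums
parts = [
    chunk.groupby("city")["sales"].sum()
    for chunk in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=4)
]

# Combine the partial sums into the final per-city totals
totals = pd.concat(parts).groupby(level=0).sum()
print(totals.to_dict())
```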

Next, a parameter that you normally do not need to change:

low_memory : bool, default True
Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. 
To ensure no mixed types either set False, or specify the type with the dtype parameter. 
Note that the entire file is read into a single DataFrame regardless, use the chunksize or iterator parameter to return the data in chunks. 
(Only valid with C parser).

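low_memory defaults to True. It is easy to confuse with chunksize, but as the documentation above says, it only controls how the C parser processes the file internally: the whole file still ends up in a single DataFrame. Setting it to False does not switch chunked reading on or off. To get the data back in pieces, use chunksize or iterator.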
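Here is a small check of the points above, using an in-memory CSV with made-up columns. It shows that low_memory alone still returns one DataFrame. It also shows that an explicit dtype avoids type inference on a column, and that chunksize produces chunks whichever low_memory value is passed.

```python
import io

import pandas as pd

# Made-up data; "code" has leading zeros that numeric inference would drop
CSV_TEXT = "code,qty\n001,5\n002,7\n003,9\n"

# low_memory only changes how the C parser works internally:
# without chunksize the result is still a single DataFrame
df = pd.read_csv(io.StringIO(CSV_TEXT), low_memory=True)
print(type(df).__name__)  # DataFrame

# Declaring the dtype skips inference for that column (and keeps the leading zeros)
typed = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"code": str})
print(typed["code"].tolist())  # ['001', '002', '003']

# chunksize returns the data in pieces with either low_memory setting
for flag in (True, False):
    sizes = [len(c) for c in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2, low_memory=flag)]
    print(flag, sizes)  # sizes is [2, 1] in both cases
```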
