Python pandas read_sql_query usage notes

Reposted article (see the original), 2019-11-02 17:57

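Versions: Windows 10, Python 3.5, pandas 0.25.0

Problem to solve:
Large integers come back from the database as float64 (displayed in scientific notation), and converting them back to integers loses precision.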

Using read_sql_query in pandas:
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None)

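Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query
In particular:
Parameter coerce_float: controls whether values that are neither strings nor integers (the documentation gives decimal.Decimal as an example of such non-string, non-numeric objects) are coerced to float64.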
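
As a rough illustration of what coerce_float is meant for, the sketch below uses DataFrame.from_records, which exposes the same flag (and, as far as I can tell, is what read_sql_query builds its result with). The sample rows are made up for the example and are not from the original post.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical sample rows: one column of Decimal values, the kind of
# non-string, non-numeric object that coerce_float targets.
rows = [(Decimal("1.5"),), (Decimal("2.25"),)]

# coerce_float=True asks pandas to turn the Decimal objects into floats.
coerced = pd.DataFrame.from_records(rows, columns=["x"], coerce_float=True)

# coerce_float=False leaves the Decimal objects in place.
kept = pd.DataFrame.from_records(rows, columns=["x"], coerce_float=False)

print(coerced["x"].dtype)
print(kept["x"].dtype, type(kept["x"][0]))
```

Note that the flag is about Decimal-like objects; it is not designed to stop an integer column containing NULL from becoming float64, which is the case discussed in this post.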

Scenario:
Reading data from a database.

For example, a column `test01` with these values:
`test01`
  10
 32861400000003365
None

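Reading the data above with read_sql_query converts the column to float64 automatically, and the values are shown in scientific notation. Calling astype("int64") afterwards, or converting a single value with int(), changes the value (it comes out smaller by 1), as this simulation shows: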

int(float(32861400000003365))
Out[8]: 32861400000003364
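
The off-by-one result follows from the float64 format itself: it has a 53-bit significand, so not every integer above 2**53 can be represented. A small sketch of the arithmetic, using only the value from this post:

```python
# float64 has a 53-bit significand. Every integer up to 2**53 is exact;
# between 2**54 and 2**55 only multiples of 4 can be represented.
big = 32861400000003365
limit = 2 ** 53

exact_below_limit = int(float(limit - 1)) == limit - 1
in_coarse_range = 2 ** 54 < big < 2 ** 55

roundtrip = int(float(big))   # nearest representable multiple of 4
lost = big - roundtrip

print(exact_below_limit)      # True
print(in_coarse_range)        # True
print(roundtrip, lost)        # 32861400000003364 1
```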

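So the goal is to keep the column from being coerced to float64. In this scenario coerce_float has no effect, whether it is set to True or False,
and read_sql_query in pandas 0.25.0 has no dtype parameter for forcing a column type.
Looking through the source code, I found the method that performs the conversion:

File: Python35\Lib\site-packages\pandas\core\internals\construction.py, at roughly line 587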

    def convert(arr):
        if dtype != object and dtype != np.object:
            arr = lib.maybe_convert_objects(arr, try_float=coerce_float)
            arr = maybe_cast_to_datetime(arr, dtype)
        return arr

Change it to:

    def convert(arr):
        if dtype != object and dtype != np.object:           
            if coerce_float:
                arr = lib.maybe_convert_objects(arr, try_float=coerce_float)
            arr = maybe_cast_to_datetime(arr, dtype)
        return arr

After this change, the coerce_float parameter takes effect: with coerce_float=False the column is no longer converted.
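
A change made inside site-packages is overwritten the next time pandas is upgraded. One possible alternative that leaves pandas untouched is to cast the column to text in the SQL itself and convert it back with Python's int. The sketch below is only an illustration under assumptions: it supposes the database accepts CAST(... AS TEXT), and it uses an in-memory SQLite table with the hypothetical name t as a stand-in, since the post does not say which database was used.

```python
import sqlite3

import pandas as pd

# Stand-in database: an in-memory SQLite table (the name "t" is hypothetical)
# holding the same three values as the test01 column in this post.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (test01 INTEGER)")
con.executemany(
    "INSERT INTO t (test01) VALUES (?)",
    [(10,), (32861400000003365,), (None,)],
)

# Plain read: the NULL forces the integer column to float64.
as_float = pd.read_sql_query("SELECT test01 FROM t", con)

# Workaround sketch: let the database hand the numbers over as text,
# so pandas never routes them through float64.
as_text = pd.read_sql_query(
    "SELECT CAST(test01 AS TEXT) AS test01 FROM t", con
)

# Convert back to exact Python ints, keeping missing values as None.
restored = [None if pd.isna(v) else int(v) for v in as_text["test01"]]

con.close()

print(as_float["test01"].dtype, int(as_float["test01"][1]))
print(restored)
```

The restored list holds ordinary Python ints, so the long value keeps its last digit. Whether a text cast is acceptable depends on the column types and the database in use.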

Note:

The same thing happens when building a DataFrame directly:

import pandas as pd
def test02():
    df = pd.DataFrame({
        "A":[32861400000003365,2,None,3],
        "B": [1,2,3,4]
    })
    print(df["A"].dtype,df["B"].dtype)
    print(df["A"][0])
test02()

output:

float64 int64
3.2861400000003364e+16

* When the column contains None and no dtype is specified, pandas coerces it to float64

With dtype int64 specified:

def test02():
    df = pd.DataFrame({
        "A":[32861400000003365,2,None,3],
        "B": [1,2,3,4]
    },dtype="int64")
    print(df["A"].dtype,df["B"].dtype)
    print(df["A"][0])
test02()

output:
object int64
32861400000003365

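Because None is present, the column cannot be stored as int64 and falls back to object dtype, which keeps the original Python int values unchanged. So when no dtype is specified:
* make sure the input numbers contain no None or other null values
* make sure the converted data is not floating point
Under those conditions, leaving the dtype unspecified does not affect the output.
Make sure the values have not already passed through a float (scientific notation); casting to int64 then loses no precision.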
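
The points above can be checked with a short sketch. Besides object dtype, it also tries the nullable "Int64" extension dtype, which is not used in the original post; this assumes a pandas version that provides it (it was introduced as experimental in 0.24), and the values are passed as a Python list containing None so they never go through float64.

```python
import pandas as pd

values = [32861400000003365, 2, None, 3]

# Default inference: the None forces float64 and the last digit is lost.
as_default = pd.Series(values)

# object dtype keeps the original Python ints (and the None) as they are.
as_object = pd.Series(values, dtype=object)

# Nullable integer dtype (an assumption, not from the original post):
# integers stay integers and the None becomes a missing-value marker.
as_nullable = pd.Series(pd.array(values, dtype="Int64"))

print(as_default.dtype, int(as_default[0]))
print(as_object.dtype, as_object[0])
print(as_nullable.dtype, as_nullable[0])
```

With object dtype, arithmetic is slower because each element is a Python object; the nullable dtype keeps vectorized integer operations, at the cost of depending on a newer pandas feature.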

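Summary:

* When the input contains both numbers and None, control the conversion with coerce_float=False in read_sql_query (with the patched convert shown earlier); the column is then read as object dtype.
* When the input contains None, numbers, and strings, pandas should fall back to object dtype on its own, so the output is not affected either.
* Clean the data before reading it, so that a column is either all numbers or contains strings; it is then inferred as int64 or object.
* Limit the length of the input integers (integers with an absolute value up to 2**53 are exact in float64); None then does not affect the result.
None of these four approaches lets float64 affect the precision of long integers.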
