Working with databases using pandas


1. pandas needs to be used together with SQLAlchemy

import pandas as pd
from sqlalchemy import create_engine

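# TEST_DB must provide the user, password, host and db attributes
engine = create_engine("mysql+pymysql://%s:%s@%s:3306/%s?charset=utf8" % (TEST_DB.user, TEST_DB.password, TEST_DB.host, TEST_DB.db))

# exec_sql holds the SQL query to run; an empty string will not execute
exec_sql = ''
source_table = pd.read_sql(exec_sql, engine)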

2. Iterating over the data

# index is the row's index label (0, 1, 2, ... for the default index)
# v is the row's data from the database, as a Series
for index, v in source_table.iterrows():
    print(index, v)



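# To pick out a single value, use v["column_name"].
# Note that v is a Series (one row), and the value taken from it is not
# necessarily a str; wrap it in str() before calling split.
v["column_name"]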
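
As a small illustration of the note above, the following sketch loops over a made-up DataFrame and converts a cell to str before splitting it. The column names id and tags are hypothetical and only stand in for the real table returned by pd.read_sql.

```python
import pandas as pd

# Hypothetical stand-in for source_table; the real one comes from pd.read_sql.
source_table = pd.DataFrame({"id": [101, 102], "tags": ["a,b", "c,d,e"]})

parts_by_row = {}
for index, v in source_table.iterrows():
    # v is a Series holding one row; v["tags"] is the value of a single cell.
    cell = v["tags"]
    # str() leaves strings unchanged and makes split() safe for other types.
    parts_by_row[index] = str(cell).split(",")

# A non-string cell (here an integer id) has no split() until it is converted.
id_digits = list(str(source_table.iloc[0]["id"]))

print(parts_by_row)
print(id_digits)
```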

 

3. Modifying data

# Use iloc, which addresses cells by integer position (row, column)
source_table.iloc[index, 2] = int(result)
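
Because iloc works on integer positions rather than labels, the index yielded by iterrows() only matches what iloc expects when the DataFrame has the default 0-based index. A minimal sketch on made-up data (the name and score columns and the 10/20/30 labels are hypothetical) that looks up the column position with Index.get_loc and shows the label-based alternative, loc:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["x", "y", "z"], "score": [1, 2, 3]},
    index=[10, 20, 30],  # non-default labels, so label != position
)

# Integer position of the "score" column, for use with iloc.
score_pos = df.columns.get_loc("score")

# Update every row by position.
for pos in range(len(df)):
    current = df.iloc[pos, score_pos]
    df.iloc[pos, score_pos] = int(current) * 10

# Label-based equivalent: loc takes the index label and the column name.
df.loc[20, "score"] = 0

print(df)
```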

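The following examples come from the pandas documentation for DataFrame.iloc:

        >>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
        ...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
        ...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
        >>> df = pd.DataFrame(mydict)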
        >>> df
              a     b     c     d
        0     1     2     3     4
        1   100   200   300   400
        2  1000  2000  3000  4000

        **Indexing just the rows**

        With a scalar integer.

        >>> type(df.iloc[0])
        <class 'pandas.core.series.Series'>
        >>> df.iloc[0]
        a    1
        b    2
        c    3
        d    4
        Name: 0, dtype: int64

        With a list of integers.

        >>> df.iloc[[0]]
           a  b  c  d
        0  1  2  3  4
        >>> type(df.iloc[[0]])
        <class 'pandas.core.frame.DataFrame'>

        >>> df.iloc[[0, 1]]
             a    b    c    d
        0    1    2    3    4
        1  100  200  300  400

        With a `slice` object.

        >>> df.iloc[:3]
              a     b     c     d
        0     1     2     3     4
        1   100   200   300   400
        2  1000  2000  3000  4000

        With a boolean mask the same length as the index.

        >>> df.iloc[[True, False, True]]
              a     b     c     d
        0     1     2     3     4
        2  1000  2000  3000  4000

        With a callable, useful in method chains. The `x` passed
        to the ``lambda`` is the DataFrame being sliced. This selects
        the rows whose index label is even.

        >>> df.iloc[lambda x: x.index % 2 == 0]
              a     b     c     d
        0     1     2     3     4
        2  1000  2000  3000  4000

        **Indexing both axes**

        You can mix the indexer types for the index and columns. Use ``:`` to
        select the entire axis.

        With scalar integers.

        >>> df.iloc[0, 1]
        2

        With lists of integers.

        >>> df.iloc[[0, 2], [1, 3]]
              b     d
        0     2     4
        2  2000  4000

        With `slice` objects.

        >>> df.iloc[1:3, 0:3]
              a     b     c
        1   100   200   300
        2  1000  2000  3000

        With a boolean array whose length matches the columns.

        >>> df.iloc[:, [True, False, True, False]]
              a     c
        0     1     3
        1   100   300
        2  1000  3000

        With a callable function that expects the Series or DataFrame.

        >>> df.iloc[:, lambda df: [0, 2]]
              a     c
        0     1     3
        1   100   300
        2  1000  3000

 

4. Dropping a column

# axis=1 means columns
values_table = source_table.drop('column_name', axis=1)
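
A short sketch of drop on made-up data (the tmp column is hypothetical), showing that the original frame is left unchanged and that the columns= keyword is an equivalent spelling:

```python
import pandas as pd

source_table = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "tmp": [0, 0]})

# drop returns a new DataFrame; source_table itself is unchanged.
values_table = source_table.drop("tmp", axis=1)

# columns= is an equivalent spelling and accepts a list of names.
same_result = source_table.drop(columns=["tmp"])

print(list(values_table.columns))
print(list(source_table.columns))
```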

 

5. Updating the database

When writing data with .to_sql(), con must be a SQLAlchemy engine or connection; passing a raw pymysql connection raises an error.
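
A minimal sketch of writing a DataFrame through a SQLAlchemy engine. To keep it runnable without a MySQL server, it assumes an in-memory SQLite database in place of the MySQL engine from section 1; the table name demo is hypothetical. The to_sql call itself should look the same with the MySQL engine.

```python
import pandas as pd
from sqlalchemy import create_engine

# Assumption: in-memory SQLite stands in for the MySQL engine from section 1.
engine = create_engine("sqlite:///:memory:")

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# if_exists controls what happens when the table already exists:
# "fail" (the default), "replace", or "append".
df.to_sql("demo", con=engine, index=False, if_exists="replace")
df.to_sql("demo", con=engine, index=False, if_exists="append")

# Read the table back to check what was written.
back = pd.read_sql("SELECT id, name FROM demo ORDER BY id", engine)
print(back)
```

Note that to_sql inserts rows or replaces the whole table; it does not update individual existing rows in place.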

6. Selecting certain columns

import pandas as pd

# Read data from Excel into a DataFrame
# Pass the Excel file path and the sheet name
df = pd.read_excel(excelName, sheet_name=sheetName)

# Select certain columns to build a new DataFrame
newDf = pd.DataFrame(df, columns=[column1, column2, column3])

7. Selecting certain columns and filtering rows by a column's value

newDf = pd.DataFrame(df, columns=[column1, column2, column3])[(df.column1 == value1) & (df.column2 == value2)]
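
The same selection and filtering can be sketched on made-up data (the city, year, sales and note columns are hypothetical). This version uses the more common df[[...]] and df.loc[mask, columns] spellings rather than the pd.DataFrame(df, columns=...) form above:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "A", "C"],
    "year": [2020, 2020, 2021, 2020],
    "sales": [10, 20, 30, 40],
    "note": ["", "", "", ""],
})

# Select columns with a list of names.
subset = df[["city", "year", "sales"]]

# Combine conditions with & (and) or | (or); each needs its own parentheses.
mask = (df["city"] == "A") & (df["year"] == 2020)

# loc filters rows and selects columns in one step.
filtered = df.loc[mask, ["city", "year", "sales"]]
print(filtered)
```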

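8. Adding a new column

# Option 1: assign directly
df["newColumn"] = newValue

# Option 2: combine two DataFrames with concat
# axis=1 places the columns side by side; the default axis=0 would stack rows instead
pd.concat([oldDf, newDf], axis=1)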
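
A small sketch of both approaches on made-up frames (all column names are hypothetical), including what happens when concat is left at its default axis=0:

```python
import pandas as pd

old_df = pd.DataFrame({"a": [1, 2, 3]})

# Assigning a scalar broadcasts it to every row.
old_df["flag"] = 0

# Assigning a list or Series sets one value per row.
old_df["b"] = [10, 20, 30]

# concat with axis=1 places the columns of both frames side by side,
# matching rows by index label.
extra = pd.DataFrame({"c": ["x", "y", "z"]})
wide = pd.concat([old_df, extra], axis=1)

# With the default axis=0 the rows are stacked instead, leaving NaN
# wherever a frame lacks a column.
tall = pd.concat([old_df, extra])

print(wide.shape, tall.shape)
```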

9. Changing the values in a column

# Option 1: replace
df["column1"] = df["column1"].replace(oldValue, newValue)

# Option 2: map (values not present in the dict become NaN)
df["column1"] = df["column1"].map({oldValue: newValue})

# Option 3: loc
# Set column2 to value2 in the rows where column1 equals value1
df.loc[df["column1"] == value1, "column2"] = value2
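
The three options differ in how they treat values that are not matched. A minimal sketch on made-up data (the low/mid/high values are hypothetical):

```python
import pandas as pd

s = pd.Series(["low", "mid", "high"])

# replace leaves values that do not match untouched.
replaced = s.replace("low", "L")

# map turns every value missing from the dict into NaN.
mapped = s.map({"low": "L"})

# loc changes one column only in the rows selected by the condition.
df = pd.DataFrame({"column1": ["low", "mid", "low"], "column2": [1, 2, 3]})
df.loc[df["column1"] == "low", "column2"] = 0

print(replaced.tolist())
print(mapped.tolist())
print(df)
```

If the unmatched values should be kept, replace (or map followed by fillna with the original Series) is usually the safer choice.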

10. Filling missing values

# fillna fills in the missing values
df["column1"] = df["column1"].fillna(value1)
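
A short sketch of fillna on made-up data, filling one column with a constant and then using a dict to give another column its own fill value (the fill values 0 and "unknown" are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column1": [1.0, np.nan, 3.0],
    "column2": ["a", None, "c"],
})

# Fill one column with a constant.
df["column1"] = df["column1"].fillna(0)

# A dict passed to DataFrame.fillna sets a separate fill value per column.
filled = df.fillna({"column2": "unknown"})

print(filled)
```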

11. Filtering out certain columns

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6

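12. Using mean()

The pandas Series.mean() function returns the mean of the underlying data in the given Series object.

Usage: Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
(This is the pandas 1.x signature; the level parameter was removed in pandas 2.0.)


Parameters:
axis: the axis to apply the function on.
skipna: exclude NA/null values when computing the result.
level: if the axis is a MultiIndex (hierarchical), compute along a particular level, collapsing into a scalar.
numeric_only: include only float, int and boolean columns.
**kwargs: additional keyword arguments to be passed to the function.

Returns: the mean, as a scalar or as a Series (if level is specified)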
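
To make the skipna parameter concrete, here is a minimal sketch on a made-up Series. It also shows groupby(level=0).mean(), which is the usual substitute for the level argument on pandas versions where that argument is no longer available:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 3.0])

# By default NaN is skipped: (1 + 2 + 3) / 3
mean_default = s.mean()

# With skipna=False a single NaN makes the whole result NaN.
mean_strict = s.mean(skipna=False)

# Mean per index label, grouping on the first index level.
by_key = pd.Series([1, 3, 5], index=["a", "a", "b"]).groupby(level=0).mean()

print(mean_default, mean_strict)
print(by_key)
```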

# Sum and mean per student:

import pandas as pd
student = pd.read_excel("C:/Users/Administrator/Desktop/Students.xlsx", index_col="ID")
temp = student[["Test_1", "Test_2", "Test_3"]]
# axis=0 aggregates down each column; axis=1 aggregates across each row
student["total"] = temp.sum(axis=1)
student["avg"] = temp.mean(axis=1)
print(student)

# Mean of each subject, and of the total and avg columns:

col_mean = student[["Test_1", "Test_2", "Test_3", "total", "avg"]].mean()
col_mean["Name"] = "Summary"
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
student = pd.concat([student, col_mean.to_frame().T], ignore_index=True)
student[["Test_1", "Test_2", "Test_3", "total", "avg"]] = student[["Test_1", "Test_2", "Test_3", "total", "avg"]].astype(int)
print(student)
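
The same row-wise and column-wise aggregation can be tried without the Excel file. The sketch below uses made-up scores for two hypothetical students in place of Students.xlsx and adds the summary row with pd.concat:

```python
import pandas as pd

# Made-up data standing in for Students.xlsx.
student = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Test_1": [80, 60],
    "Test_2": [90, 70],
    "Test_3": [100, 80],
})
tests = ["Test_1", "Test_2", "Test_3"]

student["total"] = student[tests].sum(axis=1)  # per-student sum
student["avg"] = student[tests].mean(axis=1)   # per-student mean

# Column means as a one-row frame, labeled and added as the last row.
summary = student[tests + ["total", "avg"]].mean().to_frame().T
summary["Name"] = "Summary"
student = pd.concat([student, summary], ignore_index=True)

print(student)
```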

 

Reposted from https://www.cnblogs.com/jiangxinyang/p/9672785.html

Reposted from https://blog.csdn.net/glittledream/article/details/87902161

