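1. Using pandas together with SQLAlchemy

    import pandas as pd
    from sqlalchemy import create_engine

    # TEST_DB is assumed to be a config object defined elsewhere,
    # with user, password, host and db attributes
    engine = create_engine(
        "mysql+pymysql://%s:%s@%s:3306/%s?charset=utf8"
        % (TEST_DB.user, TEST_DB.password, TEST_DB.host, TEST_DB.db)
    )
    exec_sql = ''  # the SQL query to run
    source_table = pd.read_sql(exec_sql, engine)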
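2. Iterating over the data

    # index is the row's index label (starting from 0 by default)
    # v is one row of data from the database, as a Series
    for index, v in source_table.iterrows():
        print(index, v)

    # To pick out a single value, use v["column_name"].
    # Note that v itself is a Series, and the value you get from it is not
    # necessarily a str; wrap it in str() before calling split on it.
    v["column_name"]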
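As a minimal sketch of the loop above, the example below builds a small in-memory DataFrame as a hypothetical stand-in for the result of `pd.read_sql` (the `id` and `tags` columns are made up for illustration) and splits one cell per row after converting it with `str()`.

```python
import pandas as pd

# Hypothetical stand-in for a table loaded with pd.read_sql
source_table = pd.DataFrame({"id": [1, 2], "tags": ["a,b", "c,d,e"]})

parts_per_row = []
for index, v in source_table.iterrows():
    # v is a Series holding one row; v["tags"] is a single cell value
    cell = v["tags"]
    # str() guards against non-string cells (numbers, None) before split
    parts_per_row.append(str(cell).split(","))

print(parts_per_row)
```

For large tables `iterrows` is slow; a vectorized call such as `source_table["tags"].str.split(",")` is usually preferable when it fits.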
3. Modifying data

    # Use iloc (integer-position based indexing)
    source_table.iloc[index, 2] = int(result)

The examples below come from the pandas `DataFrame.iloc` docstring.

    >>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
    ...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
    ...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
    >>> df = pd.DataFrame(mydict)
    >>> df
          a     b     c     d
    0     1     2     3     4
    1   100   200   300   400
    2  1000  2000  3000  4000

**Indexing just the rows**

With a scalar integer.

    >>> type(df.iloc[0])
    <class 'pandas.core.series.Series'>
    >>> df.iloc[0]
    a    1
    b    2
    c    3
    d    4
    Name: 0, dtype: int64

With a list of integers.

    >>> df.iloc[[0]]
       a  b  c  d
    0  1  2  3  4
    >>> type(df.iloc[[0]])
    <class 'pandas.core.frame.DataFrame'>
    >>> df.iloc[[0, 1]]
         a    b    c    d
    0    1    2    3    4
    1  100  200  300  400

With a `slice` object.

    >>> df.iloc[:3]
          a     b     c     d
    0     1     2     3     4
    1   100   200   300   400
    2  1000  2000  3000  4000

With a boolean mask the same length as the index.

    >>> df.iloc[[True, False, True]]
          a     b     c     d
    0     1     2     3     4
    2  1000  2000  3000  4000

With a callable, useful in method chains. The `x` passed to the ``lambda`` is the DataFrame being sliced. This selects the rows whose index label is even.

    >>> df.iloc[lambda x: x.index % 2 == 0]
          a     b     c     d
    0     1     2     3     4
    2  1000  2000  3000  4000

**Indexing both axes**

You can mix the indexer types for the index and columns. Use ``:`` to select the entire axis.

With scalar integers.

    >>> df.iloc[0, 1]
    2

With lists of integers.

    >>> df.iloc[[0, 2], [1, 3]]
          b     d
    0     2     4
    2  2000  4000

With `slice` objects.

    >>> df.iloc[1:3, 0:3]
          a     b     c
    1   100   200   300
    2  1000  2000  3000

With a boolean array whose length matches the columns.

    >>> df.iloc[:, [True, False, True, False]]
          a     c
    0     1     3
    1   100   300
    2  1000  3000

With a callable function that expects the Series or DataFrame.

    >>> df.iloc[:, lambda df: [0, 2]]
          a     c
    0     1     3
    1   100   300
    2  1000  3000
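The assignment form of `iloc` can be sketched as follows. The data and the `result` value are hypothetical. Note that `iloc` takes positions, not labels, so an `index` obtained from `iterrows` only lines up with `iloc` when the DataFrame has the default 0..n-1 index; this sketch loops over positions directly to avoid that pitfall.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000], "c": [3, 300, 3000]})

result = "42"  # hypothetical computed value, e.g. parsed from text
for pos in range(len(df)):
    if df.iloc[pos, 0] >= 100:
        # row selected by position, column 2 by position (column "c")
        df.iloc[pos, 2] = int(result)

print(df)
```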
4. Dropping a column

    # axis=1 means columns
    values_table = source_table.drop('column_name', axis=1)
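A small sketch of `drop` on made-up data (the column names are placeholders). It also shows that `drop` returns a new DataFrame and leaves the original untouched, and that `columns=` is an equivalent spelling of `axis=1`.

```python
import pandas as pd

source_table = pd.DataFrame({"id": [1, 2], "name": ["x", "y"], "tmp": [0, 0]})

# drop returns a new DataFrame; source_table itself is unchanged
values_table = source_table.drop("tmp", axis=1)

# equivalent spelling using the columns keyword
same_table = source_table.drop(columns=["tmp"])

print(values_table)
```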
5. Updating the database

When writing data with .to_sql(), con must be a SQLAlchemy engine or connection; passing a raw pymysql connection raises an error.
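The following is a rough sketch of a write and read-back through a SQLAlchemy engine. It assumes SQLAlchemy is installed and uses an in-memory SQLite database as a stand-in for the MySQL server, so it runs without credentials; the table name `scores` and the data are made up.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the MySQL server (assumption for this demo)
engine = create_engine("sqlite:///:memory:")

df = pd.DataFrame({"id": [1, 2], "score": [90, 85]})

# if_exists="replace" drops and recreates the table; "append" adds rows instead
df.to_sql("scores", con=engine, index=False, if_exists="replace")

# Read the rows back through the same engine
round_trip = pd.read_sql("SELECT id, score FROM scores ORDER BY id", engine)
print(round_trip)
```

Against MySQL only the connection URL changes; the `to_sql` and `read_sql` calls stay the same.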
6. Selecting certain columns

    import pandas as pd

    # Read data from Excel into a DataFrame,
    # given the Excel file path and the sheet name
    df = pd.read_excel(excelName, sheet_name=sheetName)

    # Pick certain columns to build a new DataFrame
    newDf = pd.DataFrame(df, columns=[column1, column2, column3])
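7. Selecting certain columns and filtering rows by a column's value

    # column1, column2, column3 are variables holding column names
    newDf = pd.DataFrame(df, columns=[column1, column2, column3])[
        (df[column1] == value1) & (df[column2] == value2)
    ]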
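The same selection can also be written in one step with `loc`, as sketched below on made-up data (the `city`, `year`, `sales` and `note` columns are placeholders). Each comparison needs its own parentheses because `&` binds more tightly than `==`.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "A", "C"],
    "year": [2020, 2020, 2021, 2020],
    "sales": [10, 20, 30, 40],
    "note": ["w", "x", "y", "z"],
})

# Boolean mask: True for rows that satisfy both conditions
mask = (df["city"] == "A") & (df["year"] == 2020)

# Rows chosen by the mask, columns chosen by name
newDf = df.loc[mask, ["city", "year", "sales"]]
print(newDf)
```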
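8. Adding a new column

    # Option 1: assign directly
    df["newColumn"] = newValue

    # Option 2: combine two DataFrames with concat
    # (axis=1 places them side by side; the default axis=0 would append rows)
    pd.concat([oldDf, newDf], axis=1)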
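A short sketch contrasting the two options on made-up frames. It also shows what `concat` does without `axis=1`, since the default stacks rows rather than adding columns.

```python
import pandas as pd

oldDf = pd.DataFrame({"name": ["x", "y"]})
newDf = pd.DataFrame({"score": [1, 2]})

# Direct assignment: a scalar is broadcast to every row
oldDf["flag"] = True

# axis=1: frames placed side by side, aligned on the index
combined = pd.concat([oldDf, newDf], axis=1)

# Default axis=0: rows are appended and missing cells become NaN
stacked = pd.concat([oldDf, newDf])

print(combined)
print(stacked)
```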
9. Changing values in a column

    # Option 1: replace
    df["column1"] = df["column1"].replace(oldValue, newValue)

    # Option 2: map (values missing from the dict become NaN)
    df["column1"] = df["column1"].map({oldValue: newValue})

    # Option 3: loc
    # Set column2 to value2 for the rows where column1 equals value1
    df.loc[df["column1"] == value1, "column2"] = value2
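The three options behave differently for values that are not matched, which the sketch below illustrates on made-up data: `replace` leaves them alone, `map` with a dict turns them into NaN, and `loc` only touches the rows selected by the condition.

```python
import pandas as pd

df = pd.DataFrame({"column1": ["a", "b", "c"], "column2": [1, 2, 3]})

# replace: only "a" changes, other values are kept
replaced = df["column1"].replace("a", "z")

# map with a dict: values without a key in the dict become NaN
mapped = df["column1"].map({"a": "z"})

# loc: conditional assignment into another column
df.loc[df["column1"] == "b", "column2"] = 20

print(replaced.tolist(), mapped.tolist(), df["column2"].tolist())
```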
10. Filling missing values

    # fillna fills in missing values
    df["column1"] = df["column1"].fillna(value1)
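A minimal sketch of `fillna` on made-up data, filling one column with a scalar and another through a dict of per-column fill values passed to `DataFrame.fillna`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column1": [1.0, np.nan, 3.0], "column2": ["a", None, "c"]})

# Fill a single column with a scalar
df["column1"] = df["column1"].fillna(0)

# Fill by column name with a dict; columns not listed are left alone
df = df.fillna({"column2": "unknown"})

print(df)
```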
11. Selecting columns (or rows) by label with filter

Examples

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
    ...                   index=['mouse', 'rabbit'],
    ...                   columns=['one', 'two', 'three'])
    >>> df
            one  two  three
    mouse     1    2      3
    rabbit    4    5      6

    >>> # select columns by name
    >>> df.filter(items=['one', 'three'])
            one  three
    mouse     1      3
    rabbit    4      6

    >>> # select columns by regular expression
    >>> df.filter(regex='e$', axis=1)
            one  three
    mouse     1      3
    rabbit    4      6

    >>> # select rows containing 'bbi'
    >>> df.filter(like='bbi', axis=0)
            one  two  three
    rabbit    4    5      6
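12. Using mean()
Pandas Series.mean()
The function returns the mean of the underlying data in the given Series object.
Usage: Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters:
axis: the axis the function is applied on.
skipna: exclude NA/null values when computing the result.
level: if the axis is a MultiIndex (hierarchical), aggregate along a particular level, collapsing into a scalar. (This parameter was deprecated in pandas 1.3 and removed in 2.0; use groupby(level=...) instead.)
numeric_only: include only float, int and boolean columns.
**kwargs: additional keyword arguments to be passed to the function.
Returns: mean: scalar, or Series if level is specified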
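The effect of `skipna` can be sketched on a small made-up Series: by default NaN values are ignored, while `skipna=False` lets a single NaN propagate into the result.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 5.0])

# Default behaviour: NaN is skipped, so this is (1 + 2 + 5) / 3
mean_skip = s.mean()

# skipna=False: any NaN in the data makes the result NaN
mean_strict = s.mean(skipna=False)

print(mean_skip, mean_strict)
```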
    # Row-wise sum and mean:
    import pandas as pd

    student = pd.read_excel("C:/Users/Administrator/Desktop/Students.xlsx", index_col="ID")
    temp = student[["Test_1", "Test_2", "Test_3"]]
    # axis=0 aggregates down each column; axis=1 aggregates across each row
    student["total"] = temp.sum(axis=1)
    student["avg"] = temp.mean(axis=1)
    print(student)

    # Per-subject averages, appended as a summary row:
    cols = ["Test_1", "Test_2", "Test_3", "total", "avg"]
    col_mean = student[cols].mean()
    col_mean["Name"] = "Summary"
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    student = pd.concat([student, col_mean.to_frame().T], ignore_index=True)
    student[cols] = student[cols].astype(int)
    print(student)
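Since the Excel file above is not available here, the sketch below repeats the same steps on a small made-up DataFrame (names and scores are invented). It builds the summary row from a dict, which is one of several ways to do it, and it skips the final integer cast.

```python
import pandas as pd

# Hypothetical stand-in for the Students.xlsx sheet
student = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Test_1": [80, 60], "Test_2": [90, 70], "Test_3": [100, 80]},
    index=pd.Index([1, 2], name="ID"),
)
scores = ["Test_1", "Test_2", "Test_3"]

# Row-wise aggregates: one total and one average per student
student["total"] = student[scores].sum(axis=1)
student["avg"] = student[scores].mean(axis=1)

# Column-wise means, turned into a one-row DataFrame and appended
summary = student[scores + ["total", "avg"]].mean().to_dict()
summary["Name"] = "Summary"
student = pd.concat([student, pd.DataFrame([summary])], ignore_index=True)

print(student)
```

As in the original snippet, `ignore_index=True` discards the `ID` index and renumbers the rows from 0.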
Reposted from https://www.cnblogs.com/jiangxinyang/p/9672785.html
Reposted from https://blog.csdn.net/glittledream/article/details/87902161