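Introduction
Sometimes we don't want to save a DataFrame to a file or write it to a database. Instead we want to convert it to another format, such as a dictionary, a list, or JSON. Likewise, a DataFrame doesn't have to be read from a file or a database: it can also be built from data that already exists in memory. So how is this done? Let's take a look.
Converting a DataFrame to Python data formats
Converting to JSON
To convert a DataFrame to JSON, use the df.to_json() method.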
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
print(df.to_json())
# {"name":{"0":"mashiro","1":"satori","2":"koishi","3":"nagisa"},"age":{"0":17,"1":17,"2":16,"3":21}}
The DataFrame was converted to JSON, but the result is not quite ideal: the index was included as well.
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
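# If we don't want the index, it seems we could simply pass index=False
try:
    print(df.to_json(index=False))
except Exception as e:
    print(e)  # 'index=False' is only valid when 'orient' is 'split' or 'table'
# But it raises an error: with index=False, orient must be 'split' or 'table'
# (the exact wording of the message depends on the pandas version)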
Let's see what this orient parameter is.
orient accepts the following values: split, records, index, columns, values, table.
Let's try each one in turn and see how the result changes.
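Before going through them one by one, here is a minimal sketch that gives a quick overview. It parses each result with json.loads and records whether the top level is a JSON object or an array. The two-row DataFrame and the `shapes` dictionary are made up for this illustration.

```python
import json

import pandas as pd

# A small made-up DataFrame, just for illustration
df = pd.DataFrame({"name": ["mashiro", "satori"], "age": [17, 16]})

# Record the top-level Python type produced by each orient
shapes = {}
for orient in ["split", "records", "index", "columns", "values", "table"]:
    parsed = json.loads(df.to_json(orient=orient))
    shapes[orient] = type(parsed).__name__
    print(f"{orient:8} -> {shapes[orient]}")
```

Only records and values produce a top-level list; the others produce a dictionary.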
orient='split'
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
print(df.to_json(orient="split"))
"""
{
"columns":["name","age"],
"index":[0,1,2,3],
"data":[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
}
"""
print(df.to_json(orient="split", index=False))
"""
{
"columns":["name","age"],
"data":[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
}
"""
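With orient='split' the result has three key-value pairs: the column names, the index, and the data. Passing index=False drops the index entry.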
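Because the three parts are stored separately, the split layout is easy to turn back into a DataFrame. The following is only a sketch of one way to do it, passing the parsed pieces straight to the DataFrame constructor; the names `payload` and `restored` are chosen for this example.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Parse the JSON text into a plain dict with keys "columns", "index", "data"
payload = json.loads(df.to_json(orient="split"))

# Hand the three parts back to the constructor
restored = pd.DataFrame(payload["data"],
                        index=payload["index"],
                        columns=payload["columns"])
print(restored)
```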
orient='records'
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
print(df.to_json(orient="records"))
"""
[{"name":"mashiro","age":17},
{"name":"satori","age":17},
{"name":"koishi","age":16},
{"name":"nagisa","age":21}]
"""
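This format is used quite often: each row is combined with the column names into a dictionary, and the dictionaries are stored in a list. The generated JSON has nothing to do with the index, so index=False is unnecessary here. In the pandas version used for this article it is not even accepted, although newer releases may allow it.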
orient='index'
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
print(df.to_json(orient="index"))
"""
{
"0":{"name":"mashiro","age":17},
"1":{"name":"satori","age":17},
"2":{"name":"koishi","age":16},
"3":{"name":"nagisa","age":21}
}
"""
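This is similar to records, except that each row dictionary is stored as a value in an outer dictionary whose keys are the corresponding index labels. index=False cannot be used here either.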
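Which orient values accept index=False has changed between pandas releases, so it may be worth checking on your own installation. The sketch below tries every orient and sorts them into two lists; `accepted` and `rejected` are names made up for this example, and the exact split may differ from version to version.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori"], "age": [17, 16]})

accepted, rejected = [], []
for orient in ["split", "records", "index", "columns", "values", "table"]:
    try:
        df.to_json(orient=orient, index=False)
        accepted.append(orient)
    except ValueError:
        # pandas raises ValueError when index=False is not allowed for this orient
        rejected.append(orient)

print("accepts index=False:", accepted)
print("rejects index=False:", rejected)
```

Whatever the version, split and table should appear in the first list, and index and columns in the second.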
orient='columns'
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
print(df.to_json(orient="columns"))
"""
{"name":{"0":"mashiro","1":"satori","2":"koishi","3":"nagisa"},"age":{"0":17,"1":17,"2":16,"3":21}}
"""
This is the same result we got without specifying orient. For a DataFrame, orient defaults to columns.
orient='values'
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
print(df.to_json(orient="values"))
"""
[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
"""
# With orient="values", only the data is returned
# This is similar to to_numpy
print(df.to_numpy())
"""
[['mashiro' 17]
['satori' 17]
['koishi' 16]
['nagisa' 21]]
"""
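To make the comparison with to_numpy concrete, here is a small sketch that parses the values output and compares it with the nested list obtained from to_numpy().tolist(). The variable names are chosen for this example.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Rows as nested lists, obtained in two different ways
from_json = json.loads(df.to_json(orient="values"))
from_numpy = df.to_numpy().tolist()

print(from_json == from_numpy)
```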
orient='table'
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
# Returned in the form of a two-dimensional database table, together with a schema
print(df.to_json(orient="table"))
"""
{
"schema": {
"fields": [{"name": "index", "type": "integer"},
{"name": "name", "type": "string"},
{"name": "age", "type": "integer"}],
"primaryKey": ["index"],
"pandas_version": "0.20.0"
},
"data": [{"index": 0, "name": "mashiro", "age": 17},
{"index": 1, "name": "satori", "age": 17},
{"index": 2, "name": "koishi", "age": 16},
{"index": 3, "name": "nagisa", "age": 21}]
}
"""
print(df.to_json(orient="table", index=False))
"""
{
"schema": {
"fields": [{"name": "name", "type": "string"},
{"name": "age", "type": "integer"}],
"pandas_version": "0.20.0"
},
"data": [{"name": "mashiro", "age": 17},
{"name": "satori", "age": 17},
{"name": "koishi", "age": 16},
{"name": "nagisa", "age": 21}]
}
"""
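Since the table layout carries a schema, it can be read back with pd.read_json using the same orient. The following is a minimal sketch; wrapping the text in StringIO is a choice made here because recent pandas releases warn when a literal JSON string is passed directly.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

text = df.to_json(orient="table")

# Read the table-oriented JSON back; the schema tells pandas the column types
restored = pd.read_json(StringIO(text), orient="table")
print(restored)
print(restored.dtypes)
```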
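Converting to dict
A DataFrame can also be converted to a dictionary. to_dict has an orient parameter too, and some of its values work like those of to_json. That is no surprise: a JSON object maps directly onto a Python dictionary, so the two structures look almost the same. (JSON itself derives from JavaScript object literals, not from Python.)
In to_dict, orient can take the following values: dict, list, series, split, records, index. The default is dict.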
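As a quick overview before the individual examples, this sketch loops over the six values listed above and records the top-level type of each result. The `kinds` dictionary is a name made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori"], "age": [17, 16]})

# Top-level Python type returned for each orient
kinds = {}
for orient in ["dict", "list", "series", "split", "records", "index"]:
    result = df.to_dict(orient=orient)
    kinds[orient] = type(result).__name__
    print(f"{orient:8} -> {kinds[orient]}")
```

Only records returns a list; every other orient returns a dictionary, and what differs is what its keys and values are.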
orient='dict'
from pprint import pprint
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
pprint(df.to_dict(orient="dict"))
"""
{'age': {0: 17, 1: 17, 2: 16, 3: 21},
'name': {0: 'mashiro', 1: 'satori', 2: 'koishi', 3: 'nagisa'}}
"""
orient='list'
from pprint import pprint
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
pprint(df.to_dict(orient="list"))
"""
{'age': [17, 17, 16, 21], 'name': ['mashiro', 'satori', 'koishi', 'nagisa']}
"""
orient='series'
from pprint import pprint
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
# This structure is rarely used: each key maps to a Series
pprint(df.to_dict(orient="series"))
"""
{'age':
0 17
1 17
2 16
3 21
Name: age, dtype: int64,
'name': 0 mashiro
1 satori
2 koishi
3 nagisa
Name: name, dtype: object}
"""
orient='split'
from pprint import pprint
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
pprint(df.to_dict(orient="split"))
"""
{'columns': ['name', 'age'],
'data': [['mashiro', 17], ['satori', 17], ['koishi', 16], ['nagisa', 21]],
'index': [0, 1, 2, 3]}
"""
orient='records'
from pprint import pprint
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
pprint(df.to_dict(orient="records"))
"""
[{'age': 17, 'name': 'mashiro'},
{'age': 17, 'name': 'satori'},
{'age': 16, 'name': 'koishi'},
{'age': 21, 'name': 'nagisa'}]
"""
orient='index'
from pprint import pprint
import pandas as pd
df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
"age": [17, 17, 16, 21]})
pprint(df.to_dict(orient="index"))
"""
{0: {'age': 17, 'name': 'mashiro'},
1: {'age': 17, 'name': 'satori'},
2: {'age': 16, 'name': 'koishi'},
3: {'age': 21, 'name': 'nagisa'}}
"""
Converting Python data formats to a DataFrame
Converting a dictionary to a DataFrame
import pandas as pd
data = {0: {'age': 17, 'name': 'mashiro'},
1: {'age': 17, 'name': 'satori'},
2: {'age': 16, 'name': 'koishi'},
3: {'age': 21, 'name': 'nagisa'}}
df = pd.DataFrame.from_dict(data)
# Clearly not the layout we expect
print(df)
"""
0 1 2 3
age 17 17 16 21
name mashiro satori koishi nagisa
"""
df = pd.DataFrame.from_dict(data, orient="index")
print(df)
"""
age name
0 17 mashiro
1 17 satori
2 16 koishi
3 21 nagisa
"""
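So df.to_dict and pd.DataFrame.from_dict do opposite jobs. Note, however, that the orient parameter of from_dict has only two options in the version used here, index or columns, and the default is columns.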
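The default orient="columns" expects the outer keys to be column names, which matches what to_dict(orient="list") and to_dict(orient="dict") produce. Here is a small sketch of both directions; the names `by_column` and `by_row` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Outer keys are column names -> default orient="columns"
by_column = pd.DataFrame.from_dict(df.to_dict(orient="list"))

# Outer keys are index labels -> orient="index"
by_row = pd.DataFrame.from_dict(df.to_dict(orient="index"), orient="index")

print(by_column)
print(by_row)
```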
from_records
from_records is meant for data whose outer container is a list.
import pandas as pd
data = [{'age': 17, 'name': 'mashiro'},
{'age': 17, 'name': 'satori'},
{'age': 16, 'name': 'koishi'},
{'age': 21, 'name': 'nagisa'}]
df = pd.DataFrame.from_records(data)
print(df)
"""
age name
0 17 mashiro
1 17 satori
2 16 koishi
3 21 nagisa
"""
In fact, this is exactly the kind of data that to_dict(orient="records") produces.
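That relationship can be checked with a short round trip. This is only a sketch: it converts a DataFrame to a list of row dictionaries and feeds the list straight back into from_records; `records` and `restored` are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# DataFrame -> list of row dictionaries
records = df.to_dict(orient="records")

# list of row dictionaries -> DataFrame
restored = pd.DataFrame.from_records(records)
print(restored)
```

Note that the original index is not part of the records, so the restored DataFrame gets a fresh default index.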