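When processing data with Pandas, we usually import it from CSV or Excel files. Sometimes, though, the data lives in a database and there is no ready-made data file. In that case you can use the PyMongo library to read the data from MongoDB and load it into Pandas in three simple steps.
Step 1: import the required modules:
import pymongo
import pandas as pd
Step 2: set up the MongoDB connection: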
client = pymongo.MongoClient('localhost',27017)
db = client['Lottery']
pk10 = db['Pk10']
Step 3: load the data into Pandas:
data = pd.DataFrame(list(pk10.find()))
Delete MongoDB's _id field:
del data['_id']
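To see what this step produces without a running MongoDB server, the sketch below uses a hand-written list of dictionaries as a stand-in for the documents that `pk10.find()` would return. The `sample_docs` name and all values are made up for illustration; only the field names `date`, `num1`, and `num10` come from this example.

```python
import pandas as pd

# Hypothetical documents standing in for the result of pk10.find().
# The values are invented; a real _id would normally be a bson ObjectId.
sample_docs = [
    {"_id": 1, "date": "2018-01-01", "num1": 3, "num10": 7},
    {"_id": 2, "date": "2018-01-02", "num1": 9, "num10": 1},
    {"_id": 3, "date": "2018-01-03", "num1": 5},  # num10 is missing here
]

# Each document becomes a row; each key becomes a column.
data = pd.DataFrame(sample_docs)

# drop(..., errors="ignore") does not raise if the column is absent,
# unlike `del data['_id']`, which raises KeyError on an empty result.
data = data.drop(columns="_id", errors="ignore")

print(data)
```

A field that is missing from a document shows up as NaN in that row, so it is worth checking for missing values before analysis.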
Select the fields to display:
data = data[['date','num1','num10']]
print(data)
This makes it easy to read data from MongoDB into Pandas for analysis.
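The `_id` removal and the column selection can also be pushed to the server by passing a projection as the second argument to `find()`, so that only the needed fields are transferred. The following is a minimal sketch: `build_projection` and `load_fields` are hypothetical helper names, and `collection` is assumed to be a PyMongo collection such as `pk10`.

```python
import pandas as pd


def build_projection(fields):
    """Build a projection that keeps `fields` and excludes _id."""
    projection = {name: 1 for name in fields}
    projection["_id"] = 0  # _id is returned by default unless excluded
    return projection


def load_fields(collection, fields, query=None):
    """Load only the given fields of matching documents into a DataFrame.

    `collection` is assumed to be a pymongo Collection (e.g. pk10).
    """
    cursor = collection.find(query or {}, build_projection(fields))
    # Passing columns= fixes the column order and keeps the columns
    # even when the query matches no documents.
    return pd.DataFrame(list(cursor), columns=list(fields))
```

With the connection from step 2, this would be called as `data = load_fields(pk10, ['date', 'num1', 'num10'])`.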
A reusable version from Stack Overflow:
import pandas as pd
from urllib.parse import quote_plus
from pymongo import MongoClient


def _connect_mongo(host, port, username, password, db):
    """A util for making a connection to mongo."""
    if username and password:
        # Percent-escape the credentials so characters such as '@' or ':' do not break the URI
        mongo_uri = 'mongodb://%s:%s@%s:%s/%s' % (
            quote_plus(username), quote_plus(password), host, port, db)
        conn = MongoClient(mongo_uri)
    else:
        conn = MongoClient(host, port)
    return conn[db]
def read_mongo(db, collection, query=None, host='localhost', port=27017, username=None, password=None, no_id=True):
    """Read from Mongo and store into a DataFrame."""
    # Connect to MongoDB
    db = _connect_mongo(host=host, port=port, username=username, password=password, db=db)
    # Query the given collection; None means "match every document"
    cursor = db[collection].find(query or {})
    # Expand the cursor and construct the DataFrame
    df = pd.DataFrame(list(cursor))
    # Delete the _id column (it is absent when the query matched nothing)
    if no_id and '_id' in df.columns:
        del df['_id']
    return df
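`list(cursor)` holds every matching document in memory before the DataFrame is built. For large collections, one option is to consume the cursor in fixed-size batches and build one DataFrame per batch. The sketch below illustrates that idea and is not part of the function above: `iter_frames` and `chunk_size` are hypothetical names, and `cursor` can be any iterable of dict-like documents, such as the result of `db[collection].find(query)`.

```python
from itertools import islice

import pandas as pd


def iter_frames(cursor, chunk_size=1000, no_id=True):
    """Yield DataFrames of at most `chunk_size` documents each.

    `cursor` is any iterable of dicts, e.g. a pymongo cursor.
    """
    iterator = iter(cursor)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            break  # cursor exhausted
        df = pd.DataFrame(batch)
        if no_id:
            # errors="ignore" tolerates batches without an _id column
            df = df.drop(columns="_id", errors="ignore")
        yield df
```

Each yielded frame can be processed and then discarded. When the result is non-empty and fits in memory, the frames can instead be combined with `pd.concat(list(iter_frames(cursor)), ignore_index=True)`.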
