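When processing data with Pandas, we usually import it from CSV or Excel files. Sometimes, though, the data lives in a database and there is no ready-made data file. In that case we can use the PyMongo library to read the data from MongoDB and load it into Pandas. It takes just three simple steps.
Step 1: import the required modules: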
import pymongo
import pandas as pd
Step 2: set up the MongoDB connection:
client = pymongo.MongoClient('localhost', 27017)
db = client['Lottery']
pk10 = db['Pk10']
Step 3: load the data into Pandas:
data = pd.DataFrame(list(pk10.find()))
Delete the _id field that MongoDB adds to every document:
del data['_id']
Select the fields to display:
data = data[['date', 'num1', 'num10']]
print(data)
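As an alternative to deleting _id and selecting columns after loading, the same result can be sketched with a projection, so that MongoDB only returns the fields that are needed. The helper name `load_fields` is hypothetical and not part of PyMongo or Pandas; it assumes `collection` is a PyMongo collection such as `pk10` above, and relies only on `find(filter, projection)`.

```python
import pandas as pd


def load_fields(collection, fields, query=None):
    """Load only the named fields from a collection into a DataFrame.

    `load_fields` is an illustrative helper, not a PyMongo or Pandas API.
    """
    # Exclude _id explicitly and include each requested field.
    projection = {"_id": 0}
    projection.update({name: 1 for name in fields})

    # PyMongo's find() accepts the filter first and the projection second.
    cursor = collection.find(query or {}, projection)

    # Passing columns= fixes the column order and keeps the columns
    # even when the query matches no documents.
    return pd.DataFrame(list(cursor), columns=list(fields))
```

With the connection from step 2, this would be called as `data = load_fields(pk10, ['date', 'num1', 'num10'])`. Fields missing from a document show up as NaN rather than raising an error.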
With that, data can easily be read from MongoDB into Pandas for analysis.
A more general version, from Stack Overflow:
import pandas as pd
from pymongo import MongoClient


def _connect_mongo(host, port, username, password, db):
    """ A util for making a connection to mongo """
    if username and password:
        mongo_uri = 'mongodb://%s:%s@%s:%s/%s' % (username, password, host, port, db)
        conn = MongoClient(mongo_uri)
    else:
        conn = MongoClient(host, port)
    return conn[db]


def read_mongo(db, collection, query=None, host='localhost', port=27017,
               username=None, password=None, no_id=True):
    """ Read from Mongo and Store into DataFrame """
    # Connect to MongoDB
    db = _connect_mongo(host=host, port=port, username=username, password=password, db=db)

    # Make a query to the specific DB and Collection
    # (None means "match everything"; avoids a mutable default argument)
    cursor = db[collection].find(query or {})

    # Expand the cursor and construct the DataFrame
    df = pd.DataFrame(list(cursor))

    # Delete the _id, if present (an empty result has no columns)
    if no_id and '_id' in df.columns:
        del df['_id']

    return df
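One caveat about the connection string above: a username or password containing characters such as `@`, `:` or `/` would produce an invalid URI unless it is percent-escaped first. A minimal sketch of building the URI with escaping follows; `build_mongo_uri` is a hypothetical helper name, and the credentials in the usage note are made up for illustration.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, db, username=None, password=None):
    """Build a mongodb:// URI, percent-escaping the credentials.

    `build_mongo_uri` is an illustrative helper, not part of PyMongo.
    """
    if username and password:
        # quote_plus escapes reserved characters such as '@', ':' and '/'.
        return 'mongodb://%s:%s@%s:%s/%s' % (
            quote_plus(username), quote_plus(password), host, port, db)
    # No credentials: plain host, port and database.
    return 'mongodb://%s:%s/%s' % (host, port, db)
```

For example, `build_mongo_uri('localhost', 27017, 'Lottery', 'user', 'p@ss:w/rd')` gives `mongodb://user:p%40ss%3Aw%2Frd@localhost:27017/Lottery`, which could then be passed to `MongoClient` in place of the hand-formatted string.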