1. Interacting with Web APIs

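Many websites provide public APIs that serve data as JSON or in other formats. We can access these APIs with Python's requests library. requests comes preinstalled with Anaconda. If your environment does not have it, install it with pip or conda: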

pip install requests
conda install requests

As an example of working with a web API, let's fetch 30 pandas issues from GitHub. The API returns 30 issues per page by default.

In [113]: import requests  # import the package
In [114]: url = 'https://api.github.com/repos/pandas-dev/pandas/issues'  # the URL to request
In [115]: response = requests.get(url)  # send the request; this may take a moment
In [116]: response  # the response was received successfully
Out[116]: <Response [200]>
In [117]: data = response.json()  # parse the response body as JSON
In [118]: data[0]['title']  # title of the first issue
Out[118]: 'python-project-packaging-with-pandas'
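The session above hardcodes the URL and relies on the API's default page size. Below is a slightly more reusable sketch. The function name fetch_issues and its parameters are hypothetical. per_page and state are query parameters that GitHub's issues endpoint accepts. The function also calls raise_for_status(), so an HTTP error such as a rate-limit rejection raises an exception instead of being parsed as data.

```python
from typing import Any, Optional

GITHUB_API = "https://api.github.com"


def fetch_issues(owner: str, repo: str, per_page: int = 30,
                 state: str = "open", session: Optional[Any] = None,
                 timeout: float = 10.0) -> list:
    """Fetch one page of issues for owner/repo (hypothetical helper)."""
    if session is None:
        # Imported lazily so a stand-in session object can be passed for testing.
        import requests
        session = requests.Session()
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues"
    # per_page and state are sent as query-string parameters.
    response = session.get(url, params={"per_page": per_page, "state": state},
                           timeout=timeout)
    # Raise on 4xx/5xx status codes instead of returning an error payload.
    response.raise_for_status()
    return response.json()
```

For example, `data = fetch_issues('pandas-dev', 'pandas')` should return a list of issue dicts like the one shown above. GitHub's issues endpoint also returns pull requests, and those entries carry a `pull_request` key.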

Each element of data is a dictionary that holds all of the data shown on a GitHub issue page, except for the comments.

In [119]: data[0]
Out[119]:
{'url': 'https://api.github.com/repos/pandas-dev/pandas/issues/24779',
 'repository_url': 'https://api.github.com/repos/pandas-dev/pandas',
 'labels_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/24779/labels{/name}',
 'comments_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/24779/comments',
 'events_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/24779/events',
 'html_url': 'https://github.com/pandas-dev/pandas/issues/24779',
 'id': 399193081,
 'node_id': 'MDU6SXNzdWUzOTkxOTMwODE=',
 'number': 24779,
 'title': 'python-project-packaging-with-pandas',
 ......(remainder omitted)
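Some of these values are nested. The issue's author, for instance, sits under a `user` key. pd.json_normalize can flatten such dicts into dotted column names. Here is a sketch on two made-up, heavily trimmed records that stand in for real API output:

```python
import pandas as pd

# Hypothetical, trimmed issue records; real ones carry many more keys.
records = [
    {"number": 101, "title": "first issue", "state": "open",
     "user": {"login": "alice"}, "labels": [{"name": "Bug"}]},
    {"number": 102, "title": "second issue", "state": "closed",
     "user": {"login": "bob"}, "labels": []},
]

# Nested dicts become columns such as "user.login"; lists (like "labels")
# are kept unchanged in a single column.
flat = pd.json_normalize(records)
subset = flat[["number", "title", "user.login"]]
```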

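Assuming pandas has been imported as pd, we can pass data straight to the DataFrame constructor and keep only the fields we are interested in: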

In [120]: issues = pd.DataFrame(data, columns=['number', 'title', 'labels', 'state'])
In [121]: issues
Out[121]:
    number  ...  state
0    24779  ...   open
1    24778  ...   open
2    24776  ...   open
3    24775  ...   open
4    24774  ...   open
..     ...  ...    ...
25   24736  ...   open
26   24735  ...   open
27   24733  ...   open
28   24732  ...   open
29   24730  ...   open

Here we kept four fields ('number', 'title', 'labels' and 'state') and built a DataFrame from them. From this point on, all the usual pandas operations are available.
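One example of such an operation: each cell of the labels column holds a list of label dicts, which usually needs unpacking before analysis. Below is a sketch on a small made-up frame with the same four columns. The label names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the issues frame above (all values are made up).
issues = pd.DataFrame({
    "number": [1, 2, 3],
    "title": ["a", "b", "c"],
    "labels": [[{"name": "Bug"}, {"name": "IO SQL"}], [], [{"name": "Bug"}]],
    "state": ["open", "open", "closed"],
})

# Keep only the name of each label dict.
issues["label_names"] = issues["labels"].map(lambda labs: [lab["name"] for lab in labs])

# Number of issues per state.
state_counts = issues["state"].value_counts()

# explode gives one row per (issue, label) pair; an empty list becomes NaN,
# which value_counts drops by default.
label_counts = issues.explode("label_names")["label_names"].value_counts()
```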

2. Interacting with Databases

In practice, much of the data you work with is not stored in text or Excel files. It lives in SQL-based relational databases such as MySQL.

Reading data from SQL into a DataFrame is quite straightforward, and pandas provides several functions that simplify the process.

Below we walk through the process using sqlite3, the SQLite module in Python's standard library.

First, create the database:

In [123]: import sqlite3  # import the built-in standard-library module
In [124]: query = """
     ...: CREATE TABLE test
     ...: (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER);"""  # SQL that creates the test table
In [125]: con = sqlite3.connect('mydata.sqlite')  # create the database file and connect to it
In [126]: con.execute(query)  # execute the SQL statement
Out[126]: <sqlite3.Cursor at 0x2417e5535e0>
In [127]: con.commit()  # commit the transaction; without the parentheses, con.commit only returns the method and nothing is committed
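It is easy to forget the parentheses on commit. One alternative is to use the connection as a context manager, which commits when the block succeeds and rolls back if it raises. The sketch below uses an in-memory database (':memory:') so it leaves no file behind. Passing a filename such as 'mydata.sqlite' instead would give an on-disk database.

```python
import sqlite3

# In-memory database, so the example is self-contained.
con = sqlite3.connect(":memory:")

# As a context manager, the connection commits on success and rolls back on
# an exception. It does NOT close the connection.
with con:
    con.execute("""
        CREATE TABLE IF NOT EXISTS test
        (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)
    """)

# sqlite_master lists schema objects, so the new table should appear here.
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```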

Next, insert a few rows:

In [129]: data = [('tom', 'male', 1.75, 20),  # records for two people and a rat
     ...:         ('mary', 'female', 1.60, 18),
     ...:         ('jerry', 'rat', 0.2, 60)]
In [130]: stmt = "INSERT INTO test VALUES(?, ?, ?, ?)"
In [131]: con.executemany(stmt, data)  # run the INSERT once for each tuple in data
Out[131]: <sqlite3.Cursor at 0x2417e4b9f80>
In [132]: con.commit()  # commit the transaction again
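The ? placeholders in stmt matter. sqlite3 binds each value itself, so strings are quoted correctly and values are never spliced into the SQL text, which is the usual defence against SQL injection. Below is a sketch of the same insert against an in-memory database, plus a placeholder used in a WHERE clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
with con:
    con.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")

rows = [("tom", "male", 1.75, 20),
        ("mary", "female", 1.60, 18),
        ("jerry", "rat", 0.2, 60)]

# executemany repeats the same parameterized INSERT once per tuple.
with con:
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)

(count,) = con.execute("SELECT COUNT(*) FROM test").fetchone()

# Placeholders work in WHERE clauses too; parameters are always a sequence,
# hence the one-element tuple.
people = con.execute("SELECT a FROM test WHERE b != ? ORDER BY a", ("rat",)).fetchall()
```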

So far we have only written to the database. Now let's read the data back:

In [133]: cursor = con.execute('select * from test')  # run a query
In [134]: rows = cursor.fetchall()  # fetch all result rows
In [135]: rows
Out[135]: [('tom', 'male', 1.75, 20), ('mary', 'female', 1.6, 18), ('jerry', 'rat', 0.2, 60)]
In [136]: cursor.description  # column names of the result, i.e. the table's column names
Out[136]:
(('a', None, None, None, None, None, None),
 ('b', None, None, None, None, None, None),
 ('c', None, None, None, None, None, None),
 ('d', None, None, None, None, None, None))
In [137]: pd.DataFrame(rows, columns=[x[0] for x in cursor.description])
Out[137]:
       a       b     c   d
0    tom    male  1.75  20
1   mary  female  1.60  18
2  jerry     rat  0.20  60

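When building the DataFrame at the end, we used a list comprehension to collect the column names. It takes the first element, the name, from each tuple in cursor.description.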
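The same list comprehension can be wrapped in a small helper. cursor_to_frame is a hypothetical name, not a pandas or sqlite3 function. It relies only on the DB-API rule that cursor.description holds one 7-item tuple per result column, with the column name first.

```python
import sqlite3

import pandas as pd


def cursor_to_frame(cursor: sqlite3.Cursor) -> pd.DataFrame:
    """Turn the remaining rows of an executed cursor into a DataFrame."""
    # Item 0 of each description tuple is the column name.
    columns = [col[0] for col in cursor.description]
    return pd.DataFrame(cursor.fetchall(), columns=columns)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")
con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)",
                [("tom", "male", 1.75, 20), ("mary", "female", 1.60, 18)])
con.commit()

df = cursor_to_frame(con.execute("SELECT a, d FROM test ORDER BY d DESC"))
```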

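That essentially completes the example, but connecting and querying this way is tedious, and you will not want to repeat every step each time. The popular Python SQL toolkit SQLAlchemy can simplify your database work. In addition, pandas provides a read_sql function that reads data easily from a general SQLAlchemy connection. A typical session looks like this: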

In [138]: import sqlalchemy as sqla
In [140]: db = sqla.create_engine('sqlite:///mydata.sqlite')  # create an engine for the database file
In [141]: pd.read_sql('select * from test', db)  # run the query and load the result as a DataFrame
Out[141]:
       a       b     c   d
0    tom    male  1.75  20
1   mary  female  1.60  18
2  jerry     rat  0.20  60

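Anaconda ships with SQLAlchemy, so you can use it right away. If your environment does not have it, install it yourself (for example with pip install sqlalchemy or conda install sqlalchemy), or look up a tutorial to learn more.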
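For SQLite in particular, SQLAlchemy is optional. pd.read_sql_query and pd.read_sql also accept a plain sqlite3 connection, and the params argument binds values to ? placeholders. Here is a sketch that rebuilds the test table in memory:

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
with con:
    con.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)",
                    [("tom", "male", 1.75, 20),
                     ("mary", "female", 1.60, 18),
                     ("jerry", "rat", 0.2, 60)])

# A sqlite3 connection is passed directly; params fills the ? placeholder
# using sqlite3's qmark style.
df = pd.read_sql_query("SELECT * FROM test WHERE d >= ? ORDER BY d", con, params=(20,))
```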
