Overview

While analyzing trading data with pandas recently, I came to understand row/column reshaping better.

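For longitudinal comparisons, switching rows and columns is a routine operation, and pandas provides methods that work on the data as a whole, which greatly reduces the amount of code you need to write.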

Walkthrough

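This walkthrough uses real cryptocurrency trade data to practice reshaping with pandas. The data was collected through the official Huobi API at regular intervals, so for coins with very high trading volume some trades may be missing.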

The full data set is large, so only one week of data is used here.

Raw data format

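The data collected from the API contains the following fields, one trade per row (trade ID, amount, price, trade timestamp, direction, symbol):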

交易ID,交易數量,價格,交易時間戳,交易方向,幣種
100003775833,4000.56,0.014146,1621148710076,sell,creusdt
......

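Huobi currently lists a little over 200 coins traded against USDT, and together they account for roughly 100 to 200 million trades per week.
Even a single week of data is very large, so it needs some preprocessing that depends on what you want to analyze.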

Price change analysis

For trading data, the most common analysis is the daily price change.
It is calculated as:

(close of the day - close of the previous day) / close of the previous day * 100%

A positive result means the price rose; a negative result means it fell.
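
As a quick check of the formula, here is a minimal sketch that applies it to two closing prices of 1inchusdt taken from the sample data in this article. The function name `pct_change` is only an illustrative choice, and it returns a fraction rather than a percentage.

```python
def pct_change(prev_close: float, close: float) -> float:
    """Return the change from prev_close to close as a fraction (0.05 means +5%)."""
    return (close - prev_close) / prev_close


# Closing prices of 1inchusdt on 2021-05-09 and 2021-05-10 from the sample data.
change = pct_change(7.049687, 6.693635)
print(f"{change:.6f}")         # -0.050506
print(f"{change * 100:.2f}%")  # -5.05%
```

The result is negative, so the coin fell by about 5% that day.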

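So the first step is to collect each coin's closing price for each day. That gives a little over 200 rows per day, or fewer than 2,000 rows for the week.
This step was not done with pandas, so it is skipped here. The prepared data looks like this (columns: trading day, symbol, high, low, close):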

交易日,幣種,最高價,最低價,收盤價
2021-05-09,1inchusdt,7.704231,6.72144,7.049687
2021-05-09,aacusdt,0.015569,0.0137,0.014916
2021-05-09,aaveusdt,475.2571,440.6347,456.919
... ...
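
The article did not use pandas for this aggregation, but it could plausibly be done there too. The sketch below is not the author's method. It assumes the timestamps are milliseconds since the epoch and that trading days are cut at UTC midnight; only the first trade row comes from the article, the other three are invented for illustration.

```python
import io

import pandas as pd

# Raw trades in the format shown earlier. Only the first row is from the article;
# the other three are invented for illustration.
raw_csv = io.StringIO(
    "交易ID,交易數量,價格,交易時間戳,交易方向,幣種\n"
    "100003775833,4000.56,0.014146,1621148710076,sell,creusdt\n"
    "100003775834,10.0,0.014200,1621150000000,buy,creusdt\n"
    "100003775835,5.0,0.014100,1621213200000,sell,creusdt\n"
    "100003775836,8.0,0.014300,1621220000000,buy,creusdt\n"
)
trades = pd.read_csv(raw_csv)

# Assumption: millisecond timestamps, trading days cut at UTC midnight.
trades["交易日"] = pd.to_datetime(trades["交易時間戳"], unit="ms").dt.strftime("%Y-%m-%d")

daily = (
    trades.sort_values("交易時間戳")  # so that "last" means the latest trade of the day
    .groupby(["交易日", "幣種"])["價格"]
    .agg(["max", "min", "last"])
    .rename(columns={"max": "最高價", "min": "最低價", "last": "收盤價"})
    .reset_index()
)
print(daily)
```

With the invented rows, this yields one row per day for creusdt, where the close is the price of the last trade of that day.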

To keep the rest of the analysis simple, I trimmed the data down to 3 coins. (The analysis is the same for the full data set.)

Step 01: Drop the unneeded columns

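The goal is to analyze each coin's daily price change, which only needs the closing price,
so the first step is to drop the high (最高價) and low (最低價) columns.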

First, start IPython and import pandas.

In [1]: import pandas as pd
In [2]: f_data = pd.read_csv("./data.csv")
In [3]: f_data
Out[3]:
           交易日         幣種       最高價       最低價       收盤價
0   2021-05-09  1inchusdt  7.704231  6.721440  7.049687
1   2021-05-09    aacusdt  0.015569  0.013700  0.014916
2   2021-05-09    zrxusdt  1.985400  1.837000  1.949000
3   2021-05-10  1inchusdt  7.205859  6.488946  6.693635
4   2021-05-10    aacusdt  0.016108  0.014216  0.016097
5   2021-05-10    zrxusdt  2.020500  1.870900  1.936400
6   2021-05-11  1inchusdt  6.706469  5.588000  6.330097
7   2021-05-11    aacusdt  0.023999  0.014787  0.017725
8   2021-05-11    zrxusdt  1.938000  1.638800  1.835800
9   2021-05-12  1inchusdt  6.914203  6.247467  6.478842
10  2021-05-12    aacusdt  0.019471  0.015502  0.016901
11  2021-05-12    zrxusdt  2.037200  1.799600  1.882900
12  2021-05-13  1inchusdt  6.523776  5.266794  5.377795
13  2021-05-13    aacusdt  0.017724  0.013801  0.014344
14  2021-05-13    zrxusdt  1.941800  1.542800  1.625700
15  2021-05-14  1inchusdt  5.939128  5.291762  5.868618
16  2021-05-14    aacusdt  0.015567  0.013966  0.015385
17  2021-05-14    zrxusdt  1.760300  1.543100  1.744100
18  2021-05-15  1inchusdt  6.047498  5.499000  5.646441
19  2021-05-15    aacusdt  0.016356  0.014308  0.015849
20  2021-05-15    zrxusdt  1.767000  1.570000  1.631500
21  2021-05-16  1inchusdt  5.645512  5.193545  5.273476
22  2021-05-16    aacusdt  0.015847  0.014411  0.015002
23  2021-05-16    zrxusdt  1.700000  1.533400  1.616500
24  2021-05-17  1inchusdt  5.339175  4.441147  4.701934
25  2021-05-17    aacusdt  0.018000  0.014000  0.016289
26  2021-05-17    zrxusdt  1.638000  1.391900  1.468800
27  2021-05-18  1inchusdt  4.884853  4.528847  4.855229
28  2021-05-18    aacusdt  0.016356  0.015042  0.015705
29  2021-05-18    zrxusdt  1.524300  1.405600  1.506800

This is all the data used in the experiment.
Dropping the high and low columns takes a single line:

In [4]: f_data = f_data[["交易日", "幣種", "收盤價"]]

In [5]: f_data
Out[5]:
           交易日         幣種       收盤價
0   2021-05-09  1inchusdt  7.049687
1   2021-05-09    aacusdt  0.014916
2   2021-05-09    zrxusdt  1.949000
3   2021-05-10  1inchusdt  6.693635
4   2021-05-10    aacusdt  0.016097
5   2021-05-10    zrxusdt  1.936400
6   2021-05-11  1inchusdt  6.330097
7   2021-05-11    aacusdt  0.017725
8   2021-05-11    zrxusdt  1.835800
9   2021-05-12  1inchusdt  6.478842
10  2021-05-12    aacusdt  0.016901
11  2021-05-12    zrxusdt  1.882900
12  2021-05-13  1inchusdt  5.377795
13  2021-05-13    aacusdt  0.014344
14  2021-05-13    zrxusdt  1.625700
15  2021-05-14  1inchusdt  5.868618
16  2021-05-14    aacusdt  0.015385
17  2021-05-14    zrxusdt  1.744100
18  2021-05-15  1inchusdt  5.646441
19  2021-05-15    aacusdt  0.015849
20  2021-05-15    zrxusdt  1.631500
21  2021-05-16  1inchusdt  5.273476
22  2021-05-16    aacusdt  0.015002
23  2021-05-16    zrxusdt  1.616500
24  2021-05-17  1inchusdt  4.701934
25  2021-05-17    aacusdt  0.016289
26  2021-05-17    zrxusdt  1.468800
27  2021-05-18  1inchusdt  4.855229
28  2021-05-18    aacusdt  0.015705
29  2021-05-18    zrxusdt  1.506800
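
Selecting the columns to keep is one way to do this. Two alternatives are sketched below: dropping the unwanted columns by name, or not loading them at all with `usecols`. The three CSV rows are copied from the sample data and inlined through `io.StringIO` so the sketch runs without `data.csv`.

```python
import io

import pandas as pd

# First three rows of the sample data, inlined so no file is needed.
csv_text = (
    "交易日,幣種,最高價,最低價,收盤價\n"
    "2021-05-09,1inchusdt,7.704231,6.72144,7.049687\n"
    "2021-05-09,aacusdt,0.015569,0.0137,0.014916\n"
    "2021-05-09,zrxusdt,1.9854,1.837,1.949\n"
)

# Option 1: drop the two unwanted columns by name.
dropped = pd.read_csv(io.StringIO(csv_text)).drop(columns=["最高價", "最低價"])

# Option 2: never load them in the first place.
selected = pd.read_csv(io.StringIO(csv_text), usecols=["交易日", "幣種", "收盤價"])

print(dropped.equals(selected))
```

For a large file, `usecols` is probably the better fit because the unused columns are never read into memory.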

Step 02: Pivot the coins into columns, one row per trading day

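To see how each coin changes from day to day, arrange each coin's closing prices in trading-day order.
To turn the coins into columns, first make the trading day (交易日) the level-1 index and the symbol (幣種) the level-2 index, then unstack the level-2 index into columns.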

In [6]: f_data = f_data.set_index(["交易日", "幣種"])["收盤價"]

In [7]: f_data = f_data.unstack()

In [8]: f_data
Out[8]:
幣種          1inchusdt   aacusdt  zrxusdt
交易日
2021-05-09   7.049687  0.014916   1.9490
2021-05-10   6.693635  0.016097   1.9364
2021-05-11   6.330097  0.017725   1.8358
2021-05-12   6.478842  0.016901   1.8829
2021-05-13   5.377795  0.014344   1.6257
2021-05-14   5.868618  0.015385   1.7441
2021-05-15   5.646441  0.015849   1.6315
2021-05-16   5.273476  0.015002   1.6165
2021-05-17   4.701934  0.016289   1.4688
2021-05-18   4.855229  0.015705   1.5068

The output shows that the column axis is named 幣種 and the index is named 交易日. The column axis name is not needed and can be removed with the following code:

In [9]: f_data = f_data.rename_axis(columns=None)

In [10]: f_data
Out[10]:
            1inchusdt   aacusdt  zrxusdt
交易日
2021-05-09   7.049687  0.014916   1.9490
2021-05-10   6.693635  0.016097   1.9364
2021-05-11   6.330097  0.017725   1.8358
2021-05-12   6.478842  0.016901   1.8829
2021-05-13   5.377795  0.014344   1.6257
2021-05-14   5.868618  0.015385   1.7441
2021-05-15   5.646441  0.015849   1.6315
2021-05-16   5.273476  0.015002   1.6165
2021-05-17   4.701934  0.016289   1.4688
2021-05-18   4.855229  0.015705   1.5068
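
The same reshaping can also be written with `DataFrame.pivot`. The sketch below uses four rows from the sample data and checks that both routes give the same wide table; the variable names are illustrative.

```python
import pandas as pd

# Four rows from the sample data, in the long format produced by step 01.
long_df = pd.DataFrame({
    "交易日": ["2021-05-09", "2021-05-09", "2021-05-10", "2021-05-10"],
    "幣種": ["1inchusdt", "aacusdt", "1inchusdt", "aacusdt"],
    "收盤價": [7.049687, 0.014916, 6.693635, 0.016097],
})

# Route used in this article: two-level index, then unstack the inner level.
via_unstack = long_df.set_index(["交易日", "幣種"])["收盤價"].unstack()

# Equivalent route: name the index, columns and values directly.
via_pivot = long_df.pivot(index="交易日", columns="幣種", values="收盤價")
same = via_pivot.equals(via_unstack)

# Same cleanup as above: remove the name of the column axis.
wide = via_pivot.rename_axis(columns=None)
print(same)
print(wide)
```

Both routes raise an error if a (交易日, 幣種) pair appears more than once, so duplicates have to be resolved before reshaping.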

Step 03: Compute the daily price change

With the data in this shape, computing the price change takes a single line.

In [11]: f_data = f_data.pct_change()

In [12]: f_data
Out[12]:
            1inchusdt   aacusdt   zrxusdt
交易日
2021-05-09        NaN       NaN       NaN
2021-05-10  -0.050506  0.079177 -0.006465
2021-05-11  -0.054311  0.101137 -0.051952
2021-05-12   0.023498 -0.046488  0.025656
2021-05-13  -0.169945 -0.151293 -0.136598
2021-05-14   0.091268  0.072574  0.072830
2021-05-15  -0.037858  0.030159 -0.064561
2021-05-16  -0.066053 -0.053442 -0.009194
2021-05-17  -0.108381  0.085789 -0.091370
2021-05-18   0.032603 -0.035852  0.025871

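The first row has no previous day to compare against, so it has no price change (NaN). Note that pct_change returns a fraction rather than a percentage: -0.050506 means -5.0506%.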
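
To connect `pct_change` back to the formula given earlier, the sketch below computes the same thing by hand with `shift`, which moves every row down by one so that each day lines up with the previous day's close. The prices are the first three days of two coins from the sample data.

```python
import pandas as pd

closes = pd.DataFrame(
    {
        "1inchusdt": [7.049687, 6.693635, 6.330097],
        "zrxusdt": [1.9490, 1.9364, 1.8358],
    },
    index=pd.Index(["2021-05-09", "2021-05-10", "2021-05-11"], name="交易日"),
)

prev = closes.shift(1)            # previous day's close; the first row becomes NaN
manual = (closes - prev) / prev   # (close - previous close) / previous close
builtin = closes.pct_change()

print(manual.round(6))
```

The two results agree up to floating-point rounding, and the first row is NaN in both.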

Step 04: Drop the invalid first day

In [13]: f_data = f_data.drop(index=["2021-05-09"])

In [14]: f_data
Out[14]:
            1inchusdt   aacusdt   zrxusdt
交易日
2021-05-10  -0.050506  0.079177 -0.006465
2021-05-11  -0.054311  0.101137 -0.051952
2021-05-12   0.023498 -0.046488  0.025656
2021-05-13  -0.169945 -0.151293 -0.136598
2021-05-14   0.091268  0.072574  0.072830
2021-05-15  -0.037858  0.030159 -0.064561
2021-05-16  -0.066053 -0.053442 -0.009194
2021-05-17  -0.108381  0.085789 -0.091370
2021-05-18   0.032603 -0.035852  0.025871
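
Dropping by label works here because the first date is known. Two alternatives that avoid hard-coding the date are sketched below: slicing off the first row by position, or dropping rows in which every value is NaN. The small frame reuses values from the output above.

```python
import pandas as pd

nan = float("nan")
changes = pd.DataFrame(
    {
        "1inchusdt": [nan, -0.050506, -0.054311],
        "aacusdt": [nan, 0.079177, 0.101137],
    },
    index=pd.Index(["2021-05-09", "2021-05-10", "2021-05-11"], name="交易日"),
)

by_position = changes.iloc[1:]            # skip the first row, whatever its date is
by_content = changes.dropna(how="all")    # drop rows where every coin is NaN

print(by_position.equals(by_content))
```

Note that `dropna(how="all")` would also remove any later day on which all coins are missing, which may or may not be what you want.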

Step 05: Turn the coin columns back into rows so the price changes can be sorted

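Convert the data back to the original long format so the price changes can be sorted.

After stacking, the index needs to be reset so that 交易日 becomes a regular column again;
at this point it is still the index.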

In [16]: f_data = f_data.stack()
In [17]: f_data = f_data.reset_index()

In [18]: f_data
Out[18]:
           交易日    level_1         0
0   2021-05-10  1inchusdt -0.050506
1   2021-05-10    aacusdt  0.079177
2   2021-05-10    zrxusdt -0.006465
3   2021-05-11  1inchusdt -0.054311
4   2021-05-11    aacusdt  0.101137
5   2021-05-11    zrxusdt -0.051952
6   2021-05-12  1inchusdt  0.023498
7   2021-05-12    aacusdt -0.046488
8   2021-05-12    zrxusdt  0.025656
9   2021-05-13  1inchusdt -0.169945
10  2021-05-13    aacusdt -0.151293
11  2021-05-13    zrxusdt -0.136598
12  2021-05-14  1inchusdt  0.091268
13  2021-05-14    aacusdt  0.072574
14  2021-05-14    zrxusdt  0.072830
15  2021-05-15  1inchusdt -0.037858
16  2021-05-15    aacusdt  0.030159
17  2021-05-15    zrxusdt -0.064561
18  2021-05-16  1inchusdt -0.066053
19  2021-05-16    aacusdt -0.053442
20  2021-05-16    zrxusdt -0.009194
21  2021-05-17  1inchusdt -0.108381
22  2021-05-17    aacusdt  0.085789
23  2021-05-17    zrxusdt -0.091370
24  2021-05-18  1inchusdt  0.032603
25  2021-05-18    aacusdt -0.035852
26  2021-05-18    zrxusdt  0.025871

Rename the columns: level_1 -> 幣種, 0 -> 漲跌幅 (price change).

In [20]: f_data = f_data.rename(columns={"level_1": "幣種", 0: "漲跌幅"})

In [21]: f_data
Out[21]:
           交易日         幣種       漲跌幅
0   2021-05-10  1inchusdt -0.050506
1   2021-05-10    aacusdt  0.079177
2   2021-05-10    zrxusdt -0.006465
3   2021-05-11  1inchusdt -0.054311
4   2021-05-11    aacusdt  0.101137
5   2021-05-11    zrxusdt -0.051952
6   2021-05-12  1inchusdt  0.023498
7   2021-05-12    aacusdt -0.046488
8   2021-05-12    zrxusdt  0.025656
9   2021-05-13  1inchusdt -0.169945
10  2021-05-13    aacusdt -0.151293
11  2021-05-13    zrxusdt -0.136598
12  2021-05-14  1inchusdt  0.091268
13  2021-05-14    aacusdt  0.072574
14  2021-05-14    zrxusdt  0.072830
15  2021-05-15  1inchusdt -0.037858
16  2021-05-15    aacusdt  0.030159
17  2021-05-15    zrxusdt -0.064561
18  2021-05-16  1inchusdt -0.066053
19  2021-05-16    aacusdt -0.053442
20  2021-05-16    zrxusdt -0.009194
21  2021-05-17  1inchusdt -0.108381
22  2021-05-17    aacusdt  0.085789
23  2021-05-17    zrxusdt -0.091370
24  2021-05-18  1inchusdt  0.032603
25  2021-05-18    aacusdt -0.035852
26  2021-05-18    zrxusdt  0.025871
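
The placeholder names `level_1` and `0` appear because neither the column axis nor the stacked values have a name. As a sketch of an alternative, naming both up front gives the final column names without a separate rename. The frame below reuses two days of values from the output above.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "1inchusdt": [-0.050506, -0.054311],
        "aacusdt": [0.079177, 0.101137],
    },
    index=pd.Index(["2021-05-10", "2021-05-11"], name="交易日"),
)

long_df = (
    wide.rename_axis(columns="幣種")  # name the column axis so the new level is labelled
    .stack()                          # columns become the inner index level
    .reset_index(name="漲跌幅")       # flatten the index and name the value column
)
print(long_df)
```

If the column axis name had been kept in step 02 instead of removed, the `rename_axis` call here would not be needed.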

Sort the daily price changes of all coins in ascending order.

In [22]: f_data = f_data.sort_values(by=["漲跌幅"])

In [23]: f_data
Out[23]:
           交易日         幣種       漲跌幅
9   2021-05-13  1inchusdt -0.169945
10  2021-05-13    aacusdt -0.151293
11  2021-05-13    zrxusdt -0.136598
21  2021-05-17  1inchusdt -0.108381
23  2021-05-17    zrxusdt -0.091370
18  2021-05-16  1inchusdt -0.066053
17  2021-05-15    zrxusdt -0.064561
3   2021-05-11  1inchusdt -0.054311
19  2021-05-16    aacusdt -0.053442
5   2021-05-11    zrxusdt -0.051952
0   2021-05-10  1inchusdt -0.050506
7   2021-05-12    aacusdt -0.046488
15  2021-05-15  1inchusdt -0.037858
25  2021-05-18    aacusdt -0.035852
20  2021-05-16    zrxusdt -0.009194
2   2021-05-10    zrxusdt -0.006465
6   2021-05-12  1inchusdt  0.023498
8   2021-05-12    zrxusdt  0.025656
26  2021-05-18    zrxusdt  0.025871
16  2021-05-15    aacusdt  0.030159
24  2021-05-18  1inchusdt  0.032603
13  2021-05-14    aacusdt  0.072574
14  2021-05-14    zrxusdt  0.072830
1   2021-05-10    aacusdt  0.079177
22  2021-05-17    aacusdt  0.085789
12  2021-05-14  1inchusdt  0.091268
4   2021-05-11    aacusdt  0.101137

As you can see, the index is out of order after sorting. That does not matter, because only the data is exported in the end, not the index.
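
If a per-day ranking is wanted instead of one global ordering, one option is to sort by trading day first and price change second. The sketch below also passes `ignore_index=True` (available since pandas 1.0) so the result gets a fresh 0..n-1 index. The rows are two days taken from the data above.

```python
import pandas as pd

changes = pd.DataFrame({
    "交易日": ["2021-05-10"] * 3 + ["2021-05-11"] * 3,
    "幣種": ["1inchusdt", "aacusdt", "zrxusdt"] * 2,
    "漲跌幅": [-0.050506, 0.079177, -0.006465, -0.054311, 0.101137, -0.051952],
})

# Sort by day, then by price change within each day, and renumber the rows.
ranked = changes.sort_values(by=["交易日", "漲跌幅"], ignore_index=True)
print(ranked)
```

Within each day the rows now run from the biggest loser to the biggest gainer.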

Step 06: Export the data

In [24]: f_data.to_csv("./data-result.csv", index=False)

In [25]: cat ./data-result.csv
交易日,幣種,漲跌幅
2021-05-13,1inchusdt,-0.16994503030016783
2021-05-13,aacusdt,-0.15129282290988688
2021-05-13,zrxusdt,-0.13659780126400767
2021-05-17,1inchusdt,-0.10838050651979836
2021-05-17,zrxusdt,-0.09137024435508811
2021-05-16,1inchusdt,-0.0660531120399559
2021-05-15,zrxusdt,-0.06456051831890375
2021-05-11,1inchusdt,-0.054310998433586444
2021-05-16,aacusdt,-0.053441857530443504
2021-05-11,zrxusdt,-0.051952076017351634
2021-05-10,1inchusdt,-0.05050607211355629
2021-05-12,aacusdt,-0.046488011283498
2021-05-15,1inchusdt,-0.03785848729632757
2021-05-18,aacusdt,-0.035852415740683985
2021-05-16,zrxusdt,-0.009193993257738176
2021-05-10,zrxusdt,-0.006464853771164791
2021-05-12,1inchusdt,0.023498060140310528
2021-05-12,zrxusdt,0.025656389584922
2021-05-18,zrxusdt,0.02587145969498894
2021-05-15,aacusdt,0.03015924601884956
2021-05-18,1inchusdt,0.03260254184767364
2021-05-14,aacusdt,0.07257389849414375
2021-05-14,zrxusdt,0.07283016546718346
2021-05-10,aacusdt,0.0791767229820326
2021-05-17,aacusdt,0.08578856152513015
2021-05-14,1inchusdt,0.09126844738410433
2021-05-11,aacusdt,0.10113685779959014
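
The exported values carry the full float precision. If shorter numbers are preferred, `to_csv` accepts a `float_format` argument. The sketch below uses two rows from the export above and returns the CSV as a string instead of writing a file, which `to_csv` does when no path is given.

```python
import pandas as pd

result = pd.DataFrame({
    "交易日": ["2021-05-13", "2021-05-13"],
    "幣種": ["1inchusdt", "aacusdt"],
    "漲跌幅": [-0.16994503030016783, -0.15129282290988688],
})

# No path given, so the CSV text is returned; floats are written with 6 decimals.
csv_text = result.to_csv(index=False, float_format="%.6f")
print(csv_text)
```

Whether six decimals are enough depends on how the exported file is used downstream.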

Visualization

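That completes the basic analysis. The exported data can now be used for visualization; an animated version built with antd is available on the databook video channel (視頻號).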

Summary

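The deeper I get into pandas, the more pleasant surprises I find. My strongest takeaway is to treat the data as a whole and to transform and compute it by manipulating its index and columns.
Let go of the usual programming habit of parsing each row, pulling out each cell, and processing everything in a loop.
It feels somewhat like writing SQL against a database, although a pandas DataFrame is far more flexible than a database table.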

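The example above also shows that although pandas works on tabular data, a multi-level index lets it handle data with more than two dimensions.
Through the index, the data can express more levels of structure.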
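
As a small illustration of that point, the sketch below stores closing prices in a Series with a two-level index (trading day, symbol) and slices it along either level. The values come from the sample data; the variable names are illustrative.

```python
import pandas as pd

closes = pd.Series(
    [7.049687, 0.014916, 6.693635, 0.016097],
    index=pd.MultiIndex.from_product(
        [["2021-05-09", "2021-05-10"], ["1inchusdt", "aacusdt"]],
        names=["交易日", "幣種"],
    ),
    name="收盤價",
)

one_day = closes.loc["2021-05-09"]              # all coins on one day
one_coin = closes.xs("aacusdt", level="幣種")   # one coin across all days

print(one_day)
print(one_coin)
```

Each slice removes one index level, so what remains is indexed by the other level.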
