1. Creating a MultiIndex

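A MultiIndex object is an extension of the standard Index object. You can think of a MultiIndex as an array of tuples in which each tuple is unique. A MultiIndex can be created from a list of arrays (MultiIndex.from_arrays()), an array of tuples (MultiIndex.from_tuples()), a crossed set of iterables (MultiIndex.from_product()), or a DataFrame (MultiIndex.from_frame()). When passed a list of tuples, the Index constructor will attempt to return a MultiIndex.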
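The two creation paths mentioned above that are not demonstrated later in this section can be sketched as follows. This is a minimal illustration, assuming a pandas version that provides MultiIndex.from_frame; the small frame and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame; each row becomes one entry of the index.
frame = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)
index_from_frame = pd.MultiIndex.from_frame(frame)

# Passing a list of tuples to the plain Index constructor also yields a MultiIndex.
index_from_tuples = pd.Index(
    [("bar", "one"), ("bar", "two"), ("foo", "one"), ("foo", "two")]
)

print(type(index_from_tuples).__name__)
print(index_from_frame.names)
```

With from_frame, the column names of the frame become the level names of the index.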

1. from_tuples

First, create a list of tuples:

import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
            ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
print(tuples)

[('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two'), ('foo', 'one'), ('foo', 'two'), ('qux', 'one'), ('qux', 'two')]

Convert the list of tuples into a MultiIndex:

MultiIndex.from_tuples(tuples, sortorder=None, names=None)

index = pd.MultiIndex.from_tuples(tuples, names=('first', 'second'))
index

MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])

Create a Series and use the MultiIndex as its index:

import numpy as np

s = pd.Series(np.random.randn(8), index=index)
s

first  second
bar    one      -1.219790
       two      -1.299911
baz    one      -0.707801
       two       0.280277
foo    one       0.683006
       two       1.279083
qux    one      -0.659377
       two       0.095253
dtype: float64

Create a DataFrame and use the MultiIndex as its index:

df = pd.DataFrame(np.random.randint(1, 10, (8, 5)), index=index)
df
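As a rough sketch of what the resulting frame offers, the example below builds a smaller DataFrame with fixed values (instead of random ones, so the results are reproducible) and looks rows up through the index levels. The four-row index and the column names "a", "b", "c" are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
# Fixed values 0..11 arranged as 4 rows by 3 columns.
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=index, columns=["a", "b", "c"])

print(df.index.nlevels)                 # number of index levels
bar_rows = df.loc["bar"]                # all rows whose first level is "bar"
one_cell = df.loc[("baz", "two"), "b"]  # one cell, addressed by the full tuple key
print(bar_rows)
print(one_cell)
```

Selecting by the first level alone returns a frame indexed by the remaining level, while a full tuple key addresses a single row.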

2. from_arrays

If from_tuples takes a list of "rows", then from_arrays takes a list of "columns":

MultiIndex.from_arrays(arrays, sortorder=None, names=None)

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
            ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

index = pd.MultiIndex.from_arrays(arrays)
s = pd.Series(np.random.randn(8), index=index)
s

bar  one    0.817429
     two    0.248518
baz  one   -0.684833
     two    0.437428
foo  one   -0.019753
     two   -1.035943
qux  one    1.602173
     two   -1.592012
dtype: float64
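The "rows" versus "columns" relationship can be checked directly. The sketch below, using a shortened version of the arrays above, builds the same index both ways and compares the results.

```python
import pandas as pd

arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]

# "Column" view: one array per index level.
by_arrays = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

# "Row" view: transpose the arrays into one tuple per index entry.
tuples = list(zip(*arrays))
by_tuples = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

print(by_arrays.equals(by_tuples))
```

Which constructor to use mostly depends on the shape in which the labels are already available.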

For convenience, a list of arrays can usually be passed directly to the Series constructor:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
            ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

s = pd.Series(np.random.randn(8), index=arrays)
s

bar  one   -0.674630
     two   -0.823605
baz  one    0.553978
     two   -1.315951
foo  one    1.318207
     two   -3.419469
qux  one   -0.618415
     two    1.216639
dtype: float64
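A short check, under the same setup but with shortened arrays and fixed values, suggests that this shortcut produces a regular MultiIndex whose levels are simply unnamed. The names can be assigned afterwards if needed.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
s = pd.Series(np.arange(4), index=arrays)

print(isinstance(s.index, pd.MultiIndex))
print(s.index.names)  # the levels have no names yet

# Names can be attached after construction.
s.index.names = ["first", "second"]
print(s.index.names)
```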

3. from_product

Given two lists, pairing every element of the first with every element of the second gives the product of the two lists:

MultiIndex.from_product(iterables, sortorder=None, names=None)

lists = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]

index = pd.MultiIndex.from_product(lists, names=['first', 'second'])
s = pd.Series(np.random.randn(len(index)), index=index)
s

first  second
bar    one       1.131558
       two      -2.186842
baz    one      -0.045946
       two       1.376054
foo    one       1.384699
       two      -0.141007
qux    one      -0.474400
       two       1.402611
dtype: float64
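To make the notion of a product concrete, the sketch below compares the entries of the index with the output of itertools.product from the standard library. The variable name expected is introduced here for illustration.

```python
import itertools

import pandas as pd

lists = [["bar", "baz", "foo", "qux"], ["one", "two"]]
index = pd.MultiIndex.from_product(lists, names=["first", "second"])

# Every element of the first list paired with every element of the second.
expected = list(itertools.product(*lists))

print(len(index))
print(list(index) == expected)
```

The length of the index is the product of the lengths of the input lists, here 4 * 2 = 8.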

2. Feature selection as part of a pipeline

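Feature selection is usually applied as a preprocessing step before the actual learning. The recommended way to do this in scikit-learn is to use sklearn.pipeline.Pipeline: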

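from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
# X, y are the training data, defined elsewhere; dual=False keeps the L1 penalty valid where dual defaults to True
clf = Pipeline([
  ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
  ('classification', RandomForestClassifier())
])
clf.fit(X, y)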

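In this snippet, sklearn.svm.LinearSVC is combined with sklearn.feature_selection.SelectFromModel to evaluate feature importances and select the most relevant features. A RandomForestClassifier is then trained on the selected features.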
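For a version that runs end to end, the sketch below fits the same kind of pipeline on synthetic data. The dataset from make_classification, the value C=0.1, n_estimators=50 and the random_state settings are all assumptions chosen for illustration, not values from the original text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic data (assumption): 20 features, of which 5 are informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

clf = Pipeline([
    # The L1 penalty drives some coefficients to zero; dual=False is needed for it.
    ("feature_selection",
     SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.1, random_state=0))),
    ("classification", RandomForestClassifier(n_estimators=50, random_state=0)),
])
clf.fit(X, y)

# Boolean mask of the features kept by the selection step.
mask = clf.named_steps["feature_selection"].get_support()
print(mask.sum(), "of", mask.size, "features kept")
```

Because the selection step lives inside the pipeline, it is refitted together with the classifier, which matters when the pipeline is cross-validated.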

Source: Python開發最佳實踐
