SciPy Study Notes, Part 2: Computing Distances

Reposted article, 2018-12-26 10:58. Tags: SciPy, distance, Python

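The SciPy module for computing distances is scipy.spatial.distance. Its most common use is computing a distance matrix, that is, the distances between observation vectors stored as the rows of a rectangular array.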

1. Pairwise distances

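Given observations in n-dimensional space, compute the distance between every pair of them. The larger the distance, the less similar the two observations are.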

scipy.spatial.distance.pdist(X, metric='euclidean', **kwargs)

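The function name is short for Pairwise DISTance. For a two-dimensional array, pdist() computes the distance between every pair of rows and returns the results as a one-dimensional (condensed) distance vector.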
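As a minimal sketch of the row-by-row pairing described above, the following computes Euclidean distances for a small array. The values in X are made up purely for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Four observations (rows) in 2-dimensional space; illustrative values only.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0],
              [0.0, 1.0]])

# Distance between every pair of rows (i, j) with i < j,
# ordered as (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
d = pdist(X, metric='euclidean')

m = X.shape[0]
print(d.shape)              # one entry per unordered pair of rows
print(m * (m - 1) // 2)     # number of unordered pairs among m rows
print(d)
```

For m observations the result has m * (m - 1) / 2 entries, because each unordered pair of rows is counted once and a row is never paired with itself.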

Parameter notes:

X: ndarray; an m-by-n array holding m observations in n-dimensional space, one observation per row.
metric: the distance metric to use, given as a string name or a callable. The string values listed are 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'. The exact set of accepted names depends on the installed SciPy version.
**kwargs: dict; extra arguments for the metric, mainly:
- p : scalar. The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.
- w : ndarray. The weight vector for metrics that support weights (e.g., Minkowski).
- V : ndarray. The variance vector for standardized Euclidean. Default: var(X, axis=0, ddof=1).
- VI : ndarray. The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(X.T)).T.
- out : ndarray. The output array. If not None, the condensed distance matrix Y is stored in this array. Note: this argument is independent of the metric; the SciPy documentation at the time of writing stated that it would become a regular keyword argument in a future version.
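The sketch below shows how the extra keyword arguments listed above might be passed. The data are random and purely illustrative. Passing V and VI explicitly is assumed here to reproduce the documented defaults, and the callable metric is a hand-written stand-in for 'chebyshev'.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Six random observations in 3-dimensional space (illustrative data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

# p: Minkowski with p=1 is the city-block (Manhattan) distance.
d_mink1 = pdist(X, 'minkowski', p=1)
d_city = pdist(X, 'cityblock')

# V: per-column variances for the standardized Euclidean distance.
V = np.var(X, axis=0, ddof=1)
d_seu = pdist(X, 'seuclidean', V=V)

# VI: inverse covariance matrix for the Mahalanobis distance.
VI = np.linalg.inv(np.cov(X.T))
d_mah = pdist(X, 'mahalanobis', VI=VI)

# metric may also be a callable taking two 1-D vectors.
d_custom = pdist(X, lambda u, v: np.max(np.abs(u - v)))

print(np.allclose(d_mink1, d_city))
print(np.allclose(d_seu, pdist(X, 'seuclidean')))
print(np.allclose(d_mah, pdist(X, 'mahalanobis')))
print(np.allclose(d_custom, pdist(X, 'chebyshev')))
```

A callable metric is flexible but is invoked once per pair from Python, so the built-in string names are generally the faster choice when one fits.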

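2. Distances between pairs drawn from two collections

Pair up observations from two input collections and compute the distance within each pair: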

scipy.spatial.distance.cdist(XA, XB, metric='euclidean', *args, **kwargs)

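XA and XB are both ndarrays of observations in n-dimensional space, so they must have the same number of columns. Each row of XA is paired with each row of XB, and the distance between the two rows of every pair is computed. If XA has mA rows and XB has mB rows, the result is an mA-by-mB array.

For example, let XA be an 8-by-3 array and XB a 1-by-3 array. Every row of XA is paired with the single row of XB, and the distance within each pair is computed: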

>>> import numpy as np
>>> from scipy.spatial import distance
>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1,  0.2,  0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])
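The example above uses a single row for XB. As a further sketch, with made-up values, the following uses two rows in XB and checks the result against a direct NumPy broadcasting computation of the city-block distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative inputs: 3 observations in XA, 2 in XB, both 2-dimensional.
XA = np.array([[0.0, 0.0],
               [1.0, 1.0],
               [2.0, 0.0]])
XB = np.array([[0.0, 1.0],
               [2.0, 2.0]])

# D[i, j] is the city-block distance between XA[i] and XB[j].
D = cdist(XA, XB, 'cityblock')

# The same quantity via broadcasting: sum of absolute coordinate differences.
D_manual = np.abs(XA[:, None, :] - XB[None, :, :]).sum(axis=2)

print(D.shape)                    # (rows of XA, rows of XB)
print(D)
print(np.allclose(D, D_manual))
```

Unlike pdist(), which returns a condensed vector for pairs within one collection, cdist() returns a full rectangular array because the two collections are distinct.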

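3. Converting between distance vectors and distance matrices

squareform() converts a distance vector (the condensed form) into a square distance matrix (the redundant, dense form), and also converts a square distance matrix back into a distance vector: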

scipy.spatial.distance.squareform(X, force='no', checks=True)

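When computing the distance matrix of a sample set, squareform() and pdist() often appear together: the return value of pdist() is passed to squareform(), which expands the one-dimensional result into a full matrix.

For example, take the matrix a below. It is symmetric about the diagonal and its diagonal elements are all 0, as is typical of a distance matrix.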
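A minimal sketch of that pdist() plus squareform() pattern, using three made-up points on a line so that the distances are easy to check by hand:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three illustrative observations in 2-dimensional space.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

condensed = pdist(X)          # 1-D vector, pairs (0,1), (0,2), (1,2)
D = squareform(condensed)     # 3 x 3 symmetric matrix with a zero diagonal

print(condensed)
print(D)

# D[i, j] can now be indexed directly by observation number.
print(D[0, 2])
```

The square form stores every distance twice and the zeros on the diagonal, so the condensed vector is the more compact representation when direct (i, j) indexing is not needed.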

import numpy as np
from scipy.spatial import distance as dist

a = np.array([[ 0,  2,  3,  4],
              [ 2,  0,  7,  8],
              [ 3,  7,  0, 12],
              [ 4,  8, 12,  0]])

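Calling squareform() on the matrix a unrolls the elements of its lower triangle column by column into a one-dimensional array: first the first column of the lower triangle, 2, 3, 4; then the second column, 7, 8; and finally the third column, 12. The output v is therefore array([ 2, 3, 4, 7, 8, 12]). Because a is symmetric, this is the same as reading the upper triangle row by row.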

v=dist.squareform(a)
v
array([ 2,  3,  4,  7,  8, 12])
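The ordering described above implies a fixed position in the condensed vector for each pair (i, j). The sketch below uses the index formula given in the SciPy documentation for pdist(); the helper name condensed_index is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy.spatial.distance import squareform

def condensed_index(i, j, m):
    """Position of the pair (i, j), i != j, in a condensed distance
    vector for m observations. Hypothetical helper, not a SciPy function."""
    if i == j:
        raise ValueError("the diagonal is not stored in the condensed form")
    if i > j:
        i, j = j, i               # the matrix is symmetric, so order is irrelevant
    return m * i + j - ((i + 2) * (i + 1)) // 2

a = np.array([[ 0,  2,  3,  4],
              [ 2,  0,  7,  8],
              [ 3,  7,  0, 12],
              [ 4,  8, 12,  0]])
v = squareform(a)
m = a.shape[0]

# Every off-diagonal entry of a is found at the computed position in v.
for i in range(m):
    for j in range(m):
        if i != j:
            assert v[condensed_index(i, j, m)] == a[i, j]

print(condensed_index(1, 3, m), v[condensed_index(1, 3, m)])
```

This makes it possible to look up a single distance without expanding the condensed vector into the full square matrix.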

Conversely, passing v to squareform() yields the redundant matrix b, which restores the original distance matrix a.

b=dist.squareform(v)
b
array([[ 0,  2,  3,  4],
       [ 2,  0,  7,  8],
       [ 3,  7,  0, 12],
       [ 4,  8, 12,  0]])

Reference documentation:

scipy.spatial.distance
