(from:http://en.wikipedia.org/wiki/Mahalanobis_distance)
Mahalanobis distance
In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables by which different patterns can be identified and analyzed. It gauges similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant. In other words, it is a multivariate effect size.
Definition
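Formally, the Mahalanobis distance of a multivariate vector $x = (x_1, x_2, \dots, x_N)^T$ from a group of values with mean $\mu = (\mu_1, \mu_2, \dots, \mu_N)^T$ and covariance matrix $S$ is defined as:
$$D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$$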
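(Note: 1. This is the Mahalanobis distance between x and the population mean. 2. Here S is assumed to be invertible; what should be done when the covariance matrix is not invertible?)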
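As a minimal sketch of the definition, the distance from a vector to the group mean can be computed with NumPy as below. The function names and the numbers are hypothetical and chosen only for illustration. The second function swaps the inverse for the Moore-Penrose pseudo-inverse, which is one commonly used workaround when the covariance matrix is singular; it is an assumption added here, not part of the definition.

```python
import numpy as np

def mahalanobis_to_mean(x, mu, S):
    """sqrt((x - mu)^T S^{-1} (x - mu)) for an invertible covariance matrix S."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve S z = diff rather than forming S^{-1} explicitly.
    z = np.linalg.solve(np.asarray(S, dtype=float), diff)
    return float(np.sqrt(diff @ z))

def mahalanobis_to_mean_pinv(x, mu, S):
    """Same expression with the pseudo-inverse of S in place of S^{-1}.

    Assumed workaround for a singular S; it ignores directions in which
    the data has zero variance.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    q = diff @ np.linalg.pinv(np.asarray(S, dtype=float)) @ diff
    # Clamp tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(q, 0.0)))

# Hypothetical example values.
mu = np.array([0.0, 0.0])
S = np.array([[4.0, 0.0],
              [0.0, 1.0]])
x = np.array([2.0, 1.0])
d = mahalanobis_to_mean(x, mu, S)  # sqrt(2^2 / 4 + 1^2 / 1)

# A singular covariance matrix: the two variables are perfectly correlated.
S_singular = np.array([[1.0, 1.0],
                       [1.0, 1.0]])
d_pinv = mahalanobis_to_mean_pinv([1.0, 1.0], mu, S_singular)
```

With an invertible S the two functions agree up to rounding, so the pseudo-inverse version only changes the result in the singular case.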
Mahalanobis distance (or "generalized squared interpoint distance" for its squared value) can also be defined as a dissimilarity measure between two random vectors $\vec{x}$ and $\vec{y}$ of the same distribution with the covariance matrix $S$:
$$d(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^T S^{-1} (\vec{x} - \vec{y})}$$
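The two-vector form can be sketched in the same way. In the example below the covariance matrix is estimated from a small sample with `np.cov`; the sample values and the function name `mahalanobis_between` are made up for illustration, and the sample covariance is assumed to be invertible.

```python
import numpy as np

def mahalanobis_between(x, y, S):
    """sqrt((x - y)^T S^{-1} (x - y)) for an invertible covariance matrix S."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(np.asarray(S, dtype=float), diff)))

# Hypothetical sample set: one observation per row, one variable per column.
samples = np.array([[1.0, 2.0],
                    [2.0, 1.0],
                    [3.0, 5.0],
                    [4.0, 3.0],
                    [5.0, 6.0]])

# Sample covariance matrix of the two variables.
S = np.cov(samples, rowvar=False)

d_xy = mahalanobis_between(samples[0], samples[1], S)
d_yx = mahalanobis_between(samples[1], samples[0], S)
d_xx = mahalanobis_between(samples[0], samples[0], S)
```

The distance is symmetric in its two arguments and is zero when both vectors coincide, which the values `d_xy`, `d_yx` and `d_xx` illustrate.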
If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, then the resulting distance measure is called the normalized Euclidean distance:
$$d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{N} \frac{(x_i - y_i)^2}{\sigma_i^2}}$$
where $\sigma_i$ is the standard deviation of the $x_i$ ($y_i$) over the sample set.
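Both special cases can be checked numerically. The sketch below uses made-up vectors and standard deviations, and the helper names are hypothetical. It compares the general formula with the plain Euclidean norm when the covariance matrix is the identity, and with the normalized Euclidean distance when the covariance matrix is diagonal with the variances on the diagonal.

```python
import numpy as np

def mahalanobis_between(x, y, S):
    """sqrt((x - y)^T S^{-1} (x - y)) for an invertible covariance matrix S."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(np.asarray(S, dtype=float), diff)))

def normalized_euclidean(x, y, sigma):
    """sqrt(sum_i (x_i - y_i)^2 / sigma_i^2), sigma_i being per-variable std devs."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((diff / np.asarray(sigma, dtype=float)) ** 2)))

# Hypothetical example values.
x = np.array([1.0, 4.0, 2.0])
y = np.array([3.0, 1.0, 2.5])
sigma = np.array([2.0, 3.0, 0.5])

# Identity covariance: the result matches the Euclidean distance.
d_identity = mahalanobis_between(x, y, np.eye(3))
d_euclid = float(np.linalg.norm(x - y))

# Diagonal covariance diag(sigma_i^2): the result matches the normalized
# Euclidean distance.
d_diag = mahalanobis_between(x, y, np.diag(sigma ** 2))
d_norm = normalized_euclidean(x, y, sigma)
```

Here each coordinate difference is exactly one standard deviation in size, so the normalized distance is the square root of 3 while the Euclidean distance is dominated by the coordinate with the largest raw difference.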
(Source: Baidu Baike)
Advantages and disadvantages of the Mahalanobis distance: