The scale and unscale functions in R


1. The scale function

R's base package ships with scale, a built-in function for standardizing data. Its documentation reads as follows:

Usage

scale(x, center = TRUE, scale = TRUE)

 

Arguments

x: a numeric matrix(like object).

center: either a logical value or a numeric vector of length equal to the number of columns of x.

scale: either a logical value or a numeric vector of length equal to the number of columns of x.
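As a small illustration of the numeric-vector form of center and scale, the sketch below passes one value per column. The matrix m and the chosen centers and divisors are made up for this example.

```r
# Illustrative 3 x 2 matrix (values chosen so the arithmetic is easy to follow)
m <- matrix(c(10, 12, 14, 100, 120, 140), ncol = 2)

# One center and one divisor per column:
# column 1 becomes (x - 10) / 2, column 2 becomes (x - 100) / 20
custom <- scale(m, center = c(10, 100), scale = c(2, 20))

print(custom)
```

Both vectors must have the same length as the number of columns of m; otherwise scale stops with an error.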

 

Details

The value of center determines how column centering is performed. If center is a numeric vector with length equal to the number of columns of x, then each column of x has the corresponding value from center subtracted from it. If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.

The value of scale determines how column scaling is performed (after centering). If scale is a numeric vector with length equal to the number of columns of x, then each column of x is divided by the corresponding value from scale. If scale is TRUE then scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and the root mean square otherwise. If scale is FALSE, no scaling is done.

The root-mean-square for a (possibly centered) column is defined as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing values and n is the number of non-missing values. In the case center = TRUE, this is the same as the standard deviation, but in general it is not. (To scale by the standard deviations without centering, use scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE)).)
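The difference between the root mean square and the standard deviation is easy to check by hand. The following is a minimal sketch; the matrix m and the variable names are only for illustration.

```r
# Illustrative matrix: column 1 is 1, 2, 3 and column 2 is 2, 4, 6
m <- matrix(c(1, 2, 3, 2, 4, 6), ncol = 2)

# Root mean square per column, using the formula quoted above
rms <- apply(m, 2, function(col) sqrt(sum(col^2) / (length(col) - 1)))

# center = FALSE, scale = TRUE: columns are divided by their root mean square
by_rms <- scale(m, center = FALSE, scale = TRUE)

# Scaling by the standard deviation without centering, as the documentation suggests
by_sd <- scale(m, center = FALSE, scale = apply(m, 2, sd, na.rm = TRUE))

print(rms)
print(attr(by_rms, "scaled:scale"))  # same values as rms
print(attr(by_sd, "scaled:scale"))   # the column standard deviations
```

Without centering, the two divisors differ, so by_rms and by_sd are not the same matrix.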

 

Value

For scale.default, the centered, scaled matrix. The numeric centering and scalings used (if any) are returned as attributes "scaled:center" and "scaled:scale".
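The attributes can be read back with attr. A short sketch, using a made-up matrix with columns named a and b:

```r
# Illustrative matrix with named columns
m <- matrix(c(1, 2, 3, 2, 4, 6), ncol = 2, dimnames = list(NULL, c("a", "b")))

s <- scale(m)
centers <- attr(s, "scaled:center")  # the column means that were subtracted
scales  <- attr(s, "scaled:scale")   # the column standard deviations used as divisors
print(centers)
print(scales)

# With center = FALSE no centering is done, so no "scaled:center" attribute is stored
no_center <- scale(m, center = FALSE)
print(is.null(attr(no_center, "scaled:center")))
```

This is what "if any" refers to in the description above: an attribute is present only when the corresponding step was actually carried out.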

 

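By default, scale performs z-score standardization: it first subtracts the mean and then divides by the standard deviation.

z-score standardization (zero-mean normalization)

Also called standard-deviation standardization, this method standardizes the data based on the mean and the standard deviation of the raw data.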

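After this transformation the data have mean 0 and standard deviation 1. (They follow a standard normal distribution only if the original data were normally distributed.) The transformation is:

$$ z = \frac{x - \mu}{\sigma} $$

where μ is the mean of all the sample data and σ is their standard deviation. scale applies this column by column and uses the sample standard deviation (n-1 denominator) for σ.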
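The z-score transformation can be reproduced by hand and compared with scale. A minimal sketch with a made-up vector x:

```r
# Illustrative data; deliberately not symmetric
x <- c(1, 2, 3, 4, 10)

# Manual z-score: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with attributes; as.numeric() drops both
z_scale <- as.numeric(scale(x))

print(all.equal(z_manual, z_scale))
print(mean(z_scale))  # zero up to floating-point error
print(sd(z_scale))    # one up to floating-point error
```

Note that the standardized values keep the shape of the original distribution; only the location and spread change.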

 

2. The unscale function

The unscale function in the DMwR package restores the original data from the object returned by scale.

Usage

unscale(vals, norm.data, col.ids)

 

Arguments

vals: A numeric matrix with the values to un-scale

norm.data: A numeric and scaled matrix. This should be an object to which the function scale() was applied.

col.ids: The columns of the vals matrix that are to be un-scaled (defaults to all of them).

 

Value

An object with the same dimension as the parameter vals
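If DMwR is not available, the same idea can be sketched in base R from the two attributes that scale stores. The helper unscale_manual below is hypothetical and written only for this illustration; it is not the DMwR implementation, and it leaves out the col.ids argument.

```r
# Hypothetical helper (an assumption for illustration, not DMwR::unscale):
# multiply each column by its stored scale, then add back its stored center.
unscale_manual <- function(vals, norm.data) {
  center <- attr(norm.data, "scaled:center")
  scl <- attr(norm.data, "scaled:scale")
  out <- as.matrix(vals)
  if (!is.null(scl)) out <- sweep(out, 2, scl, `*`)
  if (!is.null(center)) out <- sweep(out, 2, center, `+`)
  # Drop the scaling attributes so the result is a plain matrix
  attr(out, "scaled:center") <- NULL
  attr(out, "scaled:scale") <- NULL
  out
}

m <- matrix(c(1, 2, 3, 2, 4, 6, 3, 6, 9), ncol = 3,
            dimnames = list(NULL, c("x", "y", "z")))
scaled <- scale(m)
restored <- unscale_manual(scaled, scaled)
print(restored)
```

The order matters: scale centers first and divides second, so the inverse multiplies first and adds the center back second.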

 

3. Usage example

> df<-data.frame(x=c(1,2,3),y=c(2,4,6),z=c(3,6,9))

> df

  x y z

1 1 2 3

2 2 4 6

3 3 6 9

> scaledData<-scale(df)

> scaledData

      x  y  z

[1,] -1 -1 -1

[2,]  0  0  0

[3,]  1  1  1

attr(,"scaled:center")

x y z

2 4 6

attr(,"scaled:scale")

x y z

1 2 3

> library(DMwR)

> unscale(scaledData,scaledData)

     x y z

[1,] 1 2 3

[2,] 2 4 6

[3,] 3 6 9

> ndf<-data.frame(x=c(1,2),y=c(2,4),z=c(3,6))

> ndf

  x y z

1 1 2 3

2 2 4 6

> scale(ndf,center=attr(scaledData, "scaled:center"),scale=attr(scaledData, "scaled:scale"))

      x  y  z

[1,] -1 -1 -1

[2,]  0  0  0

attr(,"scaled:center")

x y z

2 4 6

attr(,"scaled:scale")

x y z

1 2 3

