match
pmatch
intersect
%in%
setdiff
===================================================
match package:base R Documentation
Value Matching
Description:
'match' returns a vector of the positions of (first) matches of
its first argument in its second.
'%in%' is a more intuitive interface as a binary operator, which
returns a logical vector indicating if there is a match or not for
its left operand.
Usage:
match(x, table, nomatch = NA_integer_, incomparables = NULL)
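x: vector, the values to be matched;
table: vector, the values to be matched against;
nomatch: the value returned when there is no match; it must be an integer;
incomparables: values that must not be used for matching.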
x %in% table
This returns TRUE and FALSE:
> rep(1, 3) %in% rep(1, 5)
[1] TRUE TRUE TRUE
match returns positions:
> match(rep(1, 3), rep(1, 5))
[1] 1 1 1
Arguments:
x: vector or 'NULL': the values to be matched. Long vectors are
supported.
table: vector or 'NULL': the values to be matched against. Long
vectors are not supported.
nomatch: the value to be returned in the case when no match is found.
Note that it is coerced to 'integer'.
incomparables: a vector of values that cannot be matched. Any value in
'x' matching a value in this vector is assigned the 'nomatch'
value. For historical reasons, 'FALSE' is equivalent to
'NULL'.
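As a small illustration of 'nomatch' and 'incomparables', here is a sketch. The vectors 'x' and 'tbl' are made up for this example and are not taken from the R documentation.

```r
# Hypothetical input vectors, chosen only for illustration.
x   <- c("a", "b", "z", NA)
tbl <- c("b", "a", NA)

# Default: values without a match get NA_integer_; NA itself matches NA.
pos_default <- match(x, tbl)                    # 2 1 NA 3

# nomatch = 0L marks unmatched values with 0 instead of NA.
pos_zero <- match(x, tbl, nomatch = 0L)         # 2 1 0 3

# Values listed in 'incomparables' are treated as unmatched,
# so "b" now receives the nomatch value as well.
pos_inc <- match(x, tbl, nomatch = 0L, incomparables = "b")  # 2 0 0 3
```

Using 0 as 'nomatch' is convenient for indexing, because a zero index selects nothing.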
Details:
'%in%' is currently defined as
'"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
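So this is how the function is defined. Without the trailing '> 0', the same definition returns positions, with 0 where there is no match:
> "%in%" <- function(x, table) match(x, table, nomatch = 0)
> 1:10 %in% c(1,3,5,9)
[1] 1 0 2 0 3 0 0 0 4 0
Each value is the position of the left-hand element in the right-hand vector. With the '> 0' restored, the result is logical:
> "%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
> 1:10 %in% c(1,3,5,9)
[1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE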
Factors, raw vectors and lists are converted to character vectors,
and then 'x' and 'table' are coerced to a common type (the later
of the two types in R's ordering, logical < integer < numeric <
complex < character) before matching. If 'incomparables' has
positive length it is coerced to the common type.
Matching for lists is potentially very slow and best avoided
except in simple cases.
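The coercion rules above can be checked with a few one-line calls. This is only a sketch; the values are invented for illustration.

```r
# integer vs character: both sides are compared as character, so 2L matches "2".
int_vs_chr <- match(2L, c("1", "2", "3"))   # 2

# logical vs numeric: TRUE is coerced to 1 before matching.
lgl_vs_num <- match(TRUE, c(0, 1))          # 2

# A factor is converted to character, so it matches by label,
# not by its internal integer code (which is 1 for "b" here).
f <- factor("b", levels = c("b", "a"))
fct_vs_chr <- match(f, c("a", "b"))         # 2
```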
Exactly what matches what is to some extent a matter of
definition. For all types, 'NA' matches 'NA' and no other value.
For real and complex values, 'NaN' values are regarded as matching
any other 'NaN' value, but not matching 'NA'.
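A short sketch of the 'NA' and 'NaN' rules, using made-up vectors. Note the contrast with '==', which returns 'NA' when comparing missing values.

```r
# NA matches only NA, and NaN matches only NaN.
na_nan_pos <- match(c(NA, NaN), c(NaN, NA))   # 2 1

# NaN is found in a table containing NaN; NA is not.
nan_in <- c(NA, NaN) %in% NaN                 # FALSE TRUE

# By contrast, == propagates missingness.
na_equal <- NA == NA                          # NA
```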
That '%in%' never returns 'NA' makes it particularly useful in
'if' conditions.
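To see why this matters in an 'if' condition, compare '==' with '%in%' on a missing value. The variable 'answer' and the result labels are hypothetical names used only in this sketch.

```r
answer <- NA_character_   # hypothetical missing input

by_equals <- answer == "yes"     # NA
by_in     <- answer %in% "yes"   # FALSE

# if () stops with an error when the condition is NA ...
equals_result <- tryCatch(
  if (answer == "yes") "accepted" else "rejected",
  error = function(e) "error: condition is NA"
)

# ... while %in% always yields TRUE or FALSE.
in_result <- if (answer %in% "yes") "accepted" else "rejected"
```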
Character strings will be compared as byte sequences if any input
is marked as '"bytes"' (see 'Encoding').
Value:
A vector of the same length as 'x'.
'match': An integer vector giving the position in 'table' of the
first match if there is a match, otherwise 'nomatch'.
If 'x[i]' is found to equal 'table[j]' then the value returned in
the 'i'-th position of the return value is 'j', for the smallest
possible 'j'. If no match is found, the value is 'nomatch'.
'%in%': A logical vector, indicating if a match was located for
each element of 'x': thus the values are 'TRUE' or 'FALSE' and
never 'NA'.
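The "smallest possible j" rule, and a common way to use the returned positions as a lookup index, might look like this. The vectors 'codes' and 'labels' are invented for the sketch.

```r
# "b" occurs at positions 1 and 3 of tbl; match() reports the first one.
tbl <- c("b", "a", "b")
first_pos <- match(c("b", "a", "c"), tbl)   # 1 2 NA

# Lookup idiom: translate codes into labels through the matched positions.
codes  <- c("x", "y", "z")        # hypothetical keys
labels <- c("ex", "why", "zed")   # hypothetical values, parallel to codes
looked_up <- labels[match(c("z", "x", "q"), codes)]   # "zed" "ex" NA
```

Indexing with an 'NA' position yields 'NA', so unknown keys stay visible in the result.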
References:
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
Language_. Wadsworth & Brooks/Cole.
See Also:
'pmatch' and 'charmatch' for (_partial_) string matching,
'match.arg', etc for function argument matching. 'findInterval'
similarly returns a vector of positions, but finds numbers within
intervals, rather than exact matches.
'is.element' for an S-compatible equivalent of '%in%'.
Examples:
## The intersection of two sets can be defined via match():
## Simple version:
## intersect <- function(x, y) y[match(x, y, nomatch = 0)]
intersect # the R function in base is slightly more careful
intersect(1:10, 7:20)

1:10 %in% c(1,3,5,9)
sstr <- c("c","ab","B","bba","c",NA,"@","bla","a","Ba","%")
sstr[sstr %in% c(letters, LETTERS)]
> sstr <- c("c","ab","B","bba","c",NA,"@","bla","a","Ba","%")
> sstr[sstr %in% c(letters, LETTERS)]
[1] "c" "B" "c" "a"
c(letters, LETTERS)
This is how to get all lowercase and uppercase letters.
"%w/o%" <- function(x, y) x[!x %in% y] #-- x without y
(1:10) %w/o% c(3,7,12)
## Note that setdiff() is very similar and typically makes more sense:
c(1:6,7:2) %w/o% c(3,7,12) # -> keeps duplicates
setdiff(c(1:6,7:2), c(3,7,12)) # -> unique values
> setdiff(c(1:6,7:2), c(3,7,12))
[1] 1 2 4 5 6
setdiff treats its arguments as sets, so duplicates are dropped.
#=====================================================
> ?pmatch
> pmatch("", "") # returns NA
[1] NA
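> pmatch("m", c("mean", "median", "mode")) # returns NA
[1] NA
# NA because "m" is neither an exact match nor a unique partial match;
# a value that partially matches several entries returns NA.
> pmatch("med", c("mean", "median", "mode")) # returns 2
[1] 2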
> pmatch(c("", "ab", "ab"), c("abc", "ab"), dup = FALSE)
[1] NA 2 1
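# "" matches nothing, so it gives NA. The first "ab" matches position 2 exactly, and that entry is then removed from further matching. The second "ab" can only partially match "abc", so position 1 is returned.
My impression is that this is not used much.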
> pmatch(c("", "ab", "ab"), c("abc", "ab"), dup = TRUE)
[1] NA 2 2
> ## compare
> charmatch(c("", "ab", "ab"), c("abc", "ab"))
[1] 0 2 2
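pmatch is a partial matching function. It takes the elements of x in turn and matches each against table. A table value that has been matched is removed and takes no part in later matches; the duplicates.ok argument controls whether it is removed. For a given element, matching proceeds in three steps:
1. If there is an exact match, the element is considered matched and its position in table is returned;
2. Otherwise, if there is a unique partial match, its position in table is returned;
3. Otherwise, no value is considered to match.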
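The three steps above can be sketched with single-element calls. The table reuses the "mean"/"median"/"mode" example from earlier; the variable names are illustrative only.

```r
tbl <- c("mean", "median", "mode")

# Step 1: an exact match returns its position.
step1_exact <- pmatch("mode", tbl)    # 3

# Step 2: no exact match, but "med" is a prefix of exactly one entry.
step2_partial <- pmatch("med", tbl)   # 2

# Step 3: "me" is a prefix of both "mean" and "median", so it is ambiguous.
step3_ambig <- pmatch("me", tbl)      # NA

# Step 3 again: nothing matches at all.
step3_none <- pmatch("xyz", tbl)      # NA

# An exact match wins over a partial one: "ab" is also a prefix of "abc".
exact_first <- pmatch("ab", c("abc", "ab"))   # 2
```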
#===========================================================================
Quoted from:
R base Documentation
http://blog.sina.com.cn/s/blog_73206f7b0102vyox.html
