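1. Probability density function
Definition: for a continuous random variable X, if there exists a function f(x) satisfying the following conditions, then f(x) is called the probability density function (PDF) of X: $f(x) \ge 0$ for all x; $\int_{-\infty}^{\infty} f(x)\,dx = 1$; and $P(a < X \le b) = \int_a^b f(x)\,dx$ for any $a \le b$.
Interpretation: f(x) describes how densely probability is concentrated around a particular value x. It is not itself a probability; it is the instantaneous rate of increase of the cumulative distribution function F(x), defined by differentiation as $f(x) = \frac{dF(x)}{dx} = \lim_{\Delta x \to 0} \frac{P(x < X \le x + \Delta x)}{\Delta x}$.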
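The link between the density and the rate of increase of the distribution function can be checked numerically. The following is a minimal sketch that assumes the standard normal distribution as the example; the point x, the step h, and the interval endpoints are arbitrary illustrative choices.

```r
# Density as the derivative of the CDF, checked numerically for N(0, 1).
x <- 0.5
h <- 1e-6

# Central-difference approximation to dF/dx at x.
slope <- (pnorm(x + h) - pnorm(x - h)) / (2 * h)
density_at_x <- dnorm(x)

# The density integrates to 1 over the whole real line.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# P(a < X <= b) equals the area under f between a and b.
a <- -1
b <- 2
area_ab <- integrate(dnorm, lower = a, upper = b)$value
prob_ab <- pnorm(b) - pnorm(a)
```

Here `slope` and `density_at_x` should agree to several decimal places, and `area_ab` should match `prob_ab`.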
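2. Cumulative distribution function
Definition: for any random variable X and any given value a, the total probability of all values not exceeding a is the distribution function of X. The distribution function uniquely determines the distribution of the random variable. Defining formula: $F(a) = P(X \le a)$.
Properties: (1) boundedness: $0 \le F(x) \le 1$, with F(x) tending to 0 as x tends to -∞ and to 1 as x tends to +∞; (2) monotonicity: F is non-decreasing; (3) right-continuity.
Because its English name is Cumulative Distribution Function, it is commonly abbreviated as CDF.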
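The three properties can be observed directly in R. This is a small sketch, not a proof: it uses the standard normal CDF for boundedness and monotonicity, and a Binomial(2, 0.5) variable, chosen here only for illustration, to show right-continuity at a jump.

```r
# Evaluate the standard normal CDF on a grid.
grid <- seq(-5, 5, by = 0.5)
F_vals <- pnorm(grid)

bounded  <- all(F_vals >= 0 & F_vals <= 1)  # (1) values stay in [0, 1]
monotone <- all(diff(F_vals) >= 0)          # (2) never decreases
tails    <- c(pnorm(-Inf), pnorm(Inf))      # limits at -Inf and +Inf

# (3) Right-continuity is easiest to see for a discrete variable.
# X ~ Binomial(2, 0.5) is an illustrative choice, not taken from the text.
F_before_jump <- pbinom(0.5, size = 2, prob = 0.5)  # P(X <= 0) = 0.25
F_at_jump     <- pbinom(1,   size = 2, prob = 0.5)  # P(X <= 1) = 0.75
F_after_jump  <- pbinom(1.5, size = 2, prob = 0.5)  # still 0.75
```

At the jump point x = 1 the CDF already takes the higher value, which is what right-continuity means.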
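3. Quantile function
Definition: the quantile function is the inverse of the cumulative distribution function. Given a probability p, it returns the value x of the random variable for which P(X ≤ x) = p (the lower-tail quantile).
Quantile functions for four commonly used distributions:
Standard normal distribution: qnorm(p, mean=0, sd=1)
Student's t distribution: qt(p, df=N, ncp=0)
Chi-squared distribution: qchisq(p, df=N, ncp=0)
Fisher-Snedecor (F) distribution: qf(p, df1, df2, ncp=0)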
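The inverse relationship between the quantile function and the CDF can be sketched with the standard normal pair `qnorm` and `pnorm`. The probabilities and x values below are arbitrary examples.

```r
# Lower-tail quantiles of N(0, 1) for a few probabilities.
p <- c(0.025, 0.5, 0.975)
z <- qnorm(p)            # roughly -1.96, 0, 1.96

# Applying the CDF to the quantiles recovers the probabilities.
p_back <- pnorm(z)

# Applying the quantile function to CDF values recovers the points.
x <- c(-1.2, 0.3, 2.5)
x_back <- qnorm(pnorm(x))

# An upper-tail quantile: the value exceeded with probability 0.025.
z_upper <- qnorm(0.025, lower.tail = FALSE)  # same as qnorm(0.975)
```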
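Special case: quartiles
Definition: quartiles are one kind of quantile in statistics. Sort all values in ascending order and divide them into four equal parts; the values at the three cut points are the quartiles.
Selection rule: for a sample of size N and a quantile level y (a percentage), let $L = N \cdot y / 100$.
(1) If L is an integer, take the average of the L-th and (L+1)-th values.
(2) If L is not an integer, round L up to the next integer and take that value (for example, 1.2 becomes 2).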
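The selection rule can be sketched as a small R function. This assumes the position is computed as L = N × y / 100, which is a reading of the rule rather than a formula shown in the text; `quantile_by_rule` is a hypothetical helper name, and the sample values are made up for illustration.

```r
# Hypothetical helper implementing the two-case rule described above.
# x: numeric sample; y: percentage strictly between 0 and 100.
quantile_by_rule <- function(x, y) {
  s <- sort(x)              # ascending order
  N <- length(s)
  L <- N * y / 100          # assumed position formula
  if (abs(L - round(L)) < 1e-9) {
    L <- round(L)
    mean(s[c(L, L + 1)])    # case (1): average of L-th and (L+1)-th values
  } else {
    s[ceiling(L)]           # case (2): round up, e.g. 1.2 becomes 2
  }
}

x8 <- c(7, 1, 5, 3, 8, 2, 6, 4)     # N = 8
q1_8 <- quantile_by_rule(x8, 25)    # L = 2, average of 2 and 3 -> 2.5
q3_8 <- quantile_by_rule(x8, 75)    # L = 6, average of 6 and 7 -> 6.5

x10 <- 1:10                         # N = 10
q1_10 <- quantile_by_rule(x10, 25)  # L = 2.5, take the 3rd value -> 3

# For comparison: R's built-in quantile() with type = 2.
builtin_q1_8 <- unname(quantile(x8, 0.25, type = 2))
```

Note that R's `quantile()` defaults to `type = 7`, which interpolates and can give different quartiles from this rule; `type = 2` is the variant that averages at discontinuities.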
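4. Random number function
Definition: generates random draws from a given distribution; each output is one observed value of the random variable.
5. The four functions (density d, distribution p, quantile q, random r) for several common distributions: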
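As an illustration, random draws can be produced directly with `rnorm`, or by feeding uniform random probabilities through the quantile function. The second route is shown only as one way to picture what a random number function does, not as a description of how R implements `rnorm`; the seed and sample size are arbitrary.

```r
set.seed(42)   # arbitrary seed so the run is reproducible
n <- 10000

# Direct draws from N(0, 1).
direct <- rnorm(n)

# Draw uniform probabilities, then map them through the quantile function.
u <- runif(n)
via_quantile <- qnorm(u)

# Both samples should have mean near 0 and standard deviation near 1.
summary_stats <- rbind(
  direct       = c(mean = mean(direct),       sd = sd(direct)),
  via_quantile = c(mean = mean(via_quantile), sd = sd(via_quantile))
)
```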
(1) Normal distribution
Usage:
dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
Arguments:
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If length(n) > 1, the length is taken to be the number required.
mean: vector of means.
sd: vector of standard deviations.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].
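A short sketch of the four normal functions and their optional arguments follows. The mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration.

```r
# Hypothetical parameters, for illustration only.
mu <- 100
sigma <- 15

# d: the density at the mean equals 1 / (sigma * sqrt(2 * pi)).
d_at_mean <- dnorm(mu, mean = mu, sd = sigma)

# p: lower- and upper-tail probabilities one sd above the mean.
p_lower <- pnorm(115, mean = mu, sd = sigma)                      # P[X <= 115]
p_upper <- pnorm(115, mean = mu, sd = sigma, lower.tail = FALSE)  # P[X > 115]

# q: the inverse of p, so this returns 115 again.
q_back <- qnorm(p_lower, mean = mu, sd = sigma)

# log / log.p: work with log-densities and log-probabilities.
log_d <- dnorm(mu, mean = mu, sd = sigma, log = TRUE)
q_from_log <- qnorm(log(0.975), log.p = TRUE)   # same as qnorm(0.975)

# r: five random draws (the seed is arbitrary).
set.seed(1)
draws <- rnorm(5, mean = mu, sd = sigma)
```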
(2) Chi-squared distribution
Usage:
dchisq(x, df, ncp=0, log = FALSE)
pchisq(q, df, ncp=0, lower.tail = TRUE, log.p = FALSE)
qchisq(p, df, ncp=0, lower.tail = TRUE, log.p = FALSE)
rchisq(n, df, ncp=0)
Arguments:
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If length(n) > 1, the length is taken to be the number required.
df: degrees of freedom (non-negative, but can be non-integer).
ncp: non-centrality parameter (non-negative).
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].
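The chi-squared functions can be tried out as follows. This sketch uses the central distribution (ncp left at its default); the degrees of freedom, seed, and sample size are arbitrary illustrative choices.

```r
# q: the 95% critical value with 1 degree of freedom, about 3.84.
crit <- qchisq(0.95, df = 1)

# With df = 1 this equals the square of the two-sided normal critical value.
z_squared <- qnorm(0.975)^2

# p: upper-tail probability beyond the critical value is 0.05.
p_value <- pchisq(crit, df = 1, lower.tail = FALSE)

# d: for df = 2 the density is 0.5 * exp(-x / 2).
d_val <- dchisq(2, df = 2)

# r: the mean of chi-squared draws should be close to df.
set.seed(7)   # arbitrary seed
k <- 4
sims <- rchisq(20000, df = k)
sim_mean <- mean(sims)
```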
(3) F distribution
Usage:
df(x, df1, df2, ncp, log = FALSE)
pf(q, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
qf(p, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
rf(n, df1, df2, ncp)
Arguments:
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If length(n) > 1, the length is taken to be the number required.
df1, df2: degrees of freedom. Inf is allowed.
ncp: non-centrality parameter. If omitted the central F is assumed.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].
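A brief sketch of the F functions, using the central distribution (ncp omitted). The probability and degrees of freedom are arbitrary example values. Note that the density function is itself named `df`, the same spelling often used for a degrees-of-freedom argument.

```r
# Arbitrary example values.
p <- 0.95
df1 <- 3
df2 <- 12

# q and p are inverses of each other.
f_crit <- qf(p, df1, df2)
p_back <- pf(f_crit, df1, df2)

# Reciprocal property: swapping the degrees of freedom inverts the quantile.
f_reciprocal <- 1 / qf(1 - p, df2, df1)

# With df1 = 1, the F quantile is the square of a two-sided t quantile.
f_one <- qf(0.95, 1, df2)
t_squared <- qt(0.975, df = df2)^2

# d: density of the F distribution at x = 1.
d_val <- df(1, df1, df2)

# r: a few random draws (arbitrary seed).
set.seed(3)
draws <- rf(5, df1, df2)
```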
(4) t distribution
Usage:
dt(x, df, ncp, log = FALSE)
pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)
qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)
rt(n, df, ncp)
Arguments:
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If length(n) > 1, the length is taken to be the number required.
df: degrees of freedom (> 0, maybe non-integer). df = Inf is allowed.
ncp: non-centrality parameter delta; currently except for rt(), only for abs(ncp) <= 37.62. If omitted, use the central t distribution.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].
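The t functions can be sketched in the same way, again with the central distribution (ncp omitted). The degrees of freedom and the statistic `t_stat` are hypothetical values chosen for illustration.

```r
# q: two-sided 95% critical values shrink toward the normal value as df grows.
dfs <- c(5, 30, 1000)
crit <- sapply(dfs, function(d) qt(0.975, df = d))
crit_inf <- qt(0.975, df = Inf)   # df = Inf gives the normal quantile
z_crit <- qnorm(0.975)

# p: the distribution is symmetric around 0.
left  <- pt(-1.5, df = 10)
right <- pt(1.5, df = 10, lower.tail = FALSE)

# A two-sided p-value for a hypothetical statistic with 10 degrees of freedom.
t_stat <- 2.3
p_two_sided <- 2 * pt(abs(t_stat), df = 10, lower.tail = FALSE)

# d: the density peaks at 0.
d_peak <- dt(0, df = 10)

# r: a few random draws (arbitrary seed).
set.seed(5)
draws <- rt(5, df = 10)
```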