[Original post] Statistical analysis with Python (the scipy.stats documentation)


[Reposted from] Statistical analysis with Python (the scipy.stats documentation)

A detailed introduction to scipy.stats:

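The document covers the topics listed below and is worth a look for anyone interested in doing statistical analysis in Python, since the Python libraries are somewhat disorganized: features that look as though they belong together are scattered across scipy, pandas, sympy, and other libraries. This part covers general-purpose statistics, which live in scipy. Things like time series are handled elsewhere, and those libraries in turn lack the features described here.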

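Drawing samples of random variates
84 continuous distributions (the count is given; they are not described individually)
12 discrete distributions
Probability density function, cumulative distribution function, survival function, percent point (quantile) function, and inverse survival function of a distribution
Statistics of a distribution: mean, variance, kurtosis, skewness, moments
Generating distributions by linear transformation (shifting and scaling)
Fitting distributions to data
Building distributions
Descriptive statistics
t-test, KS test, chi-square test, normality tests, tests for identical distributions
Kernel density estimation (estimating the probability density function from a sample)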


Statistics (scipy.stats)
Introduction
In this tutorial we discuss many, but certainly not all, features of scipy.stats. The intention here is to provide a user with a working knowledge of this package. We refer to the reference manual for further details.
Note: This documentation is a work in progress.
Random Variables
There are two general distribution classes that have been implemented for encapsulating continuous random variables and discrete random variables. Over 80 continuous random variables (RVs) and 10 discrete random variables have been implemented using these classes. Besides this, new routines and distributions can easily be added by the end user. (If you create one, please contribute it.)
All of the statistics functions are located in the sub-package scipy.stats, and a fairly complete listing of these functions can be obtained using info(stats). The list of the random variables available can also be obtained from the docstring for the stats sub-package.
In the discussion below we mostly focus on continuous RVs. Nearly everything also applies to discrete variables, but we point out some differences in Specific Points for Discrete Distributions.

Getting Help
First of all, all distributions are accompanied by help functions. To obtain just some basic information we can call
>>> from scipy import stats
>>> from scipy.stats import norm
>>> print(norm.__doc__)

To find the support, i.e., the upper and lower bounds of the distribution, call:
>>> print('bounds of distribution lower: %s, upper: %s' % (norm.a, norm.b))
bounds of distribution lower: -inf, upper: inf
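As a complement to reading norm.a and norm.b, the sketch below queries the support through the support() method. It assumes a reasonably recent SciPy release in which this method is available; the shifted exponential with loc=2.0 is only an illustrative choice, not part of the original tutorial.

```python
import numpy as np
from scipy.stats import norm, expon

# support() returns the (lower, upper) end points of the distribution
norm_lower, norm_upper = norm.support()

# For the exponential distribution the support starts at loc
# (loc=2.0 is an arbitrary value chosen for illustration)
expon_lower, expon_upper = expon.support(loc=2.0)

print('normal support:', norm_lower, norm_upper)
print('shifted exponential support:', expon_lower, expon_upper)
```

Unlike the a and b attributes, which describe the unshifted distribution, support() takes loc and scale into account.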

We can list all methods and properties of the distribution with dir(norm). As it turns out, some of the methods are private although they are not named as such (their names do not start with a leading underscore); for example, veccdf is only available for internal calculation (those methods will give warnings when one tries to use them, and will be removed at some point).
To obtain the real main methods, we list the methods of the frozen distribution. (We explain the meaning of a frozen distribution below.)
>>> rv = norm()
>>> dir(rv)  # reformatted
    ['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
    '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__',
    '__repr__', '__setattr__', '__str__', '__weakref__', 'args', 'cdf', 'dist',
    'entropy', 'isf', 'kwds', 'moment', 'pdf', 'pmf', 'ppf', 'rvs', 'sf', 'stats']
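If only the public names are of interest, the dunder entries can be filtered out of the listing. This is a small sketch rather than part of the original tutorial; the exact names returned depend on the installed SciPy version.

```python
from scipy.stats import norm

rv = norm()  # frozen standard normal distribution

# Keep only the attribute names that do not start with an underscore
public_names = [name for name in dir(rv) if not name.startswith('_')]
print(public_names)
```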

Finally, we can obtain the list of available distributions through introspection:
>>> import warnings
>>> warnings.simplefilter('ignore', DeprecationWarning)
>>> dist_continu = [d for d in dir(stats) if
...                 isinstance(getattr(stats, d), stats.rv_continuous)]
>>> dist_discrete = [d for d in dir(stats) if
...                  isinstance(getattr(stats, d), stats.rv_discrete)]
>>> print('number of continuous distributions:', len(dist_continu))
number of continuous distributions: 84
>>> print('number of discrete distributions:  ', len(dist_discrete))
number of discrete distributions:   12

Common Methods
The main public methods for continuous RVs are:
rvs: Random Variates
pdf: Probability Density Function
cdf: Cumulative Distribution Function
sf: Survival Function (1-CDF)
ppf: Percent Point Function (Inverse of CDF)
isf: Inverse Survival Function (Inverse of SF)
stats: Return mean, variance, (Fisher’s) skew, or (Fisher’s) kurtosis
moment: non-central moments of the distribution
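The relationships stated in the list above (sf is 1-CDF, ppf inverts cdf, isf inverts sf) can be checked numerically. The following is a minimal sketch for the standard normal distribution; the evaluation point x = 1.0 is an arbitrary choice.

```python
import numpy as np
from scipy.stats import norm

x = 1.0                    # arbitrary evaluation point
p = norm.cdf(x)            # P(X <= x)
q = norm.sf(x)             # P(X > x), i.e. 1 - cdf(x)

x_from_ppf = norm.ppf(p)   # ppf inverts cdf
x_from_isf = norm.isf(q)   # isf inverts sf

# mean, variance, skew and (excess) kurtosis in one call
mean, var, skew, kurt = norm.stats(moments='mvsk')

# second non-central moment E[X**2]
second_moment = norm.moment(2)

print(p + q, x_from_ppf, x_from_isf)
print(mean, var, skew, kurt, second_moment)
```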
Let’s take a normal RV as an example.
>>> norm.cdf(0)
0.5

To compute the cdf at a number of points, we can pass a list or a numpy array.
>>> norm.cdf([-1., 0, 1])
array([ 0.15865525,  0.5       ,  0.84134475])
>>> import numpy as np
>>> norm.cdf(np.array([-1., 0, 1]))
array([ 0.15865525,  0.5       ,  0.84134475])

Thus, the basic methods such as pdf, cdf, and so on are vectorized with np.vectorize.
Other generally useful methods are supported too:
>>> norm.mean(), norm.std(), norm.var()
(0.0, 1.0, 1.0)
>>> norm.stats(moments = "mv")
(array(0.0), array(1.0))

To find the median of a distribution we can use the percent point function ppf, which is the inverse of the cdf:
>>> norm.ppf(0.5)
0.0
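The same ppf call yields any other quantile, not just the median. The sketch below, which is not part of the original tutorial, computes the quartiles of the standard normal distribution and confirms that cdf maps them back to the requested probabilities.

```python
import numpy as np
from scipy.stats import norm

probabilities = [0.25, 0.5, 0.75]

median = norm.ppf(0.5)               # the 50% quantile
quartiles = norm.ppf(probabilities)  # vectorized over the probabilities
upper_975 = norm.ppf(0.975)          # roughly 1.96, the familiar two-sided 95% bound

# Applying cdf to the quantiles recovers the probabilities
round_trip = norm.cdf(quartiles)

print(median, quartiles, upper_975)
print(round_trip)
```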

To generate a set of random variates:
>>> norm.rvs(size=5)
array([-0.35687759,  1.34347647, -0.11710531, -1.00725181, -0.51275702])
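The numbers above change on every call. For reproducible draws, rvs accepts a random_state keyword; the sketch below passes an integer seed (the value 12345 is arbitrary) and assumes a SciPy version that supports this keyword.

```python
import numpy as np
from scipy.stats import norm

# The same integer seed gives the same variates on each call
first = norm.rvs(size=5, random_state=12345)
second = norm.rvs(size=5, random_state=12345)

print(first)
print(np.array_equal(first, second))
```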

Don’t think that norm.rvs(5) generates 5 variates:
>>> norm.rvs(5)
7.131624370075814

This brings us, in fact, to the topic of the next subsection.
Shifting and Scaling
All continuous distributions take loc and scale as keyword parameters to adjust the location and scale of the distribution; e.g., for the normal distribution the location is the mean and the scale is the standard deviation.
>>> norm.stats(loc = 3, scale = 4, moments = "mv")
(array(3.0), array(16.0))

In general the standardized distribution for a random variable X is obtained through the transformation (X - loc) / scale. The default values are loc = 0 and scale = 1.
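The transformation (X - loc) / scale can be verified directly: evaluating the cdf with loc and scale gives the same result as evaluating the standard cdf at the standardized points. The sketch below uses loc = 3 and scale = 4, and the evaluation points are arbitrary.

```python
import numpy as np
from scipy.stats import norm

loc, scale = 3.0, 4.0
x = np.array([-1.0, 3.0, 7.0])   # arbitrary evaluation points

# cdf evaluated with explicit loc and scale ...
direct = norm.cdf(x, loc=loc, scale=scale)
# ... equals the standard cdf at the standardized points
standardized = norm.cdf((x - loc) / scale)

# For the density, the change of variable adds a factor 1/scale
pdf_direct = norm.pdf(x, loc=loc, scale=scale)
pdf_standardized = norm.pdf((x - loc) / scale) / scale

print(direct)
print(standardized)
```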
Smart use of loc and scale can help modify the standard distributions in many ways. To illustrate the scaling further, the cdf of an exponentially distributed RV with mean 1/λ is given by
F(x)=1−exp(−λx)
By applying the scaling rule above, it can be seen that by taking scale = 1./lambda we get the proper scale.
>>> from scipy.stats import expon
>>> expon.mean(scale=3.)
3.0
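To connect the scale parameter to the rate λ, the sketch below compares expon.cdf with scale = 1/λ against the closed-form expression 1 − exp(−λx). The rate value 0.5 and the evaluation points are illustrative assumptions.

```python
import numpy as np
from scipy.stats import expon

lam = 0.5                              # illustrative rate parameter
x = np.array([0.5, 1.0, 2.0, 4.0])     # arbitrary evaluation points

# scipy parameterizes the exponential by scale = 1 / lambda
cdf_scipy = expon.cdf(x, scale=1.0 / lam)
cdf_formula = 1.0 - np.exp(-lam * x)

mean = expon.mean(scale=1.0 / lam)     # the mean is 1 / lambda

print(cdf_scipy)
print(cdf_formula)
print(mean)
```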

The uniform distribution is also interesting:
>>> from scipy.stats import uniform
>>> uniform.cdf([0, 1, 2, 3, 4, 5], loc = 1, scale = 4)
array([ 0.  ,  0.  ,  0.25,  0.5 ,  0.75,  1.  ])
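The output above reflects that uniform with loc and scale is supported on the interval [loc, loc + scale]. A uniform distribution on a chosen interval [a, b] can therefore be obtained with loc = a and scale = b − a, as in this sketch (the end points 1 and 5 match the call above; the support() method is assumed to be available in the installed SciPy).

```python
import numpy as np
from scipy.stats import uniform

a, b = 1.0, 5.0   # desired interval end points

# uniform(loc, scale) lives on [loc, loc + scale]
lower, upper = uniform.support(loc=a, scale=b - a)

cdf_mid = uniform.cdf(3.0, loc=a, scale=b - a)   # midpoint of [1, 5]
mean = uniform.mean(loc=a, scale=b - a)

print(lower, upper, cdf_mid, mean)
```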

Finally, recall from the previous paragraph that we are left with the problem of the meaning of norm.rvs(5). As it turns out, when a distribution is called like this, the first argument, i.e., the 5, is passed as the loc parameter. Let’s see:
>>> np.mean(norm.rvs(5, size=500))
4.983550784784704

Thus, to explain the output of the example of the last section: norm.rvs(5) generates a single normally distributed random variate with mean loc=5, because of the default size=1.
I prefer to set the loc and scale parameters explicitly, by passing the values as keywords rather than as arguments. This is less of a hassle than it may seem. We clarify this below when we explain the topic of freezing an RV.
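As a small illustration of the keyword style recommended here, the sketch below draws 500 variates with loc and scale spelled out and compares them with the positional form. The seed value 0 is arbitrary and only makes the two calls comparable.

```python
import numpy as np
from scipy.stats import norm

# Explicit keywords: it is clear that 5 is the mean, not the sample size
sample = norm.rvs(loc=5, scale=1, size=500, random_state=0)
sample_mean = np.mean(sample)

# The positional form is equivalent but easier to misread
positional = norm.rvs(5, 1, size=500, random_state=0)

print(sample_mean)
print(np.array_equal(sample, positional))
```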

 

