Beta Function and Confidence Estimation


Estimating the confidence level

  • The binomial parameter \(p\) (here, the bad rate) follows a Beta distribution \({\rm Beta}(\alpha, \beta)\) with density \(\frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}\), \(0 \le x \le 1\)
  • Mean \(\frac{\alpha}{\alpha + \beta}\)
  • Variance \(\frac{\alpha \beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}\)
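The code's `a, b = n_bad+1, n_good+1` can be explained by a short derivation. It assumes a uniform prior \({\rm Beta}(1, 1)\) on \(p\), with \(k\) bad and \(m\) good samples observed. The same formulas also give the relative standard deviation \(\sigma/\mu\), which drives the relative-error column in the results:

```latex
\begin{aligned}
\pi(p \mid k, m) &\propto \underbrace{p^{k}(1-p)^{m}}_{\text{binomial likelihood}} \cdot \underbrace{1}_{\text{uniform prior}}
  = p^{(k+1)-1}(1-p)^{(m+1)-1} \\
\Rightarrow\quad p \mid k, m &\sim {\rm Beta}(k+1,\ m+1), \qquad \alpha = k+1,\ \beta = m+1 \\
\frac{\sigma}{\mu} &= \frac{\sqrt{\alpha\beta \,/\, \big((\alpha+\beta)^2(\alpha+\beta+1)\big)}}{\alpha/(\alpha+\beta)}
  = \sqrt{\frac{\beta}{\alpha(\alpha+\beta+1)}} \approx \sqrt{\frac{1-\mu}{\alpha}}
\end{aligned}
```

For a small bad rate \(\mu\), this is roughly \(1/\sqrt{\alpha}\). The relative error therefore depends mostly on the number of bad samples, not on the total sample size.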

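```python
from scipy.stats import beta

def confidence(n_bad, n_good, tol=2):
    '''Return the estimated bad rate p, its standard deviation v, and the
    probability d that the true bad rate lies within [p - tol*v, p + tol*v].'''
    # Beta(n_bad + 1, n_good + 1): a uniform prior updated with the observed counts
    a, b = n_bad + 1, n_good + 1
    p = a / (a + b)        # mean of the Beta distribution
    v = beta.std(a, b)     # standard deviation of the Beta distribution
    # Clip the interval to [0, 1], the support of the Beta distribution
    up, low = min(1, p + v * tol), max(0, p - v * tol)
    d = beta.cdf(up, a, b) - beta.cdf(low, a, b)
    return p, v, d
```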
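The function above fixes the interval width at ±tol standard deviations and reports the probability inside it. The reverse question asks which interval holds a fixed probability, such as 95%. Beta quantiles can answer it. Below is a minimal sketch using `scipy.stats.beta.interval`, which returns the equal-tailed interval. The helper name `credible_interval` is hypothetical and is not part of the original code:

```python
from scipy.stats import beta


def credible_interval(n_bad, n_good, level=0.95):
    """Equal-tailed interval for the bad rate under Beta(n_bad + 1, n_good + 1).

    credible_interval is a hypothetical helper, not part of the original post.
    """
    a, b = n_bad + 1, n_good + 1
    p = a / (a + b)                          # mean of the Beta distribution
    low, up = beta.interval(level, a, b)     # ppf((1 - level) / 2), ppf((1 + level) / 2)
    rel_half_width = (up - low) / 2 / p      # comparable to e = tol * v / p
    return p, low, up, rel_half_width


p, low, up, e = credible_interval(2000, 100000)
print(f"p={p:.4f}, 95% interval=[{low:.4f}, {up:.4f}], relative half-width={e:.3f}")
```

Because the distribution here is nearly normal, a 95% interval corresponds to about ±1.96 standard deviations. Its relative half-width is therefore slightly smaller than the `e` computed with tol=2.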



```python
# (n_bad, n_good, tol)
test_set = [
    (500, 20000, 2),
    (1000, 200000, 2),
    (2000, 200000, 2),
    (5000, 200000, 2),
    (500, 100000, 2),
    (1000, 100000, 2),
    (2000, 100000, 2),
    (5000, 100000, 2),
    (2000, 10000, 2),
]

print("  bad;  total; mean p;   std v;      rel. error e;  confidence")
for (n_bad, n_good, tol) in test_set:
    p, v, d = confidence(n_bad, n_good, tol)
    # e = tol*v/p: half-width of the interval relative to the estimated bad rate
    ss = ('{:5d};{:7d}; p={p:0.4f}; v={v:0.6f}; e={e:0.3f}; '
          + 'P(true rate in [p - {t}v, p + {t}v]) = {d:.2f}%'
          ).format(n_bad, n_bad + n_good, p=p, v=v, d=d * 100, t=tol, e=tol * v / p)
    print(ss)
```

```
  bad;  total; mean p;   std v;      rel. error e;  confidence
  500;  20500; p=0.0244; v=0.001078; e=0.088; P(true rate in [p - 2v, p + 2v]) = 95.46%
 1000; 201000; p=0.0050; v=0.000157; e=0.063; P(true rate in [p - 2v, p + 2v]) = 95.46%
 2000; 202000; p=0.0099; v=0.000220; e=0.044; P(true rate in [p - 2v, p + 2v]) = 95.45%
 5000; 205000; p=0.0244; v=0.000341; e=0.028; P(true rate in [p - 2v, p + 2v]) = 95.45%
  500; 100500; p=0.0050; v=0.000222; e=0.089; P(true rate in [p - 2v, p + 2v]) = 95.46%
 1000; 101000; p=0.0099; v=0.000312; e=0.063; P(true rate in [p - 2v, p + 2v]) = 95.46%
 2000; 102000; p=0.0196; v=0.000434; e=0.044; P(true rate in [p - 2v, p + 2v]) = 95.45%
 5000; 105000; p=0.0476; v=0.000657; e=0.028; P(true rate in [p - 2v, p + 2v]) = 95.45%
 2000;  12000; p=0.1667; v=0.003402; e=0.041; P(true rate in [p - 2v, p + 2v]) = 95.45%
```
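The confidence column sits at about 95.45% in every row. With counts this large, the Beta distribution is close to a normal distribution, and a normal variable falls within ±2 standard deviations with probability \(\operatorname{erf}(2/\sqrt{2}) \approx 0.9545\). The occasional 95.46% in rows with fewer bad samples is consistent with the Beta distribution being slightly non-normal. Below is a standard-library sketch of this normal approximation. The name `normal_coverage` is a hypothetical helper:

```python
import math


def normal_coverage(tol):
    """Probability that a normal variable lies within tol standard deviations of its mean."""
    return math.erf(tol / math.sqrt(2))


for tol in (1, 2, 3):
    print(f"tol={tol}: {normal_coverage(tol) * 100:.2f}%")
# prints:
# tol=1: 68.27%
# tol=2: 95.45%
# tol=3: 99.73%
```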

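Conclusion: with 2000 or more bad samples, the relative error of the estimated bad rate is below 5% at about 95% confidence (±2 standard deviations).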
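The conclusion can be turned into a rough sample-size rule. For Beta(\(\alpha, \beta\)) the exact ratio is \(\sigma/\mu = \sqrt{\beta/(\alpha(\alpha+\beta+1))}\). The sketch below uses it to compute e = tol·v/p without SciPy. It then searches for the smallest bad count that reaches a target e at a fixed bad rate. The helper names `rel_error` and `min_bad_count` are hypothetical, and so is the assumption that the good count scales with the bad count at that fixed rate:

```python
import math


def rel_error(n_bad, n_good, tol=2):
    """Closed-form e = tol * v / p for Beta(n_bad + 1, n_good + 1)."""
    a, b = n_bad + 1, n_good + 1
    return tol * math.sqrt(b / (a * (a + b + 1)))


def min_bad_count(target, bad_rate, tol=2):
    """Smallest n_bad with rel_error <= target (hypothetical helper).

    Assumes n_good = n_bad * (1 - bad_rate) / bad_rate, i.e. a fixed bad rate
    in (0, 1).
    """
    n_bad = 1
    while rel_error(n_bad, round(n_bad * (1 - bad_rate) / bad_rate), tol) > target:
        n_bad += 1
    return n_bad


print(round(rel_error(2000, 100000), 3))   # 0.044, matching the table
print(min_bad_count(0.05, bad_rate=0.01))
```

Under this approximation, the requirement is roughly \(n_{bad} \gtrsim (tol/e)^2 (1-p)\). For e = 5% and tol = 2, that comes to a little under 1600 bad samples, so the 2000 threshold leaves some margin.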

