Outlier detection with several methods

Reposted article, dated 2014-07-07 01:17. Tags: One-Class SVM, outlier detection

When the amount of contamination is known, the example below illustrates two different ways of performing novelty and outlier detection:

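- A robust estimator of covariance, which assumes that the data are Gaussian distributed and, in that case, performs better than the One-Class SVM;
- The One-Class SVM, which can capture the shape of the data set and therefore performs better when the data are strongly non-Gaussian, for example when they form two well-separated clusters.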
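As a minimal sketch of how the two detectors are used side by side, the snippet below fits both on a single Gaussian blob with uniform outliers added. The data sizes, the `detectors` and `errors` names, and the choice of `nu` equal to the contamination are illustrative assumptions, and `predict` is used here only for brevity.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
n_inliers, n_outliers = 150, 50          # assumed sizes, for illustration only
contamination = n_outliers / (n_inliers + n_outliers)

# One Gaussian blob of inliers followed by uniformly scattered outliers
X = np.r_[0.3 * rng.randn(n_inliers, 2),
          rng.uniform(low=-6, high=6, size=(n_outliers, 2))]
# Both estimators label inliers as +1 and outliers as -1
y_true = np.r_[np.ones(n_inliers, dtype=int),
               -np.ones(n_outliers, dtype=int)]

detectors = {
    "robust covariance": EllipticEnvelope(contamination=contamination,
                                          random_state=0),
    "One-Class SVM": OneClassSVM(nu=contamination, kernel="rbf", gamma=0.1),
}

errors = {}
for name, det in detectors.items():
    y_pred = det.fit(X).predict(X)
    errors[name] = int((y_pred != y_true).sum())
    print("%s: %d errors" % (name, errors[name]))
```

The error counts depend on the random data, so they are best read as a rough comparison rather than a benchmark.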

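The ground truth about inliers and outliers is given by the colors of the points, while the orange-filled area marks the region in which each method reports points as inliers (decision function above the threshold).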

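Here we assume that we know the fraction of outliers in the data set. So rather than using the 'predict' method of the estimators, we set a threshold on decision_function to separate out the corresponding fraction.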
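The thresholding step can be sketched in isolation as follows. This is a minimal illustration under assumed settings (a single Gaussian blob, `np.percentile` for the cut-off, and the variable names `scores` and `is_inlier`), not a replacement for the full example.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
n_samples = 200
outliers_fraction = 0.25                 # assumed to be known in advance
n_outliers = int(outliers_fraction * n_samples)

# Inliers from a narrow Gaussian, outliers spread uniformly over the plane
X = np.r_[0.3 * rng.randn(n_samples - n_outliers, 2),
          rng.uniform(low=-6, high=6, size=(n_outliers, 2))]

clf = OneClassSVM(nu=0.95 * outliers_fraction + 0.05,
                  kernel="rbf", gamma=0.1)
clf.fit(X)

# Higher decision values mean "more normal"; cut at the known percentile
scores = clf.decision_function(X).ravel()
threshold = np.percentile(scores, 100 * outliers_fraction)
is_inlier = scores > threshold

print("flagged as outliers:", int((~is_inlier).sum()), "of", n_samples)
```

Cutting at the percentile flags roughly the known fraction of points no matter where the estimator places its own zero level, which is why `predict` is bypassed.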

"""
==========================================
Outlier detection with several methods.
==========================================

When the amount of contamination is known, this example illustrates two
different ways of performing :ref:`outlier_detection`:

- based on a robust estimator of covariance, which assumes that the data
  are Gaussian distributed and performs better than the One-Class SVM
  in that case;

- using the One-Class SVM and its ability to capture the shape of the
  data set, hence performing better when the data is strongly
  non-Gaussian, e.g. with two well-separated clusters.

The ground truth about inliers and outliers is given by the colors of the
points, while the orange-filled area indicates which points are reported as
inliers by each method.

Here, we assume that we know the fraction of outliers in the datasets.
Thus rather than using the 'predict' method of the objects, we set the
threshold on the decision_function to separate out the corresponding
fraction.
"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager
from matplotlib.lines import Line2D
from scipy import stats

from sklearn import svm
from sklearn.covariance import EllipticEnvelope

# Example settings
n_samples = 200
outliers_fraction = 0.25
clusters_separation = [0, 1, 2]

# Define the two outlier detection tools to be compared
classifiers = {
    "One-Class SVM": svm.OneClassSVM(nu=0.95 * outliers_fraction + 0.05,
                                     kernel="rbf", gamma=0.1),
    # Use the known outlier fraction as the contamination level
    "robust covariance estimator": EllipticEnvelope(
        contamination=outliers_fraction)}

# Evaluation grid and ground-truth labels (1 = inlier, 0 = outlier)
xx, yy = np.meshgrid(np.linspace(-7, 7, 500), np.linspace(-7, 7, 500))
n_inliers = int((1. - outliers_fraction) * n_samples)
n_outliers = int(outliers_fraction * n_samples)
ground_truth = np.ones(n_samples, dtype=int)
ground_truth[-n_outliers:] = 0

# Fit the problem with varying cluster separation
for offset in clusters_separation:
    np.random.seed(42)
    # Data generation: two Gaussian blobs centred at -offset and +offset
    # (sample counts must be integers, hence the floor division)
    X1 = 0.3 * np.random.randn(n_inliers // 2, 2) - offset
    X2 = 0.3 * np.random.randn(n_inliers // 2, 2) + offset
    X = np.r_[X1, X2]
    # Add uniformly distributed outliers
    X = np.r_[X, np.random.uniform(low=-6, high=6, size=(n_outliers, 2))]

    # Fit each detector and plot its decision function
    plt.figure(figsize=(10, 5))
    for i, (clf_name, clf) in enumerate(classifiers.items()):
        # Fit the data and tag outliers
        clf.fit(X)
        scores = clf.decision_function(X).ravel()
        threshold = stats.scoreatpercentile(scores,
                                            100 * outliers_fraction)
        y_pred = scores > threshold
        n_errors = (y_pred != ground_truth).sum()
        # Plot the level lines and the points
        Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)
        subplot = plt.subplot(1, 2, i + 1)
        subplot.set_title("Outlier detection")
        subplot.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),
                         cmap=plt.cm.Blues_r)
        subplot.contour(xx, yy, Z, levels=[threshold],
                        linewidths=2, colors='red')
        subplot.contourf(xx, yy, Z, levels=[threshold, Z.max()],
                         colors='orange')
        b = subplot.scatter(X[:-n_outliers, 0], X[:-n_outliers, 1],
                            c='white', edgecolors='k')
        c = subplot.scatter(X[-n_outliers:, 0], X[-n_outliers:, 1], c='black')
        subplot.axis('tight')
        # Proxy artist standing in for the red threshold contour in the legend
        a = Line2D([], [], color='red', linewidth=2)
        subplot.legend(
            [a, b, c],
            ['learned decision function', 'true inliers', 'true outliers'],
            prop=matplotlib.font_manager.FontProperties(size=11))
        subplot.set_xlabel("%d. %s (errors: %d)" % (i + 1, clf_name, n_errors))
        subplot.set_xlim((-7, 7))
        subplot.set_ylim((-7, 7))
    plt.subplots_adjust(left=0.04, bottom=0.1, right=0.96, top=0.94,
                        wspace=0.1, hspace=0.26)

plt.show()

Total running time of the example: 2.13 seconds
