sklearn.datasets.make_blobs() generates multi-class, single-label datasets: each class is a normally distributed (Gaussian) cluster of points around its own center.
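sklearn.datasets.make_blobs(
    n_samples=100,  # total number of samples to generate
    n_features=2,  # number of features per sample
    centers=3,  # number of centers (classes) to generate, or the fixed center locations
    cluster_std=1.0,  # standard deviation of each cluster
    center_box=(-10.0, 10.0),  # (min, max) bounding box for the cluster centers when they are generated at random
    shuffle=True,  # whether to shuffle the samples
    random_state=None)  # seed for the random number generator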
Parameters:
n_samples: int, optional (default=100)
The total number of points, equally divided among clusters.
n_features: int, optional (default=2)
The number of features for each sample.
centers: int or array of shape [n_centers, n_features], optional (default=3)
The number of centers to generate, or the fixed center locations.
cluster_std: float or sequence of floats, optional (default=1.0)
The standard deviation of the clusters. For example, to generate two classes where one has a larger variance than the other, set cluster_std to [1.0, 3.0].
center_box: pair of floats (min, max), optional (default=(-10.0, 10.0))
The bounding box for each cluster center when centers are generated at random.
shuffle: boolean, optional (default=True)
Shuffle the samples.
random_state: int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
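The cluster_std note can be checked with a minimal sketch like the one below, which passes one standard deviation per cluster and compares the empirical spread of each class. The sample count, the explicit center coordinates, and the seed are arbitrary values chosen for illustration; they do not come from the original text.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Illustrative values (assumptions): two fixed centers, 1000 samples, seed 0.
centers = [[-5.0, 0.0], [5.0, 0.0]]
X, y = make_blobs(n_samples=1000, n_features=2, centers=centers,
                  cluster_std=[1.0, 3.0], random_state=0)

# Empirical per-class standard deviation, averaged over both features.
spread = [X[y == label].std(axis=0).mean() for label in (0, 1)]
print(spread)  # roughly 1.0 for class 0 and 3.0 for class 1
```

The order of cluster_std follows the order of centers, so label 0 gets the first standard deviation and label 1 the second.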
Return values:
X : array of shape [n_samples, n_features]
The generated samples.
y : array of shape [n_samples]
The integer labels for cluster membership of each sample.
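To make the return values concrete, the following sketch inspects the shapes and labels of X and y, and also checks that a fixed random_state reproduces the same data. The sample count and the seed are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=150, n_features=2, centers=3, random_state=0)

print(X.shape)         # (150, 2)
print(y.shape)         # (150,)
print(np.unique(y))    # [0 1 2]
print(np.bincount(y))  # [50 50 50]

# Same seed, same data: a second call with random_state=0 reproduces X and y.
X2, y2 = make_blobs(n_samples=150, n_features=2, centers=3, random_state=0)
print(np.array_equal(X, X2) and np.array_equal(y, y2))  # True
```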
Example:
# Import the required modules
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Create a simulated clustering dataset
X, y = make_blobs(n_samples=150, n_features=2, centers=3, cluster_std=0.5, shuffle=True, random_state=0)

# Draw a scatter plot
plt.figure('百里希文', facecolor='lightyellow')
plt.scatter(X[:, 0], X[:, 1], c='w', edgecolor='k', marker='o', s=50)
plt.grid()
plt.show()
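The scatter plot above draws every point in white, so the three classes can be told apart only by position. As a variation (a sketch, not part of the original example), the points can be colored by the labels in y. The colormap, the non-interactive backend, and the output file name are assumptions made here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show() instead
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=150, n_features=2, centers=3,
                  cluster_std=0.5, shuffle=True, random_state=0)

fig, ax = plt.subplots()
# c=y maps each integer label to a color taken from the colormap.
points = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis",
                    edgecolor="k", marker="o", s=50)
ax.grid(True)
fig.savefig("blobs_by_label.png")  # hypothetical output file name
```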

