Data Mining - How to Use Support and Confidence

Reposted article, 2019-07-20 11:15 (Data Analysis and Data Operations)

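A rule has the form: if a customer buys item X, they will probably also buy item Y.

Two measures are commonly used to evaluate such a rule: support and confidence.

Support is the number of times the rule holds, i.e. the number of customers who bought both X and Y.

Confidence is the fraction of times the rule holds: its support divided by the number of customers who bought X.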
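To make the two definitions concrete, here is a minimal sketch that stores each customer's purchases as a Python set instead of the 0/1 table used in the script. The four baskets are made-up data, and `support` and `confidence` are hypothetical helper names, not part of any library.

```python
# Hypothetical toy data: each set is one customer's basket (not the data in 777.xls).
transactions = [
    {"bread", "milk"},
    {"bread", "apples", "bananas"},
    {"milk", "apples"},
    {"bread", "milk", "apples"},
]


def support(premise, conclusion, baskets):
    """Number of baskets in which the rule 'premise -> conclusion' holds."""
    return sum(1 for basket in baskets if premise in basket and conclusion in basket)


def confidence(premise, conclusion, baskets):
    """Support divided by the number of baskets that contain the premise."""
    premise_count = sum(1 for basket in baskets if premise in basket)
    if premise_count == 0:
        return 0.0  # the premise never occurs, so the rule is never tested
    return support(premise, conclusion, baskets) / premise_count


print(support("bread", "milk", transactions))               # 2
print(round(confidence("bread", "milk", transactions), 3))  # 0.667
```

Bread appears in three baskets and bread together with milk in two, so the rule "bread -> milk" has support 2 and confidence 2/3.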

Straight to the code:

# Columns in the sheet: bread, milk, cheese, apples, bananas
from collections import defaultdict

import numpy as np
from pyexcel_xls import get_data

xls_data = get_data(r"777.xls")   # dict: sheet name -> list of rows
features = ["bread", "milk", "cheese", "apples", "bananas"]

# print(xls_data['Sheet1'])
rows = xls_data['Sheet1']
X = np.array(rows)                 # one row per customer: 1 = bought, 0 = did not buy
n_samples, n_features = X.shape    # number of rows (customers) and columns (items)
print(n_samples)
print(n_features)
# print(X)

# Count the customers who bought apples (column 3)
num_apple_purchases = 0
for sample in X:
    if sample[3] == 1:
        num_apple_purchases += 1
print("{0} people bought Apples".format(num_apple_purchases))

valid_rules = defaultdict(int)      # times each rule held (premise and conclusion both bought)
invalid_rules = defaultdict(int)    # times each rule failed (premise bought, conclusion not)
num_occurrences = defaultdict(int)  # times each item was bought (premise occurrences)

for sample in X:                                    # loop over customers (rows)
    for premise in range(n_features):               # loop over items (columns)
        if sample[premise] == 0:                    # customer did not buy this item, so it
            continue                                # cannot be a premise; try the next column
        num_occurrences[premise] += 1               # customer bought the premise item
        for conclusion in range(n_features):        # check every other item as a conclusion
            if premise == conclusion:               # skip the rule "X -> X"
                continue
            if sample[conclusion] == 1:             # customer bought both premise and conclusion,
                valid_rules[(premise, conclusion)] += 1    # so the rule held
            else:                                   # sample[conclusion] == 0: the rule failed
                invalid_rules[(premise, conclusion)] += 1

support = valid_rules                               # support = number of times the rule held
confidence = defaultdict(float)                     # missing keys default to 0.0
for premise, conclusion in valid_rules.keys():
    # confidence = times the rule held / times the premise was bought
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurrences[premise]

for premise, conclusion in confidence:              # each key is a (premise, conclusion) pair
    premise_name = features[premise]                # features[i] names column i of the data
    conclusion_name = features[conclusion]
    print("Rule: If a customer buys {0}, they may also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")

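Result: support and confidence show, for each item a customer buys, which other items they are likely to buy as well. The first two lines of output are the number of customers (rows) and items (columns), followed by the number of apple buyers.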

25
5
18 people bought Apples
Rule: If a customer buys bread, they may also buy milk
 - Confidence: 0.533
 - Support: 8

Rule: If a customer buys milk, they may also buy cheese
 - Confidence: 0.222
 - Support: 2

Rule: If a customer buys apples, they may also buy cheese
 - Confidence: 0.333
 - Support: 6

Rule: If a customer buys milk, they may also buy apples
 - Confidence: 0.444
 - Support: 4

Rule: If a customer buys bread, they may also buy apples
 - Confidence: 0.667
 - Support: 10

Rule: If a customer buys apples, they may also buy bread
 - Confidence: 0.556
 - Support: 10

Rule: If a customer buys apples, they may also buy bananas
 - Confidence: 0.611
 - Support: 11

Rule: If a customer buys apples, they may also buy milk
 - Confidence: 0.222
 - Support: 4

Rule: If a customer buys milk, they may also buy bananas
 - Confidence: 0.556
 - Support: 5

Rule: If a customer buys cheese, they may also buy bananas
 - Confidence: 0.556
 - Support: 5

Rule: If a customer buys cheese, they may also buy bread
 - Confidence: 0.556
 - Support: 5

Rule: If a customer buys cheese, they may also buy apples
 - Confidence: 0.667
 - Support: 6

Rule: If a customer buys cheese, they may also buy milk
 - Confidence: 0.222
 - Support: 2

Rule: If a customer buys bananas, they may also buy apples
 - Confidence: 0.647
 - Support: 11

Rule: If a customer buys bread, they may also buy bananas
 - Confidence: 0.467
 - Support: 7

Rule: If a customer buys bananas, they may also buy cheese
 - Confidence: 0.294
 - Support: 5

Rule: If a customer buys milk, they may also buy bread
 - Confidence: 0.889
 - Support: 8

Rule: If a customer buys bananas, they may also buy milk
 - Confidence: 0.294
 - Support: 5

Rule: If a customer buys bread, they may also buy cheese
 - Confidence: 0.333
 - Support: 5

Rule: If a customer buys bananas, they may also buy bread
 - Confidence: 0.412
 - Support: 7
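The nested loops in the script count every (premise, conclusion) pair one customer at a time. Assuming the sheet contains only 0/1 values, the same counts can be sketched with a single matrix product: entry (i, j) of `B.T @ B` is the number of customers who bought both item i and item j, and its diagonal is the number who bought item i. The 5-customer matrix below is made-up stand-in data, not the contents of `777.xls`, so its numbers differ from the output above.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]

# Hypothetical stand-in for the data in 777.xls:
# one row per customer, one column per item, 1 = bought.
X = np.array([
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
])

B = (X == 1).astype(int)              # force a 0/1 integer matrix
co_counts = B.T @ B                   # co_counts[i, j] = customers who bought both i and j
num_occurrences = np.diag(co_counts)  # diagonal entry i = customers who bought item i
num_apple_purchases = int(num_occurrences[3])

support = {}
confidence = {}
n_features = len(features)
for premise in range(n_features):
    for conclusion in range(n_features):
        count = int(co_counts[premise, conclusion])
        # Like the loop version, skip "X -> X" and rules that never held.
        if premise == conclusion or count == 0:
            continue
        support[(premise, conclusion)] = count
        confidence[(premise, conclusion)] = count / int(num_occurrences[premise])

print("{0} people bought Apples".format(num_apple_purchases))
for (premise, conclusion), conf in confidence.items():
    print("{0} -> {1}: confidence {2:.3f}, support {3}".format(
        features[premise], features[conclusion], conf, support[(premise, conclusion)]))
```

For five items the loop version is perfectly adequate; the matrix form mainly pays off with many customers, since NumPy does the counting in compiled code.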

Finally, sort the rules by confidence so that the strongest rules come first.
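One way to do this, as a minimal sketch assuming the `confidence` and `support` dictionaries built by the script: `sorted()` with `operator.itemgetter(1)` orders the `(rule, confidence)` pairs by their confidence value. Only three rules from the output above are hard-coded here so that the snippet runs on its own.

```python
from operator import itemgetter

features = ["bread", "milk", "cheese", "apples", "bananas"]

# A few (premise, conclusion) -> value entries copied from the output above;
# in the script these would be the full `confidence` and `support` dicts.
confidence = {(0, 1): 0.533, (1, 0): 0.889, (3, 4): 0.611}
support = {(0, 1): 8, (1, 0): 8, (3, 4): 11}

# Sort (rule, confidence) pairs by confidence, highest first.
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)

for (premise, conclusion), conf in sorted_confidence:
    print("Rule: If a customer buys {0}, they may also buy {1}".format(
        features[premise], features[conclusion]))
    print(" - Confidence: {0:.3f}".format(conf))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")
```

With these three entries, "milk -> bread" (0.889) prints first, then "apples -> bananas" (0.611), then "bread -> milk" (0.533).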
