Determining Weights with the Entropy Weight Method in Python


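This article describes how to determine weights with the entropy weight method in Python, in four parts:

I. Introduction to the entropy weight method

II. Implementing the entropy weight method

III. Python example 1

IV. Python example 2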

 

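I. Introduction to the entropy weight method

The entropy weight method is one of the classic algorithms for computing indicator weights. It is a mathematical method for measuring how dispersed the values of an indicator are. The greater the dispersion, the more information the indicator carries, the lower the uncertainty, and the smaller the entropy; the less information it carries, the higher the uncertainty and the larger the entropy. Because of this property, entropy can be used to judge the randomness and disorder of an event, and equally to judge the dispersion of an indicator: the more dispersed an indicator is, the greater its influence on the composite evaluation.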
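As a small illustration of this relationship, the sketch below computes the normalized entropy of two hypothetical indicators over five records: one whose shares are spread evenly and one whose shares are concentrated in a single record. The function name `normalized_entropy` and the share values are assumptions made for this example only.

```python
import numpy as np


def normalized_entropy(shares):
    """Entropy of a share vector, scaled to [0, 1] by dividing by ln(n)."""
    shares = np.asarray(shares, dtype=float)
    n = shares.size
    nonzero = shares[shares > 0]  # 0 * ln(0) is treated as 0
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(n))


# Hypothetical shares of five records under two indicators (each sums to 1)
even_shares = [0.2, 0.2, 0.2, 0.2, 0.2]            # every record looks the same
skewed_shares = [0.9, 0.025, 0.025, 0.025, 0.025]  # one record stands out

e_even = normalized_entropy(even_shares)
e_skewed = normalized_entropy(skewed_shares)
print(e_even, e_skewed)
```

The evenly spread indicator reaches the maximum entropy of 1 and so cannot distinguish the records, while the skewed indicator has a lower entropy and would receive a larger weight.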

 

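II. Implementing the entropy weight method

1. Suppose the data has n records and m variables. It can be represented as an n×m matrix A (n rows and m columns, that is, n records and m feature columns).

2. Normalize the data

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$

where $x_{ij}$ is the element in row i, column j of A.

3. Compute the share of record i under indicator j

$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$

4. Compute the entropy of indicator j

$$e_j = -k \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad k = \frac{1}{\ln n}$$

where $p_{ij} \ln p_{ij}$ is taken as 0 when $p_{ij} = 0$.

5. Compute the coefficient of variation of indicator j

$$d_j = 1 - e_j$$

6. Compute the weight of indicator j

$$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$$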
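The steps above can be sketched as a single vectorized NumPy function. This is a minimal sketch, assuming every column contains at least two distinct values (otherwise the min-max normalization divides by zero); the function name `entropy_weights` and the 4×3 data matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def entropy_weights(A):
    """Return the entropy-method weight of each column of an n x m matrix."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape

    # Step 2: min-max normalization of each column
    col_min = A.min(axis=0)
    col_max = A.max(axis=0)
    X = (A - col_min) / (col_max - col_min)

    # Step 3: share of record i under indicator j
    P = X / X.sum(axis=0)

    # Step 4: entropy of each indicator, with 0 * ln(0) taken as 0
    safe_P = np.where(P > 0, P, 1.0)  # ln(1) = 0, so zero shares add nothing
    e = -(P * np.log(safe_P)).sum(axis=0) / np.log(n)

    # Step 5: coefficient of variation of each indicator
    d = 1.0 - e

    # Step 6: weights, which sum to 1
    return d / d.sum()


# Hypothetical data matrix: 4 records, 3 indicators
A = [[1.0, 10.0, 5.0],
     [2.0, 20.0, 5.5],
     [3.0, 30.0, 9.0],
     [4.0, 90.0, 5.2]]
w = entropy_weights(A)
print(w)
```

In this made-up matrix the third column has one value far from the rest, so its shares are the most concentrated and it receives the largest weight.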

 

 

III. Python example 1

Sample data 1

Contents of the .csv file

var1,var2,var3,var4,var5,var6
171.33,151.33,0.28,0,106.36,0.05
646.66,370,1.07,61,1686.79,1.64
533.33,189.66,0.59,0,242.31,0.57
28.33,0,0.17,0,137.85,2.29
620,234,0.88,41.33,428.33,0.13
192.33,177.66,0.16,0,128.68,1.07
111,94,0.18,0,234.27,0.22
291,654,1.21,65.66,2.26,0
421.33,247,0.7,0,0.4,0
193,288.66,0.16,0,0,0
82.33,118,0.11,0,758.41,0.24
649.66,648.66,0.54,0,13.35,0.11
37.66,103.33,0.12,0,1133.51,1.1
183.33,282.33,0.55,0,624.73,1.04
1014.66,1264.66,5.07,814.66,0,0
90.66,134,0.3,0,0.15,0
200.66,98.33,0.33,0,681.54,0.51
540.66,558.66,1.08,62,2.71,0.09
80,60.66,0.13,0,910.19,0.88
530.66,281.33,0.88,36,743.21,0.72
166,133,0.13,0,246.88,2.05
377.66,310.33,0.57,0,102.89,0.57
143.33,73,0.23,0,103.94,0.1
394.66,473.66,0.56,0,1.06,0.03
535.66,447.33,0.44,0,10.59,0.08
52.66,56.66,0.52,0,0,0
1381.66,760.66,2.3,781.66,248.71,0.13
44.33,42.33,0.07,0,0.66,0
71.66,62.66,0.11,0,535.26,0.52
148.33,56.66,0.24,0,173.83,0.16

Python code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

"""
Created on Fri Mar 23 10:48:36 2018
@author: Big Teacher Brother
"""
import math

import numpy as np
import pandas as pd

# 1. Read the data
df = pd.read_csv('E:\\text.csv', encoding='gb2312')
# 2. Preprocess the data: drop records with missing values
#    (dropna returns a new DataFrame, so the result must be assigned)
df = df.dropna()


# Define the entropy weight function
def cal_weight(x):
    '''Compute the weight of each variable with the entropy weight method.'''
    # Min-max normalization of each column
    x = x.apply(lambda col: (col - np.min(col)) / (np.max(col) - np.min(col)))

    # Constant k = 1 / ln(n), where n is the number of records
    rows = x.index.size
    cols = x.columns.size
    k = 1.0 / math.log(rows)

    # Entropy contribution -k * p * ln(p) of every cell
    x = np.array(x)
    col_sums = x.sum(axis=0)
    lnf = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            if x[i][j] == 0:
                # p * ln(p) is taken as 0 when p = 0
                lnf[i][j] = 0.0
            else:
                p = x[i][j] / col_sums[j]
                lnf[i][j] = math.log(p) * p * (-k)
    E = pd.DataFrame(lnf)

    # Redundancy of each indicator: d_j = 1 - e_j
    d = 1 - E.sum(axis=0)
    # Weight of each indicator: w_j = d_j / sum(d)
    w = d / d.sum()
    return pd.DataFrame(w)


if __name__ == '__main__':
    # Compute the weight of each field in df
    w = cal_weight(df)
    w.index = df.columns
    w.columns = ['weight']
    print(w)
    print('Done!')


Output:

Running D:/tensorflow/ImageNet/shangzhifa.py
Backend Qt5Agg is interactive backend. Turning interactive mode on.
        weight
var1  0.088485
var2  0.074840
var3  0.140206
var4  0.410843
var5  0.144374
var6  0.141251
Done!
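One way to see why var4 receives by far the largest weight is to compare its entropy with that of var1. The sketch below copies the var1 and var4 columns from the sample data and recomputes their entropies; the helper name `column_entropy` is introduced here for illustration and is not part of the original program.

```python
import numpy as np

# var1 and var4 columns copied from the sample data of example 1
var1 = [171.33, 646.66, 533.33, 28.33, 620, 192.33, 111, 291, 421.33, 193,
        82.33, 649.66, 37.66, 183.33, 1014.66, 90.66, 200.66, 540.66, 80,
        530.66, 166, 377.66, 143.33, 394.66, 535.66, 52.66, 1381.66, 44.33,
        71.66, 148.33]
var4 = [0, 61, 0, 0, 41.33, 0, 0, 65.66, 0, 0, 0, 0, 0, 0, 814.66, 0, 0, 62,
        0, 36, 0, 0, 0, 0, 0, 0, 781.66, 0, 0, 0]


def column_entropy(values):
    """Entropy of one indicator column after min-max normalization."""
    v = np.asarray(values, dtype=float)
    x = (v - v.min()) / (v.max() - v.min())
    p = x / x.sum()
    nonzero = p[p > 0]  # 0 * ln(0) is treated as 0
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(v.size))


e1 = column_entropy(var1)
e4 = column_entropy(var4)
print(e1, e4)
```

Most var4 values are zero and two records account for the bulk of the column total, so its shares are highly concentrated and its entropy is much lower than that of var1, which is spread more evenly across records.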

 

IV. Python example 2

Sample data:

Save the data to an Excel spreadsheet and read it with xlrd.

Python code:

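import numpy as np
import xlrd

# Read the data and compute the entropy-based scores
# Note: xlrd 2.0 and later can only read .xls files; reading this .xlsx
# file requires an older xlrd (< 2.0)
path = u'K:\\選指標的.xlsx'
# hn is the number of header rows, nc the number of header columns
hn, nc = 1, 1
sheetname = u'Sheet3'


def readexcel(hn, nc):
    book = xlrd.open_workbook(path)
    table = book.sheet_by_name(sheetname)
    nrows = table.nrows
    data = []
    for i in range(hn, nrows):
        data.append(table.row_values(i)[nc:])
    return np.array(data)


def entropy(data0):
    # Return the composite score of each sample
    # n is the number of samples, m the number of indicators
    n, m = np.shape(data0)
    # One sample per row, one indicator per column
    # Min-max normalization
    maximum = np.max(data0, axis=0)
    minimum = np.min(data0, axis=0)
    data = (data0 - minimum) * 1.0 / (maximum - minimum)
    # Share of sample i under indicator j
    sumzb = np.sum(data, axis=0)
    data = data / sumzb
    # Handle ln(0): replace zero shares with a small positive value
    a = data * 1.0
    a[np.where(data == 0)] = 0.0001
    # Entropy of each indicator
    e = (-1.0 / np.log(n)) * np.sum(data * np.log(a), axis=0)
    # Weights
    w = (1 - e) / np.sum(1 - e)
    # Composite score: raw values x weights
    scores = np.sum(data0 * w, axis=1)
    return scores


data = readexcel(hn, nc)
grades = entropy(data)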

The result is:

In[32]:grades
Out[32]: 
array([95.7069621 , 93.14062354, 93.17273781, 92.77037549, 95.84064938,
       98.01005572, 90.20508545, 95.17203466, 95.96929203, 97.80841298,
       97.021269  ])

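The program above computes each score as raw (pre-normalization) values × weights, which is fine when the original ratings are all on the same scale.

According to the formula in the paper, the score should be computed as normalized values × weights, which is what should be done when the raw data are on different scales. The program that follows the paper's formula is:

Python code: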

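import numpy as np
import xlrd

# Read the data and compute the entropy-based scores
# Note: xlrd 2.0 and later can only read .xls files; reading this .xlsx
# file requires an older xlrd (< 2.0)
path = u'K:\\選指標的.xlsx'
# hn is the number of header rows, nc the number of header columns
hn, nc = 1, 1
sheetname = u'Sheet3'


def readexcel(hn, nc):
    book = xlrd.open_workbook(path)
    table = book.sheet_by_name(sheetname)
    nrows = table.nrows
    data = []
    for i in range(hn, nrows):
        data.append(table.row_values(i)[nc:])
    return np.array(data)


def entropy(data0):
    # Return the composite score of each sample
    # n is the number of samples, m the number of indicators
    n, m = np.shape(data0)
    # One sample per row, one indicator per column
    # Min-max normalization
    maximum = np.max(data0, axis=0)
    minimum = np.min(data0, axis=0)
    data = (data0 - minimum) * 1.0 / (maximum - minimum)
    # Share of sample i under indicator j
    sumzb = np.sum(data, axis=0)
    data = data / sumzb
    # Handle ln(0): replace zero shares with a small positive value
    a = data * 1.0
    a[np.where(data == 0)] = 0.0001
    # Entropy of each indicator
    e = (-1.0 / np.log(n)) * np.sum(data * np.log(a), axis=0)
    # Weights
    w = (1 - e) / np.sum(1 - e)
    # Composite score: normalized values x weights
    # (data holds the column shares of the min-max normalized values here)
    scores = np.sum(data * w, axis=1)
    return scores


data = readexcel(hn, nc)
grades = entropy(data)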

The result is as follows:

In[34]:grades
Out[34]: 
array([0.08767219, 0.07639727, 0.08342572, 0.07555273, 0.08920511,
       0.11506703, 0.06970125, 0.09550656, 0.09852824, 0.10232353,
       0.10662037])
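To illustrate why the choice between the two scoring variants matters when indicators are on different scales, here is a minimal sketch with made-up numbers. The data matrix and the equal weights are hypothetical, and for simplicity the second variant uses plain min-max normalized values rather than the column shares used in the program above.

```python
import numpy as np

# Hypothetical data: 3 samples, 2 indicators on very different scales
data0 = np.array([[1000.0, 0.9],
                  [2000.0, 0.1],
                  [1500.0, 0.8]])
# Hypothetical weights, chosen only for illustration
w = np.array([0.5, 0.5])

# Variant 1: raw values x weights (dominated by the large-scale column)
raw_scores = (data0 * w).sum(axis=1)

# Variant 2: min-max normalized values x weights
col_min = data0.min(axis=0)
col_max = data0.max(axis=0)
norm = (data0 - col_min) / (col_max - col_min)
norm_scores = (norm * w).sum(axis=1)

print(raw_scores)
print(norm_scores)
```

With raw values the second sample ranks first purely because of the large first column, whereas after normalization the third sample, which does reasonably well on both indicators, ranks first.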




 

