This article introduces how to implement the entropy weight method in Python to determine indicator weights, in the following four parts:
1. Introduction to the entropy weight method
2. Computation steps of the entropy weight method
3. Python implementation of the entropy method: Example 1
4. Python implementation of the entropy method: Example 2
1. Introduction to the entropy weight method
The entropy weight method is one of the classic algorithms for computing indicator weights. It is a mathematical method for judging how dispersed an indicator is. The greater the dispersion, the greater the amount of information the indicator carries, the smaller the uncertainty, and the smaller the entropy; the smaller the amount of information, the greater the uncertainty, and the larger the entropy. Based on this property of entropy, we can use the entropy value to judge the randomness and disorder of an event, and also to judge the degree of dispersion of an indicator: the more dispersed an indicator is, the greater its influence on the comprehensive evaluation.
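For reference, the entropy the method relies on is the information (Shannon) entropy. For an indicator whose normalized values give proportions $p_1, \dots, p_n$ across the $n$ records, the entropy scaled by $1/\ln n$ (so that it lies in $[0, 1]$) is

$$e = -\frac{1}{\ln n}\sum_{i=1}^{n} p_i \ln p_i .$$

The more uniform the proportions, the closer $e$ is to 1 and the less the indicator differentiates the records; a smaller $e$ means larger dispersion and, under this method, a larger weight.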
2. Computation steps of the entropy weight method
1. Suppose the data has n records and m variables, so it can be written as an n×m matrix A (n rows of records, m feature columns). Let $x_{ij}$ denote the element in row i and column j of A, i.e. the value of the j-th indicator for the i-th record.

2. Normalize the data. With min-max normalization (the form used in the code below):
$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$

3. Compute the proportion of the i-th record under the j-th indicator:
$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$

4. Compute the entropy of the j-th indicator (with the convention that $p \ln p = 0$ when $p = 0$):
$$e_j = -k \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad k = \frac{1}{\ln n}$$

5. Compute the coefficient of divergence of the j-th indicator:
$$d_j = 1 - e_j$$

6. Compute the weight of the j-th indicator:
$$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$$

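To make the six steps concrete, here is a compact NumPy sketch; the function name entropy_weights and its layout are mine, not part of the original post, but it computes the quantities defined above (the same ones as the cal_weight function in Example 1):

import numpy as np

def entropy_weights(A):
    """Entropy-method weights for an (n, m) data matrix A, following steps 1-6 above."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    # Step 2: min-max normalization of each column
    X = (A - A.min(axis=0)) / (A.max(axis=0) - A.min(axis=0))
    # Step 3: proportion of each record under each indicator
    P = X / X.sum(axis=0)
    # Step 4: entropy of each indicator (0 * ln 0 is treated as 0)
    k = 1.0 / np.log(n)
    P_safe = np.where(P > 0, P, 1.0)   # ln 1 = 0, so zero entries contribute nothing
    e = -k * np.sum(P * np.log(P_safe), axis=0)
    # Step 5: coefficient of divergence
    d = 1.0 - e
    # Step 6: normalized weights
    return d / d.sum()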
3. Python implementation of the entropy method: Example 1
Sample data 1
Contents of the .csv data file:
var1,var2,var3,var4,var5,var6
171.33,151.33,0.28,0,106.36,0.05
646.66,370,1.07,61,1686.79,1.64
533.33,189.66,0.59,0,242.31,0.57
28.33,0,0.17,0,137.85,2.29
620,234,0.88,41.33,428.33,0.13
192.33,177.66,0.16,0,128.68,1.07
111,94,0.18,0,234.27,0.22
291,654,1.21,65.66,2.26,0
421.33,247,0.7,0,0.4,0
193,288.66,0.16,0,0,0
82.33,118,0.11,0,758.41,0.24
649.66,648.66,0.54,0,13.35,0.11
37.66,103.33,0.12,0,1133.51,1.1
183.33,282.33,0.55,0,624.73,1.04
1014.66,1264.66,5.07,814.66,0,0
90.66,134,0.3,0,0.15,0
200.66,98.33,0.33,0,681.54,0.51
540.66,558.66,1.08,62,2.71,0.09
80,60.66,0.13,0,910.19,0.88
530.66,281.33,0.88,36,743.21,0.72
166,133,0.13,0,246.88,2.05
377.66,310.33,0.57,0,102.89,0.57
143.33,73,0.23,0,103.94,0.1
394.66,473.66,0.56,0,1.06,0.03
535.66,447.33,0.44,0,10.59,0.08
52.66,56.66,0.52,0,0,0
1381.66,760.66,2.3,781.66,248.71,0.13
44.33,42.33,0.07,0,0.66,0
71.66,62.66,0.11,0,535.26,0.52
148.33,56.66,0.24,0,173.83,0.16
Python code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 23 10:48:36 2018
@author: Big Teacher Brother
"""
import pandas as pd
import numpy as np
import math
from numpy import array

# 1. Read the data
df = pd.read_csv('E:\\text.csv', encoding='gb2312')
# 2. Preprocess: drop records with missing values
#    (dropna() returns a new DataFrame, so the result must be assigned)
df = df.dropna()


# Define the entropy-method function
def cal_weight(x):
    '''Compute variable weights with the entropy method'''
    # Min-max normalization
    x = x.apply(lambda col: (col - np.min(col)) / (np.max(col) - np.min(col)))

    # Compute k
    rows = x.index.size    # number of rows
    cols = x.columns.size  # number of columns
    k = 1.0 / math.log(rows)

    # Information entropy: lnf[i][j] = -k * p_ij * ln(p_ij)
    x = array(x)
    lnf = array([[None] * cols for i in range(rows)])
    for i in range(0, rows):
        for j in range(0, cols):
            if x[i][j] == 0:
                lnfij = 0.0
            else:
                p = x[i][j] / x.sum(axis=0)[j]
                lnfij = math.log(p) * p * (-k)
            lnf[i][j] = lnfij
    lnf = pd.DataFrame(lnf)
    E = lnf

    # Coefficient of divergence (redundancy) of each indicator
    d = 1 - E.sum(axis=0)

    # Weight of each indicator
    w = [[None] * 1 for i in range(cols)]
    for j in range(0, cols):
        wj = d[j] / sum(d)
        w[j] = wj
    # (The composite score of each sample would be computed from the raw data)
    w = pd.DataFrame(w)
    return w


if __name__ == '__main__':
    # Compute the weight of each column of df
    w = cal_weight(df)  # call cal_weight
    w.index = df.columns
    w.columns = ['weight']
    print(w)
    print('運行完成!')
Output:
Running D:/tensorflow/ImageNet/shangzhifa.py
Backend Qt5Agg is interactive backend. Turning interactive mode on.
        weight
var1  0.088485
var2  0.074840
var3  0.140206
var4  0.410843
var5  0.144374
var6  0.141251
運行完成!
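A comment inside cal_weight mentions computing a composite score for each sample, but the script only prints the weights. A minimal sketch of that extra step is shown below; it is my own addition rather than part of the original script, and it scores the min-max-normalized data (rather than the raw data the comment mentions) so that indicators with different units are comparable:

# Assumes df and w from the script above are still in scope.
df_norm = df.apply(lambda col: (col - col.min()) / (col.max() - col.min()))
scores = df_norm.dot(w['weight'])            # score_i = sum_j w_j * x'_ij
print(scores.sort_values(ascending=False))   # records ranked by composite score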
4. Python implementation of the entropy method: Example 2
Sample data:
The data is stored in an Excel workbook and read with xlrd.

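A side note that is not in the original post: xlrd 2.0 and later no longer reads .xlsx files, so on a current environment the same sheet can be loaded with pandas (which uses openpyxl for .xlsx) instead of xlrd. The path, sheet name, and header settings below mirror the script that follows:

import pandas as pd

# header=0 matches hn=1 (one header row); index_col=0 matches nc=1 (one header column)
df = pd.read_excel(u'K:\\選指標的.xlsx', sheet_name=u'Sheet3', header=0, index_col=0)
data = df.values  # the same (n_samples, n_indicators) array that readexcel() returns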
Python code:
import numpy as np
import xlrd

# Read the data and compute the entropy
path = u'K:\\選指標的.xlsx'
hn, nc = 1, 1  # hn = number of header rows, nc = number of header columns
sheetname = u'Sheet3'


def readexcel(hn, nc):
    data = xlrd.open_workbook(path)
    table = data.sheet_by_name(sheetname)
    nrows = table.nrows
    data = []
    for i in range(hn, nrows):
        data.append(table.row_values(i)[nc:])
    return np.array(data)


def entropy(data0):
    # Returns a composite score for each sample
    # n samples, m indicators: one row per sample, one column per indicator
    n, m = np.shape(data0)
    # Min-max normalization
    maxium = np.max(data0, axis=0)
    minium = np.min(data0, axis=0)
    data = (data0 - minium) * 1.0 / (maxium - minium)
    # Proportion of sample i under indicator j
    sumzb = np.sum(data, axis=0)
    data = data / sumzb
    # Avoid ln(0): replace zeros before taking the logarithm
    a = data * 1.0
    a[np.where(data == 0)] = 0.0001
    # Entropy of each indicator
    e = (-1.0 / np.log(n)) * np.sum(data * np.log(a), axis=0)
    # Weights
    w = (1 - e) / np.sum(1 - e)
    # Composite scores, computed from the raw (pre-normalization) values
    recodes = np.sum(data0 * w, axis=1)
    return recodes


data = readexcel(hn, nc)
grades = entropy(data)
The result is:
In [32]: grades
Out[32]:
array([95.7069621 , 93.14062354, 93.17273781, 92.77037549, 95.84064938,
       98.01005572, 90.20508545, 95.17203466, 95.96929203, 97.80841298,
       97.021269  ])
The program above multiplies the raw (pre-normalization) values by the weights when computing the scores. That is acceptable when the original indicators all share the same unit.
According to the formula in the paper, however, the scores should be the normalized values multiplied by the weights. This is what should be done when the indicators have different units, because otherwise the indicator with the largest raw magnitude dominates the composite score. The program below follows the paper's formula.
Python code:
import numpy as np
import xlrd

# Read the data and compute the entropy
path = u'K:\\選指標的.xlsx'
hn, nc = 1, 1  # hn = number of header rows, nc = number of header columns
sheetname = u'Sheet3'


def readexcel(hn, nc):
    data = xlrd.open_workbook(path)
    table = data.sheet_by_name(sheetname)
    nrows = table.nrows
    data = []
    for i in range(hn, nrows):
        data.append(table.row_values(i)[nc:])
    return np.array(data)


def entropy(data0):
    # Returns a composite score for each sample
    # n samples, m indicators: one row per sample, one column per indicator
    n, m = np.shape(data0)
    # Min-max normalization
    maxium = np.max(data0, axis=0)
    minium = np.min(data0, axis=0)
    data = (data0 - minium) * 1.0 / (maxium - minium)
    # Proportion of sample i under indicator j
    sumzb = np.sum(data, axis=0)
    data = data / sumzb
    # Avoid ln(0): replace zeros before taking the logarithm
    a = data * 1.0
    a[np.where(data == 0)] = 0.0001
    # Entropy of each indicator
    e = (-1.0 / np.log(n)) * np.sum(data * np.log(a), axis=0)
    # Weights
    w = (1 - e) / np.sum(1 - e)
    # Composite scores, computed from the normalized values (as in the paper's formula)
    recodes = np.sum(data * w, axis=1)
    return recodes


data = readexcel(hn, nc)
grades = entropy(data)
The result is:
In [34]: grades
Out[34]:
array([0.08767219, 0.07639727, 0.08342572, 0.07555273, 0.08920511,
       0.11506703, 0.06970125, 0.09550656, 0.09852824, 0.10232353,
       0.10662037])
