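I'm finally here to fill this hole, qwq...
In my sophomore year, our discrete math instructor had us implement a community detection algorithm. I found some similar code online, but it had a bug, so I tweaked it until it worked, handed it in, and casually left a comment under the original blog post. I did not expect that, two years later, my comment would blow up....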

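It turned out to have 21 replies, so I went back through my old data backups, found the source code and the data, and am sharing them here to fill the hole I dug two years ago (I really didn't mean to wait until now). The original blog post explaining fast unfolding is linked here.
First up is the fast_unfolding algorithm everyone asked for; the link to the data is posted further down (my coding style back then really was awful).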
"""
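Purpose: load the graph by reading the given file line by line
Data format: Vertex1 Vertex2 Weight (tab-separated)
Returns: vector_dict, which stores the nodes, and edge_dict, which stores the edges
"""
def loadData(filePath):
    # Step 1: initialize vector_dict (nodes) and edge_dict (edges)
    vector_dict = {}
    edge_dict = {}
    # Step 2: read the file line by line and split each line by the expected format
    # (`with` closes the file automatically)
    with open(filePath) as f:
        for line in f:
            # strip() removes leading/trailing whitespace; split("\t") cuts on tabs,
            # giving a list of length 3: [node1, node2, weight]
            lines = line.strip().split("\t")
            if len(lines) < 3:
                # skip blank or malformed lines
                continue
            # The graph is undirected, so register the edge from both endpoints
            for i in range(2):
                if lines[i] not in vector_dict:
                    # New node: add it as a key with the placeholder value True
                    vector_dict[lines[i]] = True
                    # and start an empty edge list for it
                    edge_list = []
                else:
                    # Known node: fetch the edges (and weights) it already has
                    edge_list = edge_dict[lines[i]]
                # Store the other endpoint and the weight as "neighbor:weight"
                edge_list.append(lines[1-i]+":"+lines[2])
                # Write the updated edge list back into the dict
                edge_dict[lines[i]] = edge_list
    # Step 3: return the node dict and the edge dict
    return vector_dict, edge_dict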
"""
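Purpose: compute the modularity Q
Input: vector_dict, the node-to-community assignment, and edge_dict, the edges
Output: the modularity Q
Simplified form of the modularity used here:
    Q = sum_c [ Σin/(2m) - (Σtot/(2m))^2 ]
where Σin is the weight inside community c
and Σtot is the total weight of the edges attached to nodes inside community c.
Original definition of modularity:
    Q = (1/(2m)) * sum_{i,j} [ A_ij - k_i*k_j/(2m) ] * δ(c_i, c_j)
where m is the total edge weight of the network, m = (1/2) * sum_{i,j} A_ij,
A_ij is the weight of the edge between node i and node j,
k_i is the total weight of the edges attached to node i, k_i = sum_j A_ij,
c_i is the community that node i is assigned to, and
δ(c_i, c_j) tells whether nodes i and j are in the same community (1 if so, else 0).
"""
def modularity(vector_dict, edge_dict):
    Q = 0.0
    m = 0
    # Step 1: walk over every node and add up the edge weights.
    # Each edge is stored at both endpoints, so this m plays the role of 2m above.
    for i in edge_dict.keys():
        edge_list = edge_dict[i]
        for j in range(len(edge_list)):
            l = edge_list[j].strip().split(":")
            m += float(l[1].strip())
    # Step 2: collect the nodes of each community
    # Hint: in vector_dict the key is the node and the value is its community id
    community_dict = {}
    for i in vector_dict.keys():
        if vector_dict[i] not in community_dict:
            community_list = []
        else:
            community_list = community_dict[vector_dict[i]]
        community_list.append(i)
        community_dict[vector_dict[i]] = community_list
    # Step 3: compute Σin and Σtot
    # Hint: use the community_dict built above
    # and compute Σin and Σtot separately for each community i
    for i in community_dict.keys():
        sum_in = 0.0
        sum_tot = 0.0
        # nodes that belong to this community
        vector_list = community_dict[i]
        for j in range(len(vector_list)):
            # edges of vector_list[j]
            link_list = edge_dict[vector_list[j]]
            tmp_dict = {}
            for link_mem in link_list:
                l = link_mem.strip().split(":")
                # key: the other endpoint of the edge; value: its weight
                tmp_dict[l[0]] = l[1]
            for k in range(0, len(vector_list)):
                if vector_list[k] in tmp_dict:
                    sum_in += float(tmp_dict[vector_list[k]])
        # Each edge is stored twice, so sum_in is also twice the actual value
        # sum_tot is simpler: add up the weights of the edges attached to each node
        for vec in vector_list:
            link_list = edge_dict[vec]
            # (loop variable renamed so it no longer shadows the community key i)
            for link_mem in link_list:
                l = link_mem.strip().split(":")
                sum_tot += float(l[1])
        Q += ((sum_in / m) - (sum_tot/m)*(sum_tot/m))
    # Step 4: return the computed Q
    return Q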
"""
Purpose: reassign nodes to communities, keeping every move that raises the modularity
Input: the current partition vector_dict, the edge dict edge_dict, and the current Q
Output: the best partition found and its modularity
"""
def change_community(vector_dict, edge_dict, Q):
    # Step 1: make a working copy of the partition
    vector_tmp_dict = {}
    for key in vector_dict:
        vector_tmp_dict[key] = vector_dict[key]
    # Step 2: try out candidate moves
    # Hint: if two nodes share an edge, try putting them in the same community,
    # then recompute the modularity; keep the move if Q grows, otherwise undo it
    for key in vector_tmp_dict.keys():
        neighbor_vector_list = edge_dict[key]
        # the current edge
        for vec in neighbor_vector_list:
            # ori_com is the community the current node belongs to
            ori_com = vector_tmp_dict[key]
            vec_v = vec.strip().split(":")
            if ori_com != vector_tmp_dict[vec_v[0]]:
                vector_tmp_dict[key] = vector_tmp_dict[vec_v[0]]
                Q_new = modularity(vector_tmp_dict, edge_dict)
                if (Q_new - Q) > 0:
                    Q = Q_new
                else:
                    vector_tmp_dict[key] = ori_com
    # Step 3: return the new partition and the new modularity
    return vector_tmp_dict, Q
"""
Purpose: count the current communities, then rename them (numbered from 0)
Input: vector_dict, the node-to-community assignment
Output: the current number of communities
"""
def modify_community(vector_dict):
    # Step 1: initialize the counter
    community_dict = {}
    community_num = 0
    # Step 2: count the communities (the first community is 0)
    for community_values in vector_dict.values():
        if community_values not in community_dict:
            community_dict[community_values] = community_num
            community_num += 1
    # Step 3: rename the communities in place
    for key in vector_dict.keys():
        vector_dict[key] = community_dict[vector_dict[key]]
    # Step 4: return the number of communities
    return community_num
"""
Purpose: rebuild the community network by contracting each community into one node
Precondition: the modularity of the current network has reached its maximum
Input: vector_dict, the node-to-community assignment; edge_dict, the current edges;
      community_num, the current number of communities
Output: the new node-to-community assignment, the new edges,
      and the grouping of the original nodes by community
"""
def rebuild_graph(vector_dict, edge_dict, community_num):
    # Step 1: initialize the new dicts
    vector_new_dict = {}
    edge_new_dict = {}
    # used to compute the inner connections of every community
    community_dict = {}
    # Step 2: following the current partition, use the community id as the key
    # of community_dict and the nodes of that community as the value
    for key in vector_dict.keys():
        if vector_dict[key] not in community_dict:
            community_list = []
        else:
            community_list = community_dict[vector_dict[key]]
        community_list.append(key)
        community_dict[vector_dict[key]] = community_list
    # Step 3: initialize the new partition,
    # i.e. contract every community into a single node
    for key in community_dict.keys():
        vector_new_dict[str(key)] = str(key)
    # Step 4.1: build the new edges. Once a community is contracted into one node,
    # it gets a self-loop whose weight is the total weight of its internal edges
    for i in community_dict.keys():
        sum_in = 0.0
        # nodes of the current community, used to compute its sum_in
        vector_list = community_dict[i]
        for j in range(0,len(vector_list)):
            # edges of one node in the current community
            link_list = edge_dict[vector_list[j]]
            tmp_dict = {}
            for link_mem in link_list:
                l = link_mem.strip().split(":")
                tmp_dict[l[0]] = l[1]
            for k in range(0, len(vector_list)):
                if vector_list[k] in tmp_dict:
                    sum_in += float(tmp_dict[vector_list[k]])
        # Initialization: every new node has an edge pointing to itself
        inner_list = []
        inner_list.append(str(i) + ":" + str(sum_in))
        edge_new_dict[str(i)] = inner_list
    # Step 4.2: sum the edge weights between each pair of communities and
    # use the sum as the weight of the edge between the two contracted nodes
    community_list = list(community_dict.keys())
    for i in range(len(community_list)):
        for j in range(len(community_list)):
            if i != j:
                sum_outer = 0.0
                # take the members of both communities
                member_list_1 = community_dict[community_list[i]]
                member_list_2 = community_dict[community_list[j]]
                for i_1 in range(len(member_list_1)):
                    tmp_dict = {}
                    tmp_list = edge_dict[member_list_1[i_1]]
                    for k in range(len(tmp_list)):
                        tmp = tmp_list[k].strip().split(":")
                        tmp_dict[tmp[0]] = tmp[1]
                    for j_1 in range(len(member_list_2)):
                        if member_list_2[j_1] in tmp_dict:
                            sum_outer += float(tmp_dict[member_list_2[j_1]])
                # If communities i and j are connected, store the total as the
                # weight of the edge between the two new nodes
                if sum_outer != 0:
                    inner_list = edge_new_dict[str(community_list[i])]
                    # use the community id itself, not the loop index j, as the
                    # neighbor name (the two only match after renumbering from 0)
                    inner_list.append(str(community_list[j]) + ":" + str(sum_outer))
                    edge_new_dict[str(community_list[i])] = inner_list
    # Step 5: return the new nodes, the new edges,
    # and the grouping of the original nodes by community
    return vector_new_dict, edge_new_dict, community_dict
"""
Purpose: the fast_unfolding driver, which calls the helper functions above
Input: the original nodes and the original edges
Output: prints the partitioning process and the final partition
"""
def fast_unfolding(vector_dict, edge_dict):
    # Step 1: initialize the nodes, each node in its own community
    for i in vector_dict.keys():
        vector_dict[i] = i
    # Step 2: keep recomputing the modularity until it reaches
    # its maximum for the current graph
    Q = modularity(vector_dict, edge_dict)
    Q_new = 0.0
    while (Q_new != Q):
        Q_new = Q
        vector_dict, Q = change_community(vector_dict, edge_dict, Q)
    community_num = modify_community(vector_dict)
    # Step 3: print the result of the first pass
    print("Q = ", Q)
    print(community_num)
    # print("vector_dict.key : ", vector_dict.keys())
    # print("vector_dict.value : ", vector_dict.values())
    # Step 4: keep contracting the graph and updating the modularity
    Q_best = Q
    while True:
        # Step 4.1: rebuild the community network (contract each community)
        print("\n rebuild")
        vector_new_dict, edge_new_dict, community_dict = rebuild_graph(vector_dict, edge_dict, community_num)
        print("community_dict : ", community_dict)
        # Step 4.2: recompute the modularity
        Q_new = 0.0
        while (Q_new != Q):
            Q_new = Q
            vector_new_dict, Q = change_community(vector_new_dict, edge_new_dict, Q)
        community_num = modify_community(vector_new_dict)
        # Step 4.3: print the result of this pass
        print("Q = ", Q)
        print("community_num : ", community_num)
        if (Q_best == Q):
            break
        Q_best = Q
        # Step 4.4: map every original node to its new community
        vector_result = {}
        for key in community_dict.keys():
            value_of_vector = community_dict[key]
            for i in range(len(value_of_vector)):
                # key: each node in the community; value: its new community
                vector_result[value_of_vector[i]] = str(vector_new_dict[str(key)])
        for key in vector_result.keys():
            vector_dict[key] = vector_result[key]
        # print("vector_dict.key : ", vector_dict.keys())
        # print("vector_dict.value : ", vector_dict.values())
    # Step 5: print the final result
    vector_result = {}
    for key in community_dict.keys():
        value_of_vector = community_dict[key]
        for i in range(len(value_of_vector)):
            vector_result[value_of_vector[i]] = str(vector_new_dict[str(key)])
    for key in vector_result.keys():
        vector_dict[key] = vector_result[key]
    print("Q_best : ", Q_best)
    print("vector_result.key : ", vector_dict.keys())
    print("vector_result.value : ", vector_dict.values())
if __name__ == "__main__":
    vector_dict, edge_dict=loadData("./data.txt")
    fast_unfolding(vector_dict, edge_dict)
The data.txt used in the code can be downloaded from Baidu Netdisk by clicking here (extraction code: gugh).
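If the download is not reachable, a tiny stand-in file is enough to try the script. The sketch below assumes the format described in the loadData docstring (one tab-separated `Vertex1 Vertex2 Weight` line per undirected edge). The toy graph, the file name `toy_data.txt`, and the helpers `write_toy_graph` and `read_adjacency` are made up for illustration and are not part of the original code or data.

```python
import os
import tempfile

# Hypothetical toy graph: two triangles (a-b-c and d-e-f) joined by the edge c-d.
TOY_EDGES = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
    ("c", "d", 1),
]

def write_toy_graph(path, edges=TOY_EDGES):
    """Write one 'Vertex1<TAB>Vertex2<TAB>Weight' line per undirected edge."""
    with open(path, "w") as f:
        for u, v, w in edges:
            f.write(f"{u}\t{v}\t{w}\n")

def read_adjacency(path):
    """Read the file back into {node: {neighbor: weight}}, storing both directions."""
    adj = {}
    with open(path) as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            u, v, w = parts[0], parts[1], float(parts[2])
            adj.setdefault(u, {})[v] = w
            adj.setdefault(v, {})[u] = w
    return adj

toy_path = os.path.join(tempfile.mkdtemp(), "toy_data.txt")
write_toy_graph(toy_path)
toy_adj = read_adjacency(toy_path)
print(sorted(toy_adj["c"]))  # neighbors of c: ['a', 'b', 'd']
```

Pointing `loadData` at a file like this instead of `./data.txt` should be enough for a quick smoke test.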
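The output of a run is shown in the figure below: the graph was contracted several times, and Q kept increasing (the "Squeezed text" entries are Python IDLE collapsing long output; normally that would be a pile of printed results).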
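As a rough sanity check on Q values like these, the simplified formula from the modularity docstring, Q = Σ_c [Σin/(2m) − (Σtot/(2m))²], can be evaluated on a graph small enough to verify by hand. This is only a sketch: the six-node graph (two triangles joined by one edge) and the helper `toy_modularity` are hypothetical, and it uses nested dicts rather than the "neighbor:weight" strings of the code above.

```python
def toy_modularity(adj, community_of):
    """Q = sum_c [ sum_in/(2m) - (sum_tot/(2m))^2 ] on an adjacency dict.

    adj stores every undirected edge in both directions, so summing all
    entries gives 2m, and sum_in counts each internal edge twice.
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    sum_in, sum_tot = {}, {}
    for u, nbrs in adj.items():
        cu = community_of[u]
        for v, w in nbrs.items():
            sum_tot[cu] = sum_tot.get(cu, 0.0) + w
            if community_of[v] == cu:
                sum_in[cu] = sum_in.get(cu, 0.0) + w
    return sum(sum_in.get(c, 0.0) / two_m - (sum_tot[c] / two_m) ** 2
               for c in sum_tot)

# Hypothetical toy graph: triangles a-b-c and d-e-f, bridged by c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

two_triangles = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_community = {n: 0 for n in adj}
singletons = {n: n for n in adj}

print(round(toy_modularity(adj, two_triangles), 4))  # 0.3571 (= 5/14)
print(round(toy_modularity(adj, one_community), 4))  # 0.0
print(round(toy_modularity(adj, singletons), 4))     # -0.1735 (= -17/98)
```

By hand, each triangle has Σin = 6 and Σtot = 7 with 2m = 14, so Q = 2 × (6/14 − (7/14)²) = 5/14. The natural split scores higher than either extreme, which is the behavior the algorithm relies on.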

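The next algorithm is fast Newman. I have forgotten how I put it together back then; in any case it is a community detection algorithm, it still runs, and it has a data demo built in, so I am sharing it here as well.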
# -*- coding: utf-8 -*-
# Build the relation (adjacency) matrix
def create_relation_matrix(List, n):
    adjacent_matrix = create_matrix(n, 0)
    for relation in List:
        adjacent_matrix[relation[0]][relation[1]] = 1
        adjacent_matrix[relation[1]][relation[0]] = 1
    return adjacent_matrix
# Print a list, one item per line
def printf(List):
    for x in List:
        print (x)
# Remove duplicates from a list, keeping the original order
def list_unique(List):
    new_list = []
    for item in List:
        if item not in new_list:
            new_list.append(item)
    return new_list
# Create a matrix: number is the size of each dimension, amount is the fill value
def create_matrix(number, amount):
    matrix = []
    for i in range(0, number):
        tmp = []
        for j in range(0, number):
            tmp.append(amount)
        matrix.append(tmp)
    return matrix
# Find every position at which the element occurs
def find_index(List, node):
    return [i for i, j in enumerate(List) if j == node]
# Compute the modularity
def get_modularity(node_list, node_club, club_list, node_matrix):
    uni = list_unique(club_list)
    # Renumber the communities as 0..k-1
    for node in uni:
        idices = find_index(club_list, node)
        for i in idices:
            node_club[i] = uni.index(node)
    Q = 0
    m = sum([sum(node) for node in node_matrix])/2 # number of edges in the network
    k = len(list_unique(node_club)) # current number of communities
    e = create_matrix(k, 0) # k x k matrix of zeros
    for i in range(k):
        idx = find_index(node_club, i)
        labelsi = idx
        for j in range(k):
            idx = find_index(node_club, j)
            labelsj = idx
            for ii in labelsi:
                for jj in labelsj:
                    e[i][j] = e[i][j]+node_matrix[ii][jj] # e[i][j] counts the connections between community i and community j
    e = [[float(j)/(2*m) for j in i] for i in e]
    a = []
    for i in range(k):
        ai = sum(e[i])
        a.append(ai)
        Q = Q + e[i][i]-ai**2
    return Q, e, a, node_club
def fast_newman(node_list, List,n,divide_num):
    adjacent_matrix=create_relation_matrix(List, n)
    n = len(adjacent_matrix)
    max_id = n
    Z = []
    # Initial partition: node_list holds the node names, node_club holds the
    # renumbered community ids, club_list holds the raw community ids
    node_club = [0 for i in range(n)]
    club_list = [i for i in range(n)]
    step = 1
    while len(list_unique(club_list)) != 1: # stop once only one community is left
        Q, e, a, node_club = get_modularity(node_list, node_club, club_list, adjacent_matrix)
        k = len(e) # number of communities
        DeltaQs = []
        DeltaQs_i = []
        DeltaQs_j = []
        for i in range(k):
            for j in range(k):
                if i != j:
                    DeltaQ = 2*(e[i][j]-a[i]*a[j])
                    DeltaQs.append(DeltaQ)
                    DeltaQs_i.append(i)
                    DeltaQs_j.append(j)
        maxDeltaQ = max(DeltaQs) # merge the pair of communities with the largest DeltaQ
        id_club = DeltaQs.index(maxDeltaQ)
        i = DeltaQs_i[id_club]
        j = DeltaQs_j[id_club]
        max_id = max_id + 1
        c_id1 = find_index(node_club, i) # positions of the nodes in community i
        c_id2 = find_index(node_club, j) # positions of the nodes in community j
        id1 = list_unique([club_list[item] for item in c_id1]) # raw label(s) of community i
        id2 = list_unique([club_list[item] for item in c_id2]) # raw label(s) of community j
        for item in c_id1:
            club_list[item] = max_id
        for item in c_id2:
            club_list[item] = max_id
        Z.append([id1, id2, len(c_id1+c_id2)])
        step = step + 1
        result_name = []
        result_index = []
        for item in list_unique(club_list):
            tmp = find_index(club_list, item)
            result_name.append([node_list[t] for t in tmp])
            result_index.append(tmp)
        if len(result_name) <= divide_num:
            break
    # Collect the edges whose endpoints ended up in different communities
    club_link=[]
    for item in List:
        if club_list[item[0]]!=club_list[item[1]]:
            club_link.append(item)
    return result_name,result_index,club_link
if __name__ == '__main__':
    List = [[0, 1], [1, 2], [1, 3], [3, 4], [3, 5], [3, 6], [4, 5], [1, 5]]
    node_list = ["node0", "node1", "node2", "node3", "node4", "node5", "node6"]
    print (fast_newman(node_list, List, 7, 3))
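The merge step in the listing ranks community pairs by `DeltaQ = 2*(e[i][j]-a[i]*a[j])`. As a quick check that this expression really equals the change in Q caused by a merge, the sketch below compares the two on the demo edge list. The helpers `q_of` and `delta_q` are hypothetical names written for this check, and they work on an unweighted edge list rather than the adjacency matrix used above.

```python
def q_of(labels, edges):
    """Q = sum_c (e_cc - a_c^2) for an unweighted, undirected edge list."""
    two_m = 2.0 * len(edges)
    e_in, a = {}, {}
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        # each edge end contributes 1/(2m) to its community's a_c
        a[cu] = a.get(cu, 0.0) + 1 / two_m
        a[cv] = a.get(cv, 0.0) + 1 / two_m
        if cu == cv:
            # an internal edge has both ends inside, so it counts twice
            e_in[cu] = e_in.get(cu, 0.0) + 2 / two_m
    return sum(e_in.get(c, 0.0) - a[c] ** 2 for c in a)

def delta_q(labels, edges, ci, cj):
    """Predicted gain 2 * (e_ij - a_i * a_j) from merging communities ci != cj."""
    two_m = 2.0 * len(edges)
    e_ij = sum(1 for u, v in edges
               if {labels[u], labels[v]} == {ci, cj}) / two_m
    a_i = sum((labels[u] == ci) + (labels[v] == ci) for u, v in edges) / two_m
    a_j = sum((labels[u] == cj) + (labels[v] == cj) for u, v in edges) / two_m
    return 2 * (e_ij - a_i * a_j)

edges = [[0, 1], [1, 2], [1, 3], [3, 4], [3, 5], [3, 6], [4, 5], [1, 5]]
before = list(range(7))   # every node starts in its own community
after = list(before)
after[4] = 3              # merge community 4 into community 3

predicted = delta_q(before, edges, 3, 4)
actual = q_of(after, edges) - q_of(before, edges)
print(abs(predicted - actual) < 1e-12)  # True
```

The identity follows from the merged community having e_cc = e_ii + e_jj + 2·e_ij and a_c = a_i + a_j, so only the cross terms survive in the difference.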
That's all.
