SNA -- Some Basic Graph Theory

Reposted from the original post, 2020-07-09 14:43. SNA and GIS

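Qian (The Creative): primal, prosperous, beneficial, steadfast.

Nine at the beginning: A hidden dragon. Do not act.

Nine in the second place: A dragon appears in the field. It is beneficial to see the great person.

Nine in the third place: The noble one is diligent all day long and stays vigilant at night, as if in danger. No blame.

Nine in the fourth place: Perhaps leaping in the deep. No blame.

Nine in the fifth place: A dragon flying in the heavens. It is beneficial to see the great person.

Nine at the top: An overreaching dragon will have regrets.

All nines: A host of dragons appears without a leader. Good fortune.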

1. Graph concepts

One interactive website: https://d3gt.com/unit.html

Graphs

A graph G = (V,E) consists of a finite nonempty set V of vertices (nodes) and a set E ⊆ V × V of edges, which in an undirected graph are unordered pairs of vertices.
In a weighted graph, every edge (v_i, v_j) ∈ E has an associated weight w_ij.
(v_i, v_i): a loop. An undirected graph without loops is called a simple graph.
(v_i, v_j): the two nodes are called neighbors and are adjacent to each other. In a directed graph, the directed edge is also called an arc, with v_i as the tail and v_j as the head.
|V| = n, the number of nodes in G, is also called the order of the graph.
|E| = m, the number of edges in G, is also called the size of the graph. For example, the graph in the figure below has order = 6 and size = 5.
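A minimal Python sketch of these definitions, assuming an undirected graph stored as a set of frozensets. The six-node graph is a made-up example (not the figure from the original post), and the helper names are illustrative only.

```python
# Hypothetical undirected graph with 6 nodes and 5 edges.
V = {"v1", "v2", "v3", "v4", "v5", "v6"}
E = {frozenset(pair) for pair in [
    ("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"), ("v5", "v6"),
]}

def order(vertices):
    """Order of the graph: the number of nodes, |V| = n."""
    return len(vertices)

def size(edges):
    """Size of the graph: the number of edges, |E| = m."""
    return len(edges)

def has_no_loops(edges):
    """A loop (v, v) collapses to a one-element frozenset, so check for those."""
    return all(len(edge) == 2 for edge in edges)

print(order(V), size(E), has_no_loops(E))
```

Storing each edge as a frozenset makes (v_i, v_j) and (v_j, v_i) the same edge, which matches the unordered-pair definition.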

(Figure: example graph)

Subgraphs

A subgraph G′ of a graph G is a graph whose vertex set and edge set are subsets of those of G. If G′ is a subgraph of G, then G is said to be a supergraph of G′ (Harary 1994, p. 11).
A (sub)graph is called complete (or a clique) if there exists an edge between all pairs of nodes.

Degree

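The degree of a vertex v is the number of edges incident to it, denoted deg(v). The minimum degree of the graph is written δ(G) and the maximum degree ∆(G); to avoid confusion, the former is the small delta and the latter the big delta. Keep clear which quantities are properties of the graph G and which are properties of a vertex in V.
The degree of a node v_i ∈ V is also denoted d(v_i) or just d_i.
The degree sequence is the list of the degrees of all nodes; for the graph in the figure below, degree sequence = (5,4,4,4,4,4,1).
For directed graphs, the indegree id(v_i) is the number of edges that have v_i as their head (incoming edges), and the outdegree od(v_i) is the number of outgoing edges from v_i.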
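The degree definitions above can be sketched in Python as follows. The adjacency dictionary and the arc list are small made-up examples, and the function names are assumptions for illustration.

```python
def degrees(adj):
    """Degree of every vertex of an undirected graph given as {vertex: set of neighbors}."""
    return {v: len(neighbors) for v, neighbors in adj.items()}

def degree_sequence(adj):
    """Degrees of all nodes, listed from largest to smallest."""
    return sorted(degrees(adj).values(), reverse=True)

def in_out_degrees(arcs):
    """Indegree and outdegree from a list of (tail, head) arcs."""
    indeg, outdeg = {}, {}
    for tail, head in arcs:
        outdeg[tail] = outdeg.get(tail, 0) + 1
        indeg[head] = indeg.get(head, 0) + 1
    return indeg, outdeg

# Hypothetical undirected graph: a triangle a-b-c with a pendant node d.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
seq = degree_sequence(adj)
print(seq, "delta(G) =", min(seq), "Delta(G) =", max(seq))

# Hypothetical directed graph.
indeg, outdeg = in_out_degrees([("a", "b"), ("c", "b"), ("b", "d")])
print(indeg, outdeg)
```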

Path and Distance

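A walk in a graph G between nodes x and y is an ordered sequence of vertices, starting at x and ending at y, such that consecutive vertices are joined by an edge.
The length of the walk, t, is measured in hops: the number of edges along the walk.
A trail is a walk that does not repeat edges.
A path is a walk that does not repeat vertices (except possibly the first and the last).
A cycle is a closed path of length ≥ 3: it starts and ends at the same vertex and repeats no other node.
The distance between two vertices in a graph is the number of edges on a shortest path between them.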

Connectedness

Two nodes are connected if there exists a path between them. A graph is connected if there is a path between every pair of its nodes.
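For unweighted graphs, distances and connectedness can both be read off a breadth-first search. The sketch below assumes an adjacency dictionary; the example graph and the function names are made up for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_connected(adj):
    """True if every node is reachable from an arbitrary start node."""
    start = next(iter(adj))
    return len(bfs_distances(adj, start)) == len(adj)

# Hypothetical graph: triangle a-b-c, node d attached to c, isolated node e.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}, "e": set()}
print(bfs_distances(adj, "a"))
print(is_connected(adj))
```

Nodes that are missing from the returned dictionary are not connected to the source.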

An example:

The degree sequence of the graph is (4,4,4,3,2,2,2,1), and therefore its degree frequency distribution is (N0,N1,N2,N3,N4) = (0,1,3,1,3).
The degree distribution is (f(0),f(1),f(2),f(3),f(4)) = (0,0.125,0.375,0.125,0.375).
For graph (b), the indegree of v₇ is id(v₇) = 2, whereas its outdegree is od(v₇) = 0.
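The frequency and distribution figures in this example can be reproduced from the degree sequence alone. A short Python sketch, with illustrative function names:

```python
from collections import Counter

def degree_frequency(sequence):
    """N_k: the number of nodes with degree k, for k = 0 .. max degree."""
    counts = Counter(sequence)
    return [counts.get(k, 0) for k in range(max(sequence) + 1)]

def degree_distribution(sequence):
    """f(k) = N_k / n: the fraction of nodes with degree k."""
    n = len(sequence)
    return [count / n for count in degree_frequency(sequence)]

sequence = [4, 4, 4, 3, 2, 2, 2, 1]   # degree sequence from the example
print(degree_frequency(sequence))     # [0, 1, 3, 1, 3]
print(degree_distribution(sequence))  # [0.0, 0.125, 0.375, 0.125, 0.375]
```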

Adjacency matrix

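A graph G = (V,E) with |V| = n vertices can be conveniently represented as an n × n binary adjacency matrix A, where A(i,j) = 1 if v_i is adjacent to v_j and 0 otherwise. For an undirected graph, A is symmetric.
For a directed graph, the matrix is in general not symmetric.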
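A small Python sketch of building the adjacency matrix as a list of lists. The node and edge lists are made up, and `adjacency_matrix` is an illustrative helper rather than a library function.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Binary n x n matrix with A[i][j] = 1 when there is an edge from node i to node j."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        if not directed:
            A[index[v]][index[u]] = 1   # undirected edges are stored in both directions
    return A

def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
print(adjacency_matrix(nodes, edges))                  # symmetric
print(adjacency_matrix(nodes, edges, directed=True))   # not symmetric
```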

Graphs from data matrix

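This is about turning data into the graph form we need: for a dataset of n points in d-dimensional space, build a weighted graph whose edge weights come from a similarity (or distance) measure between pairs of points, then threshold the weight matrix to obtain a binary adjacency matrix.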
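One possible way to do this is sketched below. The Gaussian similarity, the bandwidth `sigma`, and the cut-off `threshold` are all assumptions chosen for illustration; the original notes do not say which similarity measure or threshold to use.

```python
import math

def similarity_graph(points, sigma=1.0, threshold=0.5):
    """Return (W, A): Gaussian similarity weights and the thresholded binary matrix.

    Assumed similarity: w_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                 # no loops
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
            A[i][j] = 1 if W[i][j] >= threshold else 0
    return W, A

# Hypothetical 2-dimensional dataset: two nearby points and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W, A = similarity_graph(points)
print(A)
```

Only the two nearby points end up joined by an edge; the distant point stays isolated.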

2. Topological attributes

Attributes that apply only to a single node or edge are local; those that apply to the whole graph are global.

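Degree (local)

Average degree: μ_d = (1/n) Σ_i d_i, the mean of the node degrees.

Average path length (also called characteristic path length)

For a connected graph: μ_L = 2/(n(n − 1)) · Σ_i Σ_{j>i} d(v_i, v_j), the mean distance over all pairs of nodes.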
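Both averages are easy to compute for a small connected graph. A minimal Python sketch, assuming an adjacency dictionary and using the standard definitions (mean degree, and mean shortest-path distance over all pairs); the path graph is a made-up example.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_degree(adj):
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)

def average_path_length(adj):
    """Mean distance over all unordered pairs; assumes the graph is connected."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    pairs = list(combinations(adj, 2))
    return sum(dist[u][w] for u, w in pairs) / len(pairs)

# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(average_degree(path), average_path_length(path))
```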

Eccentricity (local)

Defined as the maximum distance from a vertex v_i to any other vertex: e(v_i) = max_j d(v_i, v_j).

Radius and diameter

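For a disconnected graph, these are computed within the connected components, that is, only over pairs of nodes joined by a path.
Radius, r(G): the minimum eccentricity over all nodes, r(G) = min_i e(v_i).
Diameter, d(G): the maximum eccentricity over all nodes, d(G) = max_i e(v_i).
The diameter is sensitive to outliers, so the effective diameter is also used: the minimum number of hops within which a large fraction of the connected pairs of nodes can reach each other. For example:
In this figure, 94% of the pairs of nodes lie within 7 hops of each other, so the effective diameter can be taken to be 7.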
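A Python sketch of eccentricity, radius, diameter, and effective diameter. The `fraction` cut-off for the effective diameter is an assumed parameter (the notes do not fix a value), and the path graph is a made-up example.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricity(adj, v):
    """Largest distance from v to any node it can reach."""
    return max(bfs_distances(adj, v).values())

def radius(adj):
    return min(eccentricity(adj, v) for v in adj)

def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)

def effective_diameter(adj, fraction=0.9):
    """Smallest hop count h such that at least `fraction` of connected pairs are within h hops."""
    lengths = sorted(d for v in adj
                     for w, d in bfs_distances(adj, v).items() if w != v)
    return lengths[math.ceil(fraction * len(lengths)) - 1]

# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(radius(path), diameter(path), effective_diameter(path, 0.8))
```

Each pair is counted once in each direction, which leaves the fractions unchanged.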

Clustering coefficient

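For the subgraph G_i induced by the neighbors of v_i, with n_i nodes and m_i edges, the clustering coefficient of v_i is defined as C(v_i) = m_i / C(n_i, 2) = 2·m_i / (n_i(n_i − 1)), the fraction of possible edges among the neighbors that actually exist.
The clustering coefficient of a graph G is simply the average clustering coefficient over all the nodes.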
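A minimal Python sketch of both coefficients, assuming an adjacency dictionary. Treating nodes with fewer than two neighbors as having coefficient 0 is a convention assumed here; the example graph is made up.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbors of v that are themselves joined by an edge."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0                      # assumed convention: no pairs, coefficient 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def graph_clustering_coefficient(adj):
    """Average of the node clustering coefficients."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical graph: triangle a-b-c with a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(adj, "a"))    # 1 of 3 neighbor pairs is linked
print(graph_clustering_coefficient(adj))
```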

Efficiency

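The efficiency for a pair of nodes v_i and v_j is defined as 1/d(v_i, v_j). If the two nodes are not connected, d is infinite and the efficiency is 0; the smaller the distance between them, the more efficient the pair.
The efficiency of a graph G is the average efficiency over all pairs of nodes, whether connected or not: 2/(n(n − 1)) · Σ_i Σ_{j>i} 1/d(v_i, v_j).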
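The graph efficiency can be sketched in Python as below, assuming an adjacency dictionary. Unreachable pairs simply contribute 0 to the sum; the example graph with an isolated node is made up.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def efficiency(adj):
    """Average of 1/d(v_i, v_j) over all unordered pairs; disconnected pairs count as 0."""
    nodes = list(adj)
    n = len(nodes)
    total = 0.0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for w in nodes[i + 1:]:
            if w in dist:
                total += 1 / dist[w]
    return 2 * total / (n * (n - 1))

# Hypothetical graph: path a - b - c plus an isolated node d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(efficiency(adj))   # (1 + 1 + 1/2) / 6
```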

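An example, using the graph above:

Find the clustering coefficient of node v4 and of the whole graph, and the local efficiency of v4.
The clustering coefficient of a node depends on the subgraph formed by its neighbors: divide the actual number of edges in that subgraph by the maximum possible number of edges. The clustering coefficient of a graph is simply the average of the coefficients of all its nodes.

From the figure below:

we get C(v4) = 2/6 = 0.33 and C(G) = 1/8 × (1/2 + 1/3 + 1 + 1/3 + 1/3 + 0 + 0 + 0) = 2.5/8 = 0.3125.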
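The arithmetic can be checked with exact fractions. The per-node values below are copied from the sum in the example; the graph itself is not reconstructed here.

```python
from fractions import Fraction

# Node clustering coefficients as they appear in the sum above.
node_cc = [Fraction(1, 2), Fraction(1, 3), Fraction(1), Fraction(1, 3),
           Fraction(1, 3), Fraction(0), Fraction(0), Fraction(0)]

graph_cc = sum(node_cc) / len(node_cc)   # average over the 8 nodes
print(graph_cc, float(graph_cc))         # 5/16 0.3125

# C(v4): 2 edges among the neighbors out of 6 possible.
print(float(Fraction(2, 6)))
```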

The local efficiency follows from the formula above:

3. Centrality analysis

Centrality measures have typically been used as indicators of power, influence, popularity and prestige.

3.1 Basic centralities

Degree Centrality

That is, simply count how many edges the node has.

Eccentricity centrality

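The less eccentric a node, the more central it is: take the largest distance (the length of a shortest path) from the node to any other node, which is its eccentricity e(v_i), and use the reciprocal 1/e(v_i).
Center node: a node whose eccentricity equals the radius. Periphery node: a node whose eccentricity equals the diameter. (This measure suits facility-location problems such as choosing a site for a hospital.)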

Closeness Centrality

Closeness is the reciprocal of farness.
It uses the sum of the distances from a node to all other nodes to rank how central the node is: c(v_i) = 1 / Σ_j d(v_i, v_j).
The node with the smallest total distance is called the median node.
For comparison purposes, closeness can be standardized by dividing by its maximum possible value, 1/(n − 1), which is the same as multiplying by n − 1.
The more central a node is, the lower its total distance to all other nodes.
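A minimal Python sketch of closeness centrality and its standardized form, assuming a connected graph given as an adjacency dictionary. The star graph is a made-up example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, v, normalized=False):
    """Reciprocal of the total distance from v to all other nodes (connected graph assumed)."""
    farness = sum(bfs_distances(adj, v).values())
    c = 1 / farness
    # Dividing by the maximum possible value 1/(n - 1) equals multiplying by n - 1.
    return c * (len(adj) - 1) if normalized else c

# Hypothetical star graph: center c joined to leaves x, y, z.
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
print(closeness(star, "c"), closeness(star, "c", normalized=True))
print(closeness(star, "x"), closeness(star, "x", normalized=True))
```

The center reaches the maximum standardized value of 1 because it is adjacent to every other node.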

Betweenness centrality

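Nodes with high betweenness act as brokers, bridges, and bottlenecks.
It measures how many shortest paths between all pairs of vertices pass through a given vertex v_i.
For each pair of nodes v_j and v_k, first count the number of shortest paths between them, then count how many of those pass through the given vertex, and take the fraction; the betweenness of v_i is the sum of these fractions over all pairs (note that j and k are chosen from the nodes other than i).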
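The counting procedure can be sketched in Python with a breadth-first search that also counts shortest paths. This is a straightforward version written for clarity rather than speed, and the 4-cycle is a made-up example.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Distances from source and the number of shortest paths to each reachable node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on some shortest path
                paths[w] += paths[u]
    return dist, paths

def betweenness(adj, v):
    """Sum over pairs j, k (both different from v) of the fraction of shortest j-k paths through v."""
    info = {u: bfs_counts(adj, u) for u in adj}
    dist_v, paths_v = info[v]
    total = 0.0
    for j, k in combinations([u for u in adj if u != v], 2):
        dist_j, paths_j = info[j]
        if k not in dist_j or v not in dist_j or k not in dist_v:
            continue                     # some node is unreachable
        if dist_j[v] + dist_v[k] == dist_j[k]:
            total += paths_j[v] * paths_v[k] / paths_j[k]
    return total

# Hypothetical 4-cycle a - b - c - d - a.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(betweenness(cycle, "b"))   # one of the two shortest a-c paths passes through b
```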

An example of computing betweenness (http://www2.unb.ca/~ddu/6634/Lecture_notes/Lecture_4_centrality_measure.pdf):

Now compute the centralities of each node in the earlier graph:

3.2 Web centralities

On the web, most of the networks of interest are directed.

Prestige score (eigenvector centrality)

As a centrality, prestige is supposed to be a measure of the importance or rank of a node/ the influence of a node in a network
A high eigenvector score means that a node is connected to many nodes who themselves have high scores
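Put simply, it looks at who gives the most or who receives the most, with each link counted by the score of the node at its other end.
In short, take the relationship (adjacency) matrix between the nodes, find its dominant eigenvector, and compare the entries. The procedure is:

An example: given 5 nodes, that is, a 5 × 5 relationship matrix, start with an initial prestige vector p0 = (1,1,1,1,1)^T.
After each iteration, scale the vector by its largest entry. The ratio of the largest entry of the new vector p to that of the previous vector gives λ, the estimate of the eigenvalue.

After many iterations λ stabilizes at a fixed value, as in the figure below:

Normalizing the result to a unit vector gives the dominant eigenvector. Comparing its entries, the node with the larger value can be said to have more prestige.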
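The iteration can be sketched in Python as follows. The 5 × 5 matrix is a hypothetical directed graph chosen for illustration (the matrix from the original figure is not available), and the update p ← A^T p is the assumed form of the prestige step, where a node collects the scores of the nodes pointing to it.

```python
import math

def prestige(A, iterations=100):
    """Power iteration on A^T; returns (eigenvalue estimate, unit-length prestige vector)."""
    n = len(A)
    p = [1.0] * n                       # initial vector p0 = (1, ..., 1)^T
    lam = 0.0
    for _ in range(iterations):
        # Node v collects the prestige of every node u with an edge u -> v.
        new = [sum(A[u][v] * p[u] for u in range(n)) for v in range(n)]
        largest = max(new)
        lam = largest / max(p)          # ratio of largest entries estimates the eigenvalue
        p = [x / largest for x in new]  # scale by the largest entry
    norm = math.sqrt(sum(x * x for x in p))
    return lam, [x / norm for x in p]

# Hypothetical adjacency matrix: A[u][v] = 1 means an edge from node u to node v.
A = [
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
]
lam, p = prestige(A)
print(round(lam, 3), [round(x, 3) for x in p])
```

For this matrix the eigenvalue estimate settles near 1.466, and the second, third, and fifth nodes share the highest prestige.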

Random jumps

This refers to the fact that, in random surfing, the surfer may still jump from one node to another even when there is no link between them.

Page rank

a method for computing the prestige or centrality of nodes in the context of Web search.
It uses the random surfing assumption: people click on the links of a page at random.
The PageRank of a Web page is defined to be the probability of a random web surfer landing at that page.

Normalized Prestige

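Normalized prestige takes the outdegree into account: a surfer at node u follows each of its outgoing links with probability 1/od(u), so each entry of the adjacency matrix is divided by the outdegree of its row. For the random surfer (random jump) matrix, every node is treated as linked to all n nodes, so the outdegree of each node is od(u) = n and every entry is 1/n.
So far, the PageRank vector is essentially a normalized prestige vector.

An example, again using the adjacency list of the 5 nodes above. First normalize it:

Then normalize for random jumps (the normalized random jump adjacency matrix):

Suppose the small jump probability is α = 0.1; then the combined normalized adjacency matrix is M = 0.9N + 0.1N_r.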
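A Python sketch of building M and iterating p ← M^T p. The 5 × 5 matrix is a hypothetical example in which every node has at least one outgoing edge, and normalizing the result so that it sums to 1 (to read it as a probability) is a choice made here, not something stated in the notes.

```python
def pagerank_matrix(A, alpha=0.1):
    """M = (1 - alpha) * N + alpha * N_r, with N[u][v] = A[u][v] / od(u) and N_r[u][v] = 1 / n."""
    n = len(A)
    M = []
    for row in A:
        outdegree = sum(row)            # assumed to be at least 1 for every node
        M.append([(1 - alpha) * a / outdegree + alpha / n for a in row])
    return M

def pagerank(A, alpha=0.1, iterations=100):
    """Iterate p <- M^T p from the uniform vector; p keeps summing to 1."""
    n = len(A)
    M = pagerank_matrix(A, alpha)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(M[u][v] * p[u] for u in range(n)) for v in range(n)]
    return p

# Hypothetical adjacency matrix: A[u][v] = 1 means an edge from node u to node v.
A = [
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
]
M = pagerank_matrix(A, alpha=0.1)
print([round(x, 2) for x in M[0]])      # first row of M
print([round(x, 3) for x in pagerank(A)])
```

Every row of M sums to 1, so M can be read as the transition matrix of the random surfer.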

Hub and Authority Scores

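This concept was introduced to solve the ranking problem in web search. It is sometimes also called the Hyperlink Induced Topic Search (HITS) method.
Unlike PageRank, it introduces two flavors of importance: an authority, which contains the information on the topic being sought, and a hub, which points to the authorities holding that information (for example, a university ranking website is a hub, and the university you want to know about is an authority).
The authority score of a page is analogous to PageRank or prestige, and it depends on how many “good” pages point to it. On the other hand, the hub score of a page is based on how many “good” pages it points to.
So each web page is given two scores: an authority score (a) and a hub score (h):

The computation resembles the earlier prestige example. Write down the original relationship matrix A and its transpose A^T, and start with a vector a of all ones. Multiply A by a and scale by the largest entry to get the h vector after the first iteration; then multiply A^T by this h vector and scale again to get the a vector after the first iteration. Repeat. The a vector converges to the dominant eigenvector of A^T A and the h vector to that of A A^T.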
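The alternating update can be sketched in Python as below, following the order described above (h from a, then a from h). The small directed graph is a made-up example with at least one edge, which the scaling step assumes.

```python
def hits(A, iterations=50):
    """Alternate h = A a and a = A^T h, scaling each vector by its largest entry."""
    n = len(A)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        # Hub score: sum of the authority scores of the pages u points to.
        h = [sum(A[u][v] * a[v] for v in range(n)) for u in range(n)]
        top = max(h)
        h = [x / top for x in h]
        # Authority score: sum of the hub scores of the pages pointing to v.
        a = [sum(A[u][v] * h[u] for u in range(n)) for v in range(n)]
        top = max(a)
        a = [x / top for x in a]
    return a, h

# Hypothetical graph: page 0 links to pages 1 and 2, page 3 links to page 1.
A = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
authority, hub = hits(A)
print([round(x, 3) for x in authority])
print([round(x, 3) for x in hub])
```

Page 1 comes out as the strongest authority and page 0 as the strongest hub, while pages with no incoming links get authority 0.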

4. Graph models

Three common properties

Small-world Property

Average path length μ_L ∝ log n, where n is the number of nodes in the graph.

Scale-free Property

The empirical degree distribution f(k) exhibits scale-free behavior captured by a power-law relationship with k: f(k) ∝ k^−γ.

Clustering Effect

Erdős–Rényi Random Graph Model

generates a random graph such that any of the possible graphs with a fixed number of nodes and edges has equal probability of being chosen.
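One way to sample such a graph is to pick m distinct edges uniformly from all possible node pairs, so that every graph with n nodes and m edges is equally likely. A minimal Python sketch; the function name and the `seed` parameter are illustrative assumptions.

```python
import random
from itertools import combinations

def erdos_renyi(n, m, seed=None):
    """Sample a graph on nodes 0..n-1 with exactly m edges, uniformly at random."""
    rng = random.Random(seed)
    possible_edges = list(combinations(range(n), 2))   # all unordered pairs, no loops
    return set(rng.sample(possible_edges, m))

edges = erdos_renyi(n=6, m=5, seed=42)
print(sorted(edges))
```

Listing all possible edges is only practical for small n; it is used here because it makes the equal-probability property easy to see.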

Watts–Strogatz Small-world Graph Model

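The model starts from a regular ring lattice in which each node is linked to its nearest neighbors. Such a network will have a high clustering coefficient, but will not be small-world; randomly rewiring or adding a few edges shortens the average path length while keeping the clustering high.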

Barabási–Albert Scale-free Model

Barabási's book on network science is worth a look.

That is all for this part; the more detailed theory and models are left for other review notes.

Some references

Emirbayer/Goodwin (1994): Network Analysis, Culture, and the Problem of Agency Identifies three social network paradigms: structural determinism, structural instrumentalism, and structural constructionism
Freeman (2004): The Development of Social Network Analysis: A Study in the Sociology of Science
The SAGE Handbook of Social Network Analysis: an essential classic tome.
Link analysis: algorithms such as PageRank, hubs, and authorities.
