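Building a Disease Knowledge Graph with Neo4j: A Hands-On Guide
1. Installing Neo4j
On Linux, create an installation script named Neo4j_setup.sh and run it. When the installation finishes, open http://localhost:7474 in a browser. The default username/password is neo4j/neo4j, and the password must be changed at first login.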
#!/bin/bash
# Neo4j installation

# 1) Bind the host name "neo4j" in /etc/hosts
#    (the address is taken from the second line of ifconfig output; adjust if your output differs)
IP=`ifconfig|sed -n 2p|awk '{print $2}'|cut -d ":" -f2`
echo "$IP neo4j" >>/etc/hosts

# 2) Download and unpack Neo4j
cd /home/tools
wget -c "https://neo4j.com/artifact.php?name=neo4j-community-3.4.14-unix.tar.gz"
tar zxvf artifact.php\?name\=neo4j-community-3.4.14-unix.tar.gz -C /usr/local/
ln -s /usr/local/neo4j-community-3.4.14 /usr/local/neo4j-community

# 3) Configure environment variables
#    (NEO4J_HOME points at the symlink created above; the .sh suffix lets /etc/profile pick the file up)
cat >/etc/profile.d/neo4j.sh <<EOF
export NEO4J_HOME=/usr/local/neo4j-community
export PATH=\$PATH:\$NEO4J_HOME/bin
EOF
source /etc/profile.d/neo4j.sh

# 4) Configure resources
sed -i 's/#dbms.memory.heap.initial_size=512m/dbms.memory.heap.initial_size=2048m/g' /usr/local/neo4j-community/conf/neo4j.conf
sed -i 's/#dbms.memory.heap.max_size=512m/dbms.memory.heap.max_size=2048m/g' /usr/local/neo4j-community/conf/neo4j.conf
sed -i 's/#dbms.connectors.default_listen_address=0.0.0.0/dbms.connectors.default_listen_address=neo4j/g' /usr/local/neo4j-community/conf/neo4j.conf

# 5) Write the Neo4j init script
cat >/etc/init.d/neo4j <<EOF
#!/bin/bash
### BEGIN REDHAT INFO
# chkconfig: 2345 99 20
# description: Neo4j Graph Database server
SCRIPTNAME=\$0
NEO4J_CONF=/usr/local/neo4j-community/conf
NEO4J_HOME=/usr/local/neo4j-community
NEO_USER=root
NEO4J_ULIMIT_NOFILE=60000

PATH=/sbin:/usr/sbin:/bin:/usr/bin
NAME=neo4j
DAEMON=\${NEO4J_HOME}/bin/\${NAME}
PIDDIR=\${NEO4J_HOME}/run
PIDFILE=\${PIDDIR}/neo4j.pid
SCRIPTNAME=/etc/init.d/\${NAME}

SYSTEMCTL_SKIP_REDIRECT=1

[ -x "\$DAEMON" ] || exit 0
#[ -r \${NEO4J_CONF}/\${NAME}.conf ] && . \${NEO4J_CONF}/\${NAME}.conf
[ -n "\${NEO_USER}" ] || NEO_USER=\${NAME}

# Debian distros and SUSE
has_lsb_init()
{
  test -f "/lib/lsb/init-functions"
}

# RedHat/Centos distros
has_init()
{
  test -f "/etc/init.d/functions"
}

if has_lsb_init ; then
  . /lib/lsb/init-functions
elif has_init ; then
  . /etc/init.d/functions
else
  echo "Error: your platform is not supported by \${NAME}" >&2
  exit 1
fi

do_start()
{
  do_ulimit
  [ -d "\${PIDDIR}" ] || mkdir -p "\${PIDDIR}"
  chown "\${NEO_USER}:" "\${PIDDIR}"
  if has_lsb_init ; then
    start-stop-daemon --chuid \${NEO_USER} --start --quiet --oknodo --pidfile \${PIDFILE} --exec \${DAEMON} -- start
  else
    daemon --user="\${NEO_USER}" --pidfile="\${PIDFILE}" "\${DAEMON} start > /dev/null 2>&1 &"
  fi
}

do_stop()
{
  \${DAEMON} stop
}

do_status()
{
  if has_lsb_init ; then
    status_of_proc -p "\${PIDFILE}" "\${DAEMON}" "\${NAME}"
  else
    status -p "\${PIDFILE}" "\${NAME}"
  fi
}

do_ulimit()
{
  if [ -n "\${NEO4J_ULIMIT_NOFILE}" ]; then
    ulimit -n "\${NEO4J_ULIMIT_NOFILE}"
  fi
}

case "\$1" in
  start)
    do_start
    ;;
  stop)
    do_stop
    ;;
  status)
    do_status
    ;;
  restart|force-reload)
    do_stop && do_start
    ;;
  *)
    echo "Usage: \$SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
    exit 3
    ;;
esac
EOF

# 6) Make the init script executable
chmod +x /etc/init.d/neo4j

# 7) Start Neo4j
service neo4j start

# 8) Enable start on boot
chkconfig neo4j on
echo 'Neo4j install done'
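2. Introduction to the Neo4j graph database
Neo4j is a graph database implemented in Java. A graph database stores and manages data as a graph structure and can represent any kind of data elegantly and in a highly accessible way. Neo4j is based on the Property Graph Model.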
A property graph contains the following elements:
1) Entity
   a) Node
   b) Relationship
2) Path
3) Token
   a) Label
   b) Relationship Type
   c) Property Key
4) Property
Reference: https://www.cnblogs.com/jpfss/p/11268835.html
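The elements listed above can be mirrored in a few lines of Python. This is only an illustrative sketch, not how Neo4j stores data: the class names `Node` and `Relationship`, the `node_id` field, and the helper `path_names` are hypothetical, and the sample terms are borrowed from the diagnosis graph built later in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node entity: a set of labels plus key/value properties."""
    node_id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship entity: one type, a direction (start -> end), and properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


def path_names(relationships):
    """Return the node names visited along a chain of connected relationships."""
    names = [relationships[0].start.properties["name"]]
    for rel in relationships:
        names.append(rel.end.properties["name"])
    return names


# Labels and the property key "name" are tokens; the values are properties.
silicosis = Node(1, frozenset({"標准詞", "diagnosis"}), {"name": "矽肺"})
ild = Node(2, frozenset({"標准詞", "diagnosis"}), {"name": "間質性肺疾病"})
origin = Node(3, frozenset({"原始詞", "diagnosis"}), {"name": "硅肺"})

# A path is an alternating sequence of nodes and relationships.
path = [
    Relationship(origin, "standardized", silicosis),
    Relationship(silicosis, "belong_to", ild),
]
print(path_names(path))
```

Here 標准詞 means "standard term" and 原始詞 means "original term"; both are kept in Chinese because they are the labels used in the graph itself.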
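3. Basic Neo4j syntax
3.1 Cypher
Neo4j's query language is Cypher, a declarative graph query language.
Notation: () denotes a node, [] a relationship, -> the relationship direction, and {} properties; a colon is followed by a token such as a node label or a relationship type.
Node: (Variable:Label{Key1:Value1,Key2:Value2,...})
Relationship: [Variable:RelationshipType{Key1:Value1,Key2:Value2,...}]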
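To make the notation concrete, the following sketch assembles node and relationship patterns as plain strings. The helper names `format_properties`, `node_pattern`, and `relationship_pattern` are hypothetical and not part of any Neo4j driver; the sketch only handles string and numeric property values.

```python
def format_properties(properties):
    """Render a dict as a Cypher property map, e.g. {name: 'x', level: 3}."""
    if not properties:
        return ""
    parts = []
    for key, value in properties.items():
        if isinstance(value, str):
            # Escape backslashes and single quotes inside string literals
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            parts.append(f"{key}: '{escaped}'")
        else:
            parts.append(f"{key}: {value}")
    return " {" + ", ".join(parts) + "}"


def node_pattern(variable="", labels=(), properties=None):
    """Build (variable:Label1:Label2 {key: value})."""
    label_text = "".join(f":{label}" for label in labels)
    return f"({variable}{label_text}{format_properties(properties)})"


def relationship_pattern(variable="", rel_type="", properties=None):
    """Build -[variable:TYPE {key: value}]-> pointing left to right."""
    type_text = f":{rel_type}" if rel_type else ""
    return f"-[{variable}{type_text}{format_properties(properties)}]->"


pattern = (
    node_pattern("a", ["原始詞"], {"name": "硅肺"})
    + relationship_pattern("r", "standardized")
    + node_pattern("b", ["標准詞"])
)
print(pattern)
```

Building queries by string concatenation like this is fine for seeing how the pieces fit together, but query parameters are generally the safer choice for values in real queries.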
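3.2 Syntax
In the templates below, Label, Label1, Label2, REL_TYPE, key, and 'value' are placeholders.
• New nodes, new relationship, no properties
create ()-[:REL_TYPE]->()
(create requires the relationship to have a type.)
• New nodes, new relationship, with properties
create (:Label {key: 'value'})-[:REL_TYPE {key: 'value'}]->(:Label {key: 'value'})
• Existing nodes, new relationship, no properties
match (a:Label1 {key: 'value'}), (b:Label2 {key: 'value'}) create (a)-[:REL_TYPE]->(b)
First use match to find the two nodes, then add the relationship between them. Without match, new nodes are created instead.
When several clauses are executed together (within one statement terminated by a single semicolon), nodes created earlier in the statement are recognized by their variables when the relationship is created, so no match is needed. Otherwise they are treated as new nodes.
A new node may have the same name, labels, and properties as an existing node (like two people with the same name, sex, and date of birth); a unique internal id is generated automatically to tell them apart.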
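• merge
merge (n:Label {key: 'value'})
merge can be seen as a combination of match and create: if no matching node is found it creates one, and if one is found it reuses the existing node. To update properties on a matched node, add an on match set clause.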
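The difference between create and merge can be imitated with a small in-memory model. This is a simplified sketch under stated assumptions: `TinyGraph` is a hypothetical class, nodes carry a single label, and `merge` returns only the first match, whereas Neo4j binds every matching node.

```python
import itertools


class TinyGraph:
    """In-memory stand-in that mimics how CREATE and MERGE treat nodes."""

    def __init__(self):
        self._ids = itertools.count()
        self.nodes = {}  # internal id -> (label, properties)

    def create(self, label, **properties):
        """Like CREATE: always add a new node, even if an identical one exists."""
        node_id = next(self._ids)
        self.nodes[node_id] = (label, dict(properties))
        return node_id

    def match(self, label, **properties):
        """Like MATCH: return ids of nodes with this label and these properties."""
        return [
            node_id
            for node_id, (node_label, node_props) in self.nodes.items()
            if node_label == label
            and all(node_props.get(k) == v for k, v in properties.items())
        ]

    def merge(self, label, **properties):
        """Like MERGE: reuse a matching node if there is one, else create it."""
        found = self.match(label, **properties)
        return found[0] if found else self.create(label, **properties)


graph = TinyGraph()
first = graph.create("標准詞", name="矽肺")
second = graph.create("標准詞", name="矽肺")        # same content, new internal id
merged = graph.merge("標准詞", name="矽肺")         # reuses an existing node
fresh = graph.merge("標准詞", name="間質性肺疾病")  # nothing matched, so it is created
print(first, second, merged, fresh, len(graph.nodes))
```

Running create twice leaves two nodes that differ only by internal id, which is why merge is usually preferred when loading data that may be imported more than once.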
• Matching nodes that carry either of two labels
match (n) where any(label in labels(n) WHERE label in ['label1', 'label2']) return n
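The any(...) predicate keeps a node when at least one of its labels is in the given list. By contrast, a pattern such as (n:label1:label2) requires the node to carry both labels. A small Python analogue of the two checks, with hypothetical function names and sample nodes borrowed from the diagnosis graph in section 4:

```python
def has_any_label(node_labels, wanted):
    """Analogue of any(label in labels(n) WHERE label in wanted)."""
    return any(label in wanted for label in node_labels)


def has_all_labels(node_labels, wanted):
    """Analogue of the pattern (n:label1:label2): every wanted label must be present."""
    return all(label in node_labels for label in wanted)


nodes = {
    "矽肺": {"標准詞", "diagnosis"},
    "硅肺": {"原始詞", "diagnosis"},
    "疾病名稱": {"頂級節點", "diagnosis"},
}

either = {name for name, labels in nodes.items()
          if has_any_label(labels, {"標准詞", "原始詞"})}
both = {name for name, labels in nodes.items()
        if has_all_labels(labels, {"標准詞", "diagnosis"})}
print(either, both)
```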
4. Practical application
4.1 Diagnosis normalization knowledge graph
// Top-level nodes (label 頂級節點 = top-level node)
create (disease1:頂級節點:diagnosis{name:'疾病名稱'})
create (disease2:頂級節點:diagnosis{name:'呼吸系統疾病名稱'})
create (disease2)-[:belong_to]->(disease1)

// Standard terms (label 標准詞 = standard term)
create (standard01:標准詞:diagnosis{name:'間質性肺疾病'})
create (standard02:標准詞:diagnosis{name:'矽肺'})
create (standard1:標准詞:diagnosis{name:'矽肺[硅肺]壹期'})
create (standard2:標准詞:diagnosis{name:'矽肺[硅肺]貳期'})
create (standard3:標准詞:diagnosis{name:'矽肺[硅肺]叄期'})
create (standard01)-[:belong_to]->(disease2)
create (standard02)-[:belong_to]->(standard01)
create (standard1)-[:belong_to]->(standard02)
create (standard2)-[:belong_to]->(standard02)
create (standard3)-[:belong_to]->(standard02)

// Original terms (label 原始詞 = original term, as written in source records)
create (origin1:原始詞:diagnosis{name:'硅肺'})
create (origin2:原始詞:diagnosis{name:'硅沉着肺'})
create (origin3:原始詞:diagnosis{name:'矽肺[硅沉着病]'})
create (origin4:原始詞:diagnosis{name:'矽肺(硅沉着病)'})
create (origin5:原始詞:diagnosis{name:'矽肺(硅肺)'})
create (origin6:原始詞:diagnosis{name:'矽肺Ⅰ期'})
create (origin7:原始詞:diagnosis{name:'矽肺(硅肺)I期'})
create (origin8:原始詞:diagnosis{name:'矽肺(I期)'})
create (origin9:原始詞:diagnosis{name:'矽肺(II期)'})
create (origin10:原始詞:diagnosis{name:'矽肺(硅肺)Ⅱ期'})
create (origin11:原始詞:diagnosis{name:'矽肺(硅肺)Ⅲ期'})

// Map each original term to its standard term
create (origin1)-[:standardized]->(standard02)
create (origin2)-[:standardized]->(standard02)
create (origin3)-[:standardized]->(standard02)
create (origin4)-[:standardized]->(standard02)
create (origin5)-[:standardized]->(standard02)
create (origin6)-[:standardized]->(standard1)
create (origin7)-[:standardized]->(standard1)
create (origin8)-[:standardized]->(standard1)
create (origin9)-[:standardized]->(standard2)
create (origin10)-[:standardized]->(standard2)
create (origin11)-[:standardized]->(standard3)
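The graph above encodes two kinds of edges: standardized maps an original term to its standard term, and belong_to links a term to its parent. The following sketch reproduces a subset of those edges as Python dictionaries to show how a raw diagnosis string resolves upward; the dictionary and function names are hypothetical, and no database is involved.

```python
# A subset of the edges from the create statements above.
STANDARDIZED = {
    "硅肺": "矽肺",
    "矽肺Ⅰ期": "矽肺[硅肺]壹期",
    "矽肺(II期)": "矽肺[硅肺]貳期",
}
BELONG_TO = {
    "矽肺[硅肺]壹期": "矽肺",
    "矽肺[硅肺]貳期": "矽肺",
    "矽肺": "間質性肺疾病",
    "間質性肺疾病": "呼吸系統疾病名稱",
    "呼吸系統疾病名稱": "疾病名稱",
}


def normalize(term):
    """Map an original term to its standard term; other terms map to themselves."""
    return STANDARDIZED.get(term, term)


def ancestors(term):
    """Follow belong_to edges upward from the normalized term to the top node."""
    chain = []
    current = normalize(term)
    while current in BELONG_TO:
        current = BELONG_TO[current]
        chain.append(current)
    return chain


print(normalize("矽肺Ⅰ期"))
print(ancestors("矽肺Ⅰ期"))
```

Two differently written stage-one and stage-two diagnoses end up under the same ancestor 矽肺, which is the point of the normalization graph.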
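4.2 Graph visualization
5. Input and querying with Python
5.1 Python environment
Download and install Anaconda from its official website. Anaconda bundles conda, Python, and more than 180 scientific packages with their dependencies, and ships with the Spyder and Jupyter tools.
5.2 Reading csv/excel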
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 30 10:29:49 2020
@author:Quentin
"""
from py2neo import Graph, Node, Relationship, NodeMatcher
import pandas as pd
import os
import sys


class CreateGraph:
    def __init__(self, csv_name):
        # Resolve the CSV file relative to the current working directory
        cur_dir = os.getcwd()
        self.data_path = os.path.join(cur_dir, csv_name)
        # Depending on the py2neo version, credentials may need to be passed as auth=("neo4j", "123456")
        self.graph = Graph("http://192.168.31.240:7474", username="neo4j", password="123456")

    def read_file(self):
        # Columns used below: [0] term name, [1] label, [2] name of the parent or standard term
        all_data = pd.read_csv(self.data_path, encoding='utf-8').values
        return all_data

    def create_graph(self):
        all_data = self.read_file()
        top_node = 'undefined'
        matcher = NodeMatcher(self.graph)
        # 頂級節點 = top-level node; its name is stored on every node as the topNode property
        if all_data[0][1] == '頂級節點':
            top_node = all_data[0][0]
        # Create nodes
        for row_data in all_data:
            # Check whether the node already exists
            node_match = matcher.match(row_data[1], name=row_data[0], topNode=top_node).first()
            if node_match is None:
                node = Node(row_data[1], name=row_data[0], topNode=top_node)
                self.graph.create(node)
                print('Created node: ' + str(node).encode('utf-8').decode('unicode_escape'))
        # Create relationships
        for row_data in all_data:
            if len(str(row_data[2])) > 0 and str(row_data[2]) != 'nan':
                node1 = matcher.match(row_data[1], name=row_data[0], topNode=top_node).first()
                node2 = self.node_std_or_top(matcher, row_data[2], top_node)
                if node1 is not None and node2 is not None:
                    # 原始詞 = original term: link it to its standard term
                    if str(row_data[1]) == '原始詞':
                        relation = Relationship(node1, 'standard', node2)
                    else:
                        relation = Relationship(node1, 'belong_to', node2)
                    self.graph.create(relation)
                    print('Created relationship: ' + str(relation))

    def node_std_or_top(self, matcher, name, topNode):
        # Look for a standard term (標准詞) first, then fall back to a top-level node
        node = matcher.match('標准詞', name=name, topNode=topNode).first()
        if node is None:
            node = matcher.match('頂級節點', name=name, topNode=topNode).first()
        return node


if __name__ == "__main__":
    str_csv = sys.argv[1]
    handler = CreateGraph(str_csv)
    handler.create_graph()
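The loader reads three columns by position: term name, label, and the name of the parent (or standard) term, with the top-level node expected in the first data row. The sketch below shows one CSV layout that fits those assumptions and the nodes and relationships the loader would attempt to create from it. The header names `name,label,parent`, the sample rows, and the function `plan_graph` are hypothetical; the sketch uses only the standard library and does not touch Neo4j, and it skips the loader's check that both endpoint nodes exist.

```python
import csv
import io

# Hypothetical sample file; the first line is a header row.
SAMPLE_CSV = """name,label,parent
疾病名稱,頂級節點,
間質性肺疾病,標准詞,疾病名稱
矽肺,標准詞,間質性肺疾病
硅肺,原始詞,矽肺
"""


def plan_graph(csv_text):
    """Return (nodes, relationships) that the loader would try to create."""
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
    top_node = rows[0][0] if rows[0][1] == "頂級節點" else "undefined"
    # Each node is (label, name, topNode property)
    nodes = [(label, name, top_node) for name, label, _parent in rows]
    relationships = []
    for name, label, parent in rows:
        if parent:  # an empty parent cell means no outgoing relationship
            rel_type = "standard" if label == "原始詞" else "belong_to"
            relationships.append((name, rel_type, parent))
    return nodes, relationships


nodes, relationships = plan_graph(SAMPLE_CSV)
for rel in relationships:
    print(rel)
```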
5.3 Vocabulary lookup
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 30 10:29:49 2020
@author: Quentin
"""
from py2neo import Graph
import sys


class SelectStandard:
    def __init__(self):
        self.graph = Graph("http://192.168.31.240:7474", username="neo4j", password="123456")

    # Look up the parent term (standard term)
    def select_upper_vocab(self, orig, top_node='', label='原始詞'):
        """
        Find the parent node of the input term.

        Parameters
        ----------
        orig : input term; either an original term (原始詞) or a standard term (標准詞)
        top_node : top-level node, optional
            The default is ''.
        label : node label, 原始詞 or 標准詞, optional
            The default is '原始詞'.

        Returns
        -------
        Name of the parent node as a string, or '無' (none) if nothing is found.
        """
        # The label is interpolated into the query text; values are passed as query parameters
        if top_node == '':
            query = "match (n:%s)-[r]->(m) where n.name = $name return m.name" % label
            result = self.graph.run(query, name=orig).to_ndarray()
        else:
            query = "match (n:%s)-[r]->(m) where n.name = $name and n.topNode = $top_node return m.name" % label
            result = self.graph.run(query, name=orig, top_node=top_node).to_ndarray()
        if len(result) > 0:
            return result[0][0]
        else:
            return '無'

    # Look up sibling terms (original terms sharing the same parent)
    def select_equal_vocab(self, orig, top_node='', label='原始詞'):
        """
        Find the sibling nodes of the input term.

        Parameters
        ----------
        orig : input term; either an original term (原始詞) or a standard term (標准詞)
        top_node : top-level node, optional
            The default is ''.
        label : node label, 原始詞 or 標准詞, optional
            The default is '原始詞'.

        Returns
        -------
        List of sibling node names, or '無' (none) if nothing is found.
        """
        if top_node == '':
            query = ("match (n1:%s)-[r1]->(m1) where n1.name = $name "
                     "match (n2:%s)-[r2]->(m1) where n2.name <> $name "
                     "return n2.name") % (label, label)
            result = self.graph.run(query, name=orig).to_ndarray()
        else:
            query = ("match (n1:%s)-[r1]->(m1) where n1.name = $name and n1.topNode = $top_node "
                     "match (n2:%s)-[r2]->(m1) where n2.name <> $name and n2.topNode = $top_node "
                     "return n2.name") % (label, label)
            result = self.graph.run(query, name=orig, top_node=top_node).to_ndarray()
        if len(result) > 0:
            return [rs[0] for rs in result]
        else:
            return '無'


if __name__ == "__main__":
    # Command-line arguments: term, top-level node (optional), label (optional)
    str_vocab = sys.argv
    upper_vocab_param = ['', '', '原始詞']
    equal_vocab_param = ['', '', '原始詞']
    for i in range(0, min(3, len(str_vocab) - 1)):
        upper_vocab_param[i] = str_vocab[i + 1]
        equal_vocab_param[i] = str_vocab[i + 1]
    # Print the results
    handler = SelectStandard()
    upper_vocab = handler.select_upper_vocab(upper_vocab_param[0], upper_vocab_param[1], upper_vocab_param[2])
    print("Standard term:")
    print(upper_vocab)
    equal_vocab = handler.select_equal_vocab(equal_vocab_param[0], equal_vocab_param[1], equal_vocab_param[2])
    print("Synonyms:")
    print(equal_vocab)
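The lookup functions interpolate the node label into the query text, because Cypher does not accept a label as a query parameter. One way to keep that interpolation safe is to check the label against a fixed set before building the query. The sketch below is an assumption-laden illustration, not part of the script above: `ALLOWED_LABELS` and `build_upper_query` are hypothetical names, and the function only returns the query text and a parameter dictionary without contacting a database.

```python
# Hypothetical whitelist of the labels used in this guide's graph.
ALLOWED_LABELS = {"原始詞", "標准詞", "頂級節點"}


def build_upper_query(orig, top_node="", label="原始詞"):
    """Return (query_text, parameters) for looking up the parent of a term."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    conditions = ["n.name = $name"]
    parameters = {"name": orig}
    if top_node:
        # Restrict the lookup to one top-level vocabulary when requested
        conditions.append("n.topNode = $top_node")
        parameters["top_node"] = top_node
    where_clause = " and ".join(conditions)
    query = f"match (n:{label})-[r]->(m) where {where_clause} return m.name"
    return query, parameters


query, parameters = build_upper_query("硅肺", top_node="疾病名稱")
print(query)
print(parameters)
```

With py2neo, a pair like this could presumably be executed as graph.run(query, **parameters); treat that call shape as an assumption to verify against the installed version.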
5.4 Output