Building a Disease Knowledge Graph with Neo4j: A Hands-On Guide



1. Installing Neo4j

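On Linux, create an installation script named Neo4j_setup.sh with the contents below and run it as root. When the installation finishes, open http://localhost:7474 in a browser. The default username/password is neo4j/neo4j, and you must change the password on first login.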

#!/bin/bash

# Neo4j installation

# 1) Bind the hostname in /etc/hosts

IP=`ifconfig|sed -n 2p|awk '{print $2}'|cut -d ":" -f2`

echo "$IP neo4j" >>/etc/hosts

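# 2) Download and install Neo4j

mkdir -p /home/tools && cd /home/tools

# Quote the URL (it contains "?") and save it under a predictable file name

wget -c -O neo4j-community-3.4.14-unix.tar.gz "https://neo4j.com/artifact.php?name=neo4j-community-3.4.14-unix.tar.gz"

tar zxvf neo4j-community-3.4.14-unix.tar.gz -C /usr/local/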

ln -s /usr/local/neo4j-community-3.4.14 /usr/local/neo4j-community

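# 3) Configure environment variables

# Files in /etc/profile.d need a .sh suffix to be loaded at login;

# NEO4J_HOME must match the symlink created in step 2

cat >/etc/profile.d/neo4j.sh <<EOF

export NEO4J_HOME=/usr/local/neo4j-community

export PATH=\$PATH:\$NEO4J_HOME/bin

EOF

source /etc/profile.d/neo4j.sh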

# 4) Configure resources (heap size and listen address)

sed -i 's/#dbms.memory.heap.initial_size=512m/dbms.memory.heap.initial_size=2048m/g' /usr/local/neo4j-community/conf/neo4j.conf

sed -i 's/#dbms.memory.heap.max_size=512m/dbms.memory.heap.max_size=2048m/g' /usr/local/neo4j-community/conf/neo4j.conf

sed -i 's/#dbms.connectors.default_listen_address=0.0.0.0/dbms.connectors.default_listen_address=neo4j/g' /usr/local/neo4j-community/conf/neo4j.conf

# 5) Create the Neo4j init script

cat >/etc/init.d/neo4j <<EOF

#!/bin/bash

### BEGIN REDHAT INFO

# chkconfig: 2345 99 20

# description: Neo4j Graph Database server

SCRIPTNAME=\$0

NEO4J_CONF=/usr/local/neo4j-community/conf

NEO4J_HOME=/usr/local/neo4j-community

NEO_USER=root

NEO4J_ULIMIT_NOFILE=60000

PATH=/sbin:/usr/sbin:/bin:/usr/bin

NAME=neo4j

DAEMON=\${NEO4J_HOME}/bin/\${NAME}

PIDDIR=\${NEO4J_HOME}/run

PIDFILE=\${PIDDIR}/neo4j.pid

SCRIPTNAME=/etc/init.d/\${NAME}

SYSTEMCTL_SKIP_REDIRECT=1

[ -x "\$DAEMON" ] || exit 0

#[ -r \${NEO4J_CONF}/\${NAME}.conf ] && . \${NEO4J_CONF}/\${NAME}.conf

[ -n "\${NEO_USER}" ] || NEO_USER=\${NAME}

# Debian distros and SUSE

has_lsb_init()

{

  test -f "/lib/lsb/init-functions"

}

# RedHat/Centos distros

has_init()

{

  test -f "/etc/init.d/functions"

}

if has_lsb_init ; then

  . /lib/lsb/init-functions

elif has_init ; then

  . /etc/init.d/functions

else

  echo "Error: your platform is not supported by \${NAME}" >&2

  exit 1

fi

do_start()

{

  do_ulimit

  [ -d "\${PIDDIR}" ] || mkdir -p "\${PIDDIR}"

  chown "\${NEO_USER}:" "\${PIDDIR}"

  if has_lsb_init ; then

    start-stop-daemon --chuid \${NEO_USER} --start --quiet --oknodo --pidfile \${PIDFILE} --exec \${DAEMON} -- start

  else

    daemon --user="\${NEO_USER}" --pidfile="\${PIDFILE}" "\${DAEMON} start > /dev/null 2>&1 &"

  fi

}

do_stop()

{

  \${DAEMON} stop

}

do_status()

{

  if has_lsb_init ; then

    status_of_proc -p "\${PIDFILE}" "\${DAEMON}" "\${NAME}"

  else

    status -p "\${PIDFILE}" "\${NAME}"

  fi

}

do_ulimit()

{

  if [ -n "\${NEO4J_ULIMIT_NOFILE}" ]; then

    ulimit -n "\${NEO4J_ULIMIT_NOFILE}"

  fi

}

case "\$1" in

  start)

    do_start

    ;;

  stop)                                                         

    do_stop

    ;;

  status)

    do_status

    ;;

  restart|force-reload)

    do_stop && do_start

    ;;

  *)

    echo "Usage: \$SCRIPTNAME {start|stop|status|restart|force-reload}" >&2

    exit 3

    ;;

esac

EOF

# 6) Set permissions

chmod +x /etc/init.d/neo4j

# 7) Start Neo4j

service neo4j start

# 8) Enable start on boot

chkconfig neo4j on

echo 'Neo4j install done'
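
Step 4 of the script enables three settings in neo4j.conf by uncommenting and overwriting them with sed. As a rough illustration of what those substitutions do, the sketch below applies the same kind of change to configuration text in Python. The function name set_conf_values and the sample text are made up for this example; it is not part of Neo4j's tooling.

```python
import re


def set_conf_values(conf_text, settings):
    """Return conf_text with each key in settings uncommented and set.

    Hypothetical helper for illustration: a line such as '#key=old' or
    'key=old' becomes 'key=new'. Keys that do not appear are appended.
    """
    lines = conf_text.splitlines()
    remaining = dict(settings)
    for i, line in enumerate(lines):
        for key in list(remaining):
            # Match the key at line start, optionally commented out with '#'
            if re.match(r"^#?\s*" + re.escape(key) + r"\s*=", line):
                lines[i] = f"{key}={remaining.pop(key)}"
                break
    # Append settings that were not present in the text at all
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


sample = (
    "#dbms.memory.heap.initial_size=512m\n"
    "#dbms.memory.heap.max_size=512m\n"
    "#dbms.connectors.default_listen_address=0.0.0.0\n"
)
updated = set_conf_values(sample, {
    "dbms.memory.heap.initial_size": "2048m",
    "dbms.memory.heap.max_size": "2048m",
    "dbms.connectors.default_listen_address": "neo4j",
})
print(updated)
```

Unlike the sed commands, this version also handles a setting that is already uncommented or missing from the file.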

 

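2. Introduction to the Neo4j Graph Database

Neo4j is a graph database implemented in Java. A graph database stores and manages data as a graph and can represent any kind of data elegantly in a highly accessible way. Neo4j is built on the Property Graph Model.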

A property graph contains the following elements:

1. Entity

  a) Node

  b) Relationship

2. Edge/Path (Path)

3. Token

  a) Label

  b) Relationship Type

  c) Property Key

4. Property

Reference: https://www.cnblogs.com/jpfss/p/11268835.html
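
To make the element list concrete, here is a minimal in-memory sketch of a property graph in Python. The class names Node and Relationship mirror the terms above, but the classes themselves are invented for this illustration and say nothing about how Neo4j stores data internally.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()  # stands in for the unique ids a database assigns


@dataclass
class Node:
    labels: set                                      # tokens: labels
    properties: dict = field(default_factory=dict)   # property key -> value
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class Relationship:
    start: Node
    type: str                                        # token: relationship type
    end: Node
    properties: dict = field(default_factory=dict)


# Two nodes joined by one typed, directed relationship
silicosis = Node({"diagnosis"}, {"name": "矽肺"})
lung_disease = Node({"diagnosis"}, {"name": "間質性肺疾病"})
rel = Relationship(silicosis, "belong_to", lung_disease)

# A path is an alternating sequence of nodes and relationships
path = [silicosis, rel, lung_disease]
print(rel.start.properties["name"], f"-[:{rel.type}]->", rel.end.properties["name"])
```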

 

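3. Basic Neo4j Syntax

3.1 Cypher

Neo4j's query language is Cypher, a declarative graph query language.

Notation: () denotes a node, [] denotes a relationship, -> gives the relationship direction, {} holds properties, and : is followed by a token such as a node label or a relationship type.

Node: (Variable:Label {Key1: Value1, Key2: Value2, ...})

Relationship: [Variable:RelationshipType {Key1: Value1, Key2: Value2, ...}]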
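
As a small illustration of this notation, the Python sketch below assembles node and relationship patterns as text. node_pattern and rel_pattern are hypothetical helpers written for this example, not part of any Neo4j driver, and the naive quoting is for display only; real queries should pass values as parameters.

```python
def _props(properties):
    """Render a dict as a Cypher-style property map, e.g. {name: 'x'}."""
    if not properties:
        return ""
    body = ", ".join(f"{key}: {value!r}" for key, value in properties.items())
    return " {" + body + "}"


def node_pattern(variable="", labels=(), properties=None):
    """(Variable:Label1:Label2 {Key: Value})"""
    label_part = "".join(f":{label}" for label in labels)
    return f"({variable}{label_part}{_props(properties)})"


def rel_pattern(variable="", rel_type="", properties=None):
    """[Variable:RelationshipType {Key: Value}]"""
    type_part = f":{rel_type}" if rel_type else ""
    return f"[{variable}{type_part}{_props(properties)}]"


a = node_pattern("a", ["標准詞", "diagnosis"], {"name": "矽肺"})
b = node_pattern("b", ["diagnosis"])
r = rel_pattern(rel_type="belong_to")
print(f"{a}-{r}->{b}")
```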

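3.2 Syntax

-  New nodes, new relationship, no properties

create ()-[:RelationshipType]->()

-  New nodes, new relationship, with properties

create (:Label {Key: Value})-[:RelationshipType {Key: Value}]->(:Label {Key: Value})

-  Existing nodes, new relationship, no properties

match (a:Label1), (b:Label2) create (a)-[:RelationshipType]->(b)

First use match to find the two nodes and bind them to variables, then create the relationship between those variables (without match, create makes new nodes).

Likewise, within a single statement (up to one semicolon), a variable bound by an earlier create clause is recognized when the relationship is created, so no match is needed; in a separate statement the same pattern is treated as new nodes.

A new node may have the same name, labels, and properties as an existing node (like two people with the same name, sex, and date of birth); Neo4j automatically generates a unique id to tell them apart.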

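-  merge

merge (:Label {Key: Value})

merge can be seen as a combination of match and create: if no node matches the pattern, it creates one; if a match exists, it reuses that node instead of creating a duplicate. merge does not modify a matched node by itself; add an on match set clause to update it.

-  Matching nodes that carry either of two labels

match (n) where any(label in labels(n) WHERE label in ['label1', 'label2']) return n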
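
The difference between create and merge can be imitated with a toy in-memory store. The sketch below is a simplification written for this article: TinyGraph is an invented class, it handles single nodes only, and its merge returns just the first match, whereas Cypher's merge works on whole patterns and binds every match.

```python
from itertools import count


class TinyGraph:
    """Toy in-memory store that imitates create/merge on single nodes."""

    def __init__(self):
        self._ids = count()
        self.nodes = {}  # id -> (label, properties)

    def create(self, label, **properties):
        """Always add a new node, even if an identical one exists."""
        node_id = next(self._ids)
        self.nodes[node_id] = (label, dict(properties))
        return node_id

    def match(self, label, **properties):
        """Return ids of nodes with this label and all given properties."""
        return [
            node_id
            for node_id, (node_label, props) in self.nodes.items()
            if node_label == label
            and all(props.get(k) == v for k, v in properties.items())
        ]

    def merge(self, label, **properties):
        """Reuse the first matching node, or create one if none matches."""
        found = self.match(label, **properties)
        return found[0] if found else self.create(label, **properties)


g = TinyGraph()
first = g.create("diagnosis", name="矽肺")
second = g.create("diagnosis", name="矽肺")   # duplicate with a new id
merged = g.merge("diagnosis", name="矽肺")    # reuses an existing node
fresh = g.merge("diagnosis", name="硅肺")     # nothing matches, so created
print(first, second, merged, fresh, len(g.nodes))
```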

 

4. Practical Application

4.1 Diagnosis Normalization Knowledge Graph

create (disease1:頂級節點:diagnosis{name:'疾病名稱'})

create (disease2:頂級節點:diagnosis{name:'呼吸系統疾病名稱'})

create (disease2)-[:belong_to]->(disease1)

 

create (standard01:標准詞:diagnosis{name:'間質性肺疾病'})

create (standard02:標准詞:diagnosis{name:'矽肺'})

create (standard1:標准詞:diagnosis{name:'矽肺[硅肺]壹期'})

create (standard2:標准詞:diagnosis{name:'矽肺[硅肺]貳期'})

create (standard3:標准詞:diagnosis{name:'矽肺[硅肺]叄期'})

create (standard01)-[:belong_to]->(disease2)

create (standard02)-[:belong_to]->(standard01)

create (standard1)-[:belong_to]->(standard02)

create (standard2)-[:belong_to]->(standard02)

create (standard3)-[:belong_to]->(standard02)

 

create (origin1:原始詞:diagnosis{name:'硅肺'})

create (origin2:原始詞:diagnosis{name:'硅沉着肺'})

create (origin3:原始詞:diagnosis{name:'矽肺[硅沉着病]'})

create (origin4:原始詞:diagnosis{name:'矽肺(硅沉着病)'})

create (origin5:原始詞:diagnosis{name:'矽肺(硅肺)'})

create (origin6:原始詞:diagnosis{name:'矽肺Ⅰ期'})

create (origin7:原始詞:diagnosis{name:'矽肺(硅肺)I期'})

create (origin8:原始詞:diagnosis{name:'矽肺(I期)'})

create (origin9:原始詞:diagnosis{name:'矽肺(II期)'})

create (origin10:原始詞:diagnosis{name:'矽肺(硅肺)Ⅱ期'})

create (origin11:原始詞:diagnosis{name:'矽肺(硅肺)Ⅲ期'})

create (origin1)-[:standardized]->(standard02)

create (origin2)-[:standardized]->(standard02)

create (origin3)-[:standardized]->(standard02)

create (origin4)-[:standardized]->(standard02)

create (origin5)-[:standardized]->(standard02)

create (origin6)-[:standardized]->(standard1)

create (origin7)-[:standardized]->(standard1)

create (origin8)-[:standardized]->(standard1)

create (origin9)-[:standardized]->(standard2)

create (origin10)-[:standardized]->(standard2)

create (origin11)-[:standardized]->(standard3)
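
The structure these statements build can be read as two kinds of edges: standardized edges map an original term (原始詞) to its standard term (標准詞), and belong_to edges climb from a standard term toward the top-level node (頂級節點). As a rough sketch of how that structure supports normalization, the following Python models a few of the edges above with plain dictionaries. The dictionaries and the normalize function are illustrative stand-ins, not the graph itself.

```python
# original term -> standard term (the 'standardized' relationships)
STANDARDIZED = {
    "硅肺": "矽肺",
    "矽肺(硅肺)": "矽肺",
    "矽肺Ⅰ期": "矽肺[硅肺]壹期",
    "矽肺(II期)": "矽肺[硅肺]貳期",
}

# child -> parent (the 'belong_to' relationships)
BELONG_TO = {
    "矽肺[硅肺]壹期": "矽肺",
    "矽肺[硅肺]貳期": "矽肺",
    "矽肺": "間質性肺疾病",
    "間質性肺疾病": "呼吸系統疾病名稱",
    "呼吸系統疾病名稱": "疾病名稱",
}


def normalize(term):
    """Return (standard term, ancestors up to the top-level node)."""
    # A term that is already standard maps to itself
    standard = STANDARDIZED.get(term, term)
    ancestors = []
    current = standard
    while current in BELONG_TO:
        current = BELONG_TO[current]
        ancestors.append(current)
    return standard, ancestors


standard, ancestors = normalize("矽肺Ⅰ期")
print(standard)
print(" -> ".join(ancestors))
```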

4.2 Graph Visualization

 

 

 

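5. Data Import and Querying with Python

5.1 Python Environment

Download and install Anaconda from its official website. Anaconda bundles conda, Python, and more than 180 scientific packages with their dependencies, and ships with the Spyder and Jupyter development tools. The scripts below additionally require the py2neo package (pip install py2neo).

5.2 Reading CSV/Excel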

# -*- coding: utf-8 -*-

"""

Created on Wed Sep 30 10:29:49 2020

@author:Quentin

"""

 

from py2neo import Graph, Node, Relationship,NodeMatcher

import pandas as pd

import re

import os

import sys

 

class CreateGraph:

    def __init__(self,csv_name):

        # Directory containing this script

        cur_dir = os.path.dirname(os.path.abspath(__file__))

        self.data_path = os.path.join(cur_dir, csv_name)

        self.graph = Graph("http://192.168.31.240:7474", username="neo4j", password="123456")

                 

    def read_file(self):

        all_data = pd.read_csv(self.data_path, encoding='utf-8').loc[:, :].values

        return all_data    

           

    def create_graph(self):

        all_data = self.read_file()

        top_node = 'undefined'

        matcher = NodeMatcher(self.graph)

        if (all_data[0][1] == '頂級節點'):

            top_node = all_data[0][0]

           

        # Create nodes

        for row_data in all_data:

            # Check whether the node already exists

            node_match =  matcher.match(row_data[1],name = row_data[0],topNode = top_node).first()

            if node_match is None:

                node = Node(row_data[1],name = row_data[0],topNode = top_node)

                self.graph.create(node)

                print('Created node: ' + str(node).encode('utf-8').decode('unicode_escape'))

                

        # Create relationships

        for row_data in all_data:

            if len(str(row_data[2])) > 0 and str(row_data[2]) != 'nan':

                node1 = matcher.match(row_data[1],name = row_data[0],topNode = top_node).first()

                node2 = self.node_std_or_top(matcher,row_data[2],top_node)

                if  node1 is not None and node2 is not None:                   

                    if str(row_data[1]) == '原始詞':

                        relation = Relationship(node1,'standardized',node2)

                    else:

                        relation = Relationship(node1,'belong_to',node2)

                    self.graph.create(relation)

                    print('Created relationship: ' + str(relation))

 

    def node_std_or_top(self,matcher,name,topNode):

        node = matcher.match('標准詞',name = name,topNode = topNode).first()

        if  node is  None :  

            node = matcher.match('頂級節點',name = name,topNode = topNode).first()

        return node

  

       

if __name__ == "__main__":

    str_csv = sys.argv[1]

    handler = CreateGraph(str_csv)

    handler.create_graph()
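
The script reads three columns by position: the term name (row_data[0]), its label (row_data[1]), and the name of its parent or standard term (row_data[2], empty for the top-level node). The header names used below (name, label, parent) are assumptions, since the original CSV file is not shown. This sketch parses such a file with the standard library and derives the nodes and relationships implied by its rows, using the relationship names from section 4.1 and without touching the database.

```python
import csv
import io

# Hypothetical sample input; the header names are assumed
SAMPLE_CSV = """name,label,parent
呼吸系統疾病名稱,頂級節點,
矽肺,標准詞,呼吸系統疾病名稱
硅肺,原始詞,矽肺
"""


def plan_graph(csv_text):
    """Return (nodes, relationships) implied by the CSV rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header
    nodes = []
    relationships = []
    for name, label, parent in rows:
        if (label, name) not in nodes:      # skip duplicate rows
            nodes.append((label, name))
        if parent:                          # the top-level row has no parent
            rel_type = "standardized" if label == "原始詞" else "belong_to"
            relationships.append((name, rel_type, parent))
    return nodes, relationships


nodes, relationships = plan_graph(SAMPLE_CSV)
for rel in relationships:
    print(rel)
```

A dry run like this can be a convenient way to check a vocabulary file before loading it into Neo4j.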

5.3 Vocabulary Lookup

# -*- coding: utf-8 -*-

"""

Created on Wed Sep 30 10:29:49 2020

@author: Quentin

"""

 

from py2neo import Graph, Node, Relationship,NodeMatcher

import pandas as pd

import re

import os

import sys

 

class SelectStandard:

    def __init__(self):

        self.graph = Graph("http://192.168.31.240:7474", username="neo4j", password="123456")

         

    # Look up the parent term (standard term)

    def select_upper_vocab(self,orig,top_node='',label='原始詞'):

        """   

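        Find the parent node of the input term.

        Parameters

        ----------

        orig : input term

             May be an original term or a standard term.

        top_node : top-level node, optional

             The default is ''.

        label : node label, '原始詞' (original term) or '標准詞' (standard term), optional

             The default is '原始詞'.

            

        Returns

        -------

        The name of the parent node as a string, or '' if none is found.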

        """

        if top_node == '':

            query = "match(n:%s)-[r]->(m) where n.name = '%s' return m.name" %(label,orig)

            result = self.graph.run(query).to_ndarray()

        else:

            query = "match(n:%s)-[r]->(m) where n.name = '%s' and n.topNode = '%s' return m.name" %(label,orig,top_node)

            result = self.graph.run(query).to_ndarray()           

        if len(result) > 0:

            return result[0][0]

        else:

            return ''

   

    # Look up sibling terms (original terms)

    def select_equal_vocab(self,orig,top_node='',label='原始詞'):

        """

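        Find the sibling nodes of the input term, i.e. other nodes that share its parent.

 

        Parameters

        ----------

        orig : input term

            May be an original term or a standard term.

        top_node : top-level node, optional

             The default is ''.

        label : node label, '原始詞' (original term) or '標准詞' (standard term), optional

             The default is '原始詞'.

 

        Returns

        -------

        The names of the sibling nodes, as a list.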

 

        """

        if top_node == '':

            query = "match(n1:%s)-[r1]->(m1) where n1.name ='%s' match(n2:%s)-[r2]->(m1) where n2.name <> '%s'  return n2.name" %(label,orig,label,orig)

            result = self.graph.run(query).to_ndarray()

        else:

            query = "match(n1:%s)-[r1]->(m1) where n1.name ='%s' and n1.topNode = '%s' match(n2:%s)-[r2]->(m1) where n2.name <> '%s' and n2.topNode = '%s' return n2.name" %(label,orig,top_node,label,orig,top_node)

            result = self.graph.run(query).to_ndarray()

        if len(result) > 0:

            rs_arr = []

            for rs in result:

                rs_arr.append(rs[0])

            return rs_arr

        else:

            return []

   

if __name__ == "__main__":

    # Input arguments: term, optional top-level node, optional label

    str_vocab = sys.argv

    upper_vocab_param = ['','','原始詞']

    equal_vocab_param = ['','','原始詞']

    for i in range(0,min(3,len(str_vocab)-1)):

        upper_vocab_param[i] = str_vocab[i+1]

        equal_vocab_param[i] = str_vocab[i+1]

    # Print the results

    handler = SelectStandard()

    upper_vocab = handler.select_upper_vocab(upper_vocab_param[0],upper_vocab_param[1],upper_vocab_param[2])

    print("Standard term:")

    print(upper_vocab)

    equal_vocab = handler.select_equal_vocab(equal_vocab_param[0],equal_vocab_param[1],equal_vocab_param[2])

    print("Synonyms:")

    print(equal_vocab)
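
The queries above are assembled with % string formatting, so a term containing a single quote would break the Cypher text. One possible alternative, sketched below, is to pass the values as Cypher parameters ($name, $top_node). Labels cannot be parameterized in Cypher, so the sketch checks them against a fixed set instead. build_upper_query and ALLOWED_LABELS are names invented for this example; the commented-out call at the end assumes py2neo's Graph.run, which accepts a parameter dictionary as its second argument.

```python
ALLOWED_LABELS = {"原始詞", "標准詞", "頂級節點"}


def build_upper_query(orig, top_node="", label="原始詞"):
    """Return (cypher, parameters) for the parent-term lookup."""
    if label not in ALLOWED_LABELS:
        # Labels are spliced into the text, so only accept known ones
        raise ValueError(f"unexpected label: {label!r}")
    cypher = f"MATCH (n:{label})-[r]->(m) WHERE n.name = $name"
    parameters = {"name": orig}
    if top_node:
        cypher += " AND n.topNode = $top_node"
        parameters["top_node"] = top_node
    return cypher + " RETURN m.name", parameters


cypher, parameters = build_upper_query("矽肺(硅肺)", top_node="疾病名稱")
print(cypher)
print(parameters)
# With a py2neo Graph instance this could then be run as:
#     graph.run(cypher, parameters).to_ndarray()
```

The sibling lookup could be converted the same way, with both occurrences of the term bound to a single $name parameter.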

   

 

5.4 Output

  

