Using Elasticsearch with IK word segmentation in Django-DRF


I. Install dependencies

django-haystack==2.8.1
drf-haystack==1.8.6
Django==2.0.5
djangorestframework==3.8.2
elasticsearch==6.4.0

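II. Install the Java SDK

First download the installation package from the official site:

Download link: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

The Elasticsearch version I installed is 2.4.1, running on JDK 1.8. Elasticsearch versions later than 2.x have compatibility problems with haystack.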

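Installation steps:

# First:
cd /usr/local/
mkdir javajdk
# Upload the downloaded file to /usr/local/javajdk, then enter that directory:
cd /usr/local/javajdk
# Extract the archive in this directory
tar -xzvf jdk-8u231-linux-i586.tar.gz
mv jdk1.8.0_231 java
# Configure the environment variables:
vim /etc/profile

# Append these lines to the end of the file:

export JAVA_HOME=/usr/local/javajdk/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

# Then reload the profile:

source /etc/profile

Run java -version; if it prints the JDK version information, the installation succeeded.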
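The check above can also be scripted. The following is a minimal sketch, assuming `java` is on the PATH after `source /etc/profile`; the function names are hypothetical helpers, not part of any library.

```python
import re
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    if match is None:
        raise ValueError("no version string found in java output")
    return match.group(1)


def installed_java_version():
    """Run `java -version` and return the reported version string."""
    # `java -version` writes its report to stderr, not stdout.
    proc = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_java_version(proc.stderr)
```

Calling `installed_java_version()` on the server should return a string starting with `1.8` if the JDK from this section is the one picked up by the shell.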

III. Install Elasticsearch

Download: https://www.elastic.co/cn/downloads/past-releases#elasticsearch

Note that Elasticsearch reports an error and refuses to start when run as the root user!

So first create a new user:

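# Create the group first; useradd -g fails if the group does not exist
groupadd elastic
useradd -g elastic elastic
# Create the user directory under /home
mkdir -p /home/elastic
# Upload the downloaded package, then extract it into the elastic directory
tar -xzvf elasticsearch-2.4.1.tar.gz -C /home/elastic/
# Give the elastic user ownership of this directory
chown -R elastic:elastic /home/elastic
# Switch user
su - elastic
# Edit the configuration file:
vim /home/elastic/elasticsearch-2.4.1/config/elasticsearch.yml
# Settings to change
path.data: /home/elastic/elasticsearch-2.4.1/data
path.logs: /home/elastic/elasticsearch-2.4.1/logs
network.host: 172.xxx.xxx.xxx
http.cors.allow-origin: "*"
# If the data and logs directories do not exist, create them at those paths

# Start ES from the bin directory of elasticsearch:
./elasticsearch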

If opening http://{your IP address}:9200/ in a browser shows the node information as JSON, the installation succeeded!
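The same check can be done without a browser. This is a minimal sketch using only the standard library; the base URL passed in is a placeholder for your own `network.host`, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_node_info(base_url, timeout=5):
    """GET the root endpoint of Elasticsearch and decode the JSON reply."""
    with urlopen(base_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_node(info):
    """Build a one-line summary from the root endpoint response."""
    return "node={} cluster={} version={}".format(
        info["name"], info["cluster_name"], info["version"]["number"]
    )
```

For example, `summarize_node(fetch_node_info("http://172.xxx.xxx.xxx:9200/"))` should report version 2.4.1 for the installation described here.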

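Fixes for common startup errors:

1. The maximum number of file descriptors is too low; at least 65536 is required. Edit /etc/security/limits.conf
Command: vim /etc/security/limits.conf
Change the content to: * hard nofile 65536

2. The number of VMAs (virtual memory areas) a process may own is too low; at least 262144 is required. Edit /etc/sysctl.conf
Command: vim /etc/sysctl.conf
Add: vm.max_map_count=262144 (then run sysctl -p to apply it)

3. The maximum number of threads is too low; at least 4096 is required. Edit /etc/security/limits.conf
Command: vim /etc/security/limits.conf
Add: * hard nproc 65536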
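Before starting Elasticsearch again it can help to confirm that the three limits actually took effect. The sketch below is one way to do that on Linux, assuming it is run as the elastic user after logging in again; the thresholds are the ones quoted above and the function names are hypothetical.

```python
import resource

REQUIRED_NOFILE = 65536
REQUIRED_MAX_MAP_COUNT = 262144
REQUIRED_NPROC = 4096


def find_limit_problems(nofile_hard, max_map_count, nproc_hard):
    """Return a list of messages for every limit below its required minimum."""
    problems = []
    if nofile_hard < REQUIRED_NOFILE:
        problems.append("nofile hard limit {} < {}".format(nofile_hard, REQUIRED_NOFILE))
    if max_map_count < REQUIRED_MAX_MAP_COUNT:
        problems.append("vm.max_map_count {} < {}".format(max_map_count, REQUIRED_MAX_MAP_COUNT))
    if nproc_hard < REQUIRED_NPROC:
        problems.append("nproc hard limit {} < {}".format(nproc_hard, REQUIRED_NPROC))
    return problems


def read_current_limits():
    """Read the current hard limits and vm.max_map_count (Linux only)."""
    def normalize(value):
        # RLIM_INFINITY means "unlimited"; treat it as larger than any minimum.
        return float("inf") if value == resource.RLIM_INFINITY else value

    nofile_hard = normalize(resource.getrlimit(resource.RLIMIT_NOFILE)[1])
    nproc_hard = normalize(resource.getrlimit(resource.RLIMIT_NPROC)[1])
    with open("/proc/sys/vm/max_map_count") as handle:
        max_map_count = int(handle.read().strip())
    return nofile_hard, max_map_count, nproc_hard
```

Running `find_limit_problems(*read_current_limits())` returns an empty list when all three settings are sufficient.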

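IV. Install the IK analysis plugin

Download the package:

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v5.0.0

The IK version must match the ES version:

ES 2.4.1 corresponds to IK version 1.10.1

Extract the package into plugins/ik under the ES installation directory.

If there is no ik directory under plugins, create it manually.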
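After restarting Elasticsearch, one way to confirm that the plugin loaded is to send a sample sentence to the `_analyze` endpoint with the `ik_max_word` analyzer. The sketch below uses only the standard library and assumes the JSON request body form of `_analyze` is accepted by your ES 2.4.1 node; the helper names and base URL are placeholders.

```python
import json
from urllib.request import Request, urlopen


def build_analyze_request(base_url, text, analyzer="ik_max_word"):
    """Build a POST request for the _analyze endpoint."""
    payload = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/_analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response):
    """Pull the token strings out of an _analyze response."""
    return [item["token"] for item in response.get("tokens", [])]


def analyze(base_url, text, analyzer="ik_max_word"):
    """Send the text to Elasticsearch and return the produced tokens."""
    request = build_analyze_request(base_url, text, analyzer)
    with urlopen(request, timeout=5) as resp:
        return extract_tokens(json.loads(resp.read().decode("utf-8")))
```

If the plugin is missing, Elasticsearch answers with an error about an unknown analyzer, and `urlopen` raises an `HTTPError` instead of returning tokens.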

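V. Install the visualization plugin

1. Install as a plugin (recommended)
# In the Elasticsearch directory
elasticsearch/bin/plugin install mobz/elasticsearch-head

2. Install from a download
Download the ZIP package from https://github.com/mobz/elasticsearch-head.

In the elasticsearch directory, create plugins/head/_site and copy everything from the extracted elasticsearch-head-master directory into plugins/head/_site/.

Note that from version 5.x onward the installation method is different!

3. Restart elasticsearch and open:
 http://{your IP address}:9200/_plugin/head/
 The default http port is 9200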

 

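VI. Set up a cluster

Elasticsearch cluster setup:

  1. Prepare three elasticsearch servers

    Create an elasticsearch-cluster folder and place three copies of the elasticsearch service inside it

  2. Edit the configuration of each server

    Edit elasticsearch-cluster/node*/config/elasticsearch.yml

If you copy the nodes from an existing single-node installation, the data directory of that installation (/elasticsearch/data) must not contain any data, otherwise cluster setup fails. Delete the data directory before copying.

# Configuration for node 1
# Cluster name; keep it unique (all nodes of one cluster share the same name)
cluster.name: my-elasticsearch
# Node name; must differ between nodes
node.name: node-1
# Must be the IP address of this machine
network.host: 172.xxx.xxx.xxx
# HTTP port; must differ between nodes on the same machine
http.port: 9200
# Port for communication between cluster nodes; must differ between nodes on the same machine
transport.tcp.port: 9300
# Addresses of the machines used for cluster discovery
discovery.zen.ping.unicast.hosts: ["172.xxx.xxx.xxx:9300", "172.xxx.xxx.xxx:9301", "172.xxx.xxx.xxx:9303"]

Then start each service.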
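Once all three nodes are running, the `_cluster/health` endpoint reports whether they found each other. The following is a small sketch of such a check; the expected node count of three follows the setup above, while the function names and the decision to accept a yellow status are assumptions.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url, timeout=5):
    """GET /_cluster/health from any node and decode the JSON reply."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def cluster_is_ready(health, expected_nodes=3):
    """True when every expected node has joined and the status is not red."""
    joined = health.get("number_of_nodes") == expected_nodes
    usable = health.get("status") in ("green", "yellow")
    return joined and usable
```

If `number_of_nodes` stays at 1, the usual suspects are a mismatched `cluster.name`, leftover data copied from the single-node installation, or unreachable transport ports.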

VII. Configure Django

First create a search_indexes.py file in the app; django-haystack requires this file name.

django-haystack documentation: https://django-haystack.readthedocs.io/en/master/tutorial.html#configuration

drf-haystack documentation: https://drf-haystack.readthedocs.io/en/latest/07_faceting.html#serializing-faceted-results

Create the model class:

from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=128)
    files = models.FileField(upload_to='%Y/%m/')
    content = models.TextField(default='')

Create the index class:

from haystack import indexes
from app001.models import Article

class DocsIndex(indexes.SearchIndex, indexes.Indexable):
    # 1. Fields that make up the index
    text = indexes.CharField(document=True, use_template=True)
    files = indexes.CharField(model_attr='files')
    content = indexes.CharField(model_attr='content')

    # 2. The model class this index is built for
    def get_model(self):
        return Article

    # 3. The queryset that supplies the data to index
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
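Because the `text` field is declared with `use_template=True`, haystack renders it from a data template that it looks up under the templates directory as `search/indexes/<app_label>/<model_name>_<field_name>.txt`. The sketch below only computes that path and suggests template content; which model fields go into the template (`title` and `content` here) is an assumption, and the helper name is hypothetical.

```python
def index_template_path(app_label, model_name, field_name="text"):
    """Return the template path haystack looks up for a use_template field."""
    # haystack lower-cases the model name when building the path.
    return "search/indexes/{}/{}_{}.txt".format(
        app_label, model_name.lower(), field_name
    )


# Suggested template content (an assumption): the fields whose text
# should be searchable through the main document field.
ARTICLE_TEXT_TEMPLATE = "{{ object.title }}\n{{ object.content }}\n"

# Path for the Article model of the app001 app used in this article.
ARTICLE_TEMPLATE_PATH = index_template_path("app001", "Article")
```

After creating the template file, `python manage.py rebuild_index` builds the index for the first time.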

Views:

import os
import datetime
import uuid

from elasticsearch import Elasticsearch
from rest_framework.views import APIView
from rest_framework import serializers
from rest_framework.response import Response
from django.conf import settings
from drf_haystack.serializers import HaystackSerializer
from drf_haystack.viewsets import HaystackViewSet

from .models import Article
from .search_indexes import DocsIndex


class DemoSerializer(serializers.ModelSerializer):
    """
    Serializer for the Article model
    """
    class Meta:
        model = Article
        fields = ('id', 'title', 'files')


class LocationSerializer(HaystackSerializer):
    object = DemoSerializer(read_only=True)  # read-only; not used for deserialization

    class Meta:
        # The `index_classes` attribute is a list of which search indexes
        # we want to include in the search.
        index_classes = [DocsIndex]

        # The `fields` contains all the fields we want to include.
        # NOTE: Make sure you don't confuse these with model attributes. These
        # fields belong to the search index!
        fields = [
             "text","files","id","title"
        ]
 
class LocationSearchView(HaystackViewSet):

    # `index_models` is an optional list of which models you would like to include
    # in the search result. You might have several models indexed, and this provides
    # a way to filter out those of no interest for this particular view.
    # (Translates to `SearchQuerySet().models(*index_models)` behind the scenes.
    index_models = [Article]

    serializer_class = LocationSerializer

Settings:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'silk',
    'debug_toolbar',
    'haystack',
    'app001',
]


# Search engine configuration:
# haystack settings
HAYSTACK_CONNECTIONS = {
    'default': {
        # 'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'ENGINE': 'app001.elasticsearch_ik_backend.IKSearchEngine',  # IK needs a custom engine, described later in this article
        'URL': 'http://172.16.xxx.xxx:9200/',  # elasticsearch service address
        'INDEX_NAME': 'haystack',  # index name
    },
}
# Keep the index up to date in real time
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
# Maximum number of search results per page
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 50

 

Override the engine to use the IK analyzer:

Create elasticsearch_ik_backend.py in the app:

from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend
from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine


class IKSearchBackend(ElasticsearchSearchBackend):
    DEFAULT_ANALYZER = "ik_max_word"  # set the default ES analyzer to ik_max_word

    def __init__(self, connection_alias, **connection_options):
        super().__init__(connection_alias, **connection_options)

    def build_schema(self, fields):
        content_field_name, mapping = super(IKSearchBackend, self).build_schema(fields)
        for field_name, field_class in fields.items():
            field_mapping = mapping[field_class.index_fieldname]
            if field_mapping["type"] == "string" and field_class.indexed:
                if not hasattr(
                    field_class, "facet_for"
                ) and not field_class.field_type in ("ngram", "edge_ngram"):
                    field_mapping["analyzer"] = getattr(
                        field_class, "analyzer", self.DEFAULT_ANALYZER
                    )
            mapping.update({field_class.index_fieldname: field_mapping})
        return content_field_name, mapping


class IKSearchEngine(ElasticsearchSearchEngine):
    backend = IKSearchBackend
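To make the effect of the `build_schema` override easier to see, here is a simplified stand-in that works on plain dictionaries instead of haystack field objects. It is not haystack's code: the sample mapping, the `snowball` starting value, and the `skip` parameter are assumptions used only for illustration.

```python
def apply_default_analyzer(mapping, analyzer="ik_max_word", skip=()):
    """Return a copy of the mapping with the analyzer set on string fields."""
    updated = {}
    for name, field_mapping in mapping.items():
        field_mapping = dict(field_mapping)  # do not mutate the caller's dict
        # Only analyzed string fields get the analyzer; names listed in
        # `skip` stand in for facet and ngram fields, which are left alone.
        if field_mapping.get("type") == "string" and name not in skip:
            field_mapping["analyzer"] = analyzer
        updated[name] = field_mapping
    return updated


# Hypothetical mapping resembling what the stock backend might produce.
sample_mapping = {
    "text": {"type": "string", "analyzer": "snowball"},
    "content": {"type": "string", "analyzer": "snowball"},
    "id": {"type": "long"},
}
ik_mapping = apply_default_analyzer(sample_mapping)
```

Here `ik_mapping["text"]["analyzer"]` and `ik_mapping["content"]["analyzer"]` become `ik_max_word`, while the non-string `id` field is unchanged. An existing index has to be rebuilt before a changed analyzer applies to it.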

 

Query support through drf-haystack in Django is still not very complete.

I do not find drf-haystack queries very convenient, so here I query with the Python elasticsearch client instead:

class EsSearch(APIView):
    def get(self, request):
        es = Elasticsearch(["http://xxx.xxx.xxx.xxx:9200"])
        query = request.GET.get("query")
        # The search body can be customized to whatever query you want to use:
        # https://www.elastic.co/guide/cn/elasticsearch/guide/current/match-query.html
        body = {
            "query": {
                "multi_match": {
                    "query": "%s" % query,
                    "fields": ["text", "content"],
                }
            },
            "highlight": {"fields": {"content": {}, "text": {}}},
        }
        result = es.search(index="haystack", doc_type="modelresult", body=body)
        return Response(result)
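The view above returns the raw Elasticsearch response. If only the matched documents and their highlighted fragments are needed, the response can be reduced before it is returned. This is a sketch with hypothetical helper names; it assumes the usual `hits.hits` layout of an Elasticsearch search response.

```python
def build_search_body(keyword, fields=("text", "content")):
    """Build the multi_match query with highlighting used by the view."""
    return {
        "query": {"multi_match": {"query": keyword, "fields": list(fields)}},
        "highlight": {"fields": {name: {} for name in fields}},
    }


def summarize_hits(result):
    """Keep only id, score and highlighted fragments of each hit."""
    rows = []
    for hit in result.get("hits", {}).get("hits", []):
        rows.append(
            {
                "id": hit.get("_id"),
                "score": hit.get("_score"),
                # Maps field name to a list of fragments wrapped in <em> tags.
                "highlight": hit.get("highlight", {}),
            }
        )
    return rows
```

Inside `EsSearch.get`, `Response(summarize_hits(result))` would then send a compact list instead of the full response.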

 

URL configuration:

"""tool_bar URL Configuration

The `urlpatterns` list routes URLs to views. For more information please see:
    https://docs.djangoproject.com/en/2.0/topics/http/urls/
Examples:
Function views
    1. Add an import:  from my_app import views
    2. Add a URL to urlpatterns:  path('', views.home, name='home')
Class-based views
    1. Add an import:  from other_app.views import Home
    2. Add a URL to urlpatterns:  path('', Home.as_view(), name='home')
Including another URLconf
    1. Import the include() function: from django.urls import include, path
    2. Add a URL to urlpatterns:  path('blog/', include('blog.urls'))
"""
from django.contrib import admin
from django.urls import path
from django.conf import settings
from django.conf.urls import url, include
from django.conf.urls.static import static

from rest_framework import routers

from app001.views import Index, Uploads
from app001.views import LocationSearchView, EsSearch
from app002.views import BlogView

# drf-haystack search
router = routers.DefaultRouter()
router.register("search", LocationSearchView, base_name="location-search")

urlpatterns = [
    # custom Elasticsearch query view
    url(r'elastic_search/', EsSearch.as_view()),
    url(r"api/", include(router.urls)),
]
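With these routes in place, both search endpoints can be called with a query string. The sketch below builds the two URLs; the development server address in the usage note is a placeholder, and filtering the drf-haystack endpoint by an index field name such as `text` is my understanding of its default filter rather than something shown in this article.

```python
from urllib.parse import urlencode


def custom_search_url(base, keyword):
    """URL for the EsSearch view, which reads the `query` parameter."""
    return "{}/elastic_search/?{}".format(
        base.rstrip("/"), urlencode({"query": keyword})
    )


def haystack_search_url(base, keyword, field="text"):
    """URL for the drf-haystack viewset registered as `search` under api/."""
    return "{}/api/search/?{}".format(
        base.rstrip("/"), urlencode({field: keyword})
    )
```

For example, `custom_search_url("http://127.0.0.1:8000", "django")` gives `http://127.0.0.1:8000/elastic_search/?query=django`; non-ASCII keywords are percent-encoded by `urlencode`.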

 


