Basic Usage of Elasticsearch

Reposted article. Published 2018-12-14 17:55. Tags: data analysis / django / elasticsearch / database

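Elasticsearch is a Java wrapper around Lucene, so Java must be installed first.

It is designed for full-text search and is easy to run as a distributed system. Its core principle is the inverted index. A naive keyword search scans documents one by one looking for the keyword. Elasticsearch instead tokenizes the content to be indexed at write time, producing a set of terms for each document. A search then looks the keyword up directly in the term index and returns the documents attached to the matching term, which is what makes searching efficient.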
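The idea can be illustrated with a toy inverted index in plain Python. This is only a sketch of the principle, not how Lucene stores its index; the tokenize, build_inverted_index, and search helpers and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a stand-in for a real analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, keyword):
    """Look the keyword up in the index instead of scanning every document."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical sample documents.
docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene uses an inverted index",
    3: "Django can query Elasticsearch",
}
index = build_inverted_index(docs)
print(search(index, "lucene"))         # [1, 2]
print(search(index, "elasticsearch"))  # [1, 3]
```

The tokenizing cost is paid once at write time, so each search is a dictionary lookup rather than a scan of every document.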

Installation

Because Java is cross-platform, Elasticsearch is too. On Linux, just download, extract, and run:

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.tar.gz

tar -xvf elasticsearch-6.3.2.tar.gz

cd elasticsearch-6.3.2/bin

./elasticsearch

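On Windows, download the Windows package from https://www.elastic.co/downloads/elasticsearch, extract it, and run bin\elasticsearch.bat (double-clicking elasticsearch.bat also works).

Open localhost:9200/ in a browser to see the Elasticsearch version, node, and other information.

For Elasticsearch 6.3.1, the Python client packages are elasticsearch and elasticsearch_dsl; install both with pip install.

Head is a visualization plugin for Elasticsearch. Installation guide: https://www.cnblogs.com/hts-technology/p/8477258.html

Node.js and grunt must be installed first.

For Node.js, download the Windows .msi installer from https://nodejs.org/en/download/. If it is already installed, node -v shows the version.

In the Node.js command prompt, run npm install -g grunt-cli, then check the version with grunt --version.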

Edit Elasticsearch's config/elasticsearch.yml file.

Append the following at the end:

http.cors.enabled: true

http.cors.allow-origin: "*"

node.master: true

node.data: true

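Then uncomment the network.host: 192.168.0.1 line and change it to network.host: 0.0.0.0. Also uncomment cluster.name, node.name, and http.port.

Download the head package from https://github.com/mobz/elasticsearch-head, either by cloning it or by downloading the zip.

Then edit Gruntfile.js in the elasticsearch-head-master directory and set hostname: '*':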

connect: {
  server: {
    options: {
      hostname: '*',
      port: 9100,
      base: '.',
      keepalive: true
    }
  }
}

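Then, in the Node.js command prompt, change to the elasticsearch-head-master directory, install head with npm install, and start it with grunt server. Open localhost:9100 in a browser to see the interface.

For languages other than Java, Elasticsearch offers a REST API for creating, reading, updating, and deleting data and for changing settings.

Elasticsearch's data model has three concepts: index, type, and id. Roughly, an index corresponds to a database, a type to a table, and an id to a row's primary key. In newer versions, an index may contain only one type.

Common operations through the REST API:

Note: curl on Windows needs some extra care. The URL must be in double quotes; when sending data, declare the content type with -H "Content-Type: application/json"; every key and value in the data must be wrapped in three double quotes; the request method must be preceded by -X; and keys and values apparently cannot contain spaces.

List all indices on the current node:

curl -X GET "http://localhost:9200/_cat/indices?v"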

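Show the field types of all indices:

curl 'localhost:9200/_mapping?pretty=true'

Create the weather index:

curl -X PUT "localhost:9200/weather"

Delete the weather index:

curl -X DELETE "localhost:9200/weather"

Add a document to an index:

curl -X PUT "localhost:9200/my_index/my_type/my_id" -H "Content-Type: application/json" -d '{"te":"test","ta":"data"}'

A PUT request against a document that already exists updates that document.

POST also adds a document; if no id is given, a random id is generated:

curl -X POST "localhost:9200/my_index/my_type" -H "Content-Type: application/json" -d '{"te":"test","ta":"data"}'

Fetch a specific document:

curl -X GET "localhost:9200/my_index/my_type/1?pretty"

pretty returns the data in a readable format. The found field indicates whether the lookup succeeded, and the _source field holds the original document.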
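One way to sidestep the shell quoting issues is to issue the same REST calls from Python. The following is a minimal sketch using only the standard library; BASE_URL, build_request, and send are names chosen for this example, and a node on localhost:9200 is assumed as in the curl commands.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the curl examples


def build_request(method, path, body=None):
    """Create (but do not send) a request for the Elasticsearch REST API."""
    data = None
    headers = {}
    if body is not None:
        # JSON bodies must be sent with an explicit content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a prepared request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


put_doc = build_request("PUT", "/my_index/my_type/my_id", {"te": "test", "ta": "data"})
get_doc = build_request("GET", "/my_index/my_type/my_id")
print(put_doc.get_method(), put_doc.full_url)
```

With a node running, send(put_doc) would store the document and send(get_doc) would return it, including the found and _source fields.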

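Search

curl "localhost:9200/my_index/my_type/_search?q=name:somekey"

This searches for documents whose name contains somekey. It is roughly equivalent to:

curl -XGET 'http://localhost:9200/my_index/my_type/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"match":{"name":"somekey"}}}'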
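The two forms can be put side by side in Python. This sketch only builds the URL path and the request body; uri_search_path and match_search_body are hypothetical helper names. Note that the q= parameter is parsed with the query string syntax, so the two forms match the same documents for simple keywords but are not identical in general.

```python
import json
from urllib.parse import urlencode


def uri_search_path(index, doc_type, field, keyword):
    """Build the path for a URI search such as ?q=name:somekey (URL-encoded)."""
    query_string = urlencode({"q": "{}:{}".format(field, keyword)})
    return "/{}/{}/_search?{}".format(index, doc_type, query_string)


def match_search_body(field, keyword):
    """Build the request body for the same search expressed as a match query."""
    return {"query": {"match": {field: keyword}}}


print(uri_search_path("my_index", "my_type", "name", "somekey"))
print(json.dumps(match_search_body("name", "somekey")))
```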

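Changing the field types of the products index

Method 1: turn the field into a multi-field. String fields are analyzed by default, so aggregating and sorting on them is very inefficient and is disabled by default. A parameter can enable it, but the recommended approach is a multi-field: one analyzed type used for searching, and one non-analyzed type (such as keyword) used for sorting and aggregation. Elasticsearch 6.x has no multi_field type; the fields parameter on the field itself plays that role.

curl -XPUT "localhost:9200/my_index/my_type/_mapping" -H "Content-Type: application/json" -d '{"my_type":{"properties":{"created":{"type":"text","fields":{"date":{"type":"date"}}}}}}'

This gives the created field two types: search on created, and sort on the created.date sub-field.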
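As a sketch of how a multi-field is defined and then used, the snippet below builds a mapping and a search body as plain Python dicts. The category field, the "shoes" keyword, and the text_with_subfield helper are assumptions made for illustration; they follow the text-plus-keyword pattern described above rather than the created example.

```python
import json


def text_with_subfield(subfield_name, subfield_type):
    """Return a text field mapping that carries one additional sub-field."""
    return {"type": "text", "fields": {subfield_name: {"type": subfield_type}}}


# Hypothetical mapping: "category" is analyzed for search, "category.keyword" is not.
mapping = {"properties": {"category": text_with_subfield("keyword", "keyword")}}

# Search on the analyzed field, aggregate and sort on the sub-field (field.subfield).
search_body = {
    "query": {"match": {"category": "shoes"}},
    "aggs": {"categories": {"terms": {"field": "category.keyword"}}},
    "sort": [{"category.keyword": {"order": "asc"}}],
}

print(json.dumps(mapping, indent=2))
```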

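Method 2: use reindex. This copies the data of one index (all of it or a subset) into a second index, stored according to the second index's mapping. If the second index already exists, the original data is stored using its mapping; if it does not exist, it is created automatically and receives a copy of the data, but reindex does not copy the source index's settings or mapping. Delete the original index after copying.

In products, the category field originally has type text: {"mappings":{"doc":{"properties":{"category":{"type":"text"},...}}}}

Create the index with the adjusted field types:

curl -XPUT "localhost:9200/products_adjust" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "doc": {
      "properties": {
        "category": {
          "type": "text",
          "fields": {
            "keyword": {"type": "keyword", "ignore_above": 256}
          }
        },
        "img_url": {"type": "keyword"},
        "name": {"type": "text", "analyzer": "ik_smart"}
      }
    }
  }
}'

Then use reindex to copy the data into products_adjust:

curl -XPOST "localhost:9200/_reindex" -H "Content-Type: application/json" -d '{"source":{"index":"products"},"dest":{"index":"products_adjust"}}'

Finally, delete the original index.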
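The order of the three calls matters, so it may help to see them written out as data. This sketch only describes the requests and does not send them; reindex_plan is a hypothetical helper, and the mapping is shortened to the category field.

```python
import json


def reindex_plan(source_index, dest_index, dest_mappings):
    """Return the REST calls for the migration, in order, as (method, path, body)."""
    return [
        # 1. Create the destination index with the adjusted mapping.
        ("PUT", "/" + dest_index, {"mappings": dest_mappings}),
        # 2. Copy every document from the source into the destination.
        ("POST", "/_reindex", {"source": {"index": source_index}, "dest": {"index": dest_index}}),
        # 3. Remove the original index once the copy has been checked.
        ("DELETE", "/" + source_index, None),
    ]


adjusted = {"doc": {"properties": {"category": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}}}}
for method, path, body in reindex_plan("products", "products_adjust", adjusted):
    print(method, path, json.dumps(body) if body is not None else "")
```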

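Method 3: use an index alias from the start, similar to a database view. When the data structure needs to change later, just point the alias at a different index.

Create the index alias:

curl -XPOST "localhost:9200/_aliases" -H "Content-Type: application/json" -d '{"actions":[{"add":{"alias":"my_index","index":"my_index_v1"}}]}'

Repoint the alias to the new index:

curl -XPOST "localhost:9200/_aliases" -H "Content-Type: application/json" -d '{"actions":[{"remove":{"alias":"my_index","index":"my_index_v1"}},{"add":{"alias":"my_index","index":"my_index_v2"}}]}'

Delete the old index: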

curl -XDELETE localhost:9200/my_index_v1
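The body of the alias switch can be generated rather than typed by hand, which avoids JSON slips such as a stray comma. A minimal sketch, with alias_swap_actions as a hypothetical helper name:

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the _aliases body that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"alias": alias, "index": old_index}},
            {"add": {"alias": alias, "index": new_index}},
        ]
    }


body = alias_swap_actions("my_index", "my_index_v1", "my_index_v2")
print(json.dumps(body))
```

Because the remove and add actions travel in a single _aliases request, they are applied together, so clients querying my_index never see a moment where the alias points at nothing.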

Using Elasticsearch in Django

Usage examples for elasticsearch_dsl: https://github.com/elastic/elasticsearch-dsl-py/tree/master/examples

Create the Elasticsearch index model file es_docs.py:

from elasticsearch_dsl import Document, Date, Long, Keyword, Float, Text, connections

class ESProduct(Document):
    # Analyzed with ik_smart for search; name.keyword holds the exact value.
    name = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    description = Text()
    price = Float()
    # The exact-value sub-field is addressed as category.cate.
    category = Text(fields={'cate': Keyword()})
    tags = Keyword(multi=True)

    class Index:
        name = 'products'
        settings = {
            "number_of_shards": 2,
        }

if __name__ == '__main__':
    connections.create_connection(hosts=['localhost'])
    ESProduct.init()

ESProduct.init() creates the index in Elasticsearch.

Next, create the data import command: add app\management\commands\index_all_data.py under the app directory to load the data in the database into Elasticsearch:

import elasticsearch_dsl
from django.core.management import BaseCommand
from main.models import Product
from main.es_docs import ESProduct

class Command(BaseCommand):
    help = "Index all data to Elasticsearch"

    def handle(self, *args, **options):
        elasticsearch_dsl.connections.create_connection()
        for product in Product.objects.all():
            esp = ESProduct(meta={'id': product.pk}, name=product.name, description=product.description, price=product.price, category=product.category.name)
            for tag in product.tags.all():
                esp.tags.append(tag.name)
            esp.save()

Running python manage.py index_all_data from the project root then writes the data in the database to Elasticsearch.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, connections

To create a connection, you can use:

client = Elasticsearch()
s = Search(using=client, index="decorates")

Or:

connections.create_connection(hosts=['localhost'])
s = Search(index="decorates")

Querying Elasticsearch in views.py:

import random 
from django.urls import reverse 
from django.shortcuts import render 
from django.views.generic import View 
from elasticsearch_dsl import Search ,connections ,Q
from main.forms import SearchForm
import logging
logger=logging.getLogger("django.main")
class HomeView(View):
	def get(self,request):
		form=SearchForm(request.GET)
		logger.debug("form: %s",form )
		ctx={
		"form":form 
		}
		if form.is_valid():
			connections.create_connection(hosts=["localhost"])
			name_query=form.cleaned_data["name"]
			if name_query:
				s=Search(index="products").query("match",name=name_query)
			else:
				s=Search(index="products")
			min_price=form.cleaned_data.get("min_price")
			max_price=form.cleaned_data.get("max_price")
			if min_price is not None or max_price is not None:
				price_q={'range':{"price":{}}}
				if min_price is not None:
					price_q['range']['price']["gte"]=min_price 
				if max_price is not None:
					price_q['range']['price']["lte"]=max_price 
				s=s.query(Q(price_q))
				# The Q syntax mirrors the JSON of the native Elasticsearch query DSL.
				# A() builds aggregations: a=A('terms',field='category.keyword') is equivalent to {'terms':{'field':'category.keyword'}}; attach it with s.aggs.bucket('category_terms',a)
				# A metric or a nested aggregation can be chained on a: a.metric('clicks_per_category','sum',field='clicks').bucket('tags_per_category','terms',field='tags')
				# which is equivalent to {'aggs':{'categories':{'terms':{'field':'category.keyword'},'aggs':{
				# 'clicks_per_category':{'sum':{'field':'clicks'}},
				# 'tags_per_category':{'terms':{'field':'tags'}}
				# }}}}
			# Add the grouping (aggregation) field. field is the field to group by; "categories" is the alias of the aggregation, used later to fetch its results.
			s.aggs.bucket("categories","terms",field="category.keyword")
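			# The first argument is the (user-chosen) name of the aggregation, the second is the aggregation type, and the third is the field it operates on.
			# terms groups by value and counts the documents in each group; other aggregation types include avg (the average of a numeric field).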
			if request.GET.get("category"):
				s=s.query("match",category=request.GET["category"])
				#s=s.filter('terms',category=)
			result=s.execute()
			ctx["products"]=result
			# Aggregation results are separate from the documents: the result set is in hits, aggregations are in aggregations, and the aggregated data is read from buckets.
			category_aggregations=list()
			for bucket in result.aggregations.categories.buckets:
				category_name=bucket.key
				doc_count=bucket.doc_count
				category_url_params=request.GET.copy()
				category_url_params["category"]=category_name 
				category_url="{}?{}".format(reverse("main_home"),category_url_params.urlencode())

				category_aggregations.append({"name":category_name,"doc_count":doc_count,"url":category_url})
			ctx["category_aggs"]=category_aggregations

		if "category" in request.GET:
			remove_category_search_params=request.GET.copy()
			del remove_category_search_params["category"]
			remove_category_url="{}?{}".format(reverse("main_home"),remove_category_search_params.urlencode())
			ctx["remove_category_url"]=remove_category_url
		return render(request,"main_home.html",ctx)

The query inside Q(price_q) has this structure: price_q={'range':{"price":{"gte":35,"lte":70}}}

s=s.query(Q(price_q)) searches for records with 35<=price<=70.
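The way the view assembles the range clause can be isolated into a small function. This is a sketch that returns the plain dict passed to Q(); price_range_query is a hypothetical name, and the is not None checks matter because a minimum price of 0 is a valid bound.

```python
def price_range_query(min_price=None, max_price=None):
    """Build a range clause on "price", or return None when no bound is given."""
    bounds = {}
    if min_price is not None:
        bounds["gte"] = min_price
    if max_price is not None:
        bounds["lte"] = max_price
    if not bounds:
        return None
    return {"range": {"price": bounds}}


print(price_range_query(35, 70))
print(price_range_query(max_price=70))
```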

s=s.query(Q({"match":{"site":"taobao"}})) (1)

s=s.query(Q({"match":{"goods_class":"clothes"}})) (2)

The conditions in (1) and (2), site contains taobao and goods_class contains clothes, are combined with AND: a result must satisfy both (1) and (2).
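In the JSON query DSL, an AND of several clauses is written as a bool query with a must list, which is, as far as I understand, what chained .query() calls produce. The sketch below builds that body by hand; combine_with_and is a hypothetical helper.

```python
import json


def combine_with_and(*queries):
    """Combine query clauses so that a document must satisfy all of them."""
    return {"bool": {"must": list(queries)}}


site_q = {"match": {"site": "taobao"}}
class_q = {"match": {"goods_class": "clothes"}}
combined = combine_with_and(site_q, class_q)
print(json.dumps({"query": combined}))
```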

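Aggregation

site_agg=A({"terms":{"field":"site"}})

s.aggs.bucket('sites',site_agg)

The first argument is the alias of the aggregation, used to fetch its results. This statement groups by the "site" field and names the aggregated data 'sites'.

Reading the aggregation results:

for bucket in result.aggregations.sites.buckets:
    site_name=bucket.key
    doc_count=bucket.doc_count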
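To make the shape of the data concrete, here is a sketch that works on plain dicts instead of the elasticsearch_dsl objects. The sample_response is made up for illustration and shows only the part of a search response that carries terms buckets; terms_agg_body and bucket_counts are hypothetical helpers.

```python
def terms_agg_body(alias, field):
    """Build a search body that groups documents by the values of one field."""
    return {"aggs": {alias: {"terms": {"field": field}}}}


def bucket_counts(response, alias):
    """Turn the buckets of a terms aggregation into a {key: doc_count} dict."""
    buckets = response["aggregations"][alias]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Made-up response fragment, for illustration only.
sample_response = {
    "aggregations": {
        "sites": {"buckets": [{"key": "taobao", "doc_count": 12}, {"key": "jd", "doc_count": 7}]}
    }
}

print(terms_agg_body("sites", "site"))
print(bucket_counts(sample_response, "sites"))
```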

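Elasticsearch has two kinds of search clauses: query and filter.

query: besides finding the matching results, it scores how well each result matches and returns the results sorted by that score.

filter: only selects the results that match, without scoring them.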
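Both kinds can appear in one request: inside a bool query, clauses under must are scored and clauses under filter only include or exclude documents. A sketch of such a body as a plain dict follows; the field names and values are assumptions for illustration, and scored_and_filtered is a hypothetical helper.

```python
import json


def scored_and_filtered(match_field, text, filter_field, value):
    """Score documents on a match clause while filtering on an exact term."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {match_field: text}}],    # contributes to the score
                "filter": [{"term": {filter_field: value}}],  # yes/no only, no scoring
            }
        }
    }


body = scored_and_filtered("name", "somekey", "category.keyword", "clothes")
print(json.dumps(body, indent=2))
```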

To read the results returned by a query:

result=s.execute()

iterate over them:

for hit in result:
    print(hit.name,hit.price,hit.category)

result.hits.total gives the number of results.

https://elasticsearch-dsl.readthedocs.io/en/latest/

Pagination: setting from and size

s=s[10:24]

This is equivalent to from=10, size=14.
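The arithmetic behind the slice is simple: the start becomes from and the length of the slice becomes size. A small sketch, with slice_to_from_size as a hypothetical helper:

```python
def slice_to_from_size(start, stop):
    """Translate a Python-style slice [start:stop] into from/size parameters."""
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    return {"from": start, "size": stop - start}


print(slice_to_from_size(10, 24))  # {'from': 10, 'size': 14}
```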

Sorting

s=s.sort('name','-price') sorts the results by name ascending, then by price descending.

s = Search().sort( 'category', '-title', {"price" : {"order" : "asc", "mode" : "avg"}} )

This sorts by category, then title (in descending order), then price in ascending order using the avg mode.
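The "-" prefix is shorthand for a descending order clause in the JSON sort list. The sketch below imitates that translation on plain values; it is an approximation written for illustration, not the library's own code, and to_sort_clause is a hypothetical name.

```python
def to_sort_clause(*keys):
    """Expand '-field' shorthand into explicit descending sort entries."""
    clause = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            clause.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and explicit dicts are passed through unchanged.
            clause.append(key)
    return clause


print(to_sort_clause("category", "-title", {"price": {"order": "asc", "mode": "avg"}}))
```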
