Working with Elasticsearch in Python

This article is a repost. 2019-04-23 09:52

$ brew services start elasticsearch
$ brew services stop elasticsearch

Or

elasticsearch // start
control + c // stop

Restart:

$ brew services restart elasticsearch

Once it is running, open this address in a browser:

http://localhost:9200

Working with Elasticsearch in Python

Installing the Elasticsearch module

pip install elasticsearch

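Ignore

An API call is considered successful (and returns a response) if Elasticsearch replies with a 2XX status. Otherwise an instance of TransportError, or of a more specific subclass, is raised. The other exceptions and error states are listed under Exceptions in the client documentation. If you do not want an exception to be raised, you can always pass the ignore parameter with a single status code, or a list of codes, that should be ignored: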

from elasticsearch import Elasticsearch
es = Elasticsearch()

# ignore the 400 caused by IndexAlreadyExistsException when creating an index
es.indices.create(index='test-index', ignore=400)

# ignore 404 and 400
es.indices.delete(index='test-index', ignore=[400, 404])
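
The rule described above (2XX succeeds, an ignored status is swallowed, anything else raises) can be sketched in plain Python. This is a simplified model for illustration, not the client's actual implementation; ApiError and check_status are hypothetical names that stand in for TransportError and the client's internal status handling.

```python
class ApiError(Exception):
    """Hypothetical stand-in for the client's TransportError family."""

    def __init__(self, status):
        super().__init__(f"request failed with HTTP {status}")
        self.status = status


def check_status(status, ignore=()):
    """Return True for 2XX, False for an ignored status, raise otherwise."""
    # ignore may be a single status code or a list of codes
    if isinstance(ignore, int):
        ignore = (ignore,)
    if 200 <= status < 300:
        return True
    if status in ignore:
        return False
    raise ApiError(status)


created = check_status(200)                       # normal success
already_exists = check_status(400, ignore=400)    # error swallowed
missing = check_status(404, ignore=[400, 404])    # error swallowed
```

Calling check_status(404) without an ignore argument raises ApiError, which mirrors what happens when a document or index is missing and no status is ignored.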

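Timeout

A global timeout can be set when constructing the client (see the timeout parameter of Connection), or per request by passing request_timeout (a float, in seconds) as part of any API call. This value is passed on to the perform_request method of the connection class: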

# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)

Response filtering

The filter_path parameter is used to reduce the response returned by Elasticsearch. For example, to return only _id and _type:

result=es.search(index='my_index1',doc_type='test_type1',body={'query':{'match_all':{}}},filter_path=['hits.hits._type','hits.hits._id'])
# result=es.search(index='my_index1',doc_type='test_type1',body={'query':{'match_all':{}}},)
print(result)

{'hits': {'hits': [{'_type': 'test_type1', '_id': '2'}, {'_type': 'test_type1', '_id': '1'}]}}

It also supports the * wildcard to match any field or any part of a field's name:

es.search(index='test-index', filter_path=['hits.hits._*'])

Or

result=es.search(index='my_index1',doc_type='test_type1',body={'query':{'match_all':{}}},filter_path=['hits.hits._i*'])
# keeps the fields whose names start with _i

{'hits': {'hits': [{'_index': 'my_index1', '_id': '2'}, {'_index': 'my_index1', '_id': '1'}]}}
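
To make the path and wildcard behaviour concrete, here is a rough client-side model of filter_path. It is only a sketch: the real filtering happens on the Elasticsearch server, filter_response is a hypothetical helper, and fnmatchcase is used as an approximation of the * wildcard.

```python
from fnmatch import fnmatchcase


def _filter(node, paths):
    """Keep only the parts of node selected by the remaining path segments."""
    if isinstance(node, list):
        items = [_filter(item, paths) for item in node]
        return [item for item in items if item is not None]
    if not isinstance(node, dict):
        return None
    out = {}
    for key, value in node.items():
        # paths whose first segment matches this key, minus that segment
        rest = [p[1:] for p in paths if fnmatchcase(key, p[0])]
        if not rest:
            continue
        if any(len(r) == 0 for r in rest):
            out[key] = value  # a path ends here: keep the whole value
            continue
        sub = _filter(value, rest)
        if sub:
            out[key] = sub
    return out or None


def filter_response(response, filter_path):
    """Apply a list of dotted paths such as 'hits.hits._id' to a response dict."""
    return _filter(response, [p.split(".") for p in filter_path]) or {}


response = {
    "took": 3,
    "hits": {"total": 2, "hits": [
        {"_index": "my_index1", "_type": "test_type1", "_id": "2", "_score": 1.0},
        {"_index": "my_index1", "_type": "test_type1", "_id": "1", "_score": 1.0},
    ]},
}
only_type_and_id = filter_response(response, ["hits.hits._type", "hits.hits._id"])
starts_with_i = filter_response(response, ["hits.hits._i*"])
```

With the sample response, the two calls reproduce the shape of the two outputs shown above: the first keeps _type and _id, the second keeps _index and _id.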

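Note

Some API calls also accept a timeout parameter that is passed to the Elasticsearch server. This timeout is internal and does not guarantee that the request will finish within the specified time.

Adding data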

from elasticsearch import Elasticsearch

# host defaults to localhost and port to 9200, but both can be set explicitly
es = Elasticsearch()

# Add or update a document. The index and doc_type names are up to you, id can be set as needed, and body holds the content. If id is omitted, an id is generated automatically
es.index(index="my_index",doc_type="test_type",id=1,body={"name":"python","addr":"深圳"})

For example:

{'_index': 'my_index', '_type': 'test_type', '_id': '9K3sSGoBL92egitcUv3j', '_score': 1.0, '_source': {'name': 'c', 'addr': '广州', 'tittle': '13 c学习', 'age': 13}}

Because no id was set, an id like this one was generated.

# Or use create; ignore=409 suppresses the document-already-exists error
es.create(index="my_index",doc_type="test_type",id=1,ignore=409,body={"name":"python","addr":"深圳"})

Deleting data

delete: deletes the document with the given index, type and id

es.delete(index='indexName', doc_type='typeName', id='idValue') # raises an error if the document to delete does not exist

elasticsearch.exceptions.NotFoundError: NotFoundError(404, 

'{"_index":"my_index","_type":"test_type","_id":"13","_version":3,"result":"not_found","_shards":
{"total":2,"successful":1,"failed":0},"_seq_no":6,"_primary_term":1}')

Delete by query

body = {
   'query':{
       'range':{
           'age':{
               'gte':10,
               'lte':10
           }
       }
   }
}
#
es.delete_by_query(index='my_index',body=body)

Querying data

from elasticsearch import Elasticsearch

es = Elasticsearch()

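# Get all documents in the my_index index; result is a dict
result = es.search(index="my_index")

# Print every document
for item in result["hits"]["hits"]:
    print(item["_source"])

# Or fetch a single document: get the document whose id is 1
# (a get response has no "hits" key, so read _source directly)
doc = es.get(index="my_index",doc_type="test_type",id=1)
print(doc["_source"])

Or

# with an explicit match_all body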
body = {
    "query":{
        "match_all":{}
    }
}
result=es.search(index="my_index",body=body)  # note: do not pass doc_type here, i.e. avoid es.search(index="my_index",doc_type="test_type",body=body)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])
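
The loop over result["hits"]["hits"] recurs throughout this article, so it may be convenient to wrap it in a small helper. The sketch below assumes the response shape shown in the outputs of this article; iter_sources is a hypothetical name, and the sample dict is a hand-written stand-in for a real search response.

```python
def iter_sources(result):
    """Return the _source of every hit in a search response dict."""
    # .get() keeps this safe when filter_path has stripped parts of the response
    hits = result.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# Hand-written sample in the shape of a search response
sample = {"hits": {"total": 2, "hits": [
    {"_id": "1", "_source": {"name": "python", "addr": "深圳"}},
    {"_id": "2", "_source": {"name": "java", "addr": "上海"}},
]}}

sources = iter_sources(sample)
```

With a live client this would be called as iter_sources(es.search(index="my_index", body=body)).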

constant_score

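Usually, when looking up an exact value, we do not want the query to be scored. We only want documents to be included or excluded, so we use a constant_score query to run the term query in non-scoring mode and give every hit a uniform score of 1. (Because nothing is scored and each hit simply gets a score of 1, the query is efficient.)

The final combination is a constant_score query that wraps a term query: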

body={
    'query':{
        'constant_score': # constant_score turns the term query into a filter
            {'filter':
                 {'term': # the term query itself
                      {'age':11}
                  }
             }
    }
}

result=es.search(index='my_index1',doc_type='test_type1',body=body)

{'_index': 'my_index1', '_type': 'test_type1', '_id': '1', '_score': 1.0, '_source': 
{'name': 'wang', 'age': 11, 'hoby': ['music', 'game', 'car'], 'other': {'job': 'python', 'phone_mun': '18700000000'}}}

A query placed inside the filter clause skips scoring and relevance calculation, so every result comes back with the default score of 1.
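
When several exact-value lookups are needed, the nested dict can be produced by a small builder. This is a minimal sketch; constant_score_term is a hypothetical helper name, and it only builds the request body without talking to Elasticsearch.

```python
def constant_score_term(field, value):
    """Build a constant_score body that wraps a single term filter."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "term": {field: value}
                }
            }
        }
    }


body = constant_score_term("age", 11)
```

The resulting body is the same dict as the one written out by hand above and can be passed to es.search in the same way.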

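term and terms queries

A term query looks up the exact term in the inverted index. It knows nothing about analyzers, so it suits exact values such as keyword, numeric and date fields.

term: finds documents whose field contains the given term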

body = {
    "query":{
        "term":{
            "name":"python"
        }
    }
}
result=es.search(index="my_index",body=body)
# print(a)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])

Query result

{'_index': 'my_index', '_type': 'test_type', '_id': '1', '_score': 0.2876821, '_source': {'name': 'python', 'addr': '深圳'}}
{'name': 'python', 'addr': '深圳'}

terms: finds documents whose field contains any of several terms

body = {
    "query":{
        "terms":{
            "name":["python","java"]
        }
    }
}
result=es.search(index="my_index",body=body)
# print(a)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])

Result

{'_index': 'my_index', '_type': 'test_type', '_id': '2', '_score': 1.0, '_source': {'name': 'java', 'addr': '上海'}}
{'name': 'java', 'addr': '上海'}
{'_index': 'my_index', '_type': 'test_type', '_id': '1', '_score': 1.0, '_source': {'name': 'python', 'addr': '深圳'}}
{'name': 'python', 'addr': '深圳'}

match query

A match query is analyzer-aware: it analyzes the query text first and then searches for the resulting tokens

body = {
    "query":{
        "match":{
            "name":"python java" # term is an exact lookup, while match is a full-text query: match analyzes "python java" into two tokens, whereas term treats it as a single term
        }
    }
}
result=es.search(index="my_index",body=body)
# print(a)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])
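
The difference between term and match can be illustrated with a toy inverted index. This is a simplified model and not how Elasticsearch is implemented: the analyzer here only lowercases and splits on whitespace, and the third sample document is made up for the example.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each token to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """Look the value up as-is, without analyzing it."""
    return set(index.get(value, set()))


def match_query(index, text):
    """Analyze the query text, then return documents matching any token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {"1": "python", "2": "java", "3": "Learning Python"}
index = build_index(docs)

term_hits = term_query(index, "python java")    # looked up as one term
match_hits = match_query(index, "python java")  # split into two tokens
```

In this model the term lookup finds nothing, because no single token equals "python java", while the match lookup finds all three documents. term_query(index, "Python") is also empty, since the indexed tokens were lowercased but the term value was not.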

match_all: matches all documents

GET /customer/doc/_search/
{
  "query": {
    "match_all": {}
  }
}

multi_match: searches several fields at once

body = {
    "query":{
        "multi_match":{
            "query":"python",
            "fields":["name","tittle"] # a document matches if any one of these fields contains python
        }
    }
}
result=es.search(index="my_index",body=body)
# print(a)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])

Sorting

Use sort to order the results (as in SQL): desc is descending, asc is ascending

ids

body2={
    'query':{
        'ids':{
            'type':'test_type',
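            'values':['1','2'] # returns the documents whose id is 1 or 2; if there is no document with id 2, only document 1 is returned. Unquoted values such as 'values':[1,2,3] also work
                    # 'values':[1,1] returns the document with id 1 only once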

        }
    }
}
result=es.search(index="my_index",body=body2)
# print(a)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])

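Compound bool query

Combining filters

The previous two examples each used a single filter. In practice we will often need to filter on several values or fields. For example, how would we express the following SQL in Elasticsearch?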

SELECT product
FROM   products
WHERE  (price = 20 OR productID = "XHDK-A-1293-#fJ3")
  AND  (price != 30)

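In this situation we need the bool filter. It is a compound filter that accepts several other filters as arguments and combines them into various boolean (logical) combinations.

Bool filter

A bool filter is made up of three parts: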

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

Translating the SQL above into an Elasticsearch query:

b4={
    'query':{
        'bool':{
            'must_not':[{'term':{'price':30
            }}],
            'should':[{'term':{'price':20}},{'term':{'productID':'XHDK-A-1293-#fJ3'}},]

        }
    }
}

result=es.search(index='my_store',doc_type='products',body=b4)
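
As a rough model of which documents the bool query above selects, the must, should and must_not clauses can be evaluated in plain Python. This is a sketch under stated assumptions: term and bool_matches are hypothetical helpers, analysis and scoring are ignored, and the product list is made-up sample data (only the productID XHDK-A-1293-#fJ3 comes from the SQL above).

```python
def term(field, value):
    """Return a clause that is true when doc[field] equals value exactly."""
    return lambda doc: doc.get(field) == value


def bool_matches(doc, must=(), should=(), must_not=()):
    """Evaluate bool-style clauses against a single document."""
    if any(clause(doc) for clause in must_not):
        return False
    if not all(clause(doc) for clause in must):
        return False
    # With no must clause, at least one should clause has to match
    if should and not must:
        return any(clause(doc) for clause in should)
    return True


# Made-up sample products
products = [
    {"price": 10, "productID": "XHDK-A-1293-#fJ3"},
    {"price": 20, "productID": "KDKE-B-9947-#kL5"},
    {"price": 30, "productID": "JODL-X-1937-#pV7"},
    {"price": 30, "productID": "QQPX-R-3956-#aD8"},
]

selected = [
    p for p in products
    if bool_matches(
        p,
        should=[term("price", 20), term("productID", "XHDK-A-1293-#fJ3")],
        must_not=[term("price", 30)],
    )
]
```

Here selected holds the first two products: one matches on productID, the other on price, and both products priced at 30 are excluded by must_not.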

bool has three kinds of clause: must (all must match), should (at least one should match), must_not (none may match)

body2 = {
    "query":{
        "bool":{
            "must":[
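                {'match':{  # note: with term instead of match this query returns nothing here, most likely because addr and name are analyzed text fields; term itself is allowed inside must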
                    'addr':'上海'
                }},
                {'match':{
                    'name':'java'
                }}

            ]
        }
    }
}

result=es.search(index="my_index",body=body2)

for item in result["hits"]["hits"]:
    print('111')
    print(item)
    print(item["_source"])

Range query

body2={
    'query':{
        'range':{
            'age':{
                'lte':14,
                'gte':10
            }
        }
    }
}
# age greater than or equal to 10 and less than or equal to 14
result=es.search(index="my_index",body=body2)
# print(result)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])

Prefix query

body2={
    'query':{
        'prefix':{
           'name':'p' # matches every document whose name starts with "p"

        }
    }
}

result=es.search(index="my_index",body=body2)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])

Wildcard query

body2={
    'query':{
        'wildcard':{
           'name':'*p' # matches every document whose name ends with "p"

        }
    }
}
result=es.search(index="my_index",body=body2)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])
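
The prefix and wildcard patterns can be tried out locally before sending a query. The sketch below approximates them with str.startswith and fnmatchcase from the standard library; this is only an analogy, since fnmatch also understands [] character sets that the Elasticsearch wildcard query does not, and the names php and cpp are made-up sample values.

```python
from fnmatch import fnmatchcase

# "python" and "java" appear in this article; "php" and "cpp" are made up
names = ["python", "php", "java", "cpp"]


def prefix_query(values, prefix):
    """Keep the values that start with the given prefix."""
    return [v for v in values if v.startswith(prefix)]


def wildcard_query(values, pattern):
    """Keep the values that match a pattern using * and ? wildcards."""
    return [v for v in values if fnmatchcase(v, pattern)]


starts_with_p = prefix_query(names, "p")
ends_with_p = wildcard_query(names, "*p")
```

starts_with_p contains python and php, and ends_with_p contains php and cpp, which is the same selection the prefix and wildcard bodies above ask Elasticsearch for.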

Sorting

body2={
    'query':{
        'range':{
           'age':{
               'gte':10,
               'lte':20
           } # age between 10 and 20, inclusive
        }

    },
    'sort':{'age':{'order':'desc'}  # sort by age, descending
    }
}
result=es.search(index="my_index",body=body2)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])
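
Since the range and sort parts are often combined, a small builder can keep the nesting in one place. This is a minimal sketch: range_sorted_body is a hypothetical helper that only assembles the request body shown above.

```python
def range_sorted_body(field, gte, lte, order="desc"):
    """Build a body that filters field to [gte, lte] and sorts by it."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "query": {"range": {field: {"gte": gte, "lte": lte}}},
        "sort": {field: {"order": order}},
    }


body = range_sorted_body("age", 10, 20)
```

The body built here equals the hand-written body2 above, and passing order="asc" flips the sort direction.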

filter_path

Response filtering

# return only the _id values; separate multiple paths with commas
es.search(index="my_index",doc_type="test_type",filter_path=["hits.hits._id"])

{'_id': '5'}
{'_id': '8a3lSGoBL92egitcqP1n'}
{'_id': '8'}
{'_id': '2'}
{'_id': '4'}
{'_id': '6'}
{'_id': '1'}

# return every field of each hit whose name starts with an underscore (here, the whole hit)
es.search(index="my_index",doc_type="test_type",filter_path=["hits.hits._*"])

{'_index': 'my_index', '_type': 'test_type', '_id': '8a3lSGoBL92egitcqP1n', '_score': 1.0, '_source': {'name': 'c', 'addr': '广州', 'tittle': 'c学习'}}
{'_index': 'my_index', '_type': 'test_type', '_id': '8', '_score': 1.0, '_source': {'name': 'c', 'addr': '广州', 'tittle': '10 c学习', 'age': 10}}

count

Runs a query and returns the number of documents that match it

body2={
    'query':{
        'range':{
           'age':{
               'gte':10,
               'lte':20
           } # age between 10 and 20, inclusive
        }

    },
}
result=es.count(index="my_index",body=body2)
print(result)
# the count field of the result is 5

Metric aggregations

Maximum

body = {
    "query":{
        "match_all":{}
    },
    "aggs":{                        # aggregations
        "max_age":{                 # key under which the maximum is returned
            "max":{                 # max aggregation
                "field":"age"       # take the maximum of "age"
            }
        }
    }
}
#
result=es.search(index="my_index2",body=body)
print(result['aggregations'])

Average

body = {
    "query":{
        "match_all":{}
    },
    "aggs":{                        # aggregations
        "avg_age":{                 # key under which the average is returned
            "avg":{                 # avg aggregation
                "field":"age"       # take the average of "age"
            }
        }
    }
}
#
result=es.search(index="my_index2",body=body)
print(result['aggregations'])

Minimum

body = {
    "query":{
        "match_all":{}
    },
    "aggs":{                        # aggregations
        "min_age":{                 # key under which the minimum is returned
            "min":{                 # min aggregation
                "field":"age"       # take the minimum of "age"
            }
        }
    }
}
#
result=es.search(index="my_index2",body=body)
print(result['aggregations'])

Sum

body = {
    "query":{
        "match_all":{}
    },
    "aggs":{                        # aggregations
        "sum_age":{                 # key under which the sum is returned
            "sum":{                 # sum aggregation
                "field":"age"       # take the sum of "age"
            }
        }
    }
}
#
result=es.search(index="my_index2",body=body)
print(result['aggregations'])
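
The four bodies above differ only in the aggregation name, so they can be requested together in a single search. The sketch below is one possible way to do that: metric_aggs_body and extract_values are hypothetical helpers, "size": 0 is added on the assumption that only the aggregation values are wanted and not the hits, and the sample response is hand-written with made-up numbers.

```python
def metric_aggs_body(field, metrics=("max", "min", "avg", "sum")):
    """Build one body that requests several metric aggregations on a field."""
    return {
        "size": 0,  # assumption: skip the hits, return only aggregations
        "query": {"match_all": {}},
        "aggs": {f"{m}_{field}": {m: {"field": field}} for m in metrics},
    }


def extract_values(result):
    """Flatten result['aggregations'] into a {name: value} dict."""
    return {name: agg["value"] for name, agg in result["aggregations"].items()}


body = metric_aggs_body("age")

# Hand-written sample response with made-up numbers
sample_result = {"aggregations": {"max_age": {"value": 18.0}, "min_age": {"value": 10.0}}}
values = extract_values(sample_result)
```

With a live client this would be used as extract_values(es.search(index="my_index2", body=body)).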

_source: when only some of the fields should be returned, add _source to the body

body = {
    "_source":["name"],
    "query":{
        "match":{
            "name":"python"}
    }
}
result=es.search(index="my_index",body=body)
# print(a)
for item in result["hits"]["hits"]:
    print(item)
    print(item["_source"])

Result

{'_index': 'my_index', '_type': 'test_type', '_id': '1', '_score': 0.2876821, '_source': {'name': 'python'}}
{'name': 'python'}
