Elasticsearch GEO Spatial Search Queries


Elasticsearch GEO spatial search queries (Python version)

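1. Elasticsearch

There is little need to dwell on how capable ES is: once you have installed the plugins and set up a cluster, you have a working search system.

Cluster tuning and query optimization are separate topics, of course. This post simply records the ES spatial search features I used recently.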

 

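2. ES GEO spatial search

As the name suggests, spatial search retrieves documents by spatial distance and by positional relationship. Many spatial indexing algorithms and libraries are available.

ES has this kind of indexing built in. The details follow.

Step 1: create the index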

def create_index():
    mapping = {
        "mappings": {
            "poi": {
                "_routing": {
                    "required": "true",
                    "path": "city_id"
                },
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "geofence_type": {
                        "type": "integer"
                    },
                    "city_id": {
                        "type": "integer"
                    },
                    "city_name": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "activity_id": {
                        "type": "integer"
                    },
                    "post_date": {
                        "type": "date"
                    },
                    "rank": {
                        "type": "float"
                    },
                    # Points and arbitrary shapes both use geo_shape; the concrete kind
                    # is set by the "type" key inside each document's data
                    "location_point": {
                        "type": "geo_shape"
                    },
                    "location_shape": {
                        "type": "geo_shape"
                    },
                    # Computing distances between points requires a geo_point field
                    "point": {
                        "type": "geo_point"
                    }
                }
            }
        }
    }
    # The mapping is optional when creating the index
    es.create_index(index='mapapp', body=mapping)
    # set_mapping = es_dsl.set_mapping('mapapp', 'poi', body=mapping)

Here we create an index named mapapp, with its mapping set as defined in the mapping dict above.
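
The es object used above is not shown in the article, and create_index is not part of the official elasticsearch-py client, so es is presumably a wrapper or a different client library. As a rough sketch, assuming an elasticsearch-py client with the 7.x-style API (where indices.exists and indices.create accept index and body), the same step might look like this. The helper name ensure_index is hypothetical.

```python
def ensure_index(client, name, mapping):
    """Create index `name` with `mapping` unless it already exists.

    `client` is assumed to be an elasticsearch.Elasticsearch instance
    (7.x-style API); this helper itself is hypothetical.
    Returns True if the index was created, False if it was already there.
    """
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, body=mapping)
    return True
```

With a real client this would be called as ensure_index(Elasticsearch(), 'mapapp', mapping); the check first avoids the error ES raises when an index is created twice.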

 

2. Bulk-inserting data (bulk)

import json

import xlrd  # note: xlrd 2.0 and later read only .xls files, not .xlsx

# es, es_handler and GEODocument are helpers defined elsewhere (not shown in this article)
def bulk():
    # actions only needs to be an iterable; it does not have to be a list
    workbooks = xlrd.open_workbook('./geo_data.xlsx')
    table = workbooks.sheets()[1]
    colname = list()
    actions = list()
    for i in range(table.nrows):
        if i == 0:
            # the first row holds the column names
            colname = table.row_values(i)
            continue
        # the last three columns hold JSON strings for the geo fields
        geo_shape_point = json.loads(table.row_values(i)[7])
        geo_shape_shape = json.loads(table.row_values(i)[8])
        geo_point = json.loads(table.row_values(i)[9])
        raw_data = table.row_values(i)[:7]
        raw_data.extend([geo_shape_point, geo_shape_shape, geo_point])
        source = dict(zip(colname, raw_data))
        geo = GEODocument(**source)
        action = {
            "_index": "mapapp",
            "_type": "poi",
            "_id": table.row_values(i)[0],
            "_routing": geo.city_id,
            #"_source": source,
            "_source": geo.to_json(),
        }
        actions.append(action)
    es.bulk(index='mapapp', actions=actions, es=es_handler, max=25)
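
As the comment in bulk() notes, the actions only need to be iterable. A minimal sketch of that idea uses a generator, so rows are turned into actions lazily. The function name iter_actions and the rows argument (dicts keyed by column name) are assumptions for illustration; the action keys mirror the ones used above. With the official client, such an iterable could be passed to elasticsearch.helpers.bulk(client, actions).

```python
def iter_actions(rows, index="mapapp", doc_type="poi"):
    """Yield one bulk action per row without building a list first.

    `rows` is assumed to be an iterable of dicts that contain at least
    the "id" and "city_id" keys (hypothetical input shape).
    """
    for row in rows:
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": row["id"],
            "_routing": row["city_id"],  # route by city, as in the mapping
            "_source": row,
        }
```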

Load the test data. The geo_data sheet looks like this:

id    geofence_type    city_id    city_name    activity_id    post_date    rank    location_point    location_shape    point
1    1    1    北京    100301    2016/10/20    100.30     {"type":"point","coordinates":[55.75,37.616667]}    {"type":"polygon","coordinates":[[[22,22],[4.87463,52.37254],[4.87875,52.36369],[22,22]]]}    {"lat":55.75,"lon":37.616667}
2    1    1    北京    100302    2016/10/21    12.00     {"type":"point","coordinates":[55.75,37.616668]}    {"type":"polygon","coordinates":[[[0,0],[4.87463,52.37254],[4.87875,52.36369],[0,0]]]}    {"lat":48.8567,"lon":2.3508}
3    1    1    北京    100303    2016/10/22    3432.23     {"type":"point","coordinates":[55.75,37.616669]}    {"type":"polygon","coordinates":[[[4.8833,52.38617],[4.87463,52.37254],[4.87875,52.36369],[4.8833,52.38617]]]}    {"lat":32.75,"lon":37.616668}
4    1    1    北京    100304    2016/10/23    246.80     {"type":"point","coordinates":[52.4796, 2.3508]}    {"type":"polygon","coordinates":[[[4.8833,52.38617],[4.87463,52.37254],[4.87875,52.36369],[4.8833,52.38617]]]}    {"lat":11.56,"lon":37.616669}
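
One detail worth checking in data like this: geo_shape values follow GeoJSON, where coordinates are ordered [lon, lat], whereas a geo_point object names lat and lon explicitly. In the first sample row, the first number of location_point equals the lat of point, so read as GeoJSON the two fields would not describe the same place. A small sketch of building both representations from one lat/lon pair (the helper name build_geo_fields is hypothetical):

```python
def build_geo_fields(lat, lon):
    """Return geo_shape and geo_point representations of the same location."""
    return {
        # GeoJSON order: longitude first, then latitude
        "location_point": {"type": "point", "coordinates": [lon, lat]},
        # geo_point object form: keys are explicit, so order does not matter
        "point": {"lat": lat, "lon": lon},
    }
```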

 

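3. GEO query: distance between two points

# Distance from point to point
# Sorted by distance in ascending order; with size set to 1, the hit is the nearest one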
def sort_by_distance():
    body = {
        "from": 0,
        "size": 1,
        "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
        },
        "sort": [{
            "_geo_distance": {
                "point": {
                    "lat": 8.75,
                    "lon": 37.616
                },
                "unit": "km",
                "order": "asc"
            }
        }]
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)
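
To see what this sort amounts to, the ordering can be reproduced offline. The sketch below computes great-circle distances with the haversine formula (a spherical-Earth approximation; ES's own distance calculation may differ slightly) from the sort origin to the point values of the four sample rows, then sorts the ids ascending by distance. The names haversine_km, sort_ids_by_distance and SAMPLE_POINTS are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

# id -> (lat, lon), taken from the "point" column of the sample data
SAMPLE_POINTS = {
    1: (55.75, 37.616667),
    2: (48.8567, 2.3508),
    3: (32.75, 37.616668),
    4: (11.56, 37.616669),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_ids_by_distance(origin_lat, origin_lon, points):
    """Return document ids ordered from nearest to farthest."""
    return sorted(
        points,
        key=lambda doc_id: haversine_km(origin_lat, origin_lon, *points[doc_id]),
    )


# Same origin as the "_geo_distance" sort above
nearest_first = sort_ids_by_distance(8.75, 37.616, SAMPLE_POINTS)
```

The first element of nearest_first corresponds to the single hit the query returns with size set to 1, assuming all four rows pass the term conditions.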

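4. GEO query: bounding-box filter

Tip: as is well known, ES caches filter results, so a common query optimization is to pull frequently used conditions out into filters. Unfortunately, GEO filters are not cached by default, so that consideration does not apply here. To keep the distinction clear, this example uses post_filter, which filters after the query has run; the examples below do the same.

# Bounding-box filter: use a box to select points and shapes
# This example selects with a rectangular box
# post_filter filters the query results a second time; it is often used together with aggs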
def bounding_filter():
    body = {
        "from": 0,
        "size": 1,
        "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
        },
        "post_filter": {
            "geo_shape": {
                "location_point": {
                    "shape": {
                        "type": "envelope",
                        "coordinates": [[52.4796, 2.3508], [48.8567, -1.903]]
                    },
                    "relation": "within"
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)
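
For reference, ES expects an envelope as two corners, [[min_lon, max_lat], [max_lon, min_lat]] (upper-left, then lower-right), with longitude first as in GeoJSON. The sketch below shows the containment test that a "within" envelope filter corresponds to for a single point. It ignores boxes that cross the antimeridian, and the corner values in box are an assumed lon-first example, not the ones from the query above.

```python
def point_within_envelope(lon, lat, envelope):
    """Return True if (lon, lat) lies inside the envelope.

    envelope is [[min_lon, max_lat], [max_lon, min_lat]];
    boxes crossing the antimeridian are not handled.
    """
    (min_lon, max_lat), (max_lon, min_lat) = envelope
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Hypothetical box: upper-left corner first, lower-right corner second
box = [[-1.903, 52.4796], [2.3508, 48.8567]]
```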

5. GEO query: circle selection

# Shape filter: select with a circle
# post_filter filters the query results a second time; it is often used together with aggs
def circle_filter():
    body = {
        "from": 0,
        "size": 1,
        "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
        },
        "post_filter": {
            "geo_shape": {
                "location_point": {
                    "shape": {
                        "type": "circle",
                        "radius": "10000km",
                        "coordinates": [22, 45]
                    },
                    "relation": "within"
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)
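
A circular selection can also be expressed against the geo_point field with a geo_distance filter, which matches documents whose point lies within a given distance of a center. The sketch below only builds the request body. The helper name and its defaults are assumptions, and it filters on point rather than location_point, so it is an alternative formulation and not a drop-in equivalent of the query above.

```python
def build_geo_distance_filter(lat, lon, distance, field="point"):
    """Build a geo_distance filter clause for a geo_point field."""
    return {
        "geo_distance": {
            "distance": distance,            # e.g. "10000km"
            field: {"lat": lat, "lon": lon},  # center of the circle
        }
    }


# The circle above is centered at [22, 45], i.e. lon 22, lat 45
alt_body = {
    "from": 0,
    "size": 1,
    "query": {"bool": {"must": [{"term": {"city_id": 1}}]}},
    "post_filter": build_geo_distance_filter(45, 22, "10000km"),
}
```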

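6. GEO query: reverse selection

# Reverse selection: when the query point falls inside a stored shape, that shape is returned
# post_filter filters the query results a second time; it is often used together with aggs
# Also includes a regexp match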
def intersects():
    body = {
       "from": 0,
       "size": 1,
       "query": {
            "bool": {
                "must": [{
                    "term": {
                        "geofence_type": 1
                    }
                }, {
                    "regexp": {
                        "city_name": u".*北京.*"
                    }
                }, {
                    "term": {
                        "city_id": 1
                    }
                }]
            }
       },
       "post_filter": {
            "geo_shape": {
                "location_shape": {
                    "shape": {
                        "type": "point",
                        "coordinates": [22,22]
                    },
                    "relation": "intersects"
                }
            }
       }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['hits']['hits']:
        print(type(i), i)
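
Conceptually, this query asks which stored polygons contain the given point. A minimal sketch of that test for one polygon ring, using the ray-casting (even-odd) rule in plain planar coordinates; it is an illustration written for this post, not how ES evaluates geo_shape internally, and points lying exactly on an edge or vertex are not handled reliably. The triangle is the location_shape ring from sample rows 3 and 4.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting: True if (x, y) is strictly inside the ring.

    ring is a list of [x, y] vertices (here [lon, lat]); it may repeat
    the first vertex at the end, as GeoJSON rings do.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that line
            x_cross = xi + (y - yi) * (xj - xi) / float(yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


TRIANGLE = [[4.8833, 52.38617], [4.87463, 52.37254],
            [4.87875, 52.36369], [4.8833, 52.38617]]
```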

 

7. Finally, two spatial aggregation examples for reference

# Spatial aggregation
# Aggregate by distance from a center point
def aggs_geo_distance():
    body = {
        "aggs": {
            "aggs_geopoint": {
                "geo_distance": {
                    "field": "point",
                    "origin": {
                        "lat": 51.5072222,
                        "lon": -0.1275
                    },
                    "unit": "km",
                    "ranges": [
                        {
                            "to": 1000
                        },
                        {
                            "from": 1000,
                            "to": 3000
                        },
                        {
                            "from": 3000
                        }
                    ]
                }
            }
        }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['aggregations']['aggs_geopoint']['buckets']:
        print(type(i), i)
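
The bucketing itself is simple to picture: each document's distance from the origin is placed into every range whose from is less than or equal to it and whose to is greater than it (from is inclusive, to is exclusive). A small sketch under that assumption, taking precomputed distances in km as input; the function name bucket_by_distance is hypothetical.

```python
def bucket_by_distance(distances_km, ranges):
    """Count how many distances fall into each {"from": ..., "to": ...} range.

    A missing "from" means no lower bound, a missing "to" no upper bound.
    Lower bounds are inclusive, upper bounds exclusive.
    """
    counts = [0] * len(ranges)
    for d in distances_km:
        for idx, r in enumerate(ranges):
            lower = r.get("from", float("-inf"))
            upper = r.get("to", float("inf"))
            if lower <= d < upper:
                counts[idx] += 1
    return counts


# Same ranges as in the aggregation above
RANGES = [{"to": 1000}, {"from": 1000, "to": 3000}, {"from": 3000}]
```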


# Spatial aggregation
# geohash algorithm, grid aggregation
# Two aggregations in one request
def aggs_geohash_grid():
    body = {
        "aggs": {
            "new_york": {
                "geohash_grid": {
                    "field":     "point",
                    "precision": 5
                }
            },
            "map_zoom": {
                "geo_bounds": {
                    "field": "point"
              }
            }
          }
    }
    for i in es.search(index='mapapp', doc_type='poi', body=body)['aggregations']['new_york']['buckets']:
        print(type(i), i)
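
The geohash_grid aggregation groups points by the geohash cell they fall in; a precision of 5 means five-character cells. Below is a sketch of the standard geohash encoding (alternating longitude and latitude bisection bits, five bits per base-32 character). It is written here for illustration and is not taken from ES.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits collected for the current character
    value = 0       # value of those bits
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)
```

Points that share the same five-character prefix land in the same bucket, which is why a lower precision yields coarser, larger cells.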

 

