第一個分析需求:計算每個tag下的商品數量
默認情況下,大部分字段都是被索引的(有個倒排索引),以使得他們可以被搜索。
然而,在腳本中排序、聚合和訪問字段的值,需要不同的搜索訪問模式。
搜索需要回答的問題是“哪些文檔包含這些搜索的內容?”,而排序和聚合需要回答的問題是“這個文檔中這個字段的值是什么?”
大部分字段都可以使用index-time,磁盤上的doc_values用於這個數據的訪問模式;
然而,text字段不支持doc_values。
代替的是,text字段使用一個叫做fielddata的數據結構,該數據結構含義是查詢時內存數據結構。該數據結構是按需求首次構建在一個被用於聚合、排序和在腳本的字段上。
它是通過讀取從磁盤每段的整個倒排索引來構建的,倒排搜索的內容<->文檔關系,其存儲在jvm堆上的內存上。
默認情況下text字段是沒有開啟的:
聚合時需要對相應的字段做處理如下,否則會報錯:
Fielddata is disabled on text fields by default.
Set fielddata=true on [your_field_name] in order to load fielddata
in memory by uninverting the inverted index. Note that this can however use significant memory.
PUT my_index/_mapping/my_type
{
"properties": {
"my_field": { ①
"type": "text",
"fielddata": true
}
}
}
GET /ecommerce/product/_search
{
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}
將文本field的fielddata屬性設置為true
PUT /ecommerce/_mapping/product
{
"properties": {
"tags": {
"type": "text",
"fielddata": true
}
}
}
GET /ecommerce/product/_search
{
"aggs": {
"group_by_aggs": {
"terms": {
"field": "tags"
}
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "ecommerce",
"_type": "product",
"_id": "2",
"_score": 1,
"_source": {
"name": "jiajieshi yagao",
"desc": "youxiao fangzhu",
"price": 25,
"producer": "jiajieshi producer",
"tags": [
"fangzhu"
]
}
},
{
"_index": "ecommerce",
"_type": "product",
"_id": "1",
"_score": 1,
"_source": {
"name": "gaolujie yagao",
"desc": "gaoxiao meibai",
"price": 30,
"producer": "gaolujie producer",
"tags": [
"meibai",
"fangzhu"
]
}
},
{
"_index": "ecommerce",
"_type": "product",
"_id": "3",
"_score": 1,
"_source": {
"name": "zhonghua yagao",
"desc": "caoben zhiwu",
"price": 40,
"producer": "zhonghua producer",
"tags": [
"qingxin"
]
}
}
]
},
"aggregations": {
"group_by_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fangzhu",
"doc_count": 2
},
{
"key": "meibai",
"doc_count": 1
},
{
"key": "qingxin",
"doc_count": 1
}
]
}
}
}
----------------------------------------------------------------------------------------------------------------
第二個聚合分析的需求:對名稱中包含yagao的商品,計算每個tag下的商品數量
GET /ecommerce/product/_search
{
"size": 0,
"query": {
"match": {
"name": "yagao"
}
},
"aggs": {
"all_tags": {
"terms": {
"field": "tags"
}
}
}
}
----------------------------------------------------------------------------------------------------------------
第三個聚合分析的需求:先分組,再算每組的平均值,計算每個tag下的商品的平均價格
GET /ecommerce/product/_search
{
"size": 0,
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
{
"took": 62,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fangzhu",
"doc_count": 2,
"avg_price": {
"value": 27.5
}
},
{
"key": "meibai",
"doc_count": 1,
"avg_price": {
"value": 30
}
},
{
"key": "qingxin",
"doc_count": 1,
"avg_price": {
"value": 40
}
}
]
}
}
}
----------------------------------------------------------------------------------------------------------------
第四個數據分析需求:計算每個tag下的商品的平均價格,並且按照平均價格降序排序:terms 條件的意思
GET /ecommerce/product/_search
{
"size": 0,
"aggs" : {
"all_tags" : {
"terms" : { "field" : "tags", "order": { "avg_price": "desc" } },
"aggs" : {
"avg_price" : {
"avg" : { "field" : "price" }
}
}
}
}
}
----------------------------------------------------------------------------------------------------------------
第五個數據分析需求:按照指定的價格范圍區間進行分組,然后在每組內再按照tag進行分組,最后再計算每組的平均價格
GET /ecommerce/product/_search
{
"size": 0,
"aggs": {
"goup_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 20
},{
"from": 20,
"to": 40
},{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_tags": {
"terms": {
"field": "tags"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
結果:
{
"took": 72,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"goup_by_price": {
"buckets": [
{
"key": "0.0-20.0",
"from": 0,
"to": 20,
"doc_count": 0,
"group_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "20.0-40.0",
"from": 20,
"to": 40,
"doc_count": 2,
"group_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fangzhu",
"doc_count": 2,
"avg_price": {
"value": 27.5
}
},
{
"key": "meibai",
"doc_count": 1,
"avg_price": {
"value": 30
}
}
]
}
},
{
"key": "40.0-50.0",
"from": 40,
"to": 50,
"doc_count": 1,
"group_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "qingxin",
"doc_count": 1,
"avg_price": {
"value": 40
}
}
]
}
}
]
}
}
}