Elasticsearch-sql: the limit problem with group by aggregation queries


When querying ES through the Elasticsearch-sql plugin, we often run aggregation queries that group by several fields, for example:

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num    
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,
ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31    
where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-07T09:57:15.436Z' and logTime<= '2018-08-07T09:57:15.436Z'    
group by pageRef,province,city,ip limit 100

After parsing, the plugin produces this request:

{
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "must": [
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "match_phrase": {
                                                "appCode": {
                                                    "query": "5f05acfc9a084d9f9a07e165a2516c18",
                                                    "slop": 0,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": "2018-07-07T09:57:15.436Z",
                                                    "to": null,
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": null,
                                                    "to": "2018-08-07T09:57:15.436Z",
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        }
                                    ],
                                    "disable_coord": false,
                                    "adjust_pure_negative": true,
                                    "boost": 1
                                }
                            }
                        ],
                        "disable_coord": false,
                        "adjust_pure_negative": true,
                        "boost": 1
                    }
                }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "_source": {
        "includes": [
            "SUM"
        ],
        "excludes": []
    },
    "aggregations": {
        "pageRef": {
            "terms": {
                "field": "pageRef",
                "size": 100,
                "shard_size": 2000,
                "min_doc_count": 1,
                "shard_min_doc_count": 0,
                "show_term_doc_count_error": false,
                "order": [
                    {
                        "_count": "desc"
                    },
                    {
                        "_term": "asc"
                    }
                ]
            },
            "aggregations": {
                "province": {
                    "terms": {
                        "field": "province",
                        "size": 10,
                        "min_doc_count": 1,
                        "shard_min_doc_count": 0,
                        "show_term_doc_count_error": false,
                        "order": [
                            {
                                "_count": "desc"
                            },
                            {
                                "_term": "asc"
                            }
                        ]
                    },
                    "aggregations": {
                        "city": {
                            "terms": {
                                "field": "city",
                                "size": 10,
                                "min_doc_count": 1,
                                "shard_min_doc_count": 0,
                                "show_term_doc_count_error": false,
                                "order": [
                                    {
                                        "_count": "desc"
                                    },
                                    {
                                        "_term": "asc"
                                    }
                                ]
                            },
                            "aggregations": {
                                "ip": {
                                    "terms": {
                                        "field": "ip",
                                        "size": 10,
                                        "min_doc_count": 1,
                                        "shard_min_doc_count": 0,
                                        "show_term_doc_count_error": false,
                                        "order": [
                                            {
                                                "_count": "desc"
                                            },
                                            {
                                                "_term": "asc"
                                            }
                                        ]
                                    },
                                    "aggregations": {
                                        "num": {
                                            "sum": {
                                                "field": "errorCount"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

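The parsed JSON shows that the 100 in limit 100 applies only to the first field after group by (pageRef). The other fields (province, city, ip) each have a size of 10, so limit has no effect on them. This is the problem with how Elasticsearch-sql handles group by.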
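To check this on any parsed request without reading the JSON by eye, a short script can walk the nested aggregations and list the size at each level. The following is a minimal Python sketch, not part of the plugin: the helper names terms_sizes and max_leaf_buckets are hypothetical, and parsed_aggs is a trimmed copy of the "aggregations" section shown above.

```python
from typing import Any, Dict, List, Tuple


def terms_sizes(aggs: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Walk nested terms aggregations and return (name, size) per level."""
    sizes: List[Tuple[str, int]] = []
    for name, body in aggs.items():
        terms = body.get("terms")
        if terms is not None:
            # Elasticsearch falls back to 10 buckets when "size" is absent.
            sizes.append((name, terms.get("size", 10)))
        # Sub-aggregations sit under "aggregations" (or the short form "aggs").
        child = body.get("aggregations") or body.get("aggs") or {}
        sizes.extend(terms_sizes(child))
    return sizes


def max_leaf_buckets(sizes: List[Tuple[str, int]]) -> int:
    """Upper bound on innermost buckets: the product of all level sizes."""
    total = 1
    for _, size in sizes:
        total *= size
    return total


# Trimmed copy of the "aggregations" section of the parsed request.
parsed_aggs = {
    "pageRef": {
        "terms": {"field": "pageRef", "size": 100},
        "aggregations": {
            "province": {
                "terms": {"field": "province", "size": 10},
                "aggregations": {
                    "city": {
                        "terms": {"field": "city", "size": 10},
                        "aggregations": {
                            "ip": {
                                "terms": {"field": "ip", "size": 10},
                                "aggregations": {
                                    "num": {"sum": {"field": "errorCount"}}
                                },
                            }
                        },
                    }
                },
            }
        },
    }
}

sizes = terms_sizes(parsed_aggs)
print(sizes)
print(max_leaf_buckets(sizes))
```

The product is only an upper bound on the number of innermost buckets; the real count depends on how many distinct values each field has under its parent bucket.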

The fix is to use the plugin's terms function in the group by clause, which has this form: terms(field='correspond_brand_name',size='10',alias='correspond_brand_name',include='\".*sport.*\"',exclude='\"water_.*\"')

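Note: with this approach, do not add the limit keyword. Also note that the order of the fields after group by matters: depending on the data, a different order can return a different number of result rows, but the results as a whole are still correct.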

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num    
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,
ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31    
where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-07T09:57:15.436Z' and logTime<= '2018-08-07T09:57:15.436Z'    
group by terms(field='pageRef',size='15',alias='pageRef'),
terms(field='province',size='15',alias='province'),
terms(field='city',size='15',alias='city'),
terms(field='ip',size='15',alias='ip')

After parsing:

{
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "must": [
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "match_phrase": {
                                                "appCode": {
                                                    "query": "5f05acfc9a084d9f9a07e165a2516c18",
                                                    "slop": 0,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": "2018-07-07T09:57:15.436Z",
                                                    "to": null,
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": null,
                                                    "to": "2018-08-07T09:57:15.436Z",
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        }
                                    ],
                                    "disable_coord": false,
                                    "adjust_pure_negative": true,
                                    "boost": 1
                                }
                            }
                        ],
                        "disable_coord": false,
                        "adjust_pure_negative": true,
                        "boost": 1
                    }
                }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "_source": {
        "includes": [
            "SUM"
        ],
        "excludes": []
    },
    "aggregations": {
        "pageRef": {
            "terms": {
                "field": "pageRef",
                "size": 15,
                "min_doc_count": 1,
                "shard_min_doc_count": 0,
                "show_term_doc_count_error": false,
                "order": [
                    {
                        "_count": "desc"
                    },
                    {
                        "_term": "asc"
                    }
                ]
            },
            "aggregations": {
                "province": {
                    "terms": {
                        "field": "province",
                        "size": 15,
                        "min_doc_count": 1,
                        "shard_min_doc_count": 0,
                        "show_term_doc_count_error": false,
                        "order": [
                            {
                                "_count": "desc"
                            },
                            {
                                "_term": "asc"
                            }
                        ]
                    },
                    "aggregations": {
                        "city": {
                            "terms": {
                                "field": "city",
                                "size": 15,
                                "min_doc_count": 1,
                                "shard_min_doc_count": 0,
                                "show_term_doc_count_error": false,
                                "order": [
                                    {
                                        "_count": "desc"
                                    },
                                    {
                                        "_term": "asc"
                                    }
                                ]
                            },
                            "aggregations": {
                                "ip": {
                                    "terms": {
                                        "field": "ip",
                                        "size": 15,
                                        "min_doc_count": 1,
                                        "shard_min_doc_count": 0,
                                        "show_term_doc_count_error": false,
                                        "order": [
                                            {
                                                "_count": "desc"
                                            },
                                            {
                                                "_term": "asc"
                                            }
                                        ]
                                    },
                                    "aggregations": {
                                        "num": {
                                            "sum": {
                                                "field": "errorCount"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

The parsed output shows that all four fields now have a size of 15. You can run the query with Postman to confirm that the results are correct.
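When checking the results, it can help to flatten the nested buckets into one row per group, the way a SQL result set would look. The sketch below is in Python and assumes the response has the standard Elasticsearch terms aggregation shape (each level has a "buckets" list whose entries carry "key", "doc_count", and the next level under its alias). The helper flatten_buckets is hypothetical, and the sample data is made up for illustration; it is not output from the queries above.

```python
from typing import Any, Dict, Iterator, List, Optional


def flatten_buckets(
    aggs: Dict[str, Any],
    fields: List[str],
    metric: str,
    prefix: Optional[Dict[str, Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per innermost bucket of nested terms aggregations."""
    prefix = prefix or {}
    field, rest = fields[0], fields[1:]
    for bucket in aggs[field]["buckets"]:
        row = dict(prefix)
        row[field] = bucket["key"]
        if rest:
            # Descend into the next group-by level inside this bucket.
            yield from flatten_buckets(bucket, rest, metric, row)
        else:
            # Innermost level: read the metric (here the SUM aliased as "num").
            row[metric] = bucket[metric]["value"]
            yield row


# Made-up sample shaped like the "aggregations" part of a search response.
sample = {
    "pageRef": {"buckets": [
        {"key": "/home", "doc_count": 3, "province": {"buckets": [
            {"key": "Zhejiang", "doc_count": 3, "city": {"buckets": [
                {"key": "Hangzhou", "doc_count": 3, "ip": {"buckets": [
                    {"key": "10.0.0.1", "doc_count": 2, "num": {"value": 5.0}},
                    {"key": "10.0.0.2", "doc_count": 1, "num": {"value": 1.0}},
                ]}},
            ]}},
        ]}},
    ]}
}

rows = list(flatten_buckets(sample, ["pageRef", "province", "city", "ip"], "num"))
for r in rows:
    print(r)
```

The fields list must follow the same order as the group by clause, because that order determines how the buckets are nested.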

The syntax comes from the terms usage documented at https://github.com/NLPchina/elasticsearch-sql.

Addendum: if the query involves a nested field, for example:

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num   
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31   
where appCode = '5f05acfc9a084d9f9a07e165a2516c18'        
and logTime>= '2018-07-08T06:20:13.144Z'    
and logTime<= '2018-08-08T06:20:13.144Z'
group by pageRef,province,city,ip,nested(errors.message) limit 10

then it needs to be written like this:

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num   
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31   
where appCode = '5f05acfc9a084d9f9a07e165a2516c18'        
and logTime>= '2018-07-08T06:20:13.144Z'    
and logTime<= '2018-08-08T06:20:13.144Z'
group by terms(field='pageRef',size='15',alias='pageRef'),
terms(field='province',size='1',alias='province'),
terms(field='city',size='2',alias='city'),
terms(field='ip',size='3',alias='ip'),
terms(field='errors.message',size='4',alias='errors.message',nested="errors")
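Writing one terms(...) call per field by hand is repetitive, so the clause can be generated. The following Python sketch builds the group by clause in the same form as the query above. The helpers terms_clause and group_by are hypothetical, not part of Elasticsearch-sql; the nested="..." form simply mirrors the query above, and field names are assumed to come from trusted code, since nothing is escaped.

```python
from typing import List, Optional, Tuple


def terms_clause(field: str, size: int, nested_path: Optional[str] = None) -> str:
    """Build one terms(...) group-by item, using the field name as its alias."""
    parts = [f"field='{field}'", f"size='{size}'", f"alias='{field}'"]
    if nested_path is not None:
        # Mirrors the nested="errors" form used in the query above.
        parts.append(f'nested="{nested_path}"')
    return "terms(" + ",".join(parts) + ")"


def group_by(specs: List[Tuple]) -> str:
    """Join several (field, size[, nested_path]) specs into a group by clause."""
    return "group by " + ",\n".join(terms_clause(*spec) for spec in specs)


clause = group_by([
    ("pageRef", 15),
    ("province", 1),
    ("city", 2),
    ("ip", 3),
    ("errors.message", 4, "errors"),
])
print(clause)
```

The generated clause replaces both the plain group by field list and the limit keyword, so the rest of the select statement stays unchanged.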

For more on aggregation queries, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations.html

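Copyright of this article belongs to the author. Reposting is welcome; please give a link to the original in a prominent place on the page: https://www.cnblogs.com/wynjauu/articles/9439089.html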

Author: 敲完代碼好睡覺

