OLAP: Druid Queries


Data queries

Druid's aggregation queries come in three main forms:

  • Timeseries
  • TopN
  • GroupBy

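Generally speaking, GroupBy is the core capability of any OLAP system, and Druid is no exception. GroupBy queries are resource-intensive, however, so TopN and Timeseries serve as useful complements that can improve query performance. Our recommendation: if TopN or Timeseries can satisfy the business scenario, use those two query types in preference to GroupBy.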
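The recommendation above can be read as a small routing rule. The sketch below is only an illustration: the function name `choose_query_type` and its parameters are hypothetical, and the rule itself (no dimensions means Timeseries, one dimension ranked by a metric means TopN, anything else GroupBy) is a simplification of the advice, not part of any Druid API.

```python
def choose_query_type(dimensions, ranked_by_metric=False):
    """Pick the cheapest Druid query type that can express the request.

    dimensions: list of dimension names to group on (hypothetical input).
    ranked_by_metric: True when only the top rows by one metric are wanted.
    """
    if not dimensions:
        # No grouping dimension: aggregate per time bucket only.
        return "timeseries"
    if len(dimensions) == 1 and ranked_by_metric:
        # One dimension, ranked by a metric: TopN is enough.
        return "topN"
    # Multiple dimensions or unranked output: fall back to GroupBy.
    return "groupBy"


print(choose_query_type([]))
print(choose_query_type(["country"], ranked_by_metric=True))
print(choose_query_type(["country", "device"]))
```

Whether the cheaper query type really fits still depends on the business scenario, for example on whether an approximate TopN result is acceptable.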

Druid exposes a RESTful query interface; users express the query as JSON.

Query command:

curl -X POST 'broker:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -d @<query_json_file>
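The same request can be issued from a program. The following is a minimal sketch using only the Python standard library; the helper names `build_query_request` and `run_query` are hypothetical, and the broker address in the usage lines is a placeholder, not a value from this article.

```python
import json
import urllib.request


def build_query_request(broker_url, query):
    """Build (but do not send) a POST request carrying a Druid JSON query."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(broker_url, query, timeout=30):
    """Send the query to the broker and return the decoded JSON result."""
    request = build_query_request(broker_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder broker address (assumption); replace with your own host and port.
    request = build_query_request("http://broker:8082", {"queryType": "timeseries"})
    print(request.get_method(), request.full_url)
```

Splitting request construction from sending keeps the first half checkable without a running cluster; `run_query` is the only part that touches the network.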

Notes

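Filters can appear in every kind of Druid query, and there are some usage techniques that deserve particular attention. See the Filters page of the Druid documentation.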

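Metric aggregation is equally important. The Aggregations page of the documentation covers it systematically, so we do not repeat it here. What we do want to point out is the Filtered Aggregator described on that page, which enables very powerful queries, for example tailoring a metric to specific dimension values at query time.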
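A filtered aggregator wraps an ordinary aggregator together with a filter, so the metric only accumulates rows that match. The sketch below builds such aggregators as Python dicts; the helper name `filtered_long_sum` and the output names `apple_usage` and `samsung_usage` are hypothetical, while the `make` and `user_count` columns are borrowed from the GroupBy example in this article.

```python
import json


def filtered_long_sum(name, field_name, dimension, value):
    """Return a 'filtered' aggregator: a longSum applied only to rows
    where `dimension` equals `value`."""
    return {
        "type": "filtered",
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        "aggregator": {"type": "longSum", "name": name, "fieldName": field_name},
    }


# One unfiltered total plus one metric per device maker, all in a single query.
aggregations = [
    {"type": "longSum", "name": "total_usage", "fieldName": "user_count"},
    filtered_long_sum("apple_usage", "user_count", "make", "Apple"),
    filtered_long_sum("samsung_usage", "user_count", "make", "Samsung"),
]
print(json.dumps(aggregations, indent=2))
```

The resulting list can be used as the `aggregations` field of a query, which yields per-maker metrics side by side without adding `make` as a grouping dimension.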

GroupBy

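Example (the # annotations are explanatory only; JSON has no comment syntax, so remove them before submitting the query)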

{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country", "device"], #dimension columns to group by
  "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] }, #limit clause
  "filter": { #filter conditions
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "carrier", "value": "AT&T" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "make", "value": "Apple" },
          { "type": "selector", "dimension": "make", "value": "Samsung" }
        ]
      }
    ]
  },
  "aggregations": [ #metric columns to return
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "postAggregations": [ #optional
    { "type": "arithmetic",
      "name": "avg_usage",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "data_transfer" },
        { "type": "fieldAccess", "fieldName": "total_usage" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ], #time range covered by this query
  "having": { #having clause, optional
    "type": "greaterThan",
    "aggregation": "total_usage",
    "value": 100
  }
}
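The examples in this article carry `#` annotations, which a JSON parser will reject. One way to keep annotated query files and still submit valid JSON is to strip the annotations first. The helper below is a minimal sketch, assuming annotations always start with `#` outside a string literal and run to the end of the line; the name `strip_hash_comments` is hypothetical.

```python
import json


def strip_hash_comments(text):
    """Remove '#' annotations that sit outside JSON string literals."""
    cleaned_lines = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for index, char in enumerate(line):
            if escaped:
                # Character after a backslash inside a string: skip it.
                escaped = False
            elif char == "\\" and in_string:
                escaped = True
            elif char == '"':
                in_string = not in_string
            elif char == "#" and not in_string:
                # First '#' outside a string starts the annotation.
                cut = index
                break
        cleaned_lines.append(line[:cut].rstrip())
    return "\n".join(cleaned_lines)


annotated = '''
{
  "queryType": "groupBy",
  "dimensions": ["country", "device"], #dimension columns to group by
  "filter": { "type": "selector", "dimension": "carrier", "value": "AT&T #1" }
}
'''
query = json.loads(strip_hash_comments(annotated))
print(query["dimensions"])
```

Tracking whether the scanner is inside a string matters because a `#` may legitimately appear in a dimension value, as in the made-up `"AT&T #1"` above.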

Timeseries

Example

{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "descending": true, #whether to return results in descending time order
  "filter": { #filter conditions
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
          { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
        ]
      }
    ]
  },
  "aggregations": [ #metric columns to return
    { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
    { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
  ],
  "postAggregations": [ #optional
    { "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
        { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-04T00:00:00.000" ] #time range covered by this query
}

A Timeseries query normally returns 0 for time buckets within the queried interval that contain no data.
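When zero-filled buckets are not wanted, a Timeseries query accepts a `skipEmptyBuckets` flag in its `context`. A minimal sketch, assuming the query is held as a Python dict before being serialized; the helper name `with_skip_empty_buckets` is hypothetical, and the query body reuses the placeholder names from the example above.

```python
import copy


def with_skip_empty_buckets(query, skip=True):
    """Return a copy of a timeseries query whose context asks Druid to
    omit empty time buckets instead of zero-filling them."""
    if query.get("queryType") != "timeseries":
        raise ValueError("skipEmptyBuckets applies to timeseries queries")
    updated = copy.deepcopy(query)
    updated.setdefault("context", {})["skipEmptyBuckets"] = skip
    return updated


base = {
    "queryType": "timeseries",
    "dataSource": "sample_datasource",
    "granularity": "day",
    "aggregations": [
        {"type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1"}
    ],
    "intervals": ["2012-01-01T00:00:00.000/2012-01-04T00:00:00.000"],
}
print(with_skip_empty_buckets(base)["context"])
```

The helper copies the query rather than mutating it, so the same base query can be sent with and without zero-filling for comparison.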

TopN

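  • A TopN query groups by a single dimension, sorts the groups by a metric, and returns the top rows of the result set
  • To execute efficiently, TopN is an approximate query (in our experience, the results it returns are generally quite accurate)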
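To see where the approximation comes from, consider a toy model in which each data node ranks its own rows and reports only its local top K to the broker, which merges the partial lists. Druid's documentation describes TopN in these terms and gives the default K as max(1000, threshold). The code below is not Druid's implementation; the function names and the per-node counts are made up purely to expose the effect.

```python
from collections import Counter


def exact_top_n(node_counts, n):
    """Rank on fully merged counts: the exact answer."""
    total = Counter()
    for counts in node_counts:
        total.update(counts)
    return total.most_common(n)


def approximate_top_n(node_counts, n, k):
    """Each node reports only its local top k; the merge ranks those partial lists."""
    merged = Counter()
    for counts in node_counts:
        merged.update(dict(Counter(counts).most_common(k)))
    return merged.most_common(n)


# Made-up per-node counts for one dimension, chosen to expose the effect.
nodes = [
    {"x": 11, "y": 9, "z": 8},
    {"w": 10, "v": 9, "z": 8},
]
print(exact_top_n(nodes, 1))                # [('z', 16)]
print(approximate_top_n(nodes, 1, k=2))     # [('x', 11)]
print(approximate_top_n(nodes, 1, k=1000))  # [('z', 16)]
```

With a tiny K the value `z`, which is third on every node but first overall, is lost. With K far larger than the requested threshold the merged ranking matches the exact one, which is consistent with the observation that results are usually accurate in practice.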

Example

{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim", #dimension column to group by
  "threshold": 5,
  "metric": "count", #metric column used for ranking
  "granularity": "all",
  "filter": { #filter conditions
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "dim1",
        "value": "some_value"
      },
      {
        "type": "selector",
        "dimension": "dim2",
        "value": "some_other_val"
      }
    ]
  },
  "aggregations": [ #metric columns to return
    {
      "type": "longSum",
      "name": "count",
      "fieldName": "count"
    },
    {
      "type": "doubleSum",
      "name": "some_metric",
      "fieldName": "some_metric"
    }
  ],
  "postAggregations": [ #post-processing logic, optional
    {
      "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "some_metric",
          "fieldName": "some_metric"
        },
        {
          "type": "fieldAccess",
          "name": "count",
          "fieldName": "count"
        }
      ]
    }
  ],
  "intervals": [
    "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000" #time range covered by the query
  ]
}
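A TopN response is a JSON array with one entry per time bucket, each holding a `timestamp` and a `result` list of rows. The sketch below flattens that shape into tuples; the helper name `top_n_rows` is hypothetical, and the sample payload is made up for illustration (it is not output from a real cluster), reusing the column names of the query above.

```python
def top_n_rows(response, dimension, metric):
    """Flatten a topN response into (timestamp, dimension value, metric) tuples."""
    rows = []
    for bucket in response:
        for entry in bucket["result"]:
            rows.append((bucket["timestamp"], entry[dimension], entry[metric]))
    return rows


# Made-up payload in the shape of a topN response; the values are illustrative only.
sample_response = [
    {
        "timestamp": "2013-08-31T00:00:00.000Z",
        "result": [
            {"sample_dim": "value_a", "count": 120, "some_metric": 360.0, "sample_divide": 3.0},
            {"sample_dim": "value_b", "count": 80, "some_metric": 160.0, "sample_divide": 2.0},
        ],
    }
]
for row in top_n_rows(sample_response, "sample_dim", "count"):
    print(row)
```

With `"granularity": "all"`, as in the query above, the whole interval falls into a single bucket, so the array has one entry.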

 

