ClickHouse (09): Several ways to implement row_number() over (partition by)



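Hive provides the row_number() over (partition by ...) function, which ranks rows within each group in a single SQL statement. ClickHouse offers several ways to get the same result, and this post walks through a few of them.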

Contents

1. Ranking with row_number

2. Ranking with row_number and keeping only the rank = 1 row

3. A special case

1. Ranking with row_number

Hive version:

select number,
       row_number() over (partition by number order by time desc) as rank
  from table a

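ClickHouse version (returns one row per number, with the row numbers held in an array):

select number,
       groupArray(time) AS arr_val,            -- dates of the group, in subquery order
       arrayEnumerate(arr_val) as row_number   -- [1, 2, ..., n], one index per date
  from (select distinct orderid as number,
               toDate(operatetime) as time
          from table
         order by time desc
         ) a
 GROUP BY number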
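
The query above yields two arrays per number rather than one row per (number, time) pair as the Hive query does. The arrays can be unfolded into rows with ARRAY JOIN. The following is a minimal sketch, not the original author's method: the table name `orders` and the output aliases are hypothetical, and the array is sorted explicitly with arrayReverseSort so the numbering does not depend on the row order coming out of the subquery.

```sql
-- Sketch: per-group row numbers as rows instead of arrays.
-- `orders` is a hypothetical table with columns orderid and operatetime.
SELECT number,
       t,          -- one date per output row
       row_num     -- 1 = latest date within the group
FROM
(
    SELECT number,
           arrayReverseSort(groupArray(time)) AS arr_val,  -- dates, newest first
           arrayEnumerate(arr_val)            AS arr_idx   -- [1, 2, ..., length(arr_val)]
    FROM
    (
        SELECT DISTINCT orderid AS number,
               toDate(operatetime) AS time
        FROM orders
    )
    GROUP BY number
)
ARRAY JOIN arr_val AS t, arr_idx AS row_num
ORDER BY number, row_num
```

Because both arrays have the same length, ARRAY JOIN walks them in parallel, so each date is paired with its own index.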

2. Ranking with row_number and keeping only the rank = 1 row

Hive version:

select orderid
  from (select orderid,
               row_number() over(partition by orderid order by datachange_lasttime desc) as row_num
          from table
         where d = '${CurrentDate}'
         ) a
 where row_num = 1;

ClickHouse version:

Method 1: groupArray (groupArray(1) keeps at most one value per group)

select orderid, 
       groupArray(1)(datachange_lasttime) as dates
  from (select orderid, 
               datachange_lasttime
          from table
         ORDER BY orderid, datachange_lasttime desc
        ) a
 group by orderid

Method 2: use max to get the first row in descending order; for ascending order, use min instead

select orderid,
       max(datachange_lasttime) as datachange_lasttime
  from table
 group by orderid
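
max alone returns only the timestamp. If another column from that same latest row is also needed, ClickHouse's argMax aggregate function can carry it along. This is a sketch under assumptions, not part of the original post: the table name `orders` and the aliases are hypothetical. The aliases deliberately differ from the column name datachange_lasttime so that the argument of argMax still refers to the raw column and not to the aggregated alias.

```sql
-- Sketch: latest timestamp plus the status taken from a row with that timestamp.
-- `orders` is a hypothetical table name; status is the extra column to carry along.
SELECT orderid,
       max(datachange_lasttime)            AS latest_time,
       argMax(status, datachange_lasttime) AS latest_status  -- status at the max timestamp
FROM orders
GROUP BY orderid
```

When several rows share the maximum timestamp, argMax returns the status of one of them, with no guarantee about which.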

Method 3: rowNumberInAllBlocks, combined with LIMIT 1 BY

select orderid, status
  from (select orderid, status, rowNumberInAllBlocks() as rank
          from (select orderid, status, datachange_lasttime
                  from table
                 order by orderid, datachange_lasttime desc
                 ) a
       ) b LIMIT 1 BY orderid

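Method 4: arrayEnumerate

select orderid, datachange_lasttime
  from (select orderid,
               groupArray(datachange_lasttime) AS arr_val,
               arrayEnumerate(arr_val) as arr_idx
          from (select orderid, datachange_lasttime
                  from table
                 order by datachange_lasttime desc
                 ) a
         GROUP BY orderid
         ) b
 -- unfold both arrays in parallel so each timestamp gets a scalar row number
 array join arr_val as datachange_lasttime, arr_idx as row_num
 where row_num = 1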
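
For completeness: newer ClickHouse releases support window functions natively, so the Hive query can be carried over almost unchanged where that support is available. The sketch below assumes such a version (check the documentation for the release you run) and uses a hypothetical table name `orders`.

```sql
-- Sketch, assuming a ClickHouse version with window function support.
-- `orders` is a hypothetical table name.
SELECT orderid, datachange_lasttime
FROM
(
    SELECT orderid,
           datachange_lasttime,
           row_number() OVER (PARTITION BY orderid ORDER BY datachange_lasttime DESC) AS row_num
    FROM orders
)
WHERE row_num = 1
```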

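3. A special case

Requirement:

Group the rows by orderid, sort each group by date in descending order, and take the latest row. If several rows share the same latest date, any one of them is an acceptable result.

Hive version: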

select orderid
  from (select orderid,
               status,
               row_number() over(partition by orderid order by datachange_lasttime desc) as row_num
          from table
         where d = '${CurrentDate}'
         ) as b
 where row_num = 1

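ClickHouse version:

Given the examples above, the obvious idea is to use one of those results as a subquery and join it back to the original table. Here is one way to write such a join: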

select a.orderid as orderid_a,
       a.status as status
  from olap_htlmaindb.tmp_ord_orders_status_s_pre a
 inner join (select orderid,
                    groupArray(1)(datachange_lasttime) as dates
               from (select orderid, datachange_lasttime
                       from table
                      ORDER BY orderid, datachange_lasttime desc
                     ) a
              group by orderid) b
    on a.orderid = b.orderid
   and cast(a.datachange_lasttime as String) = cast(b.dates[1] as String)

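Here we first extract the qualifying orderid and timestamp, then join back to the original table to fetch the other columns we need. The join is necessary because these functions share one drawback: their result can carry only the partition-by column and the sort column, not any other column. However, when several rows of the same orderid share the same latest timestamp, the join matches all of them and returns more than one row per orderid. So none of the four methods above, inner-joined to the original table, solves this case.

This is where LIMIT 1 BY comes in. It is actually the most effective approach: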

select orderid,
       status,
       datachange_lasttime
  from table
 order by orderid, datachange_lasttime desc
 LIMIT 1 BY orderid
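
To see how LIMIT 1 BY handles ties, here is a small self-contained sketch. The table demo_orders, its Memory engine, and the sample values are all made up for illustration and are not from the original post.

```sql
-- Sketch with a throwaway table; names and values are hypothetical.
CREATE TABLE demo_orders
(
    orderid             UInt64,
    status              String,
    datachange_lasttime DateTime
)
ENGINE = Memory;

INSERT INTO demo_orders VALUES
    (1, 'created',   '2021-01-01 10:00:00'),
    (1, 'paid',      '2021-01-02 10:00:00'),
    (1, 'confirmed', '2021-01-02 10:00:00'),  -- ties with 'paid' on the latest time
    (2, 'created',   '2021-01-03 09:00:00');

-- Keeps the first row of each orderid after sorting, i.e. a latest row.
SELECT orderid, status, datachange_lasttime
FROM demo_orders
ORDER BY orderid, datachange_lasttime DESC
LIMIT 1 BY orderid;
```

The result has exactly one row per orderid. For orderid 1 it is either the 'paid' or the 'confirmed' row, since both carry the latest timestamp, which is what the requirement allows. Adding a further sort key after datachange_lasttime would make the choice deterministic.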

