Project Practice from 0 to 1: Hive (17) Computing New Users, Daily Active Users, and Retention Rate with Hive


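These are fairly simple SQL statements for user analysis. All they need are a few simple custom UDFs that operate on createdatms, the event-time field, which is stored as a long (epoch milliseconds).
The UDFs use the add method of the Java Calendar class together with SimpleDateFormat. Several overloads are defined that return an int or a long, so the results can be compared directly with timestamps.
getdaybegin returns the start of a day: for example, in UTC a createdatms of 1502179200000 (2017-08-08 08:00:00) maps to 1502150400000 (2017-08-08 00:00:00), midnight at the start of that day. getweekbegin and getmonthbegin work the same way for the start of a week and of a month. An integer argument, as in getdaybegin(-1) or getweekbegin(-4), shifts the result by that many days, weeks or months relative to the current one.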
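The date helpers described above can be sketched in plain Java as follows. This is a minimal illustration, not the author's UDF source: the class name DateBoundaries, the explicit epochMs and TimeZone parameters, and the Monday week start are assumptions made for the example. A real Hive UDF would wrap logic like this in evaluate() overloads and would typically read the current time itself.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper; method names mirror the UDFs used in the queries. */
final class DateBoundaries {
    private DateBoundaries() {}

    /** Calendar positioned at 00:00:00.000 of the day that contains epochMs. */
    private static Calendar midnight(long epochMs, TimeZone tz) {
        Calendar cal = new GregorianCalendar(tz);
        cal.setTimeInMillis(epochMs);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal;
    }

    /** Start of the day containing epochMs, shifted by offsetDays. */
    static long getDayBegin(long epochMs, int offsetDays, TimeZone tz) {
        Calendar cal = midnight(epochMs, tz);
        cal.add(Calendar.DAY_OF_MONTH, offsetDays);
        return cal.getTimeInMillis();
    }

    /** Start of the week containing epochMs, shifted by offsetWeeks. */
    static long getWeekBegin(long epochMs, int offsetWeeks, TimeZone tz) {
        Calendar cal = midnight(epochMs, tz);
        // Assumption: weeks start on Monday. Calendar uses SUNDAY=1 ... SATURDAY=7.
        int daysSinceMonday = (cal.get(Calendar.DAY_OF_WEEK) + 5) % 7;
        cal.add(Calendar.DAY_OF_MONTH, -daysSinceMonday + 7 * offsetWeeks);
        return cal.getTimeInMillis();
    }

    /** Start of the month containing epochMs, shifted by offsetMonths. */
    static long getMonthBegin(long epochMs, int offsetMonths, TimeZone tz) {
        Calendar cal = midnight(epochMs, tz);
        cal.set(Calendar.DAY_OF_MONTH, 1);
        cal.add(Calendar.MONTH, offsetMonths);
        return cal.getTimeInMillis();
    }

    /** Formats epoch milliseconds with a SimpleDateFormat pattern such as yyyyMMdd. */
    static String formatTime(long epochMs, String pattern, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(tz);
        return fmt.format(new Date(epochMs));
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        long ts = 1502179200000L; // 2017-08-08 08:00:00 UTC, a Tuesday
        System.out.println(formatTime(getDayBegin(ts, 0, utc), "yyyyMMdd HH:mm:ss", utc));
        System.out.println(formatTime(getWeekBegin(ts, -4, utc), "yyyyMMdd", utc));
        System.out.println(formatTime(getMonthBegin(ts, -5, utc), "yyyyMM", utc));
    }
}
```

Passing the reference time and the time zone explicitly keeps the sketch deterministic; a production UDF would more likely use the cluster's default time zone.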
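 1. Weekly active users of one app for each of the past five weeks (including the current week)
 Note: whenever the partition range can be bounded, always restrict the query to those partitions.
 The table is partitioned by ym/day/hm, so 20170501 corresponds to ym=201705 and day=01.
 -- active users per week over the past five weeks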
 select  formattime(createdatms,'yyyyMMdd',0) stdate, count(distinct deviceid) stcount from ext_startup_logs where concat(ym,day)>=formattime(getweekbegin(-4),'yyyyMMdd') and appid ='sdk34734' group by formattime(createdatms,'yyyyMMdd',0) ;
 2. Monthly active users for each of the last six months (including the current month).
 select  formattime(createdatms,'yyyyMM') stdate, count(distinct deviceid) stcount from ext_startup_logs where ym >= formattime(getmonthbegin(-5),'yyyyMM') and appid ='sdk34734' group by formattime(createdatms,'yyyyMM') ;
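 3. Silent users
 3.1) Today's silent users: devices that have launched the app exactly once, with that launch falling in the period (today, this week, or this month), and that have not launched it again since.
 getdaybegin(-1) is the start of yesterday, so the query below also counts devices whose only launch was yesterday; use getdaybegin() to restrict it to today.
 select count(*) from (select deviceid , count(createdatms) dcount,min(createdatms) dmin from ext_startup_logs where appid = 'sdk34734' group by deviceid having dcount = 1 and min(createdatms) > getdaybegin(-1)) t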
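 4. Launch counts
 4.1) Number of app launches today
 Launch counts are computed like active users, except that active users are deduplicated by device while launches are not.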
 select count(*) from ext_startup_logs where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd');
 5. Version distribution
 5.1) Today's active users per version for the app with appid 'sdk34734'.
 select appversion,count(distinct deviceid) from ext_startup_logs where appid = 'sdk34734' and ym = formattime(getdaybegin(),'yyyyMM') and day = formattime(getdaybegin(),'dd') group by appversion ;
 
 5.2) Daily active users per version for each day of the current week
 select formattime(createdatms,'yyyyMMdd'),appversion , count(distinct deviceid) from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(),'yyyyMMdd') group by formattime(createdatms,'yyyyMMdd') , appversion
 
 
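 [User composition analysis]
 1. Returning users this week: users who did not launch the app last week but did launch it this week. When using a NOT IN subquery, both the subquery and the outer query must use table aliases.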
 select
 distinct a.deviceid
 from ext_startup_logs a
 where a.appid = 'sdk34734' and concat(a.ym,a.day) >= formattime(getweekbegin(),'yyyyMMdd') and a.deviceid not in (
 select
 distinct t.deviceid
 from ext_startup_logs t
 where t.appid = 'sdk34734' and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd') and  concat(t.ym,t.day) < formattime(getweekbegin(),'yyyyMMdd')
 )
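The NOT IN query above is a set difference: this week's devices minus last week's devices. A minimal Java sketch of that logic, assuming the two device lists have already been read from the table (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of the returning-users definition used above. */
final class ReturningUsers {
    private ReturningUsers() {}

    /** Devices that launched this week but did not launch last week. */
    static Set<String> returning(List<String> thisWeek, List<String> lastWeek) {
        Set<String> result = new HashSet<>(thisWeek); // like: select distinct a.deviceid
        result.removeAll(new HashSet<>(lastWeek));    // like: a.deviceid not in (subquery)
        return result;
    }

    public static void main(String[] args) {
        List<String> thisWeek = Arrays.asList("d1", "d2", "d2", "d3");
        List<String> lastWeek = Arrays.asList("d2", "d4");
        // d2 is dropped because it was also active last week.
        System.out.println(returning(thisWeek, lastWeek));
    }
}
```

One difference worth keeping in mind: in standard SQL, a NULL deviceid returned by the subquery makes NOT IN match no rows at all, which a plain set difference does not reproduce.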
 
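 2. Active for n consecutive weeks, here three: for example, launches on 20181001, 20181008 and 20181016 fall into three different weeks, so after deduplication a count of three distinct weeks means the device was active in each of them. The query relies on formattime(createdatms,'yyyyMMdd',0) returning the same value for all launches within one week (presumably the date of that week's first day).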
 select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-2),'yyyyMMdd') group by deviceid having c = 3
 
 3. Loyal users: active for five consecutive weeks
 select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd') group by deviceid having c = 5
 
 4. Continuously active users: active for n consecutive weeks (here n = 2)
 select deviceid , count(distinct(formattime(createdatms,'yyyyMMdd',0))) c from ext_startup_logs where appid = 'sdk34734' and concat(ym,day) >= formattime(getweekbegin(-1),'yyyyMMdd') group by deviceid having c = 2
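The three queries above share one idea: restrict the rows to the last n weeks, map every launch to its week, and keep the devices that have exactly n distinct weeks. The sketch below reproduces that counting step in Java. It assumes each input row is already a pair of device ID and week-start string (the value the week-bucketing formattime call is presumed to return); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of "count(distinct week) = n" per device. */
final class ConsecutiveWeeks {
    private ConsecutiveWeeks() {}

    /**
     * rows: {deviceId, weekStart} pairs already limited to the last `weeks` weeks.
     * Returns the devices seen in exactly `weeks` distinct weeks, i.e. in every week.
     */
    static Set<String> activeEveryWeek(List<String[]> rows, int weeks) {
        Map<String, Set<String>> weeksByDevice = new HashMap<>();
        for (String[] row : rows) {
            // Group by device and deduplicate the week buckets.
            weeksByDevice.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : weeksByDevice.entrySet()) {
            if (e.getValue().size() == weeks) { // like: having c = n
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"d1", "20181001"},
            new String[] {"d1", "20181008"},
            new String[] {"d1", "20181015"},
            new String[] {"d1", "20181015"}, // second launch in the same week
            new String[] {"d2", "20181001"},
            new String[] {"d2", "20181015"}); // d2 skipped the middle week
        System.out.println(activeEveryWeek(rows, 3));
    }
}
```

The equality test only implies consecutive activity because the window contains exactly n weeks; with a wider window, n distinct weeks could include gaps.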
 
 
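 Devices that launched before the five-week window and have not launched since. This query has no appid filter, so it covers all apps:
 select distinct(a.deviceid) from ext_startup_logs a where  concat(a.ym,a.day) < formattime(getweekbegin(-4),'yyyyMMdd') and  a.deviceid not in ( select distinct(t.deviceid) from ext_startup_logs t where concat(t.ym,t.day)>=formattime(getweekbegin(-4),'yyyyMMdd'))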
 
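 5. Recently churned users
 Users who have not launched the app in the last 2, 3 or 4 weeks.
 The idea is to look at each user's latest launch time (max): it must not fall within the recent weeks. In the queries below, '#' and '' stand in for the app ID.
 -- no launches in the last four weeks (last seen in the week starting at getweekbegin(-4))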
 select
 distinct(deviceid)
 from ext_startup_logs
 where appid='#'
 and concat(ym,day) >= formattime(getweekbegin(-4),'yyyyMMdd')
 and concat(ym,day) < formattime(getweekbegin(-3),'yyyyMMdd')
 and deviceid not in (
 select
 distinct(t.deviceid)
 from ext_startup_logs t
 where t.appid=''
 and concat(t.ym,t.day) >= formattime(getweekbegin(-3),'yyyyMMdd')
 
 )
 union
 -- no launches in the last three weeks (last seen in the week starting at getweekbegin(-3))
 select
 distinct(deviceid)
 from ext_startup_logs
 where appid='#'
 and concat(ym,day) >= formattime(getweekbegin(-3),'yyyyMMdd')
 and concat(ym,day) < formattime(getweekbegin(-2),'yyyyMMdd')
 and deviceid not in (
 select
 distinct(t.deviceid)
 from ext_startup_logs t
 where t.appid=''
 and concat(t.ym,t.day) >= formattime(getweekbegin(-2),'yyyyMMdd')
 
 )
 union
 -- no launches in the last two weeks (last seen in the week starting at getweekbegin(-2))
 select
 distinct(deviceid)
 from ext_startup_logs
 where appid='#'
 and concat(ym,day) >= formattime(getweekbegin(-2),'yyyyMMdd')
 and concat(ym,day) < formattime(getweekbegin(-1),'yyyyMMdd')
 and deviceid not in (
 select
 distinct(t.deviceid)
 from ext_startup_logs t
 where t.appid=''
 and concat(t.ym,t.day) >= formattime(getweekbegin(-1),'yyyyMMdd')
 )
 
 
 
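 [Retention analysis]
 1. Retained users
 Weekly retention: users who were new in one week and were still using the app in the following week. The query below uses the two most recent complete weeks: devices whose first launch fell in the week before last (from getweekbegin(-2) up to getweekbegin(-1)) and who launched again last week.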
 select
 distinct(a.deviceid)
 from ext_startup_logs a
 where a.appid = 'sdk34734'
 and concat(a.ym,a.day) >= formattime(getweekbegin(-1),'yyyyMMdd')
 and concat(a.ym,a.day) < formattime(getweekbegin(),'yyyyMMdd')
 and a.deviceid in (
 select distinct(t.deviceid)
 from (
 select tt.deviceid , min(tt.createdatms) mintime
 from ext_startup_logs tt
 where tt.appid = 'sdk34734'
 group by tt.deviceid having mintime >= getweekbegin(-2) and mintime < getweekbegin(-1)
 ) t
 )
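The retained-user query above yields a list of devices; a retention rate is commonly obtained by dividing the number of retained devices by the size of the new-user cohort. The following Java sketch shows both steps under stated assumptions: first-launch times per device (the min(createdatms) of the inner query) and the set of devices active in the following week are given as inputs, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of weekly retention on in-memory data. */
final class WeeklyRetention {
    private WeeklyRetention() {}

    /** Devices whose first launch falls in [fromMs, toMs), like the HAVING on mintime. */
    static Set<String> newUsers(Map<String, Long> firstLaunchMs, long fromMs, long toMs) {
        Set<String> cohort = new HashSet<>();
        for (Map.Entry<String, Long> e : firstLaunchMs.entrySet()) {
            if (e.getValue() >= fromMs && e.getValue() < toMs) {
                cohort.add(e.getKey());
            }
        }
        return cohort;
    }

    /** Cohort members that were active again in the following week. */
    static Set<String> retained(Set<String> cohort, Set<String> activeNextWeek) {
        Set<String> result = new HashSet<>(cohort);
        result.retainAll(activeNextWeek); // like: a.deviceid in (new-user subquery)
        return result;
    }

    /** Retained share of the cohort; 0.0 for an empty cohort to avoid dividing by zero. */
    static double rate(Set<String> cohort, Set<String> retainedUsers) {
        return cohort.isEmpty() ? 0.0 : (double) retainedUsers.size() / cohort.size();
    }

    public static void main(String[] args) {
        Map<String, Long> first = new HashMap<>();
        first.put("d1", 100L);
        first.put("d2", 150L);
        first.put("d3", 250L); // first seen after the cohort week
        Set<String> cohort = newUsers(first, 100L, 200L);
        Set<String> active = new HashSet<>();
        active.add("d1");
        active.add("d3");
        Set<String> kept = retained(cohort, active);
        System.out.println(kept + " rate=" + rate(cohort, kept));
    }
}
```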
 
 
 
 
 2. User freshness
 Freshness = number of new users in a period / number of active old (returning) users in the same period.
 -- m: today's active users
 select count(distinct(deviceid))
 from ext_startup_logs where concat(ym,day) = formattime(getdaybegin(),'yyyyMMdd')  and appid = ... ;
 -- n: today's new users
 select count(distinct(t.deviceid))
 from (
 select tt.deviceid , min(tt.createdatms) mintime
 from ext_startup_logs tt
 where tt.appid = 'sdk34734'
 group by tt.deviceid having mintime >= getdaybegin(0)
 ) t
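The two counts can then be combined into the freshness ratio. Since every new user of today is also active today, the number of active old users is m - n, which gives freshness = n / (m - n). The sketch below is a worked example of that arithmetic; the class name and the choice to return NaN when there are no old users are assumptions of this illustration.

```java
/** Hypothetical worked example of the freshness formula. */
final class Freshness {
    private Freshness() {}

    /**
     * activeToday = m (distinct devices active today),
     * newToday    = n (distinct devices whose first launch is today).
     * Returns n / (m - n), or NaN when no old users were active.
     */
    static double freshness(long activeToday, long newToday) {
        long oldActive = activeToday - newToday;
        if (newToday < 0 || oldActive < 0) {
            throw new IllegalArgumentException("expected 0 <= newToday <= activeToday");
        }
        if (oldActive == 0) {
            return Double.NaN; // ratio is undefined without active old users
        }
        return (double) newToday / oldActive;
    }

    public static void main(String[] args) {
        // 125 active devices today, 25 of them new: 25 / 100 old users.
        System.out.println(freshness(125, 25));
    }
}
```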

 

