Kylin | Frequently Asked Questions


 

For detailed usage, see the official documentation:

          http://kylin.apache.org/cn/docs/

Kylin's build process

The SQL is as follows:

SELECT
`APP_POINT`.`MID` as `APP_POINT_MID`
,`APP_POINT`.`MEMBER_LEVEL` as `APP_POINT_MEMBER_LEVEL`
,`APP_POINT`.`MARKET_ID` as `APP_POINT_MARKET_ID`
,`APP_POINT`.`SUMMARY_TIME` as `APP_POINT_SUMMARY_TIME`
,`APP_USER_BASE_INFO`.`GENDER` as `APP_USER_BASE_INFO_GENDER`
,`APP_USER_BASE_INFO`.`BIRTH_TIME` as `APP_USER_BASE_INFO_BIRTH_TIME`
,`APP_USER_BASE_INFO`.`IS_HAVE_CHILD` as `APP_USER_BASE_INFO_IS_HAVE_CHILD`
,`APP_USER_BASE_INFO`.`MEMBER_TYPE` as `APP_USER_BASE_INFO_MEMBER_TYPE`
,`APP_USER_BASE_INFO`.`IS_NEW_MEMBER` as `APP_USER_BASE_INFO_IS_NEW_MEMBER`
,`APP_USER_BASE_INFO`.`IS_HAVE_CAR` as `APP_USER_BASE_INFO_IS_HAVE_CAR`
,`APP_USER_BASE_INFO`.`VIP_LEVEL` as `APP_USER_BASE_INFO_VIP_LEVEL`
,`APP_USER_BASE_INFO`.`MID` as `APP_USER_BASE_INFO_MID`
,`APP_POINT`.`POINT` as `APP_POINT_POINT`
 FROM `HST_APP`.`APP_POINT` as `APP_POINT`
INNER JOIN `HST_APP`.`APP_USER_BASE_INFO` as `APP_USER_BASE_INFO`
ON `APP_POINT`.`MID` = `APP_USER_BASE_INFO`.`MID`
WHERE 1=1

kylin.log output: the tables and indexes created by the build

USE default;

DROP TABLE IF EXISTS kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306
(
\`APP_POINT_MID\` bigint
,\`APP_POINT_MEMBER_LEVEL\` bigint
,\`APP_POINT_MARKET_ID\` bigint
,\`APP_POINT_SUMMARY_TIME\` string
,\`APP_USER_BASE_INFO_GENDER\` string
,\`APP_USER_BASE_INFO_BIRTH_TIME\` string
,\`APP_USER_BASE_INFO_IS_HAVE_CHILD\` string
,\`APP_USER_BASE_INFO_MEMBER_TYPE\` string
,\`APP_USER_BASE_INFO_IS_NEW_MEMBER\` string
,\`APP_USER_BASE_INFO_IS_HAVE_CAR\` string
,\`APP_USER_BASE_INFO_VIP_LEVEL\` string
,\`APP_POINT_POINT\` bigint
)
STORED AS SEQUENCEFILE
LOCATION 'hdfs://nameservice1/kylin/kylin_metadata/kylin-8a15ba0f-54d5-699e-9812-bab76649c511/kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306';

ALTER TABLE kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306 SET TBLPROPERTIES('auto.purge'='true');
INSERT OVERWRITE TABLE \`kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306\` SELECT
\`APP_POINT\`.\`MID\` as \`APP_POINT_MID\`
,\`APP_POINT\`.\`MEMBER_LEVEL\` as \`APP_POINT_MEMBER_LEVEL\`
,\`APP_POINT\`.\`MARKET_ID\` as \`APP_POINT_MARKET_ID\`
,\`APP_POINT\`.\`SUMMARY_TIME\` as \`APP_POINT_SUMMARY_TIME\`
,\`APP_USER_BASE_INFO\`.\`GENDER\` as \`APP_USER_BASE_INFO_GENDER\`
,\`APP_USER_BASE_INFO\`.\`BIRTH_TIME\` as \`APP_USER_BASE_INFO_BIRTH_TIME\`
,\`APP_USER_BASE_INFO\`.\`IS_HAVE_CHILD\` as \`APP_USER_BASE_INFO_IS_HAVE_CHILD\`
,\`APP_USER_BASE_INFO\`.\`MEMBER_TYPE\` as \`APP_USER_BASE_INFO_MEMBER_TYPE\`
,\`APP_USER_BASE_INFO\`.\`IS_NEW_MEMBER\` as \`APP_USER_BASE_INFO_IS_NEW_MEMBER\`
,\`APP_USER_BASE_INFO\`.\`IS_HAVE_CAR\` as \`APP_USER_BASE_INFO_IS_HAVE_CAR\`
,\`APP_USER_BASE_INFO\`.\`VIP_LEVEL\` as \`APP_USER_BASE_INFO_VIP_LEVEL\`
,\`APP_POINT\`.\`POINT\` as \`APP_POINT_POINT\`
 FROM \`HST_APP\`.\`APP_POINT\` as \`APP_POINT\`
INNER JOIN \`HST_APP\`.\`APP_USER_BASE_INFO\` as \`APP_USER_BASE_INFO\`
ON \`APP_POINT\`.\`MID\` = \`APP_USER_BASE_INFO\`.\`MID\`
WHERE 1=1 AND (\`APP_POINT\`.\`SUMMARY_TIME\` >= '2020-12-01' AND \`APP_POINT\`.\`SUMMARY_TIME\` < '2021-02-01')
;

" --hiveconf hive.auto.convert.join=true
--hiveconf dfs.replication=2
--hiveconf hive.auto.convert.join.noconditionaltask=true
--hiveconf hive.merge.mapfiles=false
--hiveconf mapreduce.reduce.java.opts=-Xms6g
--hiveconf mapreduce.map.memory.mb=8192
--hiveconf hive.merge.mapredfiles=false
--hiveconf mapreduce.reduce.input.buffer.percent=0.5
--hiveconf hive.exec.compress.output=true
--hiveconf mapreduce.reduce.memory.mb=16384
--hiveconf mapreduce.job.split.metainfo.maxsize=-1
--hiveconf mapreduce.map.java.opts=-Xms3g
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000
--hiveconf hive.stats.autogather=true

 

1. Common problems

Compatibility issues

apache-kylin-3.0.2 with apache-hive-3.1.2

 

Kylin reads the Hive metadata and finds that CATALOG_NAME is missing.

 

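Solution: either 1) modify the source code, or

2) make Hive access its metadata through the metastore service, which requires the following two things:

① Make sure hive-site.xml contains the following parameter; otherwise the metadata is fetched directly from MySQL.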

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop101:9083</value>
</property>

② Start the Hive metastore service:

hive --service metastore
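
Before starting Kylin it can help to confirm that the first condition actually holds. The snippet below is a minimal sketch, not part of Kylin or Hive: `metastore_uris` is a hypothetical helper that parses the text of a hive-site.xml and returns whatever is configured under `hive.metastore.uris`. The sample document simply repeats the property shown above.

```python
import xml.etree.ElementTree as ET
from typing import List


def metastore_uris(hive_site_xml: str) -> List[str]:
    """Return the URIs configured under hive.metastore.uris, or [] if absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hive.metastore.uris":
            value = (prop.findtext("value") or "").strip()
            # The property may hold several comma-separated URIs.
            return [uri.strip() for uri in value.split(",") if uri.strip()]
    return []


# Sample content mirroring the property from this article.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop101:9083</value>
  </property>
</configuration>"""

uris = metastore_uris(SAMPLE)
if uris:
    print("metastore mode configured:", uris)
else:
    print("hive.metastore.uris is missing; Hive would read MySQL directly")
```

In practice the string would come from reading the real hive-site.xml on the Kylin node; an empty result suggests the property still needs to be added.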

 

 

 Skip snapshot for this lookup table.

(Some lookup tables are too big for a snapshot, e.g. > 300 MB, and must therefore be marked as limited. A limited lookup table cannot be queried directly and does not support derived dimensions.)

If a dimension table exceeds 300 MB and Skip snapshot is not checked, the cube build reports an error:

 

 

 tail -f kylin.log  # follow the kylin.log log

 

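 The error above means that two segments overlap: the time ranges being built coincide.

Solution:

Refresh the segment,

 

 or change the cube status from READY to DISABLED, Purge it, and build again.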

    

 The Storage tab on the page lists every segment that has been built.
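
To see why a new build range collides with an existing segment, it helps to treat each segment as a half-open date range, which matches the `>= '2020-12-01' AND < '2021-02-01'` filter in the kylin.log excerpt earlier. The following is a minimal sketch of that check, not Kylin's own validation code; the function name and the two candidate ranges are made up for illustration.

```python
from datetime import date


def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """Half-open ranges [start, end) overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end


# Existing segment, taken from the build filter shown in the log above.
existing = (date(2020, 12, 1), date(2021, 2, 1))

# Hypothetical new build ranges.
adjacent = (date(2021, 2, 1), date(2021, 3, 1))     # starts where the old one ends
colliding = (date(2021, 1, 15), date(2021, 3, 1))   # starts inside the old one

print(overlaps(*existing, *adjacent))   # no shared day, so this build is fine
print(overlaps(*existing, *colliding))  # shares 2021-01-15..2021-01-31
```

Because the end date is exclusive, a new segment may start exactly on the previous segment's end date without overlapping it.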

  

2. Using BitMap

http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html

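The count distinct (bitmap) measure matters in many scenarios, such as counting clicks. Kylin has supported precise count distinct since version 1.5.3.

Apache Kylin implements precise count distinct on top of bitmaps and uses a global dictionary to encode string values as integers. The current global dictionary is built in a single thread, so for high-cardinality columns it can take a lot of time and memory.

Kylin v3.0.0 introduced the first version of the Hive global dictionary (KYLIN-3841). This feature uses Hive's distributed SQL engine to build the global dictionary.

To improve performance further, Kylin v3.1.0 introduced Hive global dictionary v2 (KYLIN-4342), which uses MapReduce instead of HQL for some steps of the dictionary build.

Benefits

1. The global dictionary is built in a distributed way, which saves time.

2. The Job Server in the Kylin cluster does less work and is therefore more stable.

3. OneID: the Hive global dictionary stays readable outside Kylin, so anyone can reuse it in other scenarios within the company.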

 

 

 

 The cube's SQL is as follows:

SELECT
`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MARKET_ID` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_KEY` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_TYPE` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_VALUE` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_WEIGHT` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT`
,`APP_POINT_USE_WAY_TEST`.`POINT_COUNT` as `APP_POINT_USE_WAY_TEST_POINT_COUNT`
,`APP_POINT_USE_WAY_TEST`.`CHANNEL` as `APP_POINT_USE_WAY_TEST_CHANNEL`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MID` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID`
 FROM `HST_APP`.`APP_PROFILE_ACTION_TAGS_PART_CATEGORY` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY`
LEFT JOIN `HST_APP`.`APP_POINT_USE_WAY_TEST` as `APP_POINT_USE_WAY_TEST`
ON `APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MID` = `APP_POINT_USE_WAY_TEST`.`MID`
WHERE 1=1

The job steps shown on the Monitor page are as follows:

USE default;

DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0
(
\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID\` bigint
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY\` string
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE\` string
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE\` string
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT\` decimal(32,10)
,\`APP_POINT_USE_WAY_TEST_POINT_COUNT\` bigint
,\`APP_POINT_USE_WAY_TEST_CHANNEL\` bigint
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID\` string
)
STORED AS SEQUENCEFILE
LOCATION 'hdfs://nameservice1/kylin/kylin_metadata/kylin-0c6c73f4-9871-1f04-ccc2-2d276126b7cd/kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0';

ALTER TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 SET TBLPROPERTIES('auto.purge'='true');
INSERT OVERWRITE TABLE \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\` SELECT
\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MARKET_ID\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_KEY\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_TYPE\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_VALUE\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_WEIGHT\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT\`
,\`APP_POINT_USE_WAY_TEST\`.\`POINT_COUNT\` as \`APP_POINT_USE_WAY_TEST_POINT_COUNT\`
,\`APP_POINT_USE_WAY_TEST\`.\`CHANNEL\` as \`APP_POINT_USE_WAY_TEST_CHANNEL\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MID\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID\`
 FROM \`HST_APP\`.\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`
LEFT JOIN \`HST_APP\`.\`APP_POINT_USE_WAY_TEST\` as \`APP_POINT_USE_WAY_TEST\`
ON \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MID\` = \`APP_POINT_USE_WAY_TEST\`.\`MID\`
WHERE 1=1;

" --hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true


USE default;
set hive.exec.compress.output=false;
set hive.mapred.mode=unstrict;
CREATE TABLE IF NOT EXISTS default.MidBitMap_global_dict
 ( dict_key STRING COMMENT '', 
   dict_val INT COMMENT '' 
) 
COMMENT 'Hive Global Dictionary' 
PARTITIONED BY (dict_column string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE; 

DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value; 
CREATE TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value 
( 
   dict_key STRING COMMENT '' 
) 
COMMENT '' 
PARTITIONED BY (dict_column string) 
STORED AS TEXTFILE 
;

DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict; 
CREATE TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict 
( 
   dict_key STRING COMMENT '' , 
  dict_val STRING COMMENT '' 
) 
COMMENT '' 
PARTITIONED BY (dict_column string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE 
;

INSERT OVERWRITE TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value 
PARTITION (dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID') 
SELECT 
    a.DICT_KEY FROM 
(SELECT 
APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID as DICT_KEY 
  FROM kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0
  GROUP BY APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID) a 
LEFT JOIN 
  (SELECT DICT_KEY FROM default.MidBitMap_global_dict    WHERE DICT_COLUMN = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' ) b 
ON a.DICT_KEY = b.DICT_KEY 
WHERE b.DICT_KEY IS NULL 
;

INSERT OVERWRITE TABLE  kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value PARTITION (DICT_COLUMN = 'KYLIN_MAX_DISTINCT_COUNT') 
SELECT 
  CONCAT_WS(',', tc.dict_column, 
  cast(tc.total_distinct_val AS String), 
  if(tm.max_dict_val is null, '0', cast(max_dict_val as string))) 
FROM (
    SELECT dict_column, count(1) total_distinct_val
    FROM default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value
    WHERE DICT_COLUMN != 'KYLIN_MAX_DISTINCT_COUNT'
    GROUP BY dict_column) tc 
LEFT JOIN (

    SELECT dict_column, if(max(dict_val) is null, 0, max(dict_val)) as max_dict_val 
    FROM default.MidBitMap_global_dict
    GROUP BY dict_column) tm 
ON tc.dict_column = tm.dict_column;
" 
--hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true



USE default;
set mapreduce.job.reduces=25;
set hive.merge.mapredfiles=false;
INSERT OVERWRITE TABLE \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\` SELECT * FROM \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\` DISTRIBUTE BY APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID,APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY,APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE;

" --hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true


USE default;
set hive.mapred.mode=unstrict;
ALTER TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict ADD IF NOT EXISTS PARTITION (dict_column='APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID'); 

INSERT OVERWRITE TABLE default.MidBitMap_global_dict PARTITION (dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID') 
SELECT 
    dict_key, dict_val 
FROM default.MidBitMap_global_dict 
WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' 
UNION ALL 
SELECT 
   dict_key, dict_val 
FROM kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict 
 WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' ;

" 
--hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true


USE default;
set hive.exec.compress.output=false; set hive.mapred.mode=unstrict;
INSERT OVERWRITE TABLE default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 
SELECT 
a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT 
,a.APP_POINT_USE_WAY_TEST_POINT_COUNT 
,a.APP_POINT_USE_WAY_TEST_CHANNEL 
,b.dict_val 
FROM default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 a 
LEFT OUTER JOIN 
 (SELECT dict_key, dict_val FROM default.MidBitMap_global_dict WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID') b 
ON a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID = b.dict_key;
" 
--hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true
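
The Hive steps logged above boil down to four operations on the MID column: find the distinct values that are not yet in the dictionary, look up the current maximum dict_val, append the new entries to the dictionary, and replace the string column with its integer code. The sketch below models that flow in plain Python as an aid to reading the log; it is not Kylin's implementation. One point is an assumption: the log does not show how the intermediate `_global_dict` table is filled, so the sketch simply hands out ids that continue from the current maximum in sorted key order. The sample MID strings are invented.

```python
from typing import Dict, Iterable, List


def extend_global_dict(global_dict: Dict[str, int], column_values: Iterable[str]) -> Dict[str, int]:
    """Return a dictionary that also covers every value in column_values."""
    # Like the LEFT JOIN ... WHERE b.DICT_KEY IS NULL step: keep only unseen keys.
    new_keys = sorted(set(column_values) - set(global_dict))
    # Like the KYLIN_MAX_DISTINCT_COUNT step: max(dict_val), or 0 when empty.
    max_val = max(global_dict.values(), default=0)
    # Assumption: new ids continue after the current maximum.
    increment = {key: max_val + i for i, key in enumerate(new_keys, start=1)}
    # Like the UNION ALL step: old entries are kept, new ones are appended.
    return {**global_dict, **increment}


def encode(column_values: Iterable[str], global_dict: Dict[str, int]) -> List[int]:
    """Like the final join: replace each string with its dict_val."""
    return [global_dict[value] for value in column_values]


existing_dict = {"m1": 1, "m2": 2}          # dictionary from earlier builds
mids = ["m2", "m3", "m1", "m4", "m3"]       # MID column of the new flat table

merged = extend_global_dict(existing_dict, mids)
print(merged)
print(encode(mids, merged))
```

Existing keys keep their ids across builds, which is what lets bitmaps from different segments be combined and lets other systems reuse the same dictionary table.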

 

Commonly used Kylin functions:

intersect_value and intersect_count

select market_id, intersect_value(mid,tag_value,array['女']) as arr_mid_bitmap
from APP_PROFILE_ACTION_TAGS_PART_CATEGORY
where tag_type = 'gender' and tag_value = ''
group by market_id
---
132   [119787,119798,119823]
164   [136555,140549,141655,145728,145729,145796,14580,1639794]
213   [632938,635520,644039]
150   [588915,1328378]

select market_id, intersect_count(mid,tag_value,array['女']) as arr_mid_bitmap
from APP_PROFILE_ACTION_TAGS_PART_CATEGORY
where tag_type = 'gender' and tag_value = ''
group by market_id
--
132   3
164   8
213   3
150   2
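
The relationship between the two functions can be illustrated with ordinary sets: per group, collect the ids seen under each filter value, intersect those sets, and either return the ids (intersect_value) or their number (intersect_count). The code below is a rough model of that semantics only, not how Kylin evaluates it on bitmaps. The rows are toy data: the market ids and a few MIDs are borrowed from the output above, while the 'vip' tag value is invented to show an intersection of two filters.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, int, str]  # (market_id, mid, tag_value)


def intersect_value(rows: Sequence[Row], filters: Sequence[str]) -> Dict[int, List[int]]:
    """Per market_id, the mids that appear under every value in filters."""
    per_market = defaultdict(lambda: defaultdict(set))
    for market_id, mid, tag_value in rows:
        per_market[market_id][tag_value].add(mid)
    result = {}
    for market_id, by_value in per_market.items():
        sets = [by_value.get(value, set()) for value in filters]
        result[market_id] = sorted(set.intersection(*sets)) if sets else []
    return result


def intersect_count(rows: Sequence[Row], filters: Sequence[str]) -> Dict[int, int]:
    """Per market_id, how many mids intersect_value would return."""
    return {market: len(mids) for market, mids in intersect_value(rows, filters).items()}


rows = [
    (132, 119787, "女"), (132, 119798, "女"), (132, 119823, "女"),
    (132, 119787, "vip"),                      # invented second tag value
    (150, 588915, "女"), (150, 1328378, "女"),
]

print(intersect_value(rows, ["女"]))
print(intersect_count(rows, ["女"]))
print(intersect_count(rows, ["女", "vip"]))
```

With a single filter value the count equals the length of the value list, which matches the two result sets shown above (for example 3 ids and a count of 3 for market 132).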

 

 

