For detailed usage, see the official documentation:
http://kylin.apache.org/cn/docs/
The Kylin build process
The cube's SQL is as follows:

SELECT `APP_POINT`.`MID` as `APP_POINT_MID`
  ,`APP_POINT`.`MEMBER_LEVEL` as `APP_POINT_MEMBER_LEVEL`
  ,`APP_POINT`.`MARKET_ID` as `APP_POINT_MARKET_ID`
  ,`APP_POINT`.`SUMMARY_TIME` as `APP_POINT_SUMMARY_TIME`
  ,`APP_USER_BASE_INFO`.`GENDER` as `APP_USER_BASE_INFO_GENDER`
  ,`APP_USER_BASE_INFO`.`BIRTH_TIME` as `APP_USER_BASE_INFO_BIRTH_TIME`
  ,`APP_USER_BASE_INFO`.`IS_HAVE_CHILD` as `APP_USER_BASE_INFO_IS_HAVE_CHILD`
  ,`APP_USER_BASE_INFO`.`MEMBER_TYPE` as `APP_USER_BASE_INFO_MEMBER_TYPE`
  ,`APP_USER_BASE_INFO`.`IS_NEW_MEMBER` as `APP_USER_BASE_INFO_IS_NEW_MEMBER`
  ,`APP_USER_BASE_INFO`.`IS_HAVE_CAR` as `APP_USER_BASE_INFO_IS_HAVE_CAR`
  ,`APP_USER_BASE_INFO`.`VIP_LEVEL` as `APP_USER_BASE_INFO_VIP_LEVEL`
  ,`APP_USER_BASE_INFO`.`MID` as `APP_USER_BASE_INFO_MID`
  ,`APP_POINT`.`POINT` as `APP_POINT_POINT`
FROM `HST_APP`.`APP_POINT` as `APP_POINT`
INNER JOIN `HST_APP`.`APP_USER_BASE_INFO` as `APP_USER_BASE_INFO`
  ON `APP_POINT`.`MID` = `APP_USER_BASE_INFO`.`MID`
WHERE 1=1
The kylin.log output below shows the tables and indexes created during the build:

USE default;
DROP TABLE IF EXISTS kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306 (
  \`APP_POINT_MID\` bigint
  ,\`APP_POINT_MEMBER_LEVEL\` bigint
  ,\`APP_POINT_MARKET_ID\` bigint
  ,\`APP_POINT_SUMMARY_TIME\` string
  ,\`APP_USER_BASE_INFO_GENDER\` string
  ,\`APP_USER_BASE_INFO_BIRTH_TIME\` string
  ,\`APP_USER_BASE_INFO_IS_HAVE_CHILD\` string
  ,\`APP_USER_BASE_INFO_MEMBER_TYPE\` string
  ,\`APP_USER_BASE_INFO_IS_NEW_MEMBER\` string
  ,\`APP_USER_BASE_INFO_IS_HAVE_CAR\` string
  ,\`APP_USER_BASE_INFO_VIP_LEVEL\` string
  ,\`APP_POINT_POINT\` bigint
) STORED AS SEQUENCEFILE LOCATION 'hdfs://nameservice1/kylin/kylin_metadata/kylin-8a15ba0f-54d5-699e-9812-bab76649c511/kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306';
ALTER TABLE kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306 SET TBLPROPERTIES('auto.purge'='true');
INSERT OVERWRITE TABLE \`kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306\`
SELECT \`APP_POINT\`.\`MID\` as \`APP_POINT_MID\`
  ,\`APP_POINT\`.\`MEMBER_LEVEL\` as \`APP_POINT_MEMBER_LEVEL\`
  ,\`APP_POINT\`.\`MARKET_ID\` as \`APP_POINT_MARKET_ID\`
  ,\`APP_POINT\`.\`SUMMARY_TIME\` as \`APP_POINT_SUMMARY_TIME\`
  ,\`APP_USER_BASE_INFO\`.\`GENDER\` as \`APP_USER_BASE_INFO_GENDER\`
  ,\`APP_USER_BASE_INFO\`.\`BIRTH_TIME\` as \`APP_USER_BASE_INFO_BIRTH_TIME\`
  ,\`APP_USER_BASE_INFO\`.\`IS_HAVE_CHILD\` as \`APP_USER_BASE_INFO_IS_HAVE_CHILD\`
  ,\`APP_USER_BASE_INFO\`.\`MEMBER_TYPE\` as \`APP_USER_BASE_INFO_MEMBER_TYPE\`
  ,\`APP_USER_BASE_INFO\`.\`IS_NEW_MEMBER\` as \`APP_USER_BASE_INFO_IS_NEW_MEMBER\`
  ,\`APP_USER_BASE_INFO\`.\`IS_HAVE_CAR\` as \`APP_USER_BASE_INFO_IS_HAVE_CAR\`
  ,\`APP_USER_BASE_INFO\`.\`VIP_LEVEL\` as \`APP_USER_BASE_INFO_VIP_LEVEL\`
  ,\`APP_POINT\`.\`POINT\` as \`APP_POINT_POINT\`
FROM \`HST_APP\`.\`APP_POINT\` as \`APP_POINT\`
INNER JOIN \`HST_APP\`.\`APP_USER_BASE_INFO\` as \`APP_USER_BASE_INFO\`
  ON \`APP_POINT\`.\`MID\` = \`APP_USER_BASE_INFO\`.\`MID\`
WHERE 1=1 AND (\`APP_POINT\`.\`SUMMARY_TIME\` >= '2020-12-01' AND \`APP_POINT\`.\`SUMMARY_TIME\` < '2021-02-01')
;
" --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf hive.merge.mapfiles=false --hiveconf mapreduce.reduce.java.opts=-Xms6g --hiveconf mapreduce.map.memory.mb=8192 --hiveconf hive.merge.mapredfiles=false --hiveconf mapreduce.reduce.input.buffer.percent=0.5 --hiveconf hive.exec.compress.output=true --hiveconf mapreduce.reduce.memory.mb=16384 --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf mapreduce.map.java.opts=-Xms3g --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true
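Comparing the cube SQL with the logged INSERT statement, the query differs mainly in the date filter appended after WHERE 1=1 for the segment being built. The following Python sketch reproduces that filter; the function name flat_table_where is hypothetical and is not part of Kylin's API.

```python
from datetime import date


def flat_table_where(partition_column: str, start: date, end: date) -> str:
    """Build the WHERE clause for one segment.

    The range is half-open: start is included, end is excluded,
    matching the ">= ... AND < ..." pattern seen in kylin.log.
    """
    return (
        "WHERE 1=1 AND "
        f"({partition_column} >= '{start.isoformat()}' "
        f"AND {partition_column} < '{end.isoformat()}')"
    )


clause = flat_table_where(
    "`APP_POINT`.`SUMMARY_TIME`", date(2020, 12, 1), date(2021, 2, 1)
)
print(clause)
```

With the dates from the log above, this prints the same condition that appears at the end of the logged INSERT statement.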
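1. Common problems
Compatibility issues
Versions in use: apache-kylin-3.0.2 and apache-hive-3.1.2.
When Kylin reads the Hive metadata, it fails because CATALOG_NAME is not found.
Solution: either 1) modify the source code, or
2) have Hive access its metadata through the metastore service, which means ensuring the following two points:
① Make sure hive-site.xml contains the following property; otherwise the metadata is fetched directly from MySQL.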
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop101:9083</value>
</property>
② Start the Hive metastore service:
hive --service metastore
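Before starting a build, it can help to confirm that hive-site.xml really carries the property from point ①. The following Python sketch does this with the standard library only; the helper name metastore_uris is hypothetical, and the sample XML simply repeats the property shown above.

```python
import xml.etree.ElementTree as ET
from typing import List


def metastore_uris(hive_site_xml: str) -> List[str]:
    """Return the URIs configured in hive.metastore.uris, or [] if unset."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "hive.metastore.uris":
            value = prop.findtext("value", default="").strip()
            # Several URIs may be given as a comma-separated list.
            return [uri.strip() for uri in value.split(",") if uri.strip()]
    return []


sample = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop101:9083</value>
  </property>
</configuration>"""

uris = metastore_uris(sample)
print(uris)
```

An empty list means the property is missing, which is the situation described in point ①.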
Skip snapshot for this lookup table.
(Some lookup tables are too big for a snapshot, e.g. > 300 MB, and must therefore be marked as limited. A limited lookup table cannot be queried directly and does not support derived dimensions.)
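If a dimension table exceeds 300 MB and Skip snapshot is not checked, the cube build fails with an error:
tail -f kylin.log  # follow the kylin.log output
The error above means that two segments overlap: the time ranges of the builds intersect.
Solution:
Refresh the existing segment,
or change the cube status from READY to DISABLED, Purge it, and rebuild.
The Storage section of the page lists every segment that has been built.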
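The overlap condition can be checked by hand against the segment ranges listed under Storage. Below is a small Python sketch, assuming each segment is a half-open date range [start, end) as in the build log; the names Segment, overlaps, and conflicting_segments are illustrative only.

```python
from datetime import date
from typing import List, Tuple

Segment = Tuple[date, date]  # half-open range: start included, end excluded


def overlaps(a: Segment, b: Segment) -> bool:
    """Two half-open ranges overlap when each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def conflicting_segments(existing: List[Segment], new: Segment) -> List[Segment]:
    """Return the existing segments that the new build range would overlap."""
    return [seg for seg in existing if overlaps(seg, new)]


existing = [(date(2020, 12, 1), date(2021, 2, 1))]

# Starts inside the existing segment, so it conflicts.
bad = conflicting_segments(existing, (date(2021, 1, 1), date(2021, 3, 1)))
# Starts exactly where the existing segment ends, so it does not conflict.
ok = conflicting_segments(existing, (date(2021, 2, 1), date(2021, 3, 1)))
print(bad, ok)
```

A new build range is safe when the returned list is empty; otherwise refresh the conflicting segment or purge and rebuild as described above.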
2. Using BitMap
http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html
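The count distinct (bitmap) measure is important in many scenarios, such as counting clicks. Kylin has supported precise count distinct since version 1.5.3.
Apache Kylin implements precise count distinct on top of bitmaps and uses a global dictionary to encode string values as integers. The current global dictionary is built by a single thread, so for high-cardinality columns it can take a lot of time and memory.
Kylin v3.0.0 introduced the first version of the Hive global dictionary (KYLIN-3841). This feature uses Hive's distributed SQL engine to build the global dictionary.
To improve performance further, Kylin v3.1.0 introduced Hive global dictionary v2 (KYLIN-4342), which uses MapReduce instead of HQL for some steps of the global dictionary build.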
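To see why precise count distinct needs integer ids, consider the sketch below. It uses a plain Python int as a bitmap, which is only a stand-in and not Kylin's actual bitmap implementation; the ids are made-up values that a global dictionary could have assigned.

```python
from typing import Iterable


def to_bitmap(ids: Iterable[int]) -> int:
    """Set bit i for every dictionary-encoded id i."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def cardinality(bitmap: int) -> int:
    """The exact distinct count is the number of set bits."""
    return bin(bitmap).count("1")


# Hypothetical ids seen in two separately built segments.
segment1 = to_bitmap([1, 2, 3, 3])
segment2 = to_bitmap([3, 4])

# Merging segments is a bitwise OR; id 3 is counted only once.
merged = segment1 | segment2
print(cardinality(segment1), cardinality(segment2), cardinality(merged))
```

The merge is only correct if the same value maps to the same id in every segment, which is the reason the dictionary has to be global.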
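Benefits
1. The global dictionary is built in a distributed way, which saves time.
2. The Job Server in the Kylin cluster does less work and is therefore more stable.
3. OneID: the Hive Global Dictionary remains readable outside Kylin, so anyone can reuse the dictionary in other scenarios across the company.
The cube's SQL is as follows: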

SELECT `APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MARKET_ID` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID`
  ,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_KEY` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY`
  ,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_TYPE` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE`
  ,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_VALUE` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE`
  ,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_WEIGHT` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT`
  ,`APP_POINT_USE_WAY_TEST`.`POINT_COUNT` as `APP_POINT_USE_WAY_TEST_POINT_COUNT`
  ,`APP_POINT_USE_WAY_TEST`.`CHANNEL` as `APP_POINT_USE_WAY_TEST_CHANNEL`
  ,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MID` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID`
FROM `HST_APP`.`APP_PROFILE_ACTION_TAGS_PART_CATEGORY` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY`
LEFT JOIN `HST_APP`.`APP_POINT_USE_WAY_TEST` as `APP_POINT_USE_WAY_TEST`
  ON `APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MID` = `APP_POINT_USE_WAY_TEST`.`MID`
WHERE 1=1
The job shown on the Monitor page runs the following steps:

USE default;
DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 (
  \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID\` bigint
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY\` string
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE\` string
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE\` string
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT\` decimal(32,10)
  ,\`APP_POINT_USE_WAY_TEST_POINT_COUNT\` bigint
  ,\`APP_POINT_USE_WAY_TEST_CHANNEL\` bigint
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID\` string
) STORED AS SEQUENCEFILE LOCATION 'hdfs://nameservice1/kylin/kylin_metadata/kylin-0c6c73f4-9871-1f04-ccc2-2d276126b7cd/kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0';
ALTER TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 SET TBLPROPERTIES('auto.purge'='true');
INSERT OVERWRITE TABLE \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\`
SELECT \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MARKET_ID\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID\`
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_KEY\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY\`
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_TYPE\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE\`
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_VALUE\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE\`
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_WEIGHT\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT\`
  ,\`APP_POINT_USE_WAY_TEST\`.\`POINT_COUNT\` as \`APP_POINT_USE_WAY_TEST_POINT_COUNT\`
  ,\`APP_POINT_USE_WAY_TEST\`.\`CHANNEL\` as \`APP_POINT_USE_WAY_TEST_CHANNEL\`
  ,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MID\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID\`
FROM \`HST_APP\`.\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`
LEFT JOIN \`HST_APP\`.\`APP_POINT_USE_WAY_TEST\` as \`APP_POINT_USE_WAY_TEST\`
  ON \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MID\` = \`APP_POINT_USE_WAY_TEST\`.\`MID\`
WHERE 1=1;
" --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf hive.merge.mapfiles=false --hiveconf mapreduce.reduce.java.opts=-Xms6g --hiveconf mapreduce.map.memory.mb=8192 --hiveconf hive.merge.mapredfiles=false --hiveconf mapreduce.reduce.input.buffer.percent=0.5 --hiveconf hive.exec.compress.output=true --hiveconf mapreduce.reduce.memory.mb=16384 --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf mapreduce.map.java.opts=-Xms3g --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true

USE default;
set hive.exec.compress.output=false;
set hive.mapred.mode=unstrict;
CREATE TABLE IF NOT EXISTS default.MidBitMap_global_dict ( dict_key STRING COMMENT '', dict_val INT COMMENT '' ) COMMENT 'Hive Global Dictionary' PARTITIONED BY (dict_column string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value;
CREATE TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value ( dict_key STRING COMMENT '' ) COMMENT '' PARTITIONED BY (dict_column string) STORED AS TEXTFILE ;
DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict;
CREATE TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict ( dict_key STRING COMMENT '' , dict_val STRING COMMENT '' ) COMMENT '' PARTITIONED BY (dict_column string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE ;
INSERT OVERWRITE TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value PARTITION (dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID')
SELECT a.DICT_KEY
FROM (SELECT APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID as DICT_KEY FROM kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 GROUP BY APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID) a
LEFT JOIN (SELECT DICT_KEY FROM default.MidBitMap_global_dict WHERE DICT_COLUMN = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' ) b
  ON a.DICT_KEY = b.DICT_KEY
WHERE b.DICT_KEY IS NULL ;
INSERT OVERWRITE TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value PARTITION (DICT_COLUMN = 'KYLIN_MAX_DISTINCT_COUNT')
SELECT CONCAT_WS(',', tc.dict_column, cast(tc.total_distinct_val AS String), if(tm.max_dict_val is null, '0', cast(max_dict_val as string)))
FROM ( SELECT dict_column, count(1) total_distinct_val FROM default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value WHERE DICT_COLUMN != 'KYLIN_MAX_DISTINCT_COUNT' GROUP BY dict_column) tc
LEFT JOIN ( SELECT dict_column, if(max(dict_val) is null, 0, max(dict_val)) as max_dict_val FROM default.MidBitMap_global_dict GROUP BY dict_column) tm
  ON tc.dict_column = tm.dict_column;
" --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf hive.merge.mapfiles=false --hiveconf mapreduce.reduce.java.opts=-Xms6g --hiveconf mapreduce.map.memory.mb=8192 --hiveconf hive.merge.mapredfiles=false --hiveconf mapreduce.reduce.input.buffer.percent=0.5 --hiveconf hive.exec.compress.output=true --hiveconf mapreduce.reduce.memory.mb=16384 --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf mapreduce.map.java.opts=-Xms3g --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true

USE default;
set mapreduce.job.reduces=25;
set hive.merge.mapredfiles=false;
INSERT OVERWRITE TABLE \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\`
SELECT * FROM \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\`
DISTRIBUTE BY APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID,APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY,APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE;
" --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf hive.merge.mapfiles=false --hiveconf mapreduce.reduce.java.opts=-Xms6g --hiveconf mapreduce.map.memory.mb=8192 --hiveconf hive.merge.mapredfiles=false --hiveconf mapreduce.reduce.input.buffer.percent=0.5 --hiveconf hive.exec.compress.output=true --hiveconf mapreduce.reduce.memory.mb=16384 --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf mapreduce.map.java.opts=-Xms3g --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true

USE default;
set hive.mapred.mode=unstrict;
ALTER TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict ADD IF NOT EXISTS PARTITION (dict_column='APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID');
INSERT OVERWRITE TABLE default.MidBitMap_global_dict PARTITION (dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID')
SELECT dict_key, dict_val FROM default.MidBitMap_global_dict WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID'
UNION ALL
SELECT dict_key, dict_val FROM kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' ;
" --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf hive.merge.mapfiles=false --hiveconf mapreduce.reduce.java.opts=-Xms6g --hiveconf mapreduce.map.memory.mb=8192 --hiveconf hive.merge.mapredfiles=false --hiveconf mapreduce.reduce.input.buffer.percent=0.5 --hiveconf hive.exec.compress.output=true --hiveconf mapreduce.reduce.memory.mb=16384 --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf mapreduce.map.java.opts=-Xms3g --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true

USE default;
set hive.exec.compress.output=false;
set hive.mapred.mode=unstrict;
INSERT OVERWRITE TABLE default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0
SELECT a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID
  ,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY
  ,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE
  ,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE
  ,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT
  ,a.APP_POINT_USE_WAY_TEST_POINT_COUNT
  ,a.APP_POINT_USE_WAY_TEST_CHANNEL
  ,b.dict_val
FROM default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 a
LEFT OUTER JOIN (SELECT dict_key, dict_val FROM default.MidBitMap_global_dict WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID') b
  ON a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID = b.dict_key;
" --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf hive.merge.mapfiles=false --hiveconf mapreduce.reduce.java.opts=-Xms6g --hiveconf mapreduce.map.memory.mb=8192 --hiveconf hive.merge.mapredfiles=false --hiveconf mapreduce.reduce.input.buffer.percent=0.5 --hiveconf hive.exec.compress.output=true --hiveconf mapreduce.reduce.memory.mb=16384 --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf mapreduce.map.java.opts=-Xms3g --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true
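Read together, the logged statements do three things: collect the distinct MID values that are not yet in the global dictionary, append them to the dictionary, and replace the MID column in the intermediate table with its dict_val. The Python sketch below mirrors that flow in memory. It is an illustration only: the function names are hypothetical, and numbering new keys in sorted order from max(dict_val) + 1 is an assumption, since the log does not show how Kylin assigns the new ids.

```python
from typing import Dict, Iterable, List


def extend_global_dict(
    global_dict: Dict[str, int], column_values: Iterable[str]
) -> Dict[str, int]:
    """Return a copy of the dictionary with ids assigned to unseen values."""
    # Distinct values missing from the dictionary
    # (the "LEFT JOIN ... WHERE b.DICT_KEY IS NULL" step).
    new_keys = sorted(set(column_values) - set(global_dict))
    # Continue numbering after the current maximum (0 if the dictionary is empty).
    next_id = max(global_dict.values(), default=0) + 1
    merged = dict(global_dict)  # old entries are kept unchanged (the UNION ALL step)
    for offset, key in enumerate(new_keys):
        merged[key] = next_id + offset
    return merged


def encode(column_values: Iterable[str], global_dict: Dict[str, int]) -> List[int]:
    """Replace each value by its id (the final join on dict_key)."""
    return [global_dict[v] for v in column_values]


# Made-up sample data, not taken from the tables above.
existing = {"m100": 1, "m200": 2}
mids = ["m200", "m300", "m100", "m300", "m400"]

updated = extend_global_dict(existing, mids)
encoded = encode(mids, updated)
print(updated)
print(encoded)
```

Existing ids never change, so bitmaps built from earlier segments stay comparable with new ones.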
Commonly used Kylin functions:
intersect_value and intersect_count (in the queries below, '女' is the tag value for female)
select market_id, intersect_value(mid,tag_value,array['女']) as arr_mid_bitmap
from APP_PROFILE_ACTION_TAGS_PART_CATEGORY
where tag_type = 'gender' and tag_value = '女'
group by market_id
-- Result:
-- 132 [119787,119798,119823]
-- 164 [136555,140549,141655,145728,145729,145796,14580,1639794]
-- 213 [632938,635520,644039]
-- 150 [588915,1328378]

select market_id, intersect_count(mid,tag_value,array['女']) as arr_mid_bitmap
from APP_PROFILE_ACTION_TAGS_PART_CATEGORY
where tag_type = 'gender' and tag_value = '女'
group by market_id
-- Result:
-- 132 3
-- 164 8
-- 213 3
-- 150 2
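The relationship between the two functions can be modelled with plain sets: intersect_value returns the ids that appear under every listed filter value, and intersect_count returns how many there are. The Python sketch below models a single group (one market_id); it only approximates the semantics and is not Kylin's implementation. The sample rows are made up, and returning the ids in sorted order is an assumption.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str]  # (mid, tag_value)


def intersect_value(rows: List[Row], filter_values: List[str]) -> List[int]:
    """Ids that occur with every one of the given filter values."""
    by_value: Dict[str, Set[int]] = {v: set() for v in filter_values}
    for mid, tag_value in rows:
        if tag_value in by_value:
            by_value[tag_value].add(mid)
    if not by_value:
        return []
    return sorted(set.intersection(*by_value.values()))


def intersect_count(rows: List[Row], filter_values: List[str]) -> int:
    """Number of ids that intersect_value would return."""
    return len(intersect_value(rows, filter_values))


rows = [(1, "a"), (2, "a"), (2, "b"), (3, "b"), (1, "a")]

only_a = intersect_value(rows, ["a"])        # ids tagged "a"
both = intersect_count(rows, ["a", "b"])     # ids tagged with both "a" and "b"
print(only_a, both)
```

With a single filter value, as in the queries above, the count is simply the size of the list returned by intersect_value, which matches the two result sets (for example 3 ids and a count of 3 for market 132).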