1.准備表結構和數據
create table test_middle_data.spe_count_test (
  name string,
  sex string,
  is_valid string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

insert into test_middle_data.spe_count_test values('jack01','man','N');
insert into test_middle_data.spe_count_test values('jack02','woman','Y');
insert into test_middle_data.spe_count_test values('jack03','man','Y');
insert into test_middle_data.spe_count_test values('jack04','woman','Y');
insert into test_middle_data.spe_count_test values('jack05','man','Y');
insert into test_middle_data.spe_count_test values('jack06','woman','N');
insert into test_middle_data.spe_count_test values('jack07','man','Y');
insert into test_middle_data.spe_count_test values('jack08','man','Y');
insert into test_middle_data.spe_count_test values('jack09','man','N');
insert into test_middle_data.spe_count_test values('jack10','woman','Y');
insert into test_middle_data.spe_count_test values('jack11','man','Y');
insert into test_middle_data.spe_count_test values('jack12','man','Y');
insert into test_middle_data.spe_count_test values('jack13','woman','N');
2. Requirement: group by sex, and for each group count both the valid rows (is_valid = 'Y') and the total number of rows
I have seen people write it like this:
SELECT a.sex, a.is_valid_y, b.total_num
FROM (
  SELECT sex, count(1) is_valid_y
  FROM test_middle_data.spe_count_test
  WHERE is_valid = 'Y'
  GROUP BY sex
) a
INNER JOIN (
  SELECT sex, count(1) total_num
  FROM test_middle_data.spe_count_test
  GROUP BY sex
) b
ON a.sex = b.sex
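Result: for the sample data this returns one row per sex, man with is_valid_y = 6 and total_num = 8, and woman with is_valid_y = 3 and total_num = 5.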
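To cross-check these numbers independently of SQL, the requirement from step 2 can be written as a plain Python loop over the sample rows from step 1. This is only a minimal reference sketch; the function name count_by_sex is hypothetical and not part of the original.

```python
from collections import Counter

# Sample rows from step 1: (name, sex, is_valid)
ROWS = [
    ("jack01", "man", "N"), ("jack02", "woman", "Y"), ("jack03", "man", "Y"),
    ("jack04", "woman", "Y"), ("jack05", "man", "Y"), ("jack06", "woman", "N"),
    ("jack07", "man", "Y"), ("jack08", "man", "Y"), ("jack09", "man", "N"),
    ("jack10", "woman", "Y"), ("jack11", "man", "Y"), ("jack12", "man", "Y"),
    ("jack13", "woman", "N"),
]


def count_by_sex(rows):
    """Hypothetical helper: return {sex: (valid_count, total_count)}."""
    valid = Counter()
    total = Counter()
    for _name, sex, is_valid in rows:
        total[sex] += 1          # every row counts toward the total
        if is_valid == "Y":
            valid[sex] += 1      # only 'Y' rows count as valid
    # Counter yields 0 for missing keys, so a group with no 'Y' rows still appears
    return {sex: (valid[sex], total[sex]) for sex in total}


print(count_by_sex(ROWS))  # {'man': (6, 8), 'woman': (3, 5)}
```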
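Joining two subqueries like this costs performance: the same table is scanned twice (once per subquery), and the two aggregated results then have to be joined. We can optimize it.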
3. Optimize by exploiting how count works
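count(1) and count(*) both count every row, including rows whose columns are NULL, whereas count(column_name) skips rows in which that column is NULL. We can use this to meet the requirement: a CASE expression that returns a value for valid rows and NULL otherwise makes count() tally only the valid rows.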
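The NULL-skipping behaviour can be checked with a few lines of Python against an in-memory SQLite database. SQLite is used here only as a convenient stand-in for Hive (an assumption), and the one-column table t is hypothetical; count(*), count(1) and count(v) follow the standard SQL rules, which Hive also follows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")  # hypothetical one-column table
# Three rows, one of which has NULL in column v
conn.executemany("INSERT INTO t VALUES (?)", [("Y",), (None,), ("N",)])

# count(*) and count(1) count every row; count(v) skips the NULL
star, one, col = conn.execute("SELECT count(*), count(1), count(v) FROM t").fetchone()
print(star, one, col)  # 3 3 2

# A CASE with no matching branch falls through to NULL, so count() ignores it
only_y = conn.execute(
    "SELECT count(CASE WHEN v = 'Y' THEN v ELSE NULL END) FROM t"
).fetchone()[0]
print(only_y)  # 1

conn.close()
```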
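SELECT sex,
       -- CASE yields is_valid for valid rows and NULL otherwise; count() skips the NULLs
       count(CASE WHEN is_valid = 'Y' THEN is_valid ELSE NULL END) is_valid_y,
       -- count(1) counts every row in the group
       count(1) total_num
FROM test_middle_data.spe_count_test
GROUP BY sex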
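The result is the same as that of the SQL in step 2, but the table is scanned only once and no join is needed. One difference to keep in mind: if some sex value had no valid rows at all, the inner join in step 2 would drop that group entirely, while this query still returns it with is_valid_y = 0.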
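A minimal sketch that runs both queries side by side, again using Python's built-in sqlite3 as a stand-in for Hive (an assumption: SQLite has no test_middle_data database, so the table is created unqualified). It first checks the sample data from step 1, then adds a hypothetical row jack14 whose sex value 'unknown' has no valid rows, to show how the two queries diverge in that case.

```python
import sqlite3

# Sample rows from step 1: (name, sex, is_valid)
ROWS = [
    ("jack01", "man", "N"), ("jack02", "woman", "Y"), ("jack03", "man", "Y"),
    ("jack04", "woman", "Y"), ("jack05", "man", "Y"), ("jack06", "woman", "N"),
    ("jack07", "man", "Y"), ("jack08", "man", "Y"), ("jack09", "man", "N"),
    ("jack10", "woman", "Y"), ("jack11", "man", "Y"), ("jack12", "man", "Y"),
    ("jack13", "woman", "N"),
]

# Step 2: two aggregations over the same table, then an inner join
JOIN_QUERY = """
SELECT a.sex, a.is_valid_y, b.total_num
FROM (SELECT sex, count(1) AS is_valid_y
      FROM spe_count_test WHERE is_valid = 'Y' GROUP BY sex) a
INNER JOIN (SELECT sex, count(1) AS total_num
            FROM spe_count_test GROUP BY sex) b
  ON a.sex = b.sex
ORDER BY a.sex
"""

# Step 3: a single scan with a conditional count
COND_QUERY = """
SELECT sex,
       count(CASE WHEN is_valid = 'Y' THEN is_valid ELSE NULL END) AS is_valid_y,
       count(1) AS total_num
FROM spe_count_test
GROUP BY sex
ORDER BY sex
"""


def run(query, rows):
    """Load rows into a fresh in-memory table and return the query's result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spe_count_test (name TEXT, sex TEXT, is_valid TEXT)")
    conn.executemany("INSERT INTO spe_count_test VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(JOIN_QUERY, ROWS))  # [('man', 6, 8), ('woman', 3, 5)]
print(run(COND_QUERY, ROWS))  # [('man', 6, 8), ('woman', 3, 5)]

# Hypothetical extra row: a group with no valid rows at all
extra = ROWS + [("jack14", "unknown", "N")]
print(run(JOIN_QUERY, extra))  # [('man', 6, 8), ('woman', 3, 5)]
print(run(COND_QUERY, extra))  # [('man', 6, 8), ('unknown', 0, 1), ('woman', 3, 5)]
```

The CASE expression effectively moves the WHERE filter of the first subquery into a per-row condition, which is why both counts can come from one pass over the table.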