大數據-企業異常發票分析

本文轉載自查看原文 2021-12-06 23:06 93 每日日報

一、 數據說明：

1、數據組成

（1）增值稅發票數據，文件名zzsfp

（2）發票對應貨物明細數據，文件名zzsfp_hwmx

（3）企業信息，文件名nsrxx

2、數據字段說明

（1）zzsfp表字典

字段名稱	字段含義	數據類型	備注
fp_nid	發票id	String	發票唯一標識
xf_id	銷方識別號	String	企業唯一身份標識
gf_id	購方識別號	String	企業唯一身份標識
je	金額	Double
se	稅額	Double
jshj	價稅合計	Double
kpyf	開票月份	String
kprq	開票日期	String
zfbz	作廢標志	String	‘Y’代表作廢

2381619

2483868

zzsfp表內容（$ less zzsfp）

（2）zzsfp_hwmx表

字段名稱	字段含義	數據類型	備注
fp_nid	發票id	String	發票唯一標識
date_key	開票月份	String
hwmc	貨物名稱	String
ggxh	規格型號	String
dw	單位	String
sl	數量	Double
dj	單價	Double
je	金額	Double
se	稅額	Double
spbm	商品編碼	String

zzsfp_hwmx表內容（$ less zzsfp_hwmx）

（3）nsrxx表

字段名稱	字段含義	數據類型	備注
hydm	行業代碼	String
nsr_id	納稅人id	String	企業唯一身份標識
djzclx_dm	登記注冊類型代碼	String	網上可查閱相關代碼含義
kydjrq	開業登記日期	String
xgrq	修改日期	String	給企業打標簽的時間
label	標簽	String	‘0’代表正常企業 ‘1’代表問題企業

nsrxx表內容（$ less nsrxx）

3、關聯數據的必要說明

（1）zzsfp表可通過fp_nid進行關聯

（2）zzsfp表可通過xf_id或者gf_id與nsrxx中的nsr_id進行關聯，分離出銷項發票表和進項發票表

二、 測試要求：

1、數據導入：

要求將三個樣表文件中的數據導入HIVE數據倉庫中。

2、數據分析：

企業異常的判斷標准參考：

（1）、企業增值稅發票進項與出項嚴重不符即出現只出不進或者只進不出的企業；

（2）企業發票數據與詳細流水信息不符；

（3）個人上網查閱企業異常信息數據標准；

3、處理結果入庫：

將上述異常標准的結果分別匯總統計，並將結果數據導出到mySQL數據庫中。

最終結果參考提示：

最終給出的數據情況

企業總數：33,829

非正常企業總數：318

4、數據可視化展示：

利用Echarts將上述統計結果以圖形化展示的方式展現出來：餅圖、柱狀圖、地圖、折線圖等。

三、 測試報告：

1、按照測試題目順序，將實驗步驟說明和結果截圖存儲到答題紙上。

創建表

create table tab051(

fp_nid string,

xf_id string,

gf_id string,

je double,

se double,

jshj double,

kpyf string,

kprq string,

zfbz string

)

row format delimited fields terminated by ','

lines terminated by '\n';

create table tab052(

fp_nid string,

date_key string,

hwmc string,

ggxh string,

dw string,

sl double,

dj double,

je double,

se double,

spbm string

)

row format delimited fields terminated by ','

lines terminated by '\n';

create table tab053(

hydm string,

nsr_id string,

djzclx_dm string,

kydjrq string,

xgrq string,

label string

)

row format delimited fields terminated by ','

lines terminated by '\n';

導入數據

再創三個表用於存放清洗括號后的數據

create table tab51(

fp_nid string,

xf_id string,

gf_id string,

je double,

se double,

jshj double,

kpyf string,

kprq string,

zfbz string

)

row format delimited fields terminated by ','

lines terminated by '\n';

create table tab52(

fp_nid string,

date_key string,

hwmc string,

ggxh string,

dw string,

sl double,

dj double,

je double,

se double,

spbm string

)

row format delimited fields terminated by ','

lines terminated by '\n';

create table tab53(

hydm string,

nsr_id string,

djzclx_dm string,

kydjrq string,

xgrq string,

label string

)

row format delimited fields terminated by ','

lines terminated by '\n';

存放數據

insert overwrite table tab51 select

translate(fp_nid,'(',''),

xf_id,

gf_id,

je,

se,

jshj,

kpyf,

kprq,

translate(zfbz, ')', '')

from tab051;

insert overwrite table tab52 select

translate(fp_nid,'(',''),

date_key,

hwmc,

ggxh,

dw,

sl,

dj,

je,

se,

translate(spbm, ')', '')

from tab052;

insert overwrite table tab53 select

translate(hydm,'(',''),

nsr_id,

djzclx_dm,

kydjrq,

xgrq,

translate(label, ')', '')

from tab053;

1.將沒有進行過交易的企業列入問題企業

nsert into table noxf select hydm from tab53 where not exists (select xf_id from tab51 where tab53.hydm=tab51.xf_id);

insert into table noxf select hydm from tab53 where not exists (select gf_id from tab51 where tab53.hydm=tab51.gf_id);

select distinct id from noxf;

insert into noxfandgf select distinct id from noxf;

有一萬多家。

2．查出企業開出的支票作廢次數，放到zf表。

select xf_id, gf_id ,count(zfbz) from tab51 where zfbz = 'Y' group by xf_id,gf_id;

Insert into table zf(xf_id,gf_id,zfcs) (select xf_id, gf_id ,count(zfbz) from tab51 where zfbz = 'Y' group by xf_id,gf_id);

SELECT COUNT(*) FROM zf WHERE zfcs>17;查詢作廢次數大於中位數17的

有684家。

放入wt表。

create table wt2(id string);

insert overwrite table wt(SELECT xf_id FROM zf WHERE zfcs>17);

insert into table wt(SELECT gf_id FROM zf WHERE zfcs>17);

3.查出企業支票金額相差幅度太大的企業

select xf_id from tab51 join tab52 on tab51.fp_nid=tab52.fp_nid where tab51.je-tab52.je>10000000 or tab52.je-tab51.je>10000000;

然后按次數查出開假支票次數多的企業。

select id,count(id) from jewt group by id;

select id from jewt3 cs where cs>1;

合並三個標准。

insert into wt12 select wt.id from wt join wt2 where wt.id=wt2.id;

insert into wt13 select wt.id from wt join wt3 where wt.id=wt3.id;

insert into wt23 select wt2.id from wt2 join wt3 where wt2.id=wt3.id;

最后查出385家

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 企業構建大數據分析體系的4個層級大數據案例分析國內大數據企業排名【轉】使用Apache Kylin搭建企業級開源大數據分析平台使用Kylin構建企業大數據分析平台的4種部署方式價值百萬的企業大數據分析報告是如何煉成的？大數據分析與挖掘關於“華為”的大數據分析大數據分析案例 python 大數據分析