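TPC-DS Data Stress Testing


Preface

TPC-DS is a decision support system benchmark modeled mainly on the retail industry. It provides 99 SQL queries (written against SQL-99 or SQL:2003), works on large data volumes, uses test data that closely resembles real business data, and covers several workload types (reporting, data mining, and so on).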

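Usage

1. Download and install the tools

Note: you must enter an email address. The download link is sent to that address, and clicking it starts the download. Either of the two addresses above can be used.

2. Compile

The download is source code, so it must be compiled before use. Go to the directory that contains the TPC-DS toolkit and enter the /TPC-DS/tools folder.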

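  • Ubuntu (install the build dependencies):
    sudo apt-get install gcc make flex bison byacc git
  • CentOS/RHEL (install the build dependencies):
    sudo yum install gcc make flex bison byacc git
  • Linux (build):
    make OS=LINUX
  • macOS (build):
    make OS=MACOS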
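
The build step can be scripted so that one command works on both platforms. The following is only a sketch: the helper name tpcds_make_os is hypothetical, and it assumes the toolkit's makefile accepts OS=LINUX in addition to the OS=MACOS value shown above.

```shell
# Map a `uname -s` string to the OS value passed to the TPC-DS makefile.
# tpcds_make_os is a hypothetical helper name; OS=LINUX is an assumption here.
tpcds_make_os() {
  case "$1" in
    Linux)  echo "LINUX" ;;
    Darwin) echo "MACOS" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Print the build command for the current machine (run it from the tools directory).
if os_value=$(tpcds_make_os "$(uname -s)"); then
  echo "make OS=$os_value"
fi
```

Printing the command instead of running it makes it easy to check the chosen value before starting a build.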

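3. Generate data

In the tools directory, run ./dsdgen to generate data. The main command-line parameters are:

DIR: directory where the generated data is written.
SCALE: data volume, in GB.
TABLE: which table to generate data for; there are 24 tables in total.
PARALLEL: number of chunks the generated data is split into; usually only needed for TB-scale data.
CHILD: which chunk the current run generates; used together with PARALLEL.
FORCE: force the write, overwriting existing data files.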

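Example 1: generate 1 GB of data in the /TPC-DS/data folder
./dsdgen -scale 1 -dir ../data/
Example 2: generate 1 TB of data in the /TPC-DS/data folder
./dsdgen -scale 1000 -dir ../data/
Example 3: generate 30 TB of data in the /TPC-DS/data folder
./dsdgen -scale 30000 -dir ../data/
Example 4: generate 1 GB of data for a single named table in the /TPC-DS/data folder
./dsdgen -SCALE 1 -DISTRIBUTIONS tpcds.idx -TERMINATE N -TABLE web_sales -dir ../data/
Example 5: generate 1 GB of data in chunks in the /TPC-DS/data folder, which is more efficient when the chunks are generated concurrently. This command produces only chunk 1 of 4; run it again with -child 2, 3, and 4 to produce the rest.
./dsdgen -scale 1 -dir ../data/ -parallel 4 -child 1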
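
Because -child selects a single chunk, a full chunked run needs one dsdgen invocation per chunk. A minimal sketch of that loop follows. The function name dsdgen_chunk_commands is hypothetical, and it only prints the commands rather than running them, so the list can be inspected first.

```shell
# Print one dsdgen command per chunk, using the flags from the chunked example above.
# dsdgen_chunk_commands is a hypothetical helper name.
dsdgen_chunk_commands() {
  scale=$1; dir=$2; parallel=$3
  child=1
  while [ "$child" -le "$parallel" ]; do
    echo "./dsdgen -scale $scale -dir $dir -parallel $parallel -child $child"
    child=$((child + 1))
  done
}

# Four commands for a 1 GB data set split into 4 chunks.
dsdgen_chunk_commands 1 ../data/ 4
```

Piping the output to sh from the tools directory would run the chunks one after another. Running them concurrently, for example by starting each in the background, is where the speed-up would come from.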

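4. Table creation statements

The DDL files tpcds.sql and tpcds_ri.sql are in the /tpcds-kit/tools directory.
Many data platforms cannot use them as-is, so they need to be modified. The changes are mainly to data types, to match what the target environment supports, plus some basic syntax fixes. The table definition must also specify the field delimiter used in the generated data.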

Table name
CALL_CENTER
CATALOG_PAGE
CATALOG_RETURNS
CATALOG_SALES
CUSTOMER
CUSTOMER_ADDRESS
CUSTOMER_DEMOGRAPHICS
DATE_DIM
DBGEN_VERSION
HOUSEHOLD_DEMOGRAPHICS
INCOME_BAND
INVENTORY
ITEM
PROMOTION
REASON
SHIP_MODE
STORE
STORE_RETURNS
STORE_SALES
TIME_DIM
WAREHOUSE
WEB_PAGE
WEB_RETURNS
WEB_SALES
WEB_SITE

5. Import the generated data

copy call_center from '/part2/tpcds/v2.6.0/datas/handled/call_center.dat' with delimiter as '|' NULL '';
copy catalog_page from '/part2/tpcds/v2.6.0/datas/handled/catalog_page.dat' with delimiter as '|' NULL '';
copy catalog_returns from '/part2/tpcds/v2.6.0/datas/handled/catalog_returns.dat' with delimiter as '|' NULL '';
copy catalog_sales from '/part2/tpcds/v2.6.0/datas/handled/catalog_sales.dat' with delimiter as '|' NULL '';
copy customer from '/part2/tpcds/v2.6.0/datas/handled/customer.dat' with delimiter as '|' NULL '';
copy customer_address from '/part2/tpcds/v2.6.0/datas/handled/customer_address.dat' with delimiter as '|' NULL '';
copy customer_demographics from '/part2/tpcds/v2.6.0/datas/handled/customer_demographics.dat' with delimiter as '|' NULL '';
copy date_dim from '/part2/tpcds/v2.6.0/datas/handled/date_dim.dat' with delimiter as '|' NULL '';
copy dbgen_version from '/part2/tpcds/v2.6.0/datas/handled/dbgen_version.dat' with delimiter as '|' NULL '';
copy household_demographics from '/part2/tpcds/v2.6.0/datas/handled/household_demographics.dat' with delimiter as '|' NULL '';
copy income_band from '/part2/tpcds/v2.6.0/datas/handled/income_band.dat' with delimiter as '|' NULL '';
copy inventory from '/part2/tpcds/v2.6.0/datas/handled/inventory.dat' with delimiter as '|' NULL '';
copy item from '/part2/tpcds/v2.6.0/datas/handled/item.dat' with delimiter as '|' NULL '';
copy promotion from '/part2/tpcds/v2.6.0/datas/handled/promotion.dat' with delimiter as '|' NULL '';
copy reason from '/part2/tpcds/v2.6.0/datas/handled/reason.dat' with delimiter as '|' NULL '';
copy ship_mode from '/part2/tpcds/v2.6.0/datas/handled/ship_mode.dat' with delimiter as '|' NULL '';
copy store from '/part2/tpcds/v2.6.0/datas/handled/store.dat' with delimiter as '|' NULL '';
copy store_returns from '/part2/tpcds/v2.6.0/datas/handled/store_returns.dat' with delimiter as '|' NULL '';
copy store_sales from '/part2/tpcds/v2.6.0/datas/handled/store_sales.dat' with delimiter as '|' NULL '';
copy time_dim from '/part2/tpcds/v2.6.0/datas/handled/time_dim.dat' with delimiter as '|' NULL '';
copy warehouse from '/part2/tpcds/v2.6.0/datas/handled/warehouse.dat' with delimiter as '|' NULL '';
copy web_page from '/part2/tpcds/v2.6.0/datas/handled/web_page.dat' with delimiter as '|' NULL '';
copy web_returns from '/part2/tpcds/v2.6.0/datas/handled/web_returns.dat' with delimiter as '|' NULL '';
copy web_sales from '/part2/tpcds/v2.6.0/datas/handled/web_sales.dat' with delimiter as '|' NULL '';
copy web_site from '/part2/tpcds/v2.6.0/datas/handled/web_site.dat' with delimiter as '|' NULL '';
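
The COPY statements above follow one pattern, so they can be produced by a loop. By default dsdgen also ends each record with the field delimiter (the -TERMINATE N flag turns this off), and some loaders reject that extra trailing '|'. The "handled" directory in the paths above suggests the files were preprocessed, although the source does not say how. The sketch below shows one possible approach; both function names and the /path/to/handled directory are hypothetical.

```shell
# Remove a single trailing '|' from each record. Reads stdin, writes stdout.
# strip_trailing_delimiter is a hypothetical helper name.
strip_trailing_delimiter() {
  sed 's/|$//'
}

# Print one COPY statement per table name, in the same form as the list above.
# emit_copy_statements is a hypothetical helper; the first argument is the data directory.
emit_copy_statements() {
  data_dir=$1; shift
  for table in "$@"; do
    printf "copy %s from '%s/%s.dat' with delimiter as '|' NULL '';\n" \
      "$table" "$data_dir" "$table"
  done
}

# Example: statements for three of the tables.
emit_copy_statements /path/to/handled call_center catalog_page web_site
```

A data file could be cleaned with, for example, strip_trailing_delimiter < item.dat > handled/item.dat before the matching COPY statement is run.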

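6. Generate query SQL

Go to the /tpcds-kit/tools directory. Commonly used parameters:

-input: input file listing the templates included in the test; /query_templates/templates.lst is normally used.
-directory: directory containing the templates; -directory ../query_templates is normally used.
-dialect: SQL dialect of the target database; the available options can be seen in the /query_templates directory and include oracle, db2, SqlServer, and others.

Run the following shell command: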

mkdir -p ./sql
for id in $(seq 1 99); do ./dsqgen -DIRECTORY ../query_templates -TEMPLATE "query$id.tpl" -DIALECT oracle -FILTER Y > "./sql/query$id.sql"; done
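
The shell redirection creates each output file even when dsqgen fails for a template, so a failed query leaves an empty .sql file behind. A small check along the following lines can catch that. It is only a sketch, and report_bad_queries is a hypothetical helper name.

```shell
# List the query names in 1..last whose output file is missing or empty.
# report_bad_queries is a hypothetical helper name.
report_bad_queries() {
  sql_dir=$1; last=$2
  id=1
  while [ "$id" -le "$last" ]; do
    # -s is true only when the file exists and has a size greater than zero
    [ -s "$sql_dir/query$id.sql" ] || echo "query$id"
    id=$((id + 1))
  done
}

# Check the 99 files written by the loop above.
report_bad_queries ./sql 99
```

No output means every file from query1.sql to query99.sql exists and is non-empty.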

Sample SQL 1:

with customer_total_return as (
  select sr_customer_sk as ctr_customer_sk,
         sr_store_sk as ctr_store_sk,
         sum(SR_FEE) as ctr_total_return
  from store_returns,
       date_dim
  where sr_returned_date_sk = d_date_sk
    and d_year = 2000
  group by sr_customer_sk,
           sr_store_sk
)
select c_customer_id
from customer_total_return ctr1,
     store,
     customer
where ctr1.ctr_total_return > (
        select avg(ctr_total_return) * 1.2
        from customer_total_return ctr2
        where ctr1.ctr_store_sk = ctr2.ctr_store_sk
      )
  and s_store_sk = ctr1.ctr_store_sk
  and s_state = 'TN'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;

