StarRocks Installation and Test Report
Download, Installation, and Deployment
StarRocks can be deployed in two ways: manually with commands, or automatically with StarRocksManager. Automated deployment only requires some simple configuration, selection, and input on a web page to complete in batch, and it also provides Supervisor process management, rolling upgrades, backup, and rollback. Since StarRocksManager is not open source, we can only deploy with commands.
For production, the official recommendation is 16 cores and 64 GB of RAM or more for BE, and 8 cores and 16 GB or more for FE (StarRocks keeps all metadata in FE memory).
Download link:
https://www.starrocks.com/zh-CN/download/request-download/4
Server Configuration
We downloaded the latest community edition, StarRocks-1.19.1. Due to resource limits (4 virtual machines), FE is deployed as a single instance and BE as three instances.
| IP | Host | Spec | Role |
|---|---|---|---|
| 192.168.130.178 | fe1 | 4C16G | FE master node |
| 192.168.130.183 | be1 | 4C8G | BE node |
| 192.168.130.36 | be2 | 4C8G | BE node |
| 192.168.130.149 | be3 | 4C8G | BE node |
FE Single-Instance Deployment
# Edit the FE configuration
cd StarRocks-1.19.1/fe
# Configuration file: conf/fe.conf
# Adjust -Xmx4096m according to the FE memory size; 16 GB or more is recommended to avoid GC pressure, since StarRocks keeps all metadata in memory.
# Create the metadata directory
mkdir -p meta
# Start the FE process
bin/start_fe.sh --daemon
# Open the StarRocks web UI in a browser on port 8030; the username is root and the password is empty
StarRocks UI
http://192.168.130.178:8030/ opens the StarRocks web UI; the username is root and the password is empty.
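For reference, the fe.conf entries that usually matter look roughly like the following. This is a sketch only: the heap size, subnet, and paths are illustrative assumptions for this environment, not values taken verbatim from the test cluster, and all other defaults are left as shipped.
# conf/fe.conf (illustrative values)
JAVA_OPTS="-Xmx8192m ..."                  # raise -Xmx with available memory; FE metadata lives in memory
meta_dir = /root/StarRocks-1.19.1/fe/meta  # the metadata directory created above (example path)
priority_networks = 192.168.130.0/24       # subnet the FE should bind to
http_port = 8030
query_port = 9030
rpc_port = 9020
edit_log_port = 9010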

Accessing FE with a MySQL Client
Step 1: Install the MySQL client (skip this step if it is already installed):
Ubuntu: sudo apt-get install mysql-client
CentOS: sudo yum install mysql-client
wget http://repo.mysql.com/mysql57-community-release-sles12.rpm
rpm -ivh mysql57-community-release-sles12.rpm
# Install MySQL
yum install mysql-server
# Start MySQL
service mysqld start
Step 2: Connect with the MySQL client:
mysql -h 127.0.0.1 -P9030 -uroot
Note: the root user's password is empty by default, and the port is the query_port setting in fe/conf/fe.conf, which defaults to 9030.
Step 3: Check the FE status:
MySQL [(none)]> SHOW PROC '/frontends'\G
*************************** 1. row ***************************
Name: 192.168.130.178_9010_1636811945380
IP: 192.168.130.178
HostName: fe1
EditLogPort: 9010
HttpPort: 8030
QueryPort: 9030
RpcPort: 9020
Role: FOLLOWER
IsMaster: true
ClusterId: 985620692
Join: true
Alive: true
ReplayedJournalId: 43507
LastHeartbeat: 2021-11-15 14:41:48
IsHelper: true
ErrMsg:
1 row in set (0.05 sec)
Role being FOLLOWER means this FE can take part in leader election; IsMaster being true means this FE is currently the master.
If the MySQL client cannot connect, check the log/fe.warn.log file to find the cause. Since this is a first start, if anything unexpected goes wrong during the process you can delete and recreate the FE metadata directory and start over.
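A rough troubleshooting sequence for that case might look like this (a sketch; paths are relative to the fe directory used above):
# inspect recent FE warnings
tail -n 100 log/fe.warn.log
# if the very first start went wrong, reset the metadata directory and start over
bin/stop_fe.sh
rm -rf meta && mkdir -p meta
bin/start_fe.sh --daemon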
BE Deployment
Basic BE Configuration
The BE configuration file is StarRocks-1.19.1/be/conf/be.conf. The default configuration is sufficient to start the cluster and first-time users are advised not to change it; experienced users can consult the system configuration chapter of the manual and tune it for production. To help readers understand how the cluster works, only the basic setup is listed here.
192.168.130.183
# Edit the BE configuration
cd StarRocks-1.19.1/be
# Configuration file: conf/be.conf
# Adjust BE parameters if needed; the defaults are sufficient to start the cluster, so no changes are made for now.
# Create the data storage directory
mkdir -p storage
# Add the BE node through the MySQL client.
# The IP address here must match the priority_networks setting, and the port is heartbeat_service_port, which defaults to 9050
mysql> ALTER SYSTEM ADD BACKEND "be1:9050";
# Start the BE
bin/start_be.sh --daemon
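The defaults were kept in this test; for reference, the be.conf entries most relevant to joining the cluster look roughly like this (a sketch, with illustrative values for this subnet and an example path):
# conf/be.conf (illustrative values)
priority_networks = 192.168.130.0/24              # must cover the IP/hostname registered with ADD BACKEND
storage_root_path = /root/StarRocks-1.19.1/be/storage   # the storage directory created above (example path)
heartbeat_service_port = 9050                     # port used in ALTER SYSTEM ADD BACKEND
be_port = 9060
webserver_port = 8040
brpc_port = 8060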
If adding a BE node fails and the node needs to be removed, use one of the following commands (DECOMMISSION migrates data off safely, DROPP removes the node immediately; the double "p" is intentional):
alter system decommission backend "be_host:be_heartbeat_service_port";
alter system dropp backend "be_host:be_heartbeat_service_port";
# Check the firewall status
[root@be1 be]# systemctl status firewalld
# If it is enabled, stop the firewall so that network connectivity between FE and BE is not blocked
[root@be1 be]# systemctl stop firewalld
Next, check the BE status and confirm the BE is ready. Add the other two BE nodes with the same steps, then run the following in the MySQL client:
MySQL [(none)]> SHOW PROC '/backends'\G
*************************** 1. row ***************************
BackendId: 163038
Cluster: default_cluster
IP: 192.168.130.183
HostName: be1
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2021-11-21 13:56:47
LastHeartbeat: 2021-11-21 14:58:35
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 5
DataUsedCapacity: .000
AvailCapacity: 33.451 GB
TotalCapacity: 36.974 GB
UsedPct: 9.53 %
MaxDiskUsedPct: 9.53 %
ErrMsg:
Version: 1.19.1-65e87c3
Status: {"lastSuccessReportTabletsTime":"2021-11-21 14:57:48"}
DataTotalCapacity: 33.451 GB
DataUsedPct: 0.00 %
*************************** 2. row ***************************
BackendId: 163066
Cluster: default_cluster
IP: 192.168.130.36
HostName: be2
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2021-11-21 14:56:34
LastHeartbeat: 2021-11-21 14:58:35
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 5
DataUsedCapacity: .000
AvailCapacity: 33.452 GB
TotalCapacity: 36.974 GB
UsedPct: 9.53 %
MaxDiskUsedPct: 9.53 %
ErrMsg:
Version: 1.19.1-65e87c3
Status: {"lastSuccessReportTabletsTime":"2021-11-21 14:58:35"}
DataTotalCapacity: 33.452 GB
DataUsedPct: 0.00 %
*************************** 3. row ***************************
BackendId: 163072
Cluster: default_cluster
IP: 192.168.130.149
HostName: be3
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2021-11-21 14:58:15
LastHeartbeat: 2021-11-21 14:58:35
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 3
DataUsedCapacity: .000
AvailCapacity: 33.521 GB
TotalCapacity: 36.974 GB
UsedPct: 9.34 %
MaxDiskUsedPct: 9.34 %
ErrMsg:
Version: 1.19.1-65e87c3
Status: {"lastSuccessReportTabletsTime":"2021-11-21 14:58:16"}
DataTotalCapacity: 33.521 GB
DataUsedPct: 0.00 %
If Alive is true, the BE has joined the cluster normally. If a BE fails to join, check the be.WARNING log file in the log directory to find the cause.

Installation is now complete.
Parameter Tuning
- Swappiness
Disable swapping to eliminate the performance jitter caused by paging memory out to swap.
echo 0 | sudo tee /proc/sys/vm/swappiness
- Compaction
When using aggregate tables or the update model with a high ingestion rate, the following parameters in be.conf can be adjusted to speed up compaction.
cumulative_compaction_num_threads_per_disk = 4
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 2
- Parallelism
Run the following command in the client to set StarRocks' query parallelism (similar to ClickHouse's set max_threads = 8). A reasonable value is half the number of CPU cores of the machine; a quick way to verify the settings is shown after this list.
set global parallel_fragment_exec_instance_num = 8;
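Since StarRocks speaks the MySQL protocol, the settings above can be spot-checked afterwards; a quick sanity check might be:
# confirm swapping is now off
cat /proc/sys/vm/swappiness
# confirm the parallelism variable from the MySQL client
mysql > show variables like 'parallel_fragment_exec_instance_num';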
Using a MySQL Client with StarRocks
Logging in as root
Connect to the query_port (9030) of any FE instance with a MySQL client. StarRocks has a built-in root user whose password is empty by default:
mysql -h fe_host -P9030 -u root
Clean up the environment:
mysql > drop database if exists example_db;
mysql > drop user test;
Creating a New User
Create a regular user with the following command:
mysql > create user 'test_xxx' identified by 'xxx123456';
Creating a Database
Only the root account has the privilege to create databases in StarRocks. Log in as root and create the test_xxx_db database:
mysql > create database test_xxx_db;
After the database is created, it can be listed with show databases:
MySQL [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| _statistics_ |
| information_schema |
| test_xxx_db |
+--------------------+
3 rows in set (0.00 sec)
information_schema exists for compatibility with the MySQL protocol and its contents may not be entirely accurate, so information about a specific database is best obtained by querying that database directly.
Granting Privileges
After test_xxx_db is created, the root account can grant read/write privileges on it to the test_xxx account. Once granted, the test_xxx account can log in and operate on the test_xxx_db database:
mysql > grant all on test_xxx_db to test_xxx;
Log out of the root account and log in to the StarRocks cluster as test_xxx:
mysql > exit
mysql -h 127.0.0.1 -P9030 -utest_xxx -pxxx123456
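Once logged in as test_xxx, the privileges just granted can be double-checked with a quick verification query:
mysql > show grants;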
Creating Tables
StarRocks supports two table layouts: single partition and composite partition.
In a composite partition:
- The first level is called Partition, i.e. partitioning. A dimension column can be chosen as the partition column (currently only integer and date/time columns are supported), and a value range is specified for each partition.
- The second level is called Distribution, i.e. bucketing. One or more dimension columns (or, if unspecified, all KEY columns) together with a bucket count are used to hash-distribute the data.
Composite partitioning is recommended in the following scenarios:
- A time dimension or a similar dimension with ordered values: such a column can be used as the partition column. Partition granularity can be chosen based on load frequency, per-partition data volume, and so on.
- Deleting historical data: if historical data needs to be removed (for example, keeping only the last N days), composite partitioning allows this by dropping historical partitions (see the ALTER TABLE sketch after the table2 example below). DELETE statements against a specific partition can also be used.
- Mitigating data skew: each partition can have its own bucket count. For example, when partitioning by day and daily volumes vary widely, a per-partition bucket count distributes the data sensibly. Columns with high cardinality are recommended as bucketing columns.
Composite partitioning is optional; with a single partition, data is only hash-distributed.
The CREATE TABLE statements for both layouts are shown below:
- First switch to the database: mysql > use test_xxx_db;
- Single-partition table: create a logical table named table1, hash-bucketed on siteid with 10 buckets. Its schema is:
- siteid: INT (4 bytes), default value 10
- citycode: SMALLINT (2 bytes)
- username: VARCHAR with a maximum length of 32, default value empty string
- pv: BIGINT (8 bytes), default value 0; this is a metric column, which StarRocks aggregates internally, in this case with SUM. This table uses the aggregate model; StarRocks also supports the duplicate (detail) and update models, see the data model documentation.
The CREATE TABLE statement is:
mysql >
CREATE TABLE table1
(
siteid INT DEFAULT '10',
citycode SMALLINT,
username VARCHAR(32) DEFAULT '',
pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
- Composite-partition table
Create a logical table named table2 with the following schema:
- event_day: DATE, no default value
- siteid: INT (4 bytes), default value 10
- citycode: SMALLINT (2 bytes)
- username: VARCHAR with a maximum length of 32, default value empty string
- pv: BIGINT (8 bytes), default value 0; this is a metric column that StarRocks aggregates internally with SUM
The event_day column is used as the partition column, with 3 partitions: p1, p2, p3
- p1: range [minimum, 2017-06-30)
- p2: range [2017-06-30, 2017-07-31)
- p3: range [2017-07-31, 2017-08-31)
Each partition is hash-bucketed on siteid with 10 buckets.
The CREATE TABLE statement is:
CREATE TABLE table2
(
event_day DATE,
siteid INT DEFAULT '10',
citycode SMALLINT,
username VARCHAR(32) DEFAULT '',
pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(event_day, siteid, citycode, username)
PARTITION BY RANGE(event_day)
(
PARTITION p1 VALUES LESS THAN ('2017-06-30'),
PARTITION p2 VALUES LESS THAN ('2017-07-31'),
PARTITION p3 VALUES LESS THAN ('2017-08-31')
)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
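Partitions of table2 can later be managed with ALTER TABLE, which is how the "keep only recent data" retention scheme described earlier would be implemented. A sketch (the new partition name p4 and its upper bound are hypothetical):
# add a partition for the next time range
mysql > ALTER TABLE table2 ADD PARTITION p4 VALUES LESS THAN ('2017-09-30');
# drop the oldest partition to purge historical data
mysql > ALTER TABLE table2 DROP PARTITION p1;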
After the tables are created, they can be listed in test_xxx_db:
mysql> show tables;
+-------------------------+
| Tables_in_test_xxx_db |
+-------------------------+
| table1 |
| table2 |
+-------------------------+
2 rows in set (0.01 sec)
mysql> desc table1;
+----------+-------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-------+---------+-------+
| siteid | int(11) | Yes | true | 10 | |
| citycode | smallint(6) | Yes | true | N/A | |
| username | varchar(32) | Yes | true | | |
| pv | bigint(20) | Yes | false | 0 | SUM |
+----------+-------------+------+-------+---------+-------+
4 rows in set (0.00 sec)
mysql> desc table2;
+-----------+-------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-------+---------+-------+
| event_day | date | Yes | true | N/A | |
| siteid | int(11) | Yes | true | 10 | |
| citycode | smallint(6) | Yes | true | N/A | |
| username | varchar(32) | Yes | true | | |
| pv | bigint(20) | Yes | false | 0 | SUM |
+-----------+-------------+------+-------+---------+-------+
5 rows in set (0.00 sec)
MySQL [(none)]> use test_xxx_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MySQL [test_xxx_db]> insert into table1 values(1,3708,'zhaop',0);
Query OK, 1 row affected (0.20 sec)
{'label':'insert_d7aee4d4-52a4-11ec-be98-fa163e7a663b', 'status':'VISIBLE', 'txnId':'2005'}
MySQL [test_xxx_db]> select * from table1;
+--------+----------+----------+------+
| siteid | citycode | username | pv |
+--------+----------+----------+------+
| 1 | 3708 | zhaop | 0 |
+--------+----------+----------+------+
1 row in set (0.03 sec)
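Because table1 uses the aggregate model with pv declared as SUM, inserting another row with the same key columns is folded into the existing row instead of being stored separately. A quick illustration (the values below are made up for the example):
mysql > insert into table1 values(1,3708,'zhaop',5);
mysql > select * from table1;
# the single row for (1, 3708, 'zhaop') should now show pv = 5, i.e. 0 + 5 aggregated on the key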
Loading Data
curl --location-trusted -u test_xxx:xxx123456 -T table1_data -H "label: table1_20211121" \
-H "column_separator:," \
http://127.0.0.1:8030/api/test_xxx_db/table1/_stream_load
## Error
curl: Can't open 'table1_data'!
curl: try 'curl --help' or 'curl --manual' for more information
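The error only means that the data file table1_data does not exist in the working directory. A minimal retry sketch, assuming a hand-written CSV matching table1's four columns (Stream Load labels must be unique, so a fresh label is used):
# create a small comma-separated data file: siteid, citycode, username, pv
cat > table1_data << 'EOF'
2,1703,wang,10
3,2109,li,20
EOF
# rerun the load with a new label
curl --location-trusted -u test_xxx:xxx123456 -T table1_data -H "label: table1_20211121_2" \
-H "column_separator:," \
http://127.0.0.1:8030/api/test_xxx_db/table1/_stream_load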
Issues
- SQL execution error
MySQL [test_xxx_db]> select * from table1;
ERROR 1064 (HY000): Could not initialize class com.starrocks.rpc.BackendServiceProxy
Server-side log:
2021-11-21 16:03:49,835 WARN (starrocks-mysql-nio-pool-31|379) [StmtExecutor.execute():456] execute Exception, sql select * from table1
java.lang.NoClassDefFoundError: Could not initialize class com.starrocks.rpc.BackendServiceProxy
at com.starrocks.qe.Coordinator$BackendExecState.execRemoteFragmentAsync(Coordinator.java:1695) ~[starrocks-fe.jar:?]
at com.starrocks.qe.Coordinator.exec(Coordinator.java:522) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:771) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:371) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:248) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:397) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:633) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$480(ReadListener.java:54) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_312]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Solution: switch the JDK from OpenJDK to Oracle JDK. Before the change:
[root@fe1 jre-1.8.0-openjdk]# java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
# Changed to
[root@fe1 java]# java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
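What actually changed is the JDK that the FE picks up at startup. One way to do the switch (a sketch; the Oracle JDK install path is an assumption):
# point JAVA_HOME at the Oracle JDK so bin/start_fe.sh uses it, then restart FE
export JAVA_HOME=/usr/java/jdk1.8.0_192
export PATH=$JAVA_HOME/bin:$PATH
bin/stop_fe.sh && bin/start_fe.sh --daemon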
- BE startup failure: java.net.NoRouteToHostException: No route to host (Host unreachable)
# Error message
W1128 23:13:56.493264 24083 utils.cpp:90] Fail to get master client from cache. host= port=0 code=THRIFT_RPC_ERROR
# Check the firewall status
[root@be1 be]# systemctl status firewalld
# If it is enabled, stop the firewall so that network connectivity between FE and BE is not blocked
[root@be1 be]# systemctl stop firewalld
At this point, StarRocks handles normal insert, delete, update, and query operations.
Performance Testing
Four categories of scenarios with 13 queries in total were executed, covering single-table and multi-table queries; results are in milliseconds, with a concurrency of 1.
Data Preparation
Download and build the ssb-poc toolkit
mkdir poc
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.2.zip
unzip ssb-poc-0.9.2.zip
cd ssb-poc
make && make install
# After a successful build, the output directory is available
[root@fe1 ssb-poc]# cd output/
[root@fe1 output]# ll
total 0
drwxr-xr-x. 2 root root 197 Nov 28 21:01 bin
drwxr-xr-x. 2 root root 37 Nov 28 21:01 conf
drwxr-xr-x. 3 root root 22 Nov 28 21:01 lib
drwxr-xr-x. 4 root root 39 Nov 28 21:01 share
All related tools are installed into the output directory.
Generating data
bin/gen-ssb.sh 30 data_dir
This generates roughly 30 GB of data in the data_dir directory.
Loading data
Install Python 3
The benchmark scripts require Python 3; install it first (see https://www.jianshu.com/p/a916a22de3eb). The detailed steps are not repeated here.
Confirm the test directory
[root@fe1 output]# pwd
/root/starrocks/poc/ssb-poc/output
Edit the configuration file conf/doris.conf
[doris]
# for mysql cmd
mysql_host: fe1
mysql_port: 9030
mysql_user: root
mysql_password:
doris_db: ssb
# cluster ports
http_port: 8030
be_heartbeat_port: 9050
broker_port: 8000
...
Run the table-creation script
bin/create_db_table.sh ddl_100
This creates 6 tables in the ssb database: lineorder, supplier, dates, customer, part, lineorder_flat
MySQL [ssb]> show tables;
+----------------+
| Tables_in_ssb |
+----------------+
| customer |
| dates |
| lineorder |
| lineorder_flat |
| part |
| supplier |
+----------------+
6 rows in set (0.00 sec)
Loading the data
Load the data via Stream Load:
bin/stream_load.sh data_dir
data_dir is the data directory generated earlier; the largest table has about 180 million rows. Per-table row counts are listed below, with a quick verification sketch after the table.
| Table | Row count | Notes |
|---|---|---|
| lineorder | 179998372 | SSB line order fact table |
| supplier | 60000 | SSB supplier table |
| dates | 2556 | SSB date dimension table |
| customer | 900000 | SSB customer table |
| part | 1000000 | SSB part table |
| lineorder_flat | 54675488 | Denormalized wide table |
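After the load finishes, the row counts above can be spot-checked from the MySQL client, for example:
mysql > use ssb;
mysql > select count(*) from lineorder;
mysql > select count(*) from lineorder_flat;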
Multi-table (SSB) query test
[root@fe1 output]# time bin/benchmark.sh -p -d ssb
------ dataset: ssb, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 3466.0
q2 473.0
q3 464.0
q4 1703.0
q5 818.0
q6 621.0
q7 1807.0
q8 1039.0
q9 591.0
q10 506.0
q11 1865.0
q12 1077.0
q13 945.0
# Second run
------ dataset: ssb, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 563.0
q2 466.0
q3 537.0
q4 936.0
q5 850.0
q6 590.0
q7 1597.0
q8 970.0
q9 542.0
q10 540.0
q11 1701.0
q12 1104.0
q13 1008.0
SSB wide-table (single-table) query test
# Generate the wide-table data
[root@fe1 output]# bin/flat_insert.sh
sql: ssb_flat_insert start
sql: ssb_flat_insert. flat insert error, msg: (1064, 'index channel has intoleralbe failure')
# Test wide-table performance
time bin/benchmark.sh -p -d ssb-flat
[root@fe1 output]# bin/benchmark.sh -p -d ssb-flat
------ dataset: ssb-flat, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 6435.0
q2 165.0
q3 74.0
q4 7925.0
q5 5307.0
q6 4514.0
q7 7621.0
q8 6821.0
q9 4740.0
q10 116.0
q11 6568.0
q12 301.0
q13 82.0
[root@fe1 output]# time bin/benchmark.sh -p -d ssb-flat
------ dataset: ssb-flat, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 5693.0
q2 98.0
q3 77.0
q4 5811.0
q5 4549.0
q6 4111.0
q7 6819.0
q8 6389.0
q9 4298.0
q10 143.0
q11 6583.0
q12 231.0
q13 75.0
real 0m51.565s
user 0m0.192s
sys 0m0.152s


Conclusion
With 180 million rows in the largest base table and about 55 million rows in the wide table, the SSB tests show good multi-table join performance, though the measured numbers fall short of the official benchmark report. The main reasons are likely:
- The tests ran on low-spec virtual machines (the official tests used 16 cores, 64 GB RAM, ESSD cloud disks, and 10 Gbit/s network bandwidth)
- No parameter tuning was performed
Appendix: Test SQL
Single-table test SQL
--Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
--Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.3
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q3.1
SELECT c_nation, s_nation, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;
--Q3.2
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q4.1
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
--Q4.2
SELECT year(lo_orderdate) AS year,
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;
--Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;
Multi-table test SQL
--Q1.1
select sum(lo_revenue) as revenue
from lineorder join dates on lo_orderdate = d_datekey
where d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
--Q1.2
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;
--Q1.3
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_weeknuminyear = 6 and d_year = 1994
and lo_discount between 5 and 7
and lo_quantity between 26 and 35;
--Q2.1
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
inner join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12' and s_region = 'AMERICA'
group by d_year, p_brand
order by d_year, p_brand;
--Q2.2
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA'
group by d_year, p_brand
order by d_year, p_brand;
--Q2.3
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand = 'MFGR#2239' and s_region = 'EUROPE'
group by d_year, p_brand
order by d_year, p_brand;
--Q3.1
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, lo_revenue desc;
--Q3.2
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q3.3
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q3.4
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth = 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q4.1
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;
--Q4.2
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_region = 'AMERICA'
and (d_year = 1997 or d_year = 1998)
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, s_nation, p_category
order by d_year, s_nation, p_category;
--Q4.3
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_nation = 'UNITED STATES'
and (d_year = 1997 or d_year = 1998)
and p_category = 'MFGR#14'
group by d_year, s_city, p_brand
order by d_year, s_city, p_brand;
