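Transposing rows and columns in SQL (pivoting) is nothing new, but during my technical consulting work I still find that many people are unfamiliar with it, so I am writing this post to record it. The post uses MySQL and takes the retrieval of data-collection metrics as its example. You can run the examples below on sqlfiddle.
First we need to create the database schema: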
CREATE TABLE Chart
(`createTime` DateTime, `kpi` varchar(30), `field` varchar(30), `value` double);
INSERT INTO Chart
(`createTime`,`kpi`, `field`, `value`)
VALUES
("2015-02-01 12:00:00", 'disk', 'disk', 20),
("2015-02-01 12:15:00", 'disk', 'disk', 30),
("2015-02-01 12:20:00", 'disk', 'disk', 25),
("2015-02-01 12:30:00", 'disk', 'disk', 25),
("2015-02-01 12:35:00", 'disk', 'disk', 25),
("2015-02-01 12:40:00", 'disk', 'disk', 25),
("2015-02-01 12:00:00", 'disk', 'disk-all', 20),
("2015-02-01 12:20:00", 'disk', 'disk-all', 30),
("2015-02-01 12:25:00", 'disk', 'disk-all', 25),
("2015-02-01 12:30:00", 'disk', 'disk-all', 25),
("2015-02-01 12:35:00", 'disk', 'disk-all', 25),
("2015-02-01 12:40:00", 'disk', 'disk-all', 25),
("2015-02-01 12:40:00", 'cpu', 'cpu-all', 25),
("2015-02-01 12:40:00", 'cpu', 'cpu', 25)
;
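The columns mean the following: createTime = the time the data was collected, kpi = the metric being collected, field = a sub-category of the metric (one kpi can contain several fields), value = the collected value.
With the data in place, we want to fetch all data for a specific kpi within a fixed time range. Note that a kpi may contain several fields, and some fields may have missed collection at certain times, so those gaps need to be filled with 0. In addition, chart libraries such as ECharts expect the x-axis and the y-axis values as two separate arrays, so we need something like the following: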
option = {
    ....
    xAxis : [
        {
            type : 'category',
            boundaryGap : false,
            data : ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
        }
    ],
    yAxis : [
        {
            type : 'value',
            axisLabel : {
                formatter: '{value} °C'
            }
        }
    ],
    series : [
        {
            ....
            data:[11, 11, 15, 13, 12, 13, 10]
        },
        {
            ....
            data:[11, 11, 15, 13, 12, 13, 10]
        }
    ]
};
Fetching the x-axis is easy, because it is just the sorted list of distinct collection times:
SELECT DISTINCT createTime FROM Chart WHERE kpi = 'disk' AND (createTime BETWEEN '2015-02-01 12:00:00' AND '2015-02-01 12:25:00') ORDER BY createTime;
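The y-axis is harder. If we fetch it with a plain query, our program has to fill in the missing values itself and make sure every value lines up with the x-axis, so the processing becomes fairly complex: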
SELECT createTime,kpi, field, value FROM Chart WHERE kpi = 'disk' and (createTime BETWEEN '2015-02-01 12:00:00' AND '2015-02-01 12:25:00');
Result:
createTime                  kpi   field     value
February, 01 2015 12:00:00  disk  disk      20
February, 01 2015 12:15:00  disk  disk      30
February, 01 2015 12:20:00  disk  disk      25
February, 01 2015 12:00:00  disk  disk-all  20
February, 01 2015 12:20:00  disk  disk-all  30
February, 01 2015 12:25:00  disk  disk-all  25
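To make the cost of the application-side approach concrete, here is a minimal JavaScript sketch of the alignment and zero-filling the program would have to do. The helper name `buildSeries` and the row shape (plain objects with `createTime`, `field` and `value`, timestamps as strings) are assumptions for illustration, not something a particular MySQL driver is guaranteed to return.

```javascript
// Hypothetical helper: align long-format rows (one row per field and timestamp)
// with a shared x-axis and fill the gaps with 0.
function buildSeries(rows) {
  // Distinct timestamps, sorted; the fixed-width format sorts correctly as text.
  const times = [...new Set(rows.map((r) => r.createTime))].sort();
  const position = new Map(times.map((t, i) => [t, i]));

  const byField = new Map();
  for (const { createTime, field, value } of rows) {
    if (!byField.has(field)) {
      // Every field starts as all zeros, so missing samples stay 0.
      byField.set(field, new Array(times.length).fill(0));
    }
    byField.get(field)[position.get(createTime)] = value;
  }

  return {
    xAxis: times,
    series: [...byField].map(([name, data]) => ({ name, data })),
  };
}

// The six rows from the result above, with timestamps written as plain strings.
const rows = [
  { createTime: "2015-02-01 12:00:00", field: "disk", value: 20 },
  { createTime: "2015-02-01 12:15:00", field: "disk", value: 30 },
  { createTime: "2015-02-01 12:20:00", field: "disk", value: 25 },
  { createTime: "2015-02-01 12:00:00", field: "disk-all", value: 20 },
  { createTime: "2015-02-01 12:20:00", field: "disk-all", value: 30 },
  { createTime: "2015-02-01 12:25:00", field: "disk-all", value: 25 },
];

const chart = buildSeries(rows);
console.log(chart.xAxis);  // the four distinct timestamps, in order
console.log(chart.series); // disk: [20, 30, 25, 0], disk-all: [20, 0, 30, 25]
```

It works, but the bookkeeping (collecting the axis, indexing it, pre-filling zeros) lives in application code and has to be repeated wherever this kind of data is charted.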
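Is there a better approach? Of course: the SQL transposition this post is about. Suppose we could turn the returned data into the following:
field     '2015-02-01 12:00:00'  '2015-02-01 12:15:00'  '2015-02-01 12:20:00'  '2015-02-01 12:25:00'
disk      20                     30                     25                     0
disk-all  20                     0                      30                     25
Then the program has very little left to do. Since we can already fetch all the x-axis values in sorted order, the transposition itself is simple: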
SELECT field,
SUM(IF(createTime = '2015-02-01 12:00:00', value, 0)) as '2015-02-01 12:00:00',
SUM(IF(createTime = '2015-02-01 12:15:00', value, 0)) as '2015-02-01 12:15:00',
SUM(IF(createTime = '2015-02-01 12:20:00', value, 0)) as '2015-02-01 12:20:00',
SUM(IF(createTime = '2015-02-01 12:25:00', value, 0)) as '2015-02-01 12:25:00'
FROM Chart
WHERE kpi = 'disk' and (createTime BETWEEN '2015-02-01 12:00:00' AND '2015-02-01 12:25:00')
GROUP BY field
The data returned this way meets our needs.
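With the pivoted result, the application code shrinks to a direct mapping. The sketch below assumes the driver returns each row as an object keyed by column alias (so the timestamp strings are property names); `toChartOption`, `times` and `pivotRows` are hypothetical names, and `type: "line"` is an assumed series type, since the original option elides that part.

```javascript
// Hypothetical helper: turn pivoted rows into the xAxis/series parts of an ECharts option.
// `times` is the ordered timestamp list from the x-axis query,
// `pivotRows` is the result of the pivot query (one row per field).
function toChartOption(times, pivotRows) {
  return {
    xAxis: [{ type: "category", boundaryGap: false, data: times }],
    yAxis: [{ type: "value" }],
    series: pivotRows.map((row) => ({
      name: row.field,
      type: "line", // assumed chart type
      // Read the columns in x-axis order so each value lines up with its timestamp.
      data: times.map((t) => Number(row[t])),
    })),
  };
}

const times = [
  "2015-02-01 12:00:00",
  "2015-02-01 12:15:00",
  "2015-02-01 12:20:00",
  "2015-02-01 12:25:00",
];
const pivotRows = [
  { field: "disk", [times[0]]: 20, [times[1]]: 30, [times[2]]: 25, [times[3]]: 0 },
  { field: "disk-all", [times[0]]: 20, [times[1]]: 0, [times[2]]: 30, [times[3]]: 25 },
];

const option = toChartOption(times, pivotRows);
console.log(JSON.stringify(option.series));
```

The `Number(...)` call is only a precaution in case a driver hands aggregate results back as strings.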
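Let's break this SQL statement down.
- First, an expression such as IF(createTime = '2015-02-01 12:00:00', value, 0) spreads each row across one column per timestamp: the value lands in the column whose timestamp matches, and every other column is filled with 0. The result looks like this: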
SQL:
SELECT field,
IF(createTime = '2015-02-01 12:00:00', value, 0) as '2015-02-01 12:00:00',
IF(createTime = '2015-02-01 12:15:00', value, 0) as '2015-02-01 12:15:00',
IF(createTime = '2015-02-01 12:20:00', value, 0) as '2015-02-01 12:20:00',
IF(createTime = '2015-02-01 12:25:00', value, 0) as '2015-02-01 12:25:00'
FROM Chart
WHERE kpi = 'disk' and (createTime BETWEEN '2015-02-01 12:00:00' AND '2015-02-01 12:25:00');
Result:
field     '2015-02-01 12:00:00'  '2015-02-01 12:15:00'  '2015-02-01 12:20:00'  '2015-02-01 12:25:00'
disk      20                     0                      0                      0
disk      0                      30                     0                      0
disk      0                      0                      25                     0
disk-all  20                     0                      0                      0
disk-all  0                      0                      30                     0
disk-all  0                      0                      0                      25
- Next, we use the SQL aggregate function SUM together with GROUP BY to collapse these rows into one row per field:
SQL:
SELECT field,
SUM(IF(createTime = '2015-02-01 12:00:00', value, 0)) as '2015-02-01 12:00:00',
SUM(IF(createTime = '2015-02-01 12:15:00', value, 0)) as '2015-02-01 12:15:00',
SUM(IF(createTime = '2015-02-01 12:20:00', value, 0)) as '2015-02-01 12:20:00',
SUM(IF(createTime = '2015-02-01 12:25:00', value, 0)) as '2015-02-01 12:25:00'
FROM Chart
WHERE kpi = 'disk' and (createTime BETWEEN '2015-02-01 12:00:00' AND '2015-02-01 12:25:00')
GROUP BY field
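The result is the same as shown earlier.
In short, transposing rows and columns in SQL consists of two parts:
- Use conditional logic (IF in MySQL, CASE ... WHEN in SQL Server; SQL Server 2005 and later also support the PIVOT operator) to turn the values being transposed into columns.
- Use an aggregate function (SUM, MAX, MIN, ...) with GROUP BY to merge the rows. Mind the data boundaries with MAX and MIN: if the data can contain negative numbers and the default value is 0, MAX returns the wrong result. SUM is therefore usually the safest choice (adding 0 to any number does not change it), although MAX and MIN are also safe in specific scenarios.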
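The MAX pitfall is easy to reproduce outside the database. The following JavaScript sketch imitates the two steps for a single field (spread with a default of 0, then aggregate); the readings are made up for illustration, with one negative value, and `pivotCells` is a hypothetical helper rather than anything MySQL provides.

```javascript
// Imitates AGG(IF(createTime = t, value, 0)) ... GROUP BY field for one field.
function pivotCells(rows, times, aggregate) {
  return times.map((t) =>
    // Step 1: each row contributes its value to the matching column and 0 elsewhere.
    // Step 2: the aggregate collapses the column into a single cell.
    aggregate(rows.map((r) => (r.createTime === t ? r.value : 0)))
  );
}

const sum = (xs) => xs.reduce((a, b) => a + b, 0);
const max = (xs) => Math.max(...xs);

// Made-up readings for a metric that can go negative.
const times = ["12:00", "12:15"];
const rows = [
  { createTime: "12:00", value: -5 },
  { createTime: "12:15", value: 30 },
];

const bySum = pivotCells(rows, times, sum);
const byMax = pivotCells(rows, times, max);

console.log(bySum); // [-5, 30]: the negative reading survives
console.log(byMax); // [0, 30]: the default 0 wins over -5
```

MAX stays correct only while every real value is at least as large as the default, which is why it is safe for non-negative metrics such as the disk figures in this post.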
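We can also merge the two requests above (fetching the timestamps and running the pivot) into one. This requires building the statement dynamically in MySQL: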
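-- Build one SUM(IF(...)) column per distinct timestamp, in time order.
-- Note: GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default),
-- so raise that setting when there are many timestamps.
SELECT
    @time_sql := GROUP_CONCAT("SUM(IF(createTime = '", t.createTime, "', value, 0)) AS '", t.createTime, "'" ORDER BY t.createTime)
FROM (
    SELECT DISTINCT createTime FROM Chart
) AS t;
-- Assemble the full statement; with no timestamps, only the field column is selected.
SET @v_sql = CONCAT("SELECT field", IF(ISNULL(@time_sql), " ", CONCAT(", ", @time_sql)), " FROM Chart GROUP BY field");
-- To restrict the kpi or the time range, add the same WHERE clause to both queries.
PREPARE stmt FROM @v_sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;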
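If you would rather keep two requests and assemble the statement in application code, the same string can be built there. This is only a sketch under assumptions: `buildPivotSql` and the `TIMESTAMP` pattern are hypothetical names, and the timestamps are spliced into the SQL text just as in the MySQL version, so the function rejects anything that is not a plain `YYYY-MM-DD hh:mm:ss` value.

```javascript
// Only plain timestamps are accepted, because the value becomes part of the SQL text.
const TIMESTAMP = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/;

// Hypothetical helper: build the pivot query from the ordered x-axis timestamps.
function buildPivotSql(times) {
  const columns = times.map((t) => {
    if (!TIMESTAMP.test(t)) {
      throw new Error(`unexpected timestamp: ${t}`);
    }
    return `SUM(IF(createTime = '${t}', value, 0)) AS '${t}'`;
  });
  // With no timestamps, only the field column is selected.
  return ["SELECT field", ...columns].join(", ") + " FROM Chart GROUP BY field";
}

console.log(buildPivotSql(["2015-02-01 12:00:00", "2015-02-01 12:15:00"]));
```

Building the string in the application avoids the GROUP_CONCAT length limit, at the price of the extra round trip that the dynamic MySQL version saves.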