请关注个人小站:http://sqlhis.com/
分析数据库的负载高低,通常可以通过CPU利用率,磁盘响应速度等进行分析,Oracle数据库提供AWR报告,报告的内容很多很多,但是却没有一个定性的指标,能回答:
1. 到底数据库当前的负载是高还是低?
2. 我感觉我数据库很慢,我的数据库硬件是是不是应该升级了?
先看一段典型的AWR报告,截取部分片段
Host Name | Platform | CPUs | Cores | Sockets | Memory (GB) |
---|---|---|---|---|---|
ABCD | Linux x86 64-bit | 12 | 12 | 12 | 94.15 |
Snap Id | Snap Time | Sessions | Cursors/Session | Instances | |
---|---|---|---|---|---|
Begin Snap: | 13647 | 24-Aug-18 00:00:40 | 575 | 4.1 | 2 |
End Snap: | 13695 | 25-Aug-18 00:00:06 | 558 | 4.6 | 2 |
Elapsed: | 1,439.43 (mins) | ||||
DB Time: | 2,762.39 (mins) |
这里有三个关键信息:
1. 首先,我们的服务器是12个CPU,12核
2. 这个AWR报告的区间是从 24-Aug-18 00:00:40 到 25-Aug-18 00:00:06,整个区间是1,439.43 (mins), 24小时。
3. 然后是DB TIME: 2,762.39 (mins), 这是一个关键的性能指标,就是所有数据库请求的运行时间加总。
举个栗子,假设一家理发店24小时(1,439.43 (mins))营业,一共有12个理发师(12 CPU),顾客在理发店的累计逗留时间就是 DB TIME:2,762.39 (mins)。
如果拿顾客累计逗留时间除以营业时间,2762.30/1439.43=1.92, 也就是说,在营业时间内,店内平均有1.92个顾客。
顾客的累计逗留时间(DB TIME)不是理发师(CPU)的累计工作时间,如果两个顾客都要烫发,正巧只有一个烫发机,则其中一个顾客需要等待(阻塞),这段时间理发师没有干活,
所以:
DB CPU 是一定会小于 DB Time的, 理发师剪头发的时间一定小于顾客在店内逗留的时间。
继续看AWR报告,给出了理发师的工作时间,DB CPUS(S)=0.9 ,我们有12个理发师,也就是全天CPU的利用率是 0.9/12=7.5%
Load Profile
Per Second | Per Transaction | Per Exec | Per Call | |
---|---|---|---|---|
DB Time(s): | 1.9 | 0.1 | 0.00 | 0.00 |
DB CPU(s): | 0.9 | 0.0 | 0.00 | 0.00 |
Redo size (bytes): | 2,286,644.3 | 69,730.7 | ||
Logical read (blocks): | 117,205.0 | 3,574.1 | ||
Block changes: | 9,588.5 | 292.4 | ||
Physical read (blocks): | 1,768.0 | 53.9 | ||
Physical write (blocks): | 287.0 | 8.8 | ||
Read IO requests: | 201.7 | 6.2 | ||
Write IO requests: | 75.0 | 2.3 | ||
Read IO (MB): | 13.8 | 0.4 | ||
Write IO (MB): | 2.2 | 0.1 | ||
Global Cache blocks received: | 29.1 | 0.9 | ||
Global Cache blocks served: | 37.5 | 1.1 | ||
User calls: | 3,122.8 | 95.2 | ||
Parses (SQL): | 473.3 | 14.4 | ||
Hard parses (SQL): | 9.5 | 0.3 | ||
SQL Work Area (MB): | 15.8 | 0.5 | ||
Logons: | 0.1 | 0.0 | ||
Executes (SQL): | 11,292.7 | 344.4 | ||
Rollbacks: | 0.7 | 0.0 | ||
Transactions: | 32.8 |
全天CPU利用率只有7.5%,整体看比较低,但是不意味着数据库的负荷很低,有两种情况:
1. 负荷可能集中在某个时间段,这就需要我们观察每个时间段的数据库负荷。
2. 数据库的瓶颈可能不在CPU,而在磁盘或者网络
1)数据库内存较少,大部分时间消耗在磁盘的读取上,这个可以从磁盘的响应速度判断
2)网络带宽不足,数据库要发送大量数据到客户端,在这里形成了等待。
所以现在我们想分时间段获得DB TIME和DB CPU这两个指标,如果需要按每个时间段导出一个AWR报告,这是一件令人抓狂的事,还好,AWR报告实际上就是通过一些数据库视图生成的,这些视图就是数据库定时的快照,所以我们可以利用SQL语句很方便的生成。
首先看下快照的主表:
SELECT * FROM DBA_HIST_SNAPSHOT ORDER BY SNAP_ID
SNAP_ID | DBID | INSTANCE_NUMBER | STARTUP_TIME | BEGIN_INTERVAL_TIME | END_INTERVAL_TIME | FLUSH_ELAPSED | SNAP_LEVEL | ERROR_COUNT | SNAP_FLAG | SNAP_TIMEZONE | |
1 | 13648 | 1647988888 | 2 | 10-8月 -18 05.20.05.000 下午 | 24-8月 -18 12.00.40.204 上午 | 24-8月 -18 12.30.25.506 上午 | +00000 00:00:00.5 | 1 | 0 | 0 | +0 08:00:00 |
2 | 13648 | 1647988888 | 1 | 28-7月 -18 08.01.15.000 下午 | 24-8月 -18 12.00.40.229 上午 | 24-8月 -18 12.30.25.524 上午 | +00000 00:00:02.5 | 1 | 0 | 0 | +0 08:00:00 |
3 | 13649 | 1647988888 | 1 | 28-7月 -18 08.01.15.000 下午 | 24-8月 -18 12.30.25.524 上午 | 24-8月 -18 01.00.16.839 上午 | +00000 00:00:02.2 | 1 | 0 | 0 | +0 08:00:00 |
4 | 13649 | 1647988888 | 2 | 10-8月 -18 05.20.05.000 下午 | 24-8月 -18 12.30.25.506 上午 | 24-8月 -18 01.00.16.820 上午 | +00000 00:00:01.2 | 1 | 0 | 0 | +0 08:00:00 |
5 | 13650 | 1647948888 | 1 | 28-7月 -18 08.01.15.000 下午 | 24-8月 -18 01.00.16.839 上午 | 24-8月 -18 01.30.08.125 上午 | +00000 00:00:02.9 | 1 | 0 | 0 | +0 08:00:00 |
6 | 13650 | 1647988888 | 2 | 10-8月 -18 05.20.05.000 下午 | 24-8月 -18 01.00.16.820 上午 | 24-8月 -18 01.30.08.072 上午 | +00000 00:00:00.9 | 1 | 0 | 0 | +0 08:00:00 |
7 | 13651 | 1647988888 | 1 | 28-7月 -18 08.01.15.000 下午 | 24-8月 -18 01.30.08.125 上午 | 24-8月 -18 02.00.02.352 上午 | +00000 00:00:01.6 | 1 | 0 | 0 | +0 08:00:00 |
8 | 13651 | 1647988888 | 2 | 10-8月 -18 05.20.05.000 下午 | 24-8月 -18 01.30.08.072 上午 | 24-8月 -18 02.00.02.334 上午 | +00000 00:00:00.7 | 1 | 0 | 0 | +0 08:00:00 |
通过这个表我们可以判断如下:
1. 半小时一个快照(这个是配置值,可改),快照并不是完全的整点,存在一定的误差,即快照的区间接近半小时,可能会多几十秒或者少几十秒。
2. 这里有两个INSTANCE_NUMBER,因为这个是集群,所以实际上有两个物理机。
DBA_HIST_SYS_TIME_MODEL 表里面有我们需要的DB CPU和DB time,这个值是累计值,即需要算一个区间的数,需要那结束时间的值减去开始时间的值。
SELECT * FROM DBA_HIST_SYS_TIME_MODEL
WHERE SNAP_ID='13648'
AND STAT_NAME IN ('DB CPU','DB time')
SNAP_ID | DBID | INSTANCE_NUMBER | STAT_ID | STAT_NAME | VALUE | |
1 | 13648 | 1647988888 | 2 | 2748282437 | DB CPU | 41061757271 |
2 | 13648 | 1647988888 | 1 | 2748282437 | DB CPU | 2214287108077 |
3 | 13648 | 1647988888 | 2 | 3649082374 | DB time | 105530922248 |
4 | 13648 | 1647988888 | 1 | 3649082374 | DB time | 3992370209915 |
创建一个表存储原始数据,
CREATE TABLE SYSTEMLOAD AS WITH TEMP AS( SELECT a.INSTANCE_NUMBER, to_number(to_char(A.BEGIN_INTERVAL_TIME,'yyyymmdd')) AS BEGIN_DATE, TRUNC(((A.BEGIN_INTERVAL_TIME+0)-trunc(A.BEGIN_INTERVAL_TIME+0))*48) AS SNAPORDER, SUBSTR('0' || TRUNC(TRUNC(((A.BEGIN_INTERVAL_TIME+0)-trunc(A.BEGIN_INTERVAL_TIME+0))*48)/2) || (CASE WHEN mod(TRUNC(((A.BEGIN_INTERVAL_TIME+0)-trunc(A.BEGIN_INTERVAL_TIME+0))*48),2)=1 THEN '30' ELSE '00' END),-4) AS BEGIN_TIME, A.BEGIN_INTERVAL_TIME+0 AS BEGIN_INTERVAL_TIME, A.END_INTERVAL_TIME+0 AS END_INTERVAL_TIME, ((A.END_INTERVAL_TIME+0)-(A.BEGIN_INTERVAL_TIME+0))*60*60*24 AS TIMEINTERVAL, B.SNAP_ID AS STARTSNAP_ID, A.SNAP_ID AS ENDSNAP_ID FROM DBA_HIST_SNAPSHOT A INNER JOIN DBA_HIST_SNAPSHOT B /*寻找前一个SNAP_ID*/ ON A.SNAP_ID-1=B.SNAP_ID AND A.INSTANCE_NUMBER=B.INSTANCE_NUMBER ORDER BY A.SNAP_ID ) SELECT A.*, DBTIME_E.VALUE-DBTIME_S.VALUE AS DBTIME, DBCPU_E.VALUE-DBCPU_S.VALUE AS DBCPU, REDOSIZE_E.VALUE-REDOSIZE_S.VALUE AS REDOSIZE, PHYSICALREAD_E.VALUE-PHYSICALREAD_S.VALUE AS PHYSICALREAD, PHYSICALWRITE_E.VALUE-PHYSICALWRITE_S.VALUE AS PHYSICALWRITE FROM TEMP A LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL DBTIME_S ON a.STARTSNAP_ID=DBTIME_S.SNAP_ID AND A.INSTANCE_NUMBER=DBTIME_S.INSTANCE_NUMBER AND DBTIME_S.STAT_NAME='DB time' LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL DBTIME_E ON a.ENDSNAP_ID=DBTIME_E.SNAP_ID AND A.INSTANCE_NUMBER=DBTIME_E.INSTANCE_NUMBER AND DBTIME_E.STAT_NAME='DB time' LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL DBCPU_S ON a.STARTSNAP_ID=DBCPU_S.SNAP_ID AND A.INSTANCE_NUMBER=DBCPU_S.INSTANCE_NUMBER AND DBCPU_S.STAT_NAME='DB CPU' LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL DBCPU_E ON a.ENDSNAP_ID=DBCPU_E.SNAP_ID AND A.INSTANCE_NUMBER=DBCPU_E.INSTANCE_NUMBER AND DBCPU_E.STAT_NAME='DB CPU' LEFT OUTER JOIN DBA_HIST_SYSSTAT REDOSIZE_S ON A.STARTSNAP_ID=REDOSIZE_S.SNAP_ID AND A.INSTANCE_NUMBER=REDOSIZE_S.INSTANCE_NUMBER AND REDOSIZE_S.STAT_NAME='redo size' LEFT OUTER JOIN DBA_HIST_SYSSTAT REDOSIZE_E ON A.ENDSNAP_ID=REDOSIZE_E.SNAP_ID AND A.INSTANCE_NUMBER=REDOSIZE_E.INSTANCE_NUMBER AND REDOSIZE_E.STAT_NAME='redo size' LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALREAD_S ON A.STARTSNAP_ID=PHYSICALREAD_S.SNAP_ID AND A.INSTANCE_NUMBER=PHYSICALREAD_S.INSTANCE_NUMBER AND PHYSICALREAD_S.STAT_NAME='physical read bytes' LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALREAD_E ON A.ENDSNAP_ID=PHYSICALREAD_E.SNAP_ID AND A.INSTANCE_NUMBER=PHYSICALREAD_E.INSTANCE_NUMBER AND PHYSICALREAD_E.STAT_NAME='physical read bytes' LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALWRITE_S ON A.STARTSNAP_ID=PHYSICALWRITE_S.SNAP_ID AND A.INSTANCE_NUMBER=PHYSICALWRITE_S.INSTANCE_NUMBER AND PHYSICALWRITE_S.STAT_NAME='physical write bytes' LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALWRITE_E ON A.ENDSNAP_ID=PHYSICALWRITE_E.SNAP_ID AND A.INSTANCE_NUMBER=PHYSICALWRITE_E.INSTANCE_NUMBER AND PHYSICALWRITE_E.STAT_NAME='physical write bytes' AND DBTIME_E.VALUE-DBTIME_S.VALUE>0 /*重启可能导致重新累积,不加入这些异常数据*/ --WHERE A.INSTANCE_NUMBER=1 ORDER BY A.STARTSNAP_ID
输出单日的运行指标:
SELECT BEGIN_DATE, MAX(BEGIN_TIME) AS BEGIN_TIME, ROUND(SUM(DBTIME)/SUM(TIMEINTERVAL)/1000000,2) AS AVGDBTIME, ROUND(SUM(DBCPU)/SUM(TIMEINTERVAL)/1000000,2) AS AVGDBCPU, ROUND(SUM(REDOSIZE)/SUM(TIMEINTERVAL)/1024/1024,2) AS AVGREDOSIZE , ROUND(SUM(PHYSICALREAD)/SUM(TIMEINTERVAL)/1024/1024,2) AS AVGPHYSICALREAD, ROUND(SUM(PHYSICALWRITE)/SUM(TIMEINTERVAL)/1024/1024,2) AS AVGPHYSICALWRITE FROM SYSTEMLOAD WHERE INSTANCE_NUMBER=1 AND BEGIN_DATE='20180831' GROUP BY BEGIN_DATE,SNAPORDER ORDER BY BEGIN_DATE,SNAPORDER
以下为输出,可以看出DB TIME和DB CPU其实都不高,如果算CPU的利用率,将AVGDBCPU除以12即可,大概不到20%
BEGIN_DATE | BEGIN_TIME | AVGDBTIME | AVGDBCPU | AVGREDOSIZE | AVGPHYSICALREAD | AVGPHYSICALWRITE | |
1 | 20180831 | 0 | 1.71 | 1.16 | 0.98 | 1.37 | 1.19 |
2 | 20180831 | 30 | 0.98 | 0.89 | 0.31 | 0.14 | 0.44 |
3 | 20180831 | 100 | 1.31 | 0.88 | 0.28 | 0.12 | 0.43 |
4 | 20180831 | 130 | 1.19 | 1.06 | 0.47 | 0.05 | 0.55 |
5 | 20180831 | 200 | 1.79 | 1.38 | 0.49 | 4.81 | 0.69 |
6 | 20180831 | 230 | 1.75 | 1.62 | 0.67 | 0.58 | 0.75 |
7 | 20180831 | 300 | 1.42 | 1.33 | 0.53 | 0.73 | 0.73 |
8 | 20180831 | 330 | 1.44 | 1.28 | 0.59 | 5.46 | 0.67 |
9 | 20180831 | 400 | 0.96 | 0.92 | 0.2 | 0.02 | 0.37 |
10 | 20180831 | 430 | 1.03 | 0.95 | 0.4 | 0.06 | 0.38 |
11 | 20180831 | 500 | 1.16 | 1.02 | 1.03 | 1.47 | 1.32 |
12 | 20180831 | 530 | 1.11 | 1 | 0.4 | 0.65 | 0.4 |
13 | 20180831 | 600 | 1.74 | 1.21 | 2.08 | 5.51 | 1.46 |
14 | 20180831 | 630 | 1.37 | 1.13 | 0.51 | 1.5 | 0.57 |
15 | 20180831 | 700 | 1.24 | 1.12 | 0.41 | 0.66 | 0.64 |
16 | 20180831 | 730 | 1.29 | 1.06 | 0.62 | 2.06 | 0.66 |
17 | 20180831 | 800 | 5.89 | 2.02 | 3.64 | 43.82 | 3.93 |
18 | 20180831 | 830 | 2.28 | 1.66 | 1.82 | 11.51 | 2.21 |
19 | 20180831 | 900 | 3.29 | 2.08 | 3.69 | 12.46 | 3.15 |
20 | 20180831 | 930 | 3.15 | 1.7 | 7.08 | 27.65 | 7.41 |
21 | 20180831 | 1000 | 2.37 | 1.36 | 6.3 | 20.12 | 5.92 |
22 | 20180831 | 1030 | 1.93 | 1.1 | 5.46 | 25.37 | 5.04 |
23 | 20180831 | 1100 | 2.54 | 1.3 | 5.55 | 32.08 | 5.92 |
24 | 20180831 | 1130 | 1.76 | 1.03 | 4.82 | 17.46 | 5.66 |
25 | 20180831 | 1200 | 1.03 | 0.57 | 2.19 | 18.68 | 2.47 |
26 | 20180831 | 1230 | 1.22 | 0.89 | 2.67 | 17.34 | 3.04 |
27 | 20180831 | 1300 | 1.68 | 1.14 | 4.73 | 21.53 | 4.47 |
28 | 20180831 | 1330 | 1.92 | 1.05 | 4.83 | 21.42 | 5.06 |
29 | 20180831 | 1400 | 1.88 | 1.23 | 5.21 | 22.94 | 5.14 |
30 | 20180831 | 1430 | 1.18 | 0.65 | 4.03 | 10.04 | 4.96 |
31 | 20180831 | 1500 | 1.32 | 0.62 | 2.97 | 6.28 | 2.83 |
32 | 20180831 | 1530 | 1.68 | 0.91 | 3.85 | 14.67 | 5.12 |
33 | 20180831 | 1600 | 5.56 | 1.05 | 5.9 | 41.36 | 5.46 |
34 | 20180831 | 1630 | 2.82 | 0.86 | 3.34 | 7.82 | 4.21 |
35 | 20180831 | 1700 | 2.85 | 1.31 | 3.24 | 37.8 | 4.28 |
36 | 20180831 | 1730 | 2.7 | 1.16 | 3.43 | 19.22 | 4.11 |
37 | 20180831 | 1800 | 2.36 | 0.73 | 2.57 | 25.6 | 3.81 |
38 | 20180831 | 1830 | 1.82 | 1.12 | 1.98 | 17.04 | 6.45 |
39 | 20180831 | 1900 | 1.63 | 0.83 | 2.36 | 15.45 | 3.15 |
40 | 20180831 | 1930 | 0.9 | 0.48 | 1.59 | 4.32 | 2.08 |
41 | 20180831 | 2000 | 1.09 | 0.28 | 1.01 | 9.93 | 2.01 |
42 | 20180831 | 2030 | 0.61 | 0.21 | 0.85 | 0.21 | 1.08 |
43 | 20180831 | 2100 | 0.49 | 0.21 | 0.77 | 0.21 | 1.03 |
44 | 20180831 | 2130 | 0.41 | 0.2 | 0.96 | 1.19 | 1.06 |
45 | 20180831 | 2200 | 2.35 | 0.72 | 2.79 | 22.78 | 4.89 |
46 | 20180831 | 2230 | 0.78 | 0.39 | 0.82 | 8.1 | 1.54 |
47 | 20180831 | 2300 | 0.2 | 0.12 | 0.66 | 0.53 | 0.87 |
48 | 20180831 | 2330 | 0.19 | 0.1 | 0.53 | 0.07 | 0.77 |
总结:
统计了DB CPU和DB TIME两个关键指标,但是DB CPU是数据库层面的数据,并不是服务器层面的数据,有可能服务器上面还运行了其他程序,占用了大量的CPU,导致DB CPU这个值很低,但是实际上服务器的CPU非常繁忙。下一篇文章将获得服务器层面的CPU利用率。