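Customers often ask why the Usable_file_MB value reported by asmcmd lsdg is negative.
This post sorts out several easily confused fields in the lsdg output. First, let's look at what asmcmd lsdg prints: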
grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  1927720          2289291         -180785              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710880           914688          898096              0             N  REDODG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17506             6144            5681              0             Y  VOTING/
Total_MB:Total capacity of the disk group (in megabytes)
That is, the total capacity of the disk group.
15:07:06 SQL> create diskgroup sydg external redundancy disk
15:14:46   2  '/dev/qdata/mpath-s01.3264.01.P0B00S04' size 100g ,
15:14:46   3  '/dev/qdata/mpath-s02.3264.01.P0B00S04' size 100g,
15:14:46   4  '/dev/qdata/mpath-s03.3264.01.P0B00S04' size 200g
15:14:46   5  ;

Diskgroup created.

Elapsed: 00:00:10.12
15:14:56 SQL> select b.name,a.failgroup,a.name,a.total_mb/1024,(a.total_mb-a.free_mb)/1024 used_mb from v$asm_disk a,v$asm_diskgroup b where a.group_number=b.group_number and b.name='SYDG' ;

NAME             FAILGROUP        NAME             A.TOTAL_MB/1024    USED_MB
---------------- ---------------- ---------------- --------------- ----------
SYDG             SYDG_0000        SYDG_0000                    100 7.51660156
SYDG             SYDG_0002        SYDG_0002                    200 15.0283203
SYDG             SYDG_0001        SYDG_0001                    100 7.51171875

Elapsed: 00:00:00.02
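So total_mb equals the sum of the sizes of all disks specified when the disk group was created, and ASM distributes data according to disk size. A 30 GB tablespace had been created in this disk group, and the query shows it spread across the disks in proportion to their capacity (the USED_MB column is divided by 1024, so it is actually in GB).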
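The proportional placement can be checked with a little arithmetic. The following is a minimal sketch, not ASM's actual allocation algorithm: the helper name `expected_used_gb` is hypothetical, and it simply assumes each disk receives a share of the data equal to its share of the total capacity.

```python
def expected_used_gb(disk_sizes_gb, data_gb):
    """Split data_gb across disks in proportion to each disk's size."""
    total = sum(disk_sizes_gb.values())
    return {name: data_gb * size / total for name, size in disk_sizes_gb.items()}

# Disk sizes taken from the SYDG example (in GB).
sydg = {"SYDG_0000": 100, "SYDG_0001": 100, "SYDG_0002": 200}

shares = expected_used_gb(sydg, 30)
for name, used in sorted(shares.items()):
    print(f"{name}: {used:.2f} GB")
```

The values reported by the query (about 7.52, 7.51 and 15.03) are close to these ideal shares of 7.5, 7.5 and 15.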
Free_MB: Unused capacity of the disk group (in megabytes)
Free space. Taken literally, it is the sum of the free space on all disks in the group.
14:15:21 SQL> create diskgroup testdg normal redundancy failgroup f1 disk
14:16:26   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g
14:16:26   3  failgroup f2 disk
14:16:26   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g
14:16:26   5  failgroup f3 disk
14:16:26   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s03.3265.01.P0B00S12' size 200g
14:16:26   7  ;

Diskgroup created.
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   593747           307200          143273              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
Req_mir_free_MB:Amount of space that is required to be available in a given disk group in order to restore redundancy after one or more disk failures. The amount of space displayed in this column takes mirroring effects into account.
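The output above shows that the testdg disk group needs 307200 MB of free space so that there is enough room to restore the data after a failure group is lost. The failure groups here happen to differ in size, and you can see that ASM picked the largest one, f3 (100 GB + 200 GB = 307200 MB).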
Usable_file_MB:Amount of free space that can be safely utilized taking mirroring into account and yet be able to restore redundancy after a disk failure
This is the amount of space that can be used while still preserving redundancy. For testdg in the output above it is 143273 MB:
16:24:21 SQL> select (593747-307200)/2 from dual;

(593747-307200)/2
-----------------
         143273.5
So the formula is (free_mb - required_mirror_free_mb) / N, where N is the number of copies (2 for normal redundancy, 3 for high redundancy).
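The formula can be wrapped in a small helper to check it against the lsdg listings in this post. This is a sketch under assumptions: the function name `usable_file_mb` is hypothetical, and truncating the result toward zero is inferred only from the values shown here (for example -180785 for DATA), not from Oracle documentation.

```python
def usable_file_mb(free_mb, req_mir_free_mb, copies):
    """(free_mb - req_mir_free_mb) / copies, truncated toward zero (assumption)."""
    diff = free_mb - req_mir_free_mb
    quotient = abs(diff) // copies
    return quotient if diff >= 0 else -quotient

# Inputs are Free_MB and Req_mir_free_MB as printed by lsdg in this post.
print(usable_file_mb(593747, 307200, 2))    # TESTDG, normal redundancy
print(usable_file_mb(1927720, 2289291, 2))  # DATA in the first listing
print(usable_file_mb(378822, 0, 1))         # SYDG, external redundancy
```

The three calls reproduce the Usable_file_MB values that lsdg printed for those disk groups.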
sys@YEDB>alter tablespace test add datafile '+TESTDG' size 30g autoextend off;

Tablespace altered.

grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   532302           307200          112551              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
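After adding a 30 GB datafile to testdg, the numbers behave as expected: free_mb dropped by about 60 GB and Usable_file_MB by about 30 GB. Let's keep adding datafiles.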
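The drop can be verified from the two TESTDG rows. The snippet below is only an arithmetic check using the values printed before and after the datafile was added; the variable names are made up for illustration.

```python
FILE_MB = 30 * 1024   # size of the added datafile
COPIES = 2            # normal redundancy keeps two copies

# TESTDG values from the lsdg listings before and after the datafile was added.
before = {"free_mb": 593747, "usable_file_mb": 143273}
after = {"free_mb": 532302, "usable_file_mb": 112551}

free_drop = before["free_mb"] - after["free_mb"]
usable_drop = before["usable_file_mb"] - after["usable_file_mb"]

print(free_drop, "MB free_mb drop; file size x copies =", FILE_MB * COPIES)
print(usable_drop, "MB Usable_file_MB drop; file size =", FILE_MB)
```

Both drops are within a few MB of the file size multiplied by the number of copies and of the file size itself, respectively.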
sys@YEDB>alter tablespace test add datafile '+TESTDG' size 30g autoextend off;

Tablespace altered.

grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   347967           307200           20383              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/

sys@YEDB>alter tablespace test add datafile '+TESTDG' size 30g autoextend off;

Tablespace altered.

grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   286522           307200          -10339              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
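Usable_file_MB has gone negative! This means that if a failure group in testdg is lost, the disk group cannot be restored to normal redundancy, because there is not enough space left. Usable_file_MB can be monitored: as soon as it turns negative, you should extend the disk group or clean up tablespace datafiles in it. It also means that as long as free_mb is sufficient, ASM will not stop you from using the remaining disk space.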
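One simple way to monitor this is to scan the lsdg output for negative values. The following is a minimal sketch: `parse_lsdg` and `negative_usable` are hypothetical helper names, and the parser assumes the column layout shown in this post (header on the first line, whitespace-separated fields, one row per disk group). Other versions may print different columns.

```python
def parse_lsdg(text):
    """Parse whitespace-separated `asmcmd lsdg` output into a list of dicts."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]

def negative_usable(text):
    """Return names of disk groups whose Usable_file_MB is below zero."""
    return [r["Name"] for r in parse_lsdg(text) if int(r["Usable_file_MB"]) < 0]

# Two rows copied from the listing in this post.
SAMPLE = """\
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 1048576 716800 286522 307200 -10339 0 N TESTDG/
MOUNTED NORMAL N 512 4096 1048576 18432 17468 6144 5662 0 Y VOTING/
"""

print(negative_usable(SAMPLE))
```

In practice the text would come from running asmcmd lsdg as the grid user, and the returned names could be fed into whatever alerting is already in place.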
What happens if we now drop disks from the disk group?
16:51:19 SQL> alter diskgroup testdg drop disks in failgroup f1;

Diskgroup altered.
The drop command is still accepted. Let's check the rebalance status and look at the ASM alert log:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
16:56:31 SQL> select * from v$asm_operation;
GROUP_NUMBER OPERATION  STATE         POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- ----------- ----------------------------------------------------------------------------------------
           4 REBAL      ERRS             11                                                         ORA-15041
Elapsed: 00:00:00.14
SQL> alter diskgroup testdg drop disks in failgroup f1
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=4
Wed Jul 11 16:51:33 2018
GMON updating for reconfiguration, group 4 at 101 for pid 28, osid 57422
NOTE: group 4 PST updated.
Wed Jul 11 16:51:33 2018
WARNING: diskgroup compatibility limits power to 11
NOTE: membership refresh pending for group 4/0x340a0df4 (TESTDG)
GMON querying group 4 at 102 for pid 18, osid 23944
SUCCESS: refreshed membership for 4/0x340a0df4 (TESTDG)
NOTE: starting rebalance of group 4/0x340a0df4 (TESTDG) at power 11
SUCCESS: alter diskgroup testdg drop disks in failgroup f1
Starting background process ARB0
Wed Jul 11 16:51:36 2018
ARB0 started with pid=34, OS id=57507
NOTE: assigning ARB0 to group 4/0x340a0df4 (TESTDG) with 11 parallel I/Os
cellip.ora not found.
NOTE: F1X0 copy 1 relocating from 0:2 to 5:2 for diskgroup 4 (TESTDG)
NOTE: F1X0 copy 3 relocating from 5:2 to 65534:4294967294 for diskgroup 4 (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG
NOTE: Refresh completed on diskgroup TESTDG. No voting file found.
Wed Jul 11 16:56:06 2018
ERROR: ORA-15041 thrown in ARB0 for group number 4
Errors in file /opt/ogrid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_57507.trc:
ORA-15041: diskgroup "TESTDG" space exhausted
Wed Jul 11 16:56:06 2018
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0x340a0df4 (TESTDG)
Indeed, the rebalance cannot complete.
The disk status at this point:
17:18:37 SQL> select name,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE from v$asm_disk where group_number=4;

NAME             MOUNT_STATUS   HEADER_STATUS            MODE_STATUS    STATE
---------------- -------------- ------------------------ -------------- ----------------
TESTDG_0001      CACHED         MEMBER                   ONLINE         DROPPING
TESTDG_0000      CACHED         MEMBER                   ONLINE         DROPPING
TESTDG_0005      CACHED         MEMBER                   ONLINE         NORMAL
TESTDG_0004      CACHED         MEMBER                   ONLINE         NORMAL
TESTDG_0003      CACHED         MEMBER                   ONLINE         NORMAL
TESTDG_0002      CACHED         MEMBER                   ONLINE         NORMAL
Disks whose drop has not yet completed can be brought back with the undrop operation:
17:17:04 SQL> alter diskgroup testdg undrop disks ;

Diskgroup altered.

Elapsed: 00:00:07.16
But the rebalance still reports an error:
GROUP_NUMBER OPERATION  STATE         POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- ----------- ----------------------------------------------------------------------------------------
           4 REBAL      ERRS             11                                                         ORA-15041
This shows again that the rebalance cannot complete while Usable_file_MB is negative. Let's free up some space by dropping a datafile:
sys@YEDB>alter tablespace test drop datafile '+TESTDG/yedb/datafile/test.262.981218519';

Tablespace altered.

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  Y         512   4096  1048576    716800   347965           307200           20382              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/

17:23:29 SQL> select * from v$asm_operation;

no rows selected
Next, let's create a normal-redundancy disk group with only two failure groups:
17:32:07 SQL> create diskgroup testdg normal redundancy failgroup f1 disk
17:32:17   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g
17:32:17   3  failgroup f2 disk
17:32:17   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g
17:32:17   5  ;

grid@com1:/opt/ogrid/diag/asm/+asm/+ASM1/trace>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    409600   409494           102400          153547              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
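Surprisingly, Req_mir_free_MB is now 100 GB. Isn't this value supposed to be the size of the largest failure group, which would be 200 GB?
On reflection it makes sense. The two failure groups hold the two copies of the data, so if one failure group is lost, can ASM still restore redundancy? It cannot. So ASM only reserves the space needed after the loss of a single disk.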
REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB under high redundancy:
17:57:26 SQL> create diskgroup testdg high redundancy failgroup f1 disk
17:57:42   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g
17:57:42   3  failgroup f2 disk
17:57:42   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g
17:57:42   5  failgroup f3 disk
17:57:42   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g
17:57:42   7  failgroup f4 disk '/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g failgroup f5 disk '/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g failgroup f6 disk '/dev/qdata/mpath-s03.3265.01.P0B00S12' size 100g;

Diskgroup created.

Elapsed: 00:00:08.32
17:57:50 SQL> !asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  HIGH    N         512   4096  1048576    614400   614241           204800          136480              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
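Here req_mir_free_mb equals twice the failure group size (2 x 102400 MB), because high redundancy needs room to rebuild at least two copies.
Usable_file_MB = (614241 - 204800) / 3 = 136480 MB, as expected.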
18:15:41 SQL> create diskgroup testdg high redundancy failgroup f1 disk
18:15:46   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g
18:15:46   3  failgroup f2 disk
18:15:47   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g
18:15:47   5  failgroup f3 disk
18:15:47   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s03.3265.01.P0B00S12' size 100g
18:15:47   7  ;

Diskgroup created.

Elapsed: 00:00:09.93
18:15:57 SQL> !asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  HIGH    N         512   4096  1048576    614400   614241           204800          136480              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
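The 204800 shown here should be read not as the size of one failure group but as twice the disk size (the two numbers coincide in this layout, since each failure group holds two 100 GB disks). With only three failure groups, redundancy can never be restored once a whole failure group is lost, just as in the earlier normal-redundancy case. Because high redundancy has to cover two losses, ASM reserves twice the disk size.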
Now the last case:
18:11:04 SQL> create diskgroup testdg high redundancy failgroup f1 disk
18:11:13   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g
18:11:13   3  failgroup f2 disk
18:11:13   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g
18:11:13   5  failgroup f3 disk
18:11:13   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g
18:11:13   7  ;

Diskgroup created.

Elapsed: 00:00:07.11
18:11:20 SQL> !asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  HIGH    N         512   4096  1048576    307200   307047                0          102349              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
When ASM cannot restore redundancy even after losing a single disk, it simply sets req_mir_free_mb to 0.
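The Req_mir_free_MB values seen across the test layouts can be summarized in a small rule-of-thumb function. This is strictly a sketch that reproduces the observations in this post (taken on 11.2.0.4); it is not Oracle's documented algorithm, the name `observed_req_mir_free_mb` is hypothetical, and layouts that mix single-disk and multi-disk failure groups were not tested, so the function refuses to guess for them.

```python
GB = 1024  # MB per GB
COPIES = {"external": 1, "normal": 2, "high": 3}

def observed_req_mir_free_mb(redundancy, failgroups):
    """Reproduce the Req_mir_free_MB values observed in this post.

    failgroups is a list of failure groups, each a list of disk sizes in MB.
    """
    copies = COPIES[redundancy]
    losses = copies - 1            # failures the redundancy level is meant to survive
    if losses == 0:
        return 0                   # external redundancy: nothing is reserved
    if len(failgroups) > copies:
        # A spare failure group exists: reserve for whole failure groups.
        return losses * max(sum(fg) for fg in failgroups)
    if all(len(fg) > 1 for fg in failgroups):
        # No spare failure group: reserve for single disks only.
        return losses * max(max(fg) for fg in failgroups)
    if all(len(fg) == 1 for fg in failgroups):
        return 0                   # even one lost disk cannot be re-mirrored
    raise NotImplementedError("layout not covered by the observations")

# The five TESTDG layouts created in this post, in order.
print(observed_req_mir_free_mb("normal", [[100 * GB] * 2, [100 * GB] * 2, [100 * GB, 200 * GB]]))
print(observed_req_mir_free_mb("normal", [[100 * GB] * 2] * 2))
print(observed_req_mir_free_mb("high", [[100 * GB]] * 6))
print(observed_req_mir_free_mb("high", [[100 * GB] * 2] * 3))
print(observed_req_mir_free_mb("high", [[100 * GB]] * 3))
```

The five calls match the TESTDG rows in the listings (307200, 102400, 204800, 204800 and 0). Any other layout should be checked against a real lsdg run rather than this function.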