Customers often ask why the Usable_file_MB value reported by asmcmd lsdg is negative.
Here I sort out several lsdg fields that are easy to confuse. First, the asmcmd lsdg output:
grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  1927720          2289291         -180785              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710880           914688          898096              0             N  REDODG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17506             6144            5681              0             Y  VOTING/
Total_MB: Total capacity of the disk group (in megabytes).
15:07:06 SQL> create diskgroup sydg external redundancy disk
15:14:46   2  '/dev/qdata/mpath-s01.3264.01.P0B00S04' size 100g ,
15:14:46   3  '/dev/qdata/mpath-s02.3264.01.P0B00S04' size 100g,
15:14:46   4  '/dev/qdata/mpath-s03.3264.01.P0B00S04' size 200g
15:14:46   5  ;

Diskgroup created.

Elapsed: 00:00:10.12
15:14:56 SQL> select b.name,a.failgroup,a.name,a.total_mb/1024,(a.total_mb-a.free_mb)/1024 used_mb from v$asm_disk a,v$asm_diskgroup b where a.group_number=b.group_number and b.name='SYDG' ;

NAME             FAILGROUP        NAME             A.TOTAL_MB/1024    USED_MB
---------------- ---------------- ---------------- --------------- ----------
SYDG             SYDG_0000        SYDG_0000                    100 7.51660156
SYDG             SYDG_0002        SYDG_0002                    200 15.0283203
SYDG             SYDG_0001        SYDG_0001                    100 7.51171875

Elapsed: 00:00:00.02
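So Total_MB is the combined size of all disks given when the disk group was created (100 GB + 100 GB + 200 GB = 400 GB, which lsdg reports as 409600 MB for SYDG further down). ASM also distributes data according to disk size: the 30 GB tablespace created in SYDG above was spread across the disks in proportion to their capacity, so the 200 GB disk holds about 15 GB and each 100 GB disk about 7.5 GB.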
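To make the proportional placement concrete, here is a minimal sketch that assumes ASM spreads a file's extents strictly in proportion to disk size. The real allocator works in allocation units, so the measured values above differ from it by a few MB. `expected_usage_gb` is a hypothetical helper, not an Oracle API.

```python
def expected_usage_gb(disk_sizes_gb, file_gb):
    """Split file_gb across disks in proportion to each disk's size."""
    total = sum(disk_sizes_gb)
    return [file_gb * size / total for size in disk_sizes_gb]


# SYDG: two 100 GB disks and one 200 GB disk, 30 GB tablespace
print(expected_usage_gb([100, 100, 200], 30))  # [7.5, 7.5, 15.0]
```

Compare with the measured 7.52, 7.51 and 15.03 GB in the v$asm_disk query above.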
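Free_MB: Unused capacity of the disk group (in megabytes).
This is literally the free space left on all disks of the group added together; it does not take mirroring into account.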
14:15:21 SQL> create diskgroup testdg normal redundancy failgroup f1 disk
14:16:26   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g
14:16:26   3  failgroup f2 disk
14:16:26   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g
14:16:26   5  failgroup f3 disk
14:16:26   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s03.3265.01.P0B00S12' size 200g
14:16:26   7  ;

Diskgroup created.
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   593747           307200          143273              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
Req_mir_free_MB: Amount of space that is required to be available in a given disk group in order to restore redundancy after one or more disk failures. The amount of space displayed in this column takes mirroring effects into account.
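The output above shows that TESTDG, a normal-redundancy disk group built from failgroups, needs 307200 MB of free space so that ASM has enough room to restore the data if a failgroup is lost. This example deliberately uses failgroups of different sizes, and it shows that ASM sizes the reserve for the largest failgroup, f3 (100 GB + 200 GB = 307200 MB).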
Usable_file_MB: Amount of free space that can be safely utilized taking mirroring into account and yet be able to restore redundancy after a disk failure
This is the space you can use while still preserving redundancy. In the output above, TESTDG reports 143273 MB:
16:24:21 SQL> select (593747-307200)/2 from dual;

(593747-307200)/2
-----------------
         143273.5
So the formula is (Free_MB - Req_mir_free_MB) / N, where N is the number of copies (2 for normal redundancy, 3 for high redundancy).
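The formula can be written as a small function and checked against the lsdg rows in this post. This is a minimal sketch: the copy counts (1 for EXTERN, 2 for NORMAL, 3 for HIGH) and the rounding toward zero are inferred from the outputs shown in this post, and `usable_file_mb` is a hypothetical name, not an Oracle function.

```python
# Number of copies per redundancy type, keyed by lsdg's Type column.
COPIES = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}


def usable_file_mb(free_mb, req_mir_free_mb, redundancy):
    """(Free_MB - Req_mir_free_MB) / N, truncated toward zero.

    Truncating toward zero (not flooring) matches the DATA row in this
    post: -361571 / 2 = -180785.5 is shown as -180785.
    """
    n = COPIES[redundancy]
    diff = free_mb - req_mir_free_mb
    magnitude = abs(diff) // n
    return magnitude if diff >= 0 else -magnitude


print(usable_file_mb(593747, 307200, "NORMAL"))    # 143273 (TESTDG)
print(usable_file_mb(1927720, 2289291, "NORMAL"))  # -180785 (DATA)
print(usable_file_mb(614241, 204800, "HIGH"))      # 136480
print(usable_file_mb(378822, 0, "EXTERN"))         # 378822 (SYDG)
```

Integer arithmetic is used instead of float division so the result stays exact for any disk group size.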
sys@YEDB>alter tablespace test add datafile '+TESTDG' size 30g autoextend off;

Tablespace altered.

grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   532302           307200          112551              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
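After adding a 30 GB datafile to TESTDG, the numbers change as expected: Free_MB dropped by about 60 GB (61445 MB) and Usable_file_MB by about 30 GB (30722 MB). Now keep adding datafiles.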
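Because Usable_file_MB is already net of mirroring, each new file reduces it by roughly the file's own size, which makes it easy to estimate the remaining headroom before adding files. A minimal sketch with hypothetical helper names; it ignores the few MB of extra space ASM consumed per file in the run above (61445 MB of Free_MB rather than 61440 MB).

```python
def usable_after_adding(usable_file_mb, file_mb, count=1):
    """Estimate Usable_file_MB after adding `count` files of `file_mb` MB.

    Usable_file_MB is already net of mirroring, so each file costs its own
    size once (Free_MB, by contrast, drops by file_mb * copies).
    ASM metadata overhead is ignored.
    """
    return usable_file_mb - file_mb * count


def files_that_fit(usable_file_mb, file_mb):
    """How many more files of `file_mb` MB keep Usable_file_MB >= 0."""
    return max(usable_file_mb, 0) // file_mb


FILE_MB = 30 * 1024  # one 30 GB datafile

print(usable_after_adding(143273, FILE_MB))  # 112553 (lsdg showed 112551)
print(files_that_fit(112551, FILE_MB))       # 3
```

By this estimate, three more 30 GB files fit and a fourth pushes the value below zero; the lsdg outputs below end at 20383 MB and then -10339 MB.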
sys@YEDB>alter tablespace test add datafile '+TESTDG' size 30g autoextend off;

Tablespace altered.

grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   347967           307200           20383              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/

sys@YEDB>alter tablespace test add datafile '+TESTDG' size 30g autoextend off;

Tablespace altered.

grid@com1:/home/grid>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    716800   286522           307200          -10339              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
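Usable_file_MB has turned negative! This means that if a TESTDG failgroup fails, the disk group can no longer be restored to normal redundancy, because there is not enough space. Usable_file_MB is therefore a good value to monitor: once it goes negative, you should immediately add storage to the disk group or remove datafiles from it. It also shows that ASM does not stop you from using the remaining space as long as Free_MB is sufficient.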
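Since Usable_file_MB is the value to watch, a check can be scripted around the lsdg output. A minimal sketch, assuming the header row names shown in this post; `negative_usable` is a hypothetical helper, and the sample text is the first lsdg output from this post with its columns collapsed to single spaces.

```python
def negative_usable(lsdg_text):
    """Return (name, Usable_file_MB) for every disk group whose value is < 0.

    Columns are located by header name rather than by fixed position.
    """
    lines = [line for line in lsdg_text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    usable_idx = header.index("Usable_file_MB")
    name_idx = header.index("Name")
    alerts = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # not a disk group row
        usable = int(fields[usable_idx])
        if usable < 0:
            alerts.append((fields[name_idx].rstrip("/"), usable))
    return alerts


# First lsdg output from this post (columns collapsed to single spaces)
SAMPLE = """
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 1048576 5546479 1927720 2289291 -180785 0 N DATA/
MOUNTED NORMAL N 512 4096 1048576 2744064 2710880 914688 898096 0 N REDODG/
MOUNTED NORMAL N 512 4096 1048576 18432 17506 6144 5681 0 Y VOTING/
"""

for name, usable in negative_usable(SAMPLE):
    print(f"WARNING: disk group {name} Usable_file_MB={usable}")
# prints: WARNING: disk group DATA Usable_file_MB=-180785
```

In a real check the text would come from running `asmcmd lsdg` as the grid user, for example via `subprocess.run(["asmcmd", "lsdg"], capture_output=True, text=True).stdout`, which assumes the Grid Infrastructure environment (ORACLE_HOME, ORACLE_SID) is already set.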
What happens if we now drop disks from the disk group?
16:51:19 SQL> alter diskgroup testdg drop disks in failgroup f1;

Diskgroup altered.
The disks can still be dropped. Now check the rebalance progress and the ASM alert log:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
16:56:31 SQL> select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- ----------- ----------------------------------------------------------------------------------------
4 REBAL ERRS 11 ORA-15041
Elapsed: 00:00:00.14
SQL> alter diskgroup testdg drop disks in failgroup f1
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=4
Wed Jul 11 16:51:33 2018
GMON updating for reconfiguration, group 4 at 101 for pid 28, osid 57422
NOTE: group 4 PST updated.
Wed Jul 11 16:51:33 2018
WARNING: diskgroup compatibility limits power to 11
NOTE: membership refresh pending for group 4/0x340a0df4 (TESTDG)
GMON querying group 4 at 102 for pid 18, osid 23944
SUCCESS: refreshed membership for 4/0x340a0df4 (TESTDG)
NOTE: starting rebalance of group 4/0x340a0df4 (TESTDG) at power 11
SUCCESS: alter diskgroup testdg drop disks in failgroup f1
Starting background process ARB0
Wed Jul 11 16:51:36 2018
ARB0 started with pid=34, OS id=57507
NOTE: assigning ARB0 to group 4/0x340a0df4 (TESTDG) with 11 parallel I/Os
cellip.ora not found.
NOTE: F1X0 copy 1 relocating from 0:2 to 5:2 for diskgroup 4 (TESTDG)
NOTE: F1X0 copy 3 relocating from 5:2 to 65534:4294967294 for diskgroup 4 (TESTDG)
NOTE: Attempting voting file refresh on diskgroup TESTDG
NOTE: Refresh completed on diskgroup TESTDG. No voting file found.
Wed Jul 11 16:56:06 2018
ERROR: ORA-15041 thrown in ARB0 for group number 4
Errors in file /opt/ogrid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_57507.trc:
ORA-15041: diskgroup "TESTDG" space exhausted
Wed Jul 11 16:56:06 2018
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0x340a0df4 (TESTDG)
Sure enough, the rebalance could not complete.
Disk status at this point:
17:18:37 SQL> select name,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE from v$asm_disk where group_number=4;

NAME             MOUNT_STATUS   HEADER_STATUS            MODE_STATUS    STATE
---------------- -------------- ------------------------ -------------- ----------------
TESTDG_0001      CACHED         MEMBER                   ONLINE         DROPPING
TESTDG_0000      CACHED         MEMBER                   ONLINE         DROPPING
TESTDG_0005      CACHED         MEMBER                   ONLINE         NORMAL
TESTDG_0004      CACHED         MEMBER                   ONLINE         NORMAL
TESTDG_0003      CACHED         MEMBER                   ONLINE         NORMAL
TESTDG_0002      CACHED         MEMBER                   ONLINE         NORMAL
Disks whose drop has not completed yet (state DROPPING) can be recovered with undrop:
17:17:04 SQL> alter diskgroup testdg undrop disks ;

Diskgroup altered.

Elapsed: 00:00:07.16
But the error persists:
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- ----------- ----------------------------------------------------------------------------------------
4 REBAL ERRS 11 ORA-15041
This again shows that the rebalance cannot complete while Usable_file_MB is negative. Free up some space by dropping a datafile:
sys@YEDB>alter tablespace test drop datafile '+TESTDG/yedb/datafile/test.262.981218519';

Tablespace altered.

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  Y         512   4096  1048576    716800   347965           307200           20382              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/

17:23:29 SQL> select * from v$asm_operation;

no rows selected

With Usable_file_MB positive again (20382 MB), the pending rebalance (Rebal = Y in lsdg) ran to completion, and v$asm_operation no longer returns any rows.
Next, create a normal-redundancy disk group with only 2 failgroups:
17:32:07 SQL> create diskgroup testdg normal redundancy failgroup f1 disk
17:32:17   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g
17:32:17   3  failgroup f2 disk
17:32:17   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g
17:32:17   5  ;

grid@com1:/opt/ogrid/diag/asm/+asm/+ASM1/trace>asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  NORMAL  N         512   4096  1048576    409600   409494           102400          153547              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
Surprisingly, Req_mir_free_MB is now 100 GB (102400 MB). Shouldn't it be the size of the largest failgroup, that is, 200 GB?
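A little thought explains it: with only 2 failgroups holding the 2 copies, can ASM still restore redundancy after a whole failgroup fails? It cannot. So it can only reserve the space needed to recover from the loss of a single disk (100 GB here).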
REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB with high redundancy:
17:57:26 SQL> create diskgroup testdg high redundancy failgroup f1 disk
17:57:42   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g
17:57:42   3  failgroup f2 disk
17:57:42   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g
17:57:42   5  failgroup f3 disk
17:57:42   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g
17:57:42   7  failgroup f4 disk '/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g failgroup f5 disk '/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g failgroup f6 disk '/dev/qdata/mpath-s03.3265.01.P0B00S12' size 100g;

Diskgroup created.

Elapsed: 00:00:08.32
17:57:50 SQL> !asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  HIGH    N         512   4096  1048576    614400   614241           204800          136480              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
Req_mir_free_MB equals twice the failgroup size (2 × 102400 = 204800 MB), because high redundancy needs room for at least 2 copies.
Usable_file_MB = (614241 - 204800) / 3 = 136480 MB, as expected.
18:15:41 SQL> create diskgroup testdg high redundancy failgroup f1 disk
18:15:46   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s01.3265.01.P0B00S12' size 100g
18:15:46   3  failgroup f2 disk
18:15:47   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s02.3265.01.P0B00S12' size 100g
18:15:47   5  failgroup f3 disk
18:15:47   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g,'/dev/qdata/mpath-s03.3265.01.P0B00S12' size 100g
18:15:47   7  ;

Diskgroup created.

Elapsed: 00:00:09.93
18:15:57 SQL> !asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  HIGH    N         512   4096  1048576    614400   614241           204800          136480              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
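Here 204800 is not the size of one failgroup but twice the size of one disk. With only 3 failgroups, redundancy cannot be restored after a whole failgroup fails, the same situation as the 2-failgroup normal case above; and because high redundancy reserves space for 2 copies, ASM reserves twice the disk size.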
Finally, one last case:
18:11:04 SQL> create diskgroup testdg high redundancy failgroup f1 disk
18:11:13   2  '/dev/qdata/mpath-s01.3265.01.P0B00S13' size 100g
18:11:13   3  failgroup f2 disk
18:11:13   4  '/dev/qdata/mpath-s02.3265.01.P0B00S13' size 100g
18:11:13   5  failgroup f3 disk
18:11:13   6  '/dev/qdata/mpath-s03.3265.01.P0B00S13' size 100g
18:11:13   7  ;

Diskgroup created.

Elapsed: 00:00:07.11
18:11:20 SQL> !asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576   5546479  5543405          2289291         1627057              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576   2744064  2710376           914688          897844              0             N  REDODG/
MOUNTED  EXTERN  N         512   4096  1048576    409600   378822                0          378822              0             N  SYDG/
MOUNTED  HIGH    N         512   4096  1048576    307200   307047                0          102349              0             N  TESTDG/
MOUNTED  NORMAL  N         512   4096  1048576     18432    17468             6144            5662              0             Y  VOTING/
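When ASM cannot restore redundancy even after losing a single disk (here every failgroup is a single disk, so losing any disk means losing a whole failgroup), it simply sets Req_mir_free_MB to 0.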
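Putting the experiments together, the observed Req_mir_free_MB values can be reproduced with a few rules. This is a minimal sketch fitted only to the disk groups in this post, not Oracle's documented algorithm; the function `req_mir_free_mb`, the helper `mb` and the rules themselves are hypothetical, and configurations not tested here (for example high redundancy with 4 or 5 failgroups, or failgroups mixing single and multiple disks) may behave differently.

```python
COPIES = {"NORMAL": 2, "HIGH": 3}


def req_mir_free_mb(redundancy, failgroups):
    """Reproduce Req_mir_free_MB for the disk groups in this post.

    failgroups: list of failgroups, each a list of disk sizes in MB.
    Rules inferred from the experiments above:
      * more failgroups than copies: (copies - 1) x largest failgroup
      * otherwise, every failgroup a single disk: 0 (nothing can be rebuilt)
      * otherwise: (copies - 1) x largest disk
    """
    if redundancy == "EXTERN":
        return 0
    copies = COPIES[redundancy]
    if len(failgroups) > copies:
        return (copies - 1) * max(sum(fg) for fg in failgroups)
    if all(len(fg) == 1 for fg in failgroups):
        return 0
    return (copies - 1) * max(max(fg) for fg in failgroups)


def mb(gb_groups):
    """Convert failgroup disk sizes from GB to MB."""
    return [[gb * 1024 for gb in fg] for fg in gb_groups]


print(req_mir_free_mb("NORMAL", mb([[100, 100], [100, 100], [100, 200]])))  # 307200
print(req_mir_free_mb("NORMAL", mb([[100, 100], [100, 100]])))              # 102400
print(req_mir_free_mb("HIGH", mb([[100]] * 6)))                             # 204800
print(req_mir_free_mb("HIGH", mb([[100, 100]] * 3)))                        # 204800
print(req_mir_free_mb("HIGH", mb([[100]] * 3)))                             # 0
```

Each printed value matches the Req_mir_free_MB column of the corresponding TESTDG lsdg output above.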