1 Introduction to Disk Virtualization
QEMU-KVM provides disk virtualization: from the virtual machine's point of view, the disk it owns looks like a real physical disk. In reality, the data the VM reads and writes is stored on a physical disk on the host.
QEMU-KVM mainly virtualizes disks in the following ways:
- A virtual machine image file on local storage.
- A physical disk or disk partition on the host.
- LVM (Logical Volume Management) logical volumes.
- NFS (Network File System), a network file system.
- GlusterFS (Gluster File System), a distributed file system.
2 Disk Virtualization Configuration
This section walks through the most commonly used virtual disk options, namely locally stored VM image files and LVM logical volumes.
2.1 Locally Stored Images
To use a locally stored image file, first create the image file locally, then point the virtual machine at it as its disk.
The image file is created with the qemu-img command. qemu-img ships with QEMU once it has been compiled and installed. Its most frequently used subcommands are create and info: create creates an image, and info shows information about an existing image:
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo.img -o preallocation=off 10G
Formatting 'lianhua_demo.img', fmt=raw size=10737418240 preallocation=off

real    0m0.040s
user    0m0.015s
sys     0m0.015s
[lianhua@host ~]$ qemu-img info lianhua_demo.img
image: lianhua_demo.img
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 0
As shown above, a raw-format image named lianhua_demo.img has been created with a size of 10G. However, qemu-img info reports its actual size (disk size) as 0. This is because a raw image can be a sparse file, and a sparse file only has space allocated for it when data is actually written to the image.
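Whether the file really is sparse can be confirmed by comparing its apparent size with the blocks it actually occupies; a minimal sketch, run against the image created above:

# ls -lh reports the full apparent size of 10G,
# while du -h reports roughly 0 for the sparse image (as in the du output further below)
[lianhua@host ~]$ ls -lh lianhua_demo.img
[lianhua@host ~]$ du -h lianhua_demo.img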
The preallocation option of qemu-img controls whether space is preallocated up front. It takes one of three values: off, full, or falloc. off disables preallocation; full preallocates space for the image by writing zeros to it byte by byte; falloc preallocates the disk space for the image file without writing any data into it. The three allocation modes are compared below:
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_full.img -o preallocation=full 10G
Formatting 'lianhua_demo_full.img', fmt=raw size=10737418240 preallocation=full

real    0m22.955s
user    0m0.013s
sys     0m8.930s
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_falloc.img -o preallocation=falloc 10G
Formatting 'lianhua_demo_falloc.img', fmt=raw size=10737418240 preallocation=falloc

real    0m8.256s
user    0m0.008s
sys     0m8.114s
[lianhua@host ~]$ du -h lianhua_demo*.img
11G     lianhua_demo_falloc.img
0       lianhua_demo.img
11G     lianhua_demo_full.img
There are many image formats; besides raw, other common disk formats include qcow2, vdi, and so on.
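For comparison, a qcow2 image can be created with the same command; a minimal sketch (the file names here are examples, not part of the setup above):

# qcow2 grows on demand, so no preallocation is needed for a thin image
[lianhua@host ~]$ qemu-img create -f qcow2 lianhua_demo_qcow2.img 10G
# An existing raw image can also be converted to qcow2
[lianhua@host ~]$ qemu-img convert -f raw -O qcow2 lianhua_demo.img lianhua_demo.qcow2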
Once the image has its disk space allocated, start a virtual machine with qemu-kvm and hand it the image file as its disk:
[lianhua@host ~]$ /usr/libexec/qemu-kvm -m 1024 -smp 2 -hda lianhua_demo_falloc.img -monitor stdio
WARNING: Image format was not specified for 'lianhua_demo_falloc.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) VNC server running on '::1;5900'
(qemu) info pci
  Bus  0, device   0, function 0:
    Host bridge: PCI device 8086:1237
      id ""
  Bus  0, device   1, function 1:
    IDE controller: PCI device 8086:7010
      BAR4: I/O at 0xc040 [0xc04f].
      id ""
  Bus  0, device   1, function 3:
    Bridge: PCI device 8086:7113
      IRQ 9.
      id ""
  Bus  0, device   3, function 0:
    Ethernet controller: PCI device 8086:100e
      IRQ 11.
      BAR0: 32 bit memory at 0xfebc0000 [0xfebdffff].
      BAR1: I/O at 0xc000 [0xc03f].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id ""
As the output shows, the image appears in the VM at PCI address 00:01.1 as an IDE device. The -hda option of qemu-kvm attaches the image file as the VM's first IDE disk, which shows up inside the guest as /dev/hda or /dev/sda (the name depends on the driver in use). See the qemu-kvm man page for more disk configuration options.
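The warning in the output above can be avoided by specifying the image format explicitly with -drive instead of -hda; a minimal sketch, equivalent to the command above and assuming the same image file:

[lianhua@host ~]$ /usr/libexec/qemu-kvm -m 1024 -smp 2 \
    -drive file=lianhua_demo_falloc.img,format=raw,if=ide \
    -monitor stdio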
2.2 LVM Logical Volumes
To use an LVM logical volume, first create a volume with LVM, then attach that volume to the virtual machine as its disk.
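On an OpenStack deployment the volume is normally requested through Cinder rather than created by hand; a minimal sketch using the OpenStack CLI (the size and volume name match this article's setup, everything else is an assumption):

# Ask Cinder for a 26 GiB volume; with the LVM backend it becomes a logical volume in the cinder-volumes group
[root@host ~]# openstack volume create --size 26 lianhua-vm1-vol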
On the OpenStack host, the resulting LVM volume looks like this:
[root@host ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/loop2
  VG Name               cinder-volumes
  PV Size               602.34 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              154199
  Free PE               146519
  Allocated PE          7680
  PV UUID               pTkQ5Z-zNdc-LRrn-qWAX-13D6-bhbG-DdcGFD

[root@host ~]# vgdisplay
  --- Volume group ---
  VG Name               cinder-volumes
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1447
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               602.34 GiB
  PE Size               4.00 MiB
  Total PE              154199
  Alloc PE / Size       7680 / 30.00 GiB
  Free  PE / Size       146519 / 572.34 GiB
  VG UUID               Mrrh1r-qKQw-bCgW-0WXi-d5Bd-OiVV-cTBrg5

[root@host ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/cinder-volumes/volume-c34555f0-fd26-42fe-a3b2-86098b590be2
  LV Name                volume-c34555f0-fd26-42fe-a3b2-86098b590be2
  VG Name                cinder-volumes
  LV UUID                n81Af6-cWEe-LvAm-wjA3-vgKD-RxgV-qMtIq6
  LV Write Access        read/write
  LV Creation host, time host.localdomain, 2020-08-02 00:47:30 +0800
  LV Status              available
  # open                 1
  LV Size                26.00 GiB
  Current LE             6656
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:0
After the volume has been created successfully, attach it to the virtual machine as its disk:
[root@host ~]# openstack volume list
+--------------------------------------+-----------------+--------+------+------------------------------------------+
| ID                                   | Display Name    | Status | Size | Attached to                              |
+--------------------------------------+-----------------+--------+------+------------------------------------------+
| c34555f0-fd26-42fe-a3b2-86098b590be2 | lianhua-vm1-vol | in-use |   26 | Attached to lianhua-vm1-vol on /dev/vdb  |
+--------------------------------------+-----------------+--------+------+------------------------------------------+
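The attach step itself can be performed with the OpenStack CLI; a minimal sketch, assuming the instance is named lianhua-vm1 (the volume ID is the one listed above):

# Attach the Cinder volume to the instance; the guest sees it as the next free virtio disk (here /dev/vdb)
[root@host ~]# openstack server add volume lianhua-vm1 c34555f0-fd26-42fe-a3b2-86098b590be2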
Once attached, the volume appears inside the virtual machine as the disk device /dev/vdb. Log in to the VM and look at the device in detail:
[root@lianhua-vm1:/home/robot] # fdisk -l | grep vdb
Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors
[root@lianhua-vm1:/home/robot] # lspci
...
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
[root@lianhua-vm1:/home/robot] # lspci -s 00:0b.0 -vvv
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
        Subsystem: Red Hat, Inc. Device 0002
        Physical Slot: 11
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at 1000 [size=64]
        Region 1: Memory at c0004000 (32-bit, non-prefetchable) [size=4K]
        Region 4: Memory at c0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [98] MSI-X: Enable+ Count=2 Masked-
                Vector table: BAR=1 offset=00000000
                PBA: BAR=1 offset=00000800
        Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
                BAR=0 offset=00000000 size=00000000
        Capabilities: [70] Vendor Specific Information: VirtIO: Notify
                BAR=4 offset=00003000 size=00001000 multiplier=00000004
        Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
                BAR=4 offset=00002000 size=00001000
        Capabilities: [50] Vendor Specific Information: VirtIO: ISR
                BAR=4 offset=00001000 size=00001000
        Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
                BAR=4 offset=00000000 size=00001000
        Kernel driver in use: virtio-pci
Unlike the disk attached with -hda, this device name starts with vd, because the disk is presented through virtio, a paravirtualized interface. The output above also shows that vdb sits at PCI address 00:0b.0 and uses the virtio-pci driver.
A virtio paravirtualized disk is defined as a disk element under the devices tag of the libvirt XML file:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/>
  <target dev='vdb' bus='virtio'/>
  <serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</disk>
Similarly, a virtio paravirtualized disk can be assigned directly with qemu-kvm by specifying virtio-blk-pci in its -device option together with a matching -drive option:
[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# /usr/libexec/qemu-kvm -m 1024 -smp 2 \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -drive file=/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk.config,format=raw,if=none,id=drive-virtio-disk0,readonly=on,cache=none \
    -monitor stdio
3 Deploying a Disk Virtualization Environment
Building on the previous two sections, we now deploy a simple environment that demonstrates disk virtualization and disk file sharing. The deployment consists of:
- An image file exposed to the VM as a virtio paravirtualized disk; the resulting device is vda.
- A volume exposed to the VM as a virtio paravirtualized disk; the resulting device is vdb.
- Inside the VM, LVM splits the disk device vdb into logical volumes, each of which is given a filesystem.
- The VM's filesystems are shared over NFS.
1) Check the virtio disk configuration in the libvirt XML:
<devices>
  <emulator>/usr/libexec/qemu-kvm</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source file='/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk'/>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
  </disk>
  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/>
    <target dev='vdb' bus='virtio'/>
    <serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
  </disk>
</devices>
The vda disk device is backed by the image file /var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk; inside the virtual machine it is named /dev/vda and its PCI address is 00:06.0.
Inspect the disk image file with qemu-img info:
[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# qemu-img info disk
image: disk
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 1.7G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/52968ae0bfbfeef835844ee0b97be5e45d382e4c
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
As shown, disk has a virtual capacity of 40G but currently occupies only 1.7G of real disk space.
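The backing file line also hints at why the image stays small: the qcow2 file only stores blocks that differ from the shared base image under _base. The same copy-on-write layering can be reproduced by hand; a minimal sketch (the file names are examples):

# Create a qcow2 overlay on top of an existing base image; writes go to the overlay,
# while reads of untouched blocks fall through to base.img
[lianhua@host ~]$ qemu-img create -f qcow2 -b base.img overlay.qcow2
[lianhua@host ~]$ qemu-img info overlay.qcow2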
vdb is the disk device backed by the volume; inside the virtual machine it is named /dev/vdb and its PCI address is 00:0b.0.
Log in to the virtual machine to check whether the disk devices have been allocated:
[root@lianhua-vm1:/home/robot] # fdisk -l | grep vd
Disk /dev/vda: 40 GiB, 42949672960 bytes, 83886080 sectors    # vda's capacity here is 40G
/dev/vda1  *        2048 83886046 83883999  40G 83 Linux
Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors
[root@lianhua-vm1:/home/robot] # lspci | grep block
00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
Both disk devices have been allocated inside the VM, and vda has been partitioned into vda1, which backs the operating system's filesystem.
2) Check the LVM logical volumes that vdb has been split into inside the VM:
[root@lianhua-vm1:/home/robot] # pvdisplay
  --- Physical volume ---
  PV Name               /dev/vdb
  VG Name               lianhua-vm1-vol
  PV Size               26.00 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              6655
  Free PE               1405
  Allocated PE          5250
  PV UUID               OqdKmO-PspN-0ZKe-M0l4-0vGD-cY7k-VjZvTJ

[root@lianhua-vm1:/home/robot] # vgdisplay
  --- Volume group ---
  VG Name               lianhua-vm1-vol
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  7
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                6
  Open LV               6
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <26.00 GiB
  PE Size               4.00 MiB
  Total PE              6655
  Alloc PE / Size       5250 / <20.51 GiB
  Free  PE / Size       1405 / <5.49 GiB
  VG UUID               JcVrao-YnJ7-mRpK-8Rxc-i07i-WVH4-aVgAoD

[root@lianhua-vm1:/home/robot] # lvdisplay
  --- Logical volume ---
  LV Path                /dev/lianhua-vm1-vol/provider_sys
  LV Name                provider_sys
  VG Name                lianhua-vm1-vol
  LV UUID                C6byt7-5cby-h2RT-xcLg-OJU0-Qq1E-27G6jB
  LV Write Access        read/write
  LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800
  LV Status              available
  # open                 1
  LV Size                <9.77 GiB
  Current LE             2500
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Path                /dev/lianhua-vm1-vol/provider_lianhua
  LV Name                provider_lianhua
  VG Name                lianhua-vm1-vol
  LV UUID                vfeZg8-PKVR-kKxv-yidf-rQqp-A7De-CvXqws
  LV Write Access        read/write
  LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800
  LV Status              available
  # open                 1
  LV Size                4.88 GiB
  Current LE             1250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

  --- Logical volume ---
  LV Path                /dev/lianhua-vm1-vol/provider_log
  LV Name                provider_log
  VG Name                lianhua-vm1-vol
  LV UUID                mHdD60-QjSy-sRlz-GLmK-CFIM-l42c-QGthXa
  LV Write Access        read/write
  LV Creation host, time lianhua-vm1, 2020-08-02 00:50:12 +0800
  LV Status              available
  # open                 1
  LV Size                1000.00 MiB
  Current LE             250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:2
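Splitting vdb into these volumes is plain in-guest LVM work; a minimal sketch of how such a layout could be created (the sizes and names follow the listing above and are otherwise assumptions about how this VM was prepared):

# Label vdb as a physical volume, build a volume group on it, then carve out the logical volumes
[root@lianhua-vm1:/home/robot] # pvcreate /dev/vdb
[root@lianhua-vm1:/home/robot] # vgcreate lianhua-vm1-vol /dev/vdb
[root@lianhua-vm1:/home/robot] # lvcreate -L 10000M -n provider_sys     lianhua-vm1-vol
[root@lianhua-vm1:/home/robot] # lvcreate -L 5000M  -n provider_lianhua lianhua-vm1-vol
[root@lianhua-vm1:/home/robot] # lvcreate -L 1000M  -n provider_log     lianhua-vm1-vol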
3) Give the logical volumes filesystems mounted as log/sys/lianhua, and share those filesystems over NFS:
[root@lianhua-vm1:/home/robot] # df -h
Filesystem                                      Size  Used Avail Use% Mounted on
/dev/vda1                                        40G  7.0G   31G  19% /
/dev/mapper/lianhua-vm1-vol-provider_sys        9.1G   37M  8.6G   1% /mnt/sys
/dev/mapper/lianhua-vm1-vol-provider_lianhua    4.6G   20M  4.3G   1% /mnt/lianhua
/dev/mapper/lianhua-vm1-vol-provider_log        922M   18M  838M   3% /mnt/log
(See here for a more detailed look at NFS.)
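The NFS side is a standard kernel NFS export from lianhua-vm1; a minimal sketch, assuming nfs-utils is installed on the server VM and that the two VMs can reach each other (the filesystem type, client subnet, and mount options are assumptions):

# Format one logical volume and mount it (ext4 is an example choice)
[root@lianhua-vm1:/home/robot] # mkfs.ext4 /dev/lianhua-vm1-vol/provider_log
[root@lianhua-vm1:/home/robot] # mkdir -p /mnt/log && mount /dev/lianhua-vm1-vol/provider_log /mnt/log
# Export the mount point over NFS (the client subnet is an example)
[root@lianhua-vm1:/home/robot] # echo '/mnt/log 192.168.0.0/24(rw,sync,no_root_squash)' >> /etc/exports
[root@lianhua-vm1:/home/robot] # exportfs -ra
# On VM2, mount the exported directory (lianhua-vm1 must resolve to the server's address)
[root@lianhua-vm2:~] # mount -t nfs lianhua-vm1:/mnt/log /mnt/log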
Log in to VM2 and check whether the filesystem has been shared successfully:
[root@lianhua-vm2:/mnt/log] # ls
[root@lianhua-vm2:/mnt/log] # mkdir lianhua
[root@lianhua-vm2:/mnt/log] # ls
lianhua
[root@lianhua-vm1:/mnt/log] # ls
lianhua
The directory created on VM2 is visible on VM1, so the share works and the environment is fully deployed.