Contents
- Previous articles
- Enrolling the bare metal node (Enrollment)
- Creating a Flavor for bare metal instances
- Deploying a bare metal instance
- Log analysis
- Problem: Failed to create neutron ports for any PXE enabled port on node
- Problem: MissingAuthPlugin: An auth plugin is required to determine endpoint URL when fetching the Deploy Image
- Problem: provide stays stuck in provision_state: clean wait for a long time
- Problem: the ironic node cannot be scheduled
- Problem: MissingAuthPlugin: An auth plugin is required to determine endpoint URL when fetching swift_temp_url
- Problem: Timeout reached while waiting for callback for node
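Previous articles
"Ironic Bare Metal Management Service"
"The Underlying Technologies Behind the Ironic Bare Metal Management Service"
"The Deployment Workflow of Ironic Bare Metal Instances"
"The Network Model of the Ironic Bare Metal Management Service"
"Manually Integrating the Ironic Bare Metal Management Service (Rocky)"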
Enrolling the bare metal node (Enrollment)
First, choose the Ironic Driver type that matches the bare metal server's hardware. Here we use IPMI, which is widely supported.
[root@controller ~]# openstack baremetal driver show ipmi
+-------------------------------+----------------------------+
| Field | Value |
+-------------------------------+----------------------------+
| default_bios_interface | no-bios |
| default_boot_interface | pxe |
| default_console_interface | ipmitool-socat |
| default_deploy_interface | iscsi |
| default_inspect_interface | inspector |
| default_management_interface | ipmitool |
| default_network_interface | neutron |
| default_power_interface | ipmitool |
| default_raid_interface | agent |
| default_rescue_interface | no-rescue |
| default_storage_interface | noop |
| default_vendor_interface | ipmitool |
| enabled_bios_interfaces | no-bios |
| enabled_boot_interfaces | pxe |
| enabled_console_interfaces | ipmitool-socat, no-console |
| enabled_deploy_interfaces | iscsi, direct |
| enabled_inspect_interfaces | inspector |
| enabled_management_interfaces | ipmitool |
| enabled_network_interfaces | flat, neutron |
| enabled_power_interfaces | ipmitool |
| enabled_raid_interfaces | agent |
| enabled_rescue_interfaces | no-rescue |
| enabled_storage_interfaces | cinder, noop |
| enabled_vendor_interfaces | ipmitool, no-vendor |
| hosts | baremetal |
| name | ipmi |
| type | dynamic |
+-------------------------------+----------------------------+
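The command below lists the IPMI driver's properties. The cloud administrator supplies these values; only those marked Required are mandatory.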
[root@controller ~]# openstack baremetal driver property list ipmi
+------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Description |
+------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| deploy_forces_oob_reboot | Whether Ironic should force a reboot of the Node via the out-of-band channel after deployment is complete. Provides compatibility with older deploy ramdisks. Defaults to False. Optional. |
| deploy_kernel | UUID (from Glance) of the deployment kernel. Required. |
| deploy_ramdisk | UUID (from Glance) of the ramdisk that is mounted at boot time. Required. |
| force_persistent_boot_device | True to enable persistent behavior when the boot device is set during deploy and cleaning operations. Defaults to False. Optional. |
| ipmi_address | IP address or hostname of the node. Required. |
| ipmi_bridging | bridging_type; default is "no". One of "single", "dual", "no". Optional. |
| ipmi_disable_boot_timeout | By default ironic will send a raw IPMI command to disable the 60 second timeout for booting. Setting this option to False will NOT send that command on this node. The [ipmi]disable_boot_timeout will be used if this option is not set. Optional. |
| ipmi_force_boot_device | Whether Ironic should specify the boot device to the BMC each time the server is turned on, eg. because the BMC is not capable of remembering the selected boot device across power cycles; default value is False. Optional. |
| ipmi_local_address | local IPMB address for bridged requests. Used only if ipmi_bridging is set to "single" or "dual". Optional. |
| ipmi_password | password. Optional. |
| ipmi_port | remote IPMI RMCP port. Optional. |
| ipmi_priv_level | privilege level; default is ADMINISTRATOR. One of ADMINISTRATOR, CALLBACK, OPERATOR, USER. Optional. |
| ipmi_protocol_version | the version of the IPMI protocol; default is "2.0". One of "1.5", "2.0". Optional. |
| ipmi_target_address | destination address for bridged request. Required only if ipmi_bridging is set to "single" or "dual". |
| ipmi_target_channel | destination channel for bridged request. Required only if ipmi_bridging is set to "single" or "dual". |
| ipmi_terminal_port | node's UDP port to connect to. Only required for console access. |
| ipmi_transit_address | transit address for bridged request. Required only if ipmi_bridging is set to "dual". |
| ipmi_transit_channel | transit channel for bridged request. Required only if ipmi_bridging is set to "dual". |
| ipmi_username | username; default is NULL user. Optional. |
+------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
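Only three of these properties are marked Required: ipmi_address, deploy_kernel and deploy_ramdisk. The sketch below is a hypothetical pre-flight helper (missing_ipmi_keys is not part of ironic or the OpenStack CLI): it takes the key=value pairs you intend to pass via --driver-info and reports which required keys are missing. The key list is copied from the table above and may differ for other drivers or releases.

```shell
# Hypothetical pre-flight helper (bash): report which driver_info keys marked
# "Required" for the ipmi driver are missing from a list of key=value pairs.
# The key list is copied from `openstack baremetal driver property list ipmi` (Rocky).
REQUIRED_IPMI_KEYS=(ipmi_address deploy_kernel deploy_ramdisk)

missing_ipmi_keys() {
  local key arg found
  local -a missing=()
  for key in "${REQUIRED_IPMI_KEYS[@]}"; do
    found=0
    for arg in "$@"; do
      # "key=" followed by at least one character; an empty value counts as missing.
      if [[ $arg == "$key="?* ]]; then
        found=1
        break
      fi
    done
    (( found )) || missing+=("$key")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "${missing[*]}"   # space-separated list of missing keys
    return 1
  fi
  return 0
}
```

For example, `missing_ipmi_keys ipmi_address=172.18.22.106 ipmi_username=admin` prints `deploy_kernel deploy_ramdisk` and returns a non-zero status.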
Because this environment runs Rocky, we can explicitly request a newer Bare Metal API version, e.g.:
In the examples below we will use version 1.11 of the Bare metal API. This gives us the following advantages:
- Explicit power credentials validation before leaving the enroll state.
- Running node cleaning before entering the available state.
- Not exposing half-configured nodes to the scheduler.
$ export IRONIC_API_VERSION=1.11
$ export OS_BAREMETAL_API_VERSION=1.11
- Create an ironic node that uses the IPMI driver
[root@controller ~]# openstack baremetal node create --help
usage: openstack baremetal node create [-h] [-f {json,shell,table,value,yaml}]
[-c COLUMN] [--max-width <integer>]
[--fit-width] [--print-empty]
[--noindent] [--prefix PREFIX]
[--chassis-uuid <chassis>] --driver
<driver> [--driver-info <key=value>]
[--property <key=value>]
[--extra <key=value>] [--uuid <uuid>]
[--name <name>]
[--bios-interface <bios_interface>]
[--boot-interface <boot_interface>]
[--console-interface <console_interface>]
[--deploy-interface <deploy_interface>]
[--inspect-interface <inspect_interface>]
[--management-interface <management_interface>]
[--network-interface <network_interface>]
[--power-interface <power_interface>]
[--raid-interface <raid_interface>]
[--rescue-interface <rescue_interface>]
[--storage-interface <storage_interface>]
[--vendor-interface <vendor_interface>]
[--resource-class <resource_class>]
[--conductor-group <conductor_group>]
[root@controller ~]# openstack baremetal node create --driver ipmi --name BM01
+------------------------+--------------------------------------+
| Field | Value |
+------------------------+--------------------------------------+
| chassis_uuid | None |
| clean_step | {} |
| console_enabled | False |
| created_at | 2019-05-09T07:51:38+00:00 |
| driver | ipmi |
| driver_info | {} |
| driver_internal_info | {} |
| extra | {} |
| inspection_finished_at | None |
| inspection_started_at | None |
| instance_info | {} |
| instance_uuid | None |
| last_error | None |
| maintenance | False |
| maintenance_reason | None |
| name | BM01 |
| power_state | None |
| properties | {} |
| provision_state | enroll |
| provision_updated_at | None |
| reservation | None |
| target_power_state | None |
| target_provision_state | None |
| updated_at | None |
| uuid | c1729b3f-9ada-4def-8dcb-43f919b9b997 |
+------------------------+--------------------------------------+
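Many of the node's fields are still None or empty; the following steps fill them in. (From here on the commands operate on node e322f49a-ad50-468d-a031-29bde068c290, a later re-creation of BM01, as its created_at timestamp shows; substitute your own node UUID.)
- Set the deploy interface. Ironic supports several types, such as iSCSI, Direct and Ansible, each with a different behavior; choose according to your situation. Here we choose iSCSI, the simplest type, although it is not very production-friendly because it consumes Provisioning Network bandwidth. Setting hardware interfaces requires Bare Metal API 1.31 or later, hence the explicit version; the same command also sets the RAID interface to agent.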
openstack baremetal --os-baremetal-api-version 1.31 node set e322f49a-ad50-468d-a031-29bde068c290 \
--deploy-interface iscsi \
--raid-interface agent
- Set driver_info, which for this driver is the IPMI information: mainly the BMC address, port and login credentials.
openstack baremetal node set e322f49a-ad50-468d-a031-29bde068c290 \
--driver-info ipmi_username=admin \
--driver-info ipmi_password=admin \
--driver-info ipmi_address=172.18.22.106 \
--driver-info ipmi_port=623
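Before storing the credentials, it can be worth checking them directly against the BMC, so that a typo does not surface later as a failed enroll to manageable transition. A minimal sketch, assuming ipmitool is installed on the conductor host and the BMC speaks IPMI 2.0 (lanplus, matching the driver's default ipmi_protocol_version of "2.0"); ipmi_power_status and the IPMITOOL override are hypothetical names used for illustration.

```shell
# Hypothetical helper (bash): ask the BMC for its power state using the same
# values that go into driver_info. Assumes ipmitool is installed; set
# IPMITOOL=echo to print the command instead of running it.
# Note: -P puts the password on the command line (visible in `ps`).
ipmi_power_status() {
  local address=$1 port=$2 user=$3 password=$4
  # -I lanplus selects IPMI v2.0, matching the driver's default ipmi_protocol_version.
  "${IPMITOOL:-ipmitool}" -I lanplus -H "$address" -p "$port" \
    -U "$user" -P "$password" power status
}
```

`ipmi_power_status 172.18.22.106 623 admin admin` should print something like "Chassis Power is off"; an authentication or connection error here means the driver_info values will not work either.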
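NOTE: see the official IPMI driver documentation for the full list of options.
- Set the Deploy Images (the deploy kernel and ramdisk, referenced by their Glance UUIDs); the node boots from this RAMDisk during provisioning.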
openstack baremetal node set e322f49a-ad50-468d-a031-29bde068c290 \
--driver-info deploy_kernel=e650d33b-8fad-42f7-948c-5c12526bcd07 \
--driver-info deploy_ramdisk=6000a17f-0ab7-418a-990c-2009a59c3392
- Set the Provisioning/Cleaning Network (here the same Neutron network is used for both).
openstack baremetal node set e322f49a-ad50-468d-a031-29bde068c290 \
--driver-info cleaning_network=b90fce07-0f32-4ba5-a1fd-a8e5e00f9c65 \
--driver-info provisioning_network=b90fce07-0f32-4ba5-a1fd-a8e5e00f9c65
- Register the MAC address of the node's PXE NIC as an ironic port; the Provisioning Network uses this MAC address to assign the node an IP address.
openstack baremetal port create 2C:60:0C:6E:C2:A8 --node e322f49a-ad50-468d-a031-29bde068c290
NOTE: once a bare metal instance has been deployed successfully, the PXE NIC's MAC address is bound to the corresponding Tenant Network Port.
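ironic displays port addresses in lower case (compare the port show output in the deployment section), and a mistyped MAC address is the easiest mistake to make in this step. A small hypothetical helper, assuming colon- or hyphen-separated input, that validates and lower-cases the address before it is passed to openstack baremetal port create:

```shell
# Hypothetical helper (bash): validate a MAC address and print it in the
# lower-case, colon-separated form; returns non-zero on malformed input.
normalize_mac() {
  local mac=${1//-/:}   # accept hyphen-separated input as well
  mac=$(printf '%s' "$mac" | tr '[:upper:]' '[:lower:]')
  if [[ $mac =~ ^([0-9a-f]{2}:){5}[0-9a-f]{2}$ ]]; then
    printf '%s\n' "$mac"
  else
    echo "invalid MAC address: $1" >&2
    return 1
  fi
}

# Usage sketch:
#   mac=$(normalize_mac 2C:60:0C:6E:C2:A8) &&
#     openstack baremetal port create "$mac" --node e322f49a-ad50-468d-a031-29bde068c290
```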
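- Set the node's Resource Class, which Placement uses to filter allocation candidates. nova-compute (with the Ironic driver) automatically creates a Placement Resource Provider for the node, and the resource class appears in its inventory with a CUSTOM_ prefix.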
$ openstack --os-baremetal-api-version 1.21 baremetal node set e322f49a-ad50-468d-a031-29bde068c290 \
--resource-class BAREMETAL_TEST
$ openstack resource provider list
+--------------------------------------+--------------------------------------+------------+--------------------------------------+----------------------+
| uuid | name | generation | root_provider_uuid | parent_provider_uuid |
+--------------------------------------+--------------------------------------+------------+--------------------------------------+----------------------+
| 841d70e5-c3b1-4ded-8bb2-60f4784f7a0d | controller | 23 | 841d70e5-c3b1-4ded-8bb2-60f4784f7a0d | None |
| da6bcc18-34a3-4ad0-9957-7b057fbb1bbc | compute | 34 | da6bcc18-34a3-4ad0-9957-7b057fbb1bbc | None |
| e322f49a-ad50-468d-a031-29bde068c290 | e322f49a-ad50-468d-a031-29bde068c290 | 1 | e322f49a-ad50-468d-a031-29bde068c290 | None |
+--------------------------------------+--------------------------------------+------------+--------------------------------------+----------------------+
$ openstack resource provider inventory list e322f49a-ad50-468d-a031-29bde068c290
+-----------------------+------------------+----------+----------+-----------+----------+-------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+-----------------------+------------------+----------+----------+-----------+----------+-------+
| VCPU | 1.0 | 2 | 2 | 1 | 1 | 2 |
| MEMORY_MB | 1.0 | 8192 | 8192 | 1 | 1 | 8192 |
| DISK_GB | 1.0 | 100 | 100 | 1 | 1 | 100 |
| CUSTOM_BAREMETAL_TEST | 1.0 | 1 | 1 | 1 | 1 | 1 |
+-----------------------+------------------+----------+----------+-----------+----------+-------+
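The node's resource_class BAREMETAL_TEST shows up in Placement as CUSTOM_BAREMETAL_TEST, and that is the name the flavor must reference. The function below is our reading of the normalization applied on the Nova side (each run of non-alphanumeric characters becomes `_`, the result is upper-cased and prefixed with CUSTOM_); treat it as a sketch for predicting the name, not as Nova's code. rc_to_placement is a hypothetical name.

```shell
# Hypothetical helper (bash): predict the Placement resource class name for an
# ironic node resource_class, e.g. BAREMETAL_TEST -> CUSTOM_BAREMETAL_TEST.
rc_to_placement() {
  local name
  # Replace each run of characters outside [0-9A-Za-z] with one underscore,
  # then upper-case the result.
  name=$(printf '%s' "$1" | sed -E 's/[^0-9A-Za-z]+/_/g' | tr '[:lower:]' '[:upper:]')
  printf 'CUSTOM_%s\n' "$name"
}

# Usage sketch:
#   openstack flavor set --property "resources:$(rc_to_placement BAREMETAL_TEST)=1" my-baremetal-flavor
```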
- Add Resource Traits to the node, which Placement also uses to filter allocation candidates.
export OS_PLACEMENT_API_VERSION=1.17
openstack baremetal node add trait e322f49a-ad50-468d-a031-29bde068c290 \
CUSTOM_TRAIT1 HW_CPU_X86_VMX
[root@controller ~]# openstack resource provider trait list e322f49a-ad50-468d-a031-29bde068c290
+----------------+
| name |
+----------------+
| HW_CPU_X86_VMX |
| CUSTOM_TRAIT1 |
+----------------+
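NOTE: listing resource provider traits requires a sufficiently recent Placement API microversion (1.17 is used here), and adding traits on the ironic side with node add trait requires Bare Metal API 1.37 or later.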
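Two kinds of trait names appear above: standard traits such as HW_CPU_X86_VMX, which come from the os-traits library, and custom traits, which must start with CUSTOM_. A hypothetical helper (trait_kind) that classifies a name before running node add trait, assuming names consist only of upper-case letters, digits and underscores; whether a "standard" name actually exists in your os-traits version is not checked.

```shell
# Hypothetical helper (bash): classify a trait name as "custom" or "standard",
# or reject it. Standard names must also exist in os-traits (not checked here).
trait_kind() {
  local t=$1
  if [[ $t =~ ^CUSTOM_[A-Z0-9_]+$ ]]; then
    echo custom
  elif [[ $t != CUSTOM_* && $t =~ ^[A-Z0-9_]+$ ]]; then
    echo standard
  else
    echo "invalid trait name: $t" >&2
    return 1
  fi
}
```

For example, `trait_kind CUSTOM_TRAIT1` prints `custom`, `trait_kind HW_CPU_X86_VMX` prints `standard`, and a lower-case name such as `custom_trait1` is rejected.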
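- Set the node's basic resource properties (CPU count, memory in MB, local disk in GB), which are reported to Placement as part of the node's inventory.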
openstack baremetal node set e322f49a-ad50-468d-a031-29bde068c290 \
--property cpus=2 \
--property memory_mb=8192 \
--property local_gb=100
- If the bare metal server boots in UEFI mode, set the node's boot mode in its capabilities.
openstack baremetal node set e322f49a-ad50-468d-a031-29bde068c290 --property capabilities='boot_mode:uefi'
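capabilities is a single comma-separated string of key:value pairs, and `--property capabilities=...` replaces the whole string. If the node already carries other capabilities, merging avoids silently dropping them. A minimal sketch (set_capability is a hypothetical helper) that replaces one key in such a string, or appends it if absent:

```shell
# Hypothetical helper (bash): set KEY to VALUE in a comma-separated
# "key:value,key:value" capabilities string, preserving the other entries.
set_capability() {
  local caps=$1 key=$2 value=$3 item out="" found=0
  local IFS=','
  for item in $caps; do          # split on commas
    [[ -z $item ]] && continue
    if [[ ${item%%:*} == "$key" ]]; then
      item="$key:$value"         # replace the existing entry in place
      found=1
    fi
    out+="${out:+,}$item"
  done
  (( found )) || out+="${out:+,}$key:$value"
  printf '%s\n' "$out"
}

# Usage sketch ($current holds the node's existing capabilities string):
#   caps=$(set_capability "$current" boot_mode uefi)
#   openstack baremetal node set "$NODE" --property capabilities="$caps"
```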
- Validate the node information entered above.
$ openstack baremetal node validate e322f49a-ad50-468d-a031-29bde068c290
+------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Interface | Result | Reason |
+------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bios | False | Driver ipmi does not support bios (disabled or not implemented). |
| boot | False | Cannot validate image information for node e322f49a-ad50-468d-a031-29bde068c290 because one or more parameters are missing from its instance_info and insufficent information is present to boot from a remote volume. Missing are: ['ramdisk', 'kernel', 'image_source'] |
| console | False | Missing 'ipmi_terminal_port' parameter in node's driver_info. |
| deploy | False | Cannot validate image information for node e322f49a-ad50-468d-a031-29bde068c290 because one or more parameters are missing from its instance_info and insufficent information is present to boot from a remote volume. Missing are: ['ramdisk', 'kernel', 'image_source'] |
| inspect | True | |
| management | True | |
| network | True | |
| power | True | |
| raid | True | |
| rescue | False | Driver ipmi does not support rescue (disabled or not implemented). |
| storage | True | |
+------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
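NOTE: Five interfaces report False here, but none of them is fatal. bios and rescue are not supported by the ipmi driver as configured (no-bios and no-rescue), and console fails only because ipmi_terminal_port was not provided; all three are optional. boot and deploy cannot pass validation yet when Nova drives Ironic, because the image_source, kernel and ramdisk fields of instance_info are only filled in by Nova at deploy time.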
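Because some False results are expected at this stage, a script should not simply fail on any False. The sketch below assumes node validate accepts the usual `-f value -c Interface -c Result` output options, and fails only when power, management or network does not report True; that choice of mandatory interfaces is our assumption based on the note above, and check_validate_output is a hypothetical helper.

```shell
# Hypothetical helper (bash): read "<interface> <result>" lines on stdin, e.g.
#   openstack baremetal node validate <node> -f value -c Interface -c Result
# and fail only if one of the interfaces treated as mandatory is not True.
MANDATORY_INTERFACES="power management network"   # assumption, see note above

check_validate_output() {
  local iface result rest failed=""
  while read -r iface result rest; do
    if [[ " $MANDATORY_INTERFACES " == *" $iface "* && $result != "True" ]]; then
      failed+="${failed:+ }$iface"
    fi
  done
  if [[ -n $failed ]]; then
    echo "failed: $failed"
    return 1
  fi
  return 0
}
```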
- Verify that the node can be brought under management (enroll → manageable → available).
# To move a node from enroll to manageable provision state
$ openstack baremetal --os-baremetal-api-version 1.11 node manage e322f49a-ad50-468d-a031-29bde068c290
$ openstack baremetal node show e322f49a-ad50-468d-a031-29bde068c290 | grep provision_state
| provision_state | manageable
# To move a node from manageable to available provision state
$ openstack baremetal --os-baremetal-api-version 1.11 node provide e322f49a-ad50-468d-a031-29bde068c290
$ openstack baremetal node show e322f49a-ad50-468d-a031-29bde068c290 | grep provision_state
| provision_state | available
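With API 1.11, provide runs cleaning first, so the node passes through states such as cleaning and clean wait before it reaches available (a long stay in clean wait is one of the problems listed at the top of this article). A polling sketch, assuming provision_state is read with `node show -f value -c provision_state`; the function names are hypothetical, and get_provision_state is kept separate so it can be replaced, for example in tests. If your python-ironicclient offers a --wait option on manage/provide, that can be used instead.

```shell
# Hypothetical helpers (bash). get_provision_state is the only part that talks
# to ironic; redefine it to test the loop or to target another cloud.
get_provision_state() {
  openstack baremetal node show "$1" -f value -c provision_state
}

# wait_for_provision_state NODE TARGET [TIMEOUT_S] [INTERVAL_S]
# Returns 0 when TARGET is reached, 1 on a failure state or timeout,
# 2 if the state could not be read.
wait_for_provision_state() {
  local node=$1 target=$2 timeout=${3:-600} interval=${4:-10}
  local waited=0 state
  while :; do
    state=$(get_provision_state "$node") || return 2
    case $state in
      "$target") return 0 ;;
      *failed|error)
        echo "node $node entered state '$state'" >&2
        return 1 ;;
    esac
    if (( waited >= timeout )); then
      echo "timed out after ${waited}s, node $node is still in '$state'" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Usage sketch:
#   openstack baremetal --os-baremetal-api-version 1.11 node provide "$NODE"
#   wait_for_provision_state "$NODE" available 1800 15
```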
- Show the current state of the node
[root@controller ~]# openstack baremetal node show e322f49a-ad50-468d-a031-29bde068c290
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bios_interface | no-bios |
| boot_interface | pxe |
| chassis_uuid | None |
| clean_step | {} |
| conductor_group | |
| console_enabled | False |
| console_interface | ipmitool-socat |
| created_at | 2019-05-09T10:46:33+00:00 |
| deploy_interface | iscsi |
| deploy_step | {} |
| driver | ipmi |
| driver_info | {u'ipmi_port': 623, u'ipmi_username': u'admin', u'deploy_kernel': u'e650d33b-8fad-42f7-948c-5c12526bcd07', u'ipmi_address': u'172.18.22.106', u'deploy_ramdisk': u'6000a17f-0ab7-418a-990c-2009a59c3392', u'ipmi_password': u'******', u'provisioning_network': u'b90fce07-0f32-4ba5-a1fd-a8e5e00f9c65', u'cleaning_network': u'b90fce07-0f32-4ba5-a1fd-a8e5e00f9c65'} |
| driver_internal_info | {u'agent_enable_ata_secure_erase': True, u'agent_erase_devices_iterations': 1, u'agent_erase_devices_zeroize': True, u'agent_continue_if_ata_erase_failed': False} |
| extra | {} |
| fault | None |
| inspect_interface | inspector |
| inspection_finished_at | None |
| inspection_started_at | None |
| instance_info | {} |
| instance_uuid | None |
| last_error | None |
| maintenance | False |
| maintenance_reason | None |
| management_interface | ipmitool |
| name | BM01 |
| network_interface | flat |
| power_interface | ipmitool |
| power_state | power off |
| properties | {u'memory_mb': 8192, u'local_gb': 100, u'cpus': 2, u'capabilities': u'boot_mode:uefi'} |
| provision_state | available |
| provision_updated_at | 2019-05-10T09:56:23+00:00 |
| raid_config | {} |
| raid_interface | agent |
| rescue_interface | no-rescue |
| reservation | None |
| resource_class | BAREMETAL_TEST |
| storage_interface | noop |
| target_power_state | None |
| target_provision_state | None |
| target_raid_config | {} |
| traits | [u'CUSTOM_TRAIT1', u'HW_CPU_X86_VMX'] |
| updated_at | 2019-05-10T10:22:41+00:00 |
| uuid | e322f49a-ad50-468d-a031-29bde068c290 |
| vendor_interface | ipmitool |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- Verify that an Ironic Neutron Agent entry (Agent Type "Baremetal Node", Host set to the node UUID) was added automatically.
[root@baremetal ~]# openstack network agent list
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| 02ac17a4-9a27-4dd6-b11f-a6eada895432 | Open vSwitch agent | baremetal | None | :-) | UP | neutron-openvswitch-agent |
| 41925586-9119-4709-bc23-4668433bd413 | Metadata agent | controller | None | :-) | UP | neutron-metadata-agent |
| 43281ac1-7699-4a81-a5b6-d4818f8cf8f9 | Open vSwitch agent | controller | None | :-) | UP | neutron-openvswitch-agent |
| 7f879b42-4f93-4c36-b13e-e6cec004ce07 | Baremetal Node | e322f49a-ad50-468d-a031-29bde068c290 | None | :-) | UP | ironic-neutron-agent |
| b815e569-c85d-4a37-84ea-7bdc5fe5653c | DHCP agent | controller | nova | :-) | UP | neutron-dhcp-agent |
| d1ef7214-d26c-42c8-ba0b-2a1580a44446 | L3 agent | controller | nova | :-) | UP | neutron-l3-agent |
| f55311fc-635c-4985-ae6b-162f3fa8f886 | Open vSwitch agent | compute | None | :-) | UP | neutron-openvswitch-agent |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
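Creating a Flavor for bare metal instances
Official documentation: https://docs.openstack.org/ironic/latest/install/configure-nova-flavors.html
In the Queens release, Ironic added a Traits API. A node's traits can be registered in the Compute service's Placement API, so that bare metal nodes enrolled in Ironic appear in Placement's resource inventory together with their traits and can be selected when an instance is scheduled.
In this article we schedule bare metal nodes through Placement: the Resource Class identifies the node's resource type, Resource Traits describe its characteristics, and setting resources:VCPU=0, resources:MEMORY_MB=0 and resources:DISK_GB=0 on the flavor disables scheduling on the standard VCPU, MEMORY_MB and DISK_GB resources.
- Create the Flavor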
openstack flavor create --ram 8192 --vcpus 2 --disk 100 my-baremetal-flavor
openstack flavor set --property resources:CUSTOM_BAREMETAL_TEST=1 my-baremetal-flavor
openstack flavor set --property resources:VCPU=0 my-baremetal-flavor
openstack flavor set --property resources:MEMORY_MB=0 my-baremetal-flavor
openstack flavor set --property resources:DISK_GB=0 my-baremetal-flavor
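The four flavor set calls above can be generated from the node's resource class and any required traits. A sketch under these assumptions: the resource class is normalized as the Placement output above suggests (BAREMETAL_TEST becomes CUSTOM_BAREMETAL_TEST), and each trait is required through a `trait:<NAME>=required` property, the form used by the DevStack baremetal flavor shown in the next section. flavor_properties is a hypothetical helper.

```shell
# Hypothetical helper (bash): print one flavor property per line for a node
# resource class plus any required traits; VCPU/MEMORY_MB/DISK_GB are zeroed.
flavor_properties() {
  local rc=$1 name trait
  shift
  # Same normalization as seen in Placement: BAREMETAL_TEST -> CUSTOM_BAREMETAL_TEST
  name=$(printf '%s' "$rc" | sed -E 's/[^0-9A-Za-z]+/_/g' | tr '[:lower:]' '[:upper:]')
  printf 'resources:CUSTOM_%s=1\n' "$name"
  printf 'resources:%s=0\n' VCPU MEMORY_MB DISK_GB   # format reused per argument
  for trait in "$@"; do
    printf 'trait:%s=required\n' "$trait"
  done
}

# Usage sketch: turn the lines into --property arguments for a single call.
#   props=()
#   while read -r p; do props+=(--property "$p"); done < <(flavor_properties BAREMETAL_TEST)
#   openstack flavor set "${props[@]}" my-baremetal-flavor
```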
- Verify by listing the allocation candidates returned by Placement
[root@controller ~]# openstack allocation candidate list --os-placement-api-version 1.17 --resource VCPU=2 --resource DISK_GB=100 --resource MEMORY_MB=8192 --resource CUSTOM_BAREMETAL_TEST=1
+---+-----------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------+------------------------------+
| # | allocation | resource provider | inventory used/capacity | traits |
+---+-----------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------+------------------------------+
| 1 | MEMORY_MB=8192,VCPU=2,DISK_GB=100,CUSTOM_BAREMETAL_TEST=1 | e322f49a-ad50-468d-a031-29bde068c290 | VCPU=0/2,MEMORY_MB=0/8192,DISK_GB=0/100,CUSTOM_BAREMETAL_TEST=0/1 | HW_CPU_X86_VMX,CUSTOM_TRAIT1 |
+---+-----------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------+------------------------------+
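The CLI call above corresponds to Placement's `GET /allocation_candidates` request; microversion 1.17 is the one that adds the `required=<traits>` filter, which is why that version is used. Because the flavor zeroes VCPU, MEMORY_MB and DISK_GB, Nova effectively requests only CUSTOM_BAREMETAL_TEST:1 (plus any required traits). A sketch that builds the query string; candidates_query is a hypothetical helper and PLACEMENT_URL in the usage comment is a placeholder for your Placement endpoint.

```shell
# Hypothetical helper (bash): build the query string for
# GET /allocation_candidates (microversion >= 1.17 for "required").
# Arguments: RESOURCES (e.g. "CUSTOM_BAREMETAL_TEST:1") [TRAIT...]
candidates_query() {
  local resources=$1
  shift
  local query="resources=$resources"
  if (( $# > 0 )); then
    local IFS=','
    query+="&required=$*"   # "$*" joins the traits with the first IFS char
  fi
  printf '%s\n' "$query"
}

# Usage sketch:
#   curl -s -H "X-Auth-Token: $(openstack token issue -f value -c id)" \
#        -H "OpenStack-API-Version: placement 1.17" \
#        "$PLACEMENT_URL/allocation_candidates?$(candidates_query CUSTOM_BAREMETAL_TEST:1 CUSTOM_TRAIT1)"
```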
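Deploying a bare metal instance
NOTE: For convenience, the deployment below was performed in a DevStack environment; see the previous articles listed above for details of that environment. Node UUIDs, the flavor and the resource class (CUSTOM_BAREMETAL) therefore differ from the enrollment example.
- Make sure the ironic node is in a deployable state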
# Maintenance = False
[root@localhost ~]# openstack baremetal node list
+--------------------------------------+--------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------+---------------+-------------+--------------------+-------------+
| 30b7ee8f-7643-40d4-ae45-2bab737e1748 | node-0 | None | power off | available | False |
...
+--------------------------------------+--------+---------------+-------------+--------------------+-------------+
# Not reserved
[root@localhost ~]# openstack resource provider inventory list 30b7ee8f-7643-40d4-ae45-2bab737e1748
+------------------+------------------+----------+----------+-----------+----------+-------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+------------------+------------------+----------+----------+-----------+----------+-------+
| CUSTOM_BAREMETAL | 1.0 | 1 | 0 | 1 | 1 | 1 |
+------------------+------------------+----------+----------+-----------+----------+-------+
# BareMetal Flavor
[root@localhost ~]# openstack flavor show 6d849a74-845f-414b-a1f3-66ae4a3c814d
+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| disk | 10 |
| id | 6d849a74-845f-414b-a1f3-66ae4a3c814d |
| name | baremetal |
| os-flavor-access:is_public | True |
| properties | cpu_arch='x86_64', resources:CUSTOM_BAREMETAL='1', resources:DISK_GB='0', resources:MEMORY_MB='0', resources:VCPU='0', trait:CUSTOM_GOLD='required' |
| ram | 1280 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
# PXE Ports
[root@localhost ~]# openstack baremetal port show 0191759f-1869-4bc4-89af-bd08d03e146f
+-----------------------+--------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------+
| address | 52:54:00:c0:b1:82 |
| created_at | 2019-05-11T11:34:17+00:00 |
| extra | {} |
| internal_info | {} |
| is_smartnic | False |
| local_link_connection | {} |
| node_uuid | 30b7ee8f-7643-40d4-ae45-2bab737e1748 |
| physical_network | None |
| portgroup_uuid | None |
| pxe_enabled | True |
| updated_at | 2019-05-11T11:37:47+00:00 |
| uuid | 0191759f-1869-4bc4-89af-bd08d03e146f |
+-----------------------+--------------------------------------+
[root@localhost ~]# openstack baremetal port show 4757e6ec-a190-42b1-a098-b05adaa0cbca
+-----------------------+--------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------+
| address | 52:54:00:30:c7:e3 |
| created_at | 2019-05-11T11:34:19+00:00 |
| extra | {} |
| internal_info | {} |
| is_smartnic | False |
| local_link_connection | {} |
| node_uuid | 30b7ee8f-7643-40d4-ae45-2bab737e1748 |
| physical_network | None |
| portgroup_uuid | None |
| pxe_enabled | True |
| updated_at | 2019-05-11T11:37:47+00:00 |
| uuid | 4757e6ec-a190-42b1-a098-b05adaa0cbca |
+-----------------------+--------------------------------------+
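The "Not reserved" check matters: in the enrollment section the node's inventory showed reserved equal to total for every resource class, and a fully reserved resource class cannot be allocated. A sketch that checks one resource class, assuming `resource provider inventory list -f value -c resource_class -c reserved` prints one "class reserved" pair per line; rc_unreserved is a hypothetical helper.

```shell
# Hypothetical helper (bash): read "<resource_class> <reserved>" lines on stdin
# and succeed only if resource class $1 is present with reserved == 0.
rc_unreserved() {
  local want=$1 rc reserved
  while read -r rc reserved _; do
    if [[ $rc == "$want" ]]; then
      if [[ $reserved == 0 ]]; then
        return 0
      fi
      echo "$want: $reserved unit(s) reserved" >&2
      return 1
    fi
  done
  echo "resource class $want not found in inventory" >&2
  return 1
}

# Usage sketch:
#   openstack resource provider inventory list 30b7ee8f-7643-40d4-ae45-2bab737e1748 \
#     -f value -c resource_class -c reserved | rc_unreserved CUSTOM_BAREMETAL
```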
- Deploy the instance
$ net_id=$(openstack network list | egrep "$PRIVATE_NETWORK_NAME"'[^-]' | awk '{ print $2 }')
$ image=$(openstack image show cirros-0.4.0-x86_64-disk -f value -c id)
$ openstack server create --flavor baremetal --nic net-id=$net_id --image $image testing
+-------------------------------------+-----------------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | None |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| adminPass | enLVnvyTQHQ3 |
| config_drive | |
| created | 2019-05-11T19:33:54Z |
| flavor | baremetal (6d849a74-845f-414b-a1f3-66ae4a3c814d) |
| hostId | |
| id | a2ccfb9a-0361-4f20-b11a-d1ea39c0f20b |
| image | cirros-0.4.0-x86_64-disk (3f734758-68fa-40ac-ba07-2bf8bd6f1911) |
| key_name | None |
| name | testing |
| progress | 0 |
| project_id | 920c5d4878f948a9879adb77aa5f6023 |
| properties | |
| security_groups | name='default' |
| status | BUILD |
| updated | 2019-05-11T19:33:54Z |
| user_id | 66dbd5cd1af34411860cf304cb4437ee |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------------------------+
- Verify
$ openstack server list
+--------------------------------------+---------+--------+-------------------+--------------------------+-----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+---------+--------+-------------------+--------------------------+-----------+
| a2ccfb9a-0361-4f20-b11a-d1ea39c0f20b | testing | ACTIVE | private=10.0.0.10 | cirros-0.4.0-x86_64-disk | baremetal |
+--------------------------------------+---------+--------+-------------------+--------------------------+-----------+
$ openstack baremetal node list
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
| 30b7ee8f-7643-40d4-ae45-2bab737e1748 | node-0 | a2ccfb9a-0361-4f20-b11a-d1ea39c0f20b | power on | active | False |
| 64d4db44-67b6-4efa-974c-141218060b59 | node-1 | None | power off | available | False |
| bf1b7c3d-85e2-4a1c-a547-1189c4efda69 | node-2 | None | power off | available | False |
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
$ ip netns exec qdhcp-485c431e-b06d-4192-89f6-a4f57b2f921b ping 10.0.0.10
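An ACTIVE server does not guarantee that the instance already answers on the network; a single ping may fail while the operating system is still booting. A generic retry sketch (retry is a hypothetical helper) that re-runs a command a fixed number of times with a delay between failed attempts:

```shell
# Hypothetical helper (bash): retry ATTEMPTS DELAY_S COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_S between failed attempts;
# returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

# Usage sketch, against the DHCP namespace used above:
#   retry 30 10 ip netns exec qdhcp-485c431e-b06d-4192-89f6-a4f57b2f921b \
#     ping -c 1 -W 2 10.0.0.10
```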
- Inspect the port mapping
[root@localhost ~]# openstack baremetal port show 0191759f-1869-4bc4-89af-bd08d03e146f
+-----------------------+------------------------------------------------------------------+
| Field | Value |
+-----------------------+------------------------------------------------------------------+
| address | 52:54:00:c0:b1:82 |
| created_at | 2019-05-11T11:34:17+00:00 |
| extra | {} |
| internal_info | {u'tenant_vif_port_id': u'26dd6399-ddec-496e-9ffe-50dc52429166'} |
| is_smartnic | False |
| local_link_connection | {} |
| node_uuid | 30b7ee8f-7643-40d4-ae45-2bab737e1748 |
| physical_network | None |
| portgroup_uuid | None |
| pxe_enabled | True |
| updated_at | 2019-05-11T19:33:58+00:00 |
| uuid | 0191759f-1869-4bc4-89af-bd08d03e146f |
+-----------------------+------------------------------------------------------------------+
[root@localhost ~]# openstack port show 26dd6399-ddec-496e-9ffe-50dc52429166
+-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | UP |
| allowed_address_pairs | |
| binding_host_id | 30b7ee8f-7643-40d4-ae45-2bab737e1748 |
| binding_profile | |
| binding_vif_details | |
| binding_vif_type | binding_failed |
| binding_vnic_type | baremetal |
| created_at | 2019-05-11T19:33:55Z |
| data_plane_status | None |
| description | |
| device_id | a2ccfb9a-0361-4f20-b11a-d1ea39c0f20b |
| device_owner | compute:nova |
| dns_assignment | None |
| dns_domain | None |
| dns_name | None |
| extra_dhcp_opts | ip_version='4', opt_name='tag:ipxe6,67', opt_value='http://192.168.1.100:3928/boot.ipxe' |
| | ip_version='4', opt_name='150', opt_value='192.168.1.100' |
| | ip_version='4', opt_name='66', opt_value='192.168.1.100' |
| | ip_version='4', opt_name='tag:ipxe,67', opt_value='http://192.168.1.100:3928/boot.ipxe' |
| | ip_version='4', opt_name='tag:!ipxe,67', opt_value='undionly.kpxe' |
| | ip_version='4', opt_name='tag:!ipxe6,67', opt_value='undionly.kpxe' |
| | ip_version='4', opt_name='server-ip-address', opt_value='192.168.1.100' |
| fixed_ips | ip_address='10.0.0.10', subnet_id='04d5bdb0-f939-4b8f-94bd-0fd1a453ed6d' |
| id | 26dd6399-ddec-496e-9ffe-50dc52429166 |
| location | Munch({'project': Munch({'domain_id': 'default', 'id': u'920c5d4878f948a9879adb77aa5f6023', 'name': 'admin', 'domain_name': None}), 'cloud': '', 'region_name': 'RegionOne', 'zone': None}) |
| mac_address | 52:54:00:c0:b1:82 |
| name | |
| network_id | 485c431e-b06d-4192-89f6-a4f57b2f921b |
| port_security_enabled | True |
| project_id | 920c5d4878f948a9879adb77aa5f6023 |
| propagate_uplink_status | None |
| qos_policy_id | None |
| resource_request | None |
| revision_number | 15 |
| security_group_ids | 536df95c-ed29-4ee2-8afe-fd317bc070fc |
| status | DOWN |
| tags | |
| trunk_details | None |
| updated_at | 2019-05-11T19:36:56Z |
+-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
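The mapping above can be followed in a script: read internal_info from the ironic port, extract tenant_vif_port_id, and pass it to openstack port show. A parsing sketch, assuming internal_info is printed either as the Python-style dict shown above, as JSON, or as key='value'; vif_port_id is a hypothetical helper. The `openstack baremetal node vif list` command also lists the VIFs attached to a node.

```shell
# Hypothetical helper (bash): extract the tenant_vif_port_id UUID from an
# ironic port's internal_info text on stdin (dict repr, JSON or key='value').
vif_port_id() {
  grep -oE "tenant_vif_port_id[\"']?[:=] *u?[\"'][0-9a-f-]{36}" |
    grep -oE '[0-9a-f-]{36}$'
}

# Usage sketch:
#   vif=$(openstack baremetal port show 0191759f-1869-4bc4-89af-bd08d03e146f \
#         -f value -c internal_info | vif_port_id)
#   openstack port show "$vif" -c binding_vif_type -c status
```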
Log analysis
# Attach the Tenant Network port (VIF) to the BM node
VIF 26dd6399-ddec-496e-9ffe-50dc52429166 successfully attached to node 30b7ee8f-7643-40d4-ae45-2bab737e1748
# The BM node enters the deploying state
Node 30b7ee8f-7643-40d4-ae45-2bab737e1748 moved to provision state "deploying" from state "available"; target provision state is "active"
# Power off the BM node if it is on (here it is already off)
Not going to change node 30b7ee8f-7643-40d4-ae45-2bab737e1748 power state because current state = requested state = 'power off'.
# unconfigure_tenant_networks: unbind the tenant ports and attach the Provisioning Network
Unbinding instance ports from node 30b7ee8f-7643-40d4-ae45-2bab737e1748
# Set the BM node's PXE boot mode (not set explicitly for this node, so the default "bios" is used)
Boot mode is not configured for node 30b7ee8f-7643-40d4-ae45-2bab737e1748 explicitly. The default boot mode is "bios", but, the default will be changed to "uefi" in the future. It is recommended to set the boot option into properties/capabilities/boot_mode for all nodes.
# Execute the deploy step
Executing deploying on node 30b7ee8f-7643-40d4-ae45-2bab737e1748, remaining steps: [{'priority': 100, 'interface': 'deploy', 'step': 'deploy', 'argsinfo': None}]
Executing {'priority': 100, 'interface': 'deploy', 'step': 'deploy', 'argsinfo': None} on node 30b7ee8f-7643-40d4-ae45-2bab737e1748
# Download the User Image into the Ironic Conductor's local cache
Master cache miss for image 3f734758-68fa-40ac-ba07-2bf8bd6f1911, starting download
# Reboot the BM node
Successfully set node 30b7ee8f-7643-40d4-ae45-2bab737e1748 power state to power on by rebooting.
Deploy step {'priority': 100, 'interface': 'deploy', 'step': 'deploy', 'argsinfo': None} on node 30b7ee8f-7643-40d4-ae45-2bab737e1748 being executed asynchronously, waiting for driver.
# The BM node enters the wait call-back state: it PXE-boots the Deploy Image, and the IPA inside the ramdisk then calls back to the Ironic Conductor
Node 30b7ee8f-7643-40d4-ae45-2bab737e1748 moved to provision state "wait call-back" from state "deploying"; target provision state is "active"
# IPA calls back to the Ironic Conductor; the BM node re-enters the deploying state and the User Image is now actually deployed
Node 30b7ee8f-7643-40d4-ae45-2bab737e1748 moved to provision state "deploying" from state "wait call-back"; target provision state is "active"
# Attach the BM node's root disk to the Ironic Conductor over iSCSI and write the User Image to it with dd
iscsiadm -m discovery -t st -p 10.0.0.10:3260
iscsiadm -m node -p 10.0.0.10:3260 -T iqn.2008-10.org.openstack:30b7ee8f-7643-40d4-ae45-2bab737e1748 --login
iscsiadm -m node -S
iscsiadm -m node -T iqn.2008-10.org.openstack:30b7ee8f-7643-40d4-ae45-2bab737e1748 -R
dd if=/var/lib/ironic/images/30b7ee8f-7643-40d4-ae45-2bab737e1748/disk of=/dev/disk/by-path/ip-10.0.0.10:3260-iscsi-iqn.2008-10.org.openstack:30b7ee8f-7643-40d4-ae45-2bab737e1748-lun-1 bs=1M oflag=direct
hexdump -s 440 -n 4 -e "0x%08x" /dev/disk/by-path/ip-10.0.0.10:3260-iscsi-iqn.2008-10.org.openstack:30b7ee8f-7643-40d4-ae45-2bab737e1748-lun-1
iscsiadm -m node -p 10.0.0.10:3260 -T iqn.2008-10.org.openstack:30b7ee8f-7643-40d4-ae45-2bab737e1748 --logout
iscsiadm -m node -p 10.0.0.10:3260 -T iqn.2008-10.org.openstack:30b7ee8f-7643-40d4-ae45-2bab737e1748 -o delete
# remove_provisioning_network: detach the Provisioning Network and re-attach the Tenant Network
Unbinding instance ports from node 30b7ee8f-7643-40d4-ae45-2bab737e1748
# Switch the boot device to the local disk, power on the BM node, and finish the deployment
Successfully set node 30b7ee8f-7643-40d4-ae45-2bab737e1748 power state to power on by power on.
Node 30b7ee8f-7643-40d4-ae45-2bab737e1748 moved to provision state "active" from state "deploying"; target provision state is "None"
Successfully deployed node 30b7ee8f-7643-40d4-ae45-2bab737e1748 with instance a2ccfb9a-0361-4f20-b11a-d1ea39c0f20b.
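When a deployment stalls, the first question is usually which provision state the node reached. A minimal sketch, assuming the conductor log lines look exactly like the ones above, that extracts the provision-state transitions in order:

```python
import re

# Matches the conductor log format shown above:
# Node <uuid> moved to provision state "<new>" from state "<old>"; ...
TRANSITION_RE = re.compile(
    r'Node (?P<node>[0-9a-f-]{36}) moved to provision state '
    r'"(?P<new>[^"]+)" from state "(?P<old>[^"]+)"'
)


def provision_transitions(lines):
    """Return (node_uuid, old_state, new_state) tuples in log order."""
    result = []
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            result.append((match["node"], match["old"], match["new"]))
    return result


if __name__ == "__main__":
    node = "30b7ee8f-7643-40d4-ae45-2bab737e1748"
    template = 'Node {} moved to provision state "{}" from state "{}"; target provision state is "{}"'
    log = [
        template.format(node, "deploying", "available", "active"),
        template.format(node, "wait call-back", "deploying", "active"),
        template.format(node, "deploying", "wait call-back", "active"),
        template.format(node, "active", "deploying", "None"),
    ]
    for _, old, new in provision_transitions(log):
        print(f"{old} -> {new}")
```

For the successful run above this prints available -> deploying -> wait call-back -> deploying -> active; a node that never leaves "wait call-back" points at the callback problem discussed at the end of this article.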
Problem: Failed to create neutron ports for any PXE enabled port on node
NetworkError: Failed to create neutron ports for any PXE enabled port on node c1729b3f-9ada-4def-8dcb-43f919b9b997.
Debugging traces the exception to the following code:
def validate_port_info(node, port):
    """Check that port contains enough information for deploy.

    Neutron network interface requires that local_link_information field is
    filled before we can use this port.

    :param node: Ironic node object.
    :param port: Ironic port object.
    :returns: True if port info is valid, False otherwise.
    """
    # Note(moshele): client-id in the port extra field indicates an InfiniBand
    # port. In this case we don't require local_link_connection to be
    # populated because the network topology is discoverable by the Infiniband
    # Subnet Manager.
    if port.extra.get('client-id'):
        return True
    if (node.network_interface == 'neutron'
            and not port.local_link_connection):
        LOG.warning("The local_link_connection is required for "
                    "'neutron' network interface and is not present "
                    "in the nodes %(node)s port %(port)s",
                    {'node': node.uuid, 'port': port.uuid})
        return False
    return True
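In other words, the node uses the 'neutron' network interface, but its Ironic (baremetal) port has no local_link_connection set.
Solution
The cause is that the Ironic Conductor uses the neutron network interface to support multi-tenant networking. This implementation requires the networking-generic-switch ML2 mechanism driver to be plugged into Neutron; its main job is to configure the ports of the physical switch that the bare metal server is uplinked to, for example switching the VLAN ID. This is how the ironic node's network is switched from the Provisioning Network to the Tenant Network, and the LLDP link discovery protocol is also used along the way to learn which switch port the node is connected to.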
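- Change the configuration ([DEFAULT] section of /etc/ironic/ironic.conf) to fall back to the flat network interface: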
default_network_interface=flat
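To see why switching to flat makes the error go away, here is a sketch that reproduces the decision logic of validate_port_info with hypothetical stand-in classes (FakeNode and FakePort are illustrative, not Ironic objects; the switch_id/port_id values are made up):

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for Ironic's node and port objects; only the
# attributes read by the check are modelled.
@dataclass
class FakeNode:
    uuid: str
    network_interface: str


@dataclass
class FakePort:
    uuid: str
    extra: dict = field(default_factory=dict)
    local_link_connection: dict = field(default_factory=dict)


def port_info_is_valid(node, port):
    """Same decision logic as validate_port_info above, without logging."""
    if port.extra.get("client-id"):  # InfiniBand port: topology comes from the Subnet Manager
        return True
    if node.network_interface == "neutron" and not port.local_link_connection:
        return False
    return True


if __name__ == "__main__":
    port = FakePort(uuid="port-1")
    print(port_info_is_valid(FakeNode("node-1", "neutron"), port))  # False: the error above
    print(port_info_is_valid(FakeNode("node-1", "flat"), port))     # True: the workaround
    # With the neutron interface, the port has to describe its switch uplink.
    port.local_link_connection = {"switch_id": "aa:bb:cc:dd:ee:ff", "port_id": "Gi1/0/1"}
    print(port_info_is_valid(FakeNode("node-1", "neutron"), port))  # True
```

Note that default_network_interface is only applied to nodes enrolled after the change; a node that already carries network_interface=neutron keeps it until it is changed explicitly, for example with `openstack baremetal node set <node> --network-interface flat`.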
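Problem: MissingAuthPlugin when fetching the Deploy Image: An auth plugin is required to determine endpoint URL
Solution
- This is a known bug; for the fix see: https://bugs.launchpad.net/openstack-ansible/+bug/1793959
- The MissingAuthPlugin error when fetching images occurs because the [glance] section is not configured in ironic.conf, e.g.: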
[glance]
url = http://controller:9292
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = glance
password = fanguiju
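A quick way to catch this before restarting the conductor is to check the [glance] section offline. A minimal sketch using Python's configparser; the list of required options is an assumption taken from the password-auth example above, not Ironic's own validation:

```python
import configparser

# Options this sketch treats as required for password auth in [glance].
# Assumption based on the example above, not Ironic's validation logic.
REQUIRED_GLANCE_AUTH = ("auth_type", "auth_url", "username", "password", "project_name")


def missing_glance_auth(conf_text):
    """Return the required [glance] auth options that are absent or empty."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("glance"):
        return list(REQUIRED_GLANCE_AUTH)
    return [opt for opt in REQUIRED_GLANCE_AUTH
            if not parser.get("glance", opt, fallback="").strip()]


if __name__ == "__main__":
    with open("/etc/ironic/ironic.conf") as f:
        print(missing_glance_auth(f.read()) or "[glance] auth options present")
```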
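Problem: provide stays stuck in provision_state clean wait for a long time
This is because moving a node from manageable to available (provide) runs automated cleaning (wiping disks, resetting configuration), which can take a very long time. Automated cleaning can be turned off.
Solution
- Abort cleaning and take the node back to manageable: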
[root@controller ~]# openstack baremetal node abort e322f49a-ad50-468d-a031-29bde068c290
[root@controller ~]# openstack baremetal node list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| e322f49a-ad50-468d-a031-29bde068c290 | BM01 | None | power off | clean failed | True |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
[root@controller ~]# openstack baremetal --os-baremetal-api-version 1.11 node manage e322f49a-ad50-468d-a031-29bde068c290
[root@controller ~]# openstack baremetal node list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| e322f49a-ad50-468d-a031-29bde068c290 | BM01 | None | power off | manageable | True |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
- Edit the configuration file
# /etc/ironic/ironic.conf
[conductor]
automated_clean = false
clean_callback_timeout = 1800
rescue_callback_timeout = 1800
soft_power_off_timeout = 600
power_state_change_timeout = 30
power_failure_recovery_interval = 300
- Restart the service
systemctl restart openstack-ironic-conductor
- Run provide again
openstack baremetal --os-baremetal-api-version 1.11 node provide e322f49a-ad50-468d-a031-29bde068c290
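The recovery above walks a small part of Ironic's provision state machine. A simplified sketch of that subset, as I understand the transitions in ironic/common/states.py (not exhaustive, and the provide() helper is a hypothetical illustration of the automated_clean switch):

```python
# Subset of Ironic provision-state transitions involved above (simplified).
TRANSITIONS = {
    ("manageable", "provide"): "cleaning",
    ("cleaning", "wait"): "clean wait",        # waiting for IPA to run clean steps
    ("clean wait", "resume"): "cleaning",
    ("clean wait", "abort"): "clean failed",   # openstack baremetal node abort
    ("clean wait", "fail"): "clean failed",    # e.g. clean_callback_timeout expired
    ("cleaning", "done"): "available",
    ("clean failed", "manage"): "manageable",  # openstack baremetal node manage
}


def apply_events(state, events):
    """Apply events in order, rejecting any the model does not allow."""
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state!r}") from None
    return state


def provide(automated_clean):
    """Hypothetical helper: with automated_clean disabled, cleaning finishes
    immediately instead of waiting for IPA callbacks."""
    events = ["provide"]
    if automated_clean:
        events += ["wait", "resume"]
    events.append("done")
    return apply_events("manageable", events)


if __name__ == "__main__":
    print(apply_events("manageable", ["provide", "wait", "abort", "manage"]))  # the recovery path above
    print(provide(automated_clean=False))
```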
Problem: the ironic node cannot be scheduled
Nova Scheduler Log:
Got no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up.
Nova Compute for Ironic Log:
Node e322f49a-ad50-468d-a031-29bde068c290 is not ready for a deployment, reporting resources as reserved for it. Node's provision state is available, power state is power off and maintenance is True
State of BM01:
[root@controller ~]# openstack baremetal node list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| e322f49a-ad50-468d-a031-29bde068c290 | BM01 | None | power off | available | True |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
Inventory of the BM01 resource provider:
[root@controller ~]# openstack resource provider inventory list e322f49a-ad50-468d-a031-29bde068c290
+-----------------------+------------------+----------+----------+-----------+----------+-------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+-----------------------+------------------+----------+----------+-----------+----------+-------+
| VCPU | 1.0 | 2 | 2 | 1 | 1 | 2 |
| MEMORY_MB | 1.0 | 8192 | 8192 | 1 | 1 | 8192 |
| DISK_GB | 1.0 | 100 | 100 | 1 | 1 | 100 |
| CUSTOM_BAREMETAL_TEST | 1.0 | 1 | 1 | 1 | 1 | 1 |
+-----------------------+------------------+----------+----------+-----------+----------+-------+
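In summary, nova-compute considers the ironic node BM01 not ready, so it "reserves" the node by setting reserved equal to total for every resource class in the resource provider inventory, which leaves nothing for Placement to allocate.
Solution
So the question is: why is BM01 not ready? The answer is Maintenance: True, i.e. the node is in maintenance mode (it was left there after the aborted cleaning, as the earlier node list shows). Take it out of maintenance: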
[root@controller ~]# openstack baremetal node maintenance unset e322f49a-ad50-468d-a031-29bde068c290
[root@controller ~]# openstack baremetal node list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| e322f49a-ad50-468d-a031-29bde068c290 | BM01 | None | power off | available | False |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
Check the BM01 resource provider inventory again:
[root@controller ~]# openstack resource provider inventory list e322f49a-ad50-468d-a031-29bde068c290
+-----------------------+------------------+----------+----------+-----------+----------+-------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+-----------------------+------------------+----------+----------+-----------+----------+-------+
| VCPU | 1.0 | 2 | 0 | 1 | 1 | 2 |
| MEMORY_MB | 1.0 | 8192 | 0 | 1 | 1 | 8192 |
| DISK_GB | 1.0 | 100 | 0 | 1 | 1 | 100 |
| CUSTOM_BAREMETAL_TEST | 1.0 | 1 | 0 | 1 | 1 | 1 |
+-----------------------+------------------+----------+----------+-----------+----------+-------+
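The two inventory tables are consistent with a simple rule. A simplified model of that rule, inferred from the tables and the nova-compute log message above rather than taken from Nova's ironic driver (which also considers other conditions, such as bad power states):

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    provision_state: str
    maintenance: bool


def reported_inventory(totals, node):
    """Simplified model: an unusable node reports reserved == total."""
    unusable = node.maintenance or node.provision_state != "available"
    return {rc: {"total": total, "reserved": total if unusable else 0}
            for rc, total in totals.items()}


if __name__ == "__main__":
    totals = {"VCPU": 2, "MEMORY_MB": 8192, "DISK_GB": 100, "CUSTOM_BAREMETAL_TEST": 1}
    print(reported_inventory(totals, NodeStatus("available", maintenance=True)))
    print(reported_inventory(totals, NodeStatus("available", maintenance=False)))
```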
Problem: MissingAuthPlugin when getting swift_temp_url: An auth plugin is required to determine endpoint URL
Cause: we chose the direct deploy interface. With it, the IPA on the bare metal server downloads the User Image from Swift Object Storage (through a Swift temporary URL) and writes the image on the bare metal side.
Official documentation: Some drivers of the Baremetal service (in particular, any drivers using Direct deploy or Ansible deploy interfaces, and some virtual media drivers) require target user images to be available over clean HTTP(S) URL with no authentication involved (neither username/password-based, nor token-based).
When using the Baremetal service integrated in OpenStack, this can be achieved by specific configuration of the Image service and Object Storage service as described below.
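A Swift temporary URL is what makes an object downloadable "with no authentication involved": the URL carries an HMAC signature and an expiry time instead of a token. A minimal sketch of the HMAC-SHA1 form described in the Swift tempurl middleware documentation; the host, path, and key values below are hypothetical:

```python
import hashlib
import hmac
import time


def swift_temp_url(host, path, key, seconds=3600, method="GET", now=None):
    """Build a Swift temporary URL: HMAC-SHA1 over method, expiry and path."""
    expires = int(now if now is not None else time.time()) + seconds
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), hashlib.sha1).hexdigest()
    return f"{host}{path}?temp_url_sig={sig}&temp_url_expires={expires}"


if __name__ == "__main__":
    # Hypothetical Swift endpoint, account/container/object path and temp URL key.
    print(swift_temp_url("http://swift.example:8080",
                         "/v1/AUTH_demo/glance/3f734758-68fa-40ac-ba07-2bf8bd6f1911",
                         "my-temp-url-key"))
```

Without Swift (and the temp URL key configured for it), Ironic has no way to produce such a URL, which is why the direct interface fails in this environment.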
Solution
Because this environment has no Swift service, we switch to the iscsi deploy interface:
openstack baremetal --os-baremetal-api-version 1.31 node set e322f49a-ad50-468d-a031-29bde068c290 \
--deploy-interface iscsi \
--raid-interface agent
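Problem: Timeout reached while waiting for callback for node
The wait call-back phase is the Ironic Conductor waiting for the IPA in the ramdisk on the bare metal node to boot and call back. If the node can boot from PXE and the network is working, IPA should be able to reach the Ironic Conductor, so the problem most likely lies in the network.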
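One way to test the network path is to check, from a host attached to the Provisioning Network, whether the Ironic API that IPA calls back to is reachable over TCP. A minimal sketch, assuming the default ironic-api port 6385 (adjust to your deployment; the address in the usage line is hypothetical):

```python
import socket

IRONIC_API_PORT = 6385  # default ironic-api port; an assumption for your deployment


def tcp_reachable(host, port=IRONIC_API_PORT, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(tcp_reachable("172.18.22.10"))  # hypothetical controller address
```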
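PXE boot flow:
- The machine boots from its PXE NIC and asks the DHCP server on the in-band network for an IP address and the location of the boot file.
- The DHCP server returns the IP address assigned to the machine and the path of the NBP (Network Bootstrap Program, which loads the boot files and starts the operating system), usually hosted on a TFTP server.
- The machine downloads the NBP from the TFTP server on the in-band network.
- Once the NBP (for example pxelinux.0 from SYSLINUX) is running, it automatically downloads the boot files from the TFTP server, such as vmlinuz (the kernel) and initrd (the initial ramdisk).
- The operating system boots (for Ironic, this is the deploy ramdisk running IPA).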
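The TFTP downloads in the flow above are plain read requests, which is why the node needs nothing more than an IP address that can reach the TFTP server. A sketch of how a read request packet is encoded per RFC 1350 (2-byte opcode 1, filename, NUL, transfer mode, NUL), sent over UDP to port 69:

```python
import struct

TFTP_RRQ = 1  # opcode for a read request (RFC 1350)


def build_rrq(filename, mode="octet"):
    """Encode a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", TFTP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


if __name__ == "__main__":
    print(build_rrq("pxelinux.0"))  # b'\x00\x01pxelinux.0\x00octet\x00'
```

Sending this packet with socket.sendto to (tftp_server, 69) from a host on the Provisioning Network is a crude way to check that the TFTP service answers.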
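The PXE flow shows that the bare metal node gets its IP address from the DHCP server on the Provisioning Network, and that address must be able to reach the TFTP service running on the Ironic Conductor. In this setup the Provisioning Network and the TFTP server therefore have to be on the same flat network, for example: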
openstack subnet create provisioning-subnet-1 --network provisioning-net-1 \
--subnet-range 172.18.22.0/24 --ip-version 4 --gateway 172.18.22.1 \
--allocation-pool start=172.18.22.237,end=172.18.22.240 --dhcp
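A quick sanity check of such a subnet against the requirement above can be done with Python's ipaddress module. A minimal sketch; the TFTP server address 172.18.22.10 in the usage line is hypothetical, the rest comes from the command above:

```python
import ipaddress


def check_provisioning_subnet(cidr, gateway, pool_start, pool_end, tftp_server):
    """Return a list of problems with the provisioning subnet layout."""
    net = ipaddress.ip_network(cidr)
    start, end = ipaddress.ip_address(pool_start), ipaddress.ip_address(pool_end)
    gw, tftp = ipaddress.ip_address(gateway), ipaddress.ip_address(tftp_server)
    problems = []
    if not (start in net and end in net and start <= end):
        problems.append("allocation pool is not a valid range inside the subnet")
    if start <= gw <= end:
        problems.append("gateway lies inside the allocation pool")
    if tftp not in net:
        problems.append("TFTP server is not on the provisioning subnet")
    if start <= tftp <= end:
        problems.append("TFTP server address may be handed out by DHCP")
    return problems


if __name__ == "__main__":
    print(check_provisioning_subnet("172.18.22.0/24", "172.18.22.1",
                                    "172.18.22.237", "172.18.22.240",
                                    "172.18.22.10"))  # hypothetical TFTP server
```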
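What actually happens in the wait call-back phase is simply this: the bare metal node reboots with PXE as its boot device. PXE takes care of the rest: obtaining a DHCP address, locating the TFTP server, fetching the boot files, and booting the Deploy Image.
A simple, brute-force way to debug this in the current environment is to connect a monitor to the bare metal server and watch the boot process.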