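Our company uses a NetApp FAS8040 as NFS storage for the test environment, which gave me a chance to test integrating OpenStack Cinder with NetApp storage.
Notes:
1. Mounting NFS exports from the NetApp storage directly in OpenStack works without any problems. This has been stable in production, with measured throughput of 160-220 MB/s.
2. Cinder cannot use the NetApp share the way a Linux host mounts a shared NFS export; it has to call the NetApp API. If the generic NFS driver is configured, cinder-volume starts normally at first, but after roughly ten minutes its State changes to down. (No screenshot available.)
Incorrect configuration file: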
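[DEFAULT]
enabled_backends = nfs
[nfs]
# The backend name in enabled_backends, the section name, and volume_backend_name (the three places highlighted in the original post) should be kept consistent. The name has nothing to do with the protocol in use; the corrected configuration below uses netapp_nfs.
volume_backend_name = nfs
# Driver type. The generic NFS backend uses this driver; third-party vendors each need their own driver setting.
volume_driver = cinder.volume.drivers.nfs.NfsDriver
nfs_sparsed_volumes = True
nfs_shares_config = /etc/cinder/nfs_shares
nfs_mount_point_base = $state_path/mnt
nfs_mount_options = v3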
[root@controller1 cinder]# vim nfs_shares
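172.16.5.242:/vol/sqmgtvm02/nfs
# Format: <NetApp storage IP>:/<exported path>. The correct entry should be 172.16.5.xx:/vol/sqmgtvm02, because the export is the volume itself and not a folder inside it. The nfs subfolder was added to keep this isolated from production data, and it led to error 2 below.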
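To make the expected shape of an nfs_shares entry concrete, here is a minimal Python sketch that splits each line into host and export path. The regular expression and the helper name parse_shares are my own assumptions for illustration, not part of Cinder; it only checks the host:/path shape and does not contact the storage.

```python
import re

# Assumed shape of one entry: "<host>:/<export path>", optionally followed by mount options.
SHARE_RE = re.compile(r"^(?P<host>[^\s:]+):(?P<path>/\S*)(?:\s+(?P<opts>.*))?$")


def parse_shares(text):
    """Return (host, export_path) tuples, skipping blank lines and # comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SHARE_RE.match(line)
        if match is None:
            raise ValueError("not a host:/path entry: %r" % line)
        entries.append((match.group("host"), match.group("path")))
    return entries


sample = """
# entry from the incorrect configuration above
172.16.5.242:/vol/sqmgtvm02/nfs
"""
print(parse_shares(sample))
```

Comparing the parsed path against the exports listed by showmount -e is then a manual step; the sketch does not do that for you.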
The errors logged in /var/log/cinder/volume.log are as follows:
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service [req-37e3e47a-e1cb-47b8-a950-73374fd8713b - - - - -] Error starting thread.
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service Traceback (most recent call last):
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 708, in run_service
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service service.start()
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/service.py", line 234, in start
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service self.manager.init_host(added_to_cluster=self.added_to_cluster)
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 425, in init_host
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service self.driver.init_capabilities()
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/volume/driver.py", line 704, in init_capabilities
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service stats = self.get_volume_stats(True)
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py", line 512, in get_volume_stats
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service self._update_volume_stats()
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/nfs.py", line 448, in _update_volume_stats
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service provisioned_capacity = self._get_provisioned_capacity()
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py", line 212, in _get_provisioned_capacity
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service run_as_root=True)
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 123, in execute
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service return processutils.execute(*cmd, **kwargs)
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 389, in execute
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service cmd=sanitized_cmd)
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service ProcessExecutionError: Unexpected error while running command.
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf du --bytes /var/lib/cinder/mnt/3d59744e62f876bf5171140e3a723d34
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service Exit code: 1
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service Stdout: u'4096\t/var/lib/cinder/mnt/3d59744e62f876bf5171140e3a723d34/.snapshot\n8268\t/var/lib/cinder/mnt/3d59744e62f876bf5171140e3a723d34\n'
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service Stderr: '/bin/du: WARNING: Circular directory structure.\nThis almost certainly means that you have a corrupted file system.\nNOTIFY YOUR SYSTEM MANAGER.\nThe following directory is part of the cycle:\n \xe2\x80\x98/var/lib/cinder/mnt/3d59744e62f876bf5171140e3a723d34/.snapshot/sv_nightly.0\xe2\x80\x99\n\n'
2017-09-09 21:33:28.066 154678 WARNING oslo_reports.guru_meditation_report [-] Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2017-09-09 21:33:28.500 154678 WARNING cinder.keymgr.conf_key_mgr [req-ba7d370f-a96c-4b3f-95fa-c6234277766e - - - - -] This key manager is insecure and is not recommended for production deployments
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume [req-ba7d370f-a96c-4b3f-95fa-c6234277766e - - - - -] Volume service controller2@nfs failed to start.
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume Traceback (most recent call last):
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume File "/usr/lib/python2.7/site-packages/cinder/cmd/volume.py", line 99, in main
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume cluster=cluster)
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume File "/usr/lib/python2.7/site-packages/cinder/service.py", line 382, in create
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume cluster=cluster)
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume File "/usr/lib/python2.7/site-packages/cinder/service.py", line 202, in __init__
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume *args, **kwargs)
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 242, in __init__
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume active_backend_id=curr_active_backend_id)
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume File "/usr/lib/python2.7/site-packages/oslo_utils/importutils.py", line 44, in import_object
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume return import_class(import_str)(*args, **kwargs)
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/common.py", line 75, in __new__
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume na_utils.check_flags(NetAppDriver.REQUIRED_FLAGS, config)
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/utils.py", line 79, in check_flags
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume raise exception.InvalidInput(reason=msg)
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume InvalidInput: Invalid input received: Configuration value netapp_storage_protocol is not set.
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume
2017-09-09 21:33:28.517 154678 ERROR cinder.cmd.volume [req-ba7d370f-a96c-4b3f-95fa-c6234277766e - - - - -] No volume service(s) started successfully, terminating.
2017-09-09 21:33:30.401 154691 WARNING oslo_reports.guru_meditation_report [-] Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2017-09-09 21:33:33.308 154691 WARNING cinder.keymgr.conf_key_mgr [req-44d8acf3-246c-4efb-aaaf-00d092a68f40 - - - - -] This key manager is insecure and is not recommended for production deployments
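The log above contains two separate failures: the generic NFS driver runs du on the mount and trips over the .snapshot directory, and the NetApp driver refuses to start because netapp_storage_protocol is not set. A rough Python sketch for pulling just the exception lines out of such a log is shown here. The regular expressions are a heuristic I wrote against the lines above, not an official oslo.log parser, and the three sample lines are copied from the log.

```python
import re

# Line prefix as seen above: date, time, pid, level, logger, optional [request context].
PREFIX_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?:\[[^\]]*\] )?(?P<msg>.*)$"
)
# Heuristic: an exception line looks like "SomeClassName: details".
EXC_RE = re.compile(r"^(?P<exc>[A-Z]\w+): (?P<detail>.+)$")
# Fields of ProcessExecutionError output that have the same shape but are not exceptions.
IGNORED = {"Command", "Stdout", "Stderr"}


def summarize(log_text):
    """Return (timestamp, exception name, detail) for each exception line at ERROR level."""
    found = []
    for line in log_text.splitlines():
        prefix = PREFIX_RE.match(line)
        if prefix is None or prefix.group("level") != "ERROR":
            continue
        exc = EXC_RE.match(prefix.group("msg"))
        if exc and exc.group("exc") not in IGNORED:
            found.append((prefix.group("ts"), exc.group("exc"), exc.group("detail")))
    return found


sample = """
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service ProcessExecutionError: Unexpected error while running command.
2017-09-07 22:07:58.983 16612 ERROR oslo_service.service Exit code: 1
2017-09-09 21:33:28.513 154678 ERROR cinder.cmd.volume InvalidInput: Invalid input received: Configuration value netapp_storage_protocol is not set.
"""
for row in summarize(sample):
    print(row)
```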
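Correct configuration file:
[netapp_nfs]
volume_backend_name = netapp_nfs
volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver
# NetApp's product line currently comes in two modes.
netapp_storage_family = ontap_7mode
# Protocol in use.
netapp_storage_protocol = nfs
# For this name, define the host-to-IP mapping in /etc/hosts. (I first used the 172.16.X.X address that serves the share and got an authentication error; switching to the NetApp management address fixed it.)
netapp_server_hostname = sqmgtvm02
netapp_server_port = 80
# Transport protocol: both https and http are supported. This setup uses http. (https is more involved to configure; see the reference link further down.)
netapp_transport_type = http
# Login user. It should have administrator privileges; this is the account and password used to log in to OnCommand.
netapp_login = root
# Login password.
netapp_password = netappxxx
# Not verified. Going by the official documentation this would be sqmgtvm02, although the option is documented for the ontap_cluster family rather than 7-Mode.
#netapp_vserver = svm_name
# File listing the NFS shares exported by the NetApp storage. Use showmount -e 172.16.5.xxx to display the exported directories.
nfs_shares_config = /etc/cinder/nfs_shares
# Base directory for local mount points. With this setting the share is mounted at /var/lib/cinder/mnt/6ff41da189e9ce5bfc54af3394adbcd8.
nfs_mount_point_base = $state_path/mnt
# Presumably the disk oversubscription ratio.
#max_over_subscription_ratio = 1.0
# Percentage of space kept in reserve so the volume never fills up completely. Ceph has a similar option; releasing the reserved space makes room for emergency deletion or migration.
#reserved_percentage = 5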
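Since both failures in the log came down to configuration, a small pre-flight check can catch them before restarting cinder-volume. The Python sketch below uses the standard configparser module. The NETAPP_OPTIONS checklist is a hypothetical list made up of the netapp_* options used in this post (it is not taken from the driver source), the enabled_backends line in the sample is assumed, and the name-consistency rule is this post's convention rather than something Cinder enforces.

```python
import configparser

# Hypothetical checklist: the netapp_* options this post sets explicitly.
NETAPP_OPTIONS = (
    "netapp_storage_family",
    "netapp_storage_protocol",
    "netapp_server_hostname",
    "netapp_login",
    "netapp_password",
)


def check_backends(conf_text):
    """Return a list of human-readable problems found in cinder.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None)  # keep $state_path literal
    parser.read_string(conf_text)
    problems = []
    enabled = parser.defaults().get("enabled_backends", "")
    for name in [item.strip() for item in enabled.split(",") if item.strip()]:
        if not parser.has_section(name):
            problems.append("enabled backend %r has no [%s] section" % (name, name))
            continue
        section = parser[name]
        backend_name = section.get("volume_backend_name", "")
        if backend_name != name:
            problems.append("[%s] volume_backend_name is %r" % (name, backend_name))
        if section.get("volume_driver", "").endswith("NetAppDriver"):
            for option in NETAPP_OPTIONS:
                if not section.get(option, "").strip():
                    problems.append("[%s] %s is not set" % (name, option))
    return problems


# Sample with two deliberate mistakes: a misspelled backend name and a missing protocol.
sample = """
[DEFAULT]
enabled_backends = netapp_nfs

[netapp_nfs]
volume_backend_name = netpp_nfs
volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver
netapp_storage_family = ontap_7mode
netapp_server_hostname = sqmgtvm02
netapp_login = root
netapp_password = netappxxx
nfs_mount_point_base = $state_path/mnt
"""
for problem in check_backends(sample):
    print(problem)
```

For the sample, the function reports the mismatched volume_backend_name and the missing netapp_storage_protocol, the same option the second traceback complains about.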
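My troubleshooting journey, pitfalls included:
Too young, too simple
After installing Cinder I assumed it would be as simple as mounting NFS. With the incorrect configuration above I mounted the share and called it a day: volumes could be created and attached, a quick dd test looked perfect, and I was ready to hand it over.
Wasting away over it
I restarted services and tweaked parameters on and off, changing all sorts of settings, but cinder-volume still went down shortly after every start.
1. Checked whether time and time zone were in sync across the nodes. The NetApp storage clock was noticeably off, and I almost went ahead and adjusted it.
2. Checked the NetApp configuration and found that NFS v3 is what is enabled, so I set nfs_mount_options = v3 in [nfs]. By default the mount tries v4.1, then v4.0, then v3.0 in turn; in production it is better to pin the NFS version the backend actually uses.
3. A search turned up the content in the screenshot below, and it clicked: big international vendors such as Cisco heavily modify public protocols, so perhaps NetApp has similar customizations.
Original link: http://www.cnblogs.com/liaojiafa/p/6392684.html
A check confirmed that Cinder does ship a NetApp-specific driver.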
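One way to repeat that check on a controller is to ask Python whether the driver modules named in the two configuration files can be located. This is only a sketch: module_available is a helper name I made up, and on a machine without Cinder installed both lookups simply report not found.

```python
import importlib.util


def module_available(dotted_name):
    """True if the module can be located (parent packages are imported, the module itself is not)."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A missing parent package, for example Cinder not being installed, ends up here.
        return False


# Module paths taken from the volume_driver settings in this post.
for name in ("cinder.volume.drivers.nfs", "cinder.volume.drivers.netapp.common"):
    print(name, "->", "found" if module_available(name) else "not found")
```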
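The Red Hat documentation also supports this inference. (Embarrassingly, I actually found it on Cisco's official site.)
Screenshot link: https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/UCS_CVDs/flexpod_openstack_osp6_design.html
Assorted depressing error screenshots:
With a direction in hand, I started reading the official OpenStack documentation and only then learned that NetApp storage comes in different families, such as 7-Mode and ontap_cluster. Our storage is 7-Mode; other NetApp models may need to be looked up separately.
Newton: https://docs.openstack.org/newton/config-reference/block-storage/drivers/netapp-volume-driver.html
Ocata: https://docs.openstack.org/ocata/config-reference/block-storage/drivers/netapp-volume-driver.html
After changing the configuration as described there and restarting the service, it still went down almost immediately. Happiness that arrives too quickly really is fake.
This time, though, the error message kindly told me that the service would soon show as down. Thanks a lot...
The error said the service was not sending heartbeat. If a heartbeat is involved, there must be a linked call whose status is being monitored. I could not find a relevant option in the official OpenStack documentation, so I clicked through to NetApp's official GitHub to look for ideas.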
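To picture why a missing heartbeat makes the service show as down, here is a simplified Python model. It assumes the service is reported as up only while its last heartbeat is younger than service_down_time, and it uses what I understand to be Cinder's documented defaults (report_interval = 10, service_down_time = 60). It is a sketch of the idea, not Cinder's actual code.

```python
from datetime import datetime, timedelta

# Assumed values: Cinder's documented defaults for these two cinder.conf options.
REPORT_INTERVAL = 10     # seconds between heartbeats from a healthy service
SERVICE_DOWN_TIME = 60   # seconds without a heartbeat before the service counts as down


def service_state(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Return 'up' or 'down' based on the age of the last heartbeat."""
    elapsed = (now - last_heartbeat).total_seconds()
    return "up" if elapsed <= down_time else "down"


start = datetime(2017, 9, 9, 21, 33, 28)
# A healthy service refreshes its heartbeat every REPORT_INTERVAL seconds.
print(service_state(start, start + timedelta(seconds=REPORT_INTERVAL)))
# A service whose driver failed stops refreshing, so the heartbeat ages out.
print(service_state(start, start + timedelta(seconds=SERVICE_DOWN_TIME + 1)))
```

This matches the symptom: the process keeps running, but once the driver reports a problem the heartbeat is no longer refreshed and the State flips to down.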

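Official configuration documentation: http://netapp.io/openstack/
GitHub: https://github.com/NetApp/cinder
Corresponding reference for NFS on 7-Mode: http://netapp.github.io/openstack-deploy-ops-guide/liberty/content/cinder.7mode.nfs.configuration.html
Although netapp_transport_type is marked as Required, none of the examples include it. After setting netapp_transport_type = http the fault persisted and the service still went down shortly after starting.
Links found while searching for the error: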
http://community.netapp.com/t5/OpenStack-Discussions/Cinder-driver-netapp-problem-KILO-Release/td-p/115209
https://community.netapp.com/t5/OpenStack-Discussions/cinder-iscsi-driver-initialization-failed/td-p/131503
https://platform9.com/support/openstack-cinder-integration-with-netapp-cluster-nfs/
https://review.openstack.org/#/c/499148/
https://bugs.launchpad.net/cinder/+bug/1660870
https://bugs.launchpad.net/cinder/+bug/1705738
https://bugs.launchpad.net/cinder/+bug/1694579
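The NetApp log showed XML errors pointing to an HTTP authentication failure. After ruling out the account and password, I found that NetApp enables SSL by default. Once SSL was disabled, authentication succeeded and the storage controller showed OpenStack Cinder connecting normally.
This step is questionable: when I re-enabled SSL authentication afterwards, no failure was reported. Still to be tested.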
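Before changing SSL settings on the controller, it can help to see which transport ports it accepts connections on at all. The Python sketch below is only a reachability probe. The port mapping assumes the usual defaults of 80 for http and 443 for https, and an open port says nothing about whether the API login will succeed.

```python
import socket

# Assumed defaults: 80 for http and 443 for https.
DEFAULT_PORTS = {"http": 80, "https": 443}


def default_port(transport_type):
    """Map a netapp_transport_type value to its usual port."""
    return DEFAULT_PORTS[transport_type.lower()]


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port can be established within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "sqmgtvm02"  # management hostname from the configuration in this post
    for transport in ("http", "https"):
        port = default_port(transport)
        state = "open" if port_open(host, port) else "closed or unreachable"
        print(transport, port, state)
```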
For HTTPS authentication, this article is a very good reference:
http://netapp.io/2017/02/15/use-certificate-verification-netapp-ontap-openstack-cinder-driver/
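Error 2:
Issues encountered in use:
Be careful with nfs_sparsed_volumes = True in production. With this option volumes are created as sparse files that take almost no space up front and only consume disk as data is written, so the share can end up overcommitted and IOPS become hard to guarantee. Setting it to False makes each new volume a regular file that occupies its full size at creation time. Do not enable it casually.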
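The difference between a sparse file and a fully allocated one is easy to see on any Linux filesystem that supports sparse files. The Python sketch below is an analogy for what the nfs_sparsed_volumes setting controls, not Cinder's own code: it creates a file with a large apparent size without writing data and compares that size with the space actually allocated.

```python
import os
import tempfile


def create_sparse(path, size_bytes):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as handle:
        handle.truncate(size_bytes)


def usage(path):
    """Return (apparent size, bytes actually allocated) for a file."""
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512  # st_blocks counts 512-byte blocks


with tempfile.TemporaryDirectory() as workdir:
    sparse = os.path.join(workdir, "volume-sparse")
    create_sparse(sparse, 64 * 1024 * 1024)  # 64 MiB apparent size
    apparent, allocated = usage(sparse)
    print("apparent bytes:", apparent, "allocated bytes:", allocated)
```

On a filesystem with sparse-file support the allocated figure stays far below the apparent size until data is written, which is exactly why a share backed by sparse volumes can be oversubscribed.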
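Once Cinder is working, every new instance first consumes storage equal to the size of its volume. Setting the create-new-volume choice to No when launching an instance skips creating the volume for the time being.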