Analyzing Slow Requests and Requests are Blocked Problems in a Ceph Cluster


Slow Requests, and Requests are Blocked

The ceph-osd daemon is slow to respond to a request and the ceph health detail command returns an error message similar to the following one:

HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests
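
To see at a glance which OSD holds most of the blocked operations, the per-OSD lines of this output can be tallied. The following is a minimal sketch, assuming the output has the line format shown above; the names HEALTH_DETAIL, PER_OSD, and blocked_ops_by_osd are illustrative and not part of Ceph.

```python
import re
from collections import Counter

# Sample text copied from the health output shown above.
HEALTH_DETAIL = """\
HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests
"""

# Matches only per-OSD lines such as "28 ops are blocked > 268435 sec on osd.39".
PER_OSD = re.compile(r"^(\d+) ops are blocked > ([\d.]+) sec on (osd\.\d+)$")


def blocked_ops_by_osd(text):
    """Return a Counter mapping each OSD name to its number of blocked ops."""
    counts = Counter()
    for line in text.splitlines():
        match = PER_OSD.match(line.strip())
        if match:
            counts[match.group(3)] += int(match.group(1))
    return counts


if __name__ == "__main__":
    # List the OSDs from most to fewest blocked operations.
    for osd, ops in blocked_ops_by_osd(HEALTH_DETAIL).most_common():
        print(f"{osd}: {ops} blocked ops")
```

In this sample, osd.39 accounts for 28 of the 30 blocked operations, so it is the natural place to start looking.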

In addition, the Ceph logs include an error message similar to the following ones:

2015-08-24 13:18:10.024659 osd.1 127.0.0.1:6812/3032 9 : cluster [WRN] 6 slow requests, 6 included below; oldest blocked for > 61.758455 secs

2016-07-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
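
The second kind of log line carries the reporting OSD, the age of the request, and the state it is currently stuck in. The sketch below pulls those three fields out of a line; it assumes the line layout shown above, which may differ between Ceph releases, and the names LOG_LINE, SLOW_REQUEST, and parse_slow_request are illustrative.

```python
import re

# Sample line copied from the log excerpt above ({date-time} is a placeholder in the source).
LOG_LINE = (
    "2016-07-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds old, "
    "received at {date-time}: osd_op(client.4240.0:8 "
    "benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 "
    "currently waiting for subops from [610]"
)

# Captures the OSD name, the request age, and everything after "currently".
SLOW_REQUEST = re.compile(
    r"(?P<osd>osd\.\d+) .*?\[WRN\] slow request (?P<age>[\d.]+) seconds old,"
    r".* currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return osd, age_seconds, and state for a slow request line, or None."""
    match = SLOW_REQUEST.search(line)
    if match is None:
        return None
    return {
        "osd": match.group("osd"),
        "age_seconds": float(match.group("age")),
        "state": match.group("state"),
    }


if __name__ == "__main__":
    print(parse_slow_request(LOG_LINE))
```

Summary lines such as "6 slow requests, 6 included below" do not match the pattern and yield None, so only the per-request lines are reported.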

What This Means

An OSD with slow requests is any OSD that cannot service the I/O operations in its queue within the time defined by the osd_op_complaint_time parameter. By default, this parameter is set to 30 seconds.

The main causes of OSDs having slow requests are:

    Problems with the underlying hardware, such as disk drives, hosts, racks, or network switches.
    Problems with the network. These problems are usually connected with flapping OSDs. See Section 5.1.4, “Flapping OSDs” for details.
    System load.

The following table shows the types of slow requests. Use the dump_historic_ops administration socket command to determine the type of a slow request. For details about the administration socket, see the Using the Administration Socket section in the Administration Guide for Red Hat Ceph Storage 2.

Slow request type              Description
------------------------------------------------------------------------------------------------------------------------
waiting for rw locks           The OSD is waiting to acquire a lock on a placement group for the operation.

waiting for subops             The OSD is waiting for replica OSDs to apply the operation to the journal.

no flag points reached         The OSD did not reach any major operation milestone.

waiting for degraded object    The OSDs have not replicated an object the specified number of times yet.
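
The state text reported for a slow request, for example the part after "currently" in a slow request log line, can be matched against the table above. This is a minimal sketch that assumes the state text begins with one of the four type names; the exact wording may vary between Ceph releases, and the names SLOW_REQUEST_TYPES and classify_state are illustrative.

```python
# The four slow request types and descriptions from the table above.
SLOW_REQUEST_TYPES = {
    "waiting for rw locks": "waiting to acquire a lock on a placement group",
    "waiting for subops": "waiting for replica OSDs to apply the operation to the journal",
    "no flag points reached": "no major operation milestone reached",
    "waiting for degraded object": "object not yet replicated the specified number of times",
}


def classify_state(state):
    """Return (type, description) for a state string, or None if it is not in the table."""
    normalized = state.strip().lower()
    for type_name, description in SLOW_REQUEST_TYPES.items():
        if normalized.startswith(type_name):
            return type_name, description
    return None


if __name__ == "__main__":
    print(classify_state("waiting for subops from [610]"))
```

A state that does not match any row returns None rather than a guess, which keeps unknown states visible for manual inspection.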

To Troubleshoot This Problem

1. Determine if the OSDs with slow or blocked requests share a common piece of hardware, for example a disk drive, host, rack, or network switch.

2. If the OSDs share a disk:

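    i. Use the smartmontools utility to check the health of the disk, or check the logs for errors on the disk.
      # smartctl -i /dev/sda                  Show device information, including whether SMART is supported and enabled.
      # smartctl -H /dev/sda                  Show the overall health status of the disk.
      # smartctl -l error /dev/sda            Show the disk error log.
      # smartctl -s on -a /dev/sda            Enable SMART and print all SMART information for a disk that is not behind a RAID controller.
      # smartctl -a -d megaraid,0 /dev/sda    Check a disk behind a MegaRAID controller; the 0 in megaraid,0 is the number of the physical disk on the controller.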
    
    Note:

    The smartmontools utility is included in the smartmontools package.

    ii. Use the iostat utility to get the I/O wait report (%iowait) on the OSD disk to determine if the disk is under heavy load. For example, iostat -c 1 20 prints the CPU utilization report, which includes %iowait, once per second for 20 reports.
    
    Note:

    The iostat utility is included in the sysstat package.

3. If the OSDs share a host:

    i. Check the RAM and CPU utilization. For example: free -h, top
    
    ii. Use the netstat utility to see the network statistics on the Network Interface Controllers (NICs) and troubleshoot any networking issues. See also Chapter 3, Troubleshooting Networking Issues for further information. For example: netstat -an
    
4. If the OSDs share a rack, check the network switch for the rack. For example, if you use jumbo frames, verify that the NIC in the path has jumbo frames set.

5. If you are unable to determine a common piece of hardware shared by OSDs with slow requests, or to troubleshoot and fix hardware and networking problems, open a support ticket. See Chapter 7, Contacting Red Hat Support Service for details.
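
Step 1 of the procedure above, finding a piece of hardware common to all slow OSDs, can be sketched as a lookup over placement data. The host and rack names below are hypothetical; in practice they could be transcribed from the output of ceph osd tree. The names OSD_LOCATION and shared_hardware are illustrative.

```python
# Hypothetical placement data: which host and rack each OSD lives on.
OSD_LOCATION = {
    "osd.11": {"host": "node-a", "rack": "rack-1"},
    "osd.18": {"host": "node-b", "rack": "rack-1"},
    "osd.39": {"host": "node-c", "rack": "rack-1"},
    "osd.50": {"host": "node-d", "rack": "rack-2"},
}


def shared_hardware(slow_osds, locations, levels=("host", "rack")):
    """Return (level, unit) for the narrowest level shared by every slow OSD, or None."""
    if not slow_osds:
        return None
    for level in levels:
        # Collect the distinct units (hosts, then racks) that the slow OSDs sit on.
        units = {locations[osd][level] for osd in slow_osds}
        if len(units) == 1:
            return level, units.pop()
    return None


if __name__ == "__main__":
    print(shared_hardware(["osd.11", "osd.18", "osd.39"], OSD_LOCATION))
```

The levels are checked from narrowest to widest, so a shared host is reported before a shared rack. A result of None corresponds to step 5: no common hardware could be identified.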

See Also

The Using the Administration Socket section: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/administration_guide/#using_the_administration_socket

Checking Connection States on the Server with netstat -na

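Running netstat -na on the server shows the TCP connections and their states, such as established, closed, and TIME_WAIT. A large number of connections in the TIME_WAIT state means that the server frequently initiates the connection close itself, which is undesirable. Frequent connections in the CLOSE_WAIT state mean that the system is not handling connection close requests promptly, which also points to a defect.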
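
Counting connections per state makes a build-up of TIME_WAIT or CLOSE_WAIT easy to spot. The sketch below assumes the Linux netstat -na column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the sample addresses are made up, and the names SAMPLE_NETSTAT and tcp_state_counts are illustrative.

```python
from collections import Counter

# Made-up sample in the column layout printed by netstat -na on Linux.
SAMPLE_NETSTAT = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:6800           10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:6800           10.0.0.9:51240          TIME_WAIT
tcp        0      0 10.0.0.5:6800           10.0.0.9:51241          TIME_WAIT
tcp        1      0 10.0.0.5:6800           10.0.0.7:40022          CLOSE_WAIT
"""


def tcp_state_counts(text):
    """Count TCP connections per state; the header and non-TCP lines are skipped."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # TCP rows have at least six columns, and the sixth is the state.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    for state, count in tcp_state_counts(SAMPLE_NETSTAT).most_common():
        print(f"{state}: {count}")
```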

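The Send-Q and Recv-Q columns of netstat -na also indicate how well the system is keeping up. A large Send-Q means that the system produces data faster than the connection can send it out. A large Recv-Q means that the system is not processing incoming requests in time.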

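netstat can also show whether the server is accepting client connections normally. When the server calls listen, it passes a backlog argument, which is the number of connections that are established but not yet accepted by the program. The kernel takes the smaller of the /proc/sys/net/core/somaxconn value and the backlog passed in, and uses it as the length of the queue of established connections that the server has not yet accepted.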

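netstat -na | grep PORT | grep LISTEN shows the Recv-Q value of the listening socket. If this value is large, or greater than or equal to the backlog value, the server cannot keep up with the rate at which connections are established and is not accepting new connections in time. Even if the server's internal statistics show no pressure and all request-processing metrics look normal, external service is still affected, because new connections may fail. Connections that do not fail may wait a long time before the server handles them, by which point the client may already have timed out and reconnected. Once this happens it becomes a vicious circle: connections keep being established, but every one of them fails.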
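
The check described above can be sketched in two parts: compute the effective queue length as the smaller of the listen backlog and somaxconn, then flag listening sockets whose Recv-Q has reached it. The sample values (a backlog of 512, a somaxconn of 128, and the socket rows) are assumptions for illustration, and the names effective_backlog and overloaded_listeners are hypothetical.

```python
# Made-up LISTEN rows in the column layout printed by netstat -na on Linux.
SAMPLE_LISTEN = """\
tcp      129      0 0.0.0.0:6800            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:6800           10.0.0.9:51234          ESTABLISHED
"""


def effective_backlog(listen_backlog, somaxconn):
    """Queue length the kernel uses: the smaller of the two values."""
    return min(listen_backlog, somaxconn)


def overloaded_listeners(text, limit):
    """Return (local_address, recv_q) for listening sockets with Recv-Q >= limit."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        is_listener = (
            len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN"
        )
        if is_listener and int(fields[1]) >= limit:
            flagged.append((fields[3], int(fields[1])))
    return flagged


if __name__ == "__main__":
    # Assumed values: the program passed 512 to listen, and somaxconn is 128.
    limit = effective_backlog(listen_backlog=512, somaxconn=128)
    print(limit, overloaded_listeners(SAMPLE_LISTEN, limit))
```

The somaxconn value is passed in as a parameter to keep the sketch self-contained; on a live Linux host it could be read from /proc/sys/net/core/somaxconn.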

