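I recently load-tested a server on Microsoft's cloud, and these notes record a performance bottleneck caused by interrupts.
The bottlenecked machine is a web server configured with 4 CPU cores.
Under load, CPU1 reached 95% utilization, while the other three CPUs stayed at about 60%.
After discussion, one possible cause is that the Ruby application is configured to run in threaded mode (to save memory); it could be run in process mode instead. (This point is still to be verified.)
Today I load-tested the server for 5 minutes and captured the interrupt counters before and after the run. The data is offered as a tuning reference only.
The analysis may be one-sided; criticism is welcome!
The deltas are summarized below: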
Interrupt count deltas over the 5-minute performance test (from /proc/interrupts; only interrupts whose counts changed are listed)
Interrupt | CPU0 | CPU1 | CPU2 | CPU3 | Description
7 | 402 | 131753 | 0 | 0 | IO-APIC hyperv
15 | 180 | 0 | 0 | 0 | IO-APIC-edge ata_piix
LOC | 45717 | 45707 | 45726 | 45734 | Local timer interrupts
IWI | 3439 | 1002 | 721 | 840 | IRQ work interrupts
RES | 25074 | 4834 | 11597 | 11956 | Rescheduling interrupts
CAL | 61 | 0 | 33 | 0 | Function call interrupts
TLB | 41 | 22 | 98 | 61 | TLB shootdowns
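A table like the one above can be produced by taking two snapshots of /proc/interrupts and subtracting them. The following is a minimal sketch, not the method actually used for this test; the function names (`parse_interrupts`, `diff_interrupts`, `sample_delta`) are hypothetical, and it assumes the usual layout of a CPU header row followed by `label: counts... description` rows.

```python
import time


def parse_interrupts(text):
    """Parse the text of /proc/interrupts into {label: [count per CPU]}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        values = []
        for field in rest.split():
            # Stop at the first non-numeric field (the description columns).
            if len(values) == ncpu or not field.isdigit():
                break
            values.append(int(field))
        # Rows such as ERR and MIS carry a single counter; pad with zeros.
        values += [0] * (ncpu - len(values))
        counts[label.strip()] = values
    return counts


def diff_interrupts(before, after):
    """Return per-CPU deltas (after minus before) for every label in `after`."""
    deltas = {}
    for label, new in after.items():
        old = before.get(label, [0] * len(new))
        deltas[label] = [n - o for n, o in zip(new, old)]
    return deltas


def sample_delta(seconds, path="/proc/interrupts"):
    """Snapshot `path`, wait `seconds`, snapshot again, and return the deltas."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return diff_interrupts(before, after)
```

Calling `sample_delta(300)` while the load test runs would cover the same 5-minute window; rows whose deltas are all zero can then be filtered out before reporting.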
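Description:
Interrupt 7 (hyperv) fired 132155 times during the 5 minutes, about 440 times per second.
CPU1 handled 131753 of these, or 99.7%.
When CPU1 was at 95% utilization, the average utilization across all CPUs was about 65%.
As a result, CPU1 was exhausted well before the other CPUs during the performance test.
The other interrupt types either fired rarely or were spread fairly evenly across the CPU cores, so no bottleneck has been found in them so far.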
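The figures in this description can be re-derived from the interrupt 7 row of the table. The sketch below only repeats that arithmetic; the variable names are illustrative, and the 300-second duration assumes the test ran for exactly 5 minutes.

```python
DURATION_S = 5 * 60  # assumed exact length of the load test, in seconds

# Deltas for interrupt 7 (hyperv) on CPU0..CPU3, taken from the table.
hyperv_delta = [402, 131753, 0, 0]

total = sum(hyperv_delta)
total_rate = total / DURATION_S           # interrupts per second, all CPUs
cpu1_rate = hyperv_delta[1] / DURATION_S  # interrupts per second on CPU1
cpu1_share = hyperv_delta[1] / total      # fraction handled by CPU1

print(total)                        # 132155
print(round(total_rate, 1))         # 440.5
print(round(cpu1_rate, 1))          # 439.2
print(round(100 * cpu1_share, 1))   # 99.7
```

The total rate works out to roughly 440.5 interrupts per second, and CPU1 alone accounts for roughly 439.2 of them.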
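Question:
The server is a Microsoft cloud virtual machine, and the interrupt source causing the bottleneck is hyperv.
What role does the hyperv interrupt source actually play in the system?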
Example (cat /proc/interrupts)

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:      14752          0          0          0   IO-APIC-edge      timer
  1:          9          0          0          0   IO-APIC-edge      i8042
  4:        478          0          0          0   IO-APIC-edge      serial
  6:          3          0          0          0   IO-APIC-edge      floppy
  7:       1834      16517       2843          6   IO-APIC           hyperv
  8:          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:        167          0          0          0   IO-APIC-edge      i8042
 14:          0          0          0          0   IO-APIC-edge      ata_piix
 15:          0          0          0          0   IO-APIC-edge      ata_piix
NMI:          0          0          0          0   Non-maskable interrupts
LOC:     201953     201963     201980     201972   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:       2415       2776       1292       1848   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:      12420       7719       5509       6198   Rescheduling interrupts
CAL:       3127        740       1762       3906   Function call interrupts
TLB:        541        600        610        565   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          3          3          3          3   Machine check polls
ERR:          0
MIS:          0
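A single snapshot can also be screened for interrupts that land mostly on one CPU. The sketch below is only an illustration: the function name `skewed_irqs`, the 75% share threshold, and the minimum count of 1000 are arbitrary assumptions, and the sample rows are copied from the output above.

```python
def skewed_irqs(table, threshold=0.75, min_total=1000):
    """Return (label, busiest CPU index, share) for rows dominated by one CPU.

    `table` maps a label to its per-CPU counts. Rows with fewer than
    `min_total` interrupts in total are ignored as too small to matter.
    """
    flagged = []
    for label, counts in table.items():
        total = sum(counts)
        if total < max(min_total, 1):
            continue
        cpu = max(range(len(counts)), key=counts.__getitem__)
        share = counts[cpu] / total
        if share >= threshold:
            flagged.append((label, cpu, round(share, 3)))
    return flagged


# A few rows from the example output (counts since boot, CPU0..CPU3).
sample = {
    "0 timer": [14752, 0, 0, 0],
    "7 hyperv": [1834, 16517, 2843, 6],
    "LOC": [201953, 201963, 201980, 201972],
    "RES": [12420, 7719, 5509, 6198],
}

print(skewed_irqs(sample))
# [('0 timer', 0, 1.0), ('7 hyperv', 1, 0.779)]
```

In this snapshot the hyperv interrupt is already weighted toward CPU1 (about 78%), while LOC and RES stay below the threshold.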
Explanation of some IRQ labels

Notes:
1. IRQ labels in /proc/interrupts and the corresponding names in the ftrace log
Interrupt as in /proc/interrupts : Name as it appears in ftrace log
NMI: Non-maskable interrupts : NMI_VECTOR
LOC: Local timer interrupts : LOCAL_TIMER_VECTOR
SPU: Spurious interrupts : SPURIOUS_APIC_VECTOR
PMI: Performance monitoring interrupts : <not added>
PND: Performance pending work : LOCAL_PENDING_VECTOR
RES: Rescheduling interrupts : RESCHEDULE_VECTOR
CAL: Function call interrupts : CALL_FUNCTION_VECTOR or CALL_FUNCTION_SINGLE_VECTOR
TLB: TLB shootdowns : INVALIDATE_TLB_VECTOR_START to INVALIDATE_TLB_VECTOR_END
TRM: Thermal event interrupts : THERMAL_APIC_VECTOR
THR: Threshold APIC interrupts : THRESHOLD_APIC_VECTOR
MCE: Machine check exceptions : <not added>
MCP: Machine check polls : <not added>
ERR: : ERROR_APIC_VECTOR
MIS: : <not added>
PLT: Platform interrupts : X86_PLATFORM_IPI_VECTOR
2. IO-APIC-edge timer: the timer here is the system timer.