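We recently load-tested a server on Microsoft's cloud. These notes record an interrupt-related performance bottleneck we ran into.
The bottlenecked machine is a web server configured with 4 CPU cores.
Under load, CPU1 reached 95% utilization while the other three CPUs sat at roughly 60%.
After discussion, one possible cause is that the Ruby application is configured to run in threaded mode (to save memory); it could be run in process mode instead. This point is still to be verified.
Today I load-tested the server for 5 minutes and captured the interrupt counters before and after the run. The data is offered only as a reference for tuning.
The analysis may be one-sided, so criticism is welcome!
The deltas are summarized below: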
Interrupt deltas over the 5-minute load test (taken from /proc/interrupts; only interrupts whose counters changed during the 5 minutes are listed)

| IRQ | CPU0 | CPU1 | CPU2 | CPU3 | Type / description | Device |
|-----|-------|--------|-------|-------|--------------------------|----------|
| 7 | 402 | 131753 | 0 | 0 | IO-APIC | hyperv |
| 15 | 180 | 0 | 0 | 0 | IO-APIC-edge | ata_piix |
| LOC | 45717 | 45707 | 45726 | 45734 | Local timer interrupts | |
| IWI | 3439 | 1002 | 721 | 840 | IRQ work interrupts | |
| RES | 25074 | 4834 | 11597 | 11956 | Rescheduling interrupts | |
| CAL | 61 | 0 | 33 | 0 | Function call interrupts | |
| TLB | 41 | 22 | 98 | 61 | TLB shootdowns | |
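A table like this can be produced by diffing two snapshots of /proc/interrupts. The following is only a minimal sketch of that idea, not the procedure actually used for the numbers above; the function names `parse_interrupts` and `interrupt_delta` are made up for illustration, and the code assumes the usual layout (a header row of CPU names, then one `label: counts... description` row per interrupt).

```python
def parse_interrupts(text):
    """Parse the text of /proc/interrupts into {label: [count per CPU]}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():  # reached the type/device columns
                break
            counts.append(int(field))
        # Rows such as ERR and MIS carry a single global counter.
        table[label.strip()] = counts
    return table


def interrupt_delta(before, after):
    """Return per-CPU increases for every interrupt whose counters changed."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    delta = {}
    for label, new_counts in new.items():
        old_counts = old.get(label, [0] * len(new_counts))
        diff = [n - o for n, o in zip(new_counts, old_counts)]
        if any(diff):
            delta[label] = diff
    return delta
```

To use it, save the output of `cat /proc/interrupts` to one file before the test and another after it, read both files into strings, and pass them to `interrupt_delta`. Rows whose counters did not move are dropped, which matches the "only interrupts that changed" filtering of the table.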
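Description:
The hyperv interrupt on IRQ 7 fired 132155 times during the 5 minutes, about 440 times per second.
CPU1 handled 131753 of them, or 99.7%.
When CPU1 was at 95% utilization, average CPU utilization was only about 65%.
As a result, CPU1 was exhausted well before the other CPUs during the load test.
The other interrupt types either fired rarely or were spread fairly evenly across the cores, so no bottleneck has been found there so far.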
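The IRQ 7 figures can be re-derived from the delta table with a few lines of arithmetic. This is just a worked check; the variable names are illustrative, and the 300-second duration assumes the test ran for exactly 5 minutes. Note that the total across all CPUs works out to about 440.5 interrupts per second, while CPU1 alone accounts for about 439.2 per second.

```python
DURATION_S = 5 * 60                # assumed: exactly 5 minutes of load
irq7_delta = [402, 131753, 0, 0]   # CPU0..CPU3, from the delta table

total = sum(irq7_delta)
rate_total = total / DURATION_S          # interrupts/s across all CPUs
rate_cpu1 = irq7_delta[1] / DURATION_S   # interrupts/s handled by CPU1
share_cpu1 = irq7_delta[1] / total       # fraction of IRQ 7 landing on CPU1

print(f"total={total} rate={rate_total:.1f}/s "
      f"cpu1_rate={rate_cpu1:.1f}/s cpu1_share={share_cpu1:.1%}")
# prints: total=132155 rate=440.5/s cpu1_rate=439.2/s cpu1_share=99.7%
```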
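Question:
The server is a virtual machine on Microsoft's cloud, and the interrupt source causing the bottleneck is hyperv.
What exactly does the hyperv interrupt source do in the system?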
Example (cat /proc/interrupts)

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:      14752          0          0          0   IO-APIC-edge      timer
  1:          9          0          0          0   IO-APIC-edge      i8042
  4:        478          0          0          0   IO-APIC-edge      serial
  6:          3          0          0          0   IO-APIC-edge      floppy
  7:       1834      16517       2843          6   IO-APIC           hyperv
  8:          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:        167          0          0          0   IO-APIC-edge      i8042
 14:          0          0          0          0   IO-APIC-edge      ata_piix
 15:          0          0          0          0   IO-APIC-edge      ata_piix
NMI:          0          0          0          0   Non-maskable interrupts
LOC:     201953     201963     201980     201972   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:       2415       2776       1292       1848   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:      12420       7719       5509       6198   Rescheduling interrupts
CAL:       3127        740       1762       3906   Function call interrupts
TLB:        541        600        610        565   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          3          3          3          3   Machine check polls
ERR:          0
MIS:          0
Explanation of some IRQ labels

Notes:

1. How the labels in /proc/interrupts map to the names in the ftrace log

| Interrupt as in /proc/interrupts | Name as it appears in ftrace log |
|----------------------------------------|------------------------------------------------------------|
| NMI: Non-maskable interrupts | NMI_VECTOR |
| LOC: Local timer interrupts | LOCAL_TIMER_VECTOR |
| SPU: Spurious interrupts | SPURIOUS_APIC_VECTOR |
| PMI: Performance monitoring interrupts | <not added> |
| PND: Performance pending work | LOCAL_PENDING_VECTOR |
| RES: Rescheduling interrupts | RESCHEDULE_VECTOR |
| CAL: Function call interrupts | CALL_FUNCTION_VECTOR or CALL_FUNCTION_SINGLE_VECTOR |
| TLB: TLB shootdowns | INVALIDATE_TLB_VECTOR_START to INVALIDATE_TLB_VECTOR_END |
| TRM: Thermal event interrupts | THERMAL_APIC_VECTOR |
| THR: Threshold APIC interrupts | THRESHOLD_APIC_VECTOR |
| MCE: Machine check exceptions | <not added> |
| MCP: Machine check polls | <not added> |
| ERR: | ERROR_APIC_VECTOR |
| MIS: | <not added> |
| PLT: Platform interrupts | X86_PLATFORM_IPI_VECTOR |

2. IO-APIC-edge timer: the "timer" here is the system timer.