Reposted from: https://blog.51cto.com/qiangsh/2088862
The Cyclictest tool is documented on the Linux Foundation wiki: https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest
Cyclictest is a high-resolution test program written by Thomas Gleixner (tglx) and maintained by Clark Williams and John Kacur.
Installation
Get the latest sources from the git repository with git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git, or fetch a released tarball from the archive and untar it into a directory of your choice. Then run make in the source directory. To cross-compile, set CROSS_COMPILE to your toolchain prefix (for example make CROSS_COMPILE=arm-v4t-linux-gnueabi-).
You can run the resulting binary from there or install it.
# Install the libnuma development package (numactl-devel) first, then build with make
yum install numactl-devel
git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
cd rt-tests
git checkout stable/v1.0
make all
make install
make cyclictest
Run it
Make sure to be root or use sudo to run cyclictest.
Without parameters cyclictest creates one thread with a 1ms interval timer.
cyclictest --help prints help text for the various options (-h itself selects the histogram, as the listing shows):
[root@localhost rt-tests]# ./cyclictest --help
cyclictest V 1.00
Usage:
cyclictest <options>

-a [CPUSET] --affinity
    Run thread #N on processor #N, if possible, or if CPUSET given, pin threads to that set of processors in round-robin order. E.g. -a 2 pins all threads to CPU 2, but -a 3-5,0 -t 5 will run the first and fifth threads on CPU (0),thread #2 on CPU 3, thread #3 on CPU 4, and thread #5 on CPU 5.
-A USEC --aligned=USEC
    align thread wakeups to a specific offset
-b USEC --breaktrace=USEC
    send the break trace command when latency exceeds USEC (USEC is in microseconds)
-B --preemptirqs
    both preempt and irqsoff tracing (used with -b)
-c CLOCK --clock=CLOCK
    select the clock, e.g. cyclictest -c 1
    0 = CLOCK_MONOTONIC (default)
    1 = CLOCK_REALTIME
-C --context
    context switch tracing (used with -b)
-d DIST --distance=DIST
    distance of thread intervals in us, default=500
-D --duration=TIME
    how long to run the test. The default unit is seconds; append m, h, or d for minutes, hours, or days
--latency=PM_QOS
    write PM_QOS to /dev/cpu_dma_latency
-E --event
    event tracing (used with -b)
-f --ftrace
    ftrace function tracing (normally paired with -b; in practice -b alone is usually enough and -f can be left out)
-F --fifo=<path>
    create a named pipe at path and write stats to it
-h --histogram=US
    dump a latency histogram to stdout after the run (with the same priority for all threads). US is the maximum latency to track, in microseconds. The example further down uses it, and the result can be plotted with gnuplot.
-H --histofall=US
    same as -h except with an additional summary column
--histfile=<path>
    dump the latency histogram to <path> instead of stdout
-i INTV --interval=INTV
    base interval of the thread in us, default=1000
-I --irqsoff
    Irqsoff tracing (used with -b)
-l LOOPS --loops=LOOPS
    number of loops, default=0 (endless). Together with -i this gives the approximate test duration: for example, -i 1000 -l 1000000 takes 1000 * 1000000 = 1000000000 us = 1000 s, a little over 16 minutes.
--laptop
    Save battery when running cyclictest. This will give you poorer realtime results but will not drain your battery so quickly
-m --mlockall
    lock current and future memory allocations
-M --refresh_on_max
    delay updating the screen until a new max latency is hit. Useful for low bandwidth.
-n --nanosleep
    use clock_nanosleep
--notrace
    suppress tracing
-N --nsecs
    print results in ns instead of us (default us)
-o RED --oscope=RED
    oscilloscope mode, reduce verbose output by RED
-O TOPT --traceopt=TOPT
    trace option
-p PRIO --priority=PRIO
    priority of the highest-priority thread; usage: -p 90 or --priority=90
-P --preemptoff
    Preempt off tracing (used with -b)
--policy=NAME
    policy of measurement thread, where NAME may be one of: other, normal, batch, idle, fifo or rr.
--priospread
    spread priority levels starting at specified value
-q --quiet
    print nothing while running and only a summary on exit. Combined with -h HISTNUM, it prints HISTNUM histogram rows plus an overall summary on exit.
-r --relative
    use relative timer instead of absolute
-R --resolution
    check clock resolution, calling clock_gettime() many times. List of clock_gettime() values will be reported with -X
--secaligned [USEC]
    align thread wakeups to the next full second and apply the optional offset
-s --system
    use sys_nanosleep and sys_setitimer
-S --smp
    Standard SMP testing: options -a -t -n and same priority of all threads
--spike=<trigger>
    record all spikes > trigger
--spike-nodes=[num of nodes]
    These are the maximum number of spikes we can record. The default is 1024 if not specified
--smi
    Enable SMI counting
-t --threads
    one thread per available processor
-t [NUM] --threads=NUM
    number of threads:
    without NUM, threads = max_cpus
    without -t default = 1
--tracemark
    write a trace mark when -b latency is exceeded
-T TRACE --tracer=TRACER
    set tracing function
    configured tracers: unavailable (debugfs not mounted)
-u --unbuffered
    force unbuffered output for live processing
-U --numa
    Standard NUMA testing (similar to SMP option); thread data structures allocated from local node
-v --verbose
    output values on stdout for statistics
    format: n:c:v n=tasknum c=count v=value in us
-w --wakeup
    task wakeup tracing (used with -b)
-W --wakeuprt
    rt task wakeup tracing (used with -b)
--dbg_cyclictest
    print info useful for debugging cyclictest
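The duration arithmetic in the -l description can be checked with a small Python sketch. The helper name estimated_duration_s is made up for illustration, and it assumes every loop lasts exactly one interval, so a real run will take slightly longer.

```python
def estimated_duration_s(interval_us: int, loops: int) -> float:
    """Approximate run time in seconds for `cyclictest -i interval_us -l loops`.

    Assumption: each loop lasts exactly one interval; wakeup latency is ignored.
    """
    return interval_us * loops / 1_000_000


if __name__ == "__main__":
    # The example from the help text: -i 1000 -l 1000000
    seconds = estimated_duration_s(1000, 1_000_000)
    print(f"{seconds:.0f} s = {seconds / 60:.1f} min")
```

For -i 1000 -l 1000000 this gives 1000 s, a little over 16 minutes, which matches the figure in the option description.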
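Recommended parameters and sample results
[root@localhost rt-tests]# sudo ./cyclictest -p 90 -m -c 0 -i 200 -n -h 100 -q -l 1000000
Here -p 90 gives cyclictest priority 90, -m locks memory allocations, and -c 0 selects the default CLOCK_MONOTONIC clock.
-i 200 sets the loop interval to 200 us and -l 1000000 runs 1000000 loops in total; -n uses clock_nanosleep instead of the POSIX interval timer.
-q suppresses live output during the run, and -h 100 records a histogram of latencies from 0 us to 99 us in the final report.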
-----
#/dev/cpu_dma_latency set to 0us    -- everything from here on is printed when the test finishes or is interrupted; this is the effect of -q
#Histogram
000000 000000
000001 000000
000002 000000
000003 000000
000004 000000
000005 000002    -- a latency of 5 us occurred 2 times in 1000000 loops (every row reads the same way)
000006 000009
..........    (rows omitted here)
000099 000005    -- with -h 100, the histogram records counts for latencies from 0 us to 99 us
#Total: 000999914
#Min Latencies: 00005    -- minimum latency: 5 us
#Avg Latencies: 00012    -- average latency: 12 us
#Max Latencies: 19920    -- maximum latency: 19920 us
#Histogram Overflows: 00086    -- number of samples above 99 us
#Histogram Overflow at cycle number:
#Thread 0: 65668 162024 164458 166533 171828 174546 179471 182538 188257 198415 202689 209055 211934 224529 227292 239809 267144 311992 312072 335066 341986 353395 355217 355295 355297 385017 411492 417012 443642 453450 453463 453478 453492 453504 453505 453522 453540 482063 482116 482797 483077 486153 515557 517062 517066 522812 538214 560636 574301 574500 598338 602175 610697 620924 678231 692237 692242 692247 713557 779826 797948 851442 860635 860642 860654 860661 861147 875755 880618 883622 884128 884238 885915 887215 887457 896442 925069 928998 942590 947161 947871 955507 955508 982245 982250 992192
With histogram = 100 only latencies from 0 us to 99 us are recorded, yet the maximum latency is 19920 us, so some samples must have exceeded 99 us. Where did they go? Their exact values are not recorded. Only the number of such samples is kept (in Histogram Overflows), together with the loop number at which each one occurred (in the Thread 0 line).
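To make the relationship between Total, Histogram Overflows, and the loop count concrete, here is a minimal Python sketch that parses this kind of report. The function name parse_histogram is hypothetical, SAMPLE is an abbreviated copy of the output above, and the parser assumes a single measurement thread (two columns per histogram row).

```python
SAMPLE = """\
#/dev/cpu_dma_latency set to 0us
#Histogram
000005 000002
000006 000009
000099 000005
#Total: 000999914
#Min Latencies: 00005
#Avg Latencies: 00012
#Max Latencies: 19920
#Histogram Overflows: 00086
#Histogram Overflow at cycle number:
#Thread 0: 65668 162024 164458
"""


def parse_histogram(text):
    """Split a `cyclictest -q -h` report into histogram rows and summary fields.

    Assumption: one measurement thread, so each row is "<latency_us> <count>".
    """
    buckets = {}   # latency in us -> number of samples
    summary = {}   # e.g. "Total" -> 999914
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, sep, value = line[1:].partition(":")
            value = value.strip()
            # Keep only "#Name: <single number>" lines; skip everything else.
            if sep and value.isdigit():
                summary[key.strip()] = int(value)
            continue
        fields = line.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            buckets[int(fields[0])] = int(fields[1])
    return buckets, summary


if __name__ == "__main__":
    buckets, summary = parse_histogram(SAMPLE)
    loops = summary["Total"] + summary["Histogram Overflows"]
    print("samples inside the histogram:", summary["Total"])
    print("samples above the histogram range:", summary["Histogram Overflows"])
    print("loops accounted for:", loops)
```

For the report above, Total (999914) plus Histogram Overflows (86) adds up to the 1000000 loops requested with -l.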
$ sudo cyclictest -t 2    # use two measurement threads
policy: other/other: loadavg: 0.00 0.01 0.05 1/346 2595
T: 0 ( 2594) P: 0 I:1000 C: 14090 Min: 32 Act: 200 Avg: 177 Max: 2855
T: 1 ( 2595) P: 0 I:1500 C: 9397 Min: 23 Act: 202 Avg: 170 Max: 2863
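Meaning of the output fields:
T: 0      thread with index 0
P: 0      thread priority 0
I: 1000   interval of 1000 microseconds (us)
C: 9397   loop counter, incremented each time the thread's interval elapses
Min:      minimum latency (us)
Act:      most recent latency (us)
Avg:      average latency (us)
Max:      maximum latency (us)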
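The field layout above can be read programmatically. The following Python sketch parses the two status lines shown earlier; parse_status and expected_interval are hypothetical helper names, the number in parentheses is treated as the thread ID, and the interval check assumes the defaults from the help text (-i 1000, -d 500).

```python
import re

SAMPLE = """\
policy: other/other: loadavg: 0.00 0.01 0.05 1/346 2595
T: 0 ( 2594) P: 0 I:1000 C: 14090 Min: 32 Act: 200 Avg: 177 Max: 2855
T: 1 ( 2595) P: 0 I:1500 C: 9397 Min: 23 Act: 202 Avg: 170 Max: 2863
"""

# One match per thread status line; whitespace between fields is flexible.
STATUS_RE = re.compile(
    r"T:\s*(?P<thread>\d+)\s*\(\s*(?P<tid>\d+)\)\s*"
    r"P:\s*(?P<prio>\d+)\s*I:\s*(?P<interval>\d+)\s*C:\s*(?P<count>\d+)\s*"
    r"Min:\s*(?P<min>\d+)\s*Act:\s*(?P<act>\d+)\s*"
    r"Avg:\s*(?P<avg>\d+)\s*Max:\s*(?P<max>\d+)"
)


def parse_status(text):
    """Return one dict of integer fields per thread status line."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in STATUS_RE.finditer(text)
    ]


def expected_interval(thread, base_us=1000, distance_us=500):
    """Assumed rule: thread N runs at base interval + N * distance (-i and -d)."""
    return base_us + thread * distance_us


if __name__ == "__main__":
    for row in parse_status(SAMPLE):
        elapsed_s = row["count"] * row["interval"] / 1_000_000
        print(f"thread {row['thread']}: interval {row['interval']} us, "
              f"max {row['max']} us, about {elapsed_s:.2f} s elapsed")
```

Multiplying C by I gives roughly the elapsed time, about 14.1 s for both threads here, which is why the thread with the longer interval shows the smaller counter.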
Expected Results
tglx’s reference machine
All tests have been run on a Pentium III 400MHz based PC.
The tables show comparisons of vanilla Linux 2.6.16, Linux-2.6.16-hrt5 and Linux-2.6.16-rt12. The tests for intervals less than the jiffy resolution have not been run on vanilla Linux 2.6.16. The test thread runs in all cases with SCHED_FIFO and priority 80. All numbers are in microseconds.
Test case: clock_nanosleep(TIME_ABSTIME), interval 10000 microseconds, 10000 loops, no load.
Commandline: cyclictest -t1 -p 80 -n -i 10000 -l 10000
Kernel        min   max   avg
2.6.16         24  4043  1989
2.6.16-hrt5    12    94    20
2.6.16-rt12     6    40    10
Test case: clock_nanosleep(TIME_ABSTIME), interval 10000 microseconds, 10000 loops, 100% load.
Commandline: cyclictest -t1 -p 80 -n -i 10000 -l 10000
Kernel        min   max   avg
2.6.16         55  4280  2198
2.6.16-hrt5    11   458    55
2.6.16-rt12     6    67    29
Test case: POSIX interval timer, interval 10000 microseconds, 10000 loops, no load.
Commandline: cyclictest -t1 -p 80 -i 10000 -l 10000
Kernel        min   max   avg
2.6.16         21  4073  2098
2.6.16-hrt5    22   120    35
2.6.16-rt12    20    60    31
Test case: POSIX interval timer, interval 10000 microseconds, 10000 loops, 100% load.
Commandline: cyclictest -t1 -p 80 -i 10000 -l 10000
Kernel        min   max   avg
2.6.16         82  4271  2089
2.6.16-hrt5    31   458    53
2.6.16-rt12    21    70    35
Test case: clock_nanosleep(TIME_ABSTIME), interval 500 microseconds, 100000 loops, no load.
Commandline: cyclictest -t1 -p 80 -i 500 -n -l 100000
Kernel        min   max   avg
2.6.16-hrt5     5   108    24
2.6.16-rt12     5    48     7
Test case: clock_nanosleep(TIME_ABSTIME), interval 500 microseconds, 100000 loops, 100% load.
Commandline: cyclictest -t1 -p 80 -i 500 -n -l 100000
Kernel        min   max   avg
2.6.16-hrt5     9   684    56
2.6.16-rt12    10    60    22
Test case: POSIX interval timer, interval 500 microseconds, 100000 loops, no load.
Commandline: cyclictest -t1 -p 80 -i 500 -l 100000
Kernel        min   max   avg
2.6.16-hrt5     8   119    22
2.6.16-rt12    12    78    16
Test case: POSIX interval timer, interval 500 microseconds, 100000 loops, 100% load.
Commandline: cyclictest -t1 -p 80 -i 500 -l 100000
Kernel        min   max   avg
2.6.16-hrt5    16   489    58
2.6.16-rt12    12    95    29
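One way to read the tables above is to compare worst-case latency between kernels. The Python sketch below does this for the two clock_nanosleep cases at a 10000 us interval; the numbers are copied from those tables, while the names NO_LOAD, FULL_LOAD, and max_latency_ratio are made up for this illustration.

```python
# (min, max, avg) in microseconds, copied from the tables for
# clock_nanosleep(TIME_ABSTIME), interval 10000 us, 10000 loops.
NO_LOAD = {
    "2.6.16": (24, 4043, 1989),
    "2.6.16-hrt5": (12, 94, 20),
    "2.6.16-rt12": (6, 40, 10),
}
FULL_LOAD = {
    "2.6.16": (55, 4280, 2198),
    "2.6.16-hrt5": (11, 458, 55),
    "2.6.16-rt12": (6, 67, 29),
}


def max_latency_ratio(results, baseline, candidate):
    """How many times larger the baseline's max latency is than the candidate's."""
    return results[baseline][1] / results[candidate][1]


if __name__ == "__main__":
    for label, results in (("no load", NO_LOAD), ("100% load", FULL_LOAD)):
        ratio = max_latency_ratio(results, "2.6.16", "2.6.16-rt12")
        print(f"{label}: vanilla max is {ratio:.1f}x the rt12 max")
```

On these figures the vanilla kernel's maximum latency is roughly 101 times the rt12 value without load and roughly 64 times under 100% load.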
FAQ
ps shows the wrong scheduling class SCHED_OTHER
Each cyclictest task consists of one or more threads. ps -ce shows only the main process, not its threads. ps -eLc | grep cyclic shows the main process and its threads, with the correct scheduling class SCHED_FIFO for the threads.
#> ./cyclictest -t5 -p 80 -n -i 10000
#> ps -cLe | grep cyclic
4764 4764 TS   19 pts/1 00:00:01 cyclictest
4764 4765 FF  120 pts/1 00:00:00 cyclictest
4764 4766 FF  119 pts/1 00:00:00 cyclictest
4764 4767 FF  118 pts/1 00:00:00 cyclictest
4764 4768 FF  117 pts/1 00:00:00 cyclictest
4764 4769 FF  116 pts/1 00:00:00 cyclictest
chrt shows the wrong scheduling class SCHED_OTHER
Don’t use the PID of the main process; use the PID of one of its threads. The threads are listed by ps -cLe | grep cyclic.
#> chrt -p 4766
pid 4766's current scheduling policy: SCHED_FIFO
pid 4766's current scheduling priority: 79
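The two FAQ answers can be tied together with a short Python sketch that reads the ps -cLe lines and derives each thread's real-time priority. The column order (PID, LWP, CLS, PRI, TTY, TIME, CMD) and the offset of 40 between the PRI column and the priority reported by chrt are assumptions inferred from this document's own output (thread 4766 shows PRI 119 in ps and priority 79 in chrt); other ps versions may differ. The function names are hypothetical.

```python
PS_OUTPUT = """\
4764 4764 TS 19 pts/1 00:00:01 cyclictest
4764 4765 FF 120 pts/1 00:00:00 cyclictest
4764 4766 FF 119 pts/1 00:00:00 cyclictest
4764 4767 FF 118 pts/1 00:00:00 cyclictest
4764 4768 FF 117 pts/1 00:00:00 cyclictest
4764 4769 FF 116 pts/1 00:00:00 cyclictest
"""


def parse_ps_threads(text):
    """Parse `ps -cLe` rows, assuming columns PID LWP CLS PRI TTY TIME CMD."""
    threads = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip headers or anything that is not a full row
        threads.append({
            "pid": int(fields[0]),
            "lwp": int(fields[1]),   # thread ID, the value to pass to chrt -p
            "cls": fields[2],        # scheduling class, e.g. TS or FF
            "pri": int(fields[3]),
        })
    return threads


def rt_priority(thread):
    """Assumed mapping for SCHED_FIFO rows: PRI column = 40 + real-time priority."""
    if thread["cls"] != "FF":
        return None
    return thread["pri"] - 40


if __name__ == "__main__":
    for thread in parse_ps_threads(PS_OUTPUT):
        print(thread["lwp"], thread["cls"], rt_priority(thread))
```

Under that assumption the first measurement thread (4765) runs at priority 80, as requested with -p 80, and thread 4766 at 79, which agrees with the chrt output above.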