什么是ANR?
ANR:Application Not Responding,即應用無響應
ANR日志Trace文件獲取
系統生成的Trace文件保存在data/anr,可以用過命令adb pull data/anr/取出
traces.txt只保留最后一次ANR的信息,Android系統有個DropBox功能功能,它能記錄系統出現的crash錯誤.因此保留有發生過的ANR的信息.(log路徑:/data/system/dropbox)
獲取系統crash log: adb shell dumpsys dropbox --print >>log.txt
Trace文件怎么生成的?
當APP(包括系統APP和用戶APP)進程出現ANR、應用響應慢或WatchDog的監視沒有得到回饋時,
系統會dump此時的top進程,進程中Thread的運行狀態就都dump到這個Trace文件中了.
導致ANR的常見幾種情況:
1:Input dispatching timed out(5 seconds) 按鍵或觸摸事件處理超時(一般是UI主線程做了耗時的操作,這類ANR最常見)
2:BroadcastTimeout(10 seconds) 廣播的分發和處理超時(一般是onReceiver執行時間過長)
3:ServiceTimeout(20 seconds) Service的啟動和執行超時
另外還有ProviderTimeout和WatchDog等導致的ANR.還有當系統內存或CPU資源不足時容易出現ANR,一般這種情況會有lowmemorykill的log打印.
應用ANR產生的時候,ActivityManagerService的appNotResponding方法就會被調用,然后在/data/anr/traces.txt文件中寫入ANR相關信息.
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { // ... ... if (MONITOR_CPU_USAGE) { updateCpuStatsNow(); // 更新CPU使用率 } // ... ... final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true); // dumpStackTraces是輸出traces文件的函數 File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids, NATIVE_STACKS_OF_INTEREST); String cpuInfo = null; if (MONITOR_CPU_USAGE) { updateCpuStatsNow(); // 再次更新CPU信息 synchronized (mProcessCpuTracker) { // 輸出ANR發生前一段時間內的CPU使用率 cpuInfo = mProcessCpuTracker.printCurrentState(anrTime); } info.append(processCpuTracker.printCurrentLoad()); info.append(cpuInfo); } // 輸出ANR發生后一段時間內的CPU使用率 info.append(processCpuTracker.printCurrentState(anrTime)); Slog.e(TAG, info.toString()); if (tracesFile == null) { // There is no trace file, so dump (only) the alleged culprit's threads to the log Process.sendSignal(app.pid, Process.SIGNAL_QUIT); } // 將ANR信息同時輸出到DropBox中 addErrorToDropBox("anr", app, app.processName, activity, parent, annotation, cpuInfo, tracesFile, null); // ... ... synchronized (this) { // 顯示ANR提示對話框 // Bring up the infamous App Not Responding dialog Message msg = Message.obtain(); HashMap<String, Object> map = new HashMap<String, Object>(); msg.what = SHOW_NOT_RESPONDING_MSG; msg.obj = map; msg.arg1 = aboveSystem ? 1 : 0; map.put("app", app); if (activity != null) { map.put("activity", activity); } mUiHandler.sendMessage(msg); } }
避免ANR?
UI線程盡量只做跟UI相關的工作
耗時的工作(比如數據庫操作,I/O,連接網絡或者別的有可能阻礙UI線程的操作)把它放入單獨的線程處理
盡量用Handler來處理UIthread和別的thread之間的交互
分析ANR的Log:
出現ANR的log如下:
06-22 10:37:46.204 3547 3604 E ActivityManager: ANR in org.code:MessengerService // ANR出現的進程包名 E ActivityManager: PID: 17027 // ANR進程ID E ActivityManager: Reason: executing service org.code/.ipc.MessengerService //導致ANR的原因 E ActivityManager: Load: 8.31 / 8.12 / 8.47 E ActivityManager: CPU usage from 0ms to 6462ms later: //CPU在ANR發生后的使用情況 E ActivityManager: 61% 3547/system_server: 21% user + 39% kernel / faults: 13302 minor 2 major E ActivityManager: 0.2% 475/debuggerd: 0% user + 0.1% kernel / faults: 6086 minor 1 major E ActivityManager: 10% 5742/com.android.phone: 5.1% user + 5.1% kernel / faults: 7597 minor E ActivityManager: 6.9% 5342/com.android.systemui: 2.1% user + 4.8% kernel / faults: 4158 minor E ActivityManager: 0.1% 477/debuggerd64: 0% user + 0.1% kernel / faults: 4013 minor E ActivityManager: 5.7% 16313/org.code: 3.2% user + 2.4% kernel / faults: 2412 minor E ActivityManager: 3.7% 17027/org.code:MessengerService: 1.7% user + 2% kernel / faults: 2571 minor 6 major E ActivityManager: 2.6% 306/cfinteractive: 0% user + 2.6% kernel ... ... E ActivityManager: +0% 17168/kworker/0:1: 0% user + 0% kernel E ActivityManager: 0% TOTAL: 0% user + 0% kernel + 0% softirq // CUP占用情況 E ActivityManager: CPU usage from 5628ms to 6183ms later: E ActivityManager: 42% 3547/system_server: 17% user + 24% kernel / faults: 11 minor E ActivityManager: 12% 3604/ActivityManager: 1.7% user + 10% kernel E ActivityManager: 12% 3609/android.display: 8.7% user + 3.5% kernel E ActivityManager: 3.5% 5304/Binder_6: 1.7% user + 1.7% kernel E ActivityManager: 3.5% 5721/Binder_A: 1.7% user + 1.7% kernel E ActivityManager: 3.5% 5746/Binder_C: 3.5% user + 0% kernel E ActivityManager: 1.7% 3599/Binder_1: 0% user + 1.7% kernel E ActivityManager: 1.7% 3600/Binder_2: 0% user + 1.7% kernel I ActivityManager: Killing 17027:org.code:MessengerService/u0a85 (adj 1): bg anr I art : Wrote stack traces to '/data/anr/traces.txt' //art這個TAG打印對traces操作的信息 D GraphicsStats: Buffer count: 9 W ActivityManager: Scheduling restart of crashed service org.code/.ipc.MessengerService in 1000ms
log打印了ANR的基本信息,我們可以分析CPU使用率推測ANR發生的時候設備在做什么工作;如果CPU使用率很高,接近100%,可能是在進行大規模的計算更可能是陷入死循環;如果CUP使用率很低,說明主線程被阻塞了,並且當IOwait很高,可能是主線程在等待I/O操作的完成.
對於ANR只是分析Log很難知道問題所在,我們還需要通過Trace文件分析stack調用情況.
----- pid 17027 at 2017-06-22 10:37:39 ----- // ANR出現的進程pid和時間(和上述log中pid一致) Cmd line: org.code:MessengerService // ANR出現的進程名 Build fingerprint: 'Homecare/qucii8976v3_64:6.0.1/pansen06141150:eng/test-keys' // 下面記錄系統版本,內存等狀態信息 ABI: 'arm64' Build type: optimized Zygote loaded classes=6576 post zygote classes=13 Intern table: 13780 strong; 17 weak JNI: CheckJNI is on; globals=283 (plus 158 weak) Libraries: /system/lib64/libandroid.so /system/lib64/libcompiler_rt.so /system/lib64/libjavacrypto.so /system/lib64/libjnigraphics.so /system/lib64/libmedia_jni.so /system/lib64/libwebviewchromium_loader.so libjavacore.so (7) Heap: 29% free, 8MB/12MB; 75731 objects Dumping cumulative Gc timings Total number of allocations 75731 Total bytes allocated 8MB Total bytes freed 0B Free memory 3MB Free memory until GC 3MB Free memory until OOME 183MB Total memory 12MB Max memory 192MB Zygote space size 3MB Total mutator paused time: 0 Total time waiting for GC to complete: 0 Total GC count: 0 Total GC time: 0 Total blocking GC count: 0 Total blocking GC time: 0 suspend all histogram: Sum: 76us 99% C.I. 0.100us-28us Avg: 7.600us Max: 28us DALVIK THREADS (15): // Signal Catcher是記錄traces信息的線程 // Signal Catche(線程名)、(daemon)表示守護進程、prio(線程優先級,默認是5)、tid(線程唯一標識ID)、Runnable(線程當前狀態) "Signal Catcher" daemon prio=5 tid=3 Runnable //線程組名稱、suspendCount、debugSuspendCount、線程的Java對象地址、線程的Native對象地址 | group="system" sCount=0 dsCount=0 obj=0x12d8f0a0 self=0x5598ae55d0 //sysTid是線程號(主線程的線程號和進程號相同) | sysTid=17033 nice=0 cgrp=default sched=0/0 handle=0x7fb2350450 | state=R schedstat=( 4348125 172343 3 ) utm=0 stm=0 core=1 HZ=100 | stack=0x7fb2254000-0x7fb2256000 stackSize=1013KB | held mutexes= "mutator lock"(shared held) native: #00 pc 0000000000489e28 /system/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, char const*, art::ArtMethod*, void*)+236) native: #01 pc 0000000000458fe8 /system/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+220) native: #02 pc 0000000000465bc8 /system/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+688) native: #03 pc 0000000000466ae0 /system/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*)+276) native: #04 pc 000000000046719c /system/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+188) native: #05 pc 0000000000467a84 /system/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+492) native: #06 pc 0000000000431194 /system/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+96) native: #07 pc 000000000043e604 /system/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1256) native: #08 pc 000000000043f214 /system/lib64/libart.so (art::SignalCatcher::Run(void*)+452) native: #09 pc 0000000000068714 /system/lib64/libc.so (__pthread_start(void*)+52) native: #10 pc 000000000001c604 /system/lib64/libc.so (__start_thread+16) (no managed stack frames) //main(線程名)、prio(線程優先級,默認是5)、tid(線程唯一標識ID)、Sleeping(線程當前狀態) "main" prio=5 tid=1 Sleeping | group="main" sCount=1 dsCount=0 obj=0x73132d10 self=0x5598a5f5e0 //sysTid是線程號(主線程的線程號和進程號相同) | sysTid=17027 nice=0 cgrp=default sched=0/0 handle=0x7fb6db6fe8 | state=S schedstat=( 420582038 5862546 143 ) utm=24 stm=18 core=6 HZ=100 | stack=0x7fefba3000-0x7fefba5000 stackSize=8MB | held mutexes= // java 堆棧調用信息(這里可查看導致ANR的代碼調用流程)(分析ANR最重要的信息) at java.lang.Thread.sleep!(Native method) - sleeping on <0x0c60f3c7> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:1031) - locked <0x0c60f3c7> (a java.lang.Object) // 鎖住對象0x0c60f3c7 at java.lang.Thread.sleep(Thread.java:985) at android.os.SystemClock.sleep(SystemClock.java:120) at org.code.ipc.MessengerService.onCreate(MessengerService.java:63) //導致ANR的代碼 at android.app.ActivityThread.handleCreateService(ActivityThread.java:2877) at android.app.ActivityThread.access$1900(ActivityThread.java:150) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1427) at android.os.Handler.dispatchMessage(Handler.java:102) at android.os.Looper.loop(Looper.java:148) at android.app.ActivityThread.main(ActivityThread.java:5417) at java.lang.reflect.Method.invoke!(Native method) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:726) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
在log中顯示的pid在traces文件中與之對應,trace log中會打印當前所有線程的堆棧信息,一般我們主要關注main線程的堆棧(也有分析其他線程的情況,通過分析ANR發生時系統狀態推測出設備正在進行操作)
而這里然后通過查看堆棧調用信息分析ANR的代碼.上述ANR實際上在org.code.ipc.MessengerService.onCreate中63行調用SystemClock.sleep(10000)代碼導致的;這是比較簡單ANR示例.
以上僅為解決ANR問題提供一個思路,具體問題還需要根據實際情況具體分析
線程狀態的分類: java中定義的線程狀態:
// libcore/libart/src/main/java/java/lang/Thread.java /** * A representation of a thread's state. A given thread may only be in one * state at a time. */ public enum State { /** * The thread has been created, but has never been started. */ NEW, /** * The thread may be run. */ RUNNABLE, /** * The thread is blocked and waiting for a lock. */ BLOCKED, /** * The thread is waiting. */ WAITING, /** * The thread is waiting for a specified amount of time. */ TIMED_WAITING, /** * The thread has been terminated. */ TERMINATED }
C代碼中定義的線程狀態:
// /art/runtime/thread_state.h enum ThreadState { // Thread.State JDWP state kTerminated = 66, // TERMINATED TS_ZOMBIE Thread.run has returned, but Thread* still around kRunnable, // RUNNABLE TS_RUNNING runnable kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep() kBlocked, // BLOCKED TS_MONITOR blocked on a monitor kWaiting, // WAITING TS_WAIT in Object.wait() kWaitingForGcToComplete, // WAITING TS_WAIT blocked waiting for GC kWaitingForCheckPointsToRun, // WAITING TS_WAIT GC waiting for checkpoints to run kWaitingPerformingGc, // WAITING TS_WAIT performing GC kWaitingForDebuggerSend, // WAITING TS_WAIT blocked waiting for events to be sent kWaitingForDebuggerToAttach, // WAITING TS_WAIT blocked waiting for debugger to attach kWaitingInMainDebuggerLoop, // WAITING TS_WAIT blocking/reading/processing debugger events kWaitingForDebuggerSuspension, // WAITING TS_WAIT waiting for debugger suspend all kWaitingForJniOnLoad, // WAITING TS_WAIT waiting for execution of dlopen and JNI on load code kWaitingForSignalCatcherOutput, // WAITING TS_WAIT waiting for signal catcher IO to complete kWaitingInMainSignalCatcherLoop, // WAITING TS_WAIT blocking/reading/processing signals kWaitingForDeoptimization, // WAITING TS_WAIT waiting for deoptimization suspend all kWaitingForMethodTracingStart, // WAITING TS_WAIT waiting for method tracing to start kWaitingForVisitObjects, // WAITING TS_WAIT waiting for visiting objects kWaitingForGetObjectsAllocated, // WAITING TS_WAIT waiting for getting the number of allocated objects kStarting, // NEW TS_WAIT native thread started, not yet ready to run managed code kNative, // RUNNABLE TS_RUNNING running in a JNI native method kSuspended, // RUNNABLE TS_RUNNING suspended by GC or debugger };
兩者可以看出在C代碼中定義更為詳細,Traces中顯示的線程狀態都是C代碼定義的.我們可以通過查看線程狀態對應的信息分析ANR問題.
如: TimedWaiting對應的線程狀態是TIMED_WAITING
kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout 執行了無超時參數的wait函數
kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep() 執行了帶有超時參數的sleep函數
ZOMBIE 線程死亡,終止運行 RUNNING/RUNNABLE 線程可運行或正在運行 TIMED_WAIT 執行了帶有超時參數的wait、sleep或join函數 MONITOR 線程阻塞,等待獲取對象鎖 WAIT 執行了無超時參數的wait函數 INITIALIZING 新建,正在初始化,為其分配資源 STARTING 新建,正在啟動 NATIVE 正在執行JNI本地函數 VMWAIT 正在等待VM資源 SUSPENDED 線程暫停,通常是由於GC或debug被暫停
from: https://blog.csdn.net/qq_25804863/article/details/49111005