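1. Overview
Every Android developer has run into an ANR at some point. Why do ANRs happen, and what does the system do after one occurs? This article walks through both questions.
ANR (Application Not Responding) means an application failed to respond. The Android system expects certain events to complete within a fixed time window; if the app does not respond in time, or takes too long to respond, an ANR is raised. Typically a dialog appears telling the user that xxx is not responding, and the user can choose to keep waiting or Force Close.
Which scenarios cause an ANR?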
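- Service Timeout: for example, a foreground service fails to finish executing within 20s;
- BroadcastQueue Timeout: for example, a foreground broadcast fails to finish executing within 10s;
- ContentProvider Timeout: a content provider fails to publish within 10s;
- InputDispatching Timeout: input event dispatching times out after 5s; this covers key and touch events.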
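Triggering an ANR can be split into three steps: planting the bomb, defusing the bomb, and detonating the bomb.
Planting the bomb means sending a message that fires after a delay (the bomb);
defusing the bomb means cancelling that delayed message (the bomb) so it never fires;
detonating the bomb means the delay has elapsed and the delayed message starts being handled (the bomb goes off).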
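The three steps above can be sketched in plain Java. This is only a minimal model of the pattern, assuming a ScheduledExecutorService stands in for the framework's Handler and delayed Message; the class TimeoutBomb and its method names are hypothetical and do not exist in Android.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the plant / defuse / detonate pattern; not framework code. */
class TimeoutBomb {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean exploded = new AtomicBoolean(false);
    private ScheduledFuture<?> fuse;

    /** Plant: schedule the "timeout message" to fire after delayMs. */
    synchronized void plant(long delayMs) {
        fuse = timer.schedule(() -> exploded.set(true), delayMs, TimeUnit.MILLISECONDS);
    }

    /** Defuse: cancel the pending message; returns true if the cancel succeeded. */
    synchronized boolean defuse() {
        return fuse != null && fuse.cancel(false);
    }

    /** Detonated: true once the delayed task has run (the ANR path in the real system). */
    boolean exploded() {
        return exploded.get();
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

If the work finishes first, defuse() cancels the task and nothing happens; if the delay elapses first, the task runs, which corresponds to the ANR flow.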
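2. Service
First, a diagram of the service start flow:
Service Timeout is triggered when AMS.MainHandler, running on the "ActivityManager" thread, receives the SERVICE_TIMEOUT_MSG message.
There are two kinds of service:
- for a foreground service the timeout is SERVICE_TIMEOUT = 20s;
- for a background service the timeout is SERVICE_BACKGROUND_TIMEOUT = 200s.
The field ProcessRecord.execServicesFg decides whether the service is treated as started in the foreground.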
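2.1 Planting the bomb
While the service's process attaches to the system_server process, realStartServiceLocked() is called, and that is where the bomb is planted.
Start with realStartServiceLocked, one of the methods on the service start path: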
// ActiveServices.java
private final void realStartServiceLocked(ServiceRecord r,
        ProcessRecord app, boolean execInFg) throws RemoteException {
    ...
    // Send the delayed message (SERVICE_TIMEOUT_MSG)
    bumpServiceExecutingLocked(r, execInFg, "create");
    try {
        ...
        // Eventually runs the service's onCreate()
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                app.repProcState);
    } catch (DeadObjectException e) {
        mAm.appDiedLocked(app);
        throw e;
    } finally {
        ...
    }
}

private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    ...
    scheduleServiceTimeoutLocked(r.app);
}

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    // If SERVICE_TIMEOUT_MSG has not been removed by the deadline, the service timeout flow runs
    mAm.mHandler.sendMessageAtTime(msg,
            proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
}
While starting the service, AS.realStartServiceLocked sends a delayed timeout message, and it distinguishes foreground from background services:
// How long we wait for a service to finish executing. 20s
static final int SERVICE_TIMEOUT = 20*1000;

// How long we wait for a service to finish executing. 200s
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
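2.2 Defusing the bomb
Calling AS.realStartServiceLocked() plants a bomb, and it explodes if the start does not complete before the timeout. So when is the fuse removed? That happens after the call travels through Binder and several layers into handleCreateService() on the main thread of the target process.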
// ActivityThread (note that ApplicationThread is its inner class)
private void handleCreateService(CreateServiceData data) {
    ...
    java.lang.ClassLoader cl = packageInfo.getClassLoader();
    Service service = (Service) cl.loadClass(data.info.name).newInstance();
    ...
    try {
        // Create the ContextImpl object
        ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
        context.setOuterContext(service);
        // Create the Application object
        Application app = packageInfo.makeApplication(false, mInstrumentation);
        service.attach(context, this, data.info.name, data.token, app,
                ActivityManagerNative.getDefault());
        // Call the service's onCreate()
        service.onCreate();
        // Report back to system_server that the service has finished executing
        ActivityManagerNative.getDefault().serviceDoneExecuting(
                data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
    } catch (Exception e) {
        ...
    }
}
This step creates the target service object and calls back its onCreate(). After several more calls, control returns to system_server, which runs serviceDoneExecuting.
// ActiveServices
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing) {
    ...
    if (r.executeNesting <= 0) {
        if (r.app != null) {
            r.app.execServicesFg = false;
            r.app.executingServices.remove(r);
            if (r.app.executingServices.size() == 0) {
                // No service is executing in this service's process any more
                mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
                ...
            }
            ...
        }
    }
    ...
}
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;
Once the service has started, this method removes the service timeout message SERVICE_TIMEOUT_MSG. For a foreground service the window for doing so is 20s.
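2.3 Detonating the bomb
The previous sections covered planting and defusing the bomb. If the bomb is defused before the countdown ends, it never gets the chance to explode. Things do not always go to plan, though: in some extreme cases the bomb cannot be defused in time and it goes off, and the result is an ANR in the app. Here is what the scene of the explosion looks like:
The system_server process has a Handler thread. When the countdown ends, the message SERVICE_TIMEOUT_MSG is delivered to that Handler thread: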
// ActivityManagerService.java ::MainHandler
final class MainHandler extends Handler {
    public MainHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ......
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
        }
    }
}
When the delay expires the message is handled. The handling logic looks like this:
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime = now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            // Get a service that is executing in this process
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, " ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        }
    }
    if (anrMessage != null) {
        // A timed-out service exists, so run appNotResponding
        mAm.appNotResponding(proc, null, null, false, anrMessage);
    }
}
Here anrMessage has the form "executing service [info from the timed-out ServiceRecord]".
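The core of that check is a comparison between each service's start time and a cutoff derived from the timeout. The sketch below isolates it in plain Java, under the assumption that a list of start timestamps stands in for proc.executingServices; ServiceTimeoutCheck and findTimedOut are hypothetical names used only for illustration.

```java
import java.util.List;

/** Hypothetical, simplified model of the check done in serviceTimeout(); not framework code. */
class ServiceTimeoutCheck {
    static final long SERVICE_TIMEOUT = 20 * 1000;                        // 20s, as in the article
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;  // 200s

    /**
     * Scans from the end of the list, as the original loop does, and returns the index of
     * the first service whose start time is older than the cutoff, or -1 if none timed out.
     */
    static int findTimedOut(List<Long> executingStart, long now, boolean execServicesFg) {
        long maxTime = now - (execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        for (int i = executingStart.size() - 1; i >= 0; i--) {
            if (executingStart.get(i) < maxTime) {
                return i;
            }
        }
        return -1;
    }
}
```

With now = 100000 and a service that started at 70000, the foreground cutoff is 80000 and the service counts as timed out, while the background cutoff is far enough back that it does not.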
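2.4 Foreground versus background services
The system gives a foreground service 20s to start and a background service 200s. How does it tell the two apart? Here is the core logic in ActiveServices: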
ComponentName startServiceLocked(...) {
    final boolean callerFg;
    if (caller != null) {
        final ProcessRecord callerApp = mAm.getRecordForAppLocked(caller);
        callerFg = callerApp.setSchedGroup != ProcessList.SCHED_GROUP_BACKGROUND;
    } else {
        callerFg = true;
    }
    ...
    ComponentName cmp = startServiceInnerLocked(smap, service, r, callerFg, addToStarting);
    return cmp;
}
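During startService, the scheduling group of the calling process callerApp decides whether the service being started counts as foreground or background. If the caller's group is not ProcessList.SCHED_GROUP_BACKGROUND (the background group), the service is treated as a foreground service; otherwise it is a background service. The result is recorded in the ServiceRecord field createdFromFg.
Which processes belong to the SCHED_GROUP_BACKGROUND scheduling group? Scheduling groups fall broadly into TOP, foreground and background. The algorithms for process priority (Adj) and scheduling group (SCHED_GROUP) are fairly complex, but the mapping can be roughly understood as follows: processes with Adj equal to 0 belong to the TOP group, processes with Adj equal to 100 or 200 belong to the foreground group, and processes with Adj greater than 200 belong to the background group. See the table below for the meaning of each Adj value. Put simply, processes with Adj > 200 are essentially invisible to the user and mostly do background work, so background services get a longer timeout threshold. Background services also sit in the background scheduling group and therefore receive fewer CPU time slices than foreground services, which sit in the foreground scheduling group.
Strictly speaking, a foreground service here is a service started by a process in the foreground scheduling group. This differs from the usual meaning of fg-service, which is a service with a foreground notification attached.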
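The caller-based decision described above can be modelled in a few lines of plain Java. This is a sketch under simplifying assumptions: the enum SchedGroup and the class ServiceFgSketch are hypothetical, and the real code routes the flag through execInFg and ProcessRecord.execServicesFg rather than computing the timeout directly from the caller.

```java
/** Hypothetical model of the caller-based foreground/background decision; not framework code. */
class ServiceFgSketch {
    enum SchedGroup { TOP, FOREGROUND, BACKGROUND }

    static final long SERVICE_TIMEOUT = 20 * 1000;                        // 20s
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;  // 200s

    /** A null caller is treated as foreground, mirroring the else branch in startServiceLocked(). */
    static boolean callerFg(SchedGroup callerGroup) {
        return callerGroup == null || callerGroup != SchedGroup.BACKGROUND;
    }

    /** Timeout that applies to a service started by a caller in the given group. */
    static long timeoutMs(SchedGroup callerGroup) {
        return callerFg(callerGroup) ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
    }
}
```

The point of the model is that the 20s or 200s limit depends on who started the service, not on whether the service shows a foreground notification.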
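One caveat: seeing Reason: executing service com.example.baidu/.AnrService in the log does not necessarily mean the service itself was slow. For example, if the app performs a time-consuming operation right after starting the service, the service's onCreate or onStartCommand cannot run in time, and the ANR is still raised once the timeout expires.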
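3. BroadcastReceiver
BroadcastReceiver Timeout is triggered when BroadcastQueue.BroadcastHandler, running on the "ActivityManager" thread, receives the BROADCAST_TIMEOUT_MSG message.
There are two broadcast queues, the foreground queue and the background queue:
- for foreground broadcasts the timeout is BROADCAST_FG_TIMEOUT = 10s;
- for background broadcasts the timeout is BROADCAST_BG_TIMEOUT = 60s.
3.1 Planting the bomb
First, the logic for sending a broadcast: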
// ActivityManagerService.java
public final int broadcastIntent(IApplicationThread caller,
        Intent intent, String resolvedType, IIntentReceiver resultTo,
        int resultCode, String resultData, Bundle resultExtras,
        String[] requiredPermissions, int appOp, Bundle bOptions,
        boolean serialized, boolean sticky, int userId) {
    enforceNotIsolatedCaller("broadcastIntent");
    synchronized(this) {
        // Verify that the broadcast is valid
        intent = verifyBroadcastLocked(intent);
        // Get the process record of the sender
        final ProcessRecord callerApp = getRecordForAppLocked(caller);
        final int callingPid = Binder.getCallingPid();
        final int callingUid = Binder.getCallingUid();
        final long origId = Binder.clearCallingIdentity();
        try {
            return broadcastIntentLocked(callerApp,
                    callerApp != null ? callerApp.info.packageName : null,
                    intent, resolvedType, resultTo, resultCode, resultData, resultExtras,
                    requiredPermissions, appOp, bOptions, serialized, sticky,
                    callingPid, callingUid, callingUid, callingPid, userId);
        } finally {
            Binder.restoreCallingIdentity(origId);
        }
    }
}
broadcastIntent() takes two boolean parameters, serialized and sticky, which together determine whether the broadcast is a normal, ordered or sticky broadcast:
Type | serialized | sticky |
---|---|---|
sendBroadcast | false | false |
sendOrderedBroadcast | true | false |
sendStickyBroadcast | false | true |
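The table can be read as a small lookup. The following sketch encodes exactly the three rows listed; BroadcastKindSketch and Kind are hypothetical names, and the combination the table does not list (both flags true) is rejected here rather than guessed at.

```java
/** Hypothetical encoding of the (serialized, sticky) table above; not framework code. */
class BroadcastKindSketch {
    enum Kind { NORMAL, ORDERED, STICKY }

    static Kind kindOf(boolean serialized, boolean sticky) {
        if (serialized && sticky) {
            // Assumption: the table has no row for this combination, so the sketch refuses it.
            throw new IllegalArgumentException("combination not covered by the table");
        }
        if (sticky) {
            return Kind.STICKY;   // sendStickyBroadcast
        }
        return serialized ? Kind.ORDERED   // sendOrderedBroadcast
                          : Kind.NORMAL;   // sendBroadcast
    }
}
```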
With sending covered, the next topic is how broadcasts are received.
Once a broadcast has been sent, it is placed in a queue for processing.
// ActivityManagerService
public ActivityManagerService(Context systemContext, ActivityTaskManagerService atm) {
    // ...... Three queues are created to hold the different kinds of broadcast
    mFgBroadcastQueue = new BroadcastQueue(this, mHandler,
            "foreground", foreConstants, false);
    mBgBroadcastQueue = new BroadcastQueue(this, mHandler,
            "background", backConstants, true);
    mOffloadBroadcastQueue = new BroadcastQueue(this, mHandler,
            "offload", offloadConstants, true);
    mBroadcastQueues[0] = mFgBroadcastQueue;
    mBroadcastQueues[1] = mBgBroadcastQueue;
    mBroadcastQueues[2] = mOffloadBroadcastQueue;
}
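The AMS constructor sorts broadcasts into three queues (foreground, background and offload) and stores them together in an array. The handler passed in is AMS's MainHandler; it is passed only so that the queue can obtain its Looper.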
BroadcastQueue(ActivityManagerService service, Handler handler,
        String name, BroadcastConstants constants, boolean allowDelayBehindServices) {
    mService = service;
    // The broadcast handler is created from the Looper of the handler in AMS
    mHandler = new BroadcastHandler(handler.getLooper());
    mQueueName = name;
    mDelayBehindServices = allowDelayBehindServices;
    mConstants = constants;
    mDispatcher = new BroadcastDispatcher(this, mConstants, mHandler, mService);
}
Next, the logic that processes broadcasts:
private final class BroadcastHandler extends Handler {
    public BroadcastHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case BROADCAST_INTENT_MSG: {
                if (DEBUG_BROADCAST) Slog.v(
                        TAG_BROADCAST, "Received BROADCAST_INTENT_MSG ["
                        + mQueueName + "]");
                // Start processing broadcasts
                processNextBroadcast(true);
            } break;
            case BROADCAST_TIMEOUT_MSG: {
                synchronized (mService) {
                    broadcastTimeoutLocked(true);
                }
            } break;
        }
    }
}
As the code shows, broadcasts are processed by calling processNextBroadcast.
final void processNextBroadcast(boolean fromMsg) {
    synchronized(mService) {
        // part1: handle parallel broadcasts
        while (mParallelBroadcasts.size() > 0) {
            r = mParallelBroadcasts.remove(0);
            r.dispatchTime = SystemClock.uptimeMillis();
            r.dispatchClockTime = System.currentTimeMillis();
            final int N = r.receivers.size();
            for (int i=0; i<N; i++) {
                Object target = r.receivers.get(i);
                // Deliver the broadcast to the registered receiver
                deliverToRegisteredReceiverLocked(r, (BroadcastFilter)target, false);
            }
            addBroadcastToHistoryLocked(r); // add the broadcast to the history stats
        }

        // part2: handle the current ordered broadcast
        do {
            if (mOrderedBroadcasts.size() == 0) {
                mService.scheduleAppGcsLocked(); // no more broadcasts waiting to be handled
                if (looped) {
                    mService.updateOomAdjLocked();
                }
                return;
            }
            r = mOrderedBroadcasts.get(0); // first broadcast in the ordered queue
            boolean forceReceive = false;
            int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
            if (mService.mProcessesReady && r.dispatchTime > 0) {
                long now = SystemClock.uptimeMillis();
                if ((numReceivers > 0) &&
                        (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                    broadcastTimeoutLocked(false); // the broadcast took too long: force it to finish
                }
            }
            ...
            if (r.receivers == null || r.nextReceiver >= numReceivers
                    || r.resultAbort || forceReceive) {
                if (r.resultTo != null) {
                    // Deliver the broadcast result; this ends up calling onReceive()
                    performReceiveLocked(r.callerApp, r.resultTo,
                            new Intent(r.intent), r.resultCode,
                            r.resultData, r.resultExtras, false, false, r.userId);
                }
                cancelBroadcastTimeoutLocked(); // cancel the BROADCAST_TIMEOUT_MSG message
                addBroadcastToHistoryLocked(r);
                mOrderedBroadcasts.remove(0);
                continue;
            }
        } while (r == null);

        // part3: get the next receiver
        r.receiverTime = SystemClock.uptimeMillis();
        if (recIdx == 0) {
            r.dispatchTime = r.receiverTime;
            r.dispatchClockTime = System.currentTimeMillis();
        }
        if (!mPendingBroadcastTimeoutMessage) {
            long timeoutTime = r.receiverTime + mTimeoutPeriod;
            setBroadcastTimeoutLocked(timeoutTime); // schedule the delayed broadcast timeout message
        }

        // part4: handle the next ordered broadcast
        ProcessRecord app = mService.getProcessRecordLocked(targetProcess,
                info.activityInfo.applicationInfo.uid, false);
        if (app != null && app.thread != null) {
            app.addPackage(info.activityInfo.packageName,
                    info.activityInfo.applicationInfo.versionCode, mService.mProcessStats);
            processCurBroadcastLocked(r, app); // handle the ordered broadcast
            return;
            ...
        }

        // The receiver's process is not running yet, so create it
        if ((r.curApp=mService.startProcessLocked(targetProcess,
                info.activityInfo.applicationInfo, true,
                r.intent.getFlags() | Intent.FLAG_FROM_BACKGROUND,
                "broadcast", r.curComponent,
                (r.intent.getFlags()&Intent.FLAG_RECEIVER_BOOT_UPGRADE) != 0,
                false, false)) == null) {
            ...
            return;
        }
    }
}
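Broadcast timeouts are handled at these points:
- First, part 3 calls setBroadcastTimeoutLocked(timeoutTime) to schedule the broadcast timeout message;
- Then part 2 acts on how the broadcast is progressing:
  - if the receivers have taken too long, it calls broadcastTimeoutLocked(false), which detonates the bomb;
  - once the broadcast has finished, it calls cancelBroadcastTimeoutLocked, which defuses the bomb.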
// BroadcastQueue
final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}
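This schedules the delayed BROADCAST_TIMEOUT_MSG message: if the broadcast has still not been handled mTimeoutPeriod after the current time, the broadcast timeout flow begins.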
// BroadcastConstants.java
private static final long DEFAULT_TIMEOUT = 10_000;

// Timeout period for this broadcast queue
public long TIMEOUT = DEFAULT_TIMEOUT;

// Unspecified fields retain their current value rather than revert to default.
// In other words, the timeout can still be configured.
TIMEOUT = mParser.getLong(KEY_TIMEOUT, TIMEOUT);
Looking at the concrete value, the default timeout is 10s.
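Two separate comparisons are involved in the broadcast timeout: one for the whole ordered broadcast (part 2) and one for the receiver currently running (part 3 plus the timeout handler). The sketch below restates them as pure functions so the arithmetic is easy to check. BroadcastTimeoutSketch and its method names are hypothetical, and checks such as mProcessesReady are left out.

```java
/** Hypothetical restatement of the two broadcast timeout comparisons; not framework code. */
class BroadcastTimeoutSketch {
    /** Whole-broadcast check from part 2: now > dispatchTime + 2 * timeout * numReceivers. */
    static boolean wholeBroadcastTimedOut(long now, long dispatchTime,
                                          long timeoutMs, int numReceivers) {
        return numReceivers > 0 && now > dispatchTime + 2 * timeoutMs * numReceivers;
    }

    /** Deadline scheduled in part 3 for the receiver that is about to run. */
    static long receiverDeadline(long receiverTime, long timeoutMs) {
        return receiverTime + timeoutMs;
    }

    /**
     * When the timeout message fires, the timeout is only real if the deadline has passed;
     * if the deadline is still in the future the real code re-arms the message instead.
     */
    static boolean receiverTimedOut(long now, long receiverTime, long timeoutMs) {
        return receiverDeadline(receiverTime, timeoutMs) <= now;
    }
}
```

For example, with a 10s timeout and 3 receivers dispatched at t = 1000ms, the whole-broadcast limit is reached only after t = 61000ms, whereas a single receiver that started at t = 5000ms is considered late from t = 15000ms.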
3.2 Defusing the bomb
The broadcast timeout mechanism works much like the service one:
// Cancel the timeout
final void cancelBroadcastTimeoutLocked() {
    if (mPendingBroadcastTimeoutMessage) {
        // Remove the message
        mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
        mPendingBroadcastTimeoutMessage = false;
    }
}
Removing the broadcast timeout message BROADCAST_TIMEOUT_MSG defuses the bomb.
3.3 Detonating the bomb
Now for the detonation logic. The handler inside BroadcastQueue was covered above, so here is the timeout handling itself:
// fromMsg = true
final void broadcastTimeoutLocked(boolean fromMsg) {
    if (fromMsg) {
        mPendingBroadcastTimeoutMessage = false;
    }
    if (mOrderedBroadcasts.size() == 0) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    BroadcastRecord r = mOrderedBroadcasts.get(0);
    if (fromMsg) {
        if (mService.mDidDexOpt) {
            mService.mDidDexOpt = false;
            long timeoutTime = SystemClock.uptimeMillis() + mTimeoutPeriod;
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
        if (!mService.mProcessesReady) {
            return; // no broadcast timeouts while the system is not ready yet
        }
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        if (timeoutTime > now) {
            // The receiver currently running has not timed out, so re-arm the timeout
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
    }
    BroadcastRecord br = mOrderedBroadcasts.get(0);
    if (br.state == BroadcastRecord.WAITING_SERVICES) {
        // The broadcast is done but is waiting for started services to finish.
        // After waiting long enough, move on to the next broadcast.
        br.curComponent = null;
        br.state = BroadcastRecord.IDLE;
        processNextBroadcast(false);
        return;
    }
    r.receiverTime = now;
    // Increment the ANR count of the current BroadcastRecord
    r.anrCount++;
    if (r.nextReceiver <= 0) {
        return;
    }
    ...
    Object curReceiver = r.receivers.get(r.nextReceiver-1);
    // Look up the app process
    if (curReceiver instanceof BroadcastFilter) {
        BroadcastFilter bf = (BroadcastFilter)curReceiver;
        if (bf.receiverList.pid != 0
                && bf.receiverList.pid != ActivityManagerService.MY_PID) {
            synchronized (mService.mPidsSelfLocked) {
                app = mService.mPidsSelfLocked.get(
                        bf.receiverList.pid);
            }
        }
    } else {
        app = r.curApp;
    }
    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }
    if (mPendingBroadcast == r) {
        mPendingBroadcast = null;
    }
    // Move on to the next broadcast receiver
    finishReceiverLocked(r, r.resultCode, r.resultData,
            r.resultExtras, r.resultAbort, false);
    scheduleBroadcastsLocked();
    if (anrMessage != null) {
        // Post the ANR, carrying the ANR process and the ANR message
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}
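broadcastTimeoutLocked returns without raising an ANR in these cases:
- mOrderedBroadcasts has already been fully processed;
- dexopt is in progress;
- the system has not reached the ready state yet (mProcessesReady=false);
- the receiver currently running has not timed out, in which case the broadcast timeout is re-armed.
Here is the implementation of AppNotResponding: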
private final class AppNotResponding implements Runnable {
    private final ProcessRecord mApp;
    private final String mAnnotation;

    public AppNotResponding(ProcessRecord app, String annotation) {
        mApp = app;
        mAnnotation = annotation;
    }

    @Override
    public void run() {
        mApp.appNotResponding(null, null, null, null, false, mAnnotation);
    }
}
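In the end the ANR is handled by ProcessRecord, which holds a reference to the ActivityManagerService instance.
3.4 Foreground versus background broadcast timeouts
Foreground broadcasts time out after 10s and background broadcasts after 60s. How are the two told apart? Here is the core logic in AMS: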
BroadcastQueue broadcastQueueForIntent(Intent intent) {
    final boolean isFg = (intent.getFlags() & Intent.FLAG_RECEIVER_FOREGROUND) != 0;
    return (isFg) ? mFgBroadcastQueue : mBgBroadcastQueue;
}

mFgBroadcastQueue = new BroadcastQueue(this, mHandler,
        "foreground", BROADCAST_FG_TIMEOUT, false);
mBgBroadcastQueue = new BroadcastQueue(this, mHandler,
        "background", BROADCAST_BG_TIMEOUT, true);
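Whether the flags of the intent passed to sendBroadcast(Intent intent) include FLAG_RECEIVER_FOREGROUND decides whether the broadcast goes into the foreground or the background queue. The foreground queue times out after 10s and the background queue after 60s. By default a broadcast goes into the background queue unless FLAG_RECEIVER_FOREGROUND is set explicitly.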
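The queue selection is a single bit test on the intent flags. Here is a minimal plain-Java sketch of it; BroadcastQueueSketch is a hypothetical class, and the flag constant is copied in on the assumption that it matches the value of Intent.FLAG_RECEIVER_FOREGROUND in the Android SDK, so that the example runs without Android on the classpath.

```java
/** Hypothetical model of broadcastQueueForIntent(); not framework code. */
class BroadcastQueueSketch {
    // Assumed to equal Intent.FLAG_RECEIVER_FOREGROUND; repeated here to keep the sketch standalone.
    static final int FLAG_RECEIVER_FOREGROUND = 0x10000000;

    static final long BROADCAST_FG_TIMEOUT = 10 * 1000;  // 10s
    static final long BROADCAST_BG_TIMEOUT = 60 * 1000;  // 60s

    /** True when the foreground flag bit is set in the intent flags. */
    static boolean isForeground(int intentFlags) {
        return (intentFlags & FLAG_RECEIVER_FOREGROUND) != 0;
    }

    /** Timeout of the queue the broadcast would be placed in. */
    static long timeoutFor(int intentFlags) {
        return isForeground(intentFlags) ? BROADCAST_FG_TIMEOUT : BROADCAST_BG_TIMEOUT;
    }
}
```

An intent with no flags set lands in the background queue with the 60s limit; setting the foreground flag switches it to the 10s limit.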
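Background broadcasts have a longer timeout threshold than foreground broadcasts. In addition, if a background service start is encountered during dispatch (mDelayBehindServices), dispatch is delayed until the service completes, and any broadcast ANR caused by waiting for that service is ignored. Background broadcasts belong to the background scheduling group, while foreground broadcasts belong to the foreground scheduling group. In short, background broadcasts are less likely to hit an ANR, but they also run more slowly.
Also, only broadcasts that are processed serially have a timeout mechanism, because their receivers run one after another and a slow receiver delays the next one. Parallel broadcasts are delivered to all receivers in a single loop, so receivers do not affect each other and there is no broadcast timeout.
Strictly speaking, a foreground broadcast is a broadcast that sits in the foreground broadcast queue.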
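4. ContentProvider
ContentProvider Timeout is triggered when AMS.MainHandler, running on the "ActivityManager" thread, receives the CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message.
The ContentProvider timeout is CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10s. Unlike the Service and BroadcastQueue cases above, it is tied to the startup of the provider's process.
4.1 Planting the bomb
The bomb is planted during process creation: once the process has been created, it calls attachApplicationLocked() and enters the system_server process.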
// ActivityManagerService
private final boolean attachApplicationLocked(IApplicationThread thread, int pid) {
    ProcessRecord app;
    if (pid != MY_PID && pid >= 0) {
        synchronized (mPidsSelfLocked) {
            app = mPidsSelfLocked.get(pid); // look up the ProcessRecord by pid
        }
    }
    ...
    // true when the system is ready or the app is a FLAG_PERSISTENT process
    boolean normalMode = mProcessesReady || isAllowedWhileBooting(app.info);
    List<ProviderInfo> providers = normalMode ? generateApplicationProvidersLocked(app) : null;
    // If the app process has providers that are still launching, send
    // CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG after a 10s delay
    if (providers != null && checkAppInLaunchingProvidersLocked(app)) {
        Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
        msg.obj = app;
        mHandler.sendMessageDelayed(msg, CONTENT_PROVIDER_PUBLISH_TIMEOUT);
    }
    thread.bindApplication(...);
    ...
}
// 10s
static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10*1000;
The bomb goes off after 10s.
4.2 Defusing the bomb
Once the provider has been published successfully, the bomb is defused.
public final void publishContentProviders(IApplicationThread caller,
        List<ContentProviderHolder> providers) {
    ...
    synchronized (this) {
        final ProcessRecord r = getRecordForAppLocked(caller);
        final int N = providers.size();
        for (int i = 0; i < N; i++) {
            ContentProviderHolder src = providers.get(i);
            ...
            ContentProviderRecord dst = r.pubProviders.get(src.info.name);
            if (dst != null) {
                ComponentName comp = new ComponentName(dst.info.packageName, dst.info.name);
                mProviderMap.putProviderByClass(comp, dst); // add the provider to mProviderMap
                String names[] = dst.info.authority.split(";");
                for (int j = 0; j < names.length; j++) {
                    mProviderMap.putProviderByName(names[j], dst);
                }
                int launchingCount = mLaunchingProviders.size();
                int j;
                boolean wasInLaunchingProviders = false;
                for (j = 0; j < launchingCount; j++) {
                    if (mLaunchingProviders.get(j) == dst) {
                        // Remove the provider from the mLaunchingProviders queue
                        mLaunchingProviders.remove(j);
                        wasInLaunchingProviders = true;
                        j--;
                        launchingCount--;
                    }
                }
                // Published successfully, so remove the timeout message
                if (wasInLaunchingProviders) {
                    mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
                }
                synchronized (dst) {
                    dst.provider = src.provider;
                    dst.proc = r;
                    // Wake up clients blocked in wait()
                    dst.notifyAll();
                }
                ...
            }
        }
    }
}
4.3 Detonating the bomb
The system_server process has a Handler thread named "ActivityManager". When the countdown ends, the message CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG is delivered to that Handler thread.
MainHandler is an inner class of AMS.
final class MainHandler extends Handler {
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
                ...
                ProcessRecord app = (ProcessRecord)msg.obj;
                synchronized (ActivityManagerService.this) {
                    processContentProviderPublishTimedOutLocked(app);
                }
            } break;
            ...
        }
        ...
    }
}

private final void processContentProviderPublishTimedOutLocked(ProcessRecord app) {
    cleanupAppInLaunchingProvidersLocked(app, true);
    removeProcessLocked(app, false, true, "timeout publishing content providers");
}

boolean cleanupAppInLaunchingProvidersLocked(ProcessRecord app, boolean alwaysBad) {
    boolean restart = false;
    for (int i = mLaunchingProviders.size() - 1; i >= 0; i--) {
        ContentProviderRecord cpr = mLaunchingProviders.get(i);
        if (cpr.launchingApp == app) {
            if (!alwaysBad && !app.bad && cpr.hasConnectionOrHandle()) {
                restart = true;
            } else {
                // Remove the dying provider
                removeDyingProviderLocked(app, cpr, true);
            }
        }
    }
    return restart;
}
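What removeDyingProviderLocked() does is closely tied to process survival (for details, see the article on ContentProvider reference counting, section 4.5):
- for a stable provider (conn.stableCount > 0), every non-persistent process that holds a stable connection to the provider is killed;
- for an unstable provider (conn.unstableCount > 0), client processes are not killed in cascade.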
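5. Input timeout mechanism
Input timeout detection works quite differently from service, broadcast and provider. To make the input flow easier to follow, here is what two important threads do:
- The InputReader thread reads input events through EventHub (which watches the /dev/input directory). As soon as it sees an input event, it puts the event on InputDispatcher's mInBoundQueue and notifies the dispatcher to handle it;
- The InputDispatcher thread dispatches the received input events to the target application window. Dispatching uses three event queues:
  - mInBoundQueue holds input events sent over by InputReader;
  - outBoundQueue holds input events about to be dispatched to the target application window;
  - waitQueue holds input events that have been dispatched to the target application but not yet finished by it.
The input timeout does not simply explode once time runs out. The check for whether it should explode happens only while a subsequently reported event is being processed, so it is more like minesweeping. The process is shown in the figure below.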
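1. The InputReader thread listens through EventHub for input events reported from below. When it receives one, it puts it on mInBoundQueue and wakes the InputDispatcher thread.
2. InputDispatcher starts dispatching the input event and records the start time for the mine. It first checks whether an event is already being processed (mPendingEvent). If not, it takes the event at the head of mInBoundQueue, assigns it to mPendingEvent and resets the ANR timeout; otherwise it neither takes an event from mInBoundQueue nor resets the timeout. It then checks whether the window is ready (checkWindowReadyForMoreInputLocked). If either of the following conditions holds, it enters the sweeping state (checking whether the previous in-flight event has timed out) and ends this round of dispatch; otherwise it continues with step 3. Once the application window is ready, mPendingEvent is moved to outBoundQueue.
   - for key events: outboundQueue or waitQueue is not empty;
   - for non-key events: waitQueue is not empty and the event at its head has been waiting for more than 500ms.
3. When outBoundQueue is not empty and the connection to the application's end of the channel is healthy, events are taken from outboundQueue and put on waitQueue.
4. InputDispatcher tells the target application's process through a socket that it can start working.
5. The app creates a socketpair for two-way communication with the central control system by default during initialization. The app's foreman (the main thread) receives the input event and forwards it layer by layer to the target window for handling.
6. When the foreman has finished, it reports completion to the central control system through the socket, and the central control system removes the event from waitQueue.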
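Why is the input timeout minesweeping rather than a timed explosion? Because even if handling one event takes longer than the timeout, no ANR is triggered as long as the user generates no further input events. Sweeping here means that, while the input system is handling a slow event, every later input event checks whether the previous in-flight event has timed out (the sweeping state) by comparing the current time against the time of the last input dispatch. If the previous input event has already been handled, the ANR timeout is reset and nothing explodes.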
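The sweeping behaviour can be illustrated with a deliberately small model. This is a sketch of the description above, not of InputDispatcher itself: InputSweepSketch, onInputEvent and onFinished are hypothetical names, timestamps are passed in explicitly, and the three queues are collapsed into a single pending-event timestamp.

```java
/** Hypothetical model of the "minesweeping" check described above; not InputDispatcher code. */
class InputSweepSketch {
    static final long TIMEOUT_MS = 5000;  // 5s input dispatching timeout from the overview

    // Dispatch time of the event the app is still handling, or -1 when nothing is pending.
    private long pendingSince = -1;

    /** Called when a new input event arrives; returns true if an ANR would be raised. */
    boolean onInputEvent(long now) {
        if (pendingSince >= 0 && now - pendingSince > TIMEOUT_MS) {
            return true;  // the previous event has been in flight for too long
        }
        if (pendingSince < 0) {
            pendingSince = now;  // nothing pending: dispatch this event and restart the clock
        }
        return false;
    }

    /** Called when the app reports that it finished handling the pending event. */
    void onFinished() {
        pendingSince = -1;
    }
}
```

In this model an event that takes a full minute to handle causes no ANR by itself; the ANR only appears if another event arrives more than 5s after the slow one was dispatched and before it finishes.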
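That covers the causes of ANRs for services, broadcasts, providers and input. Next, how the ANR information is collected.
6. The appNotResponding flow
Whatever kind of ANR it is, the flow ends up in ProcessRecord's appNotResponding method. Here is what that method does: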
// ProcessRecord.java
void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
        String parentShortComponentName, WindowProcessController parentProcess,
        boolean aboveSystem, String annotation) {
    ArrayList<Integer> firstPids = new ArrayList<>(5);
    SparseArray<Boolean> lastPids = new SparseArray<>(20);

    mWindowProcessController.appEarlyNotResponding(annotation, () -> kill("anr", true));

    // ANR time. Note that the stacks collected at this point are not necessarily
    // the stacks that caused the ANR.
    long anrTime = SystemClock.uptimeMillis();
    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
    }

    synchronized (mService) {
        // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
        if (mService.mAtmInternal.isShuttingDown()) {
            Slog.i(TAG, "During shutdown skipping ANR: " + this + " " + annotation);
            return;
        } else if (isNotResponding()) {
            Slog.i(TAG, "Skipping duplicate ANR: " + this + " " + annotation);
            return;
        } else if (isCrashing()) {
            Slog.i(TAG, "Crashing app skipping ANR: " + this + " " + annotation);
            return;
        } else if (killedByAm) {
            Slog.i(TAG, "App already killed by AM skipping ANR: " + this + " " + annotation);
            return;
        } else if (killed) {
            Slog.i(TAG, "Skipping died app ANR: " + this + " " + annotation);
            return;
        }

        // In case we come through here for the same app before completing
        // this one, mark as anring now so we will bail out. This avoids re-entering.
        setNotResponding(true);

        // Log the ANR to the event log.
        EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags,
                annotation);

        // Dump thread traces as quickly as we can, starting with "interesting" processes.
        // Add the current process to firstPids.
        firstPids.add(pid);

        // Don't dump other PIDs if it's a background ANR
        if (!isSilentAnr()) {
            int parentPid = pid;
            if (parentProcess != null && parentProcess.getPid() > 0) {
                parentPid = parentProcess.getPid();
            }
            if (parentPid != pid) firstPids.add(parentPid);

            // Add the system_server process to firstPids
            if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);

            for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
                ProcessRecord r = getLruProcessList().get(i);
                if (r != null && r.thread != null) {
                    int myPid = r.pid;
                    if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
                        if (r.isPersistent()) {
                            firstPids.add(myPid); // add persistent processes to firstPids
                            if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                        } else if (r.treatLikeActivity) {
                            firstPids.add(myPid); // bound with BIND_TREAT_LIKE_ACTIVITY
                            if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                        } else {
                            lastPids.put(myPid, Boolean.TRUE); // all other processes go to lastPids
                            if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                        }
                    }
                }
            }
        }
    }

    // Log the ANR to the main log.
    StringBuilder info = new StringBuilder();
    info.setLength(0);
    info.append("ANR in ").append(processName);
    if (activityShortComponentName != null) {
        info.append(" (").append(activityShortComponentName).append(")");
    }
    info.append("\n");
    info.append("PID: ").append(pid).append("\n");
    if (annotation != null) {
        info.append("Reason: ").append(annotation).append("\n");
    }
    if (parentShortComponentName != null
            && parentShortComponentName.equals(activityShortComponentName)) {
        info.append("Parent: ").append(parentShortComponentName).append("\n");
    }

    // Create the CPU tracker object
    ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

    // don't dump native PIDs for background ANRs unless it is the process of interest
    String[] nativeProcs = null;
    if (isSilentAnr()) {
        for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
            if (NATIVE_STACKS_OF_INTEREST[i].equals(processName)) {
                nativeProcs = new String[] { processName };
                break;
            }
        }
    } else {
        nativeProcs = NATIVE_STACKS_OF_INTEREST;
    }

    // Get the native processes
    int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
    ArrayList<Integer> nativePids = null;
    if (pids != null) {
        nativePids = new ArrayList<>(pids.length);
        for (int i : pids) {
            nativePids.add(i);
        }
    }

    // For background ANRs, don't pass the ProcessCpuTracker to
    // avoid spending 1/2 second collecting stats to rank lastPids.
    // Collect the stack traces.
    File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
            (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
            nativePids);

    String cpuInfo = null;
    // Append the CPU information
    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
        synchronized (mService.mProcessCpuTracker) {
            cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
        }
        info.append(processCpuTracker.printCurrentLoad());
        info.append(cpuInfo);
    }
    info.append(processCpuTracker.printCurrentState(anrTime));

    Slog.e(TAG, info.toString());
    if (tracesFile == null) {
        // There is no trace file, so dump (only) the alleged culprit's threads to the log
        Process.sendSignal(pid, Process.SIGNAL_QUIT);
    }

    StatsLog.write(StatsLog.ANR_OCCURRED, uid, processName,
            activityShortComponentName == null ? "unknown": activityShortComponentName,
            annotation,
            (this.info != null) ? (this.info.isInstantApp()
                    ? StatsLog.ANROCCURRED__IS_INSTANT_APP__TRUE
                    : StatsLog.ANROCCURRED__IS_INSTANT_APP__FALSE)
                    : StatsLog.ANROCCURRED__IS_INSTANT_APP__UNAVAILABLE,
            isInterestingToUserLocked()
                    ? StatsLog.ANROCCURRED__FOREGROUND_STATE__FOREGROUND
                    : StatsLog.ANROCCURRED__FOREGROUND_STATE__BACKGROUND,
            getProcessClassEnum(),
            (this.info != null) ? this.info.packageName : "");
    final ProcessRecord parentPr = parentProcess != null
            ? (ProcessRecord) parentProcess.mOwner : null;
    // Save the traces file and the CPU usage to dropbox, i.e. the data/system/dropbox directory
    mService.addErrorToDropBox("anr", this, processName, activityShortComponentName,
            parentShortComponentName, parentPr, annotation, cpuInfo, tracesFile, null);

    if (mWindowProcessController.appNotResponding(info.toString(), () -> kill("anr", true),
            () -> {
                synchronized (mService) {
                    mService.mServices.scheduleServiceTimeoutLocked(this);
                }
            })) {
        return;
    }

    synchronized (mService) {
        // mBatteryStatsService can be null if the AMS is constructed with injector only. This
        // will only happen in tests.
        if (mService.mBatteryStatsService != null) {
            mService.mBatteryStatsService.noteProcessAnr(processName, uid);
        }

        // Kill the process for a background ANR
        if (isSilentAnr() && !isDebugging()) {
            kill("bg anr", true);
            return;
        }

        // Set the app's notResponding state, and look up the errorReportReceiver
        makeAppNotRespondingLocked(activityShortComponentName,
                annotation != null ? "ANR " + annotation : "ANR", info.toString());

        // mUiHandler can be null if the AMS is constructed with injector only. This will only
        // happen in tests.
        if (mService.mUiHandler != null) {
            // Bring up the infamous App Not Responding dialog
            Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
            msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);
            // Send the message that shows the ANR dialog
            mService.mUiHandler.sendMessage(msg);
        }
    }
}
/**
* Unless configured otherwise, swallow ANRs in background processes & kill the process.
* Non-private access is for tests only. Background ANRs are swallowed, so no ANR dialog is shown.
*/
@VisibleForTesting
boolean isSilentAnr() {
return !getShowBackground() && !isInterestingForBackgroundTraces();
}
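When an ANR occurs, the following steps run in order:
- Write the ANR reason to the EventLog. This means the am_anr entry in the EventLog is the record closest to the moment the ANR was triggered;
- Collect and write out the traces of every thread in the list of important processes; this step is slow;
- Write out the current CPU usage of each process and the CPU load;
- Save the traces file and the CPU usage information to dropbox, i.e. the data/system/dropbox directory;
- Depending on the process type, either kill the process silently in the background or show a dialog to the user.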
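On an ANR, traces are written for the important processes, which are:
- the firstPids queue: the first entry is the ANR process, the second is system_server, and the rest are all persistent processes;
- the Native queue: the mediaserver, sdcard and surfaceflinger processes under /system/bin/;
- the lastPids queue: every process in mLruProcesses that is not in firstPids.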
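To make the split between firstPids and lastPids concrete, here is a plain-Java sketch of the classification loop for the non-silent case. AnrPidSketch, Proc and Result are hypothetical types, a plain list replaces the SparseArray used for lastPids, and the caller passes parentPid equal to anrPid when there is no separate parent process.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how pids are sorted into firstPids and lastPids; not framework code. */
class AnrPidSketch {
    static final class Proc {
        final int pid;
        final boolean persistent;
        final boolean treatLikeActivity;

        Proc(int pid, boolean persistent, boolean treatLikeActivity) {
            this.pid = pid;
            this.persistent = persistent;
            this.treatLikeActivity = treatLikeActivity;
        }
    }

    static final class Result {
        final List<Integer> firstPids = new ArrayList<>();
        final List<Integer> lastPids = new ArrayList<>();
    }

    static Result classify(int anrPid, int parentPid, int systemServerPid, List<Proc> lru) {
        Result r = new Result();
        r.firstPids.add(anrPid);                                  // the ANR process always comes first
        if (parentPid != anrPid) r.firstPids.add(parentPid);
        if (systemServerPid != anrPid && systemServerPid != parentPid) {
            r.firstPids.add(systemServerPid);
        }
        for (int i = lru.size() - 1; i >= 0; i--) {               // walk the LRU list from the end
            Proc p = lru.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == parentPid || p.pid == systemServerPid) {
                continue;                                         // already handled or invalid
            }
            if (p.persistent || p.treatLikeActivity) {
                r.firstPids.add(p.pid);
            } else {
                r.lastPids.add(p.pid);
            }
        }
        return r;
    }
}
```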
Next, the logic that collects the stack traces of each process:
// AMS
/**
 * If a stack trace dump file is configured, dump process stack traces.
 * @param firstPids of dalvik VM processes to dump stack traces for first
 * @param lastPids of dalvik VM processes to dump stack traces for last
 * @param nativePids optional list of native pids to dump stack crawls
 */
public static File dumpStackTraces(ArrayList<Integer> firstPids,
        ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
        ArrayList<Integer> nativePids) {
    ArrayList<Integer> extraPids = null;

    Slog.i(TAG, "dumpStackTraces pids=" + lastPids + " nativepids=" + nativePids);

    // Measure CPU usage as soon as we're called in order to get a realistic sampling
    // of the top users at the time of the request.
    if (processCpuTracker != null) {
        processCpuTracker.init();
        try {
            Thread.sleep(200); // wait 200ms
        } catch (InterruptedException ignored) {
        }

        // Measure CPU usage
        processCpuTracker.update();

        // We'll take the stack crawls of just the top apps using CPU.
        // Collect the 5 processes with the highest CPU usage.
        final int N = processCpuTracker.countWorkingStats();
        extraPids = new ArrayList<>();
        for (int i = 0; i < N && extraPids.size() < 5; i++) {
            ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
            if (lastPids.indexOfKey(stats.pid) >= 0) {
                if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + stats.pid);
                extraPids.add(stats.pid);
            } else {
                Slog.i(TAG, "Skipping next CPU consuming process, not a java proc: "
                        + stats.pid);
            }
        }
    }

    final File tracesDir = new File(ANR_TRACE_DIR);
    // Each set of ANR traces is written to a separate file and dumpstate will process
    // all such files and add them to a captured bug report if they're recent enough.
    maybePruneOldTraces(tracesDir);

    // NOTE: We should consider creating the file in native code atomically once we've
    // gotten rid of the old scheme of dumping and lot of the code that deals with paths
    // can be removed.
    // Create the ANR file.
    File tracesFile = createAnrDumpFile(tracesDir);
    if (tracesFile == null) {
        return null;
    }

    // Collect the ANR stack traces
    dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids);
    return tracesFile;
}

// Create the ANR file
private static synchronized File createAnrDumpFile(File tracesDir) {
    if (sAnrFileDateFormat == null) {
        sAnrFileDateFormat = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss-SSS");
    }

    final String formattedDate = sAnrFileDateFormat.format(new Date());
    // The ANR file name is anr_ followed by the timestamp
    final File anrFile = new File(tracesDir, "anr_" + formattedDate);
    ...
    return anrFile;
}

// Stack collection logic
public static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
        ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {

    Slog.i(TAG, "Dumping to " + tracesFile);

    // We don't need any sort of inotify based monitoring when we're dumping traces via
    // tombstoned. Data is piped to an "intercept" FD installed in tombstoned so we're in full
    // control of all writes to the file in question.

    // We must complete all stack dumps within 20 seconds.
    // Collection stops when the 20s budget is used up, even if it has not finished.
    long remainingTime = 20 * 1000;

    // First collect all of the stacks of the most important pids.
    if (firstPids != null) {
        int num = firstPids.size();
        for (int i = 0; i < num; i++) {
            Slog.i(TAG, "Collecting stacks for pid " + firstPids.get(i));
            final long timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile,
                    remainingTime);

            remainingTime -= timeTaken;
            if (remainingTime <= 0) {
                Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
                        "); deadline exceeded.");
                return;
            }
        }
    }

    // Next collect the stacks of the native pids
    if (nativePids != null) {
        for (int pid : nativePids) {
            Slog.i(TAG, "Collecting stacks for native pid " + pid);
            final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);

            final long start = SystemClock.elapsedRealtime();
            Debug.dumpNativeBacktraceToFileTimeout(
                    pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
            final long timeTaken = SystemClock.elapsedRealtime() - start;

            remainingTime -= timeTaken;
            ... // stop collecting once the budget is exceeded
        }
    }

    // Lastly, dump stacks for all extra PIDs from the CPU tracker.
    // These are the top 5 CPU users found earlier.
    if (extraPids != null) {
        for (int pid : extraPids) {
            Slog.i(TAG, "Collecting stacks for extra pid " + pid);

            final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);

            remainingTime -= timeTaken;
            ...
        }
    }
    Slog.i(TAG, "Done dumping");
}
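- Collect the stacks of the firstPids processes:
  - first, the process that hit the ANR;
  - second, system_server;
  - then all persistent processes in mLruProcesses.
- Collect the stacks of the native processes (dumpNativeBacktraceToFile):
  - in order: mediaserver, sdcard, surfaceflinger.
- Collect the stacks of the lastPids processes:
  - in order, the top 5 processes by CPU usage.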
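The collection loops above all share one idea: a single time budget that shrinks after every dump, with the loop aborting once it is used up. A minimal sketch of that idea in plain Java follows; DumpBudgetSketch and dumpWithBudget are hypothetical names, and the dumpOne callback stands in for dumpJavaTracesTombstoned by returning the time it took.

```java
import java.util.List;
import java.util.function.ToLongBiFunction;

/** Hypothetical model of the shared time budget used while dumping stacks; not framework code. */
class DumpBudgetSketch {
    /**
     * Dumps pids in order until the budget is exhausted.
     * dumpOne receives (pid, remainingMs) and returns the time it took in milliseconds.
     * Returns how many pids were attempted before the deadline was exceeded.
     */
    static int dumpWithBudget(List<Integer> pids, long budgetMs,
                              ToLongBiFunction<Integer, Long> dumpOne) {
        long remaining = budgetMs;
        int attempted = 0;
        for (int pid : pids) {
            remaining -= dumpOne.applyAsLong(pid, remaining);
            attempted++;
            if (remaining <= 0) {
                break;  // deadline exceeded: abort the rest of the dump
            }
        }
        return attempted;
    }
}
```

Because the most important pids are dumped first, a slow dump early in the list can leave later processes without any traces, which is worth remembering when a traces file looks incomplete.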
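7. Summary
When an ANR occurs, the flow ends in AMS.appNotResponding() (ProcessRecord.appNotResponding() in the newer code shown in section 6); the provider case described here is the exception.
Timeout durations
- for a foreground service the timeout is SERVICE_TIMEOUT = 20s;
- for a background service the timeout is SERVICE_BACKGROUND_TIMEOUT = 200s;
- for a foreground broadcast the timeout is BROADCAST_FG_TIMEOUT = 10s;
- for a background broadcast the timeout is BROADCAST_BG_TIMEOUT = 60s;
- for a ContentProvider the timeout is CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10s.
Timeout detection
Service timeout detection:
- if the operation does not finish within the time limit, and so never removes the delayed message, an ANR is triggered.
BroadcastReceiver timeout detection:
- if the total execution time of an ordered broadcast exceeds 2 * number of receivers * timeout, an ANR is triggered;
- if any single receiver of an ordered broadcast runs longer than the timeout, an ANR is triggered.
In addition:
- for Service, Broadcast and Input, an ANR always ends up in AMS.appNotResponding;
- for a provider, an ANR can occur during publish while its process starts. In that case the process is killed and the related state is cleaned up directly, and no ANR dialog is shown. appNotRespondingViaProvider() does go through appNotResponding(), but it is rarely used and relies on a timeout defined by the caller, so it is not covered here.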
Finally, sincere thanks to gityuan's blog.
References
http://gityuan.com/2016/12/02/app-not-response/
http://gityuan.com/2016/07/02/android-anr/