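How the System Triggers ANR
As the analysis of how the system displays ANRs shows, the ANR dialog is always triggered through appNotResponding. A search of the source for its callers turns up five call sites, and these five calls correspond exactly to the five causes of ANR.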
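- com.android.server.am.ActivityManagerService#appNotRespondingViaProvider: ContentProvider timeout. It is currently not in general use.
- com.android.server.am.ActiveServices#serviceForegroundTimeout: a service started by an app via startForegroundService() must enter the foreground within com.android.server.am.ActiveServices#SERVICE_START_FOREGROUND_TIMEOUT, i.e. 10s (the Android 11 value shown later in this article). Otherwise an ANR occurs.
- com.android.server.am.ActiveServices#serviceTimeout: Service startup timeout. A service started by a foreground process must not take longer than 20s, and a service started by a background process must not take longer than 200s.
- com.android.server.am.BroadcastQueue#broadcastTimeoutLocked: ANR caused by a broadcast. The Android system maintains separate foreground and background broadcast queues.
- com.android.server.am.ActivityManagerService#inputDispatchingTimedOut(): input event dispatching is delayed by more than 5s.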
The timeouts that apply to broadcasts are shown in the figure:
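ContentProvider Timeouts Triggering ANR
When it occurs
A ContentProvider ANR does not occur in non-system apps. The mechanism only fires for callers holding the system permission **android.permission.REMOVE_TASKS**. Readers who are not Android platform developers will normally never run into this kind of problem.
Judging from the Android 11 code, the main scenarios are:
- The system calls ContentProvider#getType and the call takes too long: com.android.server.am.ActivityManagerService#getProviderMimeType. This function was deprecated in Android API 29, so this article does not cover it further.
- One of the standard ContentProvider methods takes too long. The code shows that a timeout in any of the following methods triggers an ANR: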
- query(Uri, String[], Bundle, CancellationSignal)
- getType(Uri)
- getStreamTypes(Uri, String)
- canonicalize(Uri)
- uncanonicalize(Uri)
- refresh(Uri, Bundle, CancellationSignal)
- checkUriPermission(Uri, int, int)
- insert(Uri, ContentValues, Bundle)
- bulkInsert(Uri, ContentValues[])
- delete(Uri, Bundle)
- update(Uri, ContentValues, Bundle)
- openFile(Uri, String, CancellationSignal)
- openAssetFile(Uri, String, CancellationSignal)
- openTypedAssetFile(Uri, String, Bundle, CancellationSignal)
- applyBatch(String, ArrayList)
- call(String, String, String, Bundle)
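Next, we walk through how the mechanism works, step by step.
Setting the timeout
For a ContentProvider to produce an ANR, the caller must explicitly call android.content.ContentProviderClient#setDetectNotResponding to set the timeout. The method does the following:
- When the timeout is positive, it creates mAnrRunnable, which drives the ANR display logic. It also lazily creates the shared sAnrHandler on the main Looper.
- As the code comment explains, a hung remote process will be killed, so blocking Binder calls to the provider are acceptable and are allowed via Binder.allowBlocking. A timeout of 0 or less clears mAnrRunnable and restores the default blocking behavior with Binder.defaultBlocking.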
public void setDetectNotResponding(@DurationMillisLong long timeoutMillis) {
synchronized (ContentProviderClient.class) {
mAnrTimeout = timeoutMillis;
if (timeoutMillis > 0) {
if (mAnrRunnable == null) {
mAnrRunnable = new NotRespondingRunnable();
}
if (sAnrHandler == null) {
sAnrHandler = new Handler(Looper.getMainLooper(), null, true /* async */);
}
// If the remote process hangs, we're going to kill it, so we're
// technically okay doing blocking calls.
Binder.allowBlocking(mContentProvider.asBinder());
} else {
mAnrRunnable = null;
// If we're no longer watching for hangs, revert back to default
// blocking behavior.
Binder.defaultBlocking(mContentProvider.asBinder());
}
}
}
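How the timeout works
getType
For convenience, this article again uses android.content.ContentProviderClient#getType as the example. The code works as follows:
- Before the remote call, beforeRemote() posts mAnrRunnable to sAnrHandler with a delay of mAnrTimeout.
- The remote getType method is executed.
- In the finally block, afterRemote() removes the pending mAnrRunnable.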
/** See {@link ContentProvider#getType ContentProvider.getType} */
@Override
public @Nullable String getType(@NonNull Uri url) throws RemoteException {
Objects.requireNonNull(url, "url");
beforeRemote();
try {
return mContentProvider.getType(url);
} catch (DeadObjectException e) {
if (!mStable) {
mContentResolver.unstableProviderDied(mContentProvider);
}
throw e;
} finally {
afterRemote();
}
}
Before the remote call: beforeRemote
private void beforeRemote() {
if (mAnrRunnable != null) {
sAnrHandler.postDelayed(mAnrRunnable, mAnrTimeout);
}
}
After the remote call: afterRemote
private void afterRemote() {
if (mAnrRunnable != null) {
sAnrHandler.removeCallbacks(mAnrRunnable);
}
}
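Triggering the ANR notification
If the operation does not finish within the configured timeout, mAnrRunnable is not removed in time and runs on sAnrHandler. It then calls ContentResolver#appNotRespondingViaProvider, which triggers the system's ANR display mechanism.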
private class NotRespondingRunnable implements Runnable {
@Override
public void run() {
Log.w(TAG, "Detected provider not responding: " + mContentProvider);
mContentResolver.appNotRespondingViaProvider(mContentProvider);
}
}
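The post-delayed / remove-on-completion pattern used by ContentProviderClient can be tried out on a plain JVM. The following is a minimal sketch under stated assumptions, not AOSP code. A ScheduledExecutorService stands in for sAnrHandler. The hypothetical class WatchdogClient stands in for ContentProviderClient. A simple counter replaces NotRespondingRunnable and appNotRespondingViaProvider.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical JVM analogue of ContentProviderClient's not-responding watchdog.
class WatchdogClient {
    // Shared scheduler, playing the role of the static sAnrHandler.
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "anr-watchdog");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });

    private long anrTimeoutMillis;
    private Runnable anrRunnable; // null means detection is switched off
    // Counts how often the "ANR" fired; stands in for appNotRespondingViaProvider.
    final AtomicInteger notRespondingCount = new AtomicInteger();

    // Mirrors setDetectNotResponding: a positive timeout arms detection, otherwise disarms it.
    synchronized void setDetectNotResponding(long timeoutMillis) {
        anrTimeoutMillis = timeoutMillis;
        if (timeoutMillis > 0) {
            if (anrRunnable == null) {
                anrRunnable = notRespondingCount::incrementAndGet;
            }
        } else {
            anrRunnable = null;
        }
    }

    // Wraps a "remote" call like getType: beforeRemote, the call, then afterRemote in finally.
    <T> T call(Supplier<T> remote) {
        ScheduledFuture<?> pending = beforeRemote();
        try {
            return remote.get();
        } finally {
            afterRemote(pending);
        }
    }

    // Like sAnrHandler.postDelayed(mAnrRunnable, mAnrTimeout).
    private synchronized ScheduledFuture<?> beforeRemote() {
        return anrRunnable == null
                ? null
                : SCHEDULER.schedule(anrRunnable, anrTimeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Like sAnrHandler.removeCallbacks(mAnrRunnable); no effect if it already ran.
    private void afterRemote(ScheduledFuture<?> pending) {
        if (pending != null) {
            pending.cancel(false);
        }
    }
}
```

One deliberate difference from the original: each call here gets its own ScheduledFuture. A finishing call therefore only cancels its own timer. In ContentProviderClient, removeCallbacks(mAnrRunnable) removes every pending post of the shared runnable.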
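Service Timeouts Triggering ANR
When it occurs
Judging from the Android 11 code, the main scenarios are:
- A timeout in a lifecycle method of an ordinary Service:
  - onCreate
  - onStart
  - onStartCommand
  - onDestroy
  - onBind
  - onUnbind
  - onRebind
- A delayed foreground Service:
  - If targetSdkVersion is lower than Android O, no foreground-service ANR occurs.
  - The service was started via android.content.ContextWrapper#startForegroundService, but android.app.Service#startForeground(int, android.app.Notification) was not called within 10s.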
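Timeout values
There are three kinds of Service timeout in total:
- Service started by a foreground app: 20s
- Service started by a background process: 200s
- Foreground service timeout: 10s. A service started via startForegroundService must call startForeground within 10s.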
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
// How long the startForegroundService() grace period is to get around to
// calling startForeground() before we ANR + stop it.
static final int SERVICE_START_FOREGROUND_TIMEOUT = 10*1000;
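The arithmetic behind these constants, and which one applies, can be condensed into a small helper. ServiceTimeouts, timeoutFor and foregroundServiceAnrApplies are hypothetical names used only for illustration. The values are the constants quoted above. The API level 26 (Android O) cut-off restates the scenario list earlier in this section.

```java
// Hypothetical helper restating the ActiveServices constants quoted above.
final class ServiceTimeouts {
    static final int SERVICE_TIMEOUT = 20 * 1000;                       // 20 s
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10; // 200 s
    static final int SERVICE_START_FOREGROUND_TIMEOUT = 10 * 1000;      // 10 s
    static final int ANDROID_O = 26; // value of Build.VERSION_CODES.O

    private ServiceTimeouts() {}

    // Lifecycle-callback timeout: foreground execution gets 20 s, background 200 s.
    static int timeoutFor(boolean foreground) {
        return foreground ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
    }

    // startForegroundService() grace period, per the scenario list above:
    // apps targeting below O are not affected.
    static boolean foregroundServiceAnrApplies(int targetSdk, long elapsedMillis,
                                               boolean startForegroundCalled) {
        return targetSdk >= ANDROID_O
                && !startForegroundCalled
                && elapsedMillis >= SERVICE_START_FOREGROUND_TIMEOUT;
    }
}
```

For example, timeoutFor(false) evaluates to 20 * 1000 * 10 = 200000 ms, i.e. 200s.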
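How the ordinary Service timeout works
The Service timeout is also implemented by sending a delayed Handler Message. If the message has not been removed from the message queue by the time the delay expires, it is handled and an ANR occurs. In essence, the mechanism checks whether the time between calls to the following two methods exceeds the corresponding timeout:
- com.android.server.am.ActiveServices#bumpServiceExecutingLocked starts the timeout.
- com.android.server.am.ActiveServices#serviceDoneExecutingLocked(com.android.server.am.ServiceRecord, boolean, boolean) completes it.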
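A simplified way to picture this per-process bookkeeping is a set of executing services plus at most one pending timeout "message". The sketch below rests on several assumptions:
- ProcessServiceTimeout, bump, done and isTimedOut are hypothetical names.
- Time is passed in explicitly instead of using a real Handler, so the behaviour is deterministic.
- The field names only echo executingServices and execServicesFg.
- The timer is armed when the first service starts executing and cleared when the last one finishes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, single-threaded model of the per-process service timeout.
class ProcessServiceTimeout {
    static final long SERVICE_TIMEOUT = 20_000;
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    private final Set<String> executingServices = new HashSet<>();
    private boolean execServicesFg;
    private Long pendingDeadline; // stands in for a queued timeout message; null = none

    // Analogue of bumpServiceExecutingLocked: arm the timer for the first executing service.
    void bump(String service, boolean fg, long nowMillis) {
        executingServices.add(service);
        execServicesFg |= fg;
        if (executingServices.size() == 1) {
            pendingDeadline = nowMillis
                    + (execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        }
    }

    // Analogue of serviceDoneExecutingLocked: remove the "message" once nothing is executing.
    void done(String service) {
        executingServices.remove(service);
        if (executingServices.isEmpty()) {
            pendingDeadline = null;
            execServicesFg = false;
        }
    }

    // True if the "message" is still pending when its delivery time arrives, i.e. an ANR.
    boolean isTimedOut(long nowMillis) {
        return pendingDeadline != null && nowMillis >= pendingDeadline;
    }
}
```

This model keeps only the first deadline. The real handling of the delivered message in ActiveServices is more involved and is not modelled here.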
bumpServiceExecutingLocked
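When the system actually begins executing a Service, it calls com.android.server.am.ActiveServices#bumpServiceExecutingLocked:
- If the boot phase has not yet reached PHASE_THIRD_PARTY_APPS_CAN_START, no ANR is raised.
- If the number of services currently executing in the process equals 1, it schedules the Service timeout with scheduleServiceTimeoutLocked, which sends the Service-timeout message after a delay.
- For a service started by a background process, the ANR timeout used is 200s.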
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, ">>> EXECUTING "
+ why + " of " + r + " in app " + r.app);
else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING, ">>> EXECUTING "
+ why + " of " + r.shortInstanceName);
// For b/34123235: Services within the system server won't start until SystemServer
// does Looper.loop(), so we shouldn't try to start/bind to them too early in the boot
// process. However, since there's a little point of showing the ANR dialog in that case,
// let's suppress the timeout until PHASE_THIRD_PARTY_APPS_CAN_START.
//
// (Note there are multiple services start at PHASE_THIRD_PARTY_APPS_CAN_START too,
// which technically could also trigger this timeout if there's a system server
// that takes a long time to handle PHASE_THIRD_PARTY_APPS_CAN_START, but that shouldn't
// happen.)
boolean timeoutNeeded = true;
if ((mAm.mBootPhase < SystemService.PHASE_THIRD_PARTY_APPS_CAN_START)
&& (r.app != null)