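Preface
A system can end up in an unhealthy state because a deployment is in progress, a service has terminated abnormally, or something else has gone wrong. When that happens we need to know the system's health, and health checks let us quickly determine whether the system is running normally. In general, we expose a public HTTP endpoint dedicated to health checks.
.NET Core's health check libraries are Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions and Microsoft.Extensions.Diagnostics.HealthChecks. Together they provide the most basic health check solution. The main extension packages built on top of them are listed below; this article does not discuss them further.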
AspNetCore.HealthChecks.System
AspNetCore.HealthChecks.Network
AspNetCore.HealthChecks.SqlServer
AspNetCore.HealthChecks.MongoDb
AspNetCore.HealthChecks.Npgsql
AspNetCore.HealthChecks.Redis
AspNetCore.HealthChecks.AzureStorage
AspNetCore.HealthChecks.AzureServiceBus
AspNetCore.HealthChecks.MySql
AspNetCore.HealthChecks.DocumentDb
AspNetCore.HealthChecks.SqLite
AspNetCore.HealthChecks.Kafka
AspNetCore.HealthChecks.RabbitMQ
AspNetCore.HealthChecks.IdSvr
AspNetCore.HealthChecks.DynamoDB
AspNetCore.HealthChecks.Oracle
AspNetCore.HealthChecks.Uris
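Exploring the Source Code
Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions is the abstraction layer for .NET Core health checks, and it shows the design intent of the library. It provides a single unified interface, IHealthCheck, for checking the state of each monitored component in an application, such as background services and databases. The interface has only one method, CheckHealthAsync.
The method takes a HealthCheckContext parameter, which represents the context associated with the current health check execution, plus an optional CancellationToken. Its return value, HealthCheckResult, represents the state of the monitored component as determined by the check.
The source code is as follows: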
    public interface IHealthCheck
    {
        Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default);
    }
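To make the interface concrete, here is a minimal sketch of a custom check. The class name MemoryHealthCheck and its maxBytes threshold are hypothetical and exist only for illustration; the sketch assumes the Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions package is referenced. On failure it reports the status configured on the registration rather than hard-coding Unhealthy.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check: fails when the managed heap grows beyond a configured limit.
public sealed class MemoryHealthCheck : IHealthCheck
{
    private readonly long _maxBytes;

    public MemoryHealthCheck(long maxBytes) => _maxBytes = maxBytes;

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Approximate number of bytes currently allocated on the managed heap.
        long allocated = GC.GetTotalMemory(forceFullCollection: false);

        HealthCheckResult result = allocated <= _maxBytes
            ? HealthCheckResult.Healthy($"Managed heap: {allocated} bytes")
            // Use the failure status configured on the registration.
            : new HealthCheckResult(
                context.Registration.FailureStatus,
                $"Managed heap: {allocated} bytes exceeds {_maxBytes}");

        return Task.FromResult(result);
    }
}
```

The check does no asynchronous work, so it wraps its result in Task.FromResult instead of using async/await.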
HealthCheckRegistration
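HealthCheckContext has only one member: a HealthCheckRegistration instance, exposed through its Registration property.
HealthCheckRegistration is an important object because it captures what a health check needs to pay attention to. It has five properties, used for:
- The name that identifies the health check (Name)
- Creating the IHealthCheck instance (Factory)
- The health check timeout, which keeps a health check from tying up resources for too long (Timeout)
- The status to report on failure (FailureStatus)
- A set of tags, which can be used to filter health checks (Tags)
The relevant source code for these five properties is as follows: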
    // Factory that creates the IHealthCheck instance; null is rejected.
    public Func<IServiceProvider, IHealthCheck> Factory
    {
        get => _factory;
        set
        {
            if (value == null)
            {
                throw new ArgumentNullException(nameof(value));
            }

            _factory = value;
        }
    }

    // Status reported when the check fails.
    public HealthStatus FailureStatus { get; set; }

    // Must be positive, or Timeout.InfiniteTimeSpan to disable the timeout.
    public TimeSpan Timeout
    {
        get => _timeout;
        set
        {
            if (value <= TimeSpan.Zero && value != System.Threading.Timeout.InfiniteTimeSpan)
            {
                throw new ArgumentOutOfRangeException(nameof(value));
            }

            _timeout = value;
        }
    }

    // Name that identifies the health check; null is rejected.
    public string Name
    {
        get => _name;
        set
        {
            if (value == null)
            {
                throw new ArgumentNullException(nameof(value));
            }

            _name = value;
        }
    }

    // Tags that can be used to filter health checks.
    public ISet<string> Tags { get; }
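The following sketch shows how the five properties might be populated through the constructor, and how the Timeout setter's validation behaves. The names RegistrationExample and AlwaysHealthyCheck are hypothetical, as are the name, tag, and timeout values; the sketch assumes a package version that includes the timeout constructor overload, which matches the Timeout property shown above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class RegistrationExample
{
    // Hypothetical check used only so that there is something to register.
    private sealed class AlwaysHealthyCheck : IHealthCheck
    {
        public Task<HealthCheckResult> CheckHealthAsync(
            HealthCheckContext context, CancellationToken cancellationToken = default)
            => Task.FromResult(HealthCheckResult.Healthy());
    }

    public static HealthCheckRegistration Create()
    {
        return new HealthCheckRegistration(
            "always-healthy",               // Name
            _ => new AlwaysHealthyCheck(),  // Factory
            HealthStatus.Degraded,          // FailureStatus
            new[] { "ready" },              // Tags
            TimeSpan.FromSeconds(5));       // Timeout
    }

    // Returns true when the Timeout setter rejects a non-positive value.
    public static bool RejectsZeroTimeout(HealthCheckRegistration registration)
    {
        try
        {
            registration.Timeout = TimeSpan.Zero;
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

Because the setter throws before assigning, a rejected value leaves the previous timeout in place.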
HealthCheckResult
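HealthCheckResult is a struct, which suggests the choice was driven by its role as a simple data carrier and by performance considerations.
HealthCheckResult holds the result of a health check. From this type we can again see what a health check cares about:
- The current status of the component
- Exception information
- A human-readable description (whether the result is a failure or a success)
- Additional key-value pairs describing the component; this is an open-ended property that makes it easy to record more information
The struct has four public properties and three static factory methods. The relevant source code is as follows: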
    public struct HealthCheckResult
    {
        // Shared empty dictionary so that Data is never null.
        private static readonly IReadOnlyDictionary<string, object> _emptyReadOnlyDictionary = new Dictionary<string, object>();

        public HealthCheckResult(HealthStatus status, string description = null, Exception exception = null, IReadOnlyDictionary<string, object> data = null)
        {
            Status = status;
            Description = description;
            Exception = exception;
            Data = data ?? _emptyReadOnlyDictionary;
        }

        public IReadOnlyDictionary<string, object> Data { get; }

        public string Description { get; }

        public Exception Exception { get; }

        public HealthStatus Status { get; }

        // Factory methods, one per HealthStatus value.
        public static HealthCheckResult Healthy(string description = null, IReadOnlyDictionary<string, object> data = null)
        {
            return new HealthCheckResult(status: HealthStatus.Healthy, description, exception: null, data);
        }

        public static HealthCheckResult Degraded(string description = null, Exception exception = null, IReadOnlyDictionary<string, object> data = null)
        {
            return new HealthCheckResult(status: HealthStatus.Degraded, description, exception: exception, data);
        }

        public static HealthCheckResult Unhealthy(string description = null, Exception exception = null, IReadOnlyDictionary<string, object> data = null)
        {
            return new HealthCheckResult(status: HealthStatus.Unhealthy, description, exception, data);
        }
    }
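All three methods create a HealthCheckResult with a different status based on the HealthStatus enum, which expresses the states a health check cares about: healthy, unhealthy, and degraded. The enum values are ordered from worst (0) to best (2), so a smaller value always means a worse state.
The source code of HealthStatus is as follows: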
    public enum HealthStatus
    {
        Unhealthy = 0,

        Degraded = 1,

        Healthy = 2,
    }
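As an illustration of the three factory methods and of the enum ordering, the sketch below maps a measured latency to a result. The class name LatencyClassifier, the 200 ms and 2 s thresholds, and the latencyMs key are all hypothetical choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class LatencyClassifier
{
    // Maps a latency to a result; the thresholds are arbitrary example values.
    public static HealthCheckResult Classify(TimeSpan latency)
    {
        // The open-ended Data dictionary carries the raw measurement.
        var data = new Dictionary<string, object>
        {
            ["latencyMs"] = latency.TotalMilliseconds
        };

        if (latency < TimeSpan.FromMilliseconds(200))
        {
            return HealthCheckResult.Healthy("Responding quickly", data);
        }

        if (latency < TimeSpan.FromSeconds(2))
        {
            return HealthCheckResult.Degraded("Responding slowly", data: data);
        }

        return HealthCheckResult.Unhealthy("Responding too slowly", data: data);
    }

    // Because Unhealthy < Degraded < Healthy, the worse status is the smaller value.
    public static HealthStatus Worst(HealthStatus a, HealthStatus b) => a < b ? a : b;
}
```

Picking the smaller enum value to find the worse state is the same comparison that HealthReport uses when it aggregates entries.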
IHealthCheckPublisher
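Health checking is essentially a polling activity that has to run periodically. .NET Core abstracts the publishing side of that periodic run in the IHealthCheckPublisher interface, whose PublishAsync method receives a HealthReport. We can implement this interface and combine it with our own scheduling logic.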
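A publisher can be as small as the sketch below, which writes each report to a TextWriter. The class name TextWriterHealthCheckPublisher and the output format are hypothetical; only IHealthCheckPublisher.PublishAsync and the HealthReport members come from the library.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical publisher that writes every report it receives to a TextWriter.
public sealed class TextWriterHealthCheckPublisher : IHealthCheckPublisher
{
    private readonly TextWriter _writer;

    public TextWriterHealthCheckPublisher(TextWriter writer)
    {
        _writer = writer ?? throw new ArgumentNullException(nameof(writer));
    }

    public Task PublishAsync(HealthReport report, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        // The aggregate status first, then one line per individual check.
        _writer.WriteLine($"Overall: {report.Status}");
        foreach (var pair in report.Entries)
        {
            _writer.WriteLine($"  {pair.Key}: {pair.Value.Status} ({pair.Value.Description})");
        }

        return Task.CompletedTask;
    }
}
```

As far as I know, registering such a class as an IHealthCheckPublisher service is enough for the hosted service in Microsoft.Extensions.Diagnostics.HealthChecks to call it on its own schedule; driving it from a custom timer, as the paragraph above suggests, is equally possible.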
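A health check run also produces a health check report. What should that report capture?
- Additional key-value pairs describing the component; this is an open-ended property that makes it easy to record more information
- A human-readable description (whether the result is a failure or a success)
- The current status of the component
- Exception information
- The time the check took
- The associated tags
HealthReportEntry represents the report for a single health check, and HealthReport represents a group of them. Internally, HealthReport keeps a dictionary of HealthReportEntry values keyed by health check name. The source code of HealthReport is as follows: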
    public sealed class HealthReport
    {
        public HealthReport(IReadOnlyDictionary<string, HealthReportEntry> entries, TimeSpan totalDuration)
        {
            Entries = entries;
            Status = CalculateAggregateStatus(entries.Values);
            TotalDuration = totalDuration;
        }

        public IReadOnlyDictionary<string, HealthReportEntry> Entries { get; }

        public HealthStatus Status { get; }

        public TimeSpan TotalDuration { get; }

        // The aggregate status is the worst (numerically smallest) status among all entries.
        private HealthStatus CalculateAggregateStatus(IEnumerable<HealthReportEntry> entries)
        {
            var currentValue = HealthStatus.Healthy;
            foreach (var entry in entries)
            {
                if (currentValue > entry.Status)
                {
                    currentValue = entry.Status;
                }

                if (currentValue == HealthStatus.Unhealthy)
                {
                    // Game over, man! Game over!
                    // (We hit the worst possible status, so there's no need to keep iterating)
                    return currentValue;
                }
            }

            return currentValue;
        }
    }
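The aggregation rule can be observed by building a report by hand, as in this sketch. The class name ReportExample and the entry names, descriptions, and durations are made up for illustration; the HealthReportEntry constructor used here takes status, description, duration, exception, and data, in that order.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class ReportExample
{
    public static HealthReport Build()
    {
        // Two hypothetical entries: one healthy, one degraded.
        var entries = new Dictionary<string, HealthReportEntry>
        {
            ["database"] = new HealthReportEntry(
                HealthStatus.Healthy, "reachable", TimeSpan.FromMilliseconds(12), null, null),
            ["cache"] = new HealthReportEntry(
                HealthStatus.Degraded, "slow", TimeSpan.FromMilliseconds(850), null, null),
        };

        // Status is computed by the constructor as the worst status among the entries.
        return new HealthReport(entries, TimeSpan.FromMilliseconds(862));
    }
}
```

With one Healthy and one Degraded entry, the report's Status is Degraded; adding any Unhealthy entry would make the whole report Unhealthy.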
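Summary
From the above, we can see that a complete health check involves the health check context, the health status, the health check result, and the health check report. To make health checks easier to maintain, publishing is abstracted out through IHealthCheckPublisher so that it can be combined with an external timer to run the checks on a schedule.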