SkyWalking Alarms: A Detailed Analysis (Part 2)


https://blog.csdn.net/feiying0canglang/article/details/121562890

http://www.manongjc.com/detail/26-asnfhftlcafxjai.html

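After reading many posts online, I found that when it comes to which metric names SkyWalking supports, the official documentation and almost every blog simply point to a file path. Nobody explains in detail which metrics are supported or what each one is for, which makes defining custom metrics harder than it should be.

So here is a summary. If you spot a mistake, please point it out:

SkyWalking's OAP metrics are defined in the /apache-skywalking-apm-bin-es78/config/oal/*.oal files.

Let's start with the first OAL file: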

core.oal

// All scope metrics
all_percentile = from(All.latency).percentile(10);  // Multiple values including p50, p75, p90, p95, p99
all_heatmap = from(All.latency).histogram(100, 20);

// Service scope metrics
service_resp_time = from(Service.latency).longAvg(); // average response time of the service
service_sla = from(Service.*).percent(status == true); // request success rate of the service
service_cpm = from(Service.*).cpm(); // calls per minute of the service
service_percentile = from(Service.latency).percentile(10); // Multiple values including p50, p75, p90, p95, p99
service_apdex = from(Service.latency).apdex(name, status); // Application Performance Index (Apdex) of the service; it measures the ratio of satisfactory to unsatisfactory response times, and the default satisfaction threshold is 500 ms

// Service relation scope metrics for topology: metrics for calls between services
service_relation_client_cpm = from(ServiceRelation.*).filter(detectPoint == DetectPoint.CLIENT).cpm(); // calls per minute detected on the client side
service_relation_server_cpm = from(ServiceRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm(); // calls per minute detected on the server side
service_relation_client_call_sla = from(ServiceRelation.*).filter(detectPoint == DetectPoint.CLIENT).percent(status == true); // success rate detected on the client side
service_relation_server_call_sla = from(ServiceRelation.*).filter(detectPoint == DetectPoint.SERVER).percent(status == true); // success rate detected on the server side
service_relation_client_resp_time = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).longAvg(); // average response time detected on the client side
service_relation_server_resp_time = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.SERVER).longAvg(); // average response time detected on the server side
service_relation_client_percentile = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).percentile(10); // Multiple values including p50, p75, p90, p95, p99
service_relation_server_percentile = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.SERVER).percentile(10); // Multiple values including p50, p75, p90, p95, p99

// Service Instance relation scope metrics for topology: metrics for calls between service instances
service_instance_relation_client_cpm = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.CLIENT).cpm(); // calls per minute detected on the client instance
service_instance_relation_server_cpm = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm(); // calls per minute detected on the server instance
service_instance_relation_client_call_sla = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.CLIENT).percent(status == true); // success rate detected on the client instance
service_instance_relation_server_call_sla = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.SERVER).percent(status == true); // success rate detected on the server instance
service_instance_relation_client_resp_time = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).longAvg(); // average response time detected on the client instance
service_instance_relation_server_resp_time = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.SERVER).longAvg(); // average response time detected on the server instance
service_instance_relation_client_percentile = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).percentile(10); // Multiple values including p50, p75, p90, p95, p99
service_instance_relation_server_percentile = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.SERVER).percentile(10); // Multiple values including p50, p75, p90, p95, p99

// Service Instance Scope metrics
service_instance_sla = from(ServiceInstance.*).percent(status == true); // success rate of the service instance
service_instance_resp_time= from(ServiceInstance.latency).longAvg(); // average response time of the service instance
service_instance_cpm = from(ServiceInstance.*).cpm(); // calls per minute of the service instance

// Endpoint scope metrics
endpoint_cpm = from(Endpoint.*).cpm(); // calls per minute of the endpoint
endpoint_avg = from(Endpoint.latency).longAvg(); // average response time of the endpoint
endpoint_sla = from(Endpoint.*).percent(status == true); // success rate of the endpoint
endpoint_percentile = from(Endpoint.latency).percentile(10); // Multiple values including p50, p75, p90, p95, p99

// Endpoint relation scope metrics
endpoint_relation_cpm = from(EndpointRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm(); // calls per minute detected on the server-side endpoint
endpoint_relation_resp_time = from(EndpointRelation.rpcLatency).filter(detectPoint == DetectPoint.SERVER).longAvg(); // average RPC latency detected on the server side
endpoint_relation_sla = from(EndpointRelation.*).filter(detectPoint == DetectPoint.SERVER).percent(status == true); // request success rate detected on the server side
endpoint_relation_percentile = from(EndpointRelation.rpcLatency).filter(detectPoint == DetectPoint.SERVER).percentile(10); // Multiple values including p50, p75, p90, p95, p99

database_access_resp_time = from(DatabaseAccess.latency).longAvg(); // average response time of database access
database_access_sla = from(DatabaseAccess.*).percent(status == true); // success rate of database requests
database_access_cpm = from(DatabaseAccess.*).cpm(); // database calls per minute
database_access_percentile = from(DatabaseAccess.latency).percentile(10);

java-agent.oal

// JVM instance metrics
instance_jvm_cpu = from(ServiceInstanceJVMCPU.usePercent).doubleAvg(); // average JVM CPU usage percentage
instance_jvm_memory_heap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == true).longAvg(); // average heap memory used by the JVM
instance_jvm_memory_noheap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == false).longAvg(); // average non-heap memory used by the JVM
instance_jvm_memory_heap_max = from(ServiceInstanceJVMMemory.max).filter(heapStatus == true).longAvg(); // average of the JVM's maximum heap memory
instance_jvm_memory_noheap_max = from(ServiceInstanceJVMMemory.max).filter(heapStatus == false).longAvg(); // average of the JVM's maximum non-heap memory
instance_jvm_young_gc_time = from(ServiceInstanceJVMGC.time).filter(phrase == GCPhrase.NEW).sum(); // time spent in young-generation GC
instance_jvm_old_gc_time = from(ServiceInstanceJVMGC.time).filter(phrase == GCPhrase.OLD).sum(); // time spent in old-generation GC
instance_jvm_young_gc_count = from(ServiceInstanceJVMGC.count).filter(phrase == GCPhrase.NEW).sum(); // number of young-generation GCs
instance_jvm_old_gc_count = from(ServiceInstanceJVMGC.count).filter(phrase == GCPhrase.OLD).sum(); // number of old-generation GCs
instance_jvm_thread_live_count = from(ServiceInstanceJVMThread.liveCount).longAvg(); // number of live threads
instance_jvm_thread_daemon_count = from(ServiceInstanceJVMThread.daemonCount).longAvg(); // number of daemon threads
instance_jvm_thread_peak_count = from(ServiceInstanceJVMThread.peakCount).longAvg(); // peak number of threads

  

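Alarm settings

rules:
  # Alarm rule. The name must be unique and must end with _rule.
  service_resp_time_rule:
    # Metric name. Only long, double, and int metrics are supported.
    metrics-name: service_resp_time
    # Operator.
    op: ">"
    # Threshold, in ms.
    threshold: 1000
    # Length of time over which the metric is evaluated, in minutes.
    period: 10
    # Number of times the metric must meet the condition before an alarm is triggered.
    count: 2
    # Silence period. By default it equals the period, so the same alarm fires only once within one period.
    silence-period: 10
    message: Average response time of service {name} exceeded 1 second in 2 of the last 10 minutes
  service_sla_rule:
    metrics-name: service_sla
    op: "<"
    threshold: 8000
    period: 10
    count: 2
    silence-period: 10
    message: Success rate of service {name} was below 80% in 2 of the last 10 minutes
composite-rules:
  # Rule name. A unique name shown in the alarm message; it must end with _rule.
  comp_rule:
    # Specifies how to combine rules. Supports the &&, ||, and () operators.
    expression: service_resp_time_rule && service_sla_rule
    message: Service {name} had an average response time above 1 second and a success rate below 80% in 2 of the last 10 minutes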
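To make the interplay of `op`, `threshold`, `period`, and `count` concrete, here is a minimal Python sketch of how such a rule might be evaluated. It is a simplified model, not SkyWalking's actual implementation: it assumes one metric value per minute, ignores `silence-period` and composite rules, and the helper name `rule_fires` is hypothetical.

```python
import operator

# Comparison operators a rule may use (only the inequalities are modelled here).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def rule_fires(values, op, threshold, period, count):
    """Check the last `period` per-minute values against the rule.

    Returns True when at least `count` of them satisfy `value op threshold`.
    """
    window = values[-period:]
    matches = sum(1 for value in window if OPS[op](value, threshold))
    return matches >= count

# The service_sla_rule above treats 8000 as 80%, so the values use the same scale.
sla_per_minute = [9900, 9800, 7500, 9900, 9900, 7900, 9900, 9900, 9900, 9900]
print(rule_fires(sla_per_minute, "<", 8000, period=10, count=2))  # True
```

Two of the ten minutes fall below 8000, which meets `count: 2`, so the rule would raise an alarm under this model.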

This article describes how to use SkyWalking's OAL syntax.

Official documentation

OAL introduction

https://github.com/apache/skywalking/blob/master/docs/en/guides/backend-oal-scripts.md

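OAL rule syntax: https://github.com/apache/skywalking/blob/master/docs/en/concepts-and-designs/oal.md

Scopes and fields: https://github.com/apache/skywalking/blob/master/docs/en/concepts-and-designs/scope-definitions.md

OAL overview
Starting with 8.0.0, SkyWalking ships its OAL scripts at config/oal/*.oal. You can edit them, for example to add filter conditions or new metrics; the changes take effect after the OAP server is restarted.

Apache SkyWalking alarms are driven by a set of rules defined in config/alarm-settings.yml. Each rules.xxx_rule.metrics-name value in alarm-settings.yml refers to a metric defined in one of the scripts under config/oal: core.oal, event.oal, java-agent.oal, and browser.oal.

Endpoint rules consume more memory and resources than service and instance rules.

OAL stands for Observability Analysis Language.

In streaming mode, SkyWalking provides OAL to analyze incoming data. OAL focuses on metrics for services, service instances, and endpoints, which makes it easy to learn and use.

Since 6.3, the OAL engine has been embedded in the OAP server runtime, where it is called oal-rt (OAL Runtime). The OAL scripts now live under the /config folder, so users can simply change them and restart the server for the changes to take effect.

However, OAL is still a compiled language: the OAL runtime generates Java code dynamically. Set SW_OAL_ENGINE_DEBUG=Y in the system environment to see which classes are generated.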
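Because every `metrics-name` has to match a metric declared in an OAL script, a typo in alarm-settings.yml is easy to miss. The following Python sketch shows one way to cross-check the two. The helper names are hypothetical, and the regular expression only recognises the simple `NAME = from(...)` form used in the default scripts.

```python
import re

# Matches the left-hand side of an OAL statement such as
# "service_sla = from(Service.*).percent(status == true);"
METRIC_DEF = re.compile(r"^\s*(\w+)\s*=\s*from\(", re.MULTILINE)

def defined_metrics(oal_text):
    """Return the set of metric names declared in an OAL script."""
    return set(METRIC_DEF.findall(oal_text))

def unknown_metrics(rule_metrics, oal_text):
    """Return the rule metric names (sorted) that no OAL statement declares."""
    return sorted(set(rule_metrics) - defined_metrics(oal_text))

sample_oal = """
// Service scope metrics
service_resp_time = from(Service.latency).longAvg();
service_sla = from(Service.*).percent(status == true);
"""
print(unknown_metrics(["service_resp_time", "service_slaa"], sample_oal))
```

In practice the OAL text would be read from the files under config/oal and the metric names from alarm-settings.yml; both are inlined here to keep the sketch self-contained.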

Configuration examples
// Calculate the p99 of Endpoint1 and Endpoint2.
endpoint_p99 = from(Endpoint.latency).filter(name in ("Endpoint1", "Endpoint2")).summary(0.99)

// Calculate the p99 of endpoints whose name starts with "serv".
serv_Endpoint_p99 = from(Endpoint.latency).filter(name like "serv%").summary(0.99)

// Calculate the average response time of each endpoint.
endpoint_avg = from(Endpoint.latency).avg()

// Calculate the p50, p75, p90, p95, and p99 latency of each endpoint; the argument is the precision of the calculation.
endpoint_percentile = from(Endpoint.latency).percentile(10)

// Calculate the percentage of responses whose status is true, for each endpoint.
endpoint_success = from(Endpoint.*).percent(status == true)

// Count the responses whose code is 404, 500, or 503, for each endpoint.
endpoint_abnormal = from(Endpoint.*).filter(responseCode in [404, 500, 503]).count()

// Sum the calls whose request type is RPC or gRPC, for each endpoint.
endpoint_rpc_calls_sum = from(Endpoint.*).filter(type in [RequestType.RPC, RequestType.gRPC]).sum()

// Sum the calls whose endpoint name is "/v1" or "/v2", for each endpoint.
endpoint_url_sum = from(Endpoint.*).filter(endpointName in ["/v1", "/v2"]).sum()

// Count the total number of calls for each endpoint.
endpoint_calls = from(Endpoint.*).count()

// Calculate the CPM of the GET method for each service. The value has the form `tagKey:tagValue`.
// Option 1: use `tags contain`.
service_cpm_http_get = from(Service.*).filter(tags contain "http.method:GET").cpm()
// Option 2: use `tag[key]`.
service_cpm_http_get = from(Service.*).filter(tag["http.method"] == "GET").cpm();

// Calculate the CPM of every method except GET for each service. The value has the form `tagKey:tagValue`.
service_cpm_http_other = from(Service.*).filter(tags not contain "http.method:GET").cpm()

// Calculate the error rate of the browser app. The numerator is FIRST_ERROR and the denominator is NORMAL.
browser_app_error_rate = from(BrowserAppTraffic.*).rate(trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR, trafficCategory == BrowserAppTrafficCategory.NORMAL);

disable(segment);
disable(endpoint_relation_server_side);
disable(top_n_database_statement);

Default configuration
config/oal/core.oal

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

// For services using protocols HTTP 1/2, gRPC, RPC, etc., the cpm metrics means "calls per minute",
// for services that are built on top of TCP, the cpm means "packages per minute".

// All scope metrics
all_percentile = from(All.latency).percentile(10); // Multiple values including p50, p75, p90, p95, p99
all_heatmap = from(All.latency).histogram(100, 20);

// Service scope metrics
service_resp_time = from(Service.latency).longAvg();
service_sla = from(Service.*).percent(status == true);
service_cpm = from(Service.*).cpm();
service_percentile = from(Service.latency).percentile(10); // Multiple values including p50, p75, p90, p95, p99
service_apdex = from(Service.latency).apdex(name, status);

// Service relation scope metrics for topology
service_relation_client_cpm = from(ServiceRelation.*).filter(detectPoint == DetectPoint.CLIENT).cpm();
service_relation_server_cpm = from(ServiceRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm();
service_relation_client_call_sla = from(ServiceRelation.*).filter(detectPoint == DetectPoint.CLIENT).percent(status == true);
service_relation_server_call_sla = from(ServiceRelation.*).filter(detectPoint == DetectPoint.SERVER).percent(status == true);
service_relation_client_resp_time = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).longAvg();
service_relation_server_resp_time = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.SERVER).longAvg();
service_relation_client_percentile = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).percentile(10); // Multiple values including p50, p75, p90, p95, p99
service_relation_server_percentile = from(ServiceRelation.latency).filter(detectPoint == DetectPoint.SERVER).percentile(10); // Multiple values including p50, p75, p90, p95, p99

// Service Instance relation scope metrics for topology
service_instance_relation_client_cpm = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.CLIENT).cpm();
service_instance_relation_server_cpm = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm();
service_instance_relation_client_call_sla = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.CLIENT).percent(status == true);
service_instance_relation_server_call_sla = from(ServiceInstanceRelation.*).filter(detectPoint == DetectPoint.SERVER).percent(status == true);
service_instance_relation_client_resp_time = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).longAvg();
service_instance_relation_server_resp_time = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.SERVER).longAvg();
service_instance_relation_client_percentile = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.CLIENT).percentile(10); // Multiple values including p50, p75, p90, p95, p99
service_instance_relation_server_percentile = from(ServiceInstanceRelation.latency).filter(detectPoint == DetectPoint.SERVER).percentile(10); // Multiple values including p50, p75, p90, p95, p99

// Service Instance Scope metrics
service_instance_sla = from(ServiceInstance.*).percent(status == true);
service_instance_resp_time= from(ServiceInstance.latency).longAvg();
service_instance_cpm = from(ServiceInstance.*).cpm();

// Endpoint scope metrics
endpoint_cpm = from(Endpoint.*).cpm();
endpoint_avg = from(Endpoint.latency).longAvg();
endpoint_sla = from(Endpoint.*).percent(status == true);
endpoint_percentile = from(Endpoint.latency).percentile(10); // Multiple values including p50, p75, p90, p95, p99

// Endpoint relation scope metrics
endpoint_relation_cpm = from(EndpointRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm();
endpoint_relation_resp_time = from(EndpointRelation.rpcLatency).filter(detectPoint == DetectPoint.SERVER).longAvg();
endpoint_relation_sla = from(EndpointRelation.*).filter(detectPoint == DetectPoint.SERVER).percent(status == true);
endpoint_relation_percentile = from(EndpointRelation.rpcLatency).filter(detectPoint == DetectPoint.SERVER).percentile(10); // Multiple values including p50, p75, p90, p95, p99

database_access_resp_time = from(DatabaseAccess.latency).longAvg();
database_access_sla = from(DatabaseAccess.*).percent(status == true);
database_access_cpm = from(DatabaseAccess.*).cpm();
database_access_percentile = from(DatabaseAccess.latency).percentile(10);

config/oal/event.oal

event_total = from(Event.*).count();

event_normal_count = from(Event.*).filter(type == "Normal").count();
event_error_count = from(Event.*).filter(type == "Error").count();

event_start_count = from(Event.*).filter(name == "Start").count();
event_shutdown_count = from(Event.*).filter(name == "Shutdown").count();

config/oal/java-agent.oal

// JVM instance metrics
instance_jvm_cpu = from(ServiceInstanceJVMCPU.usePercent).doubleAvg();
instance_jvm_memory_heap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == true).longAvg();
instance_jvm_memory_noheap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == false).longAvg();
instance_jvm_memory_heap_max = from(ServiceInstanceJVMMemory.max).filter(heapStatus == true).longAvg();
instance_jvm_memory_noheap_max = from(ServiceInstanceJVMMemory.max).filter(heapStatus == false).longAvg();
instance_jvm_young_gc_time = from(ServiceInstanceJVMGC.time).filter(phrase == GCPhrase.NEW).sum();
instance_jvm_old_gc_time = from(ServiceInstanceJVMGC.time).filter(phrase == GCPhrase.OLD).sum();
instance_jvm_young_gc_count = from(ServiceInstanceJVMGC.count).filter(phrase == GCPhrase.NEW).sum();
instance_jvm_old_gc_count = from(ServiceInstanceJVMGC.count).filter(phrase == GCPhrase.OLD).sum();
instance_jvm_thread_live_count = from(ServiceInstanceJVMThread.liveCount).longAvg();
instance_jvm_thread_daemon_count = from(ServiceInstanceJVMThread.daemonCount).longAvg();
instance_jvm_thread_peak_count = from(ServiceInstanceJVMThread.peakCount).longAvg();
instance_jvm_thread_runnable_state_thread_count = from(ServiceInstanceJVMThread.runnableStateThreadCount).longAvg();
instance_jvm_thread_blocked_state_thread_count = from(ServiceInstanceJVMThread.blockedStateThreadCount).longAvg();
instance_jvm_thread_waiting_state_thread_count = from(ServiceInstanceJVMThread.waitingStateThreadCount).longAvg();
instance_jvm_thread_timed_waiting_state_thread_count = from(ServiceInstanceJVMThread.timedWaitingStateThreadCount).longAvg();
instance_jvm_class_loaded_class_count = from(ServiceInstanceJVMClass.loadedClassCount).longAvg();
instance_jvm_class_total_unloaded_class_count = from(ServiceInstanceJVMClass.totalUnloadedClassCount).longAvg();
instance_jvm_class_total_loaded_class_count = from(ServiceInstanceJVMClass.totalLoadedClassCount).longAvg();

config/oal/browser.oal

// browser app
browser_app_pv = from(BrowserAppTraffic.count).filter(trafficCategory == BrowserAppTrafficCategory.NORMAL).sum();
browser_app_error_rate = from(BrowserAppTraffic.*).rate(trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR,trafficCategory == BrowserAppTrafficCategory.NORMAL);
browser_app_error_sum = from(BrowserAppTraffic.count).filter(trafficCategory != BrowserAppTrafficCategory.NORMAL).sum();

// browser app single version
browser_app_single_version_pv = from(BrowserAppSingleVersionTraffic.count).filter(trafficCategory == BrowserAppTrafficCategory.NORMAL).sum();
browser_app_single_version_error_rate = from(BrowserAppSingleVersionTraffic.trafficCategory).rate(trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR,trafficCategory == BrowserAppTrafficCategory.NORMAL);
browser_app_single_version_error_sum = from(BrowserAppSingleVersionTraffic.count).filter(trafficCategory != BrowserAppTrafficCategory.NORMAL).sum();

// browser app page
browser_app_page_pv = from(BrowserAppPageTraffic.count).filter(trafficCategory == BrowserAppTrafficCategory.NORMAL).sum();
browser_app_page_error_rate = from(BrowserAppPageTraffic.*).rate(trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR,trafficCategory == BrowserAppTrafficCategory.NORMAL);
browser_app_page_error_sum = from(BrowserAppPageTraffic.count).filter(trafficCategory != BrowserAppTrafficCategory.NORMAL).sum();

browser_app_page_ajax_error_sum = from(BrowserAppPageTraffic.count).filter(trafficCategory != BrowserAppTrafficCategory.NORMAL).filter(errorCategory == BrowserErrorCategory.AJAX).sum();
browser_app_page_resource_error_sum = from(BrowserAppPageTraffic.count).filter(trafficCategory != BrowserAppTrafficCategory.NORMAL).filter(errorCategory == BrowserErrorCategory.RESOURCE).sum();
browser_app_page_js_error_sum = from(BrowserAppPageTraffic.count).filter(trafficCategory != BrowserAppTrafficCategory.NORMAL).filter(errorCategory in [BrowserErrorCategory.JS,BrowserErrorCategory.VUE,BrowserErrorCategory.PROMISE]).sum();
browser_app_page_unknown_error_sum = from(BrowserAppPageTraffic.count).filter(trafficCategory != BrowserAppTrafficCategory.NORMAL).filter(errorCategory == BrowserErrorCategory.UNKNOWN).sum();

// browser performance metrics
browser_app_page_redirect_avg = from(BrowserAppPagePerf.redirectTime).longAvg();
browser_app_page_dns_avg = from(BrowserAppPagePerf.dnsTime).longAvg();
browser_app_page_ttfb_avg = from(BrowserAppPagePerf.ttfbTime).longAvg();
browser_app_page_tcp_avg = from(BrowserAppPagePerf.tcpTime).longAvg();
browser_app_page_trans_avg = from(BrowserAppPagePerf.transTime).longAvg();
browser_app_page_dom_analysis_avg = from(BrowserAppPagePerf.domAnalysisTime).longAvg();
browser_app_page_fpt_avg = from(BrowserAppPagePerf.fptTime).longAvg();
browser_app_page_dom_ready_avg = from(BrowserAppPagePerf.domReadyTime).longAvg();
browser_app_page_load_page_avg = from(BrowserAppPagePerf.loadPageTime).longAvg();
browser_app_page_res_avg = from(BrowserAppPagePerf.resTime).longAvg();
browser_app_page_ssl_avg = from(BrowserAppPagePerf.sslTime).longAvg();
browser_app_page_ttl_avg = from(BrowserAppPagePerf.ttlTime).longAvg();
browser_app_page_first_pack_avg = from(BrowserAppPagePerf.firstPackTime).longAvg();
browser_app_page_fmp_avg = from(BrowserAppPagePerf.fmpTime).longAvg();

browser_app_page_fpt_percentile = from(BrowserAppPagePerf.fptTime).percentile(10);
browser_app_page_ttl_percentile = from(BrowserAppPagePerf.ttlTime).percentile(10);
browser_app_page_dom_ready_percentile = from(BrowserAppPagePerf.domReadyTime).percentile(10);
browser_app_page_load_page_percentile = from(BrowserAppPagePerf.loadPageTime).percentile(10);
browser_app_page_first_pack_percentile = from(BrowserAppPagePerf.firstPackTime).percentile(10);
browser_app_page_fmp_percentile = from(BrowserAppPagePerf.fmpTime).percentile(10);

// Disable unnecessary hard core stream, targeting @Stream#name
//
//disable(browser_error_log);

OAL syntax
OAL script files should use the .oal extension.

// Declare the metrics.
METRICS_NAME = from(SCOPE.(* | [FIELD][,FIELD ...]))
[.filter(FIELD OP [INT | STRING])]
.FUNCTION([PARAM][, PARAM ...])

// Disable hard code
disable(METRICS_NAME);
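Scope
Scopes include All, Service, Service Instance, Endpoint, Service Relation, Service Instance Relation, and Endpoint Relation.

There are also fields, each of which belongs to one of the scopes above.

Filter
Use a filter to build conditions on field values by specifying field names and expressions.

Expressions can be combined with and, or, and ().

The operators are ==, !=, >, <, >=, <=, in [...], like %..., like ...%, and like %...%. They are type-checked against the field type, and an incompatible type causes an error at compile/code-generation time.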
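The three `like` forms correspond to suffix, prefix, and substring matching. A small Python sketch of that reading follows; `like` here is a hypothetical helper written for illustration, not part of the OAL runtime.

```python
def like(value, pattern):
    """Evaluate an OAL-style `like` pattern with % allowed at either end."""
    starts, ends = pattern.startswith("%"), pattern.endswith("%")
    core = pattern.strip("%")
    if starts and ends:
        return core in value           # like %...%  -> substring match
    if starts:
        return value.endswith(core)    # like %...   -> suffix match
    if ends:
        return value.startswith(core)  # like ...%   -> prefix match
    return value == pattern            # no wildcard -> exact match

print(like("service-a", "serv%"))  # True
```

With this reading, the earlier filter `name like "serv%"` keeps only the endpoints whose name begins with "serv".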

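Aggregation Function
The default aggregation functions are implemented by the SkyWalking OAP core, and more functions can be added freely.

Provided functions:

longAvg: the average of all input for each scope entity. The input field must be of type long.

instance_jvm_memory_max = from(ServiceInstanceJVMMemory.max).longAvg();
In the example above, the input is every request in the ServiceInstanceJVMMemory scope, and the average is computed on the max field.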

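doubleAvg: the average of all input for each scope entity. The input field must be of type double.

instance_jvm_cpu = from(ServiceInstanceJVMCPU.usePercent).doubleAvg();
In the example above, the input is every request in the ServiceInstanceJVMCPU scope, and the average is computed on the usePercent field.

percent: the percentage of the input that matches the given condition.

endpoint_percent = from(Endpoint.*).percent(status == true);
In the example above, the input is every request of each endpoint, and the condition is endpoint.status == true.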
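The alarm example earlier in this article compares service_sla against 8000 to mean 80%, which suggests that percent values are stored as the percentage multiplied by 100. The Python sketch below assumes that scale; the function is a hypothetical stand-in, not the OAP implementation.

```python
def percent(statuses):
    """Share of True entries, scaled so that 10000 means 100%."""
    total = len(statuses)
    if total == 0:
        return 0
    matched = sum(1 for status in statuses if status)
    return matched * 10000 // total

print(percent([True, True, True, True, False]))  # 8000, i.e. 80%
```

Keeping this scale in mind matters when writing thresholds: under this assumption, a rule with `threshold: 80` on a percent metric would only fire below 0.8%.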

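rate: the rate, expressed as a fraction of 100, for the input that matches the conditions.

browser_app_error_rate = from(BrowserAppTraffic.*).rate(trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR, trafficCategory == BrowserAppTrafficCategory.NORMAL);
In the example above, the input is every request of each browser app's traffic. The numerator condition is trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR, and the denominator condition is trafficCategory == BrowserAppTrafficCategory.NORMAL.

The first parameter is the numerator condition, and the second parameter is the denominator condition.

sum: the total number of calls for each scope entity.

service_calls_sum = from(Service.*).sum();
In the example above, the number of calls of each service is counted.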

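histogram: heatmap. For more details, see Heatmap in WIKI.

all_heatmap = from(All.latency).histogram(100, 20);
In the example above, a thermodynamic heatmap of all incoming requests is calculated.

The first parameter is the precision of the latency calculation. In the example above, 113 ms and 193 ms both fall into the 101-200 ms group and are treated as the same.

The second parameter is the number of groups. In the example above there are 21 groups in total: 0-100 ms, 101-200 ms, ..., 1901-2000 ms, and above 2000 ms.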
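The grouping described above can be sketched in a few lines of Python. The bucket boundaries follow this article's description (0-100 ms, 101-200 ms, and so on); the exact boundaries inside the OAP server may differ, and `histogram_bucket` is a hypothetical helper.

```python
def histogram_bucket(latency_ms, step=100, max_steps=20):
    """Map a latency to a bucket index in 0..max_steps.

    Bucket 0 covers 0-100 ms, bucket 1 covers 101-200 ms, and the last
    bucket (index max_steps) collects everything above step * max_steps.
    """
    if latency_ms <= 0:
        return 0
    return min((latency_ms - 1) // step, max_steps)

print(histogram_bucket(113), histogram_bucket(193))  # 1 1
print(histogram_bucket(2500))                        # 20
```

With `histogram(100, 20)` this gives indices 0 through 20, which matches the 21 groups mentioned above.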

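apdex: Application Performance Index (Apdex).

service_apdex = from(Service.latency).apdex(name, status);
In the example above, the Apdex of every service is calculated.

The first parameter is the service name; the Apdex threshold for that name is defined in the configuration file service-apdex-threshold.yml.

The second parameter is the request status. The status (success or failure) affects the Apdex calculation.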
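For readers unfamiliar with Apdex, the Python sketch below applies the standard Apdex formula: requests at or below the threshold T are satisfied, those at or below 4T are tolerating and count as half, and the rest are frustrated. Treating failed requests as frustrated is an assumption about how the status parameter is used, and the 500 ms default comes from the annotated core.oal above.

```python
def apdex(requests, threshold_ms=500):
    """Compute an Apdex score in [0, 1] from (latency_ms, success) pairs."""
    if not requests:
        return 0.0
    satisfied = tolerating = 0
    for latency_ms, success in requests:
        if not success or latency_ms > 4 * threshold_ms:
            continue  # frustrated: counts only toward the total
        if latency_ms > threshold_ms:
            tolerating += 1
        else:
            satisfied += 1
    return (satisfied + tolerating / 2) / len(requests)

sample = [(100, True), (600, True), (3000, True), (100, False)]
print(apdex(sample))  # 0.375
```

Here one request is satisfied, one is tolerating, and two are frustrated (one too slow, one failed), giving (1 + 0.5) / 4.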

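P99, P95, P90, P75, P50: percentiles. For more details, see Percentile in WIKI.

Percentile is the first multi-value metric, introduced in version 7.0. Because it has multiple values, it can be queried through the getMultipleLinearIntValues GraphQL query.

all_percentile = from(All.latency).percentile(10);
In the example above, P99, P95, P90, P75, and P50 of all incoming requests are calculated. The parameter is the precision of the percentile calculation; in the example above, 120 ms and 124 ms are treated as the same.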
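The effect of the precision argument can be illustrated with a short Python sketch: latencies are first reduced to buckets of `precision` milliseconds, and each percentile is then read off the cumulative bucket counts. This is a simplified illustration under those assumptions rather than the OAP algorithm, and the function name is hypothetical.

```python
from collections import Counter

RANKS = (50, 75, 90, 95, 99)

def percentiles(latencies_ms, precision=10):
    """Return {"p50": ..., ...} computed on latencies rounded down to `precision`."""
    buckets = Counter(latency // precision for latency in latencies_ms)
    total = sum(buckets.values())
    result = {}
    for rank in RANKS:
        threshold = total * rank / 100
        seen = 0
        for key in sorted(buckets):
            seen += buckets[key]
            if seen >= threshold:
                # Report the lower edge of the bucket that crosses the rank.
                result[f"p{rank}"] = key * precision
                break
    return result

print(percentiles([120, 124, 300, 1000]))
```

Both 120 ms and 124 ms land in the same bucket, so they contribute identically to every percentile.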

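Metrics Name
The metrics name is used by the storage implementation, the alarm module, and the query module. The SkyWalking core supports automatic type inference.

Group
All metrics data is grouped by Scope.ID and the min-level time bucket.

In the Endpoint scope, Scope.ID is the endpoint ID (a unique identifier based on the service and its endpoint).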
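As an illustration of this grouping, the Python sketch below aggregates records by entity ID and minute bucket and then applies a longAvg-style integer average. The yyyyMMddHHmm bucket format, the use of UTC, and the record layout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

def minute_bucket(epoch_ms):
    """Format a timestamp as a yyyyMMddHHmm minute bucket (UTC in this sketch)."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return int(moment.strftime("%Y%m%d%H%M"))

def long_avg_by_group(records):
    """Average latency per (entity_id, minute bucket).

    records: iterable of (entity_id, epoch_ms, latency_ms) tuples.
    """
    sums = defaultdict(lambda: [0, 0])  # key -> [summation, count]
    for entity_id, epoch_ms, latency_ms in records:
        cell = sums[(entity_id, minute_bucket(epoch_ms))]
        cell[0] += latency_ms
        cell[1] += 1
    return {key: total // count for key, (total, count) in sums.items()}

records = [("ep-1", 0, 100), ("ep-1", 30_000, 201), ("ep-1", 60_000, 50), ("ep-2", 0, 10)]
print(long_avg_by_group(records))
```

The two requests to ep-1 within the first minute are averaged together, while the request one minute later falls into its own bucket.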

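Cast
The fields of a source are statically typed. In some cases the type required by a filter expression or an aggregation function does not match the type of the source field. For example, the value of a tag in the source is a String, while most aggregation functions require a numeric type. Cast expressions are provided to solve this.

Usage

(str->long) or (long), cast string type into long.
(str->int) or (int), cast string type into int.
Example:

mq_consume_latency = from((str->long)Service.tag["transmission.latency"]).longAvg(); // the value of tag is string type.
Cast expressions are supported in the following places:

From statement. from((cast)source.attr).
Filter expression. .filter((cast)tag["transmission.latency"] > 0)
Aggregation function parameter. .longAvg((cast)strField1 == 1, (cast)strField2)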
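The need for a cast is easy to see in a general-purpose language: string tag values cannot be averaged until they are converted to numbers. The Python sketch below mirrors the mq_consume_latency example; the helper name and the sample values are hypothetical.

```python
def long_avg_of_tag(tag_values):
    """Average string tag values after casting each one to an integer."""
    numbers = [int(value) for value in tag_values]  # plays the role of (str->long)
    return sum(numbers) // len(numbers) if numbers else 0

print(long_avg_of_tag(["120", "80", "101"]))  # 100
```

Without the conversion step, summing the raw strings would raise a TypeError in Python, which is comparable to the type error OAL reports at code-generation time.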
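Disable
Disable is an advanced OAL statement that is used only in specific cases.

Some aggregations and metrics are hard-coded in the core. The disable statement is designed to deactivate them, for example segment and top_n_database_statement.

By default, none of them are disabled.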
————————————————
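Copyright notice: this is an original article by the CSDN blogger 「IT利刃出鞘」, published under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/feiying0canglang/article/details/121562890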

