An Introduction to SkyWalking, a Full-Link Tracing System


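This article covers the following topics:

  1. An introduction to SkyWalking
  2. Using SkyWalking, which supports many invocation middleware components (HttpClient, Spring MVC, Dubbo, MySQL, and so on)
  3. Integrating the SkyWalking traceId with logging components (log4j, logback, ELK, and so on)
  4. Using the SkyWalking alarm module
  5. How SkyWalking works
  6. Limitations of SkyWalking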

 

1. Introduction to SkyWalking:

 

            Overview:

SkyWalking: an open source observability platform to collect, analyze, aggregate and visualize data from services and cloud native infrastructures.
SkyWalking provides an easy way to keep you have a clear view of your distributed system, even across Cloud.
It is more like a modern APM, specially designed for cloud native, container based and distributed system.

-------

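In other words, SkyWalking is an open source observability platform that collects, analyzes, aggregates and visualizes data from services and cloud native infrastructure.
It gives you a simple way to keep a clear view of a distributed system, even one that spans clouds.
It is best thought of as a modern APM (application performance management) tool designed for cloud native, container-based and distributed systems.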

          Why use SkyWalking?

 

SkyWalking provides solutions for observing and monitoring distributed system, in many different scenarios. 
First of all, like traditional ways, SkyWalking provides auto instrument agents for service, such as Java, C# and Node.js.
At the same time, it provides manual instrument SDKs for Go(Not yet), C++(Not yet).
Also with more languages required, risks in manipulating codes at runtime, cloud native infrastructures grow more powerful,
SkyWalking could use Service Mesher infra probes to collect data for understanding the whole distributed system.
In general, it provides observability capabilities for service(s), service instance(s), endpoint(s).

----------
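In other words, SkyWalking offers ways to observe and monitor distributed systems in many different scenarios.
First, like traditional tools, it provides auto-instrumentation agents for languages such as Java, C# and Node.js.
Manual instrumentation SDKs for Go and C++ are also listed, but the overview above marks both as not yet available.
As more languages come into use, as modifying code at runtime carries more risk, and as cloud native infrastructure grows more capable,
SkyWalking can use service mesh probes to collect the data needed to understand the whole distributed system.
In general, SkyWalking provides observability for services, service instances and endpoints.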

Service: a service
Service Instance: an instance of a service (one service may run as several nodes)
Endpoint: one of the interfaces exposed by a service

 

          

   Architecture:

 

 

         2. Using SkyWalking:

        Step 1: download the package from the SkyWalking website, http://skywalking.apache.org/downloads/. The structure of the package is shown in the figure.

             

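     Step 2: start the SkyWalking collector service. The startup script is E:\apache-skywalking-apm-bin\bin\startup.sh (on Windows, use the startup.bat script in the same bin directory). Once it has started, open http://localhost:8080/ to see the SkyWalking UI.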

          

 

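     Step 3: start your project. Copy the skywalking-agent directory to wherever it is needed. The probe consists of the whole directory, so do not change the directory structure. In agent.config, change agent.application_code=xxl-job to your own application name (in SkyWalking 6.x the key is named agent.service_name).

              Add the JVM startup parameter -javaagent:/path/to/skywalking-agent/skywalking-agent.jar. Its value must be the absolute path to skywalking-agent.jar.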
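If no data shows up later, a first thing to rule out is that the JVM was started without the -javaagent parameter. The sketch below is one way to check this from inside the application using the standard RuntimeMXBean. The class name AgentArgCheck and its method are hypothetical helpers written for this article, not part of SkyWalking.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of SkyWalking: reports which -javaagent jar
// (if any) the current JVM was started with.
final class AgentArgCheck {

    private static final String PREFIX = "-javaagent:";

    // Returns the jar path of the first -javaagent argument whose path ends
    // with the given jar file name, with any "=options" suffix removed.
    static Optional<String> findAgentJar(List<String> jvmArgs, String jarName) {
        for (String arg : jvmArgs) {
            if (!arg.startsWith(PREFIX)) {
                continue;
            }
            String value = arg.substring(PREFIX.length());
            int eq = value.indexOf('=');
            String path = eq >= 0 ? value.substring(0, eq) : value;
            if (path.endsWith(jarName)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The input arguments are the JVM options, not the program arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Optional<String> agent = findAgentJar(jvmArgs, "skywalking-agent.jar");
        System.out.println(agent.map(p -> "agent jar: " + p)
                                .orElse("no skywalking-agent.jar on the JVM command line"));
    }
}
```

Running the class with and without the -javaagent parameter shows the two cases. The check only proves that the option was passed, not that the agent connected to the collector.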

   After these steps, call your project's endpoints directly and check whether the SkyWalking UI has collected the call information.

The figure below shows the SkyWalking home page, which mainly displays global performance information.

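    To verify that SkyWalking can discover the system topology (system dependencies), start four services whose endpoint paths are hello/start1, hello/start2, hello/start3 and hello/start4.

      The services depend on one another as follows: start1 depends on start2, and start2 depends on start3 and start4.

       After a call to the start1 endpoint, SkyWalking displays the following project topology: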

      

       Full-link performance tracing page:

         

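     Out of the box, SkyWalking monitors call performance for these layer types: DB(1), RPC_FRAMEWORK(2), HTTP(3), MQ(4) and CACHE(5). Custom plugins can be written to monitor components that are not yet supported.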
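The numbers in parentheses are the codes attached to each layer. As a small illustration, the five layers can be modeled as an enum with a lookup by code. The enum DemoSpanLayer and its methods are hypothetical names used only for this sketch; the agent ships its own SpanLayer class, which the plugin code later in this article calls.

```java
import java.util.Optional;

// Illustrative mirror of the layer codes listed above. The enum and method
// names are hypothetical; this is not the agent's own class.
enum DemoSpanLayer {
    DB(1), RPC_FRAMEWORK(2), HTTP(3), MQ(4), CACHE(5);

    private final int code;

    DemoSpanLayer(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    // Looks a layer up by its numeric code; empty for codes outside 1..5.
    static Optional<DemoSpanLayer> fromCode(int code) {
        for (DemoSpanLayer layer : values()) {
            if (layer.code == code) {
                return Optional.of(layer);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (int code = 1; code <= 6; code++) {
            System.out.println(code + " -> " + fromCode(code).map(Enum::name).orElse("unknown"));
        }
    }
}
```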

      The following shows what Dubbo and DB calls look like (service start2 calls the DB and the Dubbo service of project 4):

 

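      3. Integrating the SkyWalking traceId with logging components (log4j, logback, ELK, and so on):

     Taking logback as an example, add the following to the logging configuration XML. The traceId of the current context is then written into every log line automatically.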

        

    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
             <pattern>
                 %d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %tid - %msg%n
             </pattern>
        </layout>
    </appender>

         The result is shown below. Every node in the call chain logs the same traceId, so once SkyWalking reveals a slow trace, you can look up its traceId in the logging component and check for error logs.

       Log printed by service 1:

       2019-08-14 16:46:22 [http-nio-9091-exec-1] INFO  c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service1 logger with traceId

       Log printed by service 2:

       2019-08-14 16:46:24 [http-nio-9092-exec-9] INFO  c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service2 logger with traceId

       Log printed by service 3:

       2019-08-14 16:46:24 [http-nio-9093-exec-1] INFO  c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service3 logger with traceId

       Log printed by service 4:

        2019-08-14 16:46:24 [http-nio-9094-exec-1] INFO  c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service4 logger with traceId
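Because all four lines carry the same TID token, pulling together the logs of one trace is a plain text match. The following sketch shows the idea with a regular expression. The class TraceIdLogFilter is a hypothetical helper, and treating the value N/A as "no active trace" is an assumption about the placeholder printed outside a trace; in practice a log platform such as ELK would do this filtering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds log lines that belong to one trace by matching
// the "TID:<id>" token produced by the %tid pattern shown above.
final class TraceIdLogFilter {

    private static final Pattern TID = Pattern.compile("TID:([0-9A-Za-z./-]+)");

    // Returns the trace id in the line, or empty when there is no usable id.
    // Assumption: "N/A" is the placeholder for "no active trace".
    static Optional<String> extract(String logLine) {
        Matcher m = TID.matcher(logLine);
        if (m.find() && !"N/A".equals(m.group(1))) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }

    // Keeps only the lines whose trace id equals the requested one.
    static List<String> linesForTrace(List<String> logLines, String traceId) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (extract(line).filter(traceId::equals).isPresent()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2019-08-14 16:46:22 [http-nio-9091-exec-1] INFO  c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service1 logger with traceId",
            "2019-08-14 16:46:23 [main] INFO  c.z.s.Application - TID:N/A - startup finished");
        linesForTrace(lines, "47.34.15657723821280001").forEach(System.out::println);
    }
}
```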

   

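    4. Using the SkyWalking alarm module:

     The figure below shows the alarm page of the UI. Alarms can be monitored along three dimensions: service, service instance and endpoint (interface).

        Alarm rules can be freely defined in the configuration file shipped with the distribution (apache-skywalking-apm-bin/config/alarm-settings.yml).

        The default configuration monitors services and service instances but not endpoints. The file's own comment gives the reason: "Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm. Because the number of endpoint is much more than service and instance."

      

  The configuration below defines the alarm rules. SkyWalking also lets users configure alarm callback URLs so that notifications such as SMS or email can be sent promptly; see the webhooks entry in the configuration file.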

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Sample alarm rules.
rules:
  # Rule unique name, must be ended with `_rule`.
  service_resp_time_rule:
    metrics-name: service_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 3
    silence-period: 5
    message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
  service_sla_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_sla
    op: "<"
    threshold: 8000
    # The length of time to evaluate the metrics
    period: 10
    # How many times after the metrics match the condition, will trigger alarm
    count: 2
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 3
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
  service_p90_sla_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_p90
    op: ">"
    threshold: 1000
    period: 10
    count: 3
    silence-period: 5
    message: 90% response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes
  service_instance_resp_time_rule:
    metrics-name: service_instance_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 2
    silence-period: 5
    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
#  Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm.
#  Because the number of endpoint is much more than service and instance.
#
  endpoint_avg_rule:
    metrics-name: endpoint_avg
    op: ">"
    threshold: 1000
    period: 10
    count: 2
    silence-period: 5
    message: Response time of endpoint {name} is more than 1000ms in 2 minutes of last 10 minutes

#webhooks:
#  - http://127.0.0.1/notify/
#  - http://127.0.0.1/go-wechat/
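To make the interplay of threshold, period, count and silence-period more concrete, here is a simplified model of a single rule with op ">". It follows the comments in the file: keep the last period values, fire when at least count of them exceed the threshold, then stay silent for silence-period checks. The class DemoAlarmRule is hypothetical and the one-value-per-minute timing is an assumption of this sketch; it is not SkyWalking's alarm engine.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of one alarm rule with op ">". Hypothetical class, not
// SkyWalking's alarm engine. One call to check() stands for one minute.
final class DemoAlarmRule {
    private final long threshold;
    private final int period;
    private final int count;
    private final int silencePeriod;
    private final Deque<Long> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    DemoAlarmRule(long threshold, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    // Feeds one metric value and returns true when the alarm fires.
    boolean check(long value) {
        window.addLast(value);
        if (window.size() > period) {
            window.removeFirst();          // keep only the last `period` values
        }
        if (silenceLeft > 0) {
            silenceLeft--;                 // still silenced after an earlier alarm
            return false;
        }
        int matched = 0;
        for (long v : window) {
            if (v > threshold) {
                matched++;
            }
        }
        if (matched >= count) {
            silenceLeft = silencePeriod;   // start the silence period
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Same numbers as service_resp_time_rule above.
        DemoAlarmRule rule = new DemoAlarmRule(1000, 10, 3, 5);
        for (int minute = 1; minute <= 10; minute++) {
            System.out.println("minute " + minute + ": fired=" + rule.check(1200));
        }
    }
}
```

In this model a service that stays above 1000 ms fires on the third minute and is then silenced for the next five checks, which is the behavior the silence-period comment describes.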

 

 

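5. How SkyWalking works:

       The overall SkyWalking architecture has three parts:

  1.    skywalking-collector: the trace data collector. Data can be stored in ElasticSearch; a single node can also store it in H2, but this is not recommended because H2 is meant only for temporary demos
  2.    skywalking-web: the web visualization platform, which displays the stored data
  3.    skywalking-agent: the probe, which collects data and sends it to the collector

The core of SkyWalking is the agent. The figure below shows in detail how the agents operate during a single call that spans several processes:

 

The agent supports many kinds of clients and servers. For the full list of supported plugins, see https://github.com/apache/skywalking/blob/master/docs/en/setup/service-agent/java-agent/Supported-list.md

Taking the interception of Dubbo requests as an example, here is how SkyWalking's Dubbo plugin is implemented:

The source intercepts the invoke method of Dubbo's MonitorFilter class. As DubboInterceptor shows, the plugin obtains Dubbo's RpcContext. On the consumer side, before the call is made, it adds SkyWalking's cross-process protocol header (sw:traceId) to the attachments; the provider side then reads the header back out.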

 

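package org.apache.skywalking.apm.plugin.dubbo;

import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.matcher.ElementMatcher;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.ConstructorInterceptPoint;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.InstanceMethodsInterceptPoint;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.ClassInstanceMethodsEnhancePluginDefine;
import org.apache.skywalking.apm.agent.core.plugin.match.ClassMatch;
import org.apache.skywalking.apm.agent.core.plugin.match.NameMatch;

import static net.bytebuddy.matcher.ElementMatchers.named;

// Plugin definition: tells the agent which class and method to enhance and which interceptor to run.
public class DubboInstrumentation extends ClassInstanceMethodsEnhancePluginDefine {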

    private static final String ENHANCE_CLASS = "com.alibaba.dubbo.monitor.support.MonitorFilter";
    private static final String INTERCEPT_CLASS = "org.apache.skywalking.apm.plugin.dubbo.DubboInterceptor";

    @Override
    protected ClassMatch enhanceClass() {
        return NameMatch.byName(ENHANCE_CLASS);
    }

    @Override
    public ConstructorInterceptPoint[] getConstructorsInterceptPoints() {
        return null;
    }

    @Override
    public InstanceMethodsInterceptPoint[] getInstanceMethodsInterceptPoints() {
        return new InstanceMethodsInterceptPoint[] {
            new InstanceMethodsInterceptPoint() {
                @Override
                public ElementMatcher<MethodDescription> getMethodsMatcher() {
                    return named("invoke");
                }

                @Override
                public String getMethodsInterceptor() {
                    return INTERCEPT_CLASS;
                }

                @Override
                public boolean isOverrideArgs() {
                    return false;
                }
            }
        };
    }
}

The following code is the implementation of the Dubbo interceptor:

package org.apache.skywalking.apm.plugin.dubbo;

import com.alibaba.dubbo.common.URL;
import com.alibaba.dubbo.rpc.Invocation;
import com.alibaba.dubbo.rpc.Invoker;
import com.alibaba.dubbo.rpc.Result;
import com.alibaba.dubbo.rpc.RpcContext;
import java.lang.reflect.Method;
import org.apache.skywalking.apm.agent.core.context.ContextCarrier;
import org.apache.skywalking.apm.agent.core.context.tag.Tags;
import org.apache.skywalking.apm.agent.core.context.CarrierItem;
import org.apache.skywalking.apm.agent.core.context.ContextManager;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.SpanLayer;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.EnhancedInstance;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstanceMethodsAroundInterceptor;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.MethodInterceptResult;
import org.apache.skywalking.apm.network.trace.component.ComponentsDefine;

/**
 * {@link DubboInterceptor} define how to enhance class {@link com.alibaba.dubbo.monitor.support.MonitorFilter#invoke(Invoker,
 * Invocation)}. the trace context transport to the provider side by {@link RpcContext#attachments}.but all the version
 * of dubbo framework below 2.8.3 don't support {@link RpcContext#attachments}, we support another way to support it.
 *
 * @author zhangxin
 */
public class DubboInterceptor implements InstanceMethodsAroundInterceptor {
    /**
     * <h2>Consumer:</h2> The serialized trace context data will
     * inject to the {@link RpcContext#attachments} for transport to provider side.
     * <p>
     * <h2>Provider:</h2> The serialized trace context data will extract from
     * {@link RpcContext#attachments}. current trace segment will ref if the serialize context data is not null.
     */
    @Override
    public void beforeMethod(EnhancedInstance objInst, Method method, Object[] allArguments,
        Class<?>[] argumentsTypes, MethodInterceptResult result) throws Throwable {
        Invoker invoker = (Invoker)allArguments[0];
        Invocation invocation = (Invocation)allArguments[1];
        RpcContext rpcContext = RpcContext.getContext();
        boolean isConsumer = rpcContext.isConsumerSide();
        URL requestURL = invoker.getUrl();

        AbstractSpan span;

        final String host = requestURL.getHost();
        final int port = requestURL.getPort();
        if (isConsumer) {
            final ContextCarrier contextCarrier = new ContextCarrier();
            span = ContextManager.createExitSpan(generateOperationName(requestURL, invocation), contextCarrier, host + ":" + port);
            //invocation.getAttachments().put("contextData", contextDataStr);
            //@see https://github.com/alibaba/dubbo/blob/dubbo-2.5.3/dubbo-rpc/dubbo-rpc-api/src/main/java/com/alibaba/dubbo/rpc/RpcInvocation.java#L154-L161
            CarrierItem next = contextCarrier.items();
            while (next.hasNext()) {
                next = next.next();
                rpcContext.getAttachments().put(next.getHeadKey(), next.getHeadValue());
            }
        } else {
            ContextCarrier contextCarrier = new ContextCarrier();
            CarrierItem next = contextCarrier.items();
            while (next.hasNext()) {
                next = next.next();
                next.setHeadValue(rpcContext.getAttachment(next.getHeadKey()));
            }

            span = ContextManager.createEntrySpan(generateOperationName(requestURL, invocation), contextCarrier);
        }

        Tags.URL.set(span, generateRequestURL(requestURL, invocation));
        span.setComponent(ComponentsDefine.DUBBO);
        SpanLayer.asRPCFramework(span);
    }

    @Override
    public Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments,
        Class<?>[] argumentsTypes, Object ret) throws Throwable {
        Result result = (Result)ret;
        if (result != null && result.getException() != null) {
            dealException(result.getException());
        }

        ContextManager.stopSpan();
        return ret;
    }

    @Override
    public void handleMethodException(EnhancedInstance objInst, Method method, Object[] allArguments,
        Class<?>[] argumentsTypes, Throwable t) {
        dealException(t);
    }

    /**
     * Log the throwable, which occurs in Dubbo RPC service.
     */
    private void dealException(Throwable throwable) {
        AbstractSpan span = ContextManager.activeSpan();
        span.errorOccurred();
        span.log(throwable);
    }

    /**
     * Format operation name. e.g. org.apache.skywalking.apm.plugin.test.Test.test(String)
     *
     * @return operation name.
     */
    private String generateOperationName(URL requestURL, Invocation invocation) {
        StringBuilder operationName = new StringBuilder();
        operationName.append(requestURL.getPath());
        operationName.append("." + invocation.getMethodName() + "(");
        for (Class<?> classes : invocation.getParameterTypes()) {
            operationName.append(classes.getSimpleName() + ",");
        }

        if (invocation.getParameterTypes().length > 0) {
            operationName.delete(operationName.length() - 1, operationName.length());
        }

        operationName.append(")");

        return operationName.toString();
    }

    /**
     * Format request url.
     * e.g. dubbo://127.0.0.1:20880/org.apache.skywalking.apm.plugin.test.Test.test(String).
     *
     * @return request url.
     */
    private String generateRequestURL(URL url, Invocation invocation) {
        StringBuilder requestURL = new StringBuilder();
        requestURL.append(url.getProtocol() + "://");
        requestURL.append(url.getHost());
        requestURL.append(":" + url.getPort() + "/");
        requestURL.append(generateOperationName(url, invocation));
        return requestURL.toString();
    }
}
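Stripped of the span handling, the two branches of beforeMethod are an inject step on the consumer and an extract step on the provider. The sketch below reduces that to a plain map standing in for the RPC attachments. The class DemoPropagation, the key demo-trace-context and the "traceId|parentSpanId" encoding are all hypothetical; the real keys and wire format are defined by SkyWalking's cross-process protocol through ContextCarrier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of header-based trace propagation. The key and the
// "traceId|parentSpanId" encoding are made up for illustration only.
final class DemoPropagation {

    static final String HEADER_KEY = "demo-trace-context";

    // Consumer side: write the serialized context into the outgoing attachments.
    static void inject(String traceId, int parentSpanId, Map<String, String> attachments) {
        attachments.put(HEADER_KEY, traceId + "|" + parentSpanId);
    }

    // Provider side: read the trace id back, or empty if the caller sent none,
    // in which case the provider would start a new trace.
    static Optional<String> extractTraceId(Map<String, String> attachments) {
        String raw = attachments.get(HEADER_KEY);
        if (raw == null || raw.indexOf('|') < 0) {
            return Optional.empty();
        }
        return Optional.of(raw.substring(0, raw.indexOf('|')));
    }

    public static void main(String[] args) {
        Map<String, String> attachments = new HashMap<>(); // stands in for the RPC attachments
        inject("47.34.15657723821280001", 1, attachments);
        System.out.println(extractTraceId(attachments).orElse("new trace"));
    }
}
```

Carrying the context in the call's own metadata is what lets every service in the chain log the same traceId without any change to business code.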

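 When the call ends, the span is stopped and its details are sent to the collector. This is implemented in the stopSpan(AbstractSpan span) method of the class org.apache.skywalking.apm.agent.core.context.TracingContext.

The stopSpan method is implemented as follows: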

@Override
    public boolean stopSpan(AbstractSpan span) {
        AbstractSpan lastSpan = peek();
        if (lastSpan == span) {
            if (lastSpan instanceof AbstractTracingSpan) {
                AbstractTracingSpan toFinishSpan = (AbstractTracingSpan)lastSpan;
                if (toFinishSpan.finish(segment)) {
                    pop();
                }
            } else {
                pop();
            }
        } else {
            throw new IllegalStateException("Stopping the unexpected span = " + span);
        }

        finish();

        return activeSpanStack.isEmpty();
    }

The logic that actually sends the data is in the finish method:

/**
     * Finish this context, and notify all {@link TracingContextListener}s, managed by {@link
     * TracingContext.ListenerManager}
     */
    private void finish() {
        if (isRunningInAsyncMode) {
            asyncFinishLock.lock();
        }
        try {
            if (activeSpanStack.isEmpty() && running && (!isRunningInAsyncMode || asyncSpanCounter.get() == 0)) {
                TraceSegment finishedSegment = segment.finish(isLimitMechanismWorking());
                /*
                 * Recheck the segment if the segment contains only one span.
                 * Because in the runtime, can't sure this segment is part of distributed trace.
                 *
                 * @see {@link #createSpan(String, long, boolean)}
                 */
                if (!segment.hasRef() && segment.isSingleSpanSegment()) {
                    if (!samplingService.trySampling()) {
                        finishedSegment.setIgnore(true);
                    }
                }

                /*
                 * Check that the segment is created after the agent (re-)registered to backend,
                 * otherwise the segment may be created when the agent is still rebooting and should
                 * be ignored
                 */
                if (segment.createTime() < RemoteDownstreamConfig.Agent.INSTANCE_REGISTERED_TIME) {
                    finishedSegment.setIgnore(true);
                }

                TracingContext.ListenerManager.notifyFinish(finishedSegment); // Notify the tracing context listeners, which send the data to the collector.

                running = false;
            }
        } finally {
            if (isRunningInAsyncMode) {
                asyncFinishLock.unlock();
            }
        }
    }
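The pattern behind stopSpan and finish is a stack of active spans: spans are pushed as they are created, popped as they stop, and the segment is handed to the listeners only when the stack is empty. The following is a minimal sketch of that pattern under simplifying assumptions: spans are plain operation names, there is no sampling or async mode, and the class DemoTracingContext is hypothetical rather than the agent's TracingContext.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified model of the span stack; spans are just names.
final class DemoTracingContext {
    private final Deque<String> activeSpans = new ArrayDeque<>();
    private final List<String> finishedSpans = new ArrayList<>();
    private final Consumer<List<String>> listener;

    // The listener stands in for the component that reports to the collector.
    DemoTracingContext(Consumer<List<String>> listener) {
        this.listener = listener;
    }

    void createSpan(String operationName) {
        activeSpans.push(operationName);
    }

    // Stops the innermost span; returns true when the whole segment finished.
    boolean stopSpan(String operationName) {
        String last = activeSpans.peek();
        if (last == null || !last.equals(operationName)) {
            throw new IllegalStateException("Stopping the unexpected span = " + operationName);
        }
        finishedSpans.add(activeSpans.pop());
        if (activeSpans.isEmpty()) {
            // Segment complete: notify the listener once, then reset.
            listener.accept(new ArrayList<>(finishedSpans));
            finishedSpans.clear();
        }
        return activeSpans.isEmpty();
    }

    public static void main(String[] args) {
        DemoTracingContext ctx = new DemoTracingContext(
            segment -> System.out.println("report segment " + segment));
        ctx.createSpan("hello/start2");        // entry span
        ctx.createSpan("dubbo-call");          // exit span
        ctx.stopSpan("dubbo-call");
        ctx.stopSpan("hello/start2");          // stack empty, listener is notified
    }
}
```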

 6. Limitations of SkyWalking

Just effect frameworks or libraries. 
Because of the changing codes by agents, it also means the codes are already known by agent plugin developers.
So, there is always a supported list in this kind of probes. Like SkyWalking Java agent supported list. Across thread can't be supported all the time.
Like we said about in process propagation, most codes run in a single thread per request, especially business codes.
But in some other scenarios, they do things in different threads, such as job assignment, task pool or batch process.
Or some languages provide coroutine or similar thing like Goroutine, then developer could run async process with low payload, even been encouraged. In those cases, auto instrument will face problems.

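1. Only known frameworks and libraries are supported. If a middleware component you use is not yet supported, you have to write a plugin yourself.

2. Automatic instrumentation does not work across threads, for example in job assignment, task pool or batch processing scenarios.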
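The second limitation follows from keeping the trace context per thread: a task handed to a thread pool runs on a thread that never saw the caller's context. The sketch below reproduces the effect with a plain ThreadLocal and shows the usual remedy of capturing the context at submit time and restoring it in the worker. DemoCrossThread and its methods are hypothetical; they only illustrate the capture-and-restore idea, not the agent's own API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a per-thread trace id is lost in a pool thread unless
// the task is wrapped to capture it at submit time.
final class DemoCrossThread {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the caller's trace id now and restores it inside the worker.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = TRACE_ID.get();
        return () -> {
            TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                TRACE_ID.remove();     // do not leak the id to the next pooled task
            }
        };
    }

    // Sets a trace id on the calling thread, then reads it from a pool thread.
    static String traceIdSeenByWorker(String callerTraceId, boolean wrapped) {
        TRACE_ID.set(callerTraceId);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = TRACE_ID::get;
            Callable<String> toRun = wrapped ? wrap(task) : task;
            return pool.submit(toRun).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            TRACE_ID.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println("plain task sees:   " + traceIdSeenByWorker("t-1", false));
        System.out.println("wrapped task sees: " + traceIdSeenByWorker("t-1", true));
    }
}
```

The wrapper has to be applied by hand at every submit site, which is why this case cannot be covered by automatic instrumentation alone.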

 

 

 

