Monitoring Distributed Applications: A Quick Hands-On with SkyWalking

Reposted article, 2019-12-31 14:32. Category: programmer / java

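Distributed applications run into all sorts of problems. To deal with them, the applications need their own monitoring instrumentation, and there should also be external systems that actively probe for problems and detect them.

That is the job of an APM tool. SkyWalking is an excellent open-source APM system that originated in China and has become an Apache top-level project.

Today we will try SkyWalking hands-on.

Goal: monitor several existing systems, see the call relationships between them clearly, and pinpoint where the performance problems are.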

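Steps:

Install and run the SkyWalking server;
Connect the applications;
Check the results in the admin UI;
Analyze and troubleshoot problems;
Dig deeper (if in the mood).

Installing the SkyWalking server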

Download the release package:

# Main download page: http://skywalking.apache.org/downloads/
# Open the specific download link and fetch the file, for example:
wget http://mirrors.tuna.tsinghua.edu.cn/apache/skywalking/6.5.0/apache-skywalking-apm-6.5.0.tar.gz

Extract the package:

 tar -xzvf apache-skywalking-apm-6.5.0.tar.gz

Start the service directly, using the default ports and the default storage, h2:

  ./bin/startup.sh
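
After startup it can help to confirm that the server is actually listening. The following is a minimal sketch, assuming bash (it relies on bash's /dev/tcp feature) and the default ports used in this article: 8080 for the web UI, 11800 for gRPC, and 12800 for REST. The function names are made up for illustration and are not part of SkyWalking.

```shell
#!/usr/bin/env bash
# Requires bash: /dev/tcp is a bash feature, not POSIX sh.

# Succeeds if a TCP connection to host $1, port $2 can be opened.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Report the state of the three default SkyWalking ports on a host.
# 8080 = web UI, 11800 = OAP gRPC, 12800 = OAP REST (defaults from this article).
check_skywalking_ports() {
  host=${1:-127.0.0.1}
  for port in 8080 11800 12800; do
    if port_open "$host" "$port"; then
      echo "port $port: open"
    else
      echo "port $port: closed"
    fi
  done
}

# Usage, once ./bin/startup.sh has had some time to come up:
#   check_skywalking_ports localhost
```

If the ports were changed in the configuration files, the list in the loop has to be adjusted to match.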

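Good products are this simple!

The server is now up. You can open the admin UI (port 8080 by default) at http://localhost:8080. It looks like this:
[Screenshot: SkyWalking dashboard]

Of course, the screenshot shows a dashboard with applications already connected. Right now you will not see any applications, because nothing has been connected yet.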

Connecting the application

We only cover connecting a Java application here.

Simply start the application with the javaagent:

java -javaagent:/root/skywalking/agent/skywalking-agent.jar -Dskywalking.agent.service_name=app1 -Dskywalking.collector.backend_service=localhost:11800 -jar myapp.jar

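Parameters:

# Parameter notes
skywalking.agent.service_name: the name of this application in SkyWalking
skywalking.collector.backend_service: the SkyWalking server address (the gRPC reporting address); the default port is 11800
# The two parameters above can also be given in another form, as environment variables
SW_AGENT_COLLECTOR_BACKEND_SERVICES: same meaning as skywalking.collector.backend_service
SW_AGENT_NAME: same meaning as skywalking.agent.service_name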
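
To see relationships between systems, each application has to be started with the agent and its own service name. The sketch below only prints the launch commands rather than running them. The agent path and backend address are the ones used in this article; `app2` and the `app1.jar` / `app2.jar` file names are hypothetical placeholders.

```shell
# Print (not run) the launch command for one service.
# $1 = agent jar path, $2 = service name, $3 = OAP gRPC address, $4 = application jar
agent_launch_cmd() {
  printf 'java -javaagent:%s -Dskywalking.agent.service_name=%s -Dskywalking.collector.backend_service=%s -jar %s\n' \
    "$1" "$2" "$3" "$4"
}

AGENT_JAR=/root/skywalking/agent/skywalking-agent.jar
BACKEND=localhost:11800

# app2 and the jar names are hypothetical; substitute your own services.
for app in app1 app2; do
  agent_launch_cmd "$AGENT_JAR" "$app" "$BACKEND" "$app.jar"
done

# The same settings expressed through environment variables instead of -D options:
#   SW_AGENT_NAME=app1 SW_AGENT_COLLECTOR_BACKEND_SERVICES=localhost:11800 \
#     java -javaagent:/root/skywalking/agent/skywalking-agent.jar -jar myapp.jar
```

Giving every service a distinct name matters because the UI groups data by that name.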

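Hit a few endpoints or pages so that the monitor has data to collect.

Back in the admin UI, the nodes now show up, as in the screenshot above.

Now we can also view the relationships between the applications!
[Screenshot: service topology view]

Clear, isn't it? Everything is visible at a glance, so even complex code is nothing to fear.

We can also trace individual call chains:
[Screenshot: trace view]

As long as you know when a problem occurred, you can quickly locate the interface and the system where it happened and fix it fast.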

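SkyWalking configuration files

As shown above, we got the system running without changing any configuration file. Lucky as that is, we should know more; at the very least, we should know the configuration.

config/application.yml: collector (server-side) configuration

webapp/webapp.yml: the web UI port, plus the IP and port of the OAP (Collector) it fetches data from

agent/config/agent.config: agent settings, such as the address of the SkyWalking OAP (Collector) and the service name

Below are SkyWalking's default configurations. They are enough to run a sample setup without any changes; adjust them when preparing for production.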

config/application.yml

cluster:
  standalone:
  # Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+
  # library the oap-libs folder with your ZooKeeper 3.4.x library.
#  zookeeper:
#    nameSpace: ${SW_NAMESPACE:""}
#    hostPort: ${SW_CLUSTER_ZK_HOST_PORT:localhost:2181}
#    #Retry Policy
#    baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:1000} # initial amount of time to wait between retries
#    maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:3} # max number of times to retry
#    # Enable ACL
#    enableACL: ${SW_ZK_ENABLE_ACL:false} # disable ACL in default
#    schema: ${SW_ZK_SCHEMA:digest} # only support digest schema
#    expression: ${SW_ZK_EXPRESSION:skywalking:skywalking}
#  kubernetes:
#    watchTimeoutSeconds: ${SW_CLUSTER_K8S_WATCH_TIMEOUT:60}
#    namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
#    labelSelector: ${SW_CLUSTER_K8S_LABEL:app=collector,release=skywalking}
#    uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID}
#  consul:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#    # Consul cluster nodes, example: 10.0.0.1:8500,10.0.0.2:8500,10.0.0.3:8500
#    hostPort: ${SW_CLUSTER_CONSUL_HOST_PORT:localhost:8500}
#  nacos:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#    hostPort: ${SW_CLUSTER_NACOS_HOST_PORT:localhost:8848}
#    # Nacos Configuration namespace
#    namespace: 'public'
#  etcd:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#    # etcd cluster nodes, example: 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379
#    hostPort: ${SW_CLUSTER_ETCD_HOST_PORT:localhost:2379}
core:
  default:
    # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate
    # Receiver: Receive agent data, Level 1 aggregate
    # Aggregator: Level 2 aggregate
    role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
    restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
    gRPCHost: ${SW_CORE_GRPC_HOST:0.0.0.0}
    gRPCPort: ${SW_CORE_GRPC_PORT:11800}
    downsampling:
      - Hour
      - Day
      - Month
    # Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
    enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Turn it off then automatically metrics data delete will be close.
    dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} # How often the data keeper executor runs periodically, unit is minute
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
    minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
    hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
    dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
    # Cache metric data for 1 minute to reduce database queries, and if the OAP cluster changes within that minute,
    # the metrics may not be accurate within that minute.
    enableDatabaseSession: ${SW_CORE_ENABLE_DATABASE_SESSION:true}
storage:
#  elasticsearch:
#    nameSpace: ${SW_NAMESPACE:""}
#    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
#    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
#    trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:"../es_keystore.jks"}
#    trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
#    user: ${SW_ES_USER:""}
#    password: ${SW_ES_PASSWORD:""}
#    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
#    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
#    # Those data TTL settings will override the same settings in core module.
#    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
#    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
#    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
#    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
#    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the bulk every 1000 requests
#    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
#    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
#    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
#    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
#    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
  h2:
    driver: ${SW_STORAGE_H2_DRIVER:org.h2.jdbcx.JdbcDataSource}
    url: ${SW_STORAGE_H2_URL:jdbc:h2:mem:skywalking-oap-db}
    user: ${SW_STORAGE_H2_USER:sa}
    metadataQueryMaxSize: ${SW_STORAGE_H2_QUERY_MAX_SIZE:5000}
#  mysql:
#    properties:
#      jdbcUrl: ${SW_JDBC_URL:"jdbc:mysql://localhost:3306/swtest"}
#      dataSource.user: ${SW_DATA_SOURCE_USER:root}
#      dataSource.password: ${SW_DATA_SOURCE_PASSWORD:root@1234}
#      dataSource.cachePrepStmts: ${SW_DATA_SOURCE_CACHE_PREP_STMTS:true}
#      dataSource.prepStmtCacheSize: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_SIZE:250}
#      dataSource.prepStmtCacheSqlLimit: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_LIMIT:2048}
#      dataSource.useServerPrepStmts: ${SW_DATA_SOURCE_USE_SERVER_PREP_STMTS:true}
#    metadataQueryMaxSize: ${SW_STORAGE_MYSQL_QUERY_MAX_SIZE:5000}
receiver-sharing-server:
  default:
receiver-register:
  default:
receiver-trace:
  default:
    bufferPath: ${SW_RECEIVER_BUFFER_PATH:../trace-buffer/} # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
    sampleRate: ${SW_TRACE_SAMPLE_RATE:10000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
    slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100} # The slow database access thresholds. Unit ms.
receiver-jvm:
  default:
receiver-clr:
  default:
service-mesh:
  default:
    bufferPath: ${SW_SERVICE_MESH_BUFFER_PATH:../mesh-buffer/} # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_SERVICE_MESH_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_SERVICE_MESH_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
istio-telemetry:
  default:
envoy-metric:
  default:
#    alsHTTPAnalysis: ${SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS:k8s-mesh}
#receiver_zipkin:
#  default:
#    host: ${SW_RECEIVER_ZIPKIN_HOST:0.0.0.0}
#    port: ${SW_RECEIVER_ZIPKIN_PORT:9411}
#    contextPath: ${SW_RECEIVER_ZIPKIN_CONTEXT_PATH:/}
query:
  graphql:
    path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
alarm:
  default:
telemetry:
  none:
configuration:
  none:
#  apollo:
#    apolloMeta: http://106.12.25.204:8080
#    apolloCluster: default
#    # apolloEnv: # defaults to null
#    appId: skywalking
#    period: 5
#  nacos:
#    # Nacos Server Host
#    serverAddr: 127.0.0.1
#    # Nacos Server Port
#    port: 8848
#    # Nacos Configuration Group
#    group: 'skywalking'
#    # Nacos Configuration namespace
#    namespace: ''
#    # Unit seconds, sync period. Default fetch every 60 seconds.
#    period : 60
#    # the name of current cluster, set the name if you want to upstream system known.
#    clusterName: "default"
#  zookeeper:
#    period : 60 # Unit seconds, sync period. Default fetch every 60 seconds.
#    nameSpace: /default
#    hostPort: localhost:2181
#    #Retry Policy
#    baseSleepTimeMs: 1000 # initial amount of time to wait between retries
#    maxRetries: 3 # max number of times to retry
#  etcd:
#    period : 60 # Unit seconds, sync period. Default fetch every 60 seconds.
#    group : 'skywalking'
#    serverAddr: localhost:2379
#    clusterName: "default"
#  consul:
#    # Consul host and ports, separated by comma, e.g. 1.2.3.4:8500,2.3.4.5:8500
#    hostAndPorts: ${consul.address}
#    # Sync period in seconds. Defaults to 60 seconds.
#    period: 1

#exporter:
#  grpc:
#    targetHost: ${SW_EXPORTER_GRPC_HOST:127.0.0.1}
#    targetPort: ${SW_EXPORTER_GRPC_PORT:9870}
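
The `sampleRate` comment in this file says the precision is 1/10000 and that 10000 means 100%. As a small worked example of that scale, the sketch below converts a whole-number percentage into the value `sampleRate` expects. The function name is hypothetical, and passing the result through `SW_TRACE_SAMPLE_RATE` assumes the OAP process reads the environment it was started from.

```shell
# sampleRate has a precision of 1/10000, so 10000 means 100%.
# Convert a whole-number percentage (0-100) to that scale.
sample_rate_for_percent() {
  if [ "$1" -lt 0 ] || [ "$1" -gt 100 ]; then
    echo "percent must be between 0 and 100" >&2
    return 1
  fi
  # One percent corresponds to 100 units on the 1/10000 scale.
  echo $(( $1 * 100 ))
}

# Usage (not run here): sample a quarter of the traces.
#   SW_TRACE_SAMPLE_RATE=$(sample_rate_for_percent 25) ./bin/startup.sh
```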

webapp/webapp.yml

server:
  port: 8080

collector:
  path: /graphql
  ribbon:
    ReadTimeout: 10000
    # Point to all backend's restHost:restPort, split by ,
    listOfServers: 127.0.0.1:12800
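
Since `listOfServers` has to point at the backend's restHost:restPort, the two files need to agree on the port. The following is a rough sketch of a consistency check between them. It assumes the line formats of the default files (`restPort: ${SW_CORE_REST_PORT:12800}` and `listOfServers: 127.0.0.1:12800`), looks only at the default value and at the last server in the list, and ignores any environment-variable override. All function names are hypothetical.

```shell
# Print the default REST port from a line such as
#   restPort: ${SW_CORE_REST_PORT:12800}
oap_rest_port() {
  sed -n 's/^[[:space:]]*restPort:.*:\([0-9][0-9]*\)}.*$/\1/p' "$1" | head -n 1
}

# Print the port of the last server on a line such as
#   listOfServers: 127.0.0.1:12800
ui_backend_port() {
  sed -n 's/^[[:space:]]*listOfServers:.*:\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Succeeds only if both ports were found and they are equal.
# $1 = path to application.yml, $2 = path to webapp.yml
ports_match() {
  rest=$(oap_rest_port "$1")
  ui=$(ui_backend_port "$2")
  [ -n "$rest" ] && [ "$rest" = "$ui" ]
}

# Usage, from the SkyWalking install directory:
#   ports_match config/application.yml webapp/webapp.yml && echo "UI points at the OAP REST port"
```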

agent/config/agent.config

# The agent namespace
# agent.namespace=${SW_AGENT_NAMESPACE:default-namespace}

# The service name in UI
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}

# The number of sampled traces per 3 seconds
# Negative number means sample traces as many as possible, most likely 100%
# agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:-1}

# Authentication active is based on backend setting, see application.yml for more details.
# agent.authentication = ${SW_AGENT_AUTHENTICATION:xxxx}

# The max amount of spans in a single segment.
# Through this config item, skywalking keep your application memory cost estimated.
# agent.span_limit_per_segment=${SW_AGENT_SPAN_LIMIT:300}

# Ignore the segments if their operation names end with these suffix.
# agent.ignore_suffix=${SW_AGENT_IGNORE_SUFFIX:.jpg,.jpeg,.js,.css,.png,.bmp,.gif,.ico,.mp3,.mp4,.html,.svg}

# If true, skywalking agent will save all instrumented classes files in `/debugging` folder.
# Skywalking team may ask for these files in order to resolve compatible problem.
# agent.is_open_debugging_class = ${SW_AGENT_OPEN_DEBUG:true}

# The operationName max length
# agent.operation_name_threshold=${SW_AGENT_OPERATION_NAME_THRESHOLD:500}

# Backend service addresses.
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}

# Logging file_name
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}

# Logging level
logging.level=${SW_LOGGING_LEVEL:DEBUG}

# Logging dir
# logging.dir=${SW_LOGGING_DIR:""}

# Logging max_file_size, default: 300 * 1024 * 1024 = 314572800
# logging.max_file_size=${SW_LOGGING_MAX_FILE_SIZE:314572800}

# The max history log files. When rollover happened, if log files exceed this number,
# then the oldest file will be delete. Negative or zero means off, by default.
# logging.max_history_files=${SW_LOGGING_MAX_HISTORY_FILES:-1}

# mysql plugin configuration
# plugin.mysql.trace_sql_parameters=${SW_MYSQL_TRACE_SQL_PARAMETERS:false}
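
Most values in these files are written as `${ENV_NAME:default}`: the environment variable is used when it is set, otherwise the default after the first colon applies. The sketch below mimics that lookup in shell to make the rule concrete. It is an illustration and not SkyWalking's own parser; `resolve_placeholder` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```shell
# Resolve one '${ENV_NAME:default}' placeholder the way the config files read.
resolve_placeholder() {
  # Strip the leading '${' and the trailing '}'.
  inner=$(printf '%s' "$1" | sed 's/^\${\(.*\)}$/\1/')
  name=${inner%%:*}      # text before the first ':'
  default=${inner#*:}    # everything after the first ':'
  # printenv only sees exported variables; an empty value counts as unset here (assumption).
  value=$(printenv "$name" || true)
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  else
    printf '%s\n' "$default"
  fi
}

unset SW_AGENT_NAME
resolve_placeholder '${SW_AGENT_NAME:Your_ApplicationName}'   # falls back to the default
export SW_AGENT_NAME=app1
resolve_placeholder '${SW_AGENT_NAME:Your_ApplicationName}'   # uses the environment value
```

Because the split happens at the first colon, a default such as `127.0.0.1:11800` stays intact.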

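SkyWalking architecture

Here is the diagram from the official site, to get a feel for it. There is no need to go into detail. Roughly, the principle is: metric collection is implemented separately for each kind of client, the data is sent to the APM server over gRPC/HTTP, it passes through the analysis engine and is stored in a storage system such as ES, H2, or MySQL, and finally the front end displays it through the query engine.
[Figure: SkyWalking architecture diagram from the official site]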

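What it can be used for

Finding where the system spends its time, that is, where the bottlenecks are.

Discovering the call relationships between systems.

Monitoring service exceptions.

Troubleshooting system failures.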
