
1. Overview
1.1 Introduction
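In distributed architectures, microservices and the k8s ecosystem, tracing an application's request path is essential. This capability is usually delivered as part of APM (Application Performance Management). Put simply, distributed tracing records the complete behavior of a request at every layer it passes through, from the moment traffic reaches the frontend all the way down to the database at the back end. It then makes that information available visually: trace lookup, dependency relationships, performance analysis, topology views and so on. A tracing system is very good at helping you locate problems, which is hard to achieve with conventional monitoring.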
Tracing systems come in commercial and open-source versions. The better-known ones (that I have looked at) are:
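- Commercial
  - Tingyun
  - Bonree
- Open source
  - SkyWalking: China, started as a personal open-source project and now under the Apache Software Foundation; its author was recently elected as the first Chinese director on the Apache board
  - Pinpoint: Korea, personal open source
  - Zipkin: United States, open-sourced by Twitter
  - Cat: China, open-sourced by Meituan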
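Detailed information on each of these systems is easy to find online; I won't evaluate the commercial products here.
Of the open-source options, the last two are intrusive to business code. A comparison of the first two is shown in the figure below.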
1.2 Components
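This article uses SkyWalking. Following the deployment method used here, it consists of these components:
- skywalking-oap-server: the backend service
- skywalking-ui: the UI frontend
- skywalking-es-init: initializes the Elasticsearch cluster data
- elasticsearch: stores SkyWalking's data and metrics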
2. Prerequisites
2.1 Prepare the helm environment
Helm 3 needs only a single binary. My version is:
# helm version
version.BuildInfo{Version:"v3.5.2", GitCommit:"167aac70832d3a384f65f9745335e9fb40169dc2", GitTreeState:"dirty", GoVersion:"go1.15.7"}
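This walkthrough relies on Helm 3. If you script the preparation steps, a small check like the one below can stop early on the wrong client. It is a minimal Python sketch that assumes the version.BuildInfo output format shown above; output in any other format raises ValueError. The function name helm_major_version is hypothetical.

```python
import re

# Matches the Version field of `helm version` output, e.g. Version:"v3.5.2".
# The leading 'v' keeps it from matching GoVersion:"go1.15.7".
_VERSION_RE = re.compile(r'Version:"v(\d+)\.(\d+)\.(\d+)"')


def helm_major_version(build_info: str) -> int:
    """Return the major version from a version.BuildInfo{...} string."""
    match = _VERSION_RE.search(build_info)
    if match is None:
        raise ValueError("no Version field found in helm output")
    return int(match.group(1))


if __name__ == "__main__":
    sample = ('version.BuildInfo{Version:"v3.5.2", '
              'GitCommit:"167aac70832d3a384f65f9745335e9fb40169dc2", '
              'GitTreeState:"dirty", GoVersion:"go1.15.7"}')
    major = helm_major_version(sample)
    print(major)  # 3
    if major < 3:
        raise SystemExit("Helm 3 or later is expected for this walkthrough")
```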
2.2 Create a dedicated namespace
Deploy SkyWalking in its own namespace:
# kubectl create ns monitoring
namespace/monitoring created
2.3 Create secrets
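This walkthrough deploys SkyWalking in an internal network: my local computer, which acts as the helm client, can reach the internet, but the k8s cluster cannot. Every image SkyWalking uses therefore has to be served from a private registry inside the internal network.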
2.3.1 Secret for pulling images
# kubectl create secret docker-registry registry-pull-secret --docker-username=admin --docker-password=123456 --docker-email=admin@admin.com --docker-server=hub.ssgeek.com -n monitoring
secret/registry-pull-secret created
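For reference, kubectl create secret docker-registry produces a secret of type kubernetes.io/dockerconfigjson whose .dockerconfigjson key holds a JSON document with the registry credentials. The Python sketch below builds an equivalent manifest as a dict, which can help if you prefer to keep the secret declarative. The field layout is my understanding of what kubectl generates, so compare it with kubectl -n monitoring get secret registry-pull-secret -o yaml before relying on it; the helper names are hypothetical.

```python
import base64
import json


def b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def docker_config_json(server: str, username: str, password: str, email: str) -> str:
    """JSON payload stored under the .dockerconfigjson key."""
    entry = {
        "username": username,
        "password": password,
        "email": email,
        "auth": b64(f"{username}:{password}"),  # base64("user:password")
    }
    return json.dumps({"auths": {server: entry}})


def registry_pull_secret(name: str, namespace: str, server: str,
                         username: str, password: str, email: str) -> dict:
    """Secret manifest (as a dict) assumed equivalent to `kubectl create secret docker-registry`."""
    payload = docker_config_json(server, username, password, email)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": b64(payload)},
    }


if __name__ == "__main__":
    secret = registry_pull_secret("registry-pull-secret", "monitoring", "hub.ssgeek.com",
                                  "admin", "123456", "admin@admin.com")
    print(json.dumps(secret, indent=2))
```

Since kubectl apply accepts JSON as well as YAML, the printed manifest could be piped into kubectl apply -f -.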
2.3.2 Secret for HTTPS access
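This step is optional. My cluster runs cert-manager, which issues certificates automatically; the certificate below is used by the SkyWalking UI ingress, and the corresponding part of the chart values has to be adjusted later.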
# cat certificate.yaml
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: skywalking
  namespace: monitoring
spec:
  secretName: skywalking
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  duration: 2160h
  renewBefore: 360h
  keyEncoding: pkcs1
  dnsNames:
  - skywalking.ssgeek.com
# kubectl apply -f certificate.yaml
certificate.cert-manager.io/skywalking created
# kubectl get certificate,secret -n monitoring|grep skywalking
certificate.cert-manager.io/skywalking True skywalking 2m50s
secret/skywalking kubernetes.io/tls 3 2m49s
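The duration and renewBefore values in the Certificate are durations in hours. As a quick worked check: 2160h is 90 days and 360h is 15 days, so renewal should start about 75 days after issuance, 15 days before expiry. The Python sketch below only handles whole-hour values like the two in this manifest; parse_hours is a hypothetical helper, not part of cert-manager.

```python
from datetime import timedelta


def parse_hours(value: str) -> timedelta:
    """Parse an hours-only duration such as '2160h' (other units are out of scope)."""
    if not value.endswith("h") or not value[:-1].isdigit():
        raise ValueError(f"expected a whole number of hours like '360h', got {value!r}")
    return timedelta(hours=int(value[:-1]))


duration = parse_hours("2160h")     # spec.duration
renew_before = parse_hours("360h")  # spec.renewBefore
renew_after_issue = duration - renew_before

print(duration.days, renew_before.days, renew_after_issue.days)  # 90 15 75
```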
2.3.3 Secret for SkyWalking UI access control
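The SkyWalking UI has no access control by default. You can use the Nginx Ingress basic auth approach below, or the one from my earlier article on external authentication for custom services with k8s Ingress Nginx + OAuth2 + GitLab, which requires no code changes.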
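Key point: there is a small pitfall with basic auth. Following the official documentation and my own testing, the file generated by the htpasswd tool (which holds the username and password) must be named auth before the secret is created; otherwise, after all the remaining steps, requests still end up with a 503. This differs from configuring basic auth on a traditional nginx. The reason is that kubectl create secret generic --from-file uses the file name as the secret's data key, and ingress-nginx reads the credentials from the key auth, so what actually matters is that the secret contains a data.auth key.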
# htpasswd -c auth skywalking
New password:
Re-type new password:
Adding password for user skywalking
# kubectl -n monitoring create secret generic ui-auth --from-file=auth
secret/ui-auth created
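The 503 pitfall comes down to the secret's data key rather than the file on disk. The Python sketch below builds the ui-auth secret as a dict to make that explicit: wherever the htpasswd entry came from, it is stored under the key auth. The htpasswd line in the example is a placeholder, and basic_auth_secret is a hypothetical helper.

```python
import base64

# ingress-nginx looks up the htpasswd content under this data key.
AUTH_KEY = "auth"


def basic_auth_secret(name: str, namespace: str, htpasswd_content: str) -> dict:
    """Opaque secret holding htpasswd content under the 'auth' key."""
    encoded = base64.b64encode(htpasswd_content.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {AUTH_KEY: encoded},
    }


if __name__ == "__main__":
    # Placeholder: use the real line produced by `htpasswd -c auth skywalking`.
    entry = "skywalking:<hash-from-htpasswd>\n"
    secret = basic_auth_secret("ui-auth", "monitoring", entry)
    print(sorted(secret["data"]))  # ['auth']
```

With kubectl directly, a file with a different name also works if the key is named explicitly, for example kubectl -n monitoring create secret generic ui-auth --from-file=auth=./my-htpasswd (the file name here is hypothetical).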
2.4 Store images in the private registry
Push the images used by the deployment to the internal registry. The deployment uses the latest SkyWalking version at the time of writing (source image -> private registry image):
apache/skywalking-ui:8.4.0 -> hub.ssgeek.com/skywalking/skywalking-ui:8.4.0
apache/skywalking-oap-server:8.4.0-es7 -> hub.ssgeek.com/skywalking/skywalking-oap-server:8.4.0-es7
busybox:1.30 -> hub.ssgeek.com/skywalking/busybox:1.30
docker.elastic.co/elasticsearch/elasticsearch:7.5.2 -> hub.ssgeek.com/skywalking/elasticsearch:7.5.2
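Mirroring boils down to a pull, tag and push per image, run on a machine that can reach both the internet and hub.ssgeek.com (after docker login hub.ssgeek.com). The Python sketch below only generates those commands from the source list. The naming rule, keeping the last path component (repository:tag) under hub.ssgeek.com/skywalking, is inferred from the mapping above, and the function names are hypothetical.

```python
from typing import List

PRIVATE_PREFIX = "hub.ssgeek.com/skywalking"

# Source images from the list above.
SOURCE_IMAGES = [
    "apache/skywalking-ui:8.4.0",
    "apache/skywalking-oap-server:8.4.0-es7",
    "busybox:1.30",
    "docker.elastic.co/elasticsearch/elasticsearch:7.5.2",
]


def private_image(source: str, prefix: str = PRIVATE_PREFIX) -> str:
    """Keep only the last path component (repository:tag) and put it under prefix."""
    return f"{prefix}/{source.rsplit('/', 1)[-1]}"


def mirror_commands(sources: List[str], prefix: str = PRIVATE_PREFIX) -> List[str]:
    """docker pull/tag/push commands that copy each image into the private registry."""
    commands: List[str] = []
    for source in sources:
        target = private_image(source, prefix)
        commands += [
            f"docker pull {source}",
            f"docker tag {source} {target}",
            f"docker push {target}",
        ]
    return commands


if __name__ == "__main__":
    print("\n".join(mirror_commands(SOURCE_IMAGES)))
```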
3. Fetch the chart, update its dependencies and set the values
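Fetch the latest official chart and update its dependencies. Updating the dependencies automatically downloads a subchart, the official Elasticsearch chart. The downloaded package does not need to be unpacked or modified; all parameters are set globally through the parent chart's values.yaml.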
# git clone https://github.com/apache/skywalking-kubernetes.git
# cd skywalking-kubernetes/chart
# helm dep up skywalking
Hang tight while we grab the latest from your chart repositories...
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading elasticsearch from repo https://helm.elastic.co/
Deleting outdated charts
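Edit values.yaml. Only the parts I changed are listed below. Elasticsearch offers many more parameters and tuning options; a minimal configuration is used here, so see the official documentation for more.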
...
imagePullSecrets:
  - name: registry-pull-secret
initContainer:
  image: hub.ssgeek.com/skywalking/busybox
  tag: '1.30'
oap:
  name: oap
  # When 'dynamicConfigEnabled' set to true, enable oap dynamic configuration through k8s configmap,
  # Note: The default configmap data is empty, please refer to the detailed documentation (https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/dynamic-config.md)
  # Sync period in seconds. Defaults to 60 seconds. env: SW_CONFIG_CONFIGMAP_PERIOD
  dynamicConfigEnabled: false
  image:
    repository: hub.ssgeek.com/skywalking/skywalking-oap-server
    tag: 8.4.0-es7  # Must be set explicitly
    pullPolicy: IfNotPresent
  storageType: elasticsearch7  # storage type: Elasticsearch 7
  ...
  tolerations: []
  resources:
    limits:
      cpu: 2
      memory: 4Gi
    requests:
      cpu: 1
      memory: 1Gi
  ...
  env:
    # more env, please refer to https://hub.docker.com/r/apache/skywalking-oap-server
    # or https://github.com/apache/skywalking-docker/blob/master/6/6.4/oap/README.md#sw_telemetry
    SW_NAMESPACE: "skywalking"  # ES index prefix becomes skywalking_ (the underscore is appended automatically)
...
ui:
  name: ui
  replicas: 1
  image:
    repository: hub.ssgeek.com/skywalking/skywalking-ui
    tag: 8.4.0  # Must be set explicitly
    pullPolicy: IfNotPresent
  # podAnnotations:
  #   example: oap-foo
  nodeAffinity: {}
  nodeSelector: {}
  tolerations: []
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      # basic auth annotations
      nginx.ingress.kubernetes.io/auth-type: basic
      nginx.ingress.kubernetes.io/auth-secret: ui-auth
      nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
    path: /
    hosts:
      - skywalking.ssgeek.com
    tls:
      - secretName: skywalking
        hosts:
          - skywalking.ssgeek.com
...
elasticsearch:
  enabled: true
  config:  # For users of an existing elasticsearch cluster,takes effect when `elasticsearch.enabled` is false
    port:
      http: 9200
    # host: elasticsearch # es service on kubernetes or host
    host: elasticsearch-logging.logging.svc
    user: "elastic"  # [optional]
    password: "elastic"  # [optional]
  clusterName: "elasticsearch"
  nodeGroup: "logging"
  # The service that non master groups will try to connect to when joining the cluster
  # This should be set to clusterName + "-" + nodeGroup for your master group
  masterService: "elasticsearch-logging"
  ...
  image: "hub.ssgeek.com/skywalking/elasticsearch"
  imageTag: "7.5.2"
  imagePullPolicy: "IfNotPresent"
  ...
  resources:
    requests:
      cpu: "100m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  ...
  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "ceph-rbd"
    resources:
      requests:
        storage: 30Gi
  ...
  persistence:
    enabled: true
    annotations: {}
  ...
  imagePullSecrets:
    - name: registry-pull-secret
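About SW_NAMESPACE: judging from the OAP logs in section 5, each SkyWalking model gets an index template named namespace_model (for example skywalking_instance_jvm_thread_peak_count) plus dated indices with a -yyyyMMdd suffix. The Python sketch below reproduces that naming so you know which index names to expect in Elasticsearch. The rule is an assumption read off the log output rather than taken from SkyWalking's source, so confirm it with GET _cat/indices/skywalking_* on your cluster; es_index_name is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def es_index_name(namespace: str, model: str, day: Optional[date] = None) -> str:
    """Assumed naming: '<namespace>_<model>', plus '-yyyyMMdd' for a daily index."""
    base = f"{namespace}_{model}" if namespace else model
    if day is None:
        return base
    return f"{base}-{day:%Y%m%d}"


if __name__ == "__main__":
    # Template name and the daily index for the date seen in the logs.
    print(es_index_name("skywalking", "instance_jvm_thread_peak_count"))
    print(es_index_name("skywalking", "instance_jvm_thread_peak_count", date(2021, 3, 18)))
```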
4. Install SkyWalking with helm
With the preparation done, SkyWalking can be deployed in one step with helm:
# helm install skywalking skywalking -n monitoring --values ./skywalking/values.yaml
NAME: skywalking
LAST DEPLOYED: Thu Mar 18 18:45:03 2021
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
************************************************************************
* *
* SkyWalking Helm Chart by SkyWalking Team *
* *
************************************************************************
Thank you for installing skywalking.
Your release is named skywalking.
Learn more, please visit https://skywalking.apache.org/
Get the UI URL by running these commands:
https://skywalking.ssgeek.com/
5. Verification
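Watch the logs of the skywalking-es-init pod (the OAP server running in init mode) until "create instance_jvm_thread_peak_count index template finished" appears and the process reports that it started in init mode successfully and exits: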
2021-03-18 10:48:32,242 - org.apache.skywalking.oap.server.core.storage.model.ModelInstaller -139765 [main] INFO [] - table: instance_jvm_thread_peak_count does not exist
2021-03-18 10:48:32,243 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -139766 [main] INFO [] - index skywalking_instance_jvm_thread_peak_count's columnTypeEsMapping builder str: {properties={service_id={type=keyword}, count={index=false, type=long}, time_bucket={type=long}, entity_id={type=keyword}, value={type=long}, summation={index=false, type=long}}}
2021-03-18 10:48:32,614 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -140137 [main] INFO [] - create instance_jvm_thread_peak_count index template finished, isAcknowledged: true
2021-03-18 10:48:33,319 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -140842 [main] INFO [] - create instance_jvm_thread_peak_count-20210318 index finished, isAcknowledged: true
......
2021-03-18 10:48:33,583 - org.eclipse.jetty.server.handler.ContextHandler -141106 [main] INFO [] - Started o.e.j.s.ServletContextHandler@12e4822b{/,null,AVAILABLE}
2021-03-18 10:48:33,597 - org.eclipse.jetty.server.AbstractConnector -141120 [main] INFO [] - Started ServerConnector@5cc9d3d0{HTTP/1.1, (http/1.1)}{0.0.0.0:12800}
2021-03-18 10:48:33,597 - org.eclipse.jetty.server.Server -141120 [main] INFO [] - Started @141185ms
2021-03-18 10:48:33,599 - org.apache.skywalking.oap.server.core.storage.PersistenceTimer -141122 [main] INFO [] - persistence timer start
2021-03-18 10:48:33,603 - org.apache.skywalking.oap.server.core.cache.CacheUpdateTimer -141126 [main] INFO [] - Cache updateServiceInventory timer start
2021-03-18 10:48:41,499 - org.apache.skywalking.oap.server.starter.OAPServerBootstrap -149022 [main] INFO [] - OAP starts up in init mode successfully, exit now...
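Instead of eyeballing the log, the two signals above (acknowledged index templates and the final init-mode success line) can be checked with a small script. This Python sketch assumes the log line formats shown above; scan_init_log is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# Patterns taken from the OAP init-mode log shown above.
TEMPLATE_RE = re.compile(r"create (\S+) index template finished, isAcknowledged: true")
INIT_DONE = "OAP starts up in init mode successfully"


def scan_init_log(lines: Iterable[str]) -> Tuple[bool, List[str]]:
    """Return (init finished, models whose index template was acknowledged)."""
    finished = False
    templates: List[str] = []
    for line in lines:
        match = TEMPLATE_RE.search(line)
        if match:
            templates.append(match.group(1))
        if INIT_DONE in line:
            finished = True
    return finished, templates
```

The input could be, for example, the lines of kubectl -n monitoring logs job/skywalking-es-init, where the job name is inferred from the skywalking-es-init-t7ndj pod listed below.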
Check the pod status:
# kubectl -n monitoring get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 0 5m54s
elasticsearch-logging-1 1/1 Running 0 5m53s
elasticsearch-logging-2 1/1 Running 0 5m53s
skywalking-es-init-t7ndj 0/1 Completed 0 5m54s
skywalking-oap-57d7f454f5-8gbh5 1/1 Running 2 5m54s
skywalking-oap-57d7f454f5-vqh2d 1/1 Running 2 5m54s
skywalking-ui-698cdb4dbc-xxktt 1/1 Running 0 5m54s
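To make this check scriptable, the Python sketch below parses the plain-text table printed by kubectl get pods and lists pods that are neither Completed nor Running with all containers ready. It assumes the default column layout (NAME READY STATUS RESTARTS AGE); for anything beyond a quick check, kubectl's -o json output is a sturdier input. not_ready_pods is a hypothetical helper.

```python
from typing import List


def not_ready_pods(get_pods_output: str) -> List[str]:
    """Names of pods that are not Completed and not Running with all containers ready."""
    lines = [line for line in get_pods_output.strip().splitlines() if line.strip()]
    problems: List[str] = []
    for line in lines[1:]:  # skip the NAME READY STATUS ... header
        name, ready, status = line.split()[:3]
        if status == "Completed":
            continue  # finished job pods such as skywalking-es-init
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


SAMPLE = """\
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 0 5m54s
elasticsearch-logging-1 1/1 Running 0 5m53s
elasticsearch-logging-2 1/1 Running 0 5m53s
skywalking-es-init-t7ndj 0/1 Completed 0 5m54s
skywalking-oap-57d7f454f5-8gbh5 1/1 Running 2 5m54s
skywalking-oap-57d7f454f5-vqh2d 1/1 Running 2 5m54s
skywalking-ui-698cdb4dbc-xxktt 1/1 Running 0 5m54s
"""

if __name__ == "__main__":
    print(not_ready_pods(SAMPLE))  # []
```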
Open the web UI; after entering the username and password configured for basic auth, the SkyWalking main page loads successfully.
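That completes the deployment of the SkyWalking server on k8s + helm in an internal network. If the environment has no internet access at all, package the modified chart and upload it to a private helm repository such as Harbor; with both the chart and the images available internally, the deployment needs no internet access whatsoever.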
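Next, after more hands-on work, I'll share how to connect the agent side and how to use SkyWalking in practice. Feel free to nudge me for the next post ☺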
