Configuring and Using the Collector
Collector Configuration
The collector processes the data enabled in the service section through pipelines. A pipeline is made up of the components that handle telemetry data: receivers, processors, and exporters.
Functionality can also be added to the Collector through extensions; extensions do not need direct access to telemetry data and are not part of a pipeline. Extensions are likewise enabled in the service section.
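For orientation, here is a minimal sketch of how these pieces fit together in a single configuration file. It is an illustrative skeleton only, using component names that appear in the examples in the following sections, not a recommended setup:
receivers:
  opencensus:
    address: "localhost:55678"
processors:
  batch:
    timeout: 5s
exporters:
  logging:
    loglevel: debug
extensions:
  health_check: {}
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [opencensus]
      processors: [batch]
      exporters: [logging]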
Receivers
A receiver defines how data gets into the OpenTelemetry Collector. One or more receivers must be configured; by default, no receivers are configured.
Basic examples of all available receivers are shown below; see the receiver documentation for more detailed configuration.
receivers:
opencensus:
address: "localhost:55678"
zipkin:
address: "localhost:9411"
jaeger:
protocols:
grpc:
thrift_http:
thrift_tchannel:
thrift_compact:
thrift_binary:
prometheus:
config:
scrape_configs:
- job_name: "caching_cluster"
scrape_interval: 5s
static_configs:
- targets: ["localhost:8889"]
Processors
Processors run on data between reception and export. Although processors are optional, some are recommended.
Basic examples of all available processors are shown below; see the processor documentation for more details.
processors:
attributes/example:
actions:
- key: db.statement
action: delete
batch:
timeout: 5s
send_batch_size: 1024
probabilistic_sampler:
disabled: true
span:
name:
from_attributes: ["db.svc", "operation"]
separator: "::"
queued_retry: {}
tail_sampling:
policies:
- name: policy1
type: rate_limiting
rate_limiting:
spans_per_second: 100
Exporters
An exporter specifies how data is sent to one or more backends/destinations. One or more exporters must be configured; by default, no exporters are configured.
Basic examples of all available exporters are shown below; see the exporter documentation for more details.
exporters:
opencensus:
headers: {"X-test-header": "test-header"}
compression: "gzip"
cert_pem_file: "server-ca-public.pem" # optional to enable TLS
endpoint: "localhost:55678"
reconnection_delay: 2s
logging:
loglevel: debug
jaeger_grpc:
endpoint: "http://localhost:14250"
jaeger_thrift_http:
headers: {"X-test-header": "test-header"}
timeout: 5
endpoint: "http://localhost:14268/api/traces"
zipkin:
endpoint: "http://localhost:9411/api/v2/spans"
prometheus:
endpoint: "localhost:8889"
namespace: "default"
Service
The service section is used to configure which features the OpenTelemetry Collector enables, based on what is defined in the receivers, processors, exporters, and extensions sections. The service section consists of two parts:
- extensions
- pipelines
extensions contains the extensions to enable, for example:
service:
extensions: [health_check, pprof, zpages]
There are two types of pipelines:
- metrics: collects and processes metrics data
- traces: collects and processes trace data
A pipeline is a set of receivers, processors, and exporters. Each receiver/processor/exporter must be defined in the configuration outside of the service section before it can be included in a pipeline.
Note: the same receiver/processor/exporter can be used in more than one pipeline. When multiple pipelines reference a processor, each pipeline gets its own instance of that processor. This differs from receivers/exporters referenced by multiple pipelines, where all pipelines share a single instance of the receiver/exporter.
An example pipeline configuration is shown below; see the pipeline documentation for more details.
service:
pipelines:
metrics:
receivers: [opencensus, prometheus]
exporters: [opencensus, prometheus]
traces:
receivers: [opencensus, jaeger]
processors: [batch, queued_retry]
exporters: [opencensus, zipkin]
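To illustrate the note above about instances, here is a small sketch in which the batch processor (defined once under processors) is listed in both pipelines. Each pipeline gets its own batch instance, while a receiver or exporter listed in both pipelines is shared:
service:
  pipelines:
    metrics:
      receivers: [opencensus]   # one shared opencensus receiver instance
      processors: [batch]       # batch instance #1
      exporters: [opencensus]   # one shared opencensus exporter instance
    traces:
      receivers: [opencensus]
      processors: [batch]       # batch instance #2, independent of #1
      exporters: [opencensus]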
Extensions
Extensions can be used to monitor the health of the OpenTelemetry Collector. Extensions are optional; by default, no extensions are configured.
Basic examples of all available extensions are shown below; see the extensions documentation for more details.
extensions:
health_check: {}
pprof: {}
zpages: {}
Using environment variables
Environment variables can be used in the collector configuration, for example:
processors:
attributes/example:
actions:
- key: "$DB_KEY"
action: "$OPERATION"
Using the Collector
The following uses the official demo to get a feel for what the Collector can do.
This example shows how to export trace and metric data from the OpenTelemetry-Go SDK into the OpenTelemetry Collector, with the Collector then forwarding the trace data to Jaeger and the metric data to Prometheus. The complete flow is:
-----> Jaeger (trace)
App + SDK ---> OpenTelemetry Collector ---|
-----> Prometheus (metrics)
Deploying to Kubernetes
The k8s directory contains all the deployment files needed for this demo. For convenience, the project bundles the deployment steps into a Makefile; the commands in the Makefile can also be run manually when necessary.
Deploying the Prometheus operator
git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
kubectl create -f manifests/setup
# wait for namespaces and CRDs to become available, then
kubectl create -f manifests/
The environment can be cleaned up as follows:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
Wait for all Prometheus components to reach the Running state:
# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 16m
alertmanager-main-1 2/2 Running 0 16m
alertmanager-main-2 2/2 Running 0 16m
grafana-7f567cccfc-4pmhq 1/1 Running 0 16m
kube-state-metrics-85cb9cfd7c-x6kq6 3/3 Running 0 16m
node-exporter-c4svg 2/2 Running 0 16m
node-exporter-n6tnv 2/2 Running 0 16m
prometheus-adapter-557648f58c-vmzr8 1/1 Running 0 16m
prometheus-k8s-0 3/3 Running 0 16m
prometheus-k8s-1 3/3 Running 1 16m
prometheus-operator-5b469f4f66-qx2jc 2/2 Running 0 16m
Using the Makefile
Next, use the Makefile to deploy Jaeger, the Prometheus monitor, and the Collector by running the following commands in order:
# Create the namespace
make namespace-k8s
# Deploy Jaeger operator
make jaeger-operator-k8s
# After the operator is deployed, create the Jaeger instance
make jaeger-k8s
# Then the Prometheus instance. Ensure you have enabled a Prometheus operator
# before executing (see above).
make prometheus-k8s
# Finally, deploy the OpenTelemetry Collector
make otel-collector-k8s
Wait for the Jaeger and Collector Pods in the observability namespace to reach the Running state:
# kubectl get pod -n observability
NAME READY STATUS RESTARTS AGE
jaeger-7b868df4d6-w4tk8 1/1 Running 0 97s
jaeger-operator-9b4b7bb48-q6k59 1/1 Running 0 110s
otel-collector-7cfdcb7658-ttc8j 1/1 Running 0 14s
The environment can be cleaned up with the make clean-k8s command, but it does not remove the namespace; the namespace must be deleted manually:
kubectl delete namespaces observability
Configuring the OpenTelemetry Collector
After completing the steps above, all of the required resources are deployed. Let's now look at the Collector's configuration file.
To have the application send data to the OpenTelemetry Collector, first configure a receiver of type otlp, which communicates over gRPC:
...
otel-collector-config: |
receivers:
# Make sure to add the otlp receiver.
# This will open up the receiver on port 55680.
otlp:
endpoint: 0.0.0.0:55680
processors:
...
This configuration creates the receiver on the Collector side and opens port 55680 for receiving traces. The rest of the configuration is fairly standard; the only thing to note is that Jaeger and Prometheus exporters need to be created:
...
exporters:
jaeger_grpc:
endpoint: "jaeger-collector.observability.svc.cluster.local:14250"
prometheus:
endpoint: 0.0.0.0:8889
namespace: "testapp"
...
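For completeness, the demo's configuration also has to wire the otlp receiver to these exporters in its service section. The snippet below is a sketch of what that wiring might look like, assuming no extra processors are involved (the actual demo config may differ):
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger_grpc]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]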
OpenTelemetry Collector service
Another noteworthy part of the configuration is the NodePort used to access the OpenTelemetry Collector:
apiVersion: v1
kind: Service
metadata:
...
spec:
ports:
- name: otlp # Default endpoint for otlp receiver.
port: 55680
protocol: TCP
targetPort: 55680
nodePort: 30080
- name: metrics # Endpoint for metrics from our app.
port: 8889
protocol: TCP
targetPort: 8889
selector:
component: otel-collector
  type: NodePort
This Service binds node port 30080 on the cluster nodes to the otlp receiver's port 55680, so the Collector can be reached at the static address <node-ip>:30080.
Running the code
The complete example code can be found in the main.go file. Go version >= 1.13 is required to run it:
# go run main.go
2020/10/20 09:19:17 Waiting for connection...
2020/10/20 09:19:17 Doing really hard work (1 / 10)
2020/10/20 09:19:18 Doing really hard work (2 / 10)
2020/10/20 09:19:19 Doing really hard work (3 / 10)
2020/10/20 09:19:20 Doing really hard work (4 / 10)
2020/10/20 09:19:21 Doing really hard work (5 / 10)
2020/10/20 09:19:22 Doing really hard work (6 / 10)
2020/10/20 09:19:23 Doing really hard work (7 / 10)
2020/10/20 09:19:24 Doing really hard work (8 / 10)
2020/10/20 09:19:25 Doing really hard work (9 / 10)
2020/10/20 09:19:26 Doing really hard work (10 / 10)
2020/10/20 09:19:27 Done!
2020/10/20 09:19:27 exporter stopped
The example simulates a running application that computes for about 10 seconds and then exits.
Viewing the collected data
The data flow when running go run main.go is as follows:
Jaeger UI
Querying the traces in Jaeger shows the following:
Prometheus
After main.go finishes running, the metrics can be viewed in Prometheus. The corresponding Prometheus target is observability/otel-collector/0.
Querying the metrics in Prometheus shows the following:
FAQ:
- After running the deployment commands, Prometheus does not register a target such as http://10.244.1.33:8889/metrics. Check the Prometheus pod logs; this is usually because Prometheus lacks the corresponding RBAC role permissions. Modifying Prometheus's ClusterRole to the following fixes it:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus-k8s
  namespace: monitoring
rules:
- apiGroups: [""]
  resources: ["services", "pods", "endpoints", "nodes/metrics"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get", "watch", "list"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get", "watch", "list"]
- When running "go run main.go", you may encounter an error like
rpc error: code = Internal desc = grpc: error unmarshalling request: unexpected EOF
This error is usually caused by the client and the server using mismatched proto definitions. The client side (main.go) uses the proto files under go.opentelemetry.io/otel/exporters/otlp/internal/opentelemetry-proto-gen, while the collector uses the proto files under go.opentelemetry.io/collector/internal/data/opentelemetry-proto-gen; compare whether the files in these two directories match. If they do not, regenerate the proto files for main.go to match the collector's version (or simply switch the collector image, paying attention to the version of the otel/opentelemetry-collector image in use). The comments in the collector's proto directory indicate which proto version is used; the collector's proto definitions come from the opentelemetry-proto repository. Clone that repository, check out the corresponding version, and run
make gen-go
to generate the corresponding files. The maturity matrix from that repository is:
Component                     Maturity
Binary Protobuf Encoding
  collector/metrics/*         Alpha
  collector/trace/*           Stable
  common/*                    Stable
  metrics/*                   Alpha
  resource/*                  Stable
  trace/trace.proto           Stable
  trace/trace_config.proto    Alpha
JSON encoding
  All messages                Alpha