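Introduction to Loki
Overview
Like Prometheus, but for logs.
Grafana Loki is the component Grafana developed for log aggregation. Any discussion of logging has to mention ELK, but a full ELK stack is fairly costly to run and brings a considerable maintenance burden.
Loki indexes only metadata about the logs, for example the location of the log file and its configuration. The log content itself is not indexed. This metadata is expressed as labels, just as in Prometheus. The log data is compressed and kept in object storage or on the local filesystem. Because it indexes only metadata and compresses the data heavily, Loki is lightweight and cheap to operate.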
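Loki architecture
As the architecture diagram shows, Grafana Loki has two run modes:
- Single-process mode
- Horizontally scalable mode
Loki is made up of the following key components: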
- Distributor
- Ingester
- [Optional] Query frontend
- Querier
- Chunk Store
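Distributor
The Distributor handles client requests: before log data is written to its destination, it first passes through the Distributor. The Distributor validates the incoming stream for correctness and confirms that the request comes from a valid tenant. It then splits each chunk into batches and sends them in parallel to multiple Ingesters.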
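To make the idea of routing a stream to an Ingester concrete, here is a minimal sketch. It is a deliberate simplification: Loki uses a consistent hash ring with a replication factor, whereas the hypothetical `pick_ingester` function below just takes a CRC of the tenant ID plus label set, modulo the number of Ingesters. The Ingester names are made up for illustration.

```shell
#!/bin/sh
# Simplified illustration only: Loki's distributor uses a consistent hash
# ring; this sketch uses a CRC (cksum) modulo the number of ingesters.

# pick_ingester TENANT LABELS INGESTER...
# Prints the ingester chosen for the stream identified by tenant + labels.
# The body runs in a subshell ( ... ) so its variables stay local.
pick_ingester() (
  tenant="$1"; labels="$2"
  shift 2
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  crc="$(printf '%s' "${tenant}${labels}" | cksum | cut -d' ' -f1)"
  idx=$(( crc % $# ))
  # Positional parameters are 1-based: drop idx entries, then print $1.
  shift "$idx"
  printf '%s\n' "$1"
)

# "fake" is the tenant ID Loki uses when auth_enabled is false.
pick_ingester fake '{cluster="hdfs",service="namenode"}' ingester-0 ingester-1 ingester-2
```

The same tenant and label set always map to the same Ingester, which is what lets all lines of one stream end up in the same chunks.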
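Ingester
The Ingester writes the log data it receives to the storage backend (S3, Cassandra, or the local filesystem) and serves queries for the log data it still holds in memory.
[Optional] Query frontend
The query frontend is an optional component that exposes an API endpoint for log queries. If it is deployed, clients send their queries to the query frontend instead of to the Querier. The query frontend still hands the queries to the Querier for execution.
Querier
The Querier accepts LogQL queries and fetches log data from both the Ingesters and the backend storage.
Chunk Store
The Chunk Store is the long-term store for Loki logs. It must support both interactive queries and sustained writes. Loki supports the following chunk stores: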
- Amazon DynamoDB
- Google BigTable
- Apache Cassandra
- Amazon S3
- Google Cloud Storage
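The Chunk Store is not a separate service; it is provided as a library that the Ingester and the Querier use.
Read and write paths
Read path
- The Querier receives an HTTP query request.
- The Querier passes the query to the Ingesters, which first search the log data they hold in memory.
  - If matching data is found, the Ingester returns it to the Querier.
  - If nothing is found, the Querier loads the data from the Chunk Store and returns it.
- The Querier deduplicates the results.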
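The final deduplication step can be sketched as follows. This is an illustration under stated assumptions, not Loki's implementation: results are modeled as text files with one `<timestamp> <log line>` entry per line, and the hypothetical `merge_results` function treats two entries as duplicates when both the timestamp and the text match.

```shell
#!/bin/sh
# merge_results INGESTER_FILE STORE_FILE
# Each input line is "<timestamp> <log line>". An entry that appears in
# both inputs is emitted once; output is ordered by timestamp.
merge_results() {
  # -k1,1n sorts numerically on the timestamp, -k2 breaks ties on the text,
  # and -u drops entries that are equal on both keys.
  LC_ALL=C sort -u -k1,1n -k2 "$1" "$2"
}

# Demo with made-up data: the entry at timestamp 1615433941 exists in both
# the ingester's memory and the chunk store.
tmp="$(mktemp -d)"
printf '%s\n' '1615433941 WARN slow block report' '1615433942 INFO heartbeat' > "$tmp/ingester.txt"
printf '%s\n' '1615433940 INFO namenode started' '1615433941 WARN slow block report' > "$tmp/store.txt"
merge_results "$tmp/ingester.txt" "$tmp/store.txt"
rm -rf "$tmp"
```

Entries that share a timestamp but differ in text are both kept, since only exact repeats count as duplicates here.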
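Write path
- The Distributor receives an HTTP write request and forwards the stream to the responsible Ingester and to its replicas.
- The Ingester creates a new chunk for the received log data or appends it to an existing chunk.
- The Distributor returns an ACK.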
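The HTTP write request in the first step can be tried by hand. The sketch below builds a JSON body for Loki's `/loki/api/v1/push` endpoint (the same URL Promtail is pointed at later in this guide) and posts it with curl. The helper names `build_push_payload` and `push_line` are hypothetical, and the helper does no JSON escaping, so it assumes the log line and label values contain no quotes or backslashes.

```shell
#!/bin/sh
# build_push_payload TIMESTAMP_NS LINE LABEL=VALUE...
# Prints a push request body: labels go into "stream", the log line and its
# nanosecond timestamp into "values". No JSON escaping is performed.
build_push_payload() (
  ts="$1"; line="$2"
  shift 2
  labels=""
  for pair in "$@"; do
    # ${pair%%=*} is the label name, ${pair#*=} the label value.
    labels="${labels}${labels:+,}\"${pair%%=*}\":\"${pair#*=}\""
  done
  printf '{"streams":[{"stream":{%s},"values":[["%s","%s"]]}]}\n' "$labels" "$ts" "$line"
)

# push_line LOKI_URL TIMESTAMP_NS LINE LABEL=VALUE...
# Sends one log line to Loki; requires curl and a reachable Loki instance.
push_line() (
  url="$1"
  shift
  curl -fsS -H 'Content-Type: application/json' -X POST \
    --data "$(build_push_payload "$@")" "$url/loki/api/v1/push"
)

# Example (not executed here):
#   push_line http://ha-node1:3100 "$(date +%s)000000000" 'hello from curl' job=manual-test
build_push_payload 1615433940000000000 'hello from curl' job=manual-test cluster=hdfs
```

Note how the payload mirrors the design described in the overview: only the labels identify the stream, and the log text travels alongside as opaque values.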
Installing Loki
A working setup needs both Loki and Promtail:
- Loki is the log processing engine
- Promtail ships logs to Loki
Download Promtail and Loki
https://github.com/grafana/loki/releases/
Download version v1.6.1.
Create the loki user
useradd loki
passwd loki
Upload and extract
[loki@ha-node1 ~]$ ll
total 16224
-rw-r--r-- 1 loki loki 16612734 Mar 11 11:39 loki-linux-amd64.zip
mkdir /opt/loki
unzip loki-linux-amd64.zip -d /opt/loki
The loki-local-config.yaml configuration file
vim /opt/loki/loki-local-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
ingester:
  lifecycler:
    address: ha-node1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to about 1MB (1048576 bytes), flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled
schema_config:
  configs:
    - from: 2020-03-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
limits_config:
  ingestion_rate_mb: 4000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
For the full configuration file reference, see:
https://grafana.com/docs/loki/latest/configuration/#configuration-file-reference
Running Loki
su loki
mkdir -p /opt/loki/logs
nohup /opt/loki/loki-linux-amd64 -config.file=/opt/loki/loki-local-config.yaml >> /opt/loki/logs/$(groups)-$(whoami)-loki-$(hostname).log 2>&1 &
Checking Loki
http://ha-node1:3100/metrics
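Besides opening the metrics page in a browser, startup can be checked from a script. The sketch below polls Loki's `/ready` endpoint, assuming it answers with HTTP 200 once Loki can accept traffic and with an error status before that. The function names `fetch` and `wait_for_loki`, the retry count, and the delay are illustrative choices.

```shell
#!/bin/sh
# fetch URL: thin wrapper so the HTTP client can be swapped out.
# curl -f makes HTTP error statuses (such as 503) return a non-zero exit code.
fetch() {
  curl -fsS --max-time 5 "$1"
}

# wait_for_loki BASE_URL [ATTEMPTS] [DELAY_SECONDS]
# Polls BASE_URL/ready until it succeeds or the attempts run out.
# The body runs in a subshell ( ... ) so its variables stay local.
wait_for_loki() (
  base_url="$1"; attempts="${2:-10}"; delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if fetch "$base_url/ready" >/dev/null 2>&1; then
      echo "Loki is ready (attempt $i)"
      exit 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Loki not ready after $attempts attempts" >&2
  exit 1
)

# Example (not executed here):
#   wait_for_loki http://ha-node1:3100
```

A freshly started instance may report not ready for a short while, so a few retries are usually more useful than a single request.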
Configuring log collection on the nodes
Create the log collection user
ssh ha-node1 "adduser -g hadoop promtail"; \
ssh ha-node2 "adduser -g hadoop promtail"; \
ssh ha-node3 "adduser -g hadoop promtail"; \
ssh ha-node4 "adduser -g hadoop promtail"; \
ssh ha-node5 "adduser -g hadoop promtail"
Upload and extract
su promtail
[promtail@ha-node1 ~]$ ll
total 18244
-rw-r--r-- 1 promtail hadoop 18679510 Mar 11 11:39 promtail-linux-amd64.zip
mkdir -p /opt/promtail
# Extract
unzip promtail-linux-amd64.zip -d /opt/promtail
Configuration
Below is a configuration template:
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
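Here, scrape_configs selects which log files Promtail collects; the collected logs are pushed to the URL configured under clients, http://localhost:3100/loki/api/v1/push. The fields are explained below: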
# Collect logs from different groups
- job_name: hadoop
  static_configs:
    # Optional
    - targets:
        - hdfs
      # Custom labels (job is a custom label; good choices are the environment name, job name, or application name)
      labels:
        cluster: hdfs
        service: namenode
        instance: hadoop1
        job: hadoop
        # Location of the logs to send to Loki
        __path__: "C:/Program Files/GrafanaLabs/grafana/data/log/grafana.log"
The configuration on the nodes is as follows:
vim /opt/promtail/promtail-local-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /opt/promtail/positions.yaml
clients:
  - url: http://ha-node1:3100/loki/api/v1/push
scrape_configs:
  - job_name: prometail-ha-node1
    static_configs:
      - targets:
          - localhost
        labels:
          cluster: hdfs
          service: namenode
          __path__: /opt/hadoop/logs/*-hdfs-namenode-*.log
      - targets: # hadoop(hdfs)
          - localhost
        labels:
          cluster: hdfs
          service: datanode
          __path__: /opt/hadoop/logs/*-hdfs-datanode-*.log
      - targets:
          - localhost
        labels:
          cluster: hdfs
          service: zkfc
          __path__: /opt/hadoop/logs/*-hdfs-zkfc-*.log
      - targets:
          - localhost
        labels:
          cluster: hdfs
          service: journalnode
          __path__: /opt/hadoop/logs/*-hdfs-journalnode-*.log
      - targets:
          - localhost
        labels:
          cluster: hdfs
          service: httpfs
          __path__: /opt/hadoop/logs/*-hdfs-httpfs-*.log
      - targets: # hadoop(yarn)
          - localhost
        labels:
          cluster: yarn
          service: historyserver
          __path__: /opt/hadoop/logs/*-yarn-historyserver-*.log
      - targets:
          - localhost
        labels:
          cluster: yarn
          service: resourcemanager
          __path__: /opt/hadoop/logs/*-yarn-resourcemanager-*.log
      - targets:
          - localhost
        labels:
          cluster: yarn
          service: nodemanager
          __path__: /opt/hadoop/logs/*-yarn-nodemanager-*.log
      - targets: # zookeeper
          - localhost
        labels:
          cluster: zookeeper
          service: zookeeper
          __path__: /opt/zookeeper/logs/*-zookeeper-server-*.out
      - targets: # hive
          - localhost
        labels:
          cluster: hive
          service: hive
          __path__: /opt/hive/logs/hive.log.*
      - targets: # hbase
          - localhost
        labels:
          cluster: hbase
          service: master
          __path__: /opt/hbase/logs/*-hbase-master-*.log
      - targets:
          - localhost
        labels:
          cluster: hbase
          service: regionserver
          __path__: /opt/hbase/logs/*-hbase-regionserver-*.log
      - targets: # spark
          - localhost
        labels:
          cluster: spark
          service: historyserver
          __path__: /opt/spark/logs/*-spark-org.apache.spark.deploy.history.HistoryServer-*.out
      - targets:
          - localhost
        labels:
          cluster: spark
          service: thriftserver
          __path__: /opt/spark/logs/*-spark-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2*.out
      - targets: # loki
          - localhost
        labels:
          cluster: loki
          service: loki
          __path__: /opt/loki/logs/*-loki-*.log
      - targets: # promtail
          - localhost
        labels:
          cluster: promtail
          service: promtail
          __path__: /opt/promtail/logs/*-promtail-*.log
Create the log directory
ssh ha-node1 "su - promtail -c 'mkdir -p /opt/promtail/logs'"
Distribute to all nodes
scp -r /opt/promtail ha-node2:/opt; \
scp -r /opt/promtail ha-node3:/opt; \
scp -r /opt/promtail ha-node4:/opt; \
scp -r /opt/promtail ha-node5:/opt
# Fix directory ownership
ssh ha-node2 "chown -R promtail:hadoop /opt/promtail"; \
ssh ha-node3 "chown -R promtail:hadoop /opt/promtail"; \
ssh ha-node4 "chown -R promtail:hadoop /opt/promtail"; \
ssh ha-node5 "chown -R promtail:hadoop /opt/promtail"
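The per-node commands in this section all follow the same pattern, so they can be generated with a loop. The sketch below is one way to do that; the `NODES` variable, the `on_each_node` function, and the `DRY_RUN` switch are hypothetical names introduced here, and the node list simply repeats the hosts used in this guide.

```shell
#!/bin/sh
# Node list from this guide; edit to match your cluster.
NODES="ha-node1 ha-node2 ha-node3 ha-node4 ha-node5"

# on_each_node COMMAND
# Runs COMMAND over ssh on every node in NODES.
# With DRY_RUN=1 the ssh commands are printed instead of executed.
on_each_node() (
  cmd="$1"
  for node in $NODES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'ssh %s "%s"\n' "$node" "$cmd"
    else
      # Keep going if one node fails, but report it.
      ssh "$node" "$cmd" || echo "failed on $node" >&2
    fi
  done
)

# Print what would run, without touching any node.
DRY_RUN=1 on_each_node 'chown -R promtail:hadoop /opt/promtail'
```

Running with `DRY_RUN=1` first makes it easy to review the generated commands before executing them on the whole cluster.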
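Configuration reference: https://grafana.com/docs/loki/latest/clients/promtail/configuration/
Handling special logs
The Spark Thrift Server and History Server logs need separate handling: delete the launch command on the first line of each file.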
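One way to remove that first line is sketched below. It assumes the launch command line starts with the text `Spark Command:`, which should be checked against your own files first; the function name `strip_launch_line` is hypothetical. The file is rewritten in place through a temporary copy so that its path and inode stay the same.

```shell
#!/bin/sh
# strip_launch_line FILE
# Deletes the first line of FILE, but only if it looks like the launch
# command (assumed prefix: "Spark Command:"). Other files are left untouched.
# The body runs in a subshell ( ... ) so its variables stay local.
strip_launch_line() (
  file="$1"
  if head -n 1 "$file" | grep -q '^Spark Command:'; then
    tmp="$(mktemp)"
    # Copy everything from line 2 onward, then write it back into FILE.
    tail -n +2 "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  fi
)

# Example (not executed here), using the Spark log directory from this guide:
#   for f in /opt/spark/logs/*-spark-*.out; do strip_launch_line "$f"; done
```

Because the function checks the first line before deleting it, running it twice on the same file is harmless.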
Starting Promtail
su promtail
nohup /opt/promtail/promtail-linux-amd64 -config.file=/opt/promtail/promtail-local-config.yaml -client.external-labels=platform=hadoop-ha,host=$(hostname) >> /opt/promtail/logs/$(groups)-$(whoami)-promtail-$(hostname).log 2>&1 &
View the Promtail web UI
Displaying logs in Grafana
Configure the Loki data source
Unified log querying
Use Grafana's Explore feature to query and view the logs directly.
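The same logs can also be queried without Grafana, through Loki's HTTP API. The sketch below builds a LogQL stream selector from the labels defined in the Promtail configuration and sends it to the `/loki/api/v1/query_range` endpoint. The helper names `logql_selector` and `query_logs` are hypothetical, and the label values are assumed to contain no double quotes.

```shell
#!/bin/sh
# logql_selector LABEL=VALUE...
# Prints a LogQL stream selector such as {cluster="hdfs",service="namenode"}.
logql_selector() (
  sel=""
  for pair in "$@"; do
    # ${pair%%=*} is the label name, ${pair#*=} the label value.
    sel="${sel}${sel:+,}${pair%%=*}=\"${pair#*=}\""
  done
  printf '{%s}\n' "$sel"
)

# query_logs LOKI_URL LIMIT LABEL=VALUE...
# Runs a range query; requires curl and a reachable Loki instance.
# curl -G sends the url-encoded parameters as a GET query string.
query_logs() (
  url="$1"; limit="$2"
  shift 2
  curl -fsS -G "$url/loki/api/v1/query_range" \
    --data-urlencode "query=$(logql_selector "$@")" \
    --data-urlencode "limit=$limit"
)

# Example (not executed here):
#   query_logs http://ha-node1:3100 20 cluster=hdfs service=namenode
logql_selector cluster=hdfs service=namenode
```

The printed selector can be pasted into the query field of Grafana's Explore view as well, since both go through the same LogQL syntax.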