A Log Analysis Solution with Azure Data Explorer


Many teams use Elasticsearch (ES) for day-to-day log analysis, but Azure does not offer a first-party managed ES PaaS service. Azure Data Explorer (ADX), however, is very well suited to log analysis; Azure's native Log Analytics service is itself built on top of Azure Data Explorer. Log Analytics integrates most smoothly with Azure-native services as data sources, so if you want the flexibility to build your own data sources, I recommend Azure Data Explorer. It lets you carry over your existing ELK stack: it natively supports integration with Logstash, so you can collect all kinds of log sources into Azure Data Explorer for analysis and querying. In addition, as a managed PaaS service, Azure Data Explorer frees you from building and operating the underlying infrastructure, and it can scale out smoothly to meet performance requirements.

This article uses the Nginx access log as an example to show how to build log collection and analysis with Filebeat + Logstash + Azure Data Explorer. The reference architecture is as follows:

[Architecture diagram: Nginx access log → Filebeat → Logstash → Azure Data Explorer]

This article uses Nginx 1.16.1. Since version 1.11.8, Nginx has natively supported JSON-escaped access logs (the escape=json parameter of log_format). Define the access log format by creating a file named nginx-log-json.conf with the following content:

log_format json escape=json '{ '
 '"remote_ip": "$remote_addr", '
 '"user_name": "$remote_user", '
 '"time": "$time_iso8601", '
 '"method": "$request_method", '
 '"nginxhostname": "$host", '
 '"url": "$request_uri", '
 '"http_protocol": "$server_protocol", '
 '"response_code": "$status", '
 '"bytes": "$body_bytes_sent", '
 '"referrer": "$http_referer", '
 '"user_agent": "$http_user_agent" '
'}';
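With this format in place, Nginx writes each request as a single JSON object per line. An entry in the resulting access log will look roughly like the following (all values are purely illustrative):

{ "remote_ip": "203.0.113.10", "user_name": "", "time": "2020-04-01T08:15:30+00:00", "method": "GET", "nginxhostname": "example.com", "url": "/index.html", "http_protocol": "HTTP/1.1", "response_code": "200", "bytes": "612", "referrer": "", "user_agent": "curl/7.58.0" }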

Reference this file from nginx.conf: include it in the http block and set the access log output path and format.

http {

        # Other Config Setting

        ##
        # Logging Settings
        ##

        include nginx-log-json.conf;
        access_log /var/log/nginx/access.json json;
        error_log /var/log/nginx/error.log;

        # Other Config Setting
}
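After saving both files, it is worth validating the configuration and reloading Nginx before wiring up the rest of the pipeline. A typical sequence on a systemd-based host (paths and service manager may differ in your environment):

# Check the configuration syntax first
sudo nginx -t
# Reload Nginx so the new log format takes effect
sudo systemctl reload nginx
# Confirm JSON entries are being written
tail -f /var/log/nginx/access.json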

Next, configure Filebeat to collect logs from the Nginx access log path and ship them to Logstash for transformation. The Filebeat configuration file is shown below. Set the path under the log input to your Nginx access log location and configure Logstash as the output. In this example Logstash runs on the same host as Nginx, so the Logstash address is localhost; if you deploy Logstash on a separate host, change it to the actual address.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/nginx/*.json
    #- c:\programdata\elasticsearch\logs\*

  tags: ["nginx", "json"]

  json:
    keys_under_root: true
    add_error_key: true
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after


#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

#setup.template.settings:
#  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
#setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
#  hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

# Configure processors to enhance or manipulate events generated by the beat.

#processors:
#  - add_host_metadata: ~
#  - add_cloud_metadata: ~

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== X-Pack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

#================================= Migration ==================================
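Before moving on to Logstash, you may want to validate the Filebeat configuration and its connection to the output. A minimal sketch, assuming a package-based install with the configuration at /etc/filebeat/filebeat.yml:

# Validate the configuration file
sudo filebeat test config -c /etc/filebeat/filebeat.yml
# Test the connection to the Logstash output (run this once Logstash is listening on 5044)
sudo filebeat test output -c /etc/filebeat/filebeat.yml
# Start Filebeat and enable it at boot
sudo systemctl enable --now filebeat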

Next, configure Logstash to transform the log events and send the results to Azure Data Explorer. A sample Logstash configuration is shown below. The input section reads events from Filebeat. In the filter section, grok extracts the HTTP version number from the http_protocol field of the Nginx access log, and the geoip and useragent plugins transform the corresponding fields of the raw Nginx log. The output section defines Azure Data Explorer as the destination; set the ingest_url, app_id, app_key, app_tenant, database, table, and mapping fields according to the Azure Data Explorer resources you created. The Azure Data Explorer output plugin is not bundled with Logstash by default; see the following document for installation instructions: https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-logstash

input {
    beats {
        port => "5044"
        codec => json
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
filter {
    if "nginx" in [tags] {
        # nginx doesn't log the http version, only the protocol.
        # i.e. HTTP/1.1, HTTP/2
        grok {
            match => {
                "[http_protocol]" => "HTTP/%{NUMBER:[http_version]}"
            }
        }
        geoip {
            source => "[remote_ip]"
            target => "[geoip]"
        }
        useragent {
            source => "[user_agent]"
            target => "user_agent_info"
        }

    }
}
output {
    kusto {
            path => "/tmp/kusto/%{+YYYY-MM-dd-HH-mm-ss}.txt"
            ingest_url => "https://ingest-xxx.westus2.kusto.windows.net/"
            app_id => "xxx"  # azure management application identity
            app_key => "xxx"  # azure management application identity password
            app_tenant => "xxx"  # azure tenant id
            database => "nginx"  # database name defined in ADX
            table => "nginxlogs" # table name defined in ADX 
            mapping => "basicmsg" # table mapping schema defined in ADX
    }
}
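As noted above, the Kusto output plugin must be installed before this pipeline will load. A minimal sketch, assuming a package-based Logstash installation under /usr/share/logstash and the pipeline saved as /etc/logstash/conf.d/nginx-adx.conf (both paths are assumptions):

# Install the Azure Data Explorer (Kusto) output plugin
sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-kusto
# Run Logstash with the pipeline configuration shown above
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/nginx-adx.conf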

Finally, configure Azure Data Explorer. Creating the cluster and database is not covered here; refer to the official documentation. This section focuses on creating the table and mapping referenced above: the table is where the Nginx access logs are ultimately stored in ADX, so its schema must be defined with the appropriate column types, and the mapping defines how the JSON log fields arriving from Logstash map onto the columns of that table.

// Create the table

.create table nginxlogs (remote_ip: string, username: string, accesstime: datetime, method: string, response_code: int, url: string, http_protocol: string, http_version: string, bodybyte: int, referrer: string, user_agent_info: dynamic, geoip: dynamic)

// Create the ingestion mapping

.create table nginxlogs ingestion json mapping 'basicmsg' '[{"column":"remote_ip","path":"$.remote_ip"},{"column":"username","path":"$.user_name"},{"column":"accesstime","path":"$.time"},{"column":"method","path":"$.method"},{"column":"response_code","path":"$.response_code"},{"column":"url","path":"$.url"},{"column":"http_protocol","path":"$.http_protocol"},{"column":"http_version","path":"$.http_version"},{"column":"bodybyte","path":"$.bytes"},{"column":"referrer","path":"$.referrer"},{"column":"user_agent_info","path":"$.user_agent_info"},{"column":"geoip","path":"$.geoip"}]'
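Both commands are executed in the ADX query window against the nginx database. To confirm the objects were created as expected, you can inspect them with the standard management commands:

// List the columns of the target table
.show table nginxlogs cslschema
// List the ingestion mappings defined on the table
.show table nginxlogs ingestion json mappings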

With the configuration complete, we can query the ingested logs with a simple KQL query in ADX:

nginxlogs
| sort by accesstime desc
| take 10
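As a quick sanity check on the enriched data, here is a slightly richer query (purely illustrative) that breaks requests down by status code:

nginxlogs
| summarize requests = count() by response_code
| order by requests desc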

 

At this point the log ingestion pipeline into the ADX log analysis engine is in place, and you are free to explore your data with the KQL query language that ADX provides. That's it for today; in the next blog post I will walk through a few simple KQL query examples.

 

