For a detailed walkthrough, see the official documentation: https://www.elastic.co/guide/en/beats/filebeat/current/configuring-howto-filebeat.html
The format of filebeat.yml is shown below; here we focus mainly on the configuration for log input.
filebeat.inputs:
- input_type: log
  paths:
    - /var/log/apache/httpd-*.log
  document_type: apache
- input_type: log
  paths:
    - /var/log/messages
    - /var/log/*.log
Filebeat Options
input_type: log
Specifies the input type.
paths
Basic globbing is supported; all Go glob patterns work, e.g. /var/log/*/*.log.
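For instance, a sketch of an input using such a glob (the path is hypothetical):
filebeat.inputs:
- input_type: log
  paths:
    - /var/log/*/*.log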
encoding
plain, latin1, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030, gbk, hz-gb-2312, euc-kr, euc-jp, iso-2022-jp, shift-jis, and so on
exclude_lines
Supports regular expressions. Drops the lines that match. If multiline is enabled, each multiline message is merged into a single line before the filter is applied.
include_lines
Supports regular expressions. include_lines is applied first, then exclude_lines.
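A minimal sketch combining both filters; the patterns are illustrative only:
filebeat.inputs:
- input_type: log
  paths: ["/var/log/app/*.log"]
  include_lines: ['^ERR', '^WARN']        # applied first: keep only these lines
  exclude_lines: ['^WARN deprecated']     # applied second: drop matches among them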
exclude_files
Supports regular expressions. Files that match are excluded from harvesting.
exclude_files: ['.gz$']
tags
Adds tags to each event's tags list, useful for filtering.
filebeat.inputs:
- paths: ["/var/log/app/*.json"]
  tags: ["json"]
fields
Optional fields for adding extra information to the output.
Values may be scalars, arrays, dictionaries, or any nested combination of these.
By default the fields are placed under a fields sub-dictionary in the output document.
filebeat.inputs:
- paths: ["/var/log/app/*.log"]
  fields:
    app_id: query_engine_12
fields_under_root
If set to true, the custom fields are stored at the top level of the output document.
If they conflict with fields added by Filebeat, the custom fields overwrite the others.
fields_under_root: true
fields:
  instance_id: i-10a64379
  region: us-east-1
ignore_older
Tells Filebeat to ignore log content modified before the given time span.
Before a file can be ignored, make sure it is no longer being read: ignore_older must be set to a duration greater than close_inactive.
If a file is being harvested when it falls under ignore_older, it is still read until close_inactive closes it; only then is the file ignored.
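A sketch honouring the constraint just described, with ignore_older well above close_inactive (the durations are arbitrary examples):
- input_type: log
  paths: ["/var/log/app/*.log"]
  close_inactive: 5m
  ignore_older: 24h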
close_*
The close_* options are used to close the harvester after a certain criterion or time has been met. Closing the harvester means closing the file handler. If a file is updated after the harvester is closed, it is picked up again after scan_frequency has elapsed. However, if the file is moved or deleted while the harvester is closed, Filebeat will not be able to pick it up again, and any data the harvester has not yet read is lost.
close_inactive
When enabled, Filebeat closes the file handle if the file has not been read for the specified duration.
The last log line read defines the starting point for the countdown, not the file's modification time.
If a closed file changes again, a new harvester is started after the next scan_frequency pass.
It is recommended to use a value larger than the interval at which your log files are updated; configure multiple prospectors for log files with different update rates.
Filebeat uses an internal timestamp to track reading; the countdown restarts each time the last available line is read.
Specify the value in duration format, e.g. 2h or 5m.
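A minimal sketch in that duration format, assuming a log file that gains new lines every few minutes:
close_inactive: 10m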
recursive_glob.enabled
Enables recursive matching of log files; default false.
close_rename
When enabled, Filebeat closes the file handler when the file is renamed or moved.
close_removed
When enabled, Filebeat closes the file handler when the file is removed.
If you enable this option, clean_removed must be enabled as well.
close_eof
Suited to files that are written once and never updated; Filebeat closes the file handler after reaching end of file.
close_timeout
When enabled, Filebeat gives each harvester a predefined lifetime; once it is reached, the handler is closed regardless of whether the file is still being read.
Do not set close_timeout equal to ignore_older, or updates to the file may never be read.
If the output has not emitted any log events, this timeout is not triggered; at least one more event must be sent before the harvester is closed.
A value of 0 disables the option.
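A sketch bounding each harvester's lifetime while keeping the value distinct from ignore_older, as required above (values are arbitrary):
close_timeout: 30m
ignore_older: 24h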
clean_inactive
Removes the state of previously harvested files from the registry file.
The setting must be greater than ignore_older + scan_frequency, to ensure no state is removed while a file is still being collected.
This option helps keep the registry file small, which matters especially when a large number of new files are generated every day.
It can also be used to work around the Filebeat issue caused by inode reuse on Linux.
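A sketch respecting the constraint clean_inactive > ignore_older + scan_frequency (values are illustrative):
scan_frequency: 10s
ignore_older: 48h
clean_inactive: 72h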
clean_removed
When enabled, Filebeat cleans a file's state from the registry if the file can no longer be found on disk.
If close_removed is disabled, clean_removed must be disabled as well.
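Because the two options must be toggled together, a sketch disabling both, e.g. for files that are briefly removed and recreated (an assumed scenario):
close_removed: false
clean_removed: false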
scan_frequency
How often the prospector checks the configured paths for new files to harvest; default 10s.
document_type
The event type, used to set the type field of the output document; default is log.
harvester_buffer_size
The buffer size in bytes each harvester uses when reading a file; default 16384.
max_bytes
The maximum number of bytes a single log message may contain; particularly useful for multiline log messages.
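A sketch of the two sizing options together; 16384 is the default mentioned above, and the 10 MB cap is an illustrative value:
harvester_buffer_size: 16384
max_bytes: 10485760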
json
These options let Filebeat decode logs structured as JSON messages.
Decoding is done line by line, so it expects one JSON object per line.
keys_under_root
Places the decoded JSON keys at the top level of the output document.
overwrite_keys
Lets the decoded JSON values overwrite conflicting fields that Filebeat would otherwise add.
add_error_key
Adds a json_error key to the event when JSON decoding fails.
message_key
Specifies the JSON key to apply filtering and multiline settings to; the value associated with this key must be a string.
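A sketch enabling the four JSON options above, assuming one JSON object per line whose string field message carries the log line (the key name is an assumption):
json.keys_under_root: true
json.overwrite_keys: true
json.add_error_key: true
json.message_key: message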
multiline
Options controlling how Filebeat handles log messages that span multiple lines; these typically occur in Java stack traces.
multiline.pattern: '^\['
multiline.negate: true
multiline.match: after
The configuration above merges every line that does not start with [ into the preceding line, so a multiline log entry like the one below becomes a single event:
Exception in thread "main" java.lang.NullPointerException
    at com.example.myproject.Book.getTitle(Book.java:16)
    at com.example.myproject.Author.getBookTitles(Author.java:25)
    at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
multiline.pattern
Specifies the regular expression to match. Note that the regexp patterns Filebeat supports differ somewhat from those Logstash supports.
multiline.negate
Defines whether the pattern match above is negated; default is false.
For example, with the pattern '^b' and the default negate: false, consecutive lines that do start with b are treated as continuations and merged according to multiline.match.
With negate: true, it is the consecutive lines that do not start with b that are merged instead.
multiline.match
Specifies how Filebeat combines matching lines into an event: before or after, in combination with the negate setting above.
multiline.max_lines
The maximum number of lines that can be combined into one event; any further lines are discarded. Default 500.
multiline.timeout
Defines the timeout: if no new matching line is found within this time after an event starts, the multiline event is sent anyway. Default 5s.
tail_files
If set to true, Filebeat starts reading new files at their end rather than their beginning.
This option only applies to files Filebeat has not processed before.
symlinks
The symlinks option allows Filebeat to harvest symlinks in addition to regular files. When harvesting a symlink, Filebeat opens and reads the underlying file, even though it reports the symlink's path.
backoff
The backoff option specifies how aggressively Filebeat crawls updated files for new content; default 1s.
backoff defines how long Filebeat waits before checking a file again after EOF is reached.
max_backoff
The maximum time Filebeat waits before checking a file again after reaching EOF.
backoff_factor
The factor by which the backoff wait time is multiplied on each attempt, up to max_backoff; default 2.
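A sketch of the three backoff settings; with these illustrative values Filebeat waits 1s, 2s, 4s, 8s, then 10s between checks after EOF:
backoff: 1s
max_backoff: 10s
backoff_factor: 2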
harvester_limit
Limits the number of harvesters that one prospector starts in parallel, which directly caps the number of open files.
enabled
Enables or disables the prospector.
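A sketch combining the two per-prospector options; the limit of 100 concurrent harvesters is an arbitrary example:
- input_type: log
  enabled: true
  paths: ["/var/log/busy-app/*.log"]
  harvester_limit: 100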
Filebeat Global
spool_size
The event count threshold for the spooler; once exceeded, a flush to the network connection is forced.
filebeat.spool_size: 2048
publish_async
Sends events asynchronously; an experimental feature.
idle_timeout
The flush timeout: the spooler is flushed to the network connection after this time even if spool_size has not been reached.
filebeat.idle_timeout: 5s
registry_file
The name of the registry file. A relative path is taken as relative to the data path.
See the directory layout section for details. The default is ${path.data}/registry.
filebeat.registry_file: registry
config_dir
The full path to the directory that contains additional prospector configuration files.
Each configuration file must end with .yml.
Each configuration file must also specify the full Filebeat configuration hierarchy, even though only the prospector part of the file is processed.
All global options (such as spool_size) are ignored.
The path must be absolute.
filebeat.config_dir: path/to/configs
shutdown_timeout
How long Filebeat waits for the publisher to finish sending events before Filebeat shuts down.
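A minimal example; 5s is an arbitrary grace period (by default the option is 0, i.e. Filebeat does not wait):
filebeat.shutdown_timeout: 5s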
Filebeat General
name
Sets the shipper name; if left empty, the server's hostname is used.
name: "my-shipper"
queue_size
The length of the internal queue for single events; default 1000.
bulk_queue_size
The length of the internal queue for bulk events.
max_procs
Sets the maximum number of CPUs that can be used.
geoip.paths
This configuration option is currently used only by Packetbeat and will be removed in version 6.0.
For GeoIP support to function correctly, the GeoLite City database is required.
geoip:
  paths:
    - "/usr/share/GeoIP/GeoLiteCity.dat"
    - "/usr/local/var/GeoIP/GeoLiteCity.dat"
Filebeat Reload
A beta feature.
path
Defines the configuration path to check.
reload.enabled
When set to true, enables dynamic configuration reloading.
reload.period
Defines the interval at which the path is checked for changes.
filebeat.config.inputs:
  path: configs/*.yml
  reload.enabled: true
  reload.period: 10s
A typical full configuration:
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# Input types include log (logs under a given path), stdin, redis, udp, docker, tcp,
# and syslog; several inputs (including several of the same type) can be configured at once.
# Details for each type:
# https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # Specify the logs to monitor; these can be concrete files or directories.
  paths:
    #- /var/log/*.log (the default; change as needed)
    - /usr/local/tomcat/logs/catalina.out

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list (all lines by default).
  # include_lines runs before exclude_lines.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering,
  # e.g. "level: debug", which makes it easy to group logs later.
  # By default, new fields are placed under a fields sub-dictionary,
  # e.g. fields.level, so the ES document gains "fields": {"level": "debug"}.
  #fields:
  #  level: debug
  #  review: 1
  #  module: mock

  ### Multiline options

  # Log entries often logically span multiple lines, which is what the
  # multiline options handle.

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # e.g. lines starting with whitespace (^[[:space:]]) or with [ (^\[). For the
  # supported regexp syntax, see:
  # https://www.elastic.co/guide/en/beats/filebeat/current/regexp-support.html
  multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash.
  # Works together with pattern and negate:
  # -----------------------------------------------------------------------------------------
  # | pattern matches | negate | match  | result                                            |
  # -----------------------------------------------------------------------------------------
  # | true            | true   | before | the matching line ends the entry; it is merged    |
  # |                 |        |        | with the preceding non-matching lines             |
  # | true            | true   | after  | the matching line starts the entry; it is merged  |
  # |                 |        |        | with the following non-matching lines             |
  # | true            | false  | before | matching lines are merged with the following      |
  # |                 |        |        | non-matching line                                 |
  # | true            | false  | after  | matching lines are merged with the preceding      |
  # |                 |        |        | non-matching line                                 |
  # -----------------------------------------------------------------------------------------
  multiline.match: after

  # Specifies a regular expression, in which the current multiline will be flushed from memory,
  # ending the multiline-message.
  #multiline.flush_pattern

  # The maximum number of lines that can be combined into one event.
  # If the multiline message contains more than max_lines, any additional lines are discarded.
  # The default is 500.
  #multiline.max_lines: 500

  # After the specified timeout, Filebeat sends the multiline event even if no new pattern
  # is found to start a new event. The default is 5s.
  #multiline.timeout: 5

#============================= Filebeat modules ===============================

# Pull in the Filebeat module configurations.
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  # Number of primary shards
  index.number_of_shards: 3
  # Number of replica shards
  #index.number_of_replicas: 1
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
# If empty, the server's hostname is used.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

#============================== Dashboards ====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana ========================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Index name
  #index: "filebeat-%{[beat.version]}-%{+yyyy.MM.dd}"

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors ==================================

# Configure processors to enhance or manipulate events generated by the beat.
processors:
  # Host metadata
  - add_host_metadata: ~
  # Cloud provider metadata (Alibaba Cloud ECS, Tencent QCloud, AWS EC2, etc.)
  - add_cloud_metadata: ~
  # Kubernetes metadata
  #- add_kubernetes_metadata: ~
  # Docker metadata
  #- add_docker_metadata: ~
  # Metadata about the process that produced the event
  #- add_process_metadata: ~

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ==============================

# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch: