Scribe Log Collection Tool

Reposted article. 2015-02-04 14:22. Tags: open source, servers

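Overview

Scribe is a log collection system open-sourced by Facebook and used heavily inside Facebook. It collects logs from many kinds of log sources and stores them in a central storage system (NFS, a distributed file system, and so on) for centralized statistics and analysis. It offers a scalable, fault-tolerant way to collect logs in a distributed fashion and process them in one place. When the network or a machine in the central storage system fails, Scribe spills logs to local disk or to another location; once the central storage system recovers, Scribe retransmits the spilled logs to it.
Documentation on Scribe is sparse and is mostly limited to its project page (see the references). Installation is also fairly involved; see "Introduction to Installing the Scribe Log Collection System" in the references.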

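Architecture

Scribe collects data from a variety of data sources, places it on a shared queue, and then pushes it to the central storage system at the back end. When the central storage system fails, Scribe temporarily writes logs to local files; once the central storage system recovers, Scribe resumes transferring the local logs to it.

Note that every data source must send data to Scribe over Thrift (each record consists of a category and a message). The number of Thrift threads listening on the port is configurable in Scribe (3 by default). At the back end, Scribe can store data of different categories in different directories so that they can be processed separately. Back-end storage is provided by several kinds of store: file (files), buffer (two-tier storage with a primary store and a secondary store), network (another Scribe server), bucket (contains multiple stores and hashes data across them), null (discards data), thriftfile (writes to a Thrift TFileTransport file), and multi (writes data to several stores at once).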

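Global Scribe configuration

Global option	Default	Description
port	0	Listening port
max_msg_per_second	10000	Maximum number of messages handled per second
max_queue_size	5000000	Size of the message queue
check_interval	5s	How often each store is checked
new_thread_per_category	yes	If yes, a thread is created to handle each category
num_thrift_server_threads	3	Number of Thrift server threads listening for incoming messages

For example: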

port=1463
max_msg_per_second=2000000
max_queue_size=10000000
check_interval=3

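A store can select the categories it handles in three ways:

Default store: handles any category that no other store matches. Option: category=default
Prefix store: handles every category that starts with the given prefix. Option: category=web*
Multiple categories: one store handles several categories. Option: categories=rock paper* scissors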
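
The three matching rules can be sketched in Python as follows. This is only an illustration, not Scribe's code: the function name select_stores, the store names in the usage note, and the precedence order (exact match, then prefix match, then default) are assumptions made for the example.

```python
def select_stores(category, stores):
    """Return the names of the stores that should handle `category`.

    `stores` maps a (hypothetical) store name to the value of its
    category= or categories= option, e.g. "web*" or "rock paper* scissors".
    """
    exact, prefix, default = [], [], []

    def add(matches, name):
        # Avoid listing a store twice when several of its patterns match.
        if name not in matches:
            matches.append(name)

    for name, spec in stores.items():
        for pattern in spec.split():
            if pattern == "default":
                add(default, name)
            elif pattern.endswith("*"):
                if category.startswith(pattern[:-1]):
                    add(prefix, name)
            elif pattern == category:
                add(exact, name)

    # Assumed precedence: exact match, then prefix match, then default store.
    return exact or prefix or default
```

With stores such as {"fallback": "default", "web": "web*", "games": "rock paper* scissors"}, a category named web_access would go to the web store, paper_plane to the games store, and any unmatched category to the fallback store.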

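Store configuration

Store option	Default	Description
category	default	Which messages this store handles. Values: default, a category name, or a prefix followed by *
type		Store type. Values: file, buffer, network, bucket, thriftfile, null, multi
max_write_interval	1s	Maximum time messages wait in the queue before being processed
target_write_size	16K	Messages are processed once the queue grows beyond this size
max_batch_size	1MB	Amount of data processed at a time
must_succeed	yes	Whether messages that fail to be processed are put back on the queue for retry; if no, they are dropped

For example: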

<store>
category=statistics
type=file
target_write_size=20480
max_write_interval=2
</store>
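
The configuration format used in the examples above (top-level key=value lines plus <store> blocks) is simple enough to read with a few lines of Python. The sketch below is not Scribe's own parser; the function name parse_scribe_conf and the treatment of # as a comment marker are assumptions made for illustration.

```python
import re


def parse_scribe_conf(text):
    """Parse Scribe-style config text into (global_options, stores).

    Top-level key=value pairs go into `global_options`. Each top-level
    <store> block becomes a dict; nested blocks such as <primary> are
    stored in their parent dict under the tag name.
    """
    global_options = {}
    stores = []
    stack = [global_options]          # dicts of the blocks currently open

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # assumed: '#' starts a comment
        if not line:
            continue
        opening = re.fullmatch(r"<(\w+)>", line)
        closing = re.fullmatch(r"</(\w+)>", line)
        if closing:
            stack.pop()
        elif opening:
            block = {}
            if len(stack) == 1 and opening.group(1) == "store":
                stores.append(block)
            else:
                stack[-1][opening.group(1)] = block
            stack.append(block)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()

    return global_options, stores
```

All values are kept as strings; converting options such as port or max_queue_size to integers is left to the caller.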

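The store types are described below.

file

Writes logs to a file or to NFS. Two file system types are currently supported, std and hdfs, for ordinary local files and HDFS respectively. The available options are explained after the example.

For example: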

<store>
category=sprockets
type=file
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
add_newlines=1
rotate_period=daily
rotate_hour=0
rotate_minute=10
max_write_size=4096
</store>

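Option reference:

file store option	Default	Description
file_path	/tmp	Directory where files are saved
base_filename	category name	Base name of the files
use_hostname_sub_directory	no	If yes, a subdirectory named after the hostname is created
sub_directory		A subdirectory with the given name is created
rotate_period	never	How often a new file is created: a keyword such as daily, or a number with the suffix s, m, h, d, or w (seconds, minutes, hours, days, weeks)
rotate_hour	1	Hour of rotation when rotate_period is daily. Range: 0-23
rotate_minute	15	Minute of rotation when rotate_period is daily or hourly. Range: 0-59
max_size	1GB	The file is rotated when it exceeds this size
write_meta	FALSE	On rotation, the last line of the file holds the name of the next file
fs_type	std	Values: "std" and "hdfs"
chunk_size	0	Chunk size; a message that fits within one chunk should not be stored across chunks
add_newlines	0	If 1, a newline is appended to every message
create_symlink	yes	Creates a symlink that points to the file currently being written
write_stats	yes	Creates a stats file that records writes for each store
max_write_size	1MB	Buffer size; the buffer is flushed when it exceeds this value. Must not exceed max_size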
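
The chunk_size rule (a message that fits in one chunk should not straddle two chunks) implies padding the file up to the next chunk boundary when the next message would not fit in the space left. The following Python sketch shows that arithmetic under this assumption; bytes_to_pad is a name chosen here for illustration, and the code is not taken from Scribe.

```python
def bytes_to_pad(message_len, file_size, chunk_size):
    """Return how many padding bytes to write before the next message.

    message_len: size of the next message in bytes
    file_size:   bytes already written to the current file
    chunk_size:  value of the chunk_size option (0 disables chunking)
    """
    if chunk_size <= 0 or message_len > chunk_size:
        # Chunking is disabled, or the message cannot fit in any single
        # chunk, so padding would not help.
        return 0
    space_left = chunk_size - (file_size % chunk_size)
    # Pad to the chunk boundary only if the message would cross it.
    return space_left if message_len > space_left else 0
```

For instance, with chunk_size=1000 and 950 bytes already written, a 100-byte message would be preceded by 50 bytes of padding, while a 50-byte message would be written directly.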

network

A network store forwards messages to another Scribe server. Scribe keeps a persistent connection open and forwards messages in batches.

For example:

<store>
category=default
type=network
remote_host=hal
remote_port=1465
</store>

Option reference:

network store option	Default	Description
remote_host		Remote host address
remote_port		Remote host port
timeout	5000ms	Socket timeout
use_conn_pool	FALSE	Whether to use a connection pool

buffer

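A buffer store has two child stores, "primary" and "secondary". Logs are written to the secondary store (which must be a file store or a null store) only when the primary store is unavailable. When the primary store comes back, the buffered data is replayed from the secondary store to the primary (unless replay_buffer=no).

For example: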

<store>
category=default
type=buffer
buffer_send_rate=1
retry_interval=30
retry_interval_range=10
  <primary>
    type=network
    remote_host=wopr
    remote_port=1456
  </primary>
  <secondary>
    type=file
    file_path=/tmp
    base_filename=thisisoverwritten
    max_size=10000000
  </secondary>
</store>

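Option reference:

buffer store option	Default	Description
buffer_send_rate	1	How many times per check_interval a batch of messages is read from the secondary and sent to the primary
retry_interval	300s	How long to wait before retrying after a write to the primary fails
retry_interval_range	60s	Width of the window around retry_interval from which the actual retry wait is picked at random
replay_buffer	yes	Whether messages in the secondary are replayed to the primary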
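
The failover and replay behaviour of a buffer store can be modelled with a small Python class. This is a toy model under stated assumptions, not Scribe's implementation: the class name BufferSketch, the in-memory list standing in for the secondary file store, and the choice to centre the random retry wait on retry_interval are all illustrative, and buffer_send_rate is not modelled.

```python
import random


class BufferSketch:
    """Toy model of a buffer store: write to primary, spill to secondary."""

    def __init__(self, send_primary, retry_interval=300,
                 retry_interval_range=60, replay_buffer=True, rng=random):
        self.send_primary = send_primary   # callable(msg) -> True on success
        self.secondary = []                # stands in for the secondary store
        self.retry_interval = retry_interval
        self.retry_interval_range = retry_interval_range
        self.replay_buffer = replay_buffer
        self.rng = rng
        self.connected = True
        self.next_retry = 0.0

    def _schedule_retry(self, now):
        # Assumed jitter: a wait centred on retry_interval and spread
        # evenly over retry_interval_range.
        half = self.retry_interval_range / 2
        self.next_retry = now + self.retry_interval + self.rng.uniform(-half, half)

    def handle(self, msg, now):
        """Deliver one message, buffering it if the primary is down."""
        if self.connected and self.send_primary(msg):
            return "primary"
        if self.connected:
            self.connected = False
            self._schedule_retry(now)
        self.secondary.append(msg)
        return "secondary"

    def periodic_check(self, now):
        """Retry the primary when due; return how many messages were replayed."""
        if self.connected or now < self.next_retry:
            return 0
        sent = 0
        if self.replay_buffer:
            while self.secondary:
                if not self.send_primary(self.secondary[0]):
                    self._schedule_retry(now)
                    return sent            # primary still down; keep the rest
                self.secondary.pop(0)
                sent += 1
        self.connected = True
        return sent
```

Randomizing the retry time keeps many Scribe servers that lost the same primary from reconnecting at the same instant.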

null

Discards messages of the given category.

For example:

<store>
category=tps_report*
type=null
</store>

bucket

A bucket store uses the prefix of each message as a key and hashes messages into multiple files.

For example:

<store>
category=bucket_me
type=bucket
num_buckets=2
bucket_type=key_hash
  <bucket0>
    type=file
    fs_type=std
    file_path=/tmp/scribetest/bucket0
    base_filename=bucket0
  </bucket0>
  <bucket1>
    ...
  </bucket1>
  <bucket2>
    ...
  </bucket2>
</store>

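Option reference:

bucket store option	Default	Description
num_buckets	1	Number of buckets to hash into
bucket_type		Values: key_hash, key_modulo, random
delimiter	:	Delimiter that marks the end of the key prefix
remove_key	no	Whether to strip the key prefix from each message
bucket_subdir		Name of the subdirectory for each bucket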
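
How a message might be mapped to a bucket for each bucket_type can be sketched in Python. Several points here are assumptions rather than facts from this article: bucket 0 is treated as a fallback for messages without a usable key (which would be consistent with the example above defining bucket0 through bucket2 for num_buckets=2), non-numeric keys under key_modulo also fall back to bucket 0, and CRC32 merely stands in for whatever hash function Scribe uses. The function name bucketize is chosen for illustration.

```python
import random
import zlib


def bucketize(message, num_buckets, bucket_type="key_hash", delimiter=":"):
    """Return the bucket number for `message` (0 is the assumed fallback)."""
    if bucket_type == "random":
        return random.randrange(num_buckets) + 1

    # The key is everything before the first delimiter.
    key, found, _ = message.partition(delimiter)
    if not found or not key:
        return 0                      # no usable key: fallback bucket

    if bucket_type == "key_modulo":
        if not (key.isascii() and key.isdigit()):
            return 0                  # assumption: non-numeric keys fall back
        return int(key) % num_buckets + 1

    # key_hash: CRC32 is a stand-in for Scribe's internal hash function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets + 1
```

Because the same key always hashes to the same bucket, all messages that share a prefix end up in the same file, which is what makes per-key processing of the output possible.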

multi

A multi store forwards each message to all of its child stores (store0, store1, store2, ...).

For example:

<store>
category=default
type=multi
target_write_size=20480
max_write_interval=1
  <store0>
    type=file
    file_path=/tmp/store0
  </store0>
  <store1>
    type=file
    file_path=/tmp/store1
  </store1>
</store>

thriftfile

A thriftfile store is similar to a file store, except that it writes messages to a Thrift TFileTransport file.

For example:

<store>
category=sprockets
type=thriftfile
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
flush_frequency_ms=2000
</store>

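Option reference:

thriftfile store option	Default	Description
file_path	/tmp	Directory where files are saved
base_filename	category name	Base name of the files
rotate_period	never	How often a new file is created: a keyword such as daily, or a number with the suffix s, m, h, d, or w (seconds, minutes, hours, days, weeks)
rotate_hour	1	Hour of rotation when rotate_period is daily. Range: 0-23
rotate_minute	15	Minute of rotation when rotate_period is daily or hourly. Range: 0-59
max_size	1GB	The file is rotated when it exceeds this size
fs_type	std	Values: "std" and "hdfs"
chunk_size	0	Chunk size; a message that fits within one chunk should not be stored across chunks
create_symlink	yes	Creates a symlink that points to the file currently being written
flush_frequency_ms	3000ms	How often the Thrift file is synced to disk
msg_buffer_size	0	If non-zero, writes larger than this value are rejected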

References:

http://dongxicheng.org/search-engine/scribe-installation/

http://dongxicheng.org/search-engine/scribe-intro/

http://blog.octo.com/en/scribe-a-way-to-aggregate-data-and-why-not-to-directly-fill-the-hdfs/

https://github.com/facebookarchive/scribe/wiki/Scribe-Configuration
