Flume Overview, Installation, and Deployment

Reposted article, 2021-02-06 10:53

1. Flume Overview

1.1 Flume Definition

Official website: http://flume.apache.org/

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

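Flume was originally developed by Cloudera and is now an Apache project. It is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive amounts of log data. Flume is built on a streaming architecture and is simple and flexible.

Flume's most common job is to read data from a server's local disk in real time and write it to HDFS.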

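1.2 Flume Basic Architecture

1) Agent

An Agent is a JVM process that moves data from its source to its destination in the form of events.

An Agent consists of three main components: Source, Channel, and Sink.

2) Source

The Source is the component that receives data into the Flume Agent. Sources can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy.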

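3) Sink

The Sink continuously polls the Channel for events, removes them in batches, and either writes those batches to a storage or indexing system or sends them to another Flume Agent. Sink destinations include hdfs, logger, avro, thrift, ipc, file, HBase, solr, and custom sinks.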

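4) Channel

The Channel is a buffer between the Source and the Sink, which lets the two operate at different rates. The Channel is thread-safe and can handle writes from several Sources and reads from several Sinks at the same time.

Flume ships with several Channels, including the Memory Channel, the File Channel, and the Kafka Channel. The Memory Channel is an in-memory queue and is suitable when data loss is not a concern. When data loss matters, the Memory Channel should not be used, because a process crash, a machine failure, or a restart loses whatever is still in the queue. The File Channel writes all events to disk, so no data is lost when the process exits or the machine goes down.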
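
Where losing events is not acceptable, the memory channel in an agent definition can be swapped for a file channel. The following is a minimal sketch, not a tuned configuration. The agent name a1, the channel name c1, the scratch file name, and both directories under /opt/module/flume/data are assumptions chosen for illustration; checkpointDir and dataDirs are the file channel's directory properties.

```shell
# Write a hypothetical channel definition to a scratch file.
# The file name and both directories below are illustrative assumptions.
conf_file="${TMPDIR:-/tmp}/flume-file-channel-sketch.conf"

cat > "$conf_file" <<'EOF'
# Durable channel: events are persisted to disk instead of held in memory
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/module/flume/data/checkpoint
a1.channels.c1.dataDirs = /opt/module/flume/data/filechannel
EOF

# Show which channel type the sketch configures
grep '\.type' "$conf_file"
```

The trade-off is throughput: every event goes through the disk, so a file channel is generally slower than a memory channel.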

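5) Event

The Event is Flume's basic unit of data transfer; data travels from source to destination as Events. An Event consists of a Header and a Body. The Header stores attributes of the event as key-value pairs, and the Body stores the data itself as a byte array.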
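
To make the Header/Body split concrete, the sketch below writes one event in the JSON shape that Flume's HTTP source accepts with its default JSON handler (an array of objects, each with headers and body), then prints the body as the bytes it becomes. The header keys host and type and the body text hello are made-up sample values.

```shell
# Sample values only: the header keys and the body text are assumptions.
body='hello'
event_json='[{"headers": {"host": "hadoop102", "type": "app-log"}, "body": "'"$body"'"}]'

# Header: key-value pairs describing the event; Body: the payload itself
printf '%s\n' "$event_json"

# The body travels as a byte array; show those bytes in hex
body_hex=$(printf '%s' "$body" | od -An -tx1 | tr -d ' \n')
printf 'body bytes: %s\n' "$body_hex"   # prints "body bytes: 68656c6c6f"
```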

2. Flume Installation and Deployment

2.1 Download Locations

1) Flume official website

http://flume.apache.org/

2) Documentation

http://flume.apache.org/FlumeUserGuide.html

3) Downloads

http://archive.apache.org/dist/flume/

2.2 Installation and Deployment

1) Upload apache-flume-1.7.0-bin.tar.gz to the /opt/software directory on the Linux host

2) Extract apache-flume-1.7.0-bin.tar.gz into /opt/module/
[hadoop@hadoop102 software]$ tar -zxf apache-flume-1.7.0-bin.tar.gz -C /opt/module/

3) Rename apache-flume-1.7.0-bin to flume
[hadoop@hadoop102 module]$ mv apache-flume-1.7.0-bin flume

4) Rename flume-env.sh.template under flume/conf to flume-env.sh, then configure flume-env.sh
[hadoop@hadoop102 conf]$ mv flume-env.sh.template flume-env.sh
[hadoop@hadoop102 conf]$ vi flume-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144

5) Configure environment variables

[hadoop@hadoop102 ~]$ sudo vim /etc/profile
#FLUME
export FLUME_HOME=/opt/module/flume
export PATH=$PATH:$FLUME_HOME/bin

[hadoop@hadoop102 ~]$ source /etc/profile

6) Check the version

[hadoop@hadoop102 ~]$ flume-ng version
Flume 1.7.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
From source with checksum 0d21b3ffdc55a07e1d08875872c00523

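3. Flume Getting-Started Example

3.1 Official Example: Monitoring Port Data

3.1.1 Requirements

Use Flume to listen on a port, collect the data arriving on that port, and print it to the console.

3.1.2 Requirements Analysis

3.1.3 Implementation Steps

1) Install the netcat tool and check whether port 44444 is already in use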

[hadoop@hadoop102 ~]$ sudo yum install -y nc
[hadoop@hadoop102 ~]$ sudo netstat -tunlp | grep 44444
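
On hosts where netstat is not installed, the same check can be approximated by trying to open the port. This is a sketch that assumes bash, since it relies on bash's /dev/tcp redirection; the function name check_port is hypothetical.

```shell
# check_port PORT: print "in use" if something accepts TCP connections
# on localhost:PORT, otherwise "free". Requires bash for /dev/tcp.
check_port() {
    if (exec 3<>"/dev/tcp/localhost/$1") 2>/dev/null; then
        echo "in use"
    else
        echo "free"
    fi
}

check_port 44444
```

Unlike netstat, this only tells whether a listener answers on localhost; it does not show which process owns the port.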

2) Create the Flume Agent configuration file flume-netcat-logger.conf

[hadoop@hadoop102 flume]$ mkdir job && cd job
[hadoop@hadoop102 job]$ vim flume-netcat-logger.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Configuration reference and source: http://flume.apache.org/releases/content/1.7.0/FlumeUserGuide.html

3) Start Flume listening on the port

[hadoop@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console
# or
[hadoop@hadoop102 flume]$ bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

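Parameters:

--conf/-c: the configuration files are stored in the conf/ directory
--name/-n: the agent to start is named a1
--conf-file/-f: the configuration file Flume reads for this run is flume-netcat-logger.conf in the job folder
-Dflume.root.logger=INFO,console: -D overrides the flume.root.logger property at runtime, here setting the log level to INFO and sending log output to the console. Log levels include debug, info, warn, and error.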
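
The name passed with --name has to match the prefix used inside the configuration file (a1 in this example), so a quick pre-flight check can catch a mismatch before the agent starts. This is a sketch with a hypothetical helper, agent_defined; it only looks for a <name>.sources line and does not validate the rest of the file.

```shell
# agent_defined CONF NAME: succeed if CONF declares sources for agent NAME.
# Hypothetical helper; it checks a single line, not the whole configuration.
agent_defined() {
    grep -Eq "^[[:space:]]*$2\.sources[[:space:]]*=" "$1"
}

conf=job/flume-netcat-logger.conf
name=a1
if [ -f "$conf" ] && agent_defined "$conf" "$name"; then
    echo "agent $name is defined in $conf"
else
    echo "agent $name not found in $conf" >&2
fi
```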

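4) Use netcat to send content to port 44444 on the local machine, and watch the incoming data in the terminal where Flume is listening

Note: nc hadoop102 44444 does not work here, because the source is bound to localhost.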
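
Step 4 can be sketched as follows, assuming the agent from step 3 is still running in another terminal. The helper name send_to_flume and the sample text are made up; the host and port mirror the bind and port values in flume-netcat-logger.conf.

```shell
# Host and port must match a1.sources.r1.bind and a1.sources.r1.port
FLUME_HOST=localhost
FLUME_PORT=44444

# send_to_flume TEXT: send one line to the netcat source (hypothetical helper)
send_to_flume() {
    printf '%s\n' "$1" | nc "$FLUME_HOST" "$FLUME_PORT"
}

# Usage, with the agent running:
#   send_to_flume "hello flume"
```

Each line sent this way should show up as one event in the console of the terminal running the agent. Running nc localhost 44444 without a pipe gives an interactive session instead, where every typed line is sent as it is entered.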
