I. Introduction to open-falcon
1) Chinese community introduction
http://book.open-falcon.org/zh_0_2/intro/
Reference: https://www.cnblogs.com/LAlexH/p/11161943.html
Reference: https://www.cnblogs.com/straycats/p/7199209.html
Video link: http://www.jikexueyuan.com/course/1651_3.html?ss=1
Alarm configuration: https://www.cnblogs.com/python-lbl/p/10450186.html
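2) Advantages of falcon
Powerful and flexible data collection: automatic discovery; supports falcon-agent and snmp; supports user-initiated push and user-defined plugins; uses an OpenTSDB-like data model (timestamp, endpoint, metric, key-value tags)
Horizontal scalability: supports hundreds of millions of data collections, alarm evaluations, and historical data writes and queries per cycle
Efficient alarm strategy management: an efficient portal, strategy templates, template inheritance and override, multiple alarm channels, and callback support
User-friendly alarm settings: maximum number of alarms, alarm levels, recovery notifications, alarm pause, different thresholds for different time periods, and maintenance windows
Efficient graph component: a single machine handles reporting, archiving, and storage for 2 million metrics (with a 1-minute cycle)
Efficient historical data query component: uses rrdtool's data archiving strategy and returns one year of history for hundreds of metrics within seconds
dashboard: multi-dimensional data display and user-defined Screens
High availability: no core single point of failure anywhere in the system; easy to operate, easy to deploy, and horizontally scalable
Languages: the entire backend is written in golang; the portal and dashboard are written in python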
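To make the OpenTSDB-like data model mentioned above more concrete, here is a minimal sketch of a single data point. The class name `DataPoint`, the `series_key` helper, and the sample values are illustrative assumptions, not part of open-falcon; only the field names (timestamp, endpoint, metric, tags) and the `k=v,k=v` tag encoding come from the rest of this post.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataPoint:
    """One sample: which host, which metric, when, and free-form tags."""
    endpoint: str
    metric: str
    timestamp: int
    value: float
    tags: Dict[str, str] = field(default_factory=dict)

    def tags_string(self) -> str:
        # Encode tags as "k1=v1,k2=v2"; sorting keeps the result stable
        # for the same tag set.
        return ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))

    def series_key(self) -> str:
        # Hypothetical identifier for one time series (not falcon's own format).
        return f"{self.endpoint}/{self.metric}/{self.tags_string()}"


point = DataPoint("web01", "cpu.idle", 1563000000, 93.5,
                  {"region": "test", "core": "0"})
print(point.series_key())
```

The same endpoint and metric with a different tag set is treated as a separate series, which is what allows the multi-dimensional queries described later.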
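3) Features of falcon
Varied and flexible data collection: supports agent, snmp, user-initiated push, custom plugins, and other collection methods
Efficient alarm strategy management
User-friendly alarm settings
Multi-dimensional data display in the dashboard
Templates support inheritance and, at the same time, overriding individual strategy items
No configuration is needed on the server side; installing the agent on the client is enough for the host to be monitored automatically
Introduces the concept of tags, so data can be queried and displayed along multiple dimensions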
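The "inherit and override" behaviour of templates can be pictured as a dictionary merge. The sketch below is only a mental model under a stated assumption: that a child template's strategy replaces the parent's strategy when both target the same metric and tags. The function name and the sample strategies are hypothetical.

```python
from typing import Dict, Tuple

# A strategy is keyed by (metric, tags); the value is the alarm condition.
Strategies = Dict[Tuple[str, str], str]


def effective_strategies(parent: Strategies, child: Strategies) -> Strategies:
    """Start from the parent's strategies, then let the child override them."""
    merged = dict(parent)
    merged.update(child)
    return merged


base = {
    ("cpu.idle", ""): "all(#3) < 10",
    ("df.bytes.free.percent", ""): "all(#3) < 5",
}
web = {("cpu.idle", ""): "all(#3) < 20"}

result = effective_strategies(base, web)
for key, condition in sorted(result.items()):
    print(key, condition)
```

Here the `web` template keeps the inherited disk strategy and only tightens the CPU threshold.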
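4) falcon architecture diagram
Open-Falcon is a fairly large distributed system with more than ten components. By function, these components fall into three groups: basic components, graphing pipeline components, and alarm pipeline components. The installation and deployment architecture is shown in the figure below.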
II. Single-host installation of open-falcon
1) Install redis
1.1) Install with yum
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
or: yum install epel-release
yum install redis -y
systemctl start redis
systemctl enable redis
systemctl status redis
1.2) Install from the source tarball
# Create the redis working directory
mkdir /home/redis && cd /home/redis
# Download the redis package
wget http://download.redis.io/releases/redis-4.0.9.tar.gz
# Build and install
tar -zxvf redis-4.0.9.tar.gz
mv redis-4.0.9 redis4.0.9 && cd redis4.0.9
mkdir logs
make && make install
# Edit the configuration file and set the four options below
vim redis.conf
bind 0.0.0.0
daemonize yes
pidfile /var/run/redis_6379.pid
logfile "/home/redis/redis4.0.9/logs/redis.log"
# Start redis
redis-server /home/redis/redis4.0.9/redis.conf
# Connection test
redis-cli -h 127.0.0.1 -p 6379
2) Install mysql
wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server
systemctl start mysqld.service
[root@node01 ~]# grep "password" /var/log/mysqld.log
2019-07-13T02:39:54.602191Z 1 [Note] A temporary password is generated for root@localhost: i?5XuEqh+aRL
The first login requires the temporary password (quoted here because it contains shell special characters):
mysql -uroot -p'i?5XuEqh+aRL'
mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_length=1;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
Query OK, 0 rows affected (0.00 sec)
systemctl stop mysqld.service
systemctl start mysqld.service
systemctl status mysqld.service
systemctl enable mysqld.service
mysql -uroot -p123456
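The `grep` step above can also be scripted. The following is a small sketch that pulls the temporary root password out of the log text with a regular expression; the function name is an assumption for illustration, and the sample line is the one shown in the log output above.

```python
import re
from typing import Optional

TEMP_PASSWORD_RE = re.compile(
    r"A temporary password is generated for root@localhost: (\S+)"
)


def find_temp_password(log_text: str) -> Optional[str]:
    """Return the most recent temporary root password in the log, or None."""
    matches = TEMP_PASSWORD_RE.findall(log_text)
    return matches[-1] if matches else None


sample = ("2019-07-13T02:39:54.602191Z 1 [Note] A temporary password is "
          "generated for root@localhost: i?5XuEqh+aRL\n")
print(find_temp_password(sample))
```

On a real host the input would be the contents of /var/log/mysqld.log, read with sufficient privileges.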
2.1) Remove the mysql repository package so that yum operations do not update mysql automatically
yum -y remove mysql57-community-release-el7-10.noarch
2.2) Do not use the root account
GRANT ALL ON *.* TO 'falcon'@'localhost' IDENTIFIED BY 'falcon';
GRANT ALL ON *.* TO 'falcon'@'%' IDENTIFIED BY 'falcon';
flush privileges;
2.3) Grant a regular user remote access
GRANT ALL PRIVILEGES ON *.* TO 'falcon'@'%' IDENTIFIED BY 'falconpassword' WITH GRANT OPTION;
flush privileges;
List users:
SELECT DISTINCT CONCAT('User: ''',user,'''@''',host,''';') AS query FROM mysql.user;
In some versions % (which stands for all hosts) cannot be used in the grant; in that case you can try * instead.
3) Initialize the table schema

git clone https://github.com/open-falcon/falcon-plus.git
# Import the table schema
cd ./falcon-plus/scripts/mysql/db_schema/
mysql -ufalcon -pfalcon < 1_uic-db-schema.sql
mysql -ufalcon -pfalcon < 2_portal-db-schema.sql
mysql -ufalcon -pfalcon < 3_dashboard-db-schema.sql
mysql -ufalcon -pfalcon < 4_graph-db-schema.sql
mysql -ufalcon -pfalcon < 5_alarms-db-schema.sql
# Go back to the parent of the clone and remove the directory
cd ../../../..
rm -rf falcon-plus/
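4) Install golang

# Download the go package
wget https://dl.google.com/go/go1.12.7.linux-amd64.tar.gz
# Extract it into /home
tar -zxvf go1.12.7.linux-amd64.tar.gz -C /home
# Add go to PATH (single quotes keep $PATH unexpanded until the profile is loaded)
echo 'export PATH=$PATH:/home/go/bin' >> /etc/profile
source /etc/profile
# Check the go version
go version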
4.1) Create the working directory
export FALCON_HOME=/home
export WORKSPACE=$FALCON_HOME/open-falcon
mkdir -p $WORKSPACE
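5) Download the release package into the working directory

# Download the release package into the working directory
cd /home/open-falcon
wget https://github.com/open-falcon/falcon-plus/releases/download/v0.2.0/open-falcon-v0.2.0.tar.gz
# Extract
tar -zxvf open-falcon-v0.2.0.tar.gz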
6) Start the backend

# Change the config files to use the mysql user and password you created
grep -Ilr 3306 ./ | xargs -n1 -- sed -i 's/root:/falcon:falcon/g'
# Start the services
/home/open-falcon/open-falcon start
/home/open-falcon/open-falcon check
# Output like the following means every module started successfully
falcon-graph UP 27685
falcon-hbs UP 27697
falcon-judge UP 27707
falcon-transfer UP 27716
falcon-nodata UP 27724
falcon-aggregator UP 27732
falcon-agent UP 27743
falcon-gateway UP 27753
falcon-api UP 27761
falcon-alarm UP 28201
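When the check is run from a script, it helps to reduce its output to the list of modules that are not running. This is a rough sketch: the function name is made up, and the `DOWN -` line in the sample is an assumption about how a stopped module is reported.

```python
from typing import List


def modules_not_up(check_output: str) -> List[str]:
    """Return the falcon modules whose status column is not UP."""
    down = []
    for line in check_output.splitlines():
        parts = line.split()
        # Expect at least "<module> <status>"; the PID column is optional.
        if len(parts) >= 2 and parts[0].startswith("falcon-") and parts[1] != "UP":
            down.append(parts[0])
    return down


sample = """\
falcon-graph UP 27685
falcon-hbs UP 27697
falcon-alarm DOWN -
"""
print(modules_not_up(sample))
```

An empty list means every listed module reported UP.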
Recursive replacement, for example when the password is falconpassword and mysql runs on 172.20.16.5:
grep -Ilr 3306 ./ | xargs -n1 -- sed -i 's/root:/falcon:falconpassword/g'
grep -Ilr 3306 ./ | xargs -n1 -- sed -i 's/127.0.0.1/172.20.16.5/g'
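To see what these substitutions do to a single config entry, the sketch below applies the same two replacements to one line. The sample line is an assumed shape for a database entry in a module config; check your own cfg.json files for the exact format.

```python
def rewrite_db_settings(text: str, user: str, password: str, host: str) -> str:
    """Apply the same two substitutions as the sed commands, in the same order."""
    text = text.replace("root:", f"{user}:{password}")
    return text.replace("127.0.0.1", host)


# Assumed shape of a database entry in an open-falcon module config.
sample = '"addr": "root:@tcp(127.0.0.1:3306)/falcon_portal?charset=utf8"'
rewritten = rewrite_db_settings(sample, "falcon", "falconpassword", "172.20.16.5")
print(rewritten)
```

Note that the second sed command rewrites every 127.0.0.1 in any file that mentions 3306, not only the database address, so review the changed files afterwards.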
III. Install the front-end dashboard
1) Download the dashboard
# Clone the dashboard project locally
cd $WORKSPACE
git clone https://github.com/open-falcon/dashboard.git
# Install the required dependencies
yum install -y python-virtualenv
yum install -y python-devel
yum install -y openldap-devel
yum install -y mysql-devel
yum groupinstall -y "Development tools"
2) Create the dependency environment
# Create an isolated virtual environment
cd $WORKSPACE/dashboard/
virtualenv ./env
# Install the dependencies with pip
./env/bin/pip install -r pip_requirements.txt -i https://pypi.douban.com/simple
3) Edit the configuration file in the dashboard directory
vim rrd/config.py
# TODO: read from api instead of db
PORTAL_DB_HOST = os.environ.get("PORTAL_DB_HOST","127.0.0.1")
PORTAL_DB_PORT = int(os.environ.get("PORTAL_DB_PORT",3306))
PORTAL_DB_USER = os.environ.get("PORTAL_DB_USER","falcon")
PORTAL_DB_PASS = os.environ.get("PORTAL_DB_PASS","falcon")
PORTAL_DB_NAME = os.environ.get("PORTAL_DB_NAME","falcon_portal")
# alarm database
# TODO: read from api instead of db
ALARM_DB_HOST = os.environ.get("ALARM_DB_HOST","127.0.0.1")
ALARM_DB_PORT = int(os.environ.get("ALARM_DB_PORT",3306))
ALARM_DB_USER = os.environ.get("ALARM_DB_USER","falcon")
ALARM_DB_PASS = os.environ.get("ALARM_DB_PASS","falcon")
ALARM_DB_NAME = os.environ.get("ALARM_DB_NAME","alarms")
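Because each setting is read with `os.environ.get(name, default)`, the values can also be overridden from the environment without editing the file. A minimal sketch of that lookup follows; the helper name `portal_db_settings` and the override values are illustrative only.

```python
import os


def portal_db_settings() -> dict:
    """Resolve a few portal DB settings with the same env-or-default lookup."""
    return {
        "host": os.environ.get("PORTAL_DB_HOST", "127.0.0.1"),
        "port": int(os.environ.get("PORTAL_DB_PORT", 3306)),
        "user": os.environ.get("PORTAL_DB_USER", "falcon"),
    }


# With no variables set, the defaults written in the file apply.
for name in ("PORTAL_DB_HOST", "PORTAL_DB_PORT", "PORTAL_DB_USER"):
    os.environ.pop(name, None)
defaults = portal_db_settings()

# Environment variables take precedence over the defaults.
os.environ["PORTAL_DB_HOST"] = "172.20.16.5"
os.environ["PORTAL_DB_PORT"] = "3307"
overridden = portal_db_settings()

print(defaults)
print(overridden)
```

Environment values are always strings, which is why the port goes through `int(...)`.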
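4) Start the service and check its status
# Start
bash control start
# Check the status
bash control status
# View the logs
bash control tail
The server-side installation is now complete.
4.1) If an internal error occurs
[root@node01 dashboard]# cat rrd/config.py
Check that the mysql connection settings in this file contain the correct username and password.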
IV. Client installation
1) Copy the files from the server to the client
[root@node01 open-falcon]# pwd
/home/open-falcon
[root@node01 open-falcon]# scp -r agent/ root@192.168.1.7:/home/open-falcon/
[root@node01 open-falcon]# scp open-falcon root@192.168.1.7:/home/open-falcon/
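2) Edit the configuration file
# Edit the agent config file and change the hostname, transfer, and heartbeat settings
vim agent/config/cfg.json
# Start the agent and check its status
./open-falcon start agent
./open-falcon check agent
tail -f agent/logs/agent.log
# Reload the configuration file
curl 127.0.0.1:1988/config/reload
Wait a moment and the machine is discovered automatically.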
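The three edits to cfg.json can also be made programmatically. The sketch below works on an in-memory dictionary; the key layout (`hostname`, `heartbeat.addr`, `transfer.addrs`), the ports 6030 and 8433, and the server address 192.168.1.6 are assumptions about a typical agent config, so compare them with your own file before relying on this.

```python
import copy
import json
from typing import List


def update_agent_config(cfg: dict, hostname: str, hbs_addr: str,
                        transfer_addrs: List[str]) -> dict:
    """Return a copy of the agent config pointing at the given server."""
    updated = copy.deepcopy(cfg)
    updated["hostname"] = hostname
    updated.setdefault("heartbeat", {})["addr"] = hbs_addr
    updated.setdefault("transfer", {})["addrs"] = list(transfer_addrs)
    return updated


# Assumed excerpt of agent/config/cfg.json.
sample = {
    "hostname": "",
    "heartbeat": {"enabled": True, "addr": "127.0.0.1:6030"},
    "transfer": {"enabled": True, "addrs": ["127.0.0.1:8433"]},
}

new_cfg = update_agent_config(sample, "client01", "192.168.1.6:6030",
                              ["192.168.1.6:8433"])
print(json.dumps(new_cfg, indent=2))
```

To apply it for real, load the file with `json.load`, pass the result through the function, and write it back with `json.dump` before reloading the agent.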
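V. Basic query usage
1) Select machines and monitoring metrics
1.1) View the graphs
2) Basic usage of the Screen feature
Summary: first create a demo group, then create the related monitoring category, and finally add the monitoring metrics.
Then go on to add memory.
3) Host groups
Add machines
4) Create a template
Add monitoring strategies
Bind the template to the host group created earlier
5) Test with a threshold that is certain to trigger an alarm; change the value in the template.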
VI. Start the client automatically at boot
[root@iotansible0001 init.d]# pwd
/etc/rc.d/init.d
[root@iotansible0001 init.d]# cat falcon-agentd
#!/bin/bash
# /etc/init.d/falcon-agentd
# chkconfig: 2345 20 80
# description: Starts and Stops falcon-agent
dir=/home/envuser/falcon
# PID(s) of the running falcon-agent, excluding this script and grep itself
pid=`ps -ef | grep falcon-agent | grep -v falcon-agentd | grep -v "grep" | awk '{print $2}'`
case "$1" in
    start)
        # Stop any running agent first, then start a fresh one
        if [[ -n "$pid" ]]; then
            echo $pid
            kill -9 $pid
            echo "Stopping falcon-agent ..."
        fi
        sleep 1
        echo "Starting falcon-agent ..."
        su - envuser -c "cd $dir && nohup ./open-falcon start agent &"
        ;;
    stop)
        if [[ -n "$pid" ]]; then
            echo $pid
            kill -9 $pid
            echo "Stopping falcon-agent ..."
            sleep 1
        else
            echo "Falcon-agent is stopped ..."
        fi
        ;;
    restart)
        echo "Restarting falcon-agent ..."
        if [[ -n "$pid" ]]; then
            echo $pid
            kill -9 $pid
            echo "Stopping falcon-agent ..."
        fi
        sleep 1
        echo "Starting falcon-agent ..."
        su - envuser -c "cd $dir && nohup ./open-falcon start agent &"
        ;;
    *)
        echo "Usage: falcon-agentd {start|stop|restart}"
        exit 0
esac
exit 0
Add it to the startup items
chmod +x falcon-agentd
chkconfig --add falcon-agentd
chkconfig falcon-agentd on
VII. Verify the client from the command line
[envuser@nginx-mqtt0001 bin]$ ./falcon-agent --check
net.if ... ok
cpustat ... ok
disk.io ... ok
memory ... ok
ss -s ... ok
ss -tln ... ok
kernel ... ok
df.bytes ... ok
loadavg ... ok
netstat ... ok
ps aux ... ok
du -bs ... ok
VIII. Push monitoring data
curl -X POST -d "[{\"metric\": \"test_by_test\", \"endpoint\": \"test_by_test_ep\", \"timestamp\": `date +%s`,\"step\": 60,\"value\": 1,\"counterType\": \"GAUGE\",\"tags\": \"region=test\"}]" http://127.0.0.1:1988/v1/push &> /dev/null
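The same push can be done from Python. This is a sketch using only the standard library; the function names are illustrative, the payload fields mirror the curl command above, and the actual `push` call is left commented out because it needs a running agent on 127.0.0.1:1988.

```python
import json
import time
import urllib.request

PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_payload(metric, endpoint, value, tags="", step=60,
                  counter_type="GAUGE", timestamp=None):
    """Build the JSON-serializable list expected by the agent push endpoint."""
    return [{
        "metric": metric,
        "endpoint": endpoint,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "step": step,
        "value": value,
        "counterType": counter_type,
        "tags": tags,
    }]


def push(payload, url=PUSH_URL, timeout=5):
    """POST the payload to the agent and return the response body as text."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


payload = build_payload("test_by_test", "test_by_test_ep", 1,
                        tags="region=test", timestamp=1563000000)
print(json.dumps(payload))
# push(payload)  # uncomment on a host where the agent is listening
```

Several data points can be sent in one request by concatenating the lists returned by `build_payload`.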
IX. ES cluster monitoring
Referenced configuration
[es]
data_host = elk0001:9200,elk0002:9200,elk0003:9200
log_host = elk-log0001.eniot.io:9200,elk-log0002.eniot.io:9200,elk-log0003.eniot.io:9200
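For reference, a section like this can be read with the standard `configparser` module. The sketch below parses the configuration from a string and splits each comma-separated value into a host list; the function name is an assumption, and the author's own `Falcon.check_conf()` helper may load the file differently.

```python
import configparser
from typing import Dict, List

SAMPLE = """\
[es]
data_host = elk0001:9200,elk0002:9200,elk0003:9200
log_host = elk-log0001.eniot.io:9200,elk-log0002.eniot.io:9200,elk-log0003.eniot.io:9200
"""


def read_es_hosts(text: str) -> Dict[str, List[str]]:
    """Return the data_host and log_host entries as lists of host:port strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    hosts = {}
    for key in ("data_host", "log_host"):
        raw = parser.get("es", key)
        hosts[key] = [item.strip() for item in raw.split(",") if item.strip()]
    return hosts


hosts = read_es_hosts(SAMPLE)
print(hosts["data_host"])
print(hosts["log_host"])
```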
Monitoring script

# coding: utf-8
import time
import traceback

from monitor_logger import Logger
from monitor_falcon import Falcon
from elasticsearch import Elasticsearch

log_file = u"eniot_monitor_es_status.log"


class ESstatus():
    def __init__(self, logger=None):
        self.logger = logger if logger else Logger(log_file).get_logger()
        self.falcon = Falcon(self.logger)

    def get_conf(self, cf):
        """Read the region and the ES host lists; return None if anything is missing."""
        try:
            data_info = dict()
            region = cf.get(u"region", u"region")
            if not region:
                msg = u"get region by conf error!"
                self.logger.error(msg)
                return
            data_info.update({u"region": region})
            data_host = cf.get(u"es", u"data_host")
            if not data_host:
                msg = u"get es host data by conf error!"
                self.logger.error(msg)
                return
            data_info.update({u"data_host": data_host})
            log_host = cf.get(u"es", u"log_host")
            # Validate log_host itself, not data_host again
            if not log_host:
                msg = u"get es log host by conf error!"
                self.logger.error(msg)
                return
            data_info.update({u"log_host": log_host})
            return data_info
        except Exception:
            self.logger.error(traceback.format_exc())

    def push_falcon(self, region, excutetime, status, clusterName):
        """Push the query duration and the cluster status as two metrics."""
        try:
            endpoint = "eniot_monitor_es_status"
            metric = "eniot_monitor_es_status_excutetime"
            tags = "region={region},clusterName={clusterName}".format(
                region=region,
                clusterName=clusterName,
            )
            print(tags)
            falcon_push_data = self.falcon.get_push_data(endpoint, metric, tags, float(excutetime))
            self.falcon.push_data(falcon_push_data)
            metric = "eniot_monitor_es_status"
            falcon_push_data = self.falcon.get_push_data(endpoint, metric, tags, status)
            self.falcon.push_data(falcon_push_data)
        except Exception:
            self.logger.error(traceback.format_exc())

    def monitor_es_client(self, region, host):
        """Query cluster health, time the call, and push both values to falcon."""
        try:
            esclient = Elasticsearch(host)
            # Wall-clock timing, so the network round trip is included
            start_time = time.time()
            result = esclient.cat.health().split(" ")
            result_v = esclient.cat.health(v=True)
            print(result_v)
            clusterName = result[2]
            # status 1 means green; anything else (yellow, red) is reported as 0
            status = 1 if result[3] == "green" else 0
            end_time = time.time()
            excutetime = end_time - start_time
            print("result = " + result[3])
            print("excutetime = " + str(excutetime))
            self.push_falcon(region, excutetime, status, clusterName)
        except Exception:
            self.logger.error(traceback.format_exc())

    def main(self):
        try:
            cf = self.falcon.check_conf()
            data_info = self.get_conf(cf)
            if not data_info:
                msg = u"get es info error!"
                self.logger.warn(msg)
                return
            region = data_info["region"]
            log_host = data_info["log_host"].split(",")
            data_host = data_info["data_host"].split(",")
            self.monitor_es_client(region, log_host)
            self.monitor_es_client(region, data_host)
        except Exception:
            self.logger.error(traceback.format_exc())


if __name__ == '__main__':
    app = ESstatus()
    app.main()
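The status mapping in the script depends on the column order of the `_cat/health` output (epoch, timestamp, cluster, status, ...). The following standalone sketch isolates that parsing step so it can be checked without a cluster; the function name and the sample line are illustrative assumptions.

```python
from typing import Tuple


def parse_cat_health(line: str) -> Tuple[str, int]:
    """Return (cluster_name, status) where status is 1 for green, else 0."""
    fields = line.split()  # split() tolerates repeated whitespace
    if len(fields) < 4:
        raise ValueError("unexpected _cat/health output: %r" % line)
    cluster, color = fields[2], fields[3]
    return cluster, 1 if color == "green" else 0


sample = "1563000000 06:40:00 es-data green 3 3 10 5 0 0 0 0 - 100.0%"
print(parse_cat_health(sample))
print(parse_cat_health(sample.replace("green", "yellow")))
```

A threshold such as "status != 1" in a falcon strategy then alarms on both yellow and red clusters.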