Monitoring Software: open-falcon


I. Introduction to open-falcon

1) Chinese community introduction

http://book.open-falcon.org/zh_0_2/intro/

Reference: https://www.cnblogs.com/LAlexH/p/11161943.html

Reference: https://www.cnblogs.com/straycats/p/7199209.html

Video link: http://www.jikexueyuan.com/course/1651_3.html?ss=1

Alert configuration: https://www.cnblogs.com/python-lbl/p/10450186.html

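2) Advantages of falcon

Powerful and flexible data collection: automatic discovery; supports falcon-agent, snmp, user-initiated push, and user-defined plugins; uses an opentsdb-like data model (timestamp, endpoint, metric, key-value tags)
Horizontal scalability: supports hundreds of millions of data collections, alert evaluations, and historical data storage and query operations per cycle
Efficient alert strategy management: an efficient portal, strategy templates, template inheritance and overriding, multiple alerting methods, and callback support
User-friendly alert settings: maximum alert count, alert levels, recovery notifications, alert pausing, different thresholds for different time periods, and maintenance windows
Efficient graph component: a single machine handles reporting, archiving, and storage for 2 million metrics (with a 1-minute cycle)
Efficient historical data query component: uses the rrdtool data archiving strategy and returns one year of history for hundreds of metrics within seconds
dashboard: multi-dimensional data display and user-defined Screens
High availability: the system has no core single point of failure, is easy to operate and deploy, and scales horizontally
Development languages: the entire backend is written in golang; the portal and dashboard are written in python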

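3) Features of falcon

Varied and flexible data collection: supports agent, snmp, user-initiated push, custom plugins, and other collection methods
Efficient alert strategy management
User-friendly alert settings
Multi-dimensional data display in the dashboard
Templates support inheritance and can also override individual strategy items
No configuration is needed on the server side; installing the agent on the client is enough for the machine to be monitored automatically
Introduces the concept of tags, so data can be queried and displayed across multiple dimensions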
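
To make the tag idea concrete, here is a minimal Python sketch of how a tag string in the `key=value,key=value` form could be parsed and used to select data points along one dimension. The helper names (`parse_tags`, `filter_by_tag`) and the sample points are hypothetical and exist only for illustration; this is not open-falcon's own query code.

```python
def parse_tags(tag_string):
    """Parse a tag string such as "region=test,idc=bj" into a dict."""
    tags = {}
    for pair in tag_string.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        tags[key] = value
    return tags


def filter_by_tag(points, key, value):
    """Return the points whose tags contain key=value."""
    return [p for p in points if parse_tags(p["tags"]).get(key) == value]


# Hypothetical sample points using the endpoint / metric / tags model.
points = [
    {"endpoint": "host-a", "metric": "cpu.idle", "tags": "region=test,idc=bj"},
    {"endpoint": "host-b", "metric": "cpu.idle", "tags": "region=prod,idc=sh"},
]
selected = filter_by_tag(points, "region", "test")
print([p["endpoint"] for p in selected])
```

Keeping tags as a flat string on each point and parsing them on demand mirrors the way tags are passed when pushing data; a real query layer would index them instead.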

 

 

 

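4) falcon architecture diagram

Open-Falcon is a fairly large distributed system with more than ten components. By function, these components fall into three groups: base components, graphing-pipeline components, and alerting-pipeline components. The deployment architecture is shown in the diagram below.

II. Single-machine installation of open-falcon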

1) Install redis

1.1) Install via yum

wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
or: yum install epel-release

yum install redis -y
systemctl start redis
systemctl enable redis
systemctl status redis

1.2) Install from the tar package

# Create the redis working directory
mkdir /home/redis && cd /home/redis
# Download the redis source package
wget http://download.redis.io/releases/redis-4.0.9.tar.gz
# Build and install
tar -zxvf redis-4.0.9.tar.gz
mv redis-4.0.9 redis4.0.9 && cd redis4.0.9
mkdir logs
make && make install
# Edit the configuration file and set the following items
vim redis.conf
bind 0.0.0.0
daemonize yes
pidfile /var/run/redis_6379.pid
logfile "/home/redis/redis4.0.9/logs/redis.log"
# Start redis
redis-server /home/redis/redis4.0.9/redis.conf
# Test the connection
redis-cli -h 127.0.0.1 -p 6379
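
The same connection test can be scripted. The sketch below uses only the Python standard library and speaks the Redis wire protocol (RESP) directly: it sends `PING` and checks for `+PONG`. The function names are hypothetical, and it assumes the server listens on 127.0.0.1:6379 without a password, as configured above.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings (the Redis wire format)."""
    parts = ["*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg)
        parts.append("$%d\r\n%s\r\n" % (len(data.encode("utf-8")), data))
    return "".join(parts).encode("utf-8")


def is_pong(reply):
    """Return True if the raw reply is the simple string +PONG."""
    return reply.strip() == b"+PONG"


def redis_ping(host="127.0.0.1", port=6379, timeout=2.0):
    """Send PING to a Redis server and report whether it answered PONG."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("PING"))
        return is_pong(conn.recv(64))
```

Calling `redis_ping()` on the server host should return `True` once redis is running; a connection error means the service is not reachable on that address.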

2) Install mysql

wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server
systemctl start mysqld.service
[root@node01 ~]# grep "password" /var/log/mysqld.log
2019-07-13T02:39:54.602191Z 1 [Note] A temporary password is generated for root@localhost: i?5XuEqh+aRL
The first login requires this temporary password (quoted so the shell does not treat ? as a glob character):
mysql -uroot -p'i?5XuEqh+aRL'

mysql>  set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)

mysql> set global validate_password_length=1;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
Query OK, 0 rows affected (0.00 sec)

systemctl stop mysqld.service
systemctl start mysqld.service
systemctl status mysqld.service
systemctl enable mysqld.service
mysql -uroot -p123456

2.1) Remove the mysql repository package so that yum operations do not update mysql automatically

yum -y remove mysql57-community-release-el7-10.noarch

2.2) Do not use the root account; create a dedicated falcon user instead

GRANT ALL  ON *.* TO 'falcon'@'localhost' IDENTIFIED BY 'falcon';
GRANT ALL  ON *.* TO 'falcon'@'%' IDENTIFIED BY 'falcon';
flush privileges;

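2.3) Allow the regular user to connect remotely

Grant remote access:
GRANT ALL PRIVILEGES ON *.* TO 'falcon'@'%' IDENTIFIED BY 'falconpassword' WITH GRANT OPTION;
flush privileges;

List the users: SELECT DISTINCT CONCAT('User: ''',user,'''@''',host,''';') AS query FROM mysql.user;

In some versions the grant does not accept % (which stands for all hosts); in that case consider trying * instead.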

3) Initialize the table schema

git clone  https://github.com/open-falcon/falcon-plus.git
# Import the table schemas
cd ./falcon-plus/scripts/mysql/db_schema/
mysql -ufalcon -pfalcon < 1_uic-db-schema.sql
mysql -ufalcon -pfalcon < 2_portal-db-schema.sql
mysql -ufalcon -pfalcon < 3_dashboard-db-schema.sql
mysql -ufalcon -pfalcon < 4_graph-db-schema.sql
mysql -ufalcon -pfalcon < 5_alarms-db-schema.sql
# Return to the starting directory and remove the cloned repository
cd ../../../../
rm -rf falcon-plus/
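
If the schema directory grows or the import has to be repeated, the step can be scripted. The following is a minimal Python sketch, not part of falcon-plus: it sorts the `.sql` files by their numeric prefix and, when `dry_run` is turned off, feeds each one to the `mysql` client. The function names and the `dry_run` flag are hypothetical; the default credentials match the falcon user created above.

```python
import re
import subprocess


def ordered_schema_files(names):
    """Sort schema files by numeric prefix, e.g. 1_uic-db-schema.sql first."""
    def prefix(name):
        match = re.match(r"(\d+)_", name)
        return int(match.group(1)) if match else float("inf")
    return sorted((n for n in names if n.endswith(".sql")), key=prefix)


def import_schemas(names, user="falcon", password="falcon", dry_run=True):
    """Feed each schema file to the mysql client; with dry_run only list them."""
    files = ordered_schema_files(names)
    if not dry_run:
        for name in files:
            with open(name, "rb") as handle:
                # Equivalent to: mysql -ufalcon -pfalcon < name
                subprocess.check_call(
                    ["mysql", "-u" + user, "-p" + password], stdin=handle
                )
    return files
```

Running it with `dry_run=True` first shows the order in which the files would be imported before anything touches the database.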

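4) Install golang

# Download the go release package
wget https://dl.google.com/go/go1.12.7.linux-amd64.tar.gz
# Extract it into /home
tar -zxvf go1.12.7.linux-amd64.tar.gz -C /home
# Add go to PATH (single quotes keep $PATH unexpanded until /etc/profile is sourced)
echo 'export PATH=$PATH:/home/go/bin' >> /etc/profile
source  /etc/profile
# Check the go version
go version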

4.1) Create the working directory

export FALCON_HOME=/home
export WORKSPACE=$FALCON_HOME/open-falcon
mkdir -p $WORKSPACE

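5) Download the release package into the working directory

# Download the release package into the working directory
cd /home/open-falcon
wget https://github.com/open-falcon/falcon-plus/releases/download/v0.2.0/open-falcon-v0.2.0.tar.gz
# Extract
tar  -zxvf open-falcon-v0.2.0.tar.gz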

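6) Start the backend

# Point the config files at your own mysql user and password
grep -Ilr 3306  ./ | xargs -n1 -- sed -i 's/root:/falcon:falcon/g'
# Start the services
/home/open-falcon/open-falcon start
/home/open-falcon/open-falcon check
# Output like the following means every component started successfully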
  falcon-graph UP 27685 
  falcon-hbs UP 27697 
  falcon-judge UP 27707 
  falcon-transfer UP 27716 
  falcon-nodata UP 27724 
  falcon-aggregator UP 27732 
  falcon-agent UP 27743 
  falcon-gateway UP 27753 
  falcon-api UP 27761 
  falcon-alarm UP 28201
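
For automated checks it can help to turn that status listing into data. The sketch below parses lines in the `name STATE pid` layout shown above and reports the components that are not `UP`. The helper names are hypothetical, and the `DOWN -` line in the sample is an assumed format used only to exercise the code.

```python
def parse_check_output(output):
    """Map each component name to (state, pid) from the status listing."""
    status = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("falcon-"):
            has_pid = len(fields) > 2 and fields[2].isdigit()
            status[fields[0]] = (fields[1], int(fields[2]) if has_pid else None)
    return status


def components_not_up(status):
    """Return the sorted names of components whose state is not UP."""
    return sorted(name for name, (state, _pid) in status.items() if state != "UP")


# Sample listing; the DOWN line is an assumption made for illustration.
sample = """
  falcon-graph UP 27685
  falcon-hbs UP 27697
  falcon-alarm DOWN -
"""
print(components_not_up(parse_check_output(sample)))
```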

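Recursive replacement, for a setup where the mysql password is falconpassword and mysql runs on 172.20.16.5:

grep -Ilr 3306  ./ | xargs -n1 -- sed -i 's/root:/falcon:falconpassword/g'
grep -Ilr 3306  ./ | xargs -n1 -- sed -i 's/127\.0\.0\.1/172.20.16.5/g'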
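
The effect of these substitutions is easier to see on a single string. The Python sketch below rewrites a database address in one pass. It assumes the default config entries look like `root:@tcp(127.0.0.1:3306)/...`, which is an assumption about the shipped config files and should be checked against your own; `rewrite_dsn` is a hypothetical helper name.

```python
import re

# Matches the assumed default prefix: root user, any password, local mysql.
DSN_PATTERN = re.compile(r"root:[^@]*@tcp\(127\.0\.0\.1:3306\)")


def rewrite_dsn(text, user, password, host):
    """Replace the default root DSN prefix with the given credentials and host."""
    replacement = "%s:%s@tcp(%s:3306)" % (user, password, host)
    # A function replacement avoids backslash escapes in the password being
    # interpreted as regex group references.
    return DSN_PATTERN.sub(lambda _match: replacement, text)


sample = '"addr": "root:@tcp(127.0.0.1:3306)/falcon_portal?charset=utf8"'
print(rewrite_dsn(sample, "falcon", "falconpassword", "172.20.16.5"))
```

Unlike the second sed command, this only touches the mysql address, so other 127.0.0.1 entries in the same file are left alone.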

III. Install the front-end dashboard

1) Download the dashboard

# Clone the dashboard project locally
cd $WORKSPACE
git clone https://github.com/open-falcon/dashboard.git
# Install the required dependency packages
yum install -y python-virtualenv
yum install -y python-devel
yum install -y openldap-devel
yum install -y mysql-devel
yum groupinstall "Development tools"

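2) Create the dependency environment

# Create an isolated virtual environment
cd $WORKSPACE/dashboard/
virtualenv ./env
# Install the dependencies with pip
./env/bin/pip install -r pip_requirements.txt -i https://pypi.douban.com/simple

3) Edit the configuration file in the dependency environment

vim rrd/config.py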
# TODO: read from api instead of db
PORTAL_DB_HOST = os.environ.get("PORTAL_DB_HOST","127.0.0.1")
PORTAL_DB_PORT = int(os.environ.get("PORTAL_DB_PORT",3306))
PORTAL_DB_USER = os.environ.get("PORTAL_DB_USER","falcon")
PORTAL_DB_PASS = os.environ.get("PORTAL_DB_PASS","falcon")
PORTAL_DB_NAME = os.environ.get("PORTAL_DB_NAME","falcon_portal")

# alarm database
# TODO: read from api instead of db
ALARM_DB_HOST = os.environ.get("ALARM_DB_HOST","127.0.0.1")
ALARM_DB_PORT = int(os.environ.get("ALARM_DB_PORT",3306))
ALARM_DB_USER = os.environ.get("ALARM_DB_USER","falcon")
ALARM_DB_PASS = os.environ.get("ALARM_DB_PASS","falcon")
ALARM_DB_NAME = os.environ.get("ALARM_DB_NAME","alarms")
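
Because every setting is read through `os.environ.get` with a default, the values can be overridden from the environment without editing the file. The sketch below reproduces that lookup for the portal database so the precedence is visible; `portal_db_settings` is a hypothetical helper, not a function in the dashboard code.

```python
import os


def portal_db_settings(environ=None):
    """Resolve the portal DB settings: environment value first, then default."""
    env = os.environ if environ is None else environ
    return {
        "host": env.get("PORTAL_DB_HOST", "127.0.0.1"),
        "port": int(env.get("PORTAL_DB_PORT", 3306)),
        "user": env.get("PORTAL_DB_USER", "falcon"),
        "password": env.get("PORTAL_DB_PASS", "falcon"),
        "name": env.get("PORTAL_DB_NAME", "falcon_portal"),
    }


# With no overrides the defaults apply; a set variable wins over the default.
print(portal_db_settings({}))
print(portal_db_settings({"PORTAL_DB_HOST": "172.20.16.5", "PORTAL_DB_PORT": "3307"}))
```

In practice this means exporting, for example, `PORTAL_DB_PASS` before starting the dashboard is enough to change the password it uses.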

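4) Start the service and check its status

# Start
bash control start
bash control status
# View the log
bash control tail

The server installation is complete.

4.1) If an internal error occurs

[root@node01 dashboard]# cat rrd/config.py   # check that this file carries the correct mysql username and password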

IV. Client installation

1) Copy the files from the server to the client

[root@node01 open-falcon]# pwd
/home/open-falcon
[root@node01 open-falcon]# scp -r agent/ root@192.168.1.7:/home/open-falcon/
[root@node01 open-falcon]# scp open-falcon root@192.168.1.7:/home/open-falcon/

2) Edit the configuration file

# Edit the agent config file and set the hostname, transfer, and heartbeat items
vim agent/config/cfg.json
# Start the agent and check its status
./open-falcon start agent
./open-falcon check agent
tail -f agent/logs/agent.log
# Reload the configuration
curl 127.0.0.1:1988/config/reload
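
When many clients need the same change, editing `cfg.json` by hand gets tedious. The sketch below shows one way to set the three items programmatically. The key layout (`hostname`, `heartbeat.addr`, `transfer.addrs`) and the ports 6030 and 8433 are assumptions about the agent's default config, and the server address 192.168.1.6 is a placeholder; verify all of them against your own file.

```python
import copy
import json


def configure_agent(cfg, hostname, hbs_addr, transfer_addrs):
    """Return a copy of the agent config with hostname, heartbeat, transfer set."""
    updated = copy.deepcopy(cfg)
    updated["hostname"] = hostname
    updated.setdefault("heartbeat", {})["addr"] = hbs_addr
    updated.setdefault("transfer", {})["addrs"] = list(transfer_addrs)
    return updated


# Assumed shape of agent/config/cfg.json; check the keys against your file.
sample = {
    "hostname": "",
    "heartbeat": {"enabled": True, "addr": "127.0.0.1:6030"},
    "transfer": {"enabled": True, "addrs": ["127.0.0.1:8433"]},
}
new_cfg = configure_agent(sample, "web-01", "192.168.1.6:6030", ["192.168.1.6:8433"])
print(json.dumps(new_cfg, indent=2, sort_keys=True))
```

The function returns a modified copy rather than changing its input, so the original settings stay available for comparison before the file is written back.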

Wait a moment; the machine is then discovered automatically.

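V. Basic query usage

1) Select machines and monitoring metrics

1.1) View the graphs

2) Basic use of the Screen feature

Summary: first create a demo group, then create the relevant monitoring category, and finally add the monitoring metrics.

Then continue by adding the memory metrics.

3) Host groups

Add machines.

4) Create a template

Add monitoring strategies.

Bind the template to the host group created earlier.

5) Test with a threshold that is certain to trigger an alert; change it in the template.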
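
To pick a threshold that is certain to fire, it helps to reason about how a strategy is evaluated. The sketch below assumes a strategy of the form `all(#3) <op> <threshold>`, meaning the latest three points must all satisfy the comparison. It is a simplified illustration with a hypothetical function name, not the code of the judge component.

```python
import operator

OPERATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


def all_triggered(values, n, op, threshold):
    """True when the latest n values all satisfy `value op threshold`."""
    if len(values) < n:
        return False  # not enough history yet, so no alert
    compare = OPERATORS[op]
    return all(compare(v, threshold) for v in values[-n:])


# cpu.idle is a percentage, so ">= 0" holds for every sample and always fires.
cpu_idle = [97.0, 98.5, 99.1]
print(all_triggered(cpu_idle, 3, ">=", 0))
```

A condition such as `cpu.idle >= 0` is therefore a convenient test value; remember to restore a realistic threshold once the alert path has been verified.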

VI. Starting the client at boot

[root@iotansible0001 init.d]# pwd
/etc/rc.d/init.d
[root@iotansible0001 init.d]# cat falcon-agentd 
#!/bin/bash
# /etc/init.d/falcon-agentd
# chkconfig: 2345 20 80
# description: Starts and Stops falcon-agent

dir=/home/envuser/falcon
pid=`ps -ef | grep falcon-agent | grep -v falcon-agentd | grep -v "grep" | awk '{print $2}'`

case "$1" in
start)
        if [[ $pid -gt 0 ]];then
                echo $pid
                kill -9 $pid
                echo "Stopping falcon-agent ..."
        fi
        sleep 1
        echo "Starting falcon-agent ..."
        su - envuser -c "cd $dir && nohup ./open-falcon start agent &"
        ;;
stop)
        if [[ $pid -gt 0 ]];then
                echo $pid
                kill -9 $pid
                echo "Stopping falcon-agent ..."
                sleep 1
        else
                echo "Falcon-agent is stopped ..."
        fi
        ;;
restart)
        echo "Restarting falcon-agent ..."
        if [[ $pid -gt 0 ]];then
                echo $pid        
                kill -9 $pid
                echo "Stopping falcon-agent ..."
        fi
        sleep 1
        echo "Starting falcon-agent ..."
        su - envuser -c "cd $dir && nohup ./open-falcon start agent &"
        ;;
*) 
        echo "Usage: falcon-agentd {start|stop|restart}" 
        exit 1
esac 
exit 0

Add it to the startup items:

chmod +x falcon-agentd
chkconfig --add falcon-agentd
chkconfig  falcon-agentd on

VII. Verify the client with the check command

[envuser@nginx-mqtt0001 bin]$ ./falcon-agent --check
net.if   ... ok
cpustat  ... ok
disk.io  ... ok
memory   ... ok
ss -s    ... ok
ss -tln  ... ok
kernel   ... ok
df.bytes ... ok
loadavg  ... ok
netstat  ... ok
ps aux   ... ok
du -bs   ... ok

VIII. Push monitoring data


curl -X POST -d "[{\"metric\": \"test_by_test\", \"endpoint\": \"test_by_test_ep\", \"timestamp\": `date +%s`,\"step\": 60,\"value\": 1,\"counterType\": \"GAUGE\",\"tags\": \"region=test\"}]" http://127.0.0.1:1988/v1/push &> /dev/null
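
The same push can be done from Python, which is convenient inside monitoring scripts. The sketch below uses only the standard library and sends the same fields as the curl command to the agent's `/v1/push` endpoint on port 1988. The function names `build_point` and `push` are hypothetical.

```python
import json
import time
from urllib import request


def build_point(metric, endpoint, value, tags="", step=60, counter_type="GAUGE"):
    """Build one data point with the same fields as the curl example."""
    return {
        "metric": metric,
        "endpoint": endpoint,
        "timestamp": int(time.time()),
        "step": step,
        "value": value,
        "counterType": counter_type,
        "tags": tags,
    }


def push(points, url="http://127.0.0.1:1988/v1/push", timeout=5):
    """POST a list of points to the local agent and return the response body."""
    body = json.dumps(points).encode("utf-8")
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


point = build_point("test_by_test", "test_by_test_ep", 1, tags="region=test")
print(json.dumps([point]))
```

On a host where the agent is running, `push([point])` submits the point; the payload is always a JSON list, so several metrics can be sent in one request.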

 

IX. ES cluster monitoring

Referenced configuration:

[es]
data_host = elk0001:9200,elk0002:9200,elk0003:9200
log_host = elk-log0001.eniot.io:9200,elk-log0002.eniot.io:9200,elk-log0003.eniot.io:9200
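
As a rough illustration of how a script can consume this section, the sketch below reads the `[es]` block with the standard-library `configparser` and splits each comma-separated value into a host list. The helper name `read_es_hosts` is hypothetical; the monitoring script itself obtains its config object from its own `Falcon` helper.

```python
import configparser

SAMPLE = """
[es]
data_host = elk0001:9200,elk0002:9200,elk0003:9200
log_host = elk-log0001.eniot.io:9200,elk-log0002.eniot.io:9200,elk-log0003.eniot.io:9200
"""


def read_es_hosts(text):
    """Return {'data_host': [...], 'log_host': [...]} from the [es] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        key: [h.strip() for h in parser.get("es", key).split(",") if h.strip()]
        for key in ("data_host", "log_host")
    }


hosts = read_es_hosts(SAMPLE)
print(hosts["data_host"])
```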

Monitoring script:

# coding: utf-8
import time
import datetime
import json
import traceback

from monitor_logger import Logger
from monitor_falcon import Falcon

from elasticsearch import Elasticsearch

log_file = u"eniot_monitor_es_status.log"

class ESstatus():
    def __init__(self,logger = None):
        self.logger = logger if logger else Logger(log_file).get_logger()
        self.falcon = Falcon(self.logger)

    def get_conf(self,cf):
        try:
            data_info = dict()
            region = cf.get(u"region", u"region")
            if not region:
                msg =u"get region by conf error!"
                self.logger.error(msg)
                return
            data_info.update({u"region": region})

            data_host = cf.get(u"es", u"data_host")
            if not data_host:
                msg =u"get es host data by conf error!"
                self.logger.error(msg)
                return
            data_info.update({u"data_host": data_host})

            log_host = cf.get(u"es", u"log_host")
            if not log_host:
                msg = u"get es log host by conf error!"
                self.logger.error(msg)
                return
            data_info.update({u"log_host": log_host})

            return data_info
        except:
            self.logger.error(traceback.format_exc())

    def push_falcon(self,region, excutetime ,  status,clusterName):
        try:
            endpoint = "eniot_monitor_es_status"

            metric = "eniot_monitor_es_status_excutetime"
            tags = "region={region},clusterName={clusterName}".format(
                region = region,
                clusterName = clusterName,
            )
            print(tags)
            falcon_push_data = self.falcon.get_push_data(endpoint, metric, tags, float(excutetime))
            self.falcon.push_data(falcon_push_data)

            metric = "eniot_monitor_es_status"

            falcon_push_data = self.falcon.get_push_data(endpoint, metric, tags, status)
            self.falcon.push_data(falcon_push_data)
        except:
            self.logger.error(traceback.format_exc())

    def monitor_es_client(self,region,host):
        try:
            esclient = Elasticsearch(host)
            # time.time() gives wall-clock time; time.clock() was removed in Python 3.8
            start_time = time.time()
            result = esclient.cat.health().split(" ")
            result_v = esclient.cat.health(v=True)
            print(result_v)
            clusterName =result[2]
            if result[3] != "green":
                status = 0
            else:
                status = 1

            end_time = time.time()
            excutetime = end_time - start_time
            print("result = " + result[3])
            print("excutetime = " + str(excutetime))

            self.push_falcon(region, excutetime, status,clusterName)
        except:
            self.logger.error(traceback.format_exc())

    def main(self):
        try:
            cf =self.falcon.check_conf()
            data_info = self.get_conf(cf)
            if not data_info:
                msg = u"get es info error!"
                self.logger.warning(msg)
                return
            region = data_info["region"]
            log_host = data_info["log_host"].split(",")
            data_host = data_info["data_host"].split(",")
            self.monitor_es_client(region,log_host)
            self.monitor_es_client(region,data_host)

        except:
            self.logger.error(traceback.format_exc())

if __name__ == '__main__':
    app = ESstatus()
    app.main()
eniot_monitor_es_status.py
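
The script relies on the column order of the `_cat/health` text output, taking the cluster name from the third field and the status from the fourth. The sketch below isolates that parsing step so it can be checked without a cluster. The sample line is hypothetical, and `parse_cat_health` is an illustrative helper rather than part of the script.

```python
def parse_cat_health(line):
    """Extract the cluster name and status from one _cat/health text line."""
    fields = line.split()  # split on any run of whitespace
    if len(fields) < 4:
        raise ValueError("unexpected _cat/health line: %r" % line)
    cluster, status = fields[2], fields[3]
    return {
        "cluster": cluster,
        "status": status,
        "healthy": 1 if status == "green" else 0,  # same 1/0 value the script pushes
    }


# Hypothetical sample in the epoch / timestamp / cluster / status layout.
sample = "1475871424 16:17:04 elasticsearch green 1 1 1 1 0 0 0 0 - 100.0%"
print(parse_cat_health(sample))
```

Splitting on any whitespace instead of a single space keeps the field positions stable if the output is padded with extra spaces.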

 

