Installing and Configuring Nagios on Linux

Reposted article, 2013-09-26 13:35, filed under Linux/MySQL

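1. About This Article

This article draws on David_Tang's article at http://www.cnblogs.com/mchina/archive/2013/02/20/2883404.html and on other material found online; the great majority of the content is reproduced from David_Tang.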

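2. Introduction to Nagios

Nagios is an open-source tool for monitoring computer systems and networks. It can effectively monitor the state of Windows, Linux, and Unix hosts, network devices such as switches and routers, printers, and so on. When a system or service enters an abnormal state, it sends an email or SMS alert to notify operations staff immediately, and when the state recovers it sends a recovery notification by email or SMS.

Nagios was originally named NetSaint and was developed by Ethan Galstad, who maintains it to this day. NAGIOS is an acronym: "Nagios Ain't Gonna Insist On Sainthood". "Sainthood" refers to being a saint, and "Agios" is the Greek word for "saint". Nagios was developed for use on Linux, but it also works very well on other Unix systems.

Main features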

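    •Network service monitoring (SMTP, POP3, HTTP, NNTP, ICMP, SNMP, FTP, SSH)
    •Host resource monitoring (CPU load, disk usage, system logs), including Windows hosts (via the NSClient++ plugin)
    •Custom plugins can collect data over the network to monitor almost anything (temperature, alarms, ...)
    •Scripts can be executed remotely by configuring Nagios's remote plugin execution
    •Remote monitoring is supported over SSH or SSL-encrypted tunnels
    •A simple plugin design lets users easily develop the service checks they need, in many languages (shell scripts, C++, Perl, Ruby, Python, PHP, C#, etc.)
    •Many graphing plugins are available (Nagiosgraph, Nagiosgrapher, PNP4Nagios, etc.)
    •Service checks can run in parallel
    •Network host hierarchies can be defined, so that checks proceed level by level, starting from the parent host and working down
    •Notifications are sent when a service or host has a problem, via email, pager, SMS, or any user-defined plugin
    •Custom event handlers can be defined to restart a failed service or host
    •Automatic log rotation
    •Support for redundant monitoring
    •A web interface for viewing current network status, notifications, problem history, log files, and more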

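3. How Nagios Works

Nagios's job is to monitor services and hosts, but it does not itself contain any of that functionality: all monitoring and checking is done by plugins.

Once started, Nagios periodically invokes plugins to check server status. It also maintains a queue: all status information returned by plugins enters the queue, and Nagios reads from the head of the queue each time, processes the information, and displays the resulting status through the web interface.

Nagios provides many plugins that make it easy to monitor the status of many services. After installation, all the bundled plugins are in the libexec directory under the Nagios home directory; for example, check_disk checks disk space and check_load checks CPU load. Run ./check_xxx -h on any plugin to see its usage and purpose.

Nagios recognizes four return states: 0 (OK) means normal, shown in green; 1 (WARNING) means a warning, shown in yellow; 2 (CRITICAL) means a very serious error, shown in red; 3 (UNKNOWN) means an unknown error, shown in dark yellow. Nagios determines the state of a monitored object from the value the plugin returns and displays it through the web interface so that administrators can spot failures promptly.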

[Figure: the four monitoring states]
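As a minimal sketch of how a plugin reports state through its exit status, the shell function below maps a numeric reading to the four return codes described above. The name check_value and its three arguments are hypothetical and not part of the Nagios plugin set; a real plugin would normally finish with exit rather than return.

```shell
#!/bin/sh
# Hypothetical example, not a bundled Nagios plugin.
# Usage: check_value VALUE WARN CRIT   (all non-negative integers)

# Succeeds only if the argument is a non-empty string of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

check_value() {
    if [ $# -ne 3 ] || ! is_uint "$1" || ! is_uint "$2" || ! is_uint "$3"; then
        echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
        return 3                      # 3 = UNKNOWN
    fi
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1 (critical threshold $3)"
        return 2                      # 2 = CRITICAL
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1 (warning threshold $2)"
        return 1                      # 1 = WARNING
    fi
    echo "OK - value is $1"
    return 0                          # 0 = OK
}

# Example call; "|| true" keeps a non-OK status from aborting a strict shell.
check_value 85 80 90 || true
```

The first line of output is what the web interface would show as the status text, and the return code is what decides the state.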

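Next, alerting. A monitoring system that detects problems but cannot raise alerts is pointless, so alerting is one of Nagios's most important functions. Again, however, Nagios itself contains no alerting code, not even a plugin for it; that part is left to the user or to other related open-source projects.

Installing Nagios means installing the base platform, that is, the Nagios package itself. It is the framework of the monitoring system and the foundation for all monitoring.

The official Nagios documentation shows that Nagios has almost no dependencies; it only requires Linux or another system that Nagios supports. Without Apache (an HTTP service), however, you have no convenient interface for viewing monitoring information, so Apache can be regarded as a prerequisite. Apache installation guides are widely available online; follow one, then check that the server works properly.

Now that we know how Nagios manages server objects through plugins, we can look at how it manages objects on remote servers. Nagios provides an add-on called NRPE, which it runs periodically to obtain status information from remote servers. The relationship between them is as follows: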

[Figure: Nagios managing remote services through NRPE]

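1. Nagios runs the check_nrpe plugin installed on the monitoring host and tells check_nrpe which services to check.

2. check_nrpe connects over SSL to the NRPE daemon on the remote machine.

3. NRPE runs local plugins (check_disk, etc.) to check local services and status.

4. NRPE passes the check results back to check_nrpe on the monitoring host, and check_nrpe puts them into the Nagios status queue.

5. Nagios reads the entries in the queue in order and displays the results.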
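Steps 1 and 2 can be tried by hand from the monitoring host. The sketch below assumes that check_nrpe is installed under /usr/local/nagios/libexec and that the NRPE daemon on node2 (192.168.1.152) defines a command named check_load; both are assumptions, since neither has been set up at this point in the article.

```shell
#!/bin/sh
# Assumed install path of the check_nrpe plugin on the monitoring host.
NRPE_PLUGIN=/usr/local/nagios/libexec/check_nrpe

# Build the command line Nagios would run: -H selects the remote host,
# -c names a command defined in the remote NRPE daemon's configuration.
nrpe_command_line() {
    printf '%s -H %s -c %s\n' "$NRPE_PLUGIN" "$1" "$2"
}

cmd=$(nrpe_command_line 192.168.1.152 check_load)   # check_load is an assumed command name
echo "would run: $cmd"

# Only run the check when the plugin is actually installed.
if [ -x "$NRPE_PLUGIN" ]; then
    status=0
    $cmd || status=$?
    echo "plugin exit status: $status"
fi
```

The exit status printed at the end is the same 0 to 3 value that Nagios reads from any local plugin.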

4. Test Environment

Host Name	OS	IP	Software
node1	rhel5.4	192.168.1.151 192.168.11.164	hadoop0.20.2, namenode, dns, nfs, apache, php, nagios, nagios-plugins
node2	rhel5.4	192.168.1.152 192.168.11.167	hadoop0.20.2, datanode, mysql, nagios-plugins, nrpe
node3	rhel5.4	192.168.1.153 192.168.11.166	hadoop0.20.2, datanode, hive

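node1 runs the Nagios software, processes the monitoring data, and provides the web interface for viewing and management. It can of course also monitor itself.

node2 runs the client side, including NRPE; it performs checks at the monitoring host's request and returns the results to it.

The firewall is disabled (iptables: Firewall is not running).

SELinux is disabled (SELINUX=disabled).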

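5. Test Objectives

Host name	Services to monitor
node1	CPU load
	Number of logged-in users
	Whether port 80 is open
	Whether the host is alive
	Usage of the / partition
	Total number of processes
	Whether the SSH service is running
	Swap usage
	Whether the DNS service is running
node2	Whether the host is alive
	datanode process
	MySQL database
node3	Whether the host is alive
	datanode process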

6. Installing the Nagios Server

6.1 Prerequisite packages: gcc glibc glibc-common gd gd-devel xinetd openssl-devel

[root@node1 nagios]# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.1.2-46.el5
glibc-2.5-42
glibc-common-2.5-42
gd-2.0.33-9.4.el5_4.2
gd-devel-2.0.33-9.4.el5_4.2
xinetd-2.3.14-10.el5
openssl-devel-0.9.8e-26.el5_9.1
----If any of these packages is missing, install it with yum

6.2 Create the nagios user and group

[root@node1 app]# useradd nagios
[root@node1 app]# mkdir /usr/local/nagios
[root@node1 app]# chown -R nagios:nagios /usr/local/nagios
[root@node1 app]# ll -d /usr/local/nagios/
drwxr-xr-x 2 nagios nagios 4096 Sep 24 12:02 /usr/local/nagios/

6.3 Compile and install Nagios

[root@node1 app]# cd nagios
[root@node1 nagios]# ./configure --prefix=/usr/local/nagios
*** Configuration summary for nagios 3.3.1 07-25-2011 ***:

 General Options:
 -------------------------
        Nagios executable:  nagios
        Nagios user/group:  nagios,nagios
       Command user/group:  nagios,nagios
            Embedded Perl:  no
             Event Broker:  yes
        Install ${prefix}:  /usr/local/nagios
                Lock file:  ${prefix}/var/nagios.lock
   Check result directory:  ${prefix}/var/spool/checkresults
           Init directory:  /etc/rc.d/init.d
  Apache conf.d directory:  /etc/httpd/conf.d
             Mail program:  /bin/mail
                  Host OS:  linux-gnu

 Web Interface Options:
 ------------------------
                 HTML URL:  http://localhost/nagios/
                  CGI URL:  http://localhost/nagios/cgi-bin/
 Traceroute (used by WAP):  /bin/traceroute


Review the options above for accuracy.  If they look okay,
type 'make all' to compile the main program and CGIs.

[root@node1 nagios]# make all
cd ./base && make
make[1]: Entering directory `/app/nagios/base'
*** Support Notes *******************************************

If you have questions about configuring or running Nagios,
please make sure that you:

     - Look at the sample config files
     - Read the documentation on the Nagios Library at:
           http://library.nagios.com

before you post a question to one of the mailing lists.
Also make sure to include pertinent information that could
help others help you.  This might include:

     - What version of Nagios you are using
     - What version of the plugins you are using
     - Relevant snippets from your config files
     - Relevant error messages from the Nagios log file

For more information on obtaining support for Nagios, visit:

       http://support.nagios.com

*************************************************************

Enjoy.

[root@node1 nagios]# make install
*** Main program, CGIs and HTML files installed ***

You can continue with installing Nagios as follows (type 'make'
without any arguments for a list of all possible options):

  make install-init
     - This installs the init script in /etc/rc.d/init.d

  make install-commandmode
     - This installs and configures permissions on the
       directory for holding the external command file

  make install-config
     - This installs sample config files in /usr/local/nagios/etc

make[1]: Leaving directory `/app/nagios'

[root@node1 nagios]# make install-init
/usr/bin/install -c -m 755 -d -o root -g root /etc/rc.d/init.d
/usr/bin/install -c -m 755 -o root -g root daemon-init /etc/rc.d/init.d/nagios

*** Init script installed ***

[root@node1 nagios]# make install-commandmode
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/var/rw
chmod g+s /usr/local/nagios/var/rw

*** External command directory configured ***

[root@node1 nagios]# make install-config
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/etc
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/etc/objects
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/nagios.cfg /usr/local/nagios/etc/nagios.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/cgi.cfg /usr/local/nagios/etc/cgi.cfg
/usr/bin/install -c -b -m 660 -o nagios -g nagios sample-config/resource.cfg /usr/local/nagios/etc/resource.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/templates.cfg /usr/local/nagios/etc/objects/templates.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/commands.cfg /usr/local/nagios/etc/objects/commands.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/contacts.cfg /usr/local/nagios/etc/objects/contacts.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/timeperiods.cfg /usr/local/nagios/etc/objects/timeperiods.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/localhost.cfg /usr/local/nagios/etc/objects/localhost.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/windows.cfg /usr/local/nagios/etc/objects/windows.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/printer.cfg /usr/local/nagios/etc/objects/printer.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/switch.cfg /usr/local/nagios/etc/objects/switch.cfg

*** Config files installed ***

Remember, these are *SAMPLE* config files.  You'll need to read
the documentation for more information on how to actually define
services, hosts, etc. to fit your particular needs.

[root@node1 nagios]# chkconfig --add nagios
[root@node1 nagios]# chkconfig --level 35 nagios on
[root@node1 nagios]# chkconfig --list nagios
nagios             0:off    1:off    2:off    3:on    4:on    5:on    6:off

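6.4 Verify that the program was installed correctly

Change to the installation path (here /usr/local/nagios) and check that the five directories etc, bin, sbin, share, and var exist. If they do, the program was installed correctly. The Nagios directories are used as follows:

bin	Nagios executables
etc	Nagios configuration files
sbin	Nagios CGI files, that is, the files needed to execute external commands
share	Nagios web pages
libexec	Nagios external plugins
var	Nagios log files, lock file, and similar files
var/archives	Automatically archived Nagios logs
var/rw	External command file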
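The directory check can also be scripted. This is a minimal sketch; the function name verify_nagios_layout is hypothetical, and it tests only the five directories named in the text (libexec appears once the plugins are installed in the next step).

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected directories exist
# under the given install prefix. Returns 0 if all are present, 1 otherwise.
verify_nagios_layout() {
    prefix=$1
    missing=0
    for dir in etc bin sbin share var; do
        if [ -d "$prefix/$dir" ]; then
            echo "found:   $prefix/$dir"
        else
            echo "missing: $prefix/$dir"
            missing=1
        fi
    done
    return "$missing"
}

verify_nagios_layout /usr/local/nagios || echo "installation looks incomplete"
```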

6.5 Install the Nagios plugins

[root@node1 nagios-plugins-1.4.15]# ./configure --prefix=/usr/local/nagios
config.status: creating po/Makefile
            --with-apt-get-command: 
              --with-ping6-command: /bin/ping6 -n -U -w %d -c %d %s
               --with-ping-command: /bin/ping -n -U -w %d -c %d %s
                       --with-ipv6: yes
                      --with-mysql: no
                    --with-openssl: yes
                     --with-gnutls: no
               --enable-extra-opts: no
                       --with-perl: /usr/bin/perl
             --enable-perl-modules: no
                     --with-cgiurl: /nagios/cgi-bin
               --with-trusted-path: /bin:/sbin:/usr/bin:/usr/sbin
                   --enable-libtap: no
[root@node1 nagios-plugins-1.4.15]# make && make install

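6.6 Install and configure Apache and PHP
Apache and PHP are not strictly required to install Nagios, but Nagios provides a web monitoring interface that clearly shows the running state of monitored hosts and resources, so installing a web server is well worth it.
Note that from Nagios 3.1.x onward, the web monitoring interface requires PHP support. The Nagios version used here (3.3.1, according to the configure output above) is later than that, so after compiling and installing Apache you also need to build the PHP module; the PHP version chosen here is PHP 5.4.10.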

a. Install Apache

# wget http://archive.apache.org/dist/httpd/httpd-2.2.23.tar.gz

# tar zxvf httpd-2.2.23.tar.gz

# cd httpd-2.2.23

# ./configure --prefix=/usr/local/apache2

# make && make install

If an error occurs (typically one about APR), add --with-included-apr to the configure command to resolve it.

b. Install PHP

# wget http://cn2.php.net/distributions/php-5.4.10.tar.gz

# tar zxvf php-5.4.10.tar.gz

# cd php-5.4.10

# ./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs 

# make && make install

c. Configure Apache
Open the Apache configuration file /usr/local/apache2/conf/httpd.conf

----Find:
User daemon
Group daemon
----Change to:
User nagios
Group nagios
----Then find:
<IfModule dir_module>
   DirectoryIndex index.html
</IfModule>
----Change to:
<IfModule dir_module>
   DirectoryIndex index.html index.php
</IfModule>
----Then add the following line:
AddType application/x-httpd-php .php

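For security, access to the Nagios web monitoring pages should normally require authorization. This needs authentication settings, so append the following to the end of httpd.conf: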

#setting for nagios
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
     AuthType Basic
     Options ExecCGI
     AllowOverride None
     Order allow,deny
     Allow from all
     AuthName "Nagios Access"
     # File used to authenticate access to this directory
     AuthUserFile /usr/local/nagios/etc/htpasswd
     Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
     AuthType Basic
     Options None
     AllowOverride None
     Order allow,deny
     Allow from all
     AuthName "nagios Access"
     AuthUserFile /usr/local/nagios/etc/htpasswd
     Require valid-user
</Directory>

d. Create the Apache authentication file
The configuration above references the authentication file htpasswd; create it now:

# /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd david

This creates the htpasswd authentication file in /usr/local/nagios/etc. Accessing 192.168.11.164/nagios/ now requires a user name and password.

e. View the contents of the authentication file

# cat /usr/local/nagios/etc/htpasswd

f. Start the Apache service

# /usr/local/apache2/bin/apachectl start

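At this point the Nagios installation is essentially complete, and you can access it through the web interface.

7. Configuring Nagios

Nagios is mainly used to monitor various information about one or more local and remote hosts, including local resources and externally offered services. The default configuration consists mostly of sample template files rather than a real monitoring setup. To make Nagios useful you must modify the configuration files and add the hosts and services to monitor, as described in detail below.

7.1 Default configuration files

After installation, the default configuration files are in /usr/local/nagios/etc.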

[root@node1 ~]# cd /usr/local/nagios/
[root@node1 nagios]# ls
bin  etc  include  libexec  sbin  share  var
[root@node1 nagios]# cd etc/
[root@node1 etc]# ls
cgi.cfg  contacts.cfg  hosts.cfg  htpasswd  nagios.cfg  objects  resource.cfg  services.cfg  timeperiods.cfg
[root@node1 etc]# cd objects/
[root@node1 objects]# ls
commands.cfg  localhost.cfg  switch.cfg     templates.cfg.bak  windows.cfg
contacts.cfg  printer.cfg    templates.cfg  timeperiods.cfg

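The files and directories have the following purposes:

File or directory	Purpose
cgi.cfg	Configuration file controlling CGI access
nagios.cfg	Main Nagios configuration file
resource.cfg	Variable definition file, also called the resource file; variables such as $USER1$ are defined here so that other configuration files can reference them
objects	A directory containing many configuration file templates used to define Nagios objects
objects/commands.cfg	Command definitions, which other configuration files can reference
objects/contacts.cfg	Definitions of contacts and contact groups
objects/localhost.cfg	Definitions for monitoring the local host
objects/printer.cfg	Template for monitoring printers; not enabled by default
objects/switch.cfg	Template for monitoring routers; not enabled by default
objects/templates.cfg	Host and service templates, which other configuration files can reference
objects/timeperiods.cfg	Definitions of Nagios monitoring time periods
objects/windows.cfg	Template for monitoring Windows hosts; not enabled by default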

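7.2 Relationships between configuration files

Configuring Nagios involves several kinds of definitions: hosts and host groups, services and service groups, contacts and contact groups, monitoring time periods, monitoring commands, and so on. As these definitions suggest, the Nagios configuration files are interrelated and reference one another.

To configure a working Nagios monitoring system, you must understand the dependencies between the configuration files. Four points matter most:

First: define which hosts, host groups, services, and service groups to monitor;

Second: define which command carries out each check;

Third: define the time periods during which monitoring takes place;

Fourth: define the contacts and contact groups to notify when a host or service has a problem.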
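The four points above can be sketched in a single pair of object definitions. The snippet writes an illustrative file to a scratch directory, not to /usr/local/nagios/etc. The template names (linux-server, local-service), the command check_local_load, the time period 24x7, and the contact group ts all appear in the sample files shown in this article; the service description and the load thresholds are assumptions chosen for illustration.

```shell
#!/bin/sh
# Write an illustrative object file to a scratch directory for inspection.
scratch=$(mktemp -d)
cat > "$scratch/example.cfg" <<'EOF'
# (1) What to monitor: a host and one service on it
define host{
        use                     linux-server        ; host template from templates.cfg
        host_name               node1
        address                 192.168.1.151
        }

define service{
        use                     local-service       ; service template from templates.cfg
        host_name               node1
        service_description     Current Load        ; illustrative name
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0   ; (2) which command; thresholds are illustrative
        check_period            24x7                ; (3) when to check
        contact_groups          ts                  ; (4) whom to notify
        }
EOF
echo "wrote $scratch/example.cfg"
```

Each value on the right-hand side is only a name; Nagios resolves it against a definition in another file, which is why the files have to be kept consistent with one another.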

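7.3 Configuring Nagios

To make things clearer and easier to maintain, it is advisable to give each kind of Nagios object definition its own configuration file:

Create hosts.cfg to define hosts and host groups

Create services.cfg to define services

Use the default contacts.cfg to define contacts and contact groups

Use the default commands.cfg to define commands

Use the default timeperiods.cfg to define monitoring time periods

Use the default templates.cfg as the template reference file

a. The templates.cfg file

Nagios mainly monitors host resources and services, which are called objects in the Nagios configuration. To avoid defining the same settings repeatedly, Nagios provides a template configuration file in which common attributes are defined as templates that can be referenced many times. That is the purpose of templates.cfg.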

----You may need to change contact_groups in this file----
[root@node1 objects]# cat templates.cfg
###############################################################################
# TEMPLATES.CFG - SAMPLE OBJECT TEMPLATES
#
# Last Modified: 10-03-2007
#
# NOTES: This config file provides you with some example object definition
#        templates that are refered by other host, service, contact, etc.
#        definitions in other config files.
#       
#        You don't need to keep these definitions in a separate file from your
#        other object definitions.  This has been done just to make things
#        easier to understand.
#
###############################################################################



###############################################################################
###############################################################################
#
# CONTACT TEMPLATES
#
###############################################################################
###############################################################################

# Generic contact definition template - This is NOT a real contact, just a template!

define contact{
        name                            generic-contact        ; The name of this contact template
        service_notification_period     24x7            ; service notifications can be sent anytime
        host_notification_period        24x7            ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s        ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s        ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email    ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }




###############################################################################
###############################################################################
#
# HOST TEMPLATES
#
###############################################################################
###############################################################################

# Generic host definition template - This is NOT a real host, just a template!

define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1           ; Host notifications are enabled
        event_handler_enabled           1           ; Host event handler is enabled
        flap_detection_enabled          1           ; Flap detection is enabled
        failure_prediction_enabled      1           ; Failure prediction is enabled
        process_perf_data               1           ; Process performance data
        retain_status_information       1           ; Retain status information across program restarts
        retain_nonstatus_information    1           ; Retain non-status information across program restarts
    notification_period        24x7        ; Send host notifications at any time
        register                        0           ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }


# Linux host definition template - This is NOT a real host, just a template!

define host{
    name                linux-server    ; The name of this host template
    use                generic-host    ; This template inherits other values from the generic-host template
    check_period            24x7        ; By default, Linux hosts are checked round the clock
    check_interval            1        ; Actively check the host every 1 minute
    retry_interval            1        ; Schedule host check retries at 1 minute intervals
    max_check_attempts        2        ; Check each Linux host 2 times (max)
        check_command               check-host-alive ; Default command to check Linux hosts
    notification_period        workhours    ; Linux admins hate to be woken up, so we only notify during the day
                            ; Note that the notification_period variable is being overridden from
                            ; the value that is inherited from the generic-host template!
    notification_interval        120        ; Resend notifications every 2 hours
    notification_options        d,u,r        ; Only send notifications for specific host states
    contact_groups            ts        ; Notifications get sent to the admins by default
    notifications_enabled           1
        register            0        ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
    }
----linux-server3 and linux-server2 below were newly added----
define host{
        name                            linux-server3    ; The name of this host template
        use                             generic-host    ; This template inherits other values from the generic-host template
        check_period                    24x7            ; By default, Linux hosts are checked round the clock
        check_interval                  1               ; Actively check the host every 1 minute
        retry_interval                  1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts              2               ; Check each Linux host 2 times (max)
        check_command                   check-host-alive ; Default command to check Linux hosts
        notification_period             workhours       ; Linux admins hate to be woken up, so we only notify during the day
                                                        ; Note that the notification_period variable is being overridden from
                                                        ; the value that is inherited from the generic-host template!
        notification_interval           120             ; Resend notifications every 2 hours
        notification_options            d,u,r           ; Only send notifications for specific host states
        contact_groups                  ts              ; Notifications get sent to the admins by default
        notifications_enabled           1
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

define host{
        name                            linux-server2    ; The name of this host template
        use                             generic-host    ; This template inherits other values from the generic-host template
        check_period                    24x7            ; By default, Linux hosts are checked round the clock
        check_interval                  5               ; Actively check the host every 5 minutes
        retry_interval                  1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts              10              ; Check each Linux host 10 times (max)
        check_command                   check-host-alive ; Default command to check Linux hosts
        notification_period             workhours       ; Linux admins hate to be woken up, so we only notify during the day
                                                        ; Note that the notification_period variable is being overridden from
                                                        ; the value that is inherited from the generic-host template!
        notification_interval           120             ; Resend notifications every 2 hours
        notification_options            d,u,r           ; Only send notifications for specific host states
        contact_groups                  ts              ; Notifications get sent to the admins by default
        notifications_enabled           1
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }


# Windows host definition template - This is NOT a real host, just a template!

define host{
    name            windows-server    ; The name of this host template
    use            generic-host    ; Inherit default values from the generic-host template
    check_period        24x7        ; By default, Windows servers are monitored round the clock
    check_interval        5        ; Actively check the server every 5 minutes
    retry_interval        1        ; Schedule host check retries at 1 minute intervals
    max_check_attempts    10        ; Check each server 10 times (max)
    check_command        check-host-alive    ; Default command to check if servers are "alive"
    notification_period    24x7        ; Send notification out at any time - day or night
    notification_interval    30        ; Resend notifications every 30 minutes
    notification_options    d,r        ; Only send notifications for specific host states
    contact_groups        ts        ; Notifications get sent to the admins by default
    hostgroups        windows-servers ; Host groups that Windows servers should be a member of
    register        0        ; DONT REGISTER THIS - ITS JUST A TEMPLATE
    }


# We define a generic printer template that can be used for most printers we monitor

define host{
    name            generic-printer    ; The name of this host template
    use            generic-host    ; Inherit default values from the generic-host template
    check_period        24x7        ; By default, printers are monitored round the clock
    check_interval        5        ; Actively check the printer every 5 minutes
    retry_interval        1        ; Schedule host check retries at 1 minute intervals
    max_check_attempts    10        ; Check each printer 10 times (max)
    check_command        check-host-alive    ; Default command to check if printers are "alive"
    notification_period    workhours        ; Printers are only used during the workday
    notification_interval    30        ; Resend notifications every 30 minutes
    notification_options    d,r        ; Only send notifications for specific host states
    contact_groups        ts        ; Notifications get sent to the admins by default
    register        0        ; DONT REGISTER THIS - ITS JUST A TEMPLATE
    }


# Define a template for switches that we can reuse
define host{
    name            generic-switch    ; The name of this host template
    use            generic-host    ; Inherit default values from the generic-host template
    check_period        24x7        ; By default, switches are monitored round the clock
    check_interval        5        ; Switches are checked every 5 minutes
    retry_interval        1        ; Schedule host check retries at 1 minute intervals
    max_check_attempts    10        ; Check each switch 10 times (max)
    check_command        check-host-alive    ; Default command to check if routers are "alive"
    notification_period    24x7        ; Send notifications at any time
    notification_interval    30        ; Resend notifications every 30 minutes
    notification_options    d,r        ; Only send notifications for specific host states
    contact_groups        ts        ; Notifications get sent to the admins by default
    register        0        ; DONT REGISTER THIS - ITS JUST A TEMPLATE
    }




###############################################################################
###############################################################################
#
# SERVICE TEMPLATES
#
###############################################################################
###############################################################################

# Generic service definition template - This is NOT a real service, just a template!

define service{
        name                            generic-service     ; The 'name' of this service template
        active_checks_enabled           1               ; Active service checks are enabled
        passive_checks_enabled          1                   ; Passive service checks are enabled/accepted
        parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1               ; We should obsess over this service (if necessary)
        check_freshness                 0               ; Default is to NOT check service 'freshness'
        notifications_enabled           1               ; Service notifications are enabled
        event_handler_enabled           1               ; Service event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        is_volatile                     0               ; The service is not volatile
        check_period                    24x7            ; The service can be checked at any time of the day
        max_check_attempts              3            ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10            ; Check the service every 10 minutes under normal conditions
        retry_check_interval            2            ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  ts            ; Notifications get sent out to everyone in the 'ts' group
    notification_options        w,u,c,r            ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60            ; Re-notify about service problems every hour
        notification_period             24x7            ; Notifications can be sent out at any time
         register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }


# Local service definition template - This is NOT a real service, just a template!

define service{
    name                local-service         ; The name of this service template
    use                generic-service        ; Inherit default values from the generic-service definition
        max_check_attempts              4            ; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval           5            ; Check the service every 5 minutes under normal conditions
        retry_check_interval            1            ; Re-check the service every minute until a hard state can be determined
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }

b. The resource.cfg file
resource.cfg is Nagios's variable definition file. Its content is a single line:

[root@node1 etc]# cat resource.cfg 
$USER1$=/usr/local/nagios/libexec

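Here the variable $USER1$ specifies the path where the Nagios plugins are installed; if the plugins are installed elsewhere, only this line needs to change. Note that a variable must be defined before it can be referenced in other configuration files.

c. The commands.cfg file

This file exists by default and can be used without modification; when a new command is needed, add it to this file.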
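To illustrate how such a variable ends up in an actual command line, the sketch below imitates the substitution in plain shell. It is not how Nagios implements macros; the function expand_command is hypothetical, and it assumes the arguments contain no characters that are special to sed or to shell globbing. The command line is the check_local_load definition from commands.cfg, and the thresholds are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: expand_command USER1_PATH COMMAND_LINE CHECK_COMMAND
# CHECK_COMMAND has the form name!arg1!arg2..., as written in a service definition.
expand_command() {
    user1=$1
    command_line=$2
    check_command=$3

    # Split on "!": the first field is the command name, the rest become $ARGn$.
    old_ifs=$IFS
    IFS='!'
    set -- $check_command
    IFS=$old_ifs
    shift                                   # drop the command name

    # Replace $USER1$ with the plugin directory.
    result=$(printf '%s\n' "$command_line" | sed -e 's|\$USER1\$|'"$user1"'|g')

    # Replace $ARG1$, $ARG2$, ... with the remaining fields in order.
    n=1
    for arg in "$@"; do
        result=$(printf '%s\n' "$result" | sed -e 's|\$ARG'"$n"'\$|'"$arg"'|g')
        n=$((n + 1))
    done
    printf '%s\n' "$result"
}

expand_command /usr/local/nagios/libexec \
    '$USER1$/check_load -w $ARG1$ -c $ARG2$' \
    'check_local_load!5.0,4.0,3.0!10.0,6.0,4.0'
# prints: /usr/local/nagios/libexec/check_load -w 5.0,4.0,3.0 -c 10.0,6.0,4.0
```

Changing the first argument is the scripted equivalent of editing the $USER1$ line in resource.cfg: every command that references the variable picks up the new path.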

[root@node1 etc]# cat objects/commands.cfg 
###############################################################################
# COMMANDS.CFG - SAMPLE COMMAND DEFINITIONS FOR NAGIOS 3.3.1
#
# Last Modified: 05-31-2007
#
# NOTES: This config file provides you with some example command definitions
#        that you can reference in host, service, and contact definitions.
#       
#        You don't need to keep commands in a separate file from your other
#        object definitions.  This has been done just to make things easier to
#        understand.
#
###############################################################################


################################################################################
#
# SAMPLE NOTIFICATION COMMANDS
#
# These are some example notification commands.  They may or may not work on
# your system without modification.  As an example, some systems will require 
# you to use "/usr/bin/mailx" instead of "/usr/bin/mail" in the commands below.
#
################################################################################


# 'notify-host-by-email' command definition
define command{
    command_name    notify-host-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" |/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
    }

# 'notify-service-by-email' command definition
define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" |/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
    }





################################################################################
#
# SAMPLE HOST CHECK COMMANDS
#
################################################################################


# This command checks to see if a host is "alive" by pinging it
# The check must result in a 100% packet loss or 5 second (5000ms) round trip 
# average time to produce a critical error.
# Note: Five ICMP echo packets are sent (determined by the '-p 5' argument)

# 'check-host-alive' command definition
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }




################################################################################
#
# SAMPLE SERVICE CHECK COMMANDS
#
# These are some example service check commands.  They may or may not work on
# your system, as they must be modified for your plugins.  See the HTML 
# documentation on the plugins for examples of how to configure command definitions.
#
# NOTE:  The following 'check_local_...' functions are designed to monitor
#        various metrics on the host that Nagios is running on (i.e. this one).
################################################################################

# 'check_local_disk' command definition
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
        }


# 'check_local_load' command definition
define command{
        command_name    check_local_load
        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
        }


# 'check_local_procs' command definition
define command{
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
        }


# 'check_local_users' command definition
define command{
        command_name    check_local_users
        command_line    $USER1$/check_users -w $ARG1$ -c $ARG2$
        }


# 'check_local_swap' command definition
define command{
    command_name    check_local_swap
    command_line    $USER1$/check_swap -w $ARG1$ -c $ARG2$
    }


# 'check_local_mrtgtraf' command definition
define command{
    command_name    check_local_mrtgtraf
    command_line    $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
    }


################################################################################
# NOTE:  The following 'check_...' commands are used to monitor services on
#        both local and remote hosts.
################################################################################

# 'check_ftp' command definition
define command{
        command_name    check_ftp
        command_line    $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_hpjd' command definition
define command{
        command_name    check_hpjd
        command_line    $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
        }


# 'check_snmp' command definition
define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }


# 'check_ssh' command definition
define command{
    command_name    check_ssh
    command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
    }


# 'check_dhcp' command definition
define command{
    command_name    check_dhcp
    command_line    $USER1$/check_dhcp $ARG1$
    }


# 'check_ping' command definition
define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }


# 'check_pop' command definition
define command{
        command_name    check_pop
        command_line    $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
        }


# 'check_imap' command definition
define command{
        command_name    check_imap
        command_line    $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
        }


# 'check_smtp' command definition
define command{
        command_name    check_smtp
        command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_tcp' command definition
define command{
    command_name    check_tcp
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
    }


# 'check_udp' command definition
define command{
    command_name    check_udp
    command_line    $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
    }


# 'check_nt' command definition
define command{
    command_name    check_nt
    command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
    }



################################################################################
#
# SAMPLE PERFORMANCE DATA COMMANDS
#
# These are sample performance data commands that can be used to send performance
# data output to two text files (one for hosts, another for services).  If you
# plan on simply writing performance data out to a file, consider using the 
# host_perfdata_file and service_perfdata_file options in the main config file.
#
################################################################################


# 'process-host-perfdata' command definition
define command{
    command_name    process-host-perfdata
    command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
    }


# 'process-service-perfdata' command definition
define command{
    command_name    process-service-perfdata
    command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
    }

# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
# ---- The following three commands are newly added
define command{
        command_name    check_jps
        command_line    /usr/local/nagios/libexec/check_jps $ARG1$ $ARG2$
        }

define command{
        command_name    check_zhulh
        command_line    /usr/local/nagios/libexec/check_zhulh $ARG1$ $ARG2$
        }

define command{
        command_name    check_jps2
        command_line    /usr/local/nagios/libexec/check_jps2 $ARG1$ $ARG2$
        }
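
The command definitions above rely on the $ARGn$ macros: a service refers to a command as name!arg1!arg2, and Nagios substitutes the "!"-separated values for $ARG1$, $ARG2$ and so on before running command_line. The following is a minimal sketch of that substitution only, assuming a hypothetical helper named expand_args; it is not how Nagios itself is implemented, and other macros such as $USER1$ are left untouched.

```shell
#!/bin/sh
# expand_args is a hypothetical helper, not part of Nagios.
# $1: a command_line template, $2: a check_command value such as name!a!b
expand_args() {
    awk -v line="$1" -v cmd="$2" 'BEGIN {
        n = split(cmd, parts, "!")          # parts[1] is the command name
        for (i = 2; i <= n; i++) {
            macro = "$ARG" (i - 1) "$"
            out = ""
            # Replace every literal occurrence of the macro.
            while ((pos = index(line, macro)) > 0) {
                out = out substr(line, 1, pos - 1) parts[i]
                line = substr(line, pos + length(macro))
            }
            line = out line
        }
        print line
    }'
}

expand_args '$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' 'check_local_disk!20%!10%!/'
# prints: $USER1$/check_disk -w 20% -c 10% -p /
```

Seeing the expanded line makes it easier to run the same plugin by hand with the same arguments when a check misbehaves.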

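d. The hosts.cfg file

This file does not exist by default and has to be created by hand. hosts.cfg specifies the addresses of the monitored hosts and their related attributes. For this setup it is configured as follows: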

[root@node1 etc]# cat hosts.cfg 
define host{
        use                     linux-server2
        host_name               node2
        alias                   Nagios-node2
        address                 192.168.11.167
        }
define host{
        use                     linux-server3
        host_name               node3
        alias                   Nagios-node3
        address                 192.168.11.166
        }
define hostgroup{      
        hostgroup_name          bsmart-servers      
        alias                   bsmart servers        
        members                 node2,node3
        }
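
As a quick sanity check, the host definitions can be listed as name and address pairs. This is only a sketch: list_hosts is a hypothetical helper, and it assumes the simple one-directive-per-line layout shown above rather than the full Nagios object syntax.

```shell
#!/bin/sh
# list_hosts is a hypothetical helper: print "host_name address" for each
# "define host{ ... }" block in the given file (hostgroup blocks are skipped).
list_hosts() {
    awk '
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { in_host = 1; name = ""; addr = ""; next }
        in_host && $1 == "host_name"       { name = $2 }
        in_host && $1 == "address"         { addr = $2 }
        in_host && /[}]/                   { print name, addr; in_host = 0 }
    ' "$1"
}

# Demo on a temporary copy of part of the hosts.cfg shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
define host{
        use                     linux-server2
        host_name               node2
        alias                   Nagios-node2
        address                 192.168.11.167
        }
define hostgroup{
        hostgroup_name          bsmart-servers
        members                 node2,node3
        }
EOF
list_hosts "$sample"    # prints: node2 192.168.11.167
rm -f "$sample"
```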

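Note: /usr/local/nagios/etc/objects contains two configuration files by default, localhost.cfg and windows.cfg. localhost.cfg defines monitoring of the monitoring host itself, and windows.cfg defines monitoring of Windows hosts; both contain host definitions and the related service definitions. Adjust them to your needs. The details are as follows: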

localhost.cfg

[root@node1 etc]# cat objects/localhost.cfg 
###############################################################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
# Last Modified: 05-31-2007
#
# NOTE: This config file is intended to serve as an *extremely* simple 
#       example of how you can create configuration entries to monitor
#       the local (Linux) machine.
#
###############################################################################




###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the local machine

define host{
        use                     linux-server            ; Name of host template to use
                            ; This host definition will inherit all variables that are defined
                            ; in (or inherited by) the linux-server host template definition.
        host_name               node1
        alias                   node1
        address                 192.168.11.164
        }



###############################################################################
###############################################################################
#
# HOST GROUP DEFINITION
#
###############################################################################
###############################################################################

# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name  linux-servers ; The name of the hostgroup
        alias           Linux Servers ; Long name of the group
        members         node1     ; Comma separated list of hosts that belong to this group
        }



###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################


# Define a service to "ping" the local machine

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }


# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             Root Partition
        check_command                   check_local_disk!20%!10%!/
        }



# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 20 users, critical
# if > 50 users.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             Current Users
        check_command                   check_local_users!20!50
        }


# Define a service to check the number of currently running procs
# on the local machine.  Warning if > 250 processes, critical if
# > 400 processes.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             Total Processes
        check_command                   check_local_procs!250!400!RSZDT
        }



# Define a service to check the load on the local machine. 

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             Current Load
        check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }



# Define a service to check the swap usage the local machine. 
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             Swap Usage
        check_command                   check_local_swap!20!10
        }



# Define a service to check SSH on the local machine.
# Notifications are enabled for this service here (notifications_enabled 1); set it to 0 if SSH is not in use.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             SSH
        check_command                   check_ssh
        notifications_enabled           1
        }



# Define a service to check HTTP on the local machine.
# Notifications are enabled for this service here (notifications_enabled 1); set it to 0 if HTTP is not in use.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             HTTP
        check_command                   check_http
        notifications_enabled           1
        }

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node1
        service_description             dns on node1
        check_command                   check_jps!dns!node1
        notifications_enabled           1
        }

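windows.cfg is omitted here.

e. The services.cfg file

This file does not exist by default either and has to be created by hand. services.cfg defines the services and host resources to monitor, for example the HTTP service, the FTP service, host disk space and host system load.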

[root@node1 etc]# cat services.cfg 

define service{
        use                     local-service
        host_name               node3
        service_description     check-host-alive
        check_command           check-host-alive
        }  

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node3
        service_description             datanode on node3
        check_command                   check_jps2!DataNode!node3
        notifications_enabled           1
        }

define service{
        use                     local-service
        host_name               node2
        service_description     check-host-alive
        check_command           check-host-alive
        }  

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node2
        service_description             datanode on node2
        check_command                   check_jps2!DataNode!node2
        notifications_enabled           1
        }


define service{
        use                             local-service
        host_name                       node2
        service_description             mysql
        check_command                   check_nrpe!check_mysql
        notifications_enabled           1
        check_interval                  1               ; Actively check the service every 1 minute
        retry_interval                  1               ; Schedule service check retries at 1 minute intervals
        max_check_attempts              2    
        }
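
check_interval and retry_interval are expressed in units of interval_length seconds (60 in the nagios.cfg used in this article), so the mysql service above is checked once a minute. As a rough sketch that ignores scheduling latency, the time between the first failed check and the hard problem state that triggers a notification is (max_check_attempts - 1) retry intervals. The helper names below are hypothetical and only handle whole numbers.

```shell
#!/bin/sh
# Hypothetical helpers; integer arithmetic only.

# Seconds between regular checks: check_interval * interval_length.
check_period_seconds() {
    echo $(( $1 * $2 ))
}

# Approximate seconds from the first failed check to a hard state:
# (max_check_attempts - 1) * retry_interval * interval_length.
hard_state_delay_seconds() {
    echo $(( ($1 - 1) * $2 * $3 ))
}

check_period_seconds 1 60          # prints: 60
hard_state_delay_seconds 2 1 60    # prints: 60
```

With the values above, a MySQL failure on node2 should therefore become a hard problem about one minute after it is first detected.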

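f. The contacts.cfg file

contacts.cfg defines contacts and contact groups. When a monitored host or service fails, Nagios sends a notification to the contacts listed here using the configured notification method (email or SMS).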

[root@node1 etc]# cat contacts.cfg 
define contact{
        contact_name                    David           
        use                             generic-contact 
        alias                           Nagios Admin
        email                           zlh200868@gmail.com
        }
define contact{
        contact_name                    Jack
        use                             generic-contact
        alias                           Nagios Admin2
        email                           zlh10@163.com
        }

define contactgroup{
        contactgroup_name       ts                             
        alias                   Technical Support               
        members                 David,Jack                 
        }

g. The timeperiods.cfg file

This file is mainly used to define monitoring time periods. A configured example:

[root@node1 etc]# cat timeperiods.cfg 

define timeperiod{  
        timeperiod_name 24x7  
        alias           24 Hours A Day, 7 Days A Week  
        sunday          00:00-24:00  
        monday          00:00-24:00  
        tuesday         00:00-24:00  
        wednesday       00:00-24:00  
        thursday        00:00-24:00  
        friday          00:00-24:00  
        saturday        00:00-24:00  
        }
define timeperiod{  
        timeperiod_name workhours   
        alias           Normal Work Hours  
        monday          09:00-17:00  
        tuesday         09:00-17:00  
        wednesday       09:00-17:00  
        thursday        09:00-17:00  
        friday          09:00-17:00  
        }
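
To make the HH:MM-HH:MM ranges concrete, the sketch below tests whether a time of day falls inside one range. It is an illustration only: in_timerange and to_minutes are hypothetical helpers, weekdays are not considered, times must be written as two-digit HH:MM, and treating the end of the range as exclusive is an assumption rather than documented Nagios behaviour.

```shell
#!/bin/sh
# Convert a two-digit HH:MM value to minutes since midnight.
# One leading zero is stripped so the shell does not read "09" as octal.
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    echo $(( ${h#0} * 60 + ${m#0} ))
}

# in_timerange RANGE TIME: succeed if TIME lies in RANGE (start inclusive,
# end exclusive). Example: in_timerange 09:00-17:00 13:30
in_timerange() {
    start=$(to_minutes "${1%%-*}")
    end=$(to_minutes "${1##*-}")
    now=$(to_minutes "$2")
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
}

if in_timerange 09:00-17:00 13:30; then echo "13:30 is inside workhours"; fi
if ! in_timerange 09:00-17:00 18:00; then echo "18:00 is outside workhours"; fi
```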

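h. The cgi.cfg file

This file controls the CGI scripts. It has to be configured if you want to run CGI actions from the Nagios web interface, such as restarting the Nagios process, disabling Nagios notifications or stopping Nagios host checks. Because the user that authenticates against the Nagios web interface is david, it is enough to grant this user the relevant permissions in cgi.cfg. The settings to change are: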

default_user_name=david
authorized_for_system_information=nagiosadmin,david  
authorized_for_configuration_information=nagiosadmin,david  
authorized_for_system_commands=david
authorized_for_all_services=nagiosadmin,david  
authorized_for_all_hosts=nagiosadmin,david
authorized_for_all_service_commands=nagiosadmin,david  
authorized_for_all_host_commands=nagiosadmin,david

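i. The nagios.cfg file

nagios.cfg, located by default at /usr/local/nagios/etc/nagios.cfg, is the core Nagios configuration file. Every object configuration file must be declared in it to take effect, so here it is enough to reference the object configuration files from nagios.cfg.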

# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg


# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
#cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg

# Definitions for monitoring a Windows machine
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

# Definitions for monitoring a router/switch
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg

status_update_interval=10

nagios_user=nagios
nagios_group=nagios

check_external_commands=0

command_check_interval=10s

interval_length=60
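
Since only uncommented cfg_file lines are loaded, it can help to list them and confirm that each file exists before validating the configuration. The following is a sketch with hypothetical helper names; it handles cfg_file= lines only and ignores cfg_dir.

```shell
#!/bin/sh
# Print the path of every uncommented cfg_file= line in a main config file.
active_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1"
}

# Report referenced object files that do not exist.
report_missing_cfg_files() {
    active_cfg_files "$1" | while IFS= read -r path; do
        [ -f "$path" ] || echo "missing: $path"
    done
}

# Demo in a temporary directory, so no Nagios installation is needed.
demo_dir=$(mktemp -d)
touch "$demo_dir/commands.cfg"
cat > "$demo_dir/nagios.cfg" <<EOF
cfg_file=$demo_dir/commands.cfg
#cfg_file=$demo_dir/contacts.cfg
cfg_file=$demo_dir/hosts.cfg
EOF
report_missing_cfg_files "$demo_dir/nagios.cfg"   # reports only hosts.cfg as missing
rm -rf "$demo_dir"
```

On the monitoring host the same function would be called with /usr/local/nagios/etc/nagios.cfg.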

7.4 Verifying the Nagios configuration files

Nagios does a very thorough job of validating its configuration; a single command is enough:

[root@node1 etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

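This validation feature is very useful: error messages usually name the offending configuration file and the line within it, which makes configuring Nagios much easier. Warnings can usually be ignored, since they are generally only suggestions.
Output like the above means the configuration is fine, and the Nagios service can then be started.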
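
Because the pre-flight check exits with a non-zero status when it finds errors, it can gate a restart. The wrapper below is a minimal sketch; verify_then is a hypothetical name, and the binary and configuration paths are passed in rather than assumed.

```shell
#!/bin/sh
# verify_then NAGIOS_BIN CFG COMMAND...: run COMMAND only if
# "NAGIOS_BIN -v CFG" succeeds. Hypothetical helper, not part of Nagios.
verify_then() {
    nagios_bin=$1
    cfg=$2
    shift 2
    if "$nagios_bin" -v "$cfg" >/dev/null 2>&1; then
        "$@"
    else
        echo "configuration check failed; not running: $*" >&2
        return 1
    fi
}

# Usage on the monitoring host (left commented out because it needs a
# real installation and root privileges):
# verify_then /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg service nagios restart
```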

8. Starting and stopping Nagios

8.1 Starting Nagios

service nagios start

8.2 Starting Nagios manually
Start the Nagios daemon with the "-d" option of the nagios command:

# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

8.3 Stopping Nagios manually

# kill <nagios_pid>
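
The nagios_pid placeholder is the PID of the running daemon. Nagios records it in the file named by the lock_file directive in nagios.cfg. The sketch below reads the PID from such a file; stop_from_pidfile is a hypothetical helper, and the path in the usage comment is an assumption based on the default source-install layout, so check your own nagios.cfg.

```shell
#!/bin/sh
# stop_from_pidfile FILE: send SIGTERM to the PID stored in FILE.
stop_from_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "cannot read PID file: $pidfile" >&2
        return 1
    fi
    pid=$(tr -cd '0-9' < "$pidfile")    # keep digits only
    if [ -z "$pid" ]; then
        echo "no PID found in $pidfile" >&2
        return 1
    fi
    kill "$pid"
}

# Usage (assumed default path, needs sufficient privileges):
# stop_from_pidfile /usr/local/nagios/var/nagios.lock
```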

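9. Using NRPE to monitor "local information" on remote Linux hosts

The sections above already monitor whether the remote Linux hosts are alive, which can be checked with ping. Other services on remote hosts, such as FTP, SSH and HTTP, are exposed to the network, so even without Nagios you could test them from any machine by trying to reach them. "Local information" such as disk capacity and CPU load is different: Nagios can only measure it on the host it runs on, because these values cannot be read without suitable access to the monitored host. To solve this, Nagios has an add-on called NRPE, which makes it possible to monitor "local information" on Linux hosts.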

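9.1 How NRPE works

NRPE consists of two parts:

    •the check_nrpe plugin, located on the monitoring host
    •the NRPE daemon, running on the remote Linux host (usually the monitored machine)

As shown in the figure above, the monitoring process is as follows.

When Nagios needs to monitor a service or resource on a remote Linux host:

Nagios runs the check_nrpe plugin and tells it what to check;

the check_nrpe plugin connects to the remote NRPE daemon over SSL;

the NRPE daemon runs the corresponding Nagios plugin to perform the check;

the NRPE daemon returns the result to the check_nrpe plugin, which passes it to Nagios for processing.

Note: the NRPE daemon requires the Nagios plugins to be installed on the remote Linux host; without them the daemon cannot monitor anything.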
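
The result that travels back through this chain is a plugin's exit status plus its line of output. The sketch below shows that convention with two hypothetical helpers, state_name and run_check; the demo command is a stand-in for a real plugin, and codes outside 0 to 3 are only reported as out of range here.

```shell
#!/bin/sh
# Map a plugin exit status to the state name Nagios uses for it.
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "OUT-OF-RANGE($1)" ;;
    esac
}

# Run a check command, print "STATE: output" and return its exit status.
run_check() {
    if output=$("$@"); then
        status=0
    else
        status=$?
    fi
    printf '%s: %s\n' "$(state_name "$status")" "$output"
    return "$status"
}

# Demo with a stand-in "plugin" that reports a warning.
run_check sh -c 'echo "DISK WARNING - 15% free"; exit 1' || echo "exit status: $?"
# prints: WARNING: DISK WARNING - 15% free
# prints: exit status: 1
```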

9.2 On the monitored hosts (node2, node3)

a. Add the user and set its password

# useradd nagios

# passwd nagios

b. Install the Nagios plugins

# tar zxvf nagios-plugins-1.4.16.tar.gz
# cd nagios-plugins-1.4.16
# ./configure --prefix=/usr/local/nagios
# make && make install

When this step finishes, three directories are created under /usr/local/nagios/: include, libexec and share.

Change the directory ownership:

# chown nagios:nagios /usr/local/nagios
# chown -R nagios:nagios /usr/local/nagios/libexec

c. Install NRPE

# wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.13.tar.gz
# tar zxvf nrpe-2.13.tar.gz
# cd nrpe-2.13
# ./configure
*** Configuration summary for nrpe 2.13 11-11-2011 ***:

 General Options:
 -------------------------
 NRPE port:    5666
 NRPE user:    nagios
 NRPE group:   nagios
 Nagios user:  nagios
 Nagios group: nagios


Review the options above for accuracy.  If they look okay,
type 'make all' to compile the NRPE daemon and client.

[root@node2 nrpe-2.13]# make all
cd ./src/; make ; cd ..
make[1]: Entering directory `/app/nrpe-2.13/src'
gcc -g -O2 -I/usr/include/openssl -I/usr/include -DHAVE_CONFIG_H -o nrpe nrpe.c utils.c acl.c -L/usr/lib  -lssl -lcrypto -lnsl -lwrap  
gcc -g -O2 -I/usr/include/openssl -I/usr/include -DHAVE_CONFIG_H -o check_nrpe check_nrpe.c utils.c -L/usr/lib  -lssl -lcrypto -lnsl 
make[1]: Leaving directory `/app/nrpe-2.13/src'

*** Compile finished ***

If the NRPE daemon and client compiled without any errors, you
can continue with the installation or upgrade process.

Read the PDF documentation (NRPE.pdf) for information on the next
steps you should take to complete the installation or upgrade.

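Next, install the NRPE plugin, the daemon and the sample configuration file.
c.1 Install check_nrpe

Only the monitoring host needs the check_nrpe plugin; the monitored host does not. It is installed here purely for testing.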

[root@node2 nrpe-2.13]# make install-plugin
cd ./src/ && make install-plugin
make[1]: Entering directory `/app/nrpe-2.13/src'
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/libexec
/usr/bin/install -c -m 775 -o nagios -g nagios check_nrpe /usr/local/nagios/libexec
make[1]: Leaving directory `/app/nrpe-2.13/src'

c.2 Install the daemon

[root@node2 nrpe-2.13]# make install-daemon
cd ./src/ && make install-daemon
make[1]: Entering directory `/app/nrpe-2.13/src'
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/bin
/usr/bin/install -c -m 775 -o nagios -g nagios nrpe /usr/local/nagios/bin
make[1]: Leaving directory `/app/nrpe-2.13/src'

c.3 Install the configuration file

[root@node2 nrpe-2.13]# make install-daemon-config
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/etc
/usr/bin/install -c -m 644 -o nagios -g nagios sample-config/nrpe.cfg /usr/local/nagios/etc

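Following the installation documentation, the NRPE daemon is run as a service under xinetd. xinetd therefore has to be installed first, although most systems ship with it by default.
d. Install the xinetd script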

[root@node2 nrpe-2.13]# make install-xinetd
/usr/bin/install -c -m 644 sample-config/nrpe.xinetd /etc/xinetd.d/nrpe

This creates the file /etc/xinetd.d/nrpe.

Edit this file:

[root@node2 ~]# cat /etc/xinetd.d/nrpe 
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       = 192.168.11.164 127.0.0.1
}

Add the IP address of the monitoring host after only_from.

Edit /etc/services and add the NRPE service:

[root@node2 ~]# tail -n 4 /etc/services 
iqobject    48619/tcp            # iqobject
iqobject    48619/udp            # iqobject
# Local services
nrpe            5666/tcp                        #nrpe
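
If this step is scripted across several monitored hosts, the entry should only be appended when it is missing. The following is a minimal sketch with a hypothetical helper, add_nrpe_service; the demo works on a temporary file, while on a real host the argument would be /etc/services and root privileges are required.

```shell
#!/bin/sh
# add_nrpe_service FILE: append the nrpe 5666/tcp entry unless it exists.
add_nrpe_service() {
    services_file=$1
    if grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file"; then
        echo "nrpe entry already present"
    else
        printf 'nrpe\t\t5666/tcp\t\t\t# nrpe\n' >> "$services_file"
        echo "nrpe entry added"
    fi
}

demo=$(mktemp)
add_nrpe_service "$demo"    # prints: nrpe entry added
add_nrpe_service "$demo"    # prints: nrpe entry already present
rm -f "$demo"
```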

Restart the xinetd service:

[root@node2 ~]# service xinetd restart
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]

Check whether NRPE has started:

[root@node2 ~]# netstat -an|grep 5666
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN

Port 5666 is now listening.

e. Test whether NRPE works

Use the check_nrpe plugin installed on the monitored host above to test whether NRPE works correctly.

# /usr/local/nagios/libexec/check_nrpe -H localhost

It returns the current NRPE version:

[root@node2 ~]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.13

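This shows that connecting to the NRPE daemon locally with check_nrpe works.

Note: for the following steps to go smoothly, make sure the local firewall opens port 5666 so that the external monitoring host can reach it.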
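
Whether the port is reachable through the firewall can be probed from the monitoring host before check_nrpe is installed there. This sketch assumes that bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available; port_open is a hypothetical helper name.

```shell
#!/bin/sh
# port_open HOST PORT: succeed if a TCP connection can be opened within
# 3 seconds. Relies on bash's /dev/tcp and on coreutils timeout.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage from the monitoring host (address taken from hosts.cfg above):
# port_open 192.168.11.167 5666 && echo "NRPE port reachable"
```

A failure here points at the firewall or at the only_from setting in xinetd rather than at the NRPE configuration itself.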

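9.3 On the monitoring host (node1)

Nagios is already running, so what remains is to:

    •install the check_nrpe plugin;
    •create the check_nrpe command definition in commands.cfg, because only commands defined in commands.cfg can be used in services.cfg;
    •create the monitoring items for the monitored hosts.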

9.3.1 Install the check_nrpe plugin

# tar zxvf nrpe-2.13.tar.gz 
# cd nrpe-2.13
# ./configure
# make all
# make install-plugin

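make install-plugin is the only install step needed here, because only the check_nrpe plugin is required.

NRPE is already installed on node2 and node3, so now test the communication between check_nrpe on the monitoring host and the NRPE daemon running on a monitored host.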

[root@node1 etc]# /usr/local/nagios/libexec/check_nrpe -H 192.168.11.167
NRPE v2.13

The NRPE version is returned correctly, so everything works.

9.3.2 Add the check_nrpe definition to commands.cfg

[root@node1 etc]# cat objects/commands.cfg

# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

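The $ARG1$ argument after -c is the check command passed to the NRPE daemon for execution. As mentioned earlier, it must be one of the five commands defined in nrpe.cfg. When using check_nrpe in services.cfg, pass this argument after a "!".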
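
To see which names are valid for $ARG1$, the command[...] entries in the monitored host's nrpe.cfg can be listed. This is a sketch with a hypothetical helper, list_nrpe_commands; the two sample entries in the demo are illustrative and may differ from the nrpe.cfg on your hosts.

```shell
#!/bin/sh
# list_nrpe_commands FILE: print the name of every uncommented
# command[name]=... entry in an nrpe.cfg-style file.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Demo on a temporary file with illustrative entries.
sample=$(mktemp)
cat > "$sample" <<'EOF'
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
#command[check_disabled]=/usr/local/nagios/libexec/check_dummy 0
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF
list_nrpe_commands "$sample"
# prints: check_users
# prints: check_load
rm -f "$sample"
```

On a monitored host the function would be pointed at /usr/local/nagios/etc/nrpe.cfg.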

9.3.3 Define monitoring for the Nagios-Linux hosts

Monitoring for the Nagios-Linux hosts can now be defined in services.cfg.

[root@node1 etc]# cat services.cfg 

define service{
        use                     local-service
        host_name               node3
        service_description     check-host-alive
        check_command           check-host-alive
        }  

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node3
        service_description             datanode on node3
        check_command                   check_jps2!DataNode!node3
        notifications_enabled           1
        }

define service{
        use                     local-service
        host_name               node2
        service_description     check-host-alive
        check_command           check-host-alive
        }  

define service{
        use                             local-service         ; Name of service template to use
        host_name                       node2
        service_description             datanode on node2
        check_command                   check_jps2!DataNode!node2
        notifications_enabled           1
        }


define service{
        use                             local-service
        host_name                       node2
        service_description             mysql
        check_command                   check_nrpe!check_mysql
        notifications_enabled           1
        check_interval                  1               ; Actively check the service every 1 minute
        retry_interval                  1               ; Schedule service check retries at 1 minute intervals
        max_check_attempts              2    
        }

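9.3.4 Check the configuration:

10. Configuring Nagios email alerts

10.1 Install the sendmail components

First make sure the sendmail components are fully installed. The following command installs sendmail: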

# yum install -y sendmail*

Then restart the sendmail service:

# service sendmail restart

Then send a test message to verify that sendmail works:

# echo "Hello World" | mail zlh10@163.com

10.2 Configuring email alerts

The file /usr/local/nagios/etc/contacts.cfg was briefly configured above; Nagios sends alert emails to the email addresses in that configuration file.
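
The body of an alert email is built by the notify-host-by-email command in commands.cfg, which uses printf "%b" to turn the \n sequences into line breaks. The sketch below renders a comparable body with sample values in place of the Nagios macros, without sending any mail; render_host_alert is a hypothetical helper, the sample values are made up, and the Date/Time line is omitted for brevity.

```shell
#!/bin/sh
# render_host_alert TYPE HOST STATE ADDRESS INFO
# Prints a message body in the layout used by notify-host-by-email.
render_host_alert() {
    printf '%b' "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\nInfo: $5\n"
}

render_host_alert PROBLEM node2 DOWN 192.168.11.167 "sample plugin output"
```

Piping this output to mail -s with a subject and one of the addresses from contacts.cfg reproduces what Nagios does when a notification fires.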

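10.3 Nagios notifications

11. Key points:

11.1 Monitoring a remote MySQL

Monitoring a remote MySQL with Nagios

11.2 Because the DataNode processes on node2 and node3 need to be monitored, passwordless login must be set up between node1, node2 and node3.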

11.3 Error when starting Nagios:

[root@rhel5 etc]# service nagios start
Starting nagios:This account is currently not available.
 done.

Edit /etc/passwd
and change /sbin/nologin to /bin/bash in the nagios user's entry.
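
Before editing, it may help to confirm which login shell the account currently has. The sketch below reads the seventh field of a passwd-format file; login_shell is a hypothetical helper, and the usermod command in the comment is an alternative to editing the file by hand, not something the original article used.

```shell
#!/bin/sh
# login_shell USER [PASSWD_FILE]: print the login shell recorded for USER.
login_shell() {
    awk -F: -v user="$1" '$1 == user { print $7 }' "${2:-/etc/passwd}"
}

login_shell root    # prints the shell recorded for root on this machine

# If "login_shell nagios" prints /sbin/nologin, the shell can also be
# changed without opening the file (requires root):
# usermod -s /bin/bash nagios
```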
