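Many of you probably know this already:
anyone who has configured Nagios understands the whole monitoring and notification flow.
In my case an alert fired,
but I was not the one who set up Nagios,
and I had never touched Nagios before.
There are two ways to learn Nagios in that situation:
1. Start from the beginning and work through the Nagios installation
2. Learn as you go
I took the learn-as-you-go route.
To begin, suppose I receive an alert configured by my predecessor and have no idea what it means.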
***** Icinga *****
Notification Type: PROBLEM
Service: Disk Space
Host: ****-**-*
Address: monitored_host_ip
State: WARNING
Date/Time: Wed May 18 10:14:34 CST 2016
Additional Info:
DISK WARNING - free space: / 9781 MB (20% inode=98%): /dev/shm 3936 MB (100% inode=99%): /boot 394 MB (87% inode=99%): /home 9006 MB (99% inode=99%): /var/lib/mysql 51066 MB (83% inode=99%):
Check on the monitored host:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_centos6template-lv_root
50G 38G 9.6G 80% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 477M 57M 395M 13% /boot
/dev/mapper/vg_centos6template-lv_home
9.3G 22M 8.8G 1% /home
/dev/sdb1 63G 9.8G 50G 17% /var/lib/mysql
df -ih
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg_centos6template-lv_root
3.2M 61K 3.1M 2% /
tmpfs 985K 1 985K 1% /dev/shm
/dev/sda1 126K 44 125K 1% /boot
/dev/mapper/vg_centos6template-lv_home
614K 32 614K 1% /home
/dev/sdb1 4.0M 721 4.0M 1% /var/lib/mysql
How does this relate to the alert content?
First, look at the configuration on the monitoring host (the Nagios server):
less /etc/icinga/conf.d/xxx_hosts.cfg
define service{
use normal-service
hostgroup_name linux-servers
service_description Disk Space
check_command nrpe_check_all_disk!20%!10%
notifications_enabled 1
}
The monitored host belongs to the linux-servers hostgroup.
The check command is nrpe_check_all_disk!20%!10%,
which carries two arguments: 20% ($ARG1) and 10% ($ARG2).
less /etc/icinga/conf.d/nrpe_command.cfg
define command {
command_name nrpe_check_all_disk
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c 'check_all_disk!$ARG1$!$ARG2$' -t 30
}
The monitoring host runs the check_all_disk!$ARG1$!$ARG2$ command against the monitored host.
On the monitored host:
less /etc/nagios/nrpe.cfg
Find this line:
command[check_all_disk]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -l
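To make the argument passing concrete, here is a minimal sketch of the substitution, assuming the check string is split on "!" and each piece replaces the matching $ARGn$ macro. The helper name expand_check_command is hypothetical; Icinga and NRPE do this internally, and their real implementation may differ.

```shell
#!/bin/sh
# Illustration only: expand_check_command is a hypothetical helper,
# not part of Nagios, Icinga or NRPE.
expand_check_command() {
    # $1: check string, e.g. check_all_disk!20%!10%
    # $2: command line template containing $ARG1$, $ARG2$, ...
    awk -v cc="$1" -v tpl="$2" 'BEGIN {
        n = split(cc, parts, "!")          # parts[1] is the command name
        for (i = 2; i <= n; i++) {
            macro = "$ARG" (i - 1) "$"
            out = ""
            rest = tpl
            # Replace every literal occurrence of the macro (no regex involved)
            while ((p = index(rest, macro)) > 0) {
                out = out substr(rest, 1, p - 1) parts[i]
                rest = substr(rest, p + length(macro))
            }
            tpl = out rest
        }
        print tpl
    }'
}

expand_check_command 'check_all_disk!20%!10%' \
    '/usr/bin/sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -l'
```

With the nrpe.cfg line above as the template, the sketch prints the check_disk command with 20% and 10% filled in.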
Run /usr/bin/sudo /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -l
DISK WARNING - free space: / 9780 MB (20% inode=98%); /dev/shm 3936 MB (100% inode=99%); /boot 394 MB (87% inode=99%); /home 9006 MB (99% inode=99%); /var/lib/mysql 51066 MB (83% inode=99%);| /=37927MB;40214;45241;0;50268 /dev/shm=0MB;3148;3542;0;3936 /boot=56MB;380;428;0;476 /home=21MB;7613;8565;0;9517 /var/lib/mysql=10031MB;51499;57936;0;64374
This is essentially the same as the alert content.
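The part of the plugin output after the "|" is performance data. As a rough sketch, assuming the usual label=value;warn;crit;min;max layout and labels without spaces, it can be split up like this (parse_perfdata is a hypothetical helper name):

```shell
#!/bin/sh
# Illustration only: parse_perfdata is a hypothetical helper.
# Assumes perfdata items look like label=value[unit];warn;crit;min;max
# and that labels contain no spaces.
parse_perfdata() {
    # stdin: one plugin output line; text after "|" is the perfdata
    awk -F'|' 'NF >= 2 {
        n = split($2, items, " ")
        for (i = 1; i <= n; i++) {
            eq = index(items[i], "=")
            if (eq == 0) continue
            label = substr(items[i], 1, eq - 1)
            split(substr(items[i], eq + 1), f, ";")
            value = f[1]
            sub(/[^0-9.]+$/, "", value)    # drop the unit suffix, e.g. MB
            printf "%s used=%s warn=%s crit=%s max=%s\n", label, value, f[2], f[3], f[5]
        }
    }'
}

printf '%s\n' 'DISK WARNING - free space: / 9780 MB (20% inode=98%);| /=37927MB;40214;45241;0;50268 /dev/shm=0MB;3148;3542;0;3936' |
    parse_perfdata
```

In this output the warn and crit fields are used-space values in MB: for / they are 40214 and 45241, which is 80% and 90% of the 50268 MB maximum, matching the 20% and 10% free-space thresholds.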
Running /usr/bin/sudo /usr/lib64/nagios/plugins/check_disk --help shows:
-w, --warning=INTEGER
Exit with WARNING status if less than INTEGER units of disk are free
-w, --warning=PERCENT%
Exit with WARNING status if less than PERCENT of disk space is free
-c, --critical=INTEGER
Exit with CRITICAL status if less than INTEGER units of disk are free
-c, --critical=PERCENT%
Exit with CRITICAL status if less than PERCENT of disk space is free
-l, --local
Only check local filesystems
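From the above, this alert reports that free disk space (not inodes) has fallen below the 20% threshold, which agrees with df -h showing / at 80% used.
The inode options are: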
-W, --iwarning=PERCENT%
Exit with WARNING status if less than PERCENT of inode space is free
-K, --icritical=PERCENT%
Exit with CRITICAL status if less than PERCENT of inode space is free
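The -w and -c semantics quoted above can be approximated with df and awk. This is only a sketch, not the plugin's code: check_free_space is a hypothetical helper, and computing the free percentage as Available divided by total size is an assumption that the real check_disk may not share exactly. The exit codes follow the usual plugin convention of 0 for OK, 1 for WARNING and 2 for CRITICAL.

```shell
#!/bin/sh
# Illustration only: check_free_space is a hypothetical helper that
# mimics "check_disk -w WARN% -c CRIT%" on df -P style input.
# Assumption: free percent = Available / total size * 100.
# Mount points containing spaces are not handled.
check_free_space() {
    # $1: warning threshold (percent free), $2: critical threshold (percent free)
    # stdin: output in the format of `df -P`
    awk -v warn="$1" -v crit="$2" '
        NR == 1 { next }                   # skip the header line
        $2 > 0 {
            free_pct = $4 / $2 * 100
            state = "OK"; code = 0
            if (free_pct < crit)      { state = "CRITICAL"; code = 2 }
            else if (free_pct < warn) { state = "WARNING";  code = 1 }
            if (code > worst) worst = code
            printf "%s %s %.1f%% free\n", state, $6, free_pct
        }
        END { exit worst + 0 }             # worst state becomes the exit code
    '
}

# Usage on a live host (-l, local filesystems only, is a GNU df option):
#   df -P -l | check_free_space 20 10
```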
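However, /usr/lib64/nagios/plugins/check_disk here is a binary,
so its contents cannot be read directly. To trace the code precisely you have to search online
with keywords such as "nagios check_disk shell script".
If you want to learn more about Nagios, see the articles under References.
Next, change to the root directory with cd / and run: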
du -hx --max-depth=1
4.0K ./selinux
8.0K ./media
6.6M ./bin
128K ./root
16M ./sbin
4.0K ./mnt
891M ./usr
5.8M ./opt
4.0K ./srv
4.0K ./mnfs
20G ./backup
4.5G ./tmp
16K ./lost+found
22M ./lib64
255M ./lib
0 ./dev
0 ./sys
4.0K ./boot
4.0K ./home
0 ./proc
12G ./var
76M ./etc
37G .
./backup turns out to be rather large (20G), which does not look reasonable. I cleaned it up, and that was the end of it.
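The same search can be narrowed to the biggest entries. A minimal sketch, assuming GNU du (for --max-depth) and a hypothetical helper name largest_dirs:

```shell
#!/bin/sh
# Illustration only: largest_dirs is a hypothetical helper.
# Assumes GNU du, which provides --max-depth.
largest_dirs() {
    # $1: directory to inspect (default /), $2: number of entries (default 5)
    dir=${1:-/}
    count=${2:-5}
    # -x stays on one filesystem; -k prints sizes in KB so sort -n can order them
    du -xk --max-depth=1 "$dir" 2>/dev/null |
        awk -v top="$dir" '{
            size = $1
            sub(/^[0-9]+[ \t]+/, "")       # keep only the path in $0
            if ($0 != top) print size "\t" $0   # drop the grand total line
        }' |
        sort -rn |
        head -n "$count"
}

# Usage: list the three largest top-level directories on the root filesystem
#   largest_dirs / 3
```

Sorting on plain KB values rather than the -h output keeps the ordering numeric, so a 20G directory sorts above a 891M one.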
References