Le bloc-notes de Jérôme: Nagios2 on Debian 4.0

Install Debian etch over the network, base system only.

Add contrib and non-free to /etc/apt/sources.list.

As root:
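The sources.list change can be scripted. This is only a sketch: it assumes the existing deb lines end with the single component "main", the helper name add_components is mine, and the mirror URL in the demonstration is just an example. It works on a scratch copy rather than the live file.

```shell
#!/bin/sh
# add_components FILE: append "contrib non-free" to every deb/deb-src line
# that ends in "main". Lines that do not end in "main" are left untouched,
# so running it twice changes nothing the second time.
add_components() {
    sed -e '/^deb/ s/ main[[:space:]]*$/ main contrib non-free/' "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

# Demonstration on a scratch file (example mirror URL, not from the notes).
demo=$(mktemp)
cat > "$demo" <<'EOF'
deb http://ftp.debian.org/debian/ etch main
deb-src http://ftp.debian.org/debian/ etch main
EOF
add_components "$demo"
cat "$demo"
rm -f "$demo"
```

On the real system the function would be pointed at /etc/apt/sources.list, followed by apt-get update so the new components are taken into account.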

apt-get install nagios2 nagios-images nagios-plugins nagios2-doc openssh-server

The output typically looks like this:
PDBL:~# apt-get install nagios2 nagios-images nagios-plugins nagios2-doc openssh-server
Reading package lists... Done
Building dependency tree... Done
openssh-server is already the newest version.
The following extra packages will be installed:
fping libnet-snmp-perl libradius1 nagios-plugins-basic
nagios-plugins-standard nagios2-common qstat radiusclient1 snmp
Suggested packages:
libcrypt-des-perl libdigest-hmac-perl libdigest-sha1-perl
libio-socket-inet6-perl nagios-text nagios nagios-nrpe-plugin
The following NEW packages will be installed:
fping libnet-snmp-perl libradius1 nagios-images nagios-plugins
nagios-plugins-basic nagios-plugins-standard nagios2 nagios2-common
nagios2-doc qstat radiusclient1 snmp
0 upgraded, 13 newly installed, 0 to remove and 1 not upgraded.
Need to get 0B/6020kB of archives.
After unpacking 14.9MB of additional disk space will be used.

cd /etc/nagios2/
htpasswd -c htpasswd.users nagiosadmin

!!!! If you choose a user name other than nagiosadmin,
the same change must be made in /etc/nagios2/cgi.cfg

To be able to launch checks from the web interface:

chown nagios:www-data /var/lib/nagios2/
chown nagios:www-data /var/lib/nagios2/rw/

chmod u+rwx /var/lib/nagios2/rw/
chmod g+rwx /var/lib/nagios2/rw/

chmod g+s /var/lib/nagios2/
chmod g+s /var/lib/nagios2/rw/

----------------------------------------------------

After changing any of the configuration files,
you can check that everything is OK with:

/usr/sbin/nagios2 -v /etc/nagios2/nagios.cfg
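The check can be chained with a reload so that a broken configuration never reaches the running daemon. A minimal sketch, assuming nagios2 -v exits with a non-zero status when it finds errors and that the init script is /etc/init.d/nagios2; the helper name reload_if_valid is hypothetical.

```shell
#!/bin/sh
# reload_if_valid CHECK_CMD RELOAD_CMD
# Runs CHECK_CMD; runs RELOAD_CMD only if the check exits with status 0.
reload_if_valid() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}

# Intended use on the server (init script path is an assumption):
# reload_if_valid "/usr/sbin/nagios2 -v /etc/nagios2/nagios.cfg" \
#                 "/etc/init.d/nagios2 reload"
```

Passing both commands as arguments keeps the function usable with any checker, which also makes it easy to try out with harmless commands first.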

To be able to launch checks from the web interface:

in /etc/nagios2/nagios.cfg

check_external_commands=1 (set to 0 by default)

---------------------------------------------------

Monitoring a Windows machine:

An agent has to be installed on the Windows machine.

Installing the Windows Agent

Before you can begin monitoring private services and attributes of Windows machines,
you'll need to install an agent on those machines. I recommend using the NSClient++ addon, which can be found at http://sourceforge.net/projects/nscplus. These instructions will take
you through a basic installation of the NSClient++ addon, as well as the configuration
of Nagios for monitoring the Windows machine.

1. Download the latest stable version of the NSClient++ addon from http://sourceforge.net/projects/nscplus

2. Unzip the NSClient++ files into a new C:\NSClient++ directory

3. Open a command prompt and change to the C:\NSClient++ directory

4. Register the NSClient++ system service with the following command:

 nsclient++ /install

5. Install the NSClient++ systray with the following command ('SysTray' is case-sensitive):

 nsclient++ SysTray

6. Open the services manager and make sure the NSClientpp service is allowed to interact
with the desktop (see the 'Log On' tab of the services manager). If it isn't already allowed
to interact with the desktop, check the box to allow it to.

7. Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:

Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
Optionally require a password for clients by changing the 'password' option in the [Settings] section.
Uncomment the 'allowed_hosts' option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
Make sure the 'port' option in the [NSClient] section is uncommented and set to '1248' (the default port).

Here is an example of a working NSC.INI configuration file:

[modules]
;# NSCLIENT++ MODULES
;# A list with DLLs to load at startup.
; You will need to enable some of these for NSClient++ to work.
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
; * *
; * N O T I C E ! ! ! - Y O U H A V E T O E D I T T H I S *
; * *
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
NRPEListener.dll
SysTray.dll
CheckEventLog.dll
CheckHelpers.dll
;
; CheckWMI IS AN EXTREMELY EARLY IDEA SO DON'T USE IT IN PRODUCTION ENVIRONMENTS!
;CheckWMI.dll
;
; RemoteConfiguration IS AN EXTREMELY EARLY IDEA SO DON'T USE IT IN PRODUCTION ENVIRONMENTS!
;RemoteConfiguration.dll

[Settings]
;# OBFUSCATED PASSWORD
; This is the same as the password option but here you can store the password
; in an obfuscated manner.
; *NOTICE* obfuscation is *NOT* the same as encryption, someone with access
; to this file can still figure out the
; password. It's just a bit harder to do it at first glance.
;obfuscated_password=Jw0KAUUdXlAAUwASDAAB
;
;# PASSWORD
; This is the password (-s) that is required to access NSClient remotely.
; If you leave this blank everyone will be able to access the daemon remotely.
password=test
;# ALLOWED HOST ADDRESSES
; This is a comma-delimited list of IP addresses of hosts that are allowed to talk
; to all the daemons.
; If you leave this blank anyone can access the daemon remotely (NSClient still requires
; a valid password).
; The syntax is host or ip/mask so 192.168.0.0/24 will allow anyone on that
; subnet access
allowed_hosts=172.16.0.0/16
;
;# USE THIS FILE
; Use the INI file as opposed to the registry. If this is 0 and use_reg in the
; registry is set to 1,
; the registry will be used instead.
use_file=1

[log]
;# LOG DEBUG
; Set to 1 if you want debug message printed in the log file (debug messages
; are always printed to stdout when run with -test)
debug=1
;
;# LOG FILE
; The file to print log statements to
file=NSC.log
;
;# LOG DATE MASK
; The format for the date/time part of the log entry written to file.
date_mask=%Y-%m-%d %H:%M:%S

[NSClient]
;# ALLOWED HOST ADDRESSES
; This is a comma-delimited list of IP addresses of hosts that are allowed to talk
; to the NSClient daemon.
; If you leave this blank the global version will be used instead.
allowed_hosts=172.16.27.43
;
;# NSCLIENT PORT NUMBER
; This is the port the NSClientListener.dll will listen to.
port=1248
;
;# BIND TO ADDRESS
; Allows you to bind server to a specific local address.
; This has to be a dotted IP address, not a hostname.
; Leaving this blank will bind to all available IP addresses.
bind_to_address=

[Check System]
;# CPU BUFFER SIZE
; Can be anything ranging from 1s (for 1 second) to 10w for 10 weeks.
; Notice that a larger buffer will waste memory
; so don't use a larger buffer than you need (i.e. the longest check you do +1).
;CPUBufferSize=1h
;
;# CHECK RESOLUTION
; The resolution to check values (currently only CPU).
; The value is entered in tenths of a second and the default is 10
; (which means once every second)
;CheckResolution=10

[NRPE]
;# NRPE PORT NUMBER
; This is the port the NRPEListener.dll will listen to.
;port=5666
;
;# COMMAND TIMEOUT
; This specifies the maximum number of seconds that the NRPE
; daemon will allow plug-ins to finish executing before killing them off.
;command_timeout=60
;
;# COMMAND ARGUMENT PROCESSING
; This option determines whether or not the NRPE daemon will
;allow clients to specify arguments to commands that are executed.
;allow_arguments=0
;
;# COMMAND ALLOW NASTY META CHARS
; This option determines whether or not the NRPE daemon will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
;allow_nasty_meta_chars=0
;
;# USE SSL SOCKET
; This option controls if SSL should be used on the socket.
;use_ssl=1
;
;# BIND TO ADDRESS
; Allows you to bind server to a specific local address. This has to be a dotted IP address, not a hostname.
; Leaving this blank will bind to all available IP addresses.
; bind_to_address=
;
;# ALLOWED HOST ADDRESSES
; This is a comma-delimited list of IP addresses of hosts that are allowed to talk to the NRPE daemon.
; If you leave this blank the global version will be used instead.
;allowed_hosts=
;
;# SCRIPT DIRECTORY
; All files in this directory will become check commands.
; *WARNING* This is undoubtedly dangerous so use with care!
;script_dir=scripts\

[NRPE Handlers]
;# COMMAND DEFINITIONS
;# Command definitions that this daemon will run.
;# Can be either NRPE syntax:
;command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
;# Or simplified syntax:
;test=c:\test.bat foo $ARG1$ bar
;check_disk1=/usr/local/nagios/libexec/check_disk -w 5 -c 10
;# Or even loopback (inject) syntax (to run internal commands)
;# This is a way to run "NSClient" commands and other internal module commands such as check eventlog etc.
;check_cpu=inject checkCPU warn=80 crit=90 5 10 15
;check_eventlog=inject CheckEventLog Application warn.require.eventType=error warn.require.eventType=warning critical.require.eventType=error critical.exclude.eventType=info truncate=1024 descriptions
;check_disk_c=inject CheckFileSize ShowAll MaxWarn=1024M MaxCrit=4096M File:WIN=c:\ATI\*.*
;# But be careful:
; dont_check=inject dont_check This will "loop forever" so be careful with the inject command...
;# Check some escapings...
; check_escape=inject CheckFileSize ShowAll MaxWarn=1024M MaxCrit=4096M "File: foo \" WIN=c:\\WINDOWS\\*.*"
;# Some real world samples
;nrpe_cpu=inject checkCPU warn=80 crit=90 5 10 15
;nrpe_ok=scripts\ok.bat

8. Start the NSClient++ service with the following command:

 nsclient++ /start

9. If installed properly, a new icon should appear in your system tray.

It will be a yellow circle with a black 'M' inside.

10. Success! The Windows server can now be added to the Nagios monitoring configuration...

On the Nagios server

Create a file listing all the Windows machines:

cd /etc/nagios2/conf.d/

cp host-gateway_nagios2.cfg host-windows_nagios2.cfg
vim host-windows_nagios2.cfg

Fill in the hostname, the alias, the address and the parent.

Add the service:
optiplex:/etc/nagios2/conf.d# vim services_nagios2.cfg

# check that NSClient is up
define service {
host_name pc-simulation
service_description NS Client Version
check_command check_nt!CLIENTVERSION
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}

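In /etc/nagios-plugins/config/nt.cfg, make sure the password defined in NSC.INI
is passed to the plugin (it is not by default); the -s test option has to be added:

command_line /usr/lib/nagios/plugins/check_nt -H $HOSTADDRESS$ -v $ARG1$ -s test

Note that some of the service definitions in these notes pass a second argument
(for example check_nt!MEMUSE!-w 80 -c 90). Nagios only substitutes the arguments
that the command_line references, so $ARG2$ must also appear in this command_line
for those extra options to reach the plugin.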

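Add the following service definition to monitor the version of the NSClient++ addon running on the Windows server.

define service{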
use   generic-service
host_name   winserver
service_description NSClient++ Version
check_command  check_nt!CLIENTVERSION
}

Add the following service definition to monitor the uptime of the Windows server.

define service{
use   generic-service
host_name   winserver
service_description Uptime
check_command  check_nt!UPTIME
}

Add the following service definition to monitor the CPU utilization on the
Windows server. The argument is a list of (minutes, warning %, critical %)
triplets: here the load averaged over 10, 20 and 30 minutes raises a WARNING
at 80% and a CRITICAL at 90% (10-minute average) or 95% (20- and 30-minute averages).

define service{
use   generic-service
host_name   winserver
service_description CPU Load
check_command  check_nt_cpuload!10,80,90,20,80,95,30,80,95
}

!!!!!  The following must be added to the file /etc/nagios-plugins/config/load.cfg
# 'check_nt_cpuload' command definition
define command{
command_name check_nt_cpuload
command_line /usr/lib/nagios/plugins/check_nt -H $HOSTADDRESS$ -p 1248 -s test -v CPULOAD -l $ARG1$
}
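The comma-separated argument handed to check_nt_cpuload is easier to read once it is split into groups of three (minutes, warning threshold, critical threshold). The helper below is only an illustration of that reading; explain_cpuload is a hypothetical name and the script does not contact any host.

```shell
#!/bin/sh
# explain_cpuload "M1,W1,C1,M2,W2,C2,..."
# Prints one line per (minutes, warning, critical) triplet.
# Trailing values that do not form a full triplet are ignored.
explain_cpuload() {
    echo "$1" | awk -F, '{
        for (i = 1; i + 2 <= NF; i += 3)
            printf "%d-minute average: WARNING at %d%%, CRITICAL at %d%%\n", $i, $(i + 1), $(i + 2)
    }'
}

explain_cpuload "10,80,90,20,80,95,30,80,95"
```

With the value used in the service definition, this prints three lines, one each for the 10-, 20- and 30-minute averages.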

Add the following service definition to monitor memory usage
on the Windows server and generate a CRITICAL alert if memory
usage is 90% or more or a WARNING alert if memory usage is 80% or greater.

define service{
use   generic-service
host_name   winserver
service_description Memory Usage
check_command  check_nt!MEMUSE!-w 80 -c 90
}

Add the following service definition to monitor usage of the C:\ drive on
the Windows server and generate a CRITICAL alert if disk usage is 90%
or more or a WARNING alert if disk usage is 80% or greater.

define service{
use   generic-service
host_name   winserver
service_description C:\ Drive Space
check_command  check_nt_disk!C!80!90
}

!!!!!  The following must be added to the file /etc/nagios-plugins/config/disk.cfg
# 'check_nt_disk' command definition
define command{
command_name check_nt_disk
command_line /usr/lib/nagios/plugins/check_nt -H $HOSTADDRESS$ -p 1248 -s test -v USEDDISKSPACE -l $ARG1$ -w $ARG2$ -c $ARG3$
}

Add the following service definition to monitor the W3SVC service
state on the Windows machine and generate a CRITICAL alert if
the service is stopped.

define service{
use   generic-service
host_name   winserver
service_description W3SVC
check_command  check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}

Add the following service definition to monitor the Explorer.exe process
on the Windows machine and generate a CRITICAL alert if the process
is not running.

define service{
use   generic-service
host_name   winserver
service_description Explorer
check_command  check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
----------------------------------------------------------------

Monitoring switches and routers

Creating Required Definitions

You'll need to create some object definitions in order to monitor a new switch.
These definitions can be placed in their own file or added to an already existing
object configuration file.

First, it's best practice to create a new template for each different type of host
you'll be monitoring. Let's create a new template for switches.

define host{
name generic-switch ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, switches are monitored round the clock
check_interval 5 ; Switches are checked every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each switch 10 times (max)
check_command check-host-alive ; Default command to check if routers are "alive"
notification_period 24x7 ; Send notifications at any time
notification_interval 30 ; Resend notifications every 30 minutes
notification_options d,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}

Notice that the switch template definition is inheriting default values from the generic-host template, which is defined in the sample localhost.cfg file.

Next, define a new host for the switch that references the newly created generic-switch host template.

define host{
use generic-switch ; Inherit default values from a template
host_name linksys-srw224p ; The name we're giving to this switch
alias Linksys SRW224P Switch ; A longer name associated with the switch
address 192.168.1.253 ; IP address of the switch
hostgroups allhosts ; Host groups this switch is associated with
}

Add an optional hostgroup for switches. This is useful if you create additional switches in the future and want to view them together in the CGIs. It can also be useful for object definition tricks that you can use to manage larger configurations later on.

define hostgroup{
hostgroup_name switches ; The name of the hostgroup
alias Network Switches ; Long name of the group
members linksys-srw224p ; Comma separated list of hosts that belong to this group
}

The linksys-srw224p host will be a member of two hostgroups - allhosts (which is referenced in the host definition and defined in localhost.cfg) and switches (which is defined above).

Monitoring Packet Loss and RTA

Now it's time to define some services that should be associated with the switch. First off, we should monitor packet loss and round trip average between the Nagios host and the switch.
This can be accomplished by using the check_ping plugin. A command definition for the check_ping plugin is already defined in the commands.cfg file. That command definition looks like this...

define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

Let's create a service called PING as follows...

define service{
use generic-service ; Inherit values from a template
host_name linksys-srw224p ; The name of the host the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}

Notice that the check_command directive is passing "200.0,20%" and "600.0,60%" to the check_ping command, where they are substituted for the $ARG1$ and $ARG2$ macros, respectively. This means that the PING service will be:

CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more
WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more
OK if the RTA is less than 200 ms and the packet loss is less than 20%
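The three rules can be written out as a small decision function. This is a sketch of the rules as worded above, not of the check_ping plugin's own code; the function name ping_state is hypothetical, and an RTA of exactly 200 ms is treated as OK here because the text only says "greater than".

```shell
#!/bin/sh
# ping_state RTA_MS LOSS_PCT
# Applies the thresholds 200.0,20% (warning) and 600.0,60% (critical).
ping_state() {
    awk -v rta="$1" -v loss="$2" 'BEGIN {
        if (rta > 600 || loss >= 60)      print "CRITICAL"
        else if (rta > 200 || loss >= 20) print "WARNING"
        else                              print "OK"
    }'
}

ping_state 35.2 0    # low latency, no loss
ping_state 250 0     # latency above the warning threshold
ping_state 50 60     # latency fine, but loss at the critical threshold
```

The critical test comes first so that a result exceeding both thresholds is reported at the more severe level.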

Monitoring SNMP Status Information

If your switch or router supports SNMP, you can monitor a lot of information by using the check_snmp plugin. A command definition for the check_snmp plugin is already defined in the commands.cfg file. That command definition looks like this...

define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}

Monitoring the uptime of a switch is fairly common. A service definition that would accomplish that looks like this...

define service{
use generic-service ; Inherit values from a template
host_name linksys-srw224p
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}

The check_command directive will pass the "-C public -o sysUpTime.0" options to the $ARG1$ macro in the check_snmp command definitions. The "-C public" tells the plugin that the SNMP community name is "public" and the "-o sysUpTime.0" is the OID that we want to check.
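To make the substitution concrete, the sketch below mimics how a check_command value is split on "!" and how the pieces replace $ARG1$, $ARG2$, ... in a command_line. It is an approximation for illustration only (expand_check_command is a hypothetical helper, and $USER1$ is deliberately left unexpanded); Nagios performs the real expansion internally.

```shell
#!/bin/sh
# expand_check_command TEMPLATE CHECK_COMMAND HOSTADDRESS
# Splits CHECK_COMMAND on "!" (the first piece is the command name) and
# substitutes $HOSTADDRESS$ and $ARG1$..$ARGn$ in TEMPLATE.
# Arguments that TEMPLATE does not reference are simply dropped.
expand_check_command() {
    awk -v tmpl="$1" -v cc="$2" -v addr="$3" 'BEGIN {
        n = split(cc, part, "!")
        gsub(/\$HOSTADDRESS\$/, addr, tmpl)
        for (i = 2; i <= n; i++) {
            pat = "\\$ARG" (i - 1) "\\$"   # regex matching the literal macro
            gsub(pat, part[i], tmpl)
        }
        print tmpl
    }'
}

expand_check_command '$USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$' \
    'check_snmp!-C public -o sysUpTime.0' 192.168.1.253
```

The same function applied to the check_ping definition shows the two thresholds landing after -w and -c respectively.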

If you want to ensure that a specific port/interface on the switch is in an up state, you could create a service definition like this:

define service{
use generic-service ; Inherit values from a template
host_name linksys-srw224p
service_description Port 1 Link Status
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}

In the example above, the "-o ifOperStatus.1" refers to the OID for the operational status of port 1 on the switch. The "-r 1" option tells the check_snmp plugin to return an OK state if "1" is found in the SNMP result (1 indicates an "up" state on the port) and CRITICAL if it isn't found. The "-m RFC1213-MIB" is optional and tells the check_snmp plugin to only load the "RFC1213-MIB" instead of every single MIB that's installed on your system, which can help speed things up.
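The effect of "-r 1" can be pictured as a regular-expression test on the value returned by the device. The sketch below only imitates that decision; snmp_regex_state is a hypothetical helper, and the sample values "up(1)" and "down(2)" are illustrative, since the exact string depends on which MIBs are loaded.

```shell
#!/bin/sh
# snmp_regex_state VALUE REGEX
# Prints OK if VALUE matches REGEX, CRITICAL otherwise.
snmp_regex_state() {
    if printf '%s\n' "$1" | grep -E -q -- "$2"; then
        echo "OK"
    else
        echo "CRITICAL"
    fi
}

snmp_regex_state "up(1)" 1
snmp_regex_state "down(2)" 1
```

Because the match is a plain substring-style regex, a pattern as short as "1" is only safe for objects like ifOperStatus whose possible values are a small set of single digits.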

That's it for the SNMP monitoring example. There are a million things that can be monitored via SNMP, so it's up to you to decide what you need and want to monitor. Good luck!

Tip: You can usually find the OIDs that can be monitored on a switch by running the following command (replace 192.168.1.253 with the IP address of the switch):

snmpwalk -v1 -c public 192.168.1.253 -m ALL .1

10 July 2007