The Sentinel monitoring system uses Nagios to passively collect information on associated Call Recording installations.
Centralized Sentinel monitoring provides status information, including the status of Call Recording components and server condition (free disk space, memory, load), to the Support team and partners. This information can also be made available to customers so they can monitor their own Call Recording installations.
Sentinel monitoring data is accessible through the Nagios interface, and can also be sent as e-mail notifications.

Principle of monitoring

Data is gathered through passive distributed monitoring, using SMTP as the transport protocol. Each monitored system runs a remote Nagios instance, which collects service data and sends it to the central Sentinel system.

Nagios sends simple text emails with status messages to Sentinel. The data is processed and displayed in the Nagios interface. No other data is sent, so the monitored system requires no additional access to the Sentinel monitoring system.

Monitoring flow

Each monitored system requires service configuration in Nagios.

  1. Nagios installations monitor the systems by running active checks, collecting local service status.
  2. Emails are sent to the central Nagios system with status reports.
  3. The monitored system runs postfix MTA. (A relay mail server or direct connection to the internet is required for postfix configuration.)
  4. Results from distributed monitoring systems are received at the central postfix MTA. A mail-bot adds the results to the Nagios process queue.
  5. These records are processed by the central Nagios system, which sends notifications in the event of a service status change.

Sentinel and Nagios Configuration

By default Call Recording enables you to install and configure the Nagios reporting module during the core installation process. Sentinel can also be installed at a later date if preferred.

For more information about the Sentinel and Nagios systems, refer to the Sentinel White Paper.

Call Recording SNMP Module

The Call Recording SNMP Module is a Python script that is called by the Net-SNMP module for all OIDs beginning with .1.3.6.1.4.1.16321. This script performs the data lookup and returns the acquired value. The lookup consists of the following steps:

  • Identifying the Call Recording module that is requested.
  • Collecting the data from the module status.

The SNMP module serves values that are cached for 5 minutes to protect the system from DoS attacks. The cache files are located at /tmp/snmp_trans*. If the cache is outdated, or the system clock has been shifted back, the data is refreshed.
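The refresh rule can be sketched as follows. This is only an illustration and not the module's actual code: the function name cache_is_fresh and the use of Unix timestamps are assumptions; only the 5-minute lifetime and the clock-shift rule come from the text.

```python
CACHE_TTL_SECONDS = 5 * 60  # cache lifetime stated in the text


def cache_is_fresh(cached_at, now, ttl=CACHE_TTL_SECONDS):
    """Return True if a value cached at `cached_at` may still be served at `now`.

    Both arguments are Unix timestamps in seconds (an assumed representation).
    """
    if now < cached_at:
        # The system clock was shifted back, so the cache must be refreshed.
        return False
    # Treating the exact 5-minute boundary as outdated is an assumption.
    return (now - cached_at) < ttl
```

A caller would refresh the data from the module status whenever this function returns False.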

Mapping Call Recording modules

This table shows how OIDs are mapped to specific modules. Only the modules that provide callrec_status information are mapped. All sub OIDs are prefixed by .1.3.6.1.4.1.16321.

Sub OID    Module Name      Comment
1.10.0     MASTER           A summary of all modules
1.10.1     CORE
1.10.2     REDLINES         Database statistics
1.10.4     RMI
1.10.5     PRERECORDING
1.10.6     DS
1.10.7     CONFIGMANAGER
1.10.8     SRS
1.10.9     NAMING
1.10.10    WEBADMIN
1.10.11    RTS_JTAPI
1.10.13    MIXER
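As an illustration of the first lookup step (identifying the requested module), the mapping can be expressed as a dictionary keyed by sub OID. This is a sketch only: the function name module_for_oid is hypothetical, and the real script may resolve modules differently.

```python
ENTERPRISE_PREFIX = ".1.3.6.1.4.1.16321"

# Sub OID -> module name, copied from the mapping table.
MODULES = {
    "1.10.0": "MASTER",
    "1.10.1": "CORE",
    "1.10.2": "REDLINES",
    "1.10.4": "RMI",
    "1.10.5": "PRERECORDING",
    "1.10.6": "DS",
    "1.10.7": "CONFIGMANAGER",
    "1.10.8": "SRS",
    "1.10.9": "NAMING",
    "1.10.10": "WEBADMIN",
    "1.10.11": "RTS_JTAPI",
    "1.10.13": "MIXER",
}


def module_for_oid(oid):
    """Return the module name for a full OID, or None if it is not mapped."""
    if not oid.startswith(ENTERPRISE_PREFIX + "."):
        return None
    # Keep only the first three components after the enterprise prefix.
    parts = oid[len(ENTERPRISE_PREFIX) + 1:].split(".")
    return MODULES.get(".".join(parts[:3]))
```

For example, module_for_oid(".1.3.6.1.4.1.16321.1.10.1.4.0") resolves to CORE because the sub OID starts with 1.10.1.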


Dependencies

Since the SNMP module reads some data from the database (REDLINES module), this package relies on the postgresql-python rpm package.

Provisioning

The SNMP module logs events to /var/log/callrec/SNMP_trans.log. This log file is rotated by the logrotate service using the callrec instance settings.

Net-SNMP events are logged to /var/log/snmpd.log and /var/log/messages.

Sentinel Agent

The Sentinel agent is the distributed Nagios installation that monitors individual Call Recording servers. It monitors different modules and services using probes. Each probe has four possible statuses:

  • OK
  • Warning
  • Critical
  • Unknown

In addition, each probe can provide additional text information.

Probes are periodically executed by the Nagios scheduler. The standard interval between checks by a probe is 5 minutes. If a probe returns a non-OK status, the interval between checks is shortened to 1 minute. Each result is sent to the Sentinel core.
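The scheduling rule can be summarized in a few lines. This sketch is illustrative only; in practice the Nagios scheduler applies these intervals through its own configuration. The numeric values follow the Nagios return codes (0 to 3), and the function name is hypothetical.

```python
from enum import IntEnum


class ProbeStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def next_check_interval_minutes(status):
    """Return the delay before the next check: 5 minutes if OK, otherwise 1."""
    return 5 if status == ProbeStatus.OK else 1
```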

There are two options for delivering results:

  • Mail batch mode: This is the default mode. Results are cached in a temporary file and sent by e-mail every five minutes, or sooner if the cached results reach a threshold of 100 items.
  • TCP mode: This requires additional configuration and uses a public Internet connection or a pre-established VPN connection to send results. The results are delivered by the Nagios Service Check Acceptor (NSCA) add-on or the NSCA Client Daemon (NSCACD) add-on. The difference between NSCA and NSCACD is how the TCP connection is established. NSCA opens one TCP connection for each submitted result. NSCACD opens one TCP connection and keeps it open as long as possible for all submitted results; if the connection fails, it opens it again. Because NSCA opens connections at a high rate, which can be recognized as a security threat, use NSCACD.

The configured delivery option is stored in the variable:

/etc/callrec/callrec.conf SENTINEL_DELIVERY

The value can be email, nsca, or nscacd.
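A minimal sketch of reading this setting is shown below. It assumes that callrec.conf contains shell-style NAME=value lines, which the text does not confirm; the function name read_delivery_mode is hypothetical, and falling back to email reflects only that mail batch mode is described as the default.

```python
VALID_DELIVERY_MODES = ("email", "nsca", "nscacd")


def read_delivery_mode(conf_text, default="email"):
    """Extract SENTINEL_DELIVERY from configuration text.

    Assumes shell-style NAME=value lines (an unverified assumption).
    """
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "SENTINEL_DELIVERY":
            mode = value.strip().strip("\"'").lower()
            if mode not in VALID_DELIVERY_MODES:
                raise ValueError("unsupported SENTINEL_DELIVERY: %r" % mode)
            return mode
    return default
```

The text would typically come from open("/etc/callrec/callrec.conf").read().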

Mail batch mode description

The results are sent by the script /usr/lib/nagios/plugins/submit_check_result. This script is defined as the following commands, each assigned to the Nagios distributed monitoring parameter shown in parentheses:

submit_check_result (Nagios ocsp_command)

and

submit_host_check_result (Nagios ochp_command)

The recipient address is sentinel@zoomint.com; this mailbox is processed by the mail-bot. The recipient address can be changed in the variable:

/etc/callrec/callrec.conf SENTINEL_CHECK_EMAIL

  • The subject of the email is “Nagios Report”.
  • The body of the email consists of the Nagios external commands PROCESS_SERVICE_CHECK_RESULT and PROCESS_HOST_CHECK_RESULT.
  • Check results are cached in the /tmp/nagios.tmp file.
  • An email is sent if cached checks are older than 5 minutes or if the number of cached checks exceeds 100.
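The sending rule for the batch can be sketched as a single predicate. This is an illustration under assumptions, not the submit script itself: the function name and the timestamp representation are hypothetical, and whether the limit triggers at exactly 100 or above 100 cached checks is not settled by the text (this sketch uses "exceeds").

```python
MAX_AGE_SECONDS = 5 * 60   # cached checks older than 5 minutes trigger a send
MAX_CACHED_CHECKS = 100    # more than 100 cached checks trigger a send


def should_send_batch(oldest_cached_at, cached_count, now):
    """Decide whether the cached check results should be e-mailed now.

    `oldest_cached_at` and `now` are Unix timestamps in seconds.
    """
    if cached_count == 0:
        return False  # nothing to send
    too_old = (now - oldest_cached_at) > MAX_AGE_SECONDS
    too_many = cached_count > MAX_CACHED_CHECKS
    return too_old or too_many
```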

PROCESS_SERVICE_CHECK_RESULT

Command Format:

PROCESS_SERVICE_CHECK_RESULT;<host_name>;<service_description>;<return_code>;<plugin_output>

Description: This is used to submit a passive check result for a particular service. 

The "return_code" field should be one of the following:

  • 0=OK
  • 1=WARNING
  • 2=CRITICAL
  • 3=UNKNOWN

The <plugin_output> field contains text output from the service check, along with optional performance data.
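A small helper that builds this command line might look as follows. The function name is hypothetical; the bracketed timestamp prefix is the standard form of Nagios external commands, and the sample plugin output in the usage note is invented for illustration.

```python
import time

RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def service_check_result(host_name, service_description, status,
                         plugin_output, timestamp=None):
    """Format a passive service check result as a Nagios external command."""
    if timestamp is None:
        timestamp = int(time.time())
    fields = (
        "PROCESS_SERVICE_CHECK_RESULT",
        host_name,
        service_description,
        str(RETURN_CODES[status]),
        plugin_output,
    )
    return "[%d] %s" % (timestamp, ";".join(fields))
```

For example, calling it with the host sascr005.test.office.zoomint.com, the service OS Swap, the status WARNING, and the output "SWAP WARNING - 80% free" yields a line whose return code field is 1.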

PROCESS_HOST_CHECK_RESULT

Command Format:

PROCESS_HOST_CHECK_RESULT;<host_name>;<status_code>;<plugin_output>

Description: This is used to submit a passive check result for a particular host.

The status_code indicates the state of the host check and should be one of the following:

  • 0=UP
  • 1=DOWN
  • 2=UNREACHABLE

The plugin_output argument contains the text returned from the host check, along with optional performance data.
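On the receiving side, such a command can be split back into its fields. The sketch below is illustrative and is not the mail-bot's actual code; the function name is hypothetical. Splitting at most three times keeps any semicolons inside the plugin output intact.

```python
HOST_STATES = {0: "UP", 1: "DOWN", 2: "UNREACHABLE"}


def parse_host_check_result(command):
    """Split a PROCESS_HOST_CHECK_RESULT command into (host, state, output)."""
    name, host_name, status_code, plugin_output = command.split(";", 3)
    if name != "PROCESS_HOST_CHECK_RESULT":
        raise ValueError("unexpected command: %r" % name)
    return host_name, HOST_STATES[int(status_code)], plugin_output
```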

TCP mode description

The scripts /usr/lib/nagios/plugins/submit_check_result and /usr/lib/nagios/plugins/submit_host_check_result send the results. These scripts are defined as the following commands, each assigned to the Nagios distributed monitoring parameter shown in parentheses:

submit_check_result (Nagios ocsp_command)

and

submit_host_check_result (Nagios ochp_command)

Each result is encrypted and sent immediately to TCP port 5667 on sentinel.zoomint.com. The host name alias can be changed in the variable:

/etc/callrec/callrec.conf SENTINEL_CHECK_HOST

Auto configuration and auto registration of probes

The configuration of the probes is generated at the end of Call Recording setup. The auto-configuration script creates probes based on the system and Call Recording configuration information.

Configuration is regenerated each day to prevent inconsistency between configurations of the system and probes.

Call Recording must be set up immediately after installation (before 4:00 am). If Call Recording is not set up immediately, prevent Nagios from starting by using the commands chkconfig nagios off and service nagios stop; otherwise, unexpected behavior is possible (a full mailbox or excessive SMTP traffic).

After each auto-configuration, the list of probes for all configured hosts and services is sent to the Sentinel core.

To prevent the auto-configuration from running, rename the file:

/etc/cron.daily/sentinelcheck

to

/etc/cron.daily/.sentinelcheck

(Note the period before sentinelcheck.)

Auto configuration and auto registration can be forced manually using the script:

/opt/callrec/bin/sentinelcheck

Available Scripts

In the following commands, replace <FQDN> with the fully qualified domain name of your installation. For example, if the hostname is sascr005 and the domain is test.office.zoomint.com, then the FQDN is sascr005.test.office.zoomint.com.

This script creates the Nagios configuration.

/opt/callrec/bin/sentinelcfg <FQDN>

The result file is /etc/nagios/callrec_nagios.cfg.

This script parses the Nagios configuration and prints all the probes for configured hosts and services to STDOUT for auto-registration purposes.

/opt/callrec/bin/sentinelreg.pl

The following code is an example of the result.

[root@sascr005 ~]# /opt/callrec/bin/sentinelreg.pl
[1334665530] REGISTER_HOST;sascr005.test.office.zoomint.com
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;OS Ping
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;OS Call Recording Version
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DISK /
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DISK /tmp
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DISK /opt
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DISK /home
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DISK /boot
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;OS Logged-in Users
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;OS PCPU 10 and more
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;OS Current Load
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;OS Swap
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;WEB APP http://localhost:8080/callrec/
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;WEB APP http://localhost:8080/prerecording/
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;WEB APP http://localhost:8080/
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;WEB APP http://localhost:8080/scorecard-webui/
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;WEB APP http://localhost:8080/screenrec-uploader/
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;WEBADMIN Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;PRERECORDING Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DATABASE callrec
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;CORE RS count
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;CORE Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;CONFIGMANAGER Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;CORE Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;RMI Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;RTS_JTAPI Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;WEB Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;NAMING Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;RMI Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;NAMING Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DS 1 Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DS Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;DS count
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;CONFIGMANAGER Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;RTS_JTAPI Overall status
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;RTS_JTAPI Registered terminals
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;RS eth1 Process
[1334665530] REGISTER_SERVICE;sascr005.test.office.zoomint.com;RS eth1 SPAN port

This script performs a Nagios sanity check, auto-configuration (sentinelcfg), and auto-registration (sentinelreg.pl).

/opt/callrec/bin/sentinelcheck

Any Nagios sanity problems are reported directly to the support team at sentinel_reports@zoomint.com.

The address of the support team can be changed in the variable:

/etc/callrec/callrec.conf SENTINEL_SUPPORT_EMAIL 

Then reload the Nagios configuration using /etc/init.d/nagios reload.

The cron service runs the Sentinel Check script daily (at 4:02 AM local time by default) through the file:

/etc/cron.daily/sentinelcheck

This file is a symlink to /opt/callrec/bin/sentinelcheck.

New registrations are sent to register-sentinel@zoomint.com. If email is restricted, deliver the output of the /opt/callrec/bin/sentinelreg.pl script to the register-sentinel@zoomint.com email address by an alternative method. The email address can be changed in the variable:

/etc/callrec/callrec.conf SENTINEL_REGISTER_EMAIL 

Register commands look similar to other Nagios external commands:

[<timestamp>] REGISTER_HOST;<host_name>
[<timestamp>] REGISTER_SERVICE;<host_name>;<service_description>

Notice: Register commands are not part of the Nagios implementation but of the auto-registration add-on. Nagios does not understand these commands; they take effect only in the add-on.
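The register command format lends itself to a simple line parser. The sketch below is illustrative only and does not reproduce the add-on's own processing; the function name and the regular expression are assumptions based on the format and sample output shown in this section.

```python
import re

REGISTER_LINE = re.compile(r"^\[(\d+)\] (REGISTER_HOST|REGISTER_SERVICE);(.+)$")


def parse_register_line(line):
    """Return (timestamp, host_name, service_description) or None.

    service_description is None for REGISTER_HOST lines. Lines that are not
    register commands (for example a shell prompt) yield None.
    """
    match = REGISTER_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, command, rest = match.groups()
    host_name, _, service_description = rest.partition(";")
    if command == "REGISTER_HOST":
        return int(timestamp), host_name, None
    return int(timestamp), host_name, service_description
```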

Warnings and Critical Conditions

Probe name / mask                       Warning condition                   Error condition
<module> JVM free memory                <25% free                           <10% free
<module> Overall status                 WARNING                             ERROR
<module> Process                        N/A                                 not running
CORE RS count                           N/A                                 <1
CORE Support status                     Missing or Expired support license  N/A
DATABASE callrec                        >3s response                        >8s response
DISK <partition>                        <20% free                           <10% free
DS count                                <2                                  <1
OS Call Recording Version               different packages and db version   N/A
OS Current Load (per CPU core)          >6.0 last minute                    >10.0 last minute
                                        >5.0 5 minutes                      >8.0 5 minutes
                                        >4.0 15 minutes                     >6.0 15 minutes
OS Logged-in Users                      >20 users                           >50 users
OS New email – <admin|root>             >0                                  N/A
OS PCPU 10 and more                     >10 processes                       >25 processes
OS Ping (default probe)                 >100ms roundtrip delay              >500ms roundtrip delay
                                        >20% packet loss                    >60% packet loss
OS Swap                                 <90% free                           <75% free
OS Time synchronization                 >60s offset                         >120s offset
                                        >5s jitter                          >10s jitter
REDLINES CRQ                            <90%                                <80%
REDLINES Decoder Queue                  >50 calls                           >500 calls
REDLINES Recent calls                   <2 calls                            <1 calls
RS <interface> SPAN port                N/A                                 down
RTS_JTAPI CUCM server <address>         >100ms roundtrip delay              >500ms roundtrip delay
                                        >20% packet loss                    >60% packet loss
RTS_JTAPI Registered terminals          <6                                  <1
RTS_SIP <instance> SPAN port <port>     N/A                                 down
RTS_SKINNY <instance> SPAN port <port>  N/A                                 down
WEB APP <URL>                           >1s response                        >4s response
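To show how a pair of thresholds from this table translates into a probe status, the following sketch classifies a free-space percentage using the DISK <partition> defaults (<20% free for a warning, <10% free for the error condition, which corresponds to the Critical status). The function name is hypothetical; the actual probes are Nagios plugins, not this code.

```python
def classify_free_percent(free_percent, warning_below=20.0, critical_below=10.0):
    """Map a free-space percentage to a probe status.

    The critical threshold is tested first because any value below it is
    also below the warning threshold.
    """
    if free_percent < critical_below:
        return "CRITICAL"
    if free_percent < warning_below:
        return "WARNING"
    return "OK"
```

The same function covers OS Swap when called with warning_below=90.0 and critical_below=75.0.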

The list of MIB OIDs monitored by Sentinel probes

Probe name                                      OID
CONFIGMANAGER Overall status                    .1.3.6.1.4.1.16321.1.10.7.2.0
CORE Overall status                             .1.3.6.1.4.1.16321.1.10.1.4.0
CORE RS count                                   .1.3.6.1.4.1.16321.1.10.1.6.122.0
CORE Support status                             .1.3.6.1.4.1.16321.1.10.1.6.73.0
DS count                                        .1.3.6.1.4.1.16321.1.10.6.6.12.0
DS Overall status                               .1.3.6.1.4.1.16321.1.10.6.2.0
GENESYS Overall status                          .1.3.6.1.4.1.16321.1.10.6.2.0
IPCCEX Overall status                           .1.3.6.1.4.1.16321.1.10.14.2.0
MIXER Overall status                            .1.3.6.1.4.1.16321.1.10.9.2.0
PRERECORDING Overall status                     .1.3.6.1.4.1.16321.1.10.5.2.0
REDLINES CRQ (Disabled by default)              .1.3.6.1.4.1.16321.1.10.2.6.7.0
REDLINES Decoder Queue (Disabled by default)    .1.3.6.1.4.1.16321.1.10.2.6.6.0
REDLINES Recent calls (Disabled by default)     .1.3.6.1.4.1.16321.1.10.2.6.1.0
RMI Overall status                              .1.3.6.1.4.1.16321.1.10.4.2.0
RTS_JTAPI Registered terminals                  .1.3.6.1.4.1.16321.1.10.11.6.12.0
WEBADMIN Overall status                         .1.3.6.1.4.1.16321.1.10.10.2.0

Modifying Sentinel Probes

The Nagios probes for Sentinel and their thresholds for the Call Recording application are stored in the /etc/nagios/callrec-nagios.cfg configuration file.

The probes are updated on a daily basis. To change a threshold, follow these steps:

  1. Verify the check command in the /etc/nagios/callrec-nagios.cfg configuration file, for example by viewing it with less /etc/nagios/callrec-nagios.cfg. The check command is defined by the check_command line.

For example, the swap usage probe is defined by the following lines:

define service{
use callrec-service ; Name of service template to use
host_name sascr008.office.zoomint.com
service_description OS Swap
contact_groups callrec-admins
check_command check_swap!90%!75%
}

The check command for swap usage is:

check_command check_swap!90%!75%

The /usr/lib/nagios/plugins/check_swap command is used.

This command specifies:

  • The warning state is reported if less than 90% of swap space is free.
  • The critical state is reported if less than 75% of swap space is free.

  2. To change the values defined in the /etc/nagios/callrec-nagios.cfg file, create a new file /opt/callrec/etc/sentinelcustomrules.cfg.
  3. The /opt/callrec/etc/sentinelcustomrules.cfg file specifies which strings of the /etc/nagios/callrec-nagios.cfg configuration file are replaced.
  4. Write the replacement rules carefully so that only the required rows are changed.
  5. The syntax of each line is as follows:
    s,<original string>,<new string>,
    Example 1:
s,check_swap!90%!75%,check_swap!50%!10%,

This changes the thresholds of the swap probe:

  • The warning threshold from 90% to 50% of free swap space.
  • The critical threshold from 75% to 10% of free swap space.

Example 2:
To change the free-space thresholds for the root partition, note that /etc/nagios/callrec-nagios.cfg specifies the following probes:

define service{
use callrec-service ; Name of service template to use
host_name sascr008.office.zoomint.com
service_description DISK /
contact_groups callrec-admins
check_command check_local_disk!20%!10%!/
}
define service{
use callrec-service ; Name of service template to use
host_name sascr008.office.zoomint.com
service_description DISK /boot
contact_groups callrec-admins
check_command check_local_disk!20%!10%!/boot
}

To match only the root partition probe, use the “$” character to mark the end of the line in /opt/callrec/etc/sentinelcustomrules.cfg.

The /opt/callrec/etc/sentinelcustomrules.cfg file contains:

s,check_local_disk!20%!10%!/$,check_local_disk!10%!5%!/,

This ensures that:

  • The warning threshold changes from 20% to 10% of free space.
  • The critical threshold changes from 10% to 5% of free space.

If the “$” is not used, the thresholds of both the root partition and the /boot partition change.
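The effect of the end-of-line anchor can be reproduced with a short sketch. It assumes the rules behave like sed-style substitutions applied line by line, which the rule syntax suggests but the text does not state; Python regular expressions only approximate that syntax, the function name apply_rule is hypothetical, and this simple parser does not support commas inside the strings.

```python
import re


def apply_rule(rule, config_text):
    """Apply one rule of the form s,<original string>,<new string>, per line."""
    parts = rule.strip().split(",")
    if len(parts) != 4 or parts[0] != "s" or parts[3] != "":
        raise ValueError("expected s,<original string>,<new string>,")
    pattern = re.compile(parts[1])
    new = parts[2]
    # Replace the first match on each line; "$" matches only at the line end.
    return "\n".join(
        pattern.sub(lambda match: new, line, count=1)
        for line in config_text.splitlines()
    )


config = (
    "check_command check_local_disk!20%!10%!/\n"
    "check_command check_local_disk!20%!10%!/boot"
)
# With "$", only the root partition line is rewritten.
anchored = apply_rule("s,check_local_disk!20%!10%!/$,check_local_disk!10%!5%!/,", config)
# Without "$", the /boot line is rewritten as well.
unanchored = apply_rule("s,check_local_disk!20%!10%!/,check_local_disk!10%!5%!/,", config)
```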

  6. After the changes are finished, restart the nagios service using the command:

/etc/init.d/nagios restart

The /opt/callrec/etc/sentinelcustomrules.cfg can have several lines. The number of lines depends on how many parameters need to be changed.