Descriptive and granular Event Log Monitoring with NSClient++ and Nagios

The method described in this article is now obsolete as of NSClient++ 0.4.0.

Check out this blog post at Michael Medin’s blog (author of NSClient++) about NSClient 0.4.0’s real time event log monitoring.

Michael has posted another tutorial about also polling text files.

This post is a bit messy. If you want the most value quickly, skip to the “Some examples of the new(est) syntax” section near the end.  To understand the context, read the whole article.


I know you’ve read the previous post Administering Event Log permissions, and now you’re trying to configure NSClient++ to properly parse event logs so that you can have Nagios alert you when NSClient++ finds the specific events you’re searching for.

Run-on sentences aside, this seems very complex indeed, so I’ll be expanding this post as I research.

Review all available events:

If you implemented event log permissions as posted in Administering Event Log permissions you may have noticed that there are DLL files that contain Event Log definitions. The easiest way to review these definitions is by using the Event to Trap Translator tool that comes with Windows Server, evntwin (I posted about my unsuccessful romp with eventlog snmp traps earlier).

After installing the SNMP service, you can run evntwin.  In the main window, select the Custom option under Configuration Type, then click the Edit button on the right side.

The Edit button will then switch to View and will allow you to review all events available in your Event Logs.  These include: Event Logs (first tree level), their Event Sources (second tree level), and the Event Sources’ Events (right side), including each event’s properties: Event ID, Severity, and Description.

Other than this being interesting, depending on how specific you wish to get, you can make a list of the source and event IDs you wish to capture; or you can just note some wildcards you wish to monitor.

For instance, one of the purposes of my roll out is to monitor DFS Replication for issues.  So I’d like to simply: monitor all Events in the Event Log “DFS Replication,” from the Event Sources “DFS Replication” and “DFSR,” with the Event Severity Level “Error” and “Warning.”  I don’t care which events they are, as long as the events meet those rules.

We will configure this in our NSC.ini file as described on the NSClient++ CheckEventLog Config page.

NSC.ini configuration and/or check_nrpe command syntax

To enable NRPE: in NSC.ini, under [modules] uncomment:

NRPEListener.dll
CheckEventLog.dll

check_nrpe takes -c to name the command to run on the remote host (here, CheckEventLog), followed by -a for that command’s arguments.

Here is an example of a check_nrpe line that addresses a scenario of a logon failure on a server.  I know this will be easiest for you to test since you don’t use your regular user account to log on to your servers over RDP, but have a separate account with administrative privileges.  So, you can run this repeatedly and log the error by trying to log on to the server with your regular user account.  Event 534 will be logged in the server’s Security Event Log.

./check_nrpe -H 192.168.100.20 -c CheckEventLog -a truncate=1023 MaxWarn=1 MaxCrit=1 file='Security' filter=in filter+eventType==auditFailure filter+generated=\<10m filter+eventID==534 filter+message=substr:'mbrown' descriptions unique syntax='Severity: %severity% Source:%source% Occurances:(%count%) STRINGS:%strings% MESSAGE:%message%'

Specifics about syntax, filter types, etc., can be found on the NSClient++ CheckEventLog doc page.

argument | meaning
truncate=1023 | Truncate the output sent over NRPE to 1023 bytes (the NRPE payload maximum is 1024 bytes).
MaxWarn=1 | 1 matching hit returns a Warning state.
MaxCrit=1 | 1 matching hit returns a Critical state.
file='Security' | Return records that are part of the Security Event Log and pass them to the filters. (Why not eventlogname, gents? We’re not all programmers.)
filter=in | Records matching the filters are included in (not excluded from) the results.
filter+eventType==auditFailure | Include only records whose event type matches the string “auditFailure.”
filter+eventSource=substr:Sec | The event source must contain the substring “Sec”; otherwise the record is dropped. (Not part of the example command above.)
filter+generated=\<10m | The record was generated less than 10 minutes ago (remember to escape the < or > or the shell thinks you’re redirecting).
filter+eventID==534 | If the event ID is 534, include the record.
filter+message=substr:'mbrown' | Look in %strings% for the string ‘mbrown’. If any of the strings in %strings% matches, include the record (see syntax below).
descriptions | Include a string representation (controlled by syntax) in the returned data.
unique | Tells CheckEventLog to return only one line for each record with unique attributes. If you have 50 of the same event, it will return one (see %count% in syntax below).
syntax=[statement] | Allows the contents of a record to be returned as output. See Filter Keywords.
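
If you drive check_nrpe from a script instead of typing it at a prompt, the same arguments can be passed as a list. Here is a minimal Python sketch of that idea, assuming check_nrpe sits in the current directory as in the example command above. The function name build_check_nrpe is made up for illustration, and the syntax string is shortened. Because no shell is involved when the list is handed to subprocess, the < in the generated filter needs no backslash.

```python
import shlex


def build_check_nrpe(host, log_file, filters, syntax,
                     truncate=1023, max_warn=1, max_crit=1):
    """Return the check_nrpe argument list for a CheckEventLog query.

    Hypothetical helper: it only assembles the arguments already shown
    in this post, it does not talk to NSClient++ itself.
    """
    args = [
        "./check_nrpe", "-H", host, "-c", "CheckEventLog", "-a",
        f"truncate={truncate}",
        f"MaxWarn={max_warn}",
        f"MaxCrit={max_crit}",
        f"file={log_file}",
        "filter=in",
    ]
    args += filters
    args += ["descriptions", "unique", f"syntax={syntax}"]
    return args


argv = build_check_nrpe(
    host="192.168.100.20",
    log_file="Security",
    filters=[
        "filter+eventType==auditFailure",
        "filter+generated=<10m",  # no backslash: no shell parses this list
        "filter+eventID==534",
        "filter+message=substr:mbrown",
    ],
    syntax="Severity: %severity% Source:%source% MESSAGE:%message%",
)

# shlex.join shows how the same list would need quoting on a shell command line.
print(shlex.join(argv))

# To actually run the check (requires check_nrpe and a reachable host):
# import subprocess
# result = subprocess.run(argv, capture_output=True, text=True)
```

Printing the joined command is a cheap way to see which arguments the shell would otherwise have mangled.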

Specifics about filter* evaluation

Note that you will get the inverse evaluation if using a ‘filter-X‘ instead of a ‘filter+X‘.

For example, I want to log all of the errors and warnings from two sources, but I decided I don’t need to be alerted for a few events, so I can simply use filter-eventID==4202.

filter* statements are evaluated from left to right in your check_nrpe request.

+ is AND
. is OR
- is NOT AND

This means that if you want a filter to be checked and evaluation to proceed to the next filter* regardless of the result, you should use filter. rather than filter+ or filter-. For example:

"filter.eventSource=DFS Replication" filter.eventSource='DFSR'

This won’t drop events that don’t match “filter.eventSource=DFS Replication”.  If you were to use “filter+eventSource=DFS Replication” the event will be dropped if its event source is the string ‘DFSR,’ since the string ‘DFSR’ is not the string ‘DFS Replication.’
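
To make the difference concrete, here is a toy Python model of the three modifiers as I have described them in this post. It is not NSClient++’s actual implementation: the function name, the tuple layout, and the rule that a record passes when at least one filter. rule matches (or when there are none) are my own assumptions for illustration, and it only does exact string or number comparison.

```python
def include(record, rules):
    """Toy model of filter+ / filter. / filter- as described in this post.

    rules is a list of (mode, field, value) tuples, evaluated left to right,
    where mode is '+', '.' or '-'.
    """
    optional_seen = False
    optional_hit = False
    for mode, field, value in rules:
        matched = record.get(field) == value
        if mode == "+" and not matched:
            return False  # required rule failed: drop the record
        if mode == "-" and matched:
            return False  # exclusion rule hit: drop the record
        if mode == ".":
            optional_seen = True
            optional_hit = optional_hit or matched
    # Keep the record if any optional rule matched, or if there were none.
    return optional_hit or not optional_seen


either_source = [
    (".", "eventSource", "DFS Replication"),
    (".", "eventSource", "DFSR"),
]
required_source = [("+", "eventSource", "DFS Replication")]
with_exclusion = either_source + [("-", "eventID", 4202)]

dfsr_event = {"eventSource": "DFSR", "eventID": 5002}
noisy_event = {"eventSource": "DFSR", "eventID": 4202}

print(include(dfsr_event, either_source))    # True: the second filter. matches
print(include(dfsr_event, required_source))  # False: filter+ demands 'DFS Replication'
print(include(noisy_event, with_exclusion))  # False: filter- drops event 4202
```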

Additionally, you may find it useful to put a comparison operator after the equals sign, as in filter*eventType=†

† is one of the comparison operators =, <, >, <> (giving ==, =<, => or =<>).

filter=in "filter.eventType==warning" "filter.eventType==error" "filter+eventType=<>info"

The above filter includes events with: maybe eventType warning, and maybe eventType error, and eventType not equal to information.

It’s sort of unclear how to explicitly include multiple eventTypes, until the newer versions of CheckEventLog.

Including multiple of the same filter designation

The order of the conditions and their logical grouping are very important:

filter=in "filter=generated > -5m AND ((source = 'DFS Replication' OR source = 'DFSR') AND severity NOT IN ('informational'))"

The above statement uses the new filter syntax, which is similar to SQL.

The source event log is ‘DFS Replication’

If the event is (filter=in):

  • generated greater than “Now”(-5 minutes)
  • severity string is not a string in the string list (‘informational’)
  • source string is ‘DFS Replication’ OR ‘DFSR’

Then report it.
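
If the grouping is hard to read inside a quoted command line, it may help to see the same three conditions as ordinary code. This is only a Python stand-in for the logic listed above, not how NSClient++ evaluates the filter. The function name, the dictionary keys, and the sample events are all invented for the example.

```python
from datetime import datetime, timedelta


def matches(event, now, window=timedelta(minutes=5)):
    """Mirror of: generated > -5m AND ((source = A OR source = B) AND severity NOT IN (...))."""
    recent = event["generated"] > now - window
    from_dfs = event["source"] in ("DFS Replication", "DFSR")
    not_informational = event["severity"] not in ("informational",)
    return recent and (from_dfs and not_informational)


now = datetime(2012, 3, 23, 12, 0)  # fixed "now" so the example is repeatable

events = [
    {"source": "DFSR", "severity": "error", "generated": now - timedelta(minutes=2)},
    {"source": "DFSR", "severity": "informational", "generated": now - timedelta(minutes=1)},
    {"source": "DFSR", "severity": "warning", "generated": now - timedelta(minutes=10)},
    {"source": "Service Control Manager", "severity": "error", "generated": now - timedelta(minutes=1)},
]

for event in events:
    print(event["source"], event["severity"], matches(event, now))
```

Only the first sample event is reported: the second is informational, the third is older than five minutes, and the fourth comes from a different source.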

Michael Medin was kind enough to reply to me on ServerFault with this answer.

Problem when parsing syntax=’%message%’?

Note that versions previous to 0.3.9 will receive an error “We cant handle N arguments so you wont get argList her” when the event log they are parsing contains more than 11 arguments. Michael, the esteemed developer of CheckEventLog.dll, actually all of NSClient++, increased the count to 15 in 0.3.9.

If you specify descriptions, but don’t specify a syntax, CheckEventLog simply returns %message%, which is a-okay if you’re down with that.

Also, note that while testing, it might appear that data is being lost out of %message%.  In fact, the terminal is simply handling the carriage returns in the event message/description the way a terminal would.  So when you run check_nrpe, you only see “Logon Failure:” because that line is followed by a carriage return.  Your best bet is to redirect stdout to a file, and review that file (try: ./check_nrpe [blah] > ./file.txt && vim ./file.txt).
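
Another way to see what is really in the output is to make the line breaks visible. Below is a small Python sketch that collapses carriage returns and line feeds into a single line. The function name flatten_message and the sample text are invented for illustration; a real event message will look different.

```python
def flatten_message(raw):
    """Collapse CR/LF line breaks so a multi-line event message fits on one line."""
    lines = raw.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # Normalize inner whitespace (tabs included) and skip empty lines.
    cleaned = [" ".join(line.split()) for line in lines if line.strip()]
    return " | ".join(cleaned)


# Made-up sample in the shape of a Windows event description.
sample = "Logon Failure:\r\n\tReason:\tsample reason text\r\n\tUser Name:\tmbrown\r\n"

print(repr(sample))             # repr() exposes the \r\n and \t characters
print(flatten_message(sample))
```

With the breaks replaced by a visible separator, nothing after “Logon Failure:” gets hidden by the terminal.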

Creating aliases in nsc.ini

Michael was kind enough to promptly answer my question: yes, you can use CheckExternalScript aliases ([External Aliases]) to run CheckEventLog.  This might be useful in specific situations, for example when you would rather keep the check definition on the monitored host than configure it centrally on the poller.

Monitor. Monitor. Monitor.

Remember to monitor what the nsclient++.exe is doing when the poll occurs. Is it eating up all your RAMs or CPU? Maybe your poll needs to be tweaked.  %strings% is less costly than %message%.  %message% queries the registry and then DLL definition files for the Event Source.

Note on using Centreon

I know, people hate front ends to Nagios. I use Centreon, you might use others.

It is a known issue that Centreon has a problem dealing with line modifiers in configured commands, namely the ‘+’ and ‘-‘ characters found in the check_nrpe CheckEventLog filter* modifier commands.  The workaround is to wrap each such argument in single or double quotes so it is passed as one string.

Some examples of the new(est) syntax

Monitor DFS Replication errors, excluding events 4304 and 4202 (remember filter type qualifier ‘id’ is numeric)

CheckEventLog -a truncate=1023 MaxWarn=1 MaxCrit=1 file='DFS Replication' filter=in "filter=generated > -5m AND (source = 'DFS Replication' OR source = 'DFSR') AND (id ne 4304) AND (id ne 4202) AND severity NOT IN ('informational')" descriptions unique syntax='%message%'

Monitoring System for all events that aren’t informational (just replace ‘System’ with ‘Application’ to do the same with Application Event Log):

CheckEventLog -a truncate=1023 MaxWarn=1 MaxCrit=1 file='System' filter=in "filter=generated > -5m AND type NOT IN 'info'" descriptions unique syntax='%message%'

Monitor Security for all events that are failures (be careful, because object access audits, such as the events logged for audited deletions, are logged as a ‘success’ not a ‘failure’):

CheckEventLog -a truncate=1023 MaxWarn=1 MaxCrit=1 file='Security' filter=in "filter=generated > -5m AND severity not like 'success'" descriptions unique syntax='%message%'

[Update]

After some time, allowing the polls to run without setting up Twilio SMS messaging alerts, I’ve come up with the following filter syntax which seems to fit pretty well.  It displays good guidelines you can use, specifically noting that CheckEventLog appears not to parse two parenthesis levels down.

!CheckEventLog -a truncate=1023 MaxWarn=1 MaxCrit=1
file='DFS Replication'
filter=in

"filter=

generated > -5m
AND
	(
		source = 'DFS Replication'
		OR
		source = 'DFSR'
	)
AND
	(
		id ne 4304
	)
AND
	(
		id ne 4202
	)
AND
	(
		id ne 4302
	)
AND
	(
		id ne 4208
	)
AND
	(
		severity NOT IN ('informational')
		OR
		id = 5004
	)
"

descriptions unique syntax='(Logged at: %written%) %message%'
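
A filter this long is easy to break when adding or removing an event ID by hand. As a sketch, the string can be generated from lists instead. The Python below is my own illustration (build_filter and its parameters are hypothetical names); it mirrors the layout above by giving each AND-ed group its own single set of parentheses.

```python
def build_filter(window, sources, excluded_ids, always_ids=()):
    """Assemble a filter= expression shaped like the one in this post.

    window       -- age limit, e.g. "5m"
    sources      -- event sources joined with OR
    excluded_ids -- event IDs to leave out, one (id ne N) group each
    always_ids   -- event IDs reported even when informational
    """
    source_clause = " OR ".join(f"source = '{name}'" for name in sources)
    parts = [f"generated > -{window}", f"({source_clause})"]
    parts += [f"(id ne {event_id})" for event_id in excluded_ids]
    severity_clause = "severity NOT IN ('informational')"
    severity_clause += "".join(f" OR id = {event_id}" for event_id in always_ids)
    parts.append(f"({severity_clause})")
    return "filter=" + " AND ".join(parts)


dfs_filter = build_filter(
    window="5m",
    sources=["DFS Replication", "DFSR"],
    excluded_ids=[4304, 4202, 4302, 4208],
    always_ids=[5004],
)
print(dfs_filter)
```

The result is a single line, so it still has to be wrapped in double quotes when it goes into a check_nrpe command.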

Note that when configuring the service in Nagios, you should enable the is_volatile setting. This will execute a notification command each time a non-OK state is returned from a poll, even if the service is already in a non-OK state.  Normally, no further notification would be sent in that case.
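
To show why this matters for event log checks, here is a deliberately simplified Python model of the difference. It is not Nagios’s real notification logic: it ignores soft states, retry attempts and notification intervals, and the function name notified_polls is made up. It only captures the idea that a normal service notifies when the state changes, while a volatile one notifies on every non-OK result.

```python
def notified_polls(states, volatile):
    """Return the indexes of polls that would trigger a notification in this toy model."""
    sent = []
    previous = "OK"
    for index, state in enumerate(states):
        # Non-volatile: notify only when the non-OK state differs from the last result.
        # Volatile: notify on every non-OK result.
        if state != "OK" and (volatile or state != previous):
            sent.append(index)
        previous = state
    return sent


polls = ["OK", "CRITICAL", "CRITICAL", "OK", "CRITICAL", "CRITICAL"]

print(notified_polls(polls, volatile=False))  # [1, 4]
print(notified_polls(polls, volatile=True))   # [1, 2, 4, 5]
```

Since two consecutive CRITICAL results from CheckEventLog can be caused by entirely different events, the volatile behaviour is the one that avoids silently missing the second batch.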

  1. March 23, 2012 at 3:51 am

    Holy ! what a great work !
    You helped me soooooo much !
    Thanks !

    • March 23, 2012 at 7:23 am

      Glad it helped! Thanks for reading.

    • March 23, 2012 at 9:44 am

      Also make sure to keep an eye on the blog as I’ll be posting about my monitoring/logging/alerting infrastructure soon.

  2. Maarten Minnebo
    April 25, 2012 at 4:47 am

    Thanks for the elaborate guide!

    I do have one additional question though. Is there any possible way to check multiple logfiles instead of a single? I’m asking this because to monitor server roles you often have to get information from multiple logfiles.

    Any ideas on this?

    Regards
