
A more effective monitoring architecture

Needs some work


About six months ago I had a conversation with Carter Bullard of argus fame, and two points stuck with me (loosely quoted):

  • “You throttle ICMP?! Why?! ICMP has a lot of useful data for everyone!”
  • “Why are you so focused on using argus data for security? Focus on using it to monitor performance. It’ll give you something to deliver to your manager so they don’t think you’re wasting your time and their money. Then focus on security.”

But how? Well, quite easily. At boundaries, use an argus probe to:

  • watch for ICMP status that isn’t a successful ECHO-ECHO REPLY:
    ra -S 127.0.0.1:561 -s ltime saddr daddr smac dmac spkts dpkts flgs state inode - "icmp and (dst pkts eq 0 or not echo)"
    
  • watch for no “heartbeat” (needs tuning):
    rabins -S 127.0.0.1:561 -B 15s -M 5m - src bytes lt 1 or dst bytes lt 1 or src rate lt 1 or dst rate lt 1
    
  • watch for `loss`:
    rabins -S 127.0.0.1:561 -B 15s -M 5s - ploss gt 0
    
  • watch for protocol indicated problems:
    rabins -S 127.0.0.1:561 -B 15s -M 5s - frag or retrans or outoforder or winshut
    
  • watch for performance degradation past a threshold (replace N with your own limit):
    #requires at least argus-clients-3.0.7.19
    rabins -S 127.0.0.1:561 -B 15s -M 5s - src jit gt N or dst jit gt N or src intpkt gt N or dst intpkt gt N
    

If you want to filter for certain addresses, use a pipeline:

ra -S 127.0.0.1:561 -w - - icmp | rafilteraddr -r - -f raaddrfilter.txt -s ltime saddr daddr sbytes dbytes flgs state

Nagios et al. are certainly useful for getting resource statistics via SNMP. They are also better at managing alerts than logstash (specifically schedules!).

The architecture would be like this:
[Figure: monitoring_arch (architecture diagram)]

The nagios output for logstash is already coded.

Icinga et al. should still be used to send pings to devices, but no NOTICEs should be sent on these unreachable events, as the argus probe should be taking care of reachability monitoring.
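
To make that concrete, here is a minimal sketch of a host object that keeps its ping check but stays silent. It uses Nagios/Icinga 1.x object syntax. The host name, the address, and the `generic-host` template are assumptions made up for the example; adapt them to whatever your own configuration defines.

```
define host {
    use                     generic-host      ; assumed template supplying the required defaults
    host_name               edge-router-1     ; hypothetical device
    address                 192.0.2.10        ; documentation-range address
    check_command           check-host-alive  ; the ping still runs and is recorded
    notifications_enabled   0                 ; but unreachable events send nothing
}
```

The check history stays available in Icinga, while alerting on reachability is left to the argus probe.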

I expect the bulk of the challenge to lie in processing the argus data, but I believe it is quite doable. See: Using elasticsearch for logs (I will probably run logstash or logstash-forwarder (aka lumberjack) on the local argus box for caching).
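
One possible way to approach the processing step is to turn each argus record into a JSON line before it reaches logstash. The sketch below is only an illustration: `ra_to_json` is a hypothetical helper name, and it assumes comma-delimited records with no column-label line and field values that contain no quotes, backslashes, or commas.

```shell
# Convert comma-delimited ra/rabins records on stdin into one JSON object per line.
# Arguments: the field names, in the same order as given to `-s`.
# Assumption: values never contain quotes, backslashes, or commas.
ra_to_json() {
    awk -F',' -v names="$*" '
        BEGIN { n = split(names, key, " ") }
        NF == 0 { next }                          # skip blank lines
        {
            line = "{"
            for (i = 1; i <= n; i++) {
                val = $i
                gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim column padding
                sep = (i > 1) ? "," : ""
                line = line sep "\"" key[i] "\":\"" val "\""
            }
            print line "}"
        }'
}
```

Assuming your ra build accepts `-c` to set the output delimiter, usage might look like `ra -S 127.0.0.1:561 -c , -s ltime saddr daddr state - icmp | ra_to_json ltime saddr daddr state`. Every value is emitted as a string, so numeric fields would still need converting on the logstash side.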

This consolidates performance monitoring into a single dashboard whose backend can be used for SIEM when the time comes. Producing reports should be very easy, since a ton of work has already been done on layman-friendly statistics over elasticsearch data.

Processing icinga service and host check_results into elasticsearch should be very easy. Look at:

  • service_perfdata_file_template (very important for your logstash grok definition)
  • service_perfdata_file_mode
  • service_perfdata_file_processing_interval
  • service_perfdata_file_processing_command
  • service_perfdata_command
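
To show why the template matters for the grok definition, here is a small sketch that splits a perfdata record into named fields. It assumes the tab-separated template shipped in the stock sample configuration (shown in the comment); `perfdata_fields` and the short field names are hypothetical. Whatever order your template uses, the grok pattern has to mirror it exactly.

```shell
# Assumed template (the stock sample, tab-separated):
# service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$

# Read perfdata records on stdin and print each field as name=value, one per line.
# Records that do not have exactly eight tab-separated fields are ignored.
perfdata_fields() {
    awk -F'\t' '
        BEGIN { n = split("tag timet hostname servicedesc exectime latency output perfdata", name, " ") }
        NF == n {
            for (i = 1; i <= n; i++) printf "%s=%s\n", name[i], $i
        }'
}
```

Feeding it the perfdata file (for example `perfdata_fields < /path/to/service-perfdata`, with the path being whatever `service_perfdata_file` points at) is a quick way to check that the template produces the fields you expect before writing the grok pattern.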