Reading a link on Hacker News simply titled “Nagios-plugins web site taken over by Nagios,” my man Michael Friedrich (who was employed by Netways after working on Icinga development and evangelism for at least a year) came out of the gates strong defending the open source community.
But yeah, as domain owner you can treat your community like shit, and still feel happy about it. It’s still a miracle why you didn’t fork the former Nagios Plugins project into your own enterprise product stack any sooner. Probably their work was just needed, and now you’re just throwing them away like rubbish.
Very interesting thread and topic. Even if open source is open source, contributing people don’t seem to like it when it turns into free as in beer and not free as in people. I’m sticking with monitoring-plugins.org.
Needs some work
After having a conversation with Carter Bullard of argus fame about six months ago, two points stuck with me (loosely quoted):
- “You throttle ICMP?! Why?! ICMP has a lot of useful data for everyone!”
- “Why are you so focused on using argus data for security? Focus on using it to monitor performance. It’ll give you something to deliver to your manager so they don’t think you’re wasting your time and their money. Then focus on security.”
But how? Well, quite easily. At boundaries, use an argus probe to:
- watch for ICMP status that isn’t a successful ECHO-ECHO REPLY:
ra -S 127.0.0.1:561 -s ltime saddr daddr smac dmac spkts dpkts flgs state inode - "icmp and (dst pkts eq 0 or not echo)"
- watch for no “heartbeat” (needs tuning):
rabins -S 127.0.0.1:561 -B 15s -M 5m - src bytes lt 1 or dst bytes lt 1 or src rate lt 1 or dst rate lt 1
- watch for `loss`:
rabins -S 127.0.0.1:561 -B 15s -M 5s - ploss gt 0
- watch for protocol indicated problems:
rabins -S 127.0.0.1:561 -B 15s -M 5s - frag or retrans or outoforder or winshut
- watch for performance degradation below a threshold:
#requires at least argus-clients-220.127.116.11
rabins -S 127.0.0.1:561 -B 15s -M 5s - src jit gt N or dst jit gt N or src intpkt gt N or dst intpkt gt N
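If you would rather launch these watches from a script than retype them, the commands above can be assembled roughly as follows. This is only a sketch: the helper names `rabins_cmd` and `degradation_filter` are made up for illustration, the flags are copied as written above rather than checked against any particular argus-clients release, and the threshold of 50 is an arbitrary stand-in for N.

```python
import shlex

ARGUS_SOURCE = "127.0.0.1:561"  # probe address used in the commands above


def rabins_cmd(filter_expr, bin_size="15s", mode="5s", source=ARGUS_SOURCE):
    """Return the argv list for one rabins watch (flags mirror the commands above).

    The filter is passed as a single argument, the same way the quoted
    ICMP filter is passed to ra above.
    """
    return ["rabins", "-S", source, "-B", bin_size, "-M", mode, "-", filter_expr]


def degradation_filter(threshold):
    """Fill the placeholder N of the jitter / inter-packet filter with a number."""
    fields = ["src jit", "dst jit", "src intpkt", "dst intpkt"]
    return " or ".join(f"{field} gt {threshold}" for field in fields)


# Hypothetical watch table; the names on the left are illustrative only.
WATCHES = {
    "loss": rabins_cmd("ploss gt 0"),
    "protocol": rabins_cmd("frag or retrans or outoforder or winshut"),
    "degradation": rabins_cmd(degradation_filter(50)),  # 50 is an example value for N
}

for name, argv in WATCHES.items():
    # shlex.join (Python 3.8+) shows the command as it could be pasted into a shell
    print(name, "->", shlex.join(argv))
```

Each argv list can be handed to `subprocess.Popen` unchanged, which avoids a second round of shell quoting for the filter expression.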
If you want to filter on certain addresses, use a pipeline:
ra -S 127.0.0.1:561 -w - - icmp | rafilteraddr -r - -f raaddrfilter.txt -s ltime saddr daddr sbytes dbytes flgs state
Nagios et al. are certainly useful for gathering resource statistics via SNMP. They are also better at managing alerts than logstash (specifically schedules!).
A nagios output for logstash has already been written.
Icinga et al. should still be used to send pings to devices, but no NOTICEs should be sent on these unreachable events, as the argus probe should be taking care of reachability monitoring.
I believe the bulk of the challenge will be processing the argus data, but it is quite doable. See: Using elasticsearch for logs (will probably run logstash or logstash-forwarder (aka lumberjack) on the local argus box for caching).
This consolidates performance monitoring into a single dashboard, whose backend can be utilized for SIEM when the time comes. Producing reports should be very easy, and a ton of work has already been done on layman statistics for elasticsearch data, so this is great.
Processing icinga service and host check_results into elasticsearch should be very easy. Look at:
- service_perfdata_file_template (very important for your logstash grok definition)
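To see why the template matters so much for the grok definition, here is a minimal Python sketch of the same parsing job. It assumes the tab-separated template shipped in the sample nagios.cfg (shown in the comment); if your `service_perfdata_file_template` differs, the field list has to change with it. The function name and field names are my own, not anything Icinga or logstash defines.

```python
import re

# Assumed template (the sample-config default); adjust FIELDS if yours differs:
# [SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
FIELDS = ["timestamp", "host", "service", "execution_time", "latency", "output", "perfdata"]

# One perfdata item looks like label=value[UOM];warn;crit;min;max
# (the thresholds after the first ';' are optional and ignored here).
PERF_ITEM = re.compile(r"(?:'([^']+)'|([^\s=]+))=(-?[\d.]+)([^\s;]*)")


def parse_perfdata_line(line):
    """Split one perfdata file line into named fields plus a metrics dict."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS) + 1 or parts[0] != "[SERVICEPERFDATA]":
        raise ValueError("line does not match the assumed template")
    record = dict(zip(FIELDS, parts[1:]))
    record["timestamp"] = int(record["timestamp"])
    record["metrics"] = {
        (quoted or bare): {"value": float(value), "unit": unit}
        for quoted, bare, value, unit in PERF_ITEM.findall(record["perfdata"])
    }
    return record


# Made-up sample line, for illustration only
sample = (
    "[SERVICEPERFDATA]\t1400000000\tfw01\tPING\t4.012\t0.105\t"
    "PING OK - Packet loss = 0%, RTA = 0.52 ms\t"
    "rta=0.520000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0"
)
print(parse_perfdata_line(sample)["metrics"])
```

A grok pattern has to make the same two decisions: where the tab-separated fields end, and how each `label=value` pair inside the perfdata string is split.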
Within the MIB for the Fortigate, there are two OIDs that contain the policy hit counts:
fgFwPolPktCount 18.104.22.168.4.1.12322.214.171.124.126.96.36.199 Number of packets matched to policy (passed or blocked, depending on policy action). Count is from the time the policy became active. 188.8.131.52.4.1.123184.108.40.206.220.127.116.11.V.P = policy packet count for policy ID P, in VDOM V
fgFwPolByteCount 18.104.22.168.4.1.12322.214.171.124.126.96.36.199 Number of bytes in packets matching the policy. See fgFwPolPktCount. 188.8.131.52.4.1.123184.108.40.206.220.127.116.11.V.P = policy byte count for policy ID P, in VDOM V
I just created a DENY policy for a variety of geographic regions, a feature of the Fortigate. Although I am also monitoring destination country code information with argus, I have not yet integrated argus into an IDS platform. Before I do this, I can quickly set up an icinga/nagios service to query this value and report when it increases above 0. I am logging policy violations within the Fortigate so that I can quickly review the source, revert to argus, then to the workstation itself.
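The decision half of such a service check is small. The sketch below assumes the counter value has already been fetched with whatever SNMP tool you use for fgFwPolPktCount, and only maps it to the standard Nagios plugin exit codes and status line. The function name, the perfdata label and the choice of CRITICAL over WARNING are my own assumptions.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def evaluate_policy_hits(raw_value, label="deny_policy_pkts"):
    """Map a policy packet counter to (exit_code, status_line).

    raw_value is the counter as returned by your SNMP query (string or int).
    The label is a hypothetical perfdata name; 'c' marks it as a counter.
    """
    try:
        count = int(str(raw_value).strip())
    except ValueError:
        return UNKNOWN, f"UNKNOWN - could not parse counter value {raw_value!r}"
    if count > 0:
        return CRITICAL, f"CRITICAL - DENY policy matched {count} packets | {label}={count}c"
    return OK, f"OK - DENY policy has no hits | {label}=0c"


def main(argv):
    """Print the status line and return the exit code for the plugin wrapper."""
    code, line = evaluate_policy_hits(argv[1] if len(argv) > 1 else "")
    print(line)
    return code
```

As a plugin, the script would end with `sys.exit(main(sys.argv))`. Because the count runs from the time the policy became active, the check stays CRITICAL after the first hit until the counter resets.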
Netways’ inGraph views are accessible at the path:
I have a lot of trouble with the UI when cloning a graph and modifying its hostname, since this is not how the perfdata is stored to be pivoted in the inGraph DB.
Instead, what I do is make one graph, then edit the json files located within ./views/ adding the additional hosts by copy and pasting the necessary json.
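The copy-and-paste step can be scripted. The sketch below is deliberately ignorant of the real inGraph view schema, which I am not reproducing here: it clones a JSON fragment and swaps the hostname in every string value. The keys in the sample (`host`, `service`, `plot`) are hypothetical, and you still have to locate the right list inside your own view file.

```python
import json


def clone_for_host(entry, template_host, new_host):
    """Copy one JSON fragment, replacing the hostname in every string value."""
    if isinstance(entry, dict):
        return {key: clone_for_host(value, template_host, new_host) for key, value in entry.items()}
    if isinstance(entry, list):
        return [clone_for_host(item, template_host, new_host) for item in entry]
    if isinstance(entry, str):
        return entry.replace(template_host, new_host)
    return entry  # numbers, booleans and null are kept as they are


def add_hosts(entries, template_host, new_hosts):
    """Return entries plus one clone per new host of every entry mentioning template_host."""
    templates = [entry for entry in entries if template_host in json.dumps(entry)]
    clones = [clone_for_host(entry, template_host, host) for host in new_hosts for entry in templates]
    return entries + clones


# Hypothetical fragment; the real view files will look different
series = [{"host": "web01", "service": "ping", "plot": "rta"}]
print(json.dumps(add_hosts(series, "web01", ["web02", "web03"]), indent=2))
```

The substitution is a plain substring replace, so a template hostname that is a prefix of another hostname (web01 and web011, say) needs more care.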
I started assigning parents to my hosts so that I could utilize the status map. I quickly realized that there was a bug or problem with the production of the status map in 1.7.1, as only some of the nodes were appearing.
I googled the problem and came across a few bugs in the bug tracker that recommended upgrading to the latest trunk, which at that point was several commits back (before the latest version).
Sending unlimited push notifications to your iPhone or any Growl client from nagios/icinga for free with prowlapp.com or howlapp.com
UPDATE: Push also looks like a great service and is multi-platform.
Sir Isaac Newton, pioneer of LSD (clearly).
Last time I talked about notifications, I talked about using SMS via Twilio, but I quickly realized that I didn’t like the unreliability of SMS, and instead opted to use the push service (with confirmed receipt) NotifyMyAndroid, which was great! By the way, the Email-to-SMS gateway for Sprint (at least) tended to not deliver SMS in a timely manner or at all.
So now that I’ve switched to the iPhone (mostly because I get bored easily), I’d like to send push notifications in a similar manner to NotifyMyAndroid’s excellent client, where confirmation of client receipt is very important. After doing some preliminary research, I came across Growl notifications, the prowlapp.com service and the howlapp.com service.
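For a feel of what a notification command has to do, here is a rough sketch that builds (but does not send) a Prowl push request. The endpoint and the field names `apikey`, `application`, `event`, `description` and `priority` are how I remember the Prowl public API, so treat them as assumptions and check them against the prowlapp.com API documentation. The function name and the way the event text is composed are mine.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the Prowl API documentation
PROWL_ADD_URL = "https://api.prowlapp.com/publicapi/add"


def build_prowl_request(api_key, host, service, state, output, priority=0):
    """Build a POST request describing one Icinga/Nagios notification."""
    fields = {
        "apikey": api_key,
        "application": "Icinga",
        "event": f"{host}/{service} is {state}",
        "description": output,
        "priority": str(priority),  # assumed range: -2 (lowest) to 2 (emergency)
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(PROWL_ADD_URL, data=data, method="POST")


# Placeholder key and made-up check result, for illustration only
request = build_prowl_request("YOUR_API_KEY", "fw01", "PING", "CRITICAL", "Packet loss = 100%", priority=2)
print(request.get_method(), request.full_url)
```

Sending is then a matter of `urllib.request.urlopen(request, timeout=10)` from the notification command, with the host, service, state and output filled in from the usual Icinga macros.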
There is no obvious interface for querying the Message field of events recorded in the EventDB database, but performing such a query is quite easy.
In the EventDB cronk:
1) Click on Filter > edit
2) Click on the “Advanced” tab
3) Under “Filter by message,” click the Add button
4) You can Include/Exclude strings, or match a regular expression. Remember the following syntax is your friend:
.*error\ occurred\ during\ logon.*
5) Once modified you can easily save and share your cronks with your team, effectively creating interactive and live reports:
– right click on the cronk tab> rename if you wish then> “Save cronk as”
– Select an image
– You can create a new category here like “Go Team” or just use an existing category
– You can share the access to your cronk with other users and groups
– It will then be listed in the left side menu under the selected category.
You must give the user the icinga.cronk.custom right in order for the user’s cronk to be saved to the DB. Otherwise, you will be able to effectively save the cronk, but the backend DB entries won’t be there, so the page will fail to load.
This is quite useful to test regex for nxlog conf conditionals in Exec statement:
$Message =~ /.*select\ \*\ from\ HP_AlertIndication.*/\
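If you want to sanity-check a pattern before it goes anywhere near a live filter or an nxlog config, a few lines of Python will do. This sketch uses Python's `re` module rather than the PCRE engine nxlog uses, which I am assuming behaves the same for constructs this simple. The two patterns are the ones shown above; the sample messages and the pattern names are made up.

```python
import re

# Patterns copied from the examples above (without the surrounding slashes)
PATTERNS = {
    "logon_error": r".*error\ occurred\ during\ logon.*",
    "hp_alert": r".*select\ \*\ from\ HP_AlertIndication.*",
}


def matching_patterns(message):
    """Return the names of all patterns that match the given message."""
    return [name for name, pattern in PATTERNS.items() if re.search(pattern, message)]


# Invented sample messages, for illustration only
for message in (
    "WMI query failed: select * from HP_AlertIndication",
    "An error occurred during logon.",
    "Nothing interesting here",
):
    print(repr(message), "->", matching_patterns(message))
```

The leading and trailing `.*` are harmless but not needed for a substring search like this one.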
Maybe one of these days I’ll write a Jasper report.