Anomaly detection, creating baselines, and determining statistical outliers in argus data with argus-clients
Carter covered how to use argus data and the argus-clients to mine for evidence of an APT1 infection within your network in a series of emails to the argus-info mailing list some time ago.
The email thread became quite twisted, and it is difficult to read in the threaded gmane interface available for the argus-info mailing list.
In an effort to make this thread more useful, I have created a PDF of the emails with some attempt to visibly separate the commands and output (updated June 7th, 2013) so that it is easier to read. It was created for printing, not for viewing on a computer screen, although you can do that as well of course.
Hopefully, when I am done dissecting the thread, I will be able to compile some interesting uses for the argus-clients to build a model for anomaly detection.
Here are some threads on the mailing list that inquire about anomaly detection:
- A thread I started
- One from CS Lee.
- One from Craig Merchant.
- Another one from Craig Merchant.
- One from Jaime Nebrera
I believe rahisto will serve a great purpose here.
I’ve finished reading the thread linked above, and have come away with the following set of points.
This will cover less of the specific instance of how Carter analyzes argus data for APT1 detection (as that’s covered, well, by Carter); and instead just summarize some additional useful points that a user can take away from the document.
1) Store your data in an SQL database, do not use flat files:
- In the second email, Carter covers querying a two-year period of argus binary data for a set of 327,670 addresses.
- Searching three years of data (“a few terabytes of argus records”) with the raaddrfilter client, he states it takes him 8+ hours.
- By using rasqlinsert to store the `record` BLOB that contains the binary argus data, Carter is able to restrict the query to the 2-year time range (possible with raaddrfilter anyway?), and the query finishes within 18 seconds.
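Carter's setup uses MySQL via rasqlinsert; the schema below is NOT his — it is a minimal, hypothetical analogue using Python's sqlite3 just to illustrate why this wins: flow summaries plus the raw `record` BLOB go into an indexed table, so a time-range/address query hits an index instead of linearly scanning terabytes of flat files.

```python
# Hypothetical sketch (not Carter's actual schema): store per-flow
# summaries plus the raw binary argus record in an indexed SQL table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE flows (
        stime  REAL,   -- flow start time (epoch seconds)
        saddr  TEXT,   -- source address
        daddr  TEXT,   -- destination address
        record BLOB    -- raw binary argus record
    )""")
# The indexes are what turn an 8+ hour scan into a seconds-long query.
con.execute("CREATE INDEX idx_stime ON flows (stime)")
con.execute("CREATE INDEX idx_saddr ON flows (saddr)")

con.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?)",
    [(1.0, "10.0.0.5", "203.0.113.9", b"\x00"),
     (2.0, "10.0.0.6", "198.51.100.2", b"\x01")])

# Query only the time window and addresses of interest.
rows = con.execute(
    "SELECT saddr, daddr FROM flows"
    " WHERE stime BETWEEN ? AND ? AND saddr = ?",
    (0.0, 10.0, "10.0.0.5")).fetchall()
print(rows)  # [('10.0.0.5', '203.0.113.9')]
```

The same principle applies at Carter's scale: the database answers from the index and only touches the matching `record` BLOBs.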
2) Remember to use all indicators of compromise:
- Carter first searches for the IP address ranges, then investigates what they were doing.
- Carter then examines layer 4+ protocols to see what those hosts were up to.
4) NATed? No problem…
- To deal with packets that are NATed, simply rely on indicators that aren’t affected by NAT, such as: `appbytes`, `dappbytes`, `sappbytes`. “Base sequence numbers” and “payload patterns” are two additional metrics to investigate.
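As a toy illustration of that idea (mine, not from the thread): addresses change across a NAT, but application byte counts do not, so flows can be correlated on a `(sappbytes, dappbytes)` fingerprint. All data below is made up.

```python
# Toy sketch: match flows across a NAT boundary by their
# NAT-invariant app-byte fingerprint instead of by address.
flows_outside = [  # hypothetical flows seen outside the NAT
    {"saddr": "203.0.113.7", "sappbytes": 4312, "dappbytes": 917},
    {"saddr": "203.0.113.7", "sappbytes": 88,   "dappbytes": 40960},
]
flows_inside = [   # the same traffic seen inside, pre-NAT
    {"saddr": "10.0.0.21", "sappbytes": 4312, "dappbytes": 917},
]

fingerprints = {(f["sappbytes"], f["dappbytes"]) for f in flows_inside}
matched = [f for f in flows_outside
           if (f["sappbytes"], f["dappbytes"]) in fingerprints]
print(matched[0]["saddr"])  # 203.0.113.7
```

In practice you would add base sequence numbers or payload patterns to the fingerprint to reduce collisions.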
4) Identify upper layer protocols:
- One way to determine if upper protocols are traversing abnormal ports is to average `rate` metrics and average `load` metrics over a time span to check for protocol activity on strange ports.
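A rough sketch of that averaging approach (the flow records, the well-known port list, and the threshold are all invented for illustration):

```python
# Toy sketch: average a per-flow `rate` metric per (proto, dport)
# over a time span and flag sustained activity on strange ports.
from collections import defaultdict

flows = [
    {"proto": "tcp", "dport": 443,   "rate": 120.0},
    {"proto": "tcp", "dport": 443,   "rate": 150.0},
    {"proto": "tcp", "dport": 51337, "rate": 95.0},
    {"proto": "tcp", "dport": 51337, "rate": 110.0},
]

sums = defaultdict(lambda: [0.0, 0])  # key -> [rate total, count]
for f in flows:
    key = (f["proto"], f["dport"])
    sums[key][0] += f["rate"]
    sums[key][1] += 1

WELL_KNOWN = {22, 25, 53, 80, 443}
suspicious = [key for key, (total, n) in sums.items()
              if key[1] not in WELL_KNOWN and total / n > 50.0]
print(suspicious)  # [('tcp', 51337)]
```

The same idea works with the `load` metric; sustained protocol-like activity on an unexpected port is what you're hunting for.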
5) When new Snort/Bro signatures are released, check your old argus data for the indicators.
- Sounds like a full-time job, but it would be very valuable to be able to do. Carter discusses the Bell-LaPadula model: you are secure as you are, you start monitoring, and you are alerted when you transition from secure to no longer secure.
6) Consider appbyte ratio (`abr`) as an indicator of compromise:
- Carter mentions that generally user workstations = consumers, while servers = producers. The appbyte ratio (`abr` = sappbytes:dappbytes) will reflect this relationship.
- Carter then discusses that a Hop Point’s traffic should begin to push the `abr` closer to 1.0.
- Carter suggests: 1.5 = producer node, 0.95-1.5 = balanced node, but consider differential from “normal” for a node!
- Dave suggests an additional ratio to consider: (sappbytes/dappbytes)/(sbytes/dbytes). This helps resist obfuscation.
- Clearly, also flag when a subnet changes roles (and consider a local subnet versus the internet [which _should never_ be a consumer for most workstation subnets]).
/etc/argus.conf: ARGUS_GENERATE_APPBYTE_METRIC = yes
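The thresholds below are the ones Carter gives for the simple sappbytes:dappbytes ratio; how I map them onto boundaries (and the sample values) is my own reading, so treat this as a sketch:

```python
# Sketch of Carter's rough producer/consumer classification using
# the simple sappbytes:dappbytes ratio. Threshold values are from
# the thread; the boundary handling is my interpretation.
def classify(sappbytes, dappbytes):
    if dappbytes == 0:
        return "producer"      # all app bytes flowed outward
    ratio = sappbytes / dappbytes
    if ratio >= 1.5:
        return "producer"
    if ratio >= 0.95:
        return "balanced"
    return "consumer"

print(classify(30000, 1500))   # producer  (typical server)
print(classify(1000, 1050))    # balanced  (possible hop point?)
print(classify(800, 60000))    # consumer  (typical workstation)
```

The interesting signal, per Carter, is not the absolute label but a node drifting away from its own "normal" — e.g. a workstation subnet sliding toward balanced.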
7) Recreating a packet from Argus data:
- Dave suggests:
`argus -w log.pcap` writes a pcap file.
/etc/argus.conf: ARGUS_GENERATE_MAC_DATA = yes
/etc/argus.conf: ARGUS_GENERATE_APPBYTE_METRIC = yes
/etc/argus.conf: ARGUS_CAPTURE_DATA_LEN=N (maybe 1024… 2048?) [make sure you clear this with your risk folks, because captured payload is a liability!]
- use `radump -M printer=hex` to see a tcpdump-like display of data.
- see: http://thread.gmane.org/gmane.network.argus/9514/focus=9520
8) Use Snort, Bro, Suricata, etc. along with argus:
- Carter states that many people use argus for false-positive rejection.
- An example:
a) an SQL injection attack signature is matched
b) search the argus data for the host
c) use ra’s grep-like -e /regex/ option against the captured data for the matching signature
d) if it “looks fishy[?]” expend the time…
e) more stringently watch the host for anomalous traffic for a period of time
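A toy analogue of step (c) — greping captured user-data buffers for the signature pattern, the way ra's -e /regex/ option does. The payloads and the pattern are made up:

```python
# Toy sketch: regex-search captured payload buffers per host for a
# (hypothetical) SQL injection signature, keeping only the matches.
import re

sig = re.compile(rb"union\s+select", re.IGNORECASE)
payloads = {  # host -> captured user data (hypothetical)
    "10.0.0.5": b"GET /item?id=1 HTTP/1.1",
    "10.0.0.9": b"GET /item?id=1 UNION SELECT passwd FROM users",
}
fishy = [host for host, data in payloads.items() if sig.search(data)]
print(fishy)  # ['10.0.0.9']
```

Hosts that survive this filter are the ones worth the step (d)/(e) time investment.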
9) Need a place to check whether a file checksum indicates a malicious file?
- Craig recommends: isthisfilesafe.com, virustotal.com, team-cymru.org as file checksum checker sites.
10) The inception of app byte ratio (`abr`):
- John has been using this metric for a while, but is concerned about what to do when the denominator is 0.
- Carter considers that sbytes or dbytes might be 0, and offers the solution of signifying this condition with a result of -1. He also notes that it’s not possible for [s,d]bytes = 0 and [s,d]appbytes !=0 (but I suppose the inverse is possible).
- John’s created the following metrics (quoting):
3 – appbytes non-zero for both src and dst
2 – packets exchanged but zero appbytes in one or both dirs
1 – packets in only one dir
0 – malformed flow (e.g. 0 spkts which can happen with backscatter)
- John and Carter math-it-up for several emails, and nearly don’t go over my head (hooray me), but succeed in the end anyway (boo).
- Carter decides the final `abr` metric will be:

abr = (sappbytes - dappbytes) / (sappbytes + dappbytes)

If you take the `abr` of an aggregation, then:

abr = (sum(sappbytes) - sum(dappbytes)) / (sum(sappbytes) + sum(dappbytes))

Acceptable range: 1.0 >= abr >= -1.0, abr != -0.0
+1.0: "all the app bytes were from the source" = a producer
-1.0: "all the app bytes were from the destination" = a consumer
 0.0: balanced
-0.0: no appbytes seen
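That definition transcribes directly (a sketch only — the real computation lives inside the argus-clients once ARGUS_GENERATE_APPBYTE_METRIC is enabled), with -0.0 standing in for the no-appbytes case:

```python
# Direct transcription of Carter's final abr definition. Returns
# -0.0 when no app bytes were seen at all, so that case stays
# distinguishable from a genuinely balanced flow (+0.0).
import math

def abr(sappbytes, dappbytes):
    total = sappbytes + dappbytes
    if total == 0:
        return -0.0
    return (sappbytes - dappbytes) / total

print(abr(5000, 0))     # 1.0  -> pure producer
print(abr(0, 5000))     # -1.0 -> pure consumer
print(abr(1234, 1234))  # 0.0  -> balanced
# -0.0 == 0.0 in a plain comparison, so check the sign bit instead:
print(math.copysign(1.0, abr(0, 0)))  # -1.0 -> no appbytes seen
```

The aggregation form follows by summing sappbytes and dappbytes across the flows before calling the same function.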
11) Effective use of the appbyte ratio metric:
- Craig (basically) asks about how to use `abr` effectively.
- Carter responds: use `abr` against something where `abr` is relevant: like proto+port for a subnet or host.
- The usefulness of considering the `abr` of an entire node or subnet (without associating a proto+port) is more limited, but still of good use.
- Carter suggests calculating on a 30 second or 60 second basis and comparing it historically.
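A toy sketch of that suggestion: compute `abr` per 30-second bin for one host and compare the newest bin to its own history. The flow data, the single-mean "baseline," and the 0.5 deviation threshold are all invented; a real deployment would compare against much longer history.

```python
# Toy sketch: per-bin abr for one host, flagged when the latest bin
# deviates sharply from the historical mean. All values made up.
def abr(s, d):
    return (s - d) / (s + d) if (s + d) else -0.0

BIN = 30  # seconds
flows = [  # (start_time, sappbytes, dappbytes) for one host
    (0,  100, 9000), (12,  80, 7000),   # bin 0: consumer-ish
    (31,  90, 8000),                    # bin 1: consumer-ish
    (65, 9500, 100),                    # bin 2: suddenly a producer
]

bins = {}
for t, s, d in flows:
    b = t // BIN
    ps, pd = bins.get(b, (0, 0))
    bins[b] = (ps + s, pd + d)

series = [abr(s, d) for _, (s, d) in sorted(bins.items())]
baseline = sum(series[:-1]) / (len(series) - 1)
latest = series[-1]
alert = abs(latest - baseline) > 0.5
print(alert)  # True
```

The differential-from-normal framing from point 6 applies here too: the alert fires on the change, not on any absolute `abr` value.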