Part 1: Introduction
I finally had a chance to dissect the APT 1 thread on the argus mailing list that turned into an anomaly detection thread, and I have written to the list to ask what the community has to say about anomaly detection metrics and methods.
How anomalous is a flow? Scale from 1-1000, assign a percentage weight to each question/consideration.
This is not the right method:
You would have to begin by assuming that the flows as they currently exist are perfect. That gives no consideration to your previous investigations into existing badness.
As discussed on the mailing list, I will consider writing a script that tries to predict the role of a host.
See Bell-LaPadula model (thanks Carter!).
argus derivative data: Assign flows a weight.
Considering a daddr + saddr pair:
– has this daddr+saddr been seen before?
– what is the nature of the previous traffic:
– what protocol?
– what time of the day? How many times? Can you create a standard deviation?
– how many dbytes and sbytes? Is this an outlier?
– what is the appbyte ratio (consumer versus producer)? Is this an outlier?
– what are the flow durations (-s mean and -s stddev)? Is this an outlier?
– what are the packet sizes? Is this an outlier?
– what is the country code of daddr/saddr? Is this an outlier?
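Most of the questions above boil down to "is this value far from what this pair normally does?". Here is a minimal sketch of that check, assuming the past values for a pair are already available as plain lists. The function names, the dictionary layout, and the 3-standard-deviation threshold are my own placeholders, not anything argus provides.

```python
import statistics

def zscore(history, value):
    """How many standard deviations `value` sits from the mean of `history`."""
    if len(history) < 2:
        return None  # not enough data to estimate a spread
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Every past value was identical, so any change is maximally unusual.
        return 0.0 if value == mean else float("inf")
    return (value - mean) / stdev

def is_outlier(history, value, threshold=3.0):
    """True when `value` is more than `threshold` deviations from the mean."""
    z = zscore(history, value)
    return z is not None and abs(z) > threshold

def outlier_fields(pair_history, flow, threshold=3.0):
    """Names of the fields in `flow` that are outliers for this pair."""
    return [
        field for field, value in flow.items()
        if is_outlier(pair_history.get(field, []), value, threshold)
    ]
```

A pair with no history returns no outliers here, so "has this pair been seen before?" still needs its own check.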
This sort of analysis would occur at a given period of time (once a minute or shorter?).
It’s pretty clear that leveraging SQL would be useful with this. Probably keeping some tables in memory?
Probably writing something in python that checks and throws alerts would also be good.
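To get a feel for the "SQL table in memory plus a python checker" idea, here is a rough sketch that uses the standard library's sqlite3 with an in-memory database as a stand-in for MySQL. The table layout is hypothetical (column names borrowed from the argus fields), and the only check implemented is "pairs that were never seen before the current window".

```python
import sqlite3

def make_db():
    """Create an in-memory flow table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows (stime REAL, saddr TEXT, daddr TEXT, "
        "dport INTEGER, sbytes INTEGER, dbytes INTEGER)"
    )
    return conn

def new_pairs(conn, since):
    """saddr/daddr pairs seen at or after `since` but never before it."""
    query = """
        SELECT DISTINCT recent.saddr, recent.daddr
        FROM flows AS recent
        WHERE recent.stime >= ?
          AND NOT EXISTS (
              SELECT 1 FROM flows AS old
              WHERE old.saddr = recent.saddr
                AND old.daddr = recent.daddr
                AND old.stime < ?
          )
        ORDER BY recent.saddr, recent.daddr
    """
    return conn.execute(query, (since, since)).fetchall()

def alerts(conn, since):
    """One alert line per previously unseen pair."""
    return [f"ALERT new pair {s} -> {d}" for s, d in new_pairs(conn, since)]
```

Running `alerts()` once a minute with `since` set to the start of that minute would match the polling interval suggested above.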
Creating an exclusion engine for IPs and ports (and maybe times of the day and weekdays?):
– IPs of Microsoft Windows Update (maybe?)
– IPs of local hosts (maybe?)
– IPs of MSFT/Akamai hosted Skype supernode whitelist (maybe?)
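A first pass at the exclusion engine could look something like the sketch below. The class and method names are made up for illustration, and the networks in the usage example are documentation ranges, not real Windows Update or Skype addresses; those lists would have to be sourced separately.

```python
import ipaddress
from datetime import datetime

class ExclusionList:
    """Flows matching any configured network, port, hour, or weekday are skipped."""

    def __init__(self, networks=(), ports=(), hours=(), weekdays=()):
        self.networks = [ipaddress.ip_network(n) for n in networks]
        self.ports = set(ports)
        self.hours = set(hours)        # 0-23
        self.weekdays = set(weekdays)  # Monday is 0, as in datetime.weekday()

    def excludes(self, daddr, dport, when):
        """True if a flow to daddr:dport at time `when` should be ignored."""
        addr = ipaddress.ip_address(daddr)
        if any(addr in net for net in self.networks):
            return True
        if dport in self.ports:
            return True
        if when.hour in self.hours:
            return True
        return when.weekday() in self.weekdays
```

For example, `ExclusionList(networks=["192.0.2.0/24"], ports=[123])` would skip anything destined for that placeholder network or for port 123. Any single match excludes the flow, which is probably too generous for the time-based rules; combining them with the address rules is left open.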
Example of deriving weight and data:
Let’s focus on a single saddr, daddr, dport combo.
[to be done next week]
How do I work with raservice in this context? What about rahisto? They are both valuable for anomaly detection.
Things to do:
1) Build “profiles” per host: what are their roles? Put weight onto the amount of data stored (as in, an accuracy based on the amount of data back filled/available) since we are utilizing a generally empirical method with scipy.stats.mstats.mquantiles().
2) Calculate a standard deviation for source/destination conversations (considering, well, source and destination IP, timespan of connection, port, packets per second, and a few more things)
3) Goal of assigning a weight to a “traffic event” (what does this even mean, and holy false positives).
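One possible reading of "assign a weight to a traffic event", combined with the 1-1000 scale and the idea of trusting a profile more when more data backs it, is sketched below. Everything here is an assumption: the per-question scores are taken to be numbers between 0.0 and 1.0 produced elsewhere, and the function names, the weights, and the 1000-sample cutoff are placeholders.

```python
def combine(scores, weights):
    """Weighted average of per-question scores (0.0-1.0), scaled to 1-1000."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(scores[name] * weights[name] for name in scores) / total_weight
    return max(1, round(weighted * 1000))

def confidence(sample_count, full_confidence_at=1000):
    """Fraction (0.0-1.0) of trust in a profile, based on how much data backs it."""
    return min(1.0, sample_count / full_confidence_at)
```

With scores of `{"new_pair": 1.0, "dbytes": 0.5, "duration": 0.0}` and weights of 50, 30, and 20, `combine()` gives 650. Multiplying that by `confidence()` for the host's profile is one way to damp alerts for hosts with little history, which may help with the false positive problem.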
Other info that might be useful for figuring out roles:
1) app version detection (https://github.com/gamelinux/prads)
2) DNS lookups historically (https://github.com/gamelinux/passivedns)
3) snort (is it possible to check the user data buffers for patterns and/or flag flows automatically?)
Note that you will have to (re)compile the argus-clients after installing mysql-devel, as the SQL clients require it.
Install mysql stuff:
yum -y install mysql mysql-server mysql-devel
service mysqld start
/usr/bin/mysql_secure_installation
#enter for no root password
#Y to set root password
#enter a password
#Y to remove anonymous user
#Y to disallow root login remotely
#Y to remove test database and access to it
#Y to reload privilege tables now
#done

echo "create database argus;" > ~/argus_db.sql
echo "create user 'argus'@'localhost' identified by 'password';" >> ~/argus_db.sql
echo "grant all privileges on argus.* to 'argus'@'localhost';" >> ~/argus_db.sql
echo "flush privileges;" >> ~/argus_db.sql
mysql -p < ~/argus_db.sql
shred -u -z -n 30 ~/argus_db.sql
Quickly roll back:
drop database argus;
use mysql;
REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'argus'@'localhost';
DELETE FROM user WHERE user='argus';
SELECT user, password FROM user;
FLUSH PRIVILEGES;
SHOW GRANTS FOR 'argus'@'localhost';

If you pass the password on the command line, anyone who can list processes can see it, so set RA_DB_PASS in ~/.rarc instead and run `chmod 600 ~/.rarc`.
Insert records into the DB (without the record blob):
rasqlinsert -S 127.0.0.1:561 -m none -d -M time 1d -w mysql://argus@localhost/argus/argusTable_%Y_%m_%d -M norec -s seq stime ltime dur saddr daddr proto sport dport sbytes dbytes spkts dpkts sappbytes dappbytes abr sload dload srate drate sco dco
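Since the table name in the command above carries a %Y_%m_%d date pattern together with `-M time 1d`, the records appear to end up in one table per day. Any checking script then needs to work out which tables cover the period it wants to look at. A small sketch of that, with a hypothetical helper name:

```python
from datetime import date, timedelta

# Same pattern as in the rasqlinsert -w URL.
TABLE_PATTERN = "argusTable_%Y_%m_%d"

def tables_between(first, last):
    """Daily table names covering `first` through `last`, inclusive."""
    days = (last - first).days
    return [
        (first + timedelta(days=n)).strftime(TABLE_PATTERN)
        for n in range(days + 1)
    ]
```

The names this produces can then be checked against the tables that actually exist in the argus database before building a query, since days with no traffic may have no table.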