Part 3.a: Has this daddr+saddr been seen before?
Data flow structure:
1) when a new flow arrives…
2) check to see if the daddr and saddr pair was ever recorded. If it was not, then record it in a table for tracking.
3) rinse. repeat.
1) Create a new table to store the saddr+daddr pairs:
CREATE TABLE `argus`.`historic_argus_saddrdaddr` ( `inserttime` double(18,6) unsigned NOT NULL, `saddr` varchar(64) NOT NULL, `daddr` varchar(64) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
2) For HISTORIC data: populate this table with the pairs from historical data if they do not exist:
Insert data into the historic_argus_saddrdaddr table, from the data returned by (SELECT saddr, daddr FROM argus.argusTable_2013_06_26 group by saddr, daddr), that doesn't exist in the table argus.historic_argus_saddrdaddr where historic_argus_saddrdaddr.saddr = derived.saddr and historic_argus_saddrdaddr.daddr = derived.daddr
SET @a = UNIX_TIMESTAMP(sysdate(6)); INSERT INTO argus.historic_argus_saddrdaddr(inserttime, saddr, daddr) ( SELECT @a, saddr, daddr FROM (SELECT saddr, daddr FROM argus.argusTable_2013_06_24 group by saddr, daddr) as derived WHERE NOT EXISTS (SELECT * FROM argus.historic_argus_saddrdaddr where historic_argus_saddrdaddr.saddr = derived.saddr and historic_argus_saddrdaddr.daddr = derived.daddr) );
3) For LIVE data:
How I will deal with live data:
a) create a trigger that will check if a pair exists, and insert the pair if it is missing. b) have this trigger also UPDATE a last modified time in a status table (http://stackoverflow.com/a/5771988/843000) c) have a client program check this last modified time once every few seconds, and if it is greater than the previous last modified time (that the client program has recorded), consider it a new saddr+daddr pair**. (see below)
a) Create the status TABLE:
CREATE TABLE `argus`.`argus_status` ( `saddrdaddrlastmodtime` double(18,6) unsigned NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
b) Create the TRIGGER:
NOTE: the ability to CREATE a TRIGGER in new tables will have to be integrated into rasqlinsert see: http://thread.gmane.org/gmane.network.argus/9488/focus=9489
DELIMITER | CREATE TRIGGER `argus`.`after_argus_insert` AFTER INSERT ON `argus`.`argusTable_2013_06_28` FOR EACH ROW BEGIN SET @a = UNIX_TIMESTAMP(sysdate(6)); INSERT INTO argus.historic_argus_saddrdaddr(inserttime, saddr, daddr) ( SELECT @a, NEW.saddr, NEW.daddr FROM (SELECT NEW.saddr, NEW.daddr) as derived WHERE NOT EXISTS (SELECT saddr, daddr FROM argus.historic_argus_saddrdaddr where historic_argus_saddrdaddr.saddr = NEW.saddr and historic_argus_saddrdaddr.daddr = NEW.daddr) ); IF ROW_COUNT() > 0 THEN UPDATE argus_status SET saddrdaddrlastmodtime=@a; END IF; END; | DELIMITER ;
[To drop the trigger:
DROP TRIGGER `argus`.`after_argus_insert`
Monitor your resource utilization:
1) Track of memory utilization of rasqlinsert:
while true ; do echo $(date '+%H:%M:%S'): $(ps -o rss $(pgrep rasqlinsert) | grep -v RSS) ; sleep 2 ; done > ~/rss_rasqlinsert.log &
2) Track of how many records are present in table over time
while true ; do echo $(date '+%H:%M:%S'): $(mysql -pdbuser_password -udbuser --batch -e 'SELECT count(*) FROM argus.historic_argus_saddrdaddr;' | grep -v count) ; sleep 2 ; done > ~/track_uniquesaddrdaddrpairs_overtime.log &
Ideas going forward:
should seriously consider primary keys on the main flow table (the ones who are created with rasqlinsert) to correlate with the pairs in the historic pairs table, to identify which record actually inserted the new pairInstead, I will be relying on `seq` as advised by Carter.
- should I also update pair hits???
**Keep in mind that the goal is to give some weight to “the indicator” of “we’ve got a new saddr+daddr pair.” I believe that this will be one of the least valuable real time indicators that a flow is anomalous.