Part 3: Qualifying data as anomalous
Early on, I had no clear idea what criteria should mark Argus data as anomalous. Over time, my thinking settled on a handful of concrete questions whose answers can point to anomalous traffic. The list is far from exhaustive, and I welcome input.
I have organized the list as questions, each followed by details:
What is the nature of the previous traffic?
- Part 3.a: Has this daddr+saddr pair been seen before?
- To answer this, record each daddr/saddr pair in an SQL database and check new flows against it:
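    # db: an open sqlite3 connection; given_daddr/given_saddr: addresses from the current flow
    result = db.execute("SELECT 1 FROM historic_argus_daddrsaddr_pairs_table WHERE daddr = ? AND saddr = ?", (given_daddr, given_saddr)).fetchone()
    if result is None:
        db.execute("INSERT INTO historic_argus_daddrsaddr_pairs_table (daddr, saddr) VALUES (?, ?)", (given_daddr, given_saddr))
        db.commit()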
- Consider: should this pair table live separately from the main flow database, or should the check simply query the historic flow data?
- Which protocol does the traffic use?
- At what times of day does this pair communicate, and how often? Can a standard deviation be computed for these?
- How many dbytes and sbytes were transferred? Is this an outlier?
- What is the appbyte ratio (consumer versus producer)? Is this an outlier?
- What are the flow durations (-s mean and -s stddev)? Is this an outlier?
- What are the packet sizes? Is this an outlier?
- What is the country code of the daddr/saddr? Is this an outlier?
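The "has this pair been seen before?" check can also be sketched as a self-contained Python module. This is a minimal sketch under stated assumptions, not a production design: it uses Python's built-in sqlite3 with an in-memory database by default, and it adds a UNIQUE constraint (an assumption, not part of the original table) so that a single INSERT OR IGNORE both records a new pair and reports whether it was new. The helper names open_pair_db and record_pair are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (daddr, saddr) pair ever observed.
# The UNIQUE constraint is an assumption that lets SQLite reject duplicates.
SCHEMA = """
CREATE TABLE IF NOT EXISTS historic_argus_daddrsaddr_pairs_table (
    daddr TEXT NOT NULL,
    saddr TEXT NOT NULL,
    UNIQUE (daddr, saddr)
)
"""


def open_pair_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the pair database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_pair(conn: sqlite3.Connection, daddr: str, saddr: str) -> bool:
    """Record the pair and return True if it had never been seen before."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO historic_argus_daddrsaddr_pairs_table (daddr, saddr) "
        "VALUES (?, ?)",
        (daddr, saddr),
    )
    conn.commit()
    # rowcount is 1 when a new row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1


if __name__ == "__main__":
    db = open_pair_db()
    print(record_pair(db, "10.0.0.5", "192.168.1.20"))  # True: first sighting
    print(record_pair(db, "10.0.0.5", "192.168.1.20"))  # False: already recorded
```

Pairs are directional in this sketch: ("10.0.0.5", "192.168.1.20") and the swapped pair are stored as separate rows. Whether that is desirable depends on whether traffic in the reverse direction should count as "seen before".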
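For the numeric questions (dbytes and sbytes, flow durations, packet sizes, flow counts per period), one simple way to answer "is this an outlier?" is a standard score (z-score): how many standard deviations a value lies from the historic mean. The sketch below is one possible approach under assumptions; the function names and the default threshold of 3.0 are hypothetical choices. zscore works from raw historic values using the sample standard deviation, while zscore_from_stats accepts a mean and standard deviation computed elsewhere, for example from the aggregated mean and stddev duration fields mentioned in the list.

```python
import math
import statistics
from typing import Sequence


def zscore_from_stats(value: float, mean: float, stdev: float) -> float:
    """Standard score of `value` given a precomputed mean and standard deviation."""
    if stdev == 0:
        # Every historic value was identical: any different value is infinitely far away.
        return 0.0 if value == mean else math.copysign(math.inf, value - mean)
    return (value - mean) / stdev


def zscore(history: Sequence[float], value: float) -> float:
    """Standard score of `value` relative to raw historic observations."""
    if len(history) < 2:
        raise ValueError("need at least two historic observations")
    # statistics.stdev is the sample (n - 1) standard deviation.
    return zscore_from_stats(value, statistics.mean(history), statistics.stdev(history))


def is_outlier(z: float, threshold: float = 3.0) -> bool:
    """Flag a standard score whose magnitude exceeds `threshold` (an assumed default)."""
    return abs(z) > threshold


if __name__ == "__main__":
    sbytes_history = [1200, 1350, 1100, 1280, 1220]  # hypothetical past flows for one pair
    print(is_outlier(zscore(sbytes_history, 9000)))  # True: far above the usual volume
    print(is_outlier(zscore(sbytes_history, 1300)))  # False: within normal variation
```

Byte counts and durations tend to be heavily skewed, so it may be worth applying the test to log-transformed values, or swapping in a median-based measure, before trusting a fixed threshold.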
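The appbyte ratio question needs a concrete formula. Argus reports sappbytes and dappbytes, and its producer/consumer ratio (pcr) is commonly described as (sappbytes - dappbytes) / (sappbytes + dappbytes). The sketch below assumes that definition, so check it against the documentation for your Argus version before relying on it. The result runs from -1.0 (the source only received application data, a pure consumer) to +1.0 (the source only sent, a pure producer). The function name is hypothetical.

```python
from typing import Optional


def producer_consumer_ratio(sappbytes: int, dappbytes: int) -> Optional[float]:
    """Assumed PCR definition: (sappbytes - dappbytes) / (sappbytes + dappbytes).

    +1.0: the source only sent application bytes (pure producer)
    -1.0: the source only received application bytes (pure consumer)
     0.0: balanced exchange
    Returns None when neither side carried application data, because the
    ratio is undefined and such flows are better skipped than scored.
    """
    total = sappbytes + dappbytes
    if total == 0:
        return None
    return (sappbytes - dappbytes) / total


if __name__ == "__main__":
    print(producer_consumer_ratio(900, 100))  # 0.8: mostly producing
    print(producer_consumer_ratio(100, 900))  # -0.8: mostly consuming
    print(producer_consumer_ratio(0, 0))      # None: no payload either way
```

Because the ratio is bounded, a pair's historic ratios can be compared with the current one using the same mean-and-standard-deviation idea as the byte counts, without one huge transfer dominating the scale.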
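Country codes are categories rather than numbers, so a mean and standard deviation do not apply to them. A frequency check is one alternative: flag a value that made up less than some small share of this pair's past observations. The same check can answer the time-of-day question if the hour is treated as a category, which sidesteps a pitfall of averaging clock hours (23:00 and 01:00 average to 12:00, not midnight). The 1% threshold and the function names below are assumptions for illustration.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def category_share(history: Iterable[Hashable], value: Hashable) -> float:
    """Fraction of historic observations equal to `value` (0.0 if history is empty)."""
    counts = Counter(history)
    total = sum(counts.values())
    # Counter returns 0 for values that were never seen.
    return counts[value] / total if total else 0.0


def is_rare_category(history: Iterable[Hashable], value: Hashable, min_share: float = 0.01) -> bool:
    """Flag `value` if it made up less than `min_share` of past observations."""
    return category_share(history, value) < min_share


if __name__ == "__main__":
    dst_countries = ["US"] * 98 + ["CA"] * 2      # hypothetical history for one pair
    print(is_rare_category(dst_countries, "CA"))  # False: 2% of history
    print(is_rare_category(dst_countries, "RU"))  # True: never seen
    flow_hours = [9, 10, 9, 11, 10, 9]            # hour of day of past flows
    print(is_rare_category(flow_hours, 3))        # True: no flows at 03:00
```

A pair with no history at all is flagged by this check, which agrees with treating a never-seen daddr+saddr pair as worth a closer look.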