Part 3: Qualifying data as anomalous

When I started, I had no clear idea of what should qualify Argus data as anomalous. Over time, my view settled on a few concrete questions whose answers could indicate anomalous data. The list below could be much longer, and I am happy to take input.

I will organize this list in question-and-detail form:

What is the nature of the previous traffic:

  • Part 3.a: Has this daddr+saddr pair been seen before?
    • Need to record each daddr and saddr pair in an SQL database.
    • Pseudo-code:
      result = db.query('select daddr, saddr from historic_argus_daddrsaddr_pairs_table where daddr = ? and saddr = ?', (given_daddr, given_saddr)).execute
      if len(result) == 0:
          db.query('insert into historic_argus_daddrsaddr_pairs_table (daddr, saddr) values (?, ?)', (given_daddr, given_saddr)).execute
    • Consider: should this pair table be kept separate from the database holding the whole data set, or should the check simply query the historic flow data?
  • What protocol is used?
  • What time of day does the traffic occur, and how many times? Can a standard deviation be computed from that history?
  • How many dbytes and sbytes? Is this an outlier?
  • What is the appbyte ratio (consumer versus producer)? Is this an outlier?
  • What are the flow durations (-s mean and -s stddev)? Is this an outlier?
  • What are the packet sizes? Is this an outlier?
  • What is the country code of daddr/saddr? Is this an outlier?
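
Two of the questions above can be sketched in a few lines of Python: the "seen before" check from Part 3.a and a simple standard-deviation outlier test. This is only a rough sketch under my own assumptions, not a finished design. It uses sqlite3 as a stand-in for whatever SQL database ends up holding the pairs. The table name comes from the pseudo-code, but the schema, the function names seen_before and is_outlier, the sample addresses, the sample byte counts, and the threshold of 3 standard deviations are all made up for illustration.

```python
import sqlite3
import statistics

# Table name taken from the pseudo-code above; the schema is an assumption.
PAIRS_TABLE = "historic_argus_daddrsaddr_pairs_table"


def seen_before(conn, daddr, saddr):
    """Return True if the pair is already recorded; otherwise record it and return False."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {PAIRS_TABLE} ("
        "daddr TEXT NOT NULL, saddr TEXT NOT NULL, PRIMARY KEY (daddr, saddr))"
    )
    row = conn.execute(
        f"SELECT 1 FROM {PAIRS_TABLE} WHERE daddr = ? AND saddr = ?",
        (daddr, saddr),
    ).fetchone()
    if row is not None:
        return True
    conn.execute(
        f"INSERT INTO {PAIRS_TABLE} (daddr, saddr) VALUES (?, ?)",
        (daddr, saddr),
    )
    conn.commit()
    return False


def is_outlier(history, value, threshold=3.0):
    """Flag value if it is more than `threshold` sample standard deviations from the mean of history."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mean = statistics.mean(history)
    stddev = statistics.stdev(history)
    if stddev == 0:
        return value != mean
    return abs(value - mean) / stddev > threshold


# Hypothetical usage with documentation-range addresses and invented byte counts.
conn = sqlite3.connect(":memory:")
first = seen_before(conn, "192.0.2.10", "198.51.100.7")
second = seen_before(conn, "192.0.2.10", "198.51.100.7")

dbytes_history = [1200, 1350, 1100, 1280, 1320]
typical = is_outlier(dbytes_history, 1250)
unusual = is_outlier(dbytes_history, 250000)
print(first, second, typical, unusual)
```

The same is_outlier test could be pointed at sbytes, flow durations, or packet sizes, as long as a history of earlier values is available for the pair. Whether a mean and standard deviation are the right yardstick for traffic this skewed is still an open question for me.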