Archive

Posts Tagged ‘data’

My bleeding heart: Dear argus, I miss you.

April 9, 2014

Since I started a new job, I’ve got a lot of stuff to master before I revisit implementing flow data.

With all the Heartbleed craze, I noticed that some Snort signatures were released the other day, which means there are likely IOCs that can be found in historical flow data.

Carter looks like he’s going to start a write up shortly, so keep an eye on the mailing list.

Web Scraping: visual web scraping

April 2, 2014

Taxes, inference, downloading.

February 16, 2014

Okay. So I lasted about three days without posting, but I came across a few things that are well worth sharing yet aren’t quite worthy of a page each.

Taxes:
Someone posted a link to DARPA-sponsored projects on Hacker News yesterday. This should squelch my useless obsession with predictive analysis of non-existent data, at least for the week. Specifically: lineup, lyra, immens, and…

Inference:
BayesDB is a project that lets you import data from CSV and provides a query language for that data. Oh… it also uses Bayesian analysis to make predictions (within a tolerance) of future data. We can already do some of this with scipy.stats.bayes_mvs(), which returns Bayesian confidence intervals for the mean, variance, and standard deviation… but you can’t do it this simply.

Read more about it here. I have a feeling the next societal learning experience will be statistical literacy (previously computer literacy), and BayesDB is an effort that will assist. Don’t forget NimbleText and OpenRefine. Here is a paper on using Bayesian inference on network traffic.

Downloading:
I was disappointed to see offliberty fail to download a mix from soundcloud and mixcloud. I even wrote offliberty a bug report, and… received nothing back. After scratching my chin, I remembered my man SZ had written a last.fm downloader, and I figured I’d check in on the project. I was excited to see that his downloader handles not only last.fm but also soundcloud, mixcloud, and about 20 other sites. Unfortunately, mixcloud downloading failed, so I dropped him an email. He responded within a few hours, explained that mixcloud had changed their webapp, and said he would give reprogramming his scraper a shot. It took him two days, and he solved the problem today. So go grab Free Music Downloader from SZ. Don’t forget to donate.

Grepping for N occurrences of a string or character within a line:

July 8, 2013

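The following will grep for lines containing exactly one backslash. Use single quotes: inside double quotes the shell collapses each \\ to a single \ before grep sees the pattern, which changes its meaning.

grep '^[^\\]*\\[^\\]*$' du_report.log

The following will grep for lines containing exactly three backslashes:

grep '^[^\\]*\\[^\\]*\\[^\\]*\\[^\\]*$' du_report.log

The following will grep for lines containing one, two, or three backslashes (the \| alternation is a GNU grep extension in basic regex mode):

grep '^[^\\]*\\[^\\]*\\[^\\]*\\[^\\]*$\|^[^\\]*\\[^\\]*\\[^\\]*$\|^[^\\]*\\[^\\]*$' du_report.log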

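A comment by Lee also suggests using extended grep:

grep -E '\\{1,3}' du_report.log

Note that this pattern is unanchored, so it matches any line containing at least one backslash. An anchored extended pattern for exactly one to three backslashes is '^[^\\]*(\\[^\\]*){1,3}$'.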

^ = anchors the match at the beginning of the line
[^\\] = any single character other than a backslash
* = zero or more of the preceding item
\\ = a literal backslash
$ = anchors the match at the end of the line
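
For comparison, the same line filter can be sketched in Python with the re module. This is only an illustration and not part of the original grep recipe: the helper name backslash_filter and the sample lines are made up, and the pattern uses counted repetition instead of spelling out each backslash.

```python
import re


def backslash_filter(low, high):
    """Compile a pattern matching lines that hold between low and high backslashes."""
    # ^[^\\]*            leading run of non-backslash characters
    # (?:\\[^\\]*){a,b}  a to b backslashes, each followed by non-backslash characters
    # $                  nothing else may follow, so extra backslashes cause a mismatch
    return re.compile(r"^[^\\]*(?:\\[^\\]*){%d,%d}$" % (low, high))


# Made-up sample lines standing in for the contents of du_report.log
lines = [
    r"C:",
    r"C:\Users",
    r"C:\Users\me\docs",
    r"C:\one\two\three\four",
]

exactly_one = [line for line in lines if backslash_filter(1, 1).search(line)]
one_to_three = [line for line in lines if backslash_filter(1, 3).search(line)]

for line in one_to_three:
    print(line)
```

Because the pattern is anchored at both ends, the last sample line (four backslashes) is rejected, which is the behavior the long grep alternation is after.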

Visually render time-series data with TimeSearcher

June 12, 2013

Intro:
I was turned on to the Java project TimeSearcher after coming across DAVIX, a live Linux distro for security data visualization.

Download:
You can download TimeSearcher directly from the University of Maryland.

Formatting CSV input:
TimeSearcher expects a CSV file in a specific format, noted as CIV.

Lazy, I mean, busy… yea… busy:
I haven’t looked into it much more than this, but wanted to note it for future reference, as I started writing my own in Visual C#.

Me thinks: Live word cloud generator of strings in network packets

December 16, 2012

I think it would be great to scrape and then visualize the strings floating by in network packets, with faceted restrictions such as port, protocol, or address.
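
A rough sketch of the counting half of that idea, in Python. Everything here is an assumption for illustration: the packet records are hand-written tuples rather than a real capture, the function name string_counts is made up, and a "string" is taken to mean a run of at least four characters starting with a letter. The resulting counts are the kind of word/weight pairs a word cloud generator takes as input.

```python
import re
from collections import Counter

# Hypothetical packet records: (protocol, destination port, raw payload bytes).
packets = [
    ("tcp", 80, b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"),
    ("tcp", 80, b"GET /login HTTP/1.1\r\nHost: example.com\r\n\r\n"),
    ("udp", 53, b"\x00\x01 lookup example.org \x00"),
]

# A letter followed by at least three more letters, digits, dots, underscores or dashes.
TOKEN = re.compile(rb"[A-Za-z][A-Za-z0-9._-]{3,}")


def string_counts(records, protocol=None, port=None):
    """Count printable strings in payloads, optionally restricted by facet."""
    counts = Counter()
    for proto, dport, payload in records:
        if protocol is not None and proto != protocol:
            continue
        if port is not None and dport != port:
            continue
        for token in TOKEN.findall(payload):
            counts[token.decode("ascii").lower()] += 1
    return counts


web = string_counts(packets, protocol="tcp", port=80)
for word, weight in web.most_common():
    print(word, weight)
```

Restricting by address would work the same way once the records carry source and destination fields.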

http://www-958.ibm.com/software/data/cognos/manyeyes/page/create_visualization.html

http://www.softpedia.com/get/Office-tools/Other-Office-Tools/IBM-Word-Cloud-Generator.shtml

Oddly found via this research (more).

This is an opinion piece on the value of machine learning in the field of info sec.

HighScalability.com: a wiki of architecture

October 29, 2012