Part 2: Locating outliers using an empirical method in Python with SciPy’s mquantiles()

Set up the environment:

yum -y install make gcc gcc-c++ gcc-gfortran cmake python-devel python-pip pcre pcre-devel freetype-devel libpng-devel atlas-devel libgfortran
tar zxvf Python-*
cd Python-*
./configure && make && make install
sh setuptools-0.6c11-py2.7.egg
curl -O
pip-2.7 install MySQL-python numpy matplotlib scipy
#matplotlib's install is dependent on the numpy egg, freetype-devel and libpng-devel as installed previously with yum
#scipy's install is dependent on the numpy egg, atlas-devel, libgfortran and gcc-gfortran

Creating a graphical histogram to review the distribution of data:
Plot a histogram of the data with Python to understand its distribution. Is it uniform, normal, or exponential?

In this instance, it is exponential.
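As a rough illustration of that review step, here is a minimal sketch. The data is synthetic (drawn from an exponential distribution with a fixed seed) and only stands in for real measurements; the helper names `sample_exponential`, `bin_counts` and `plot_histogram`, and the output file name, are made up for this example. The plotting helper assumes matplotlib is installed as in the setup above.

```python
import random


def sample_exponential(n, rate=1.0, seed=0):
    """Return n synthetic exponential samples (a stand-in for real data)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]


def bin_counts(data, bins=10):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for value in data:
        # The maximum value lands exactly on the upper edge, so clamp it.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def plot_histogram(data, bins=50, path="histogram.png"):
    """Save a histogram image; matplotlib is imported only when this is called."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(data, bins=bins)
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    fig.savefig(path)
    plt.close(fig)


data = sample_exponential(1000)
print(bin_counts(data, bins=10))
```

Calling `plot_histogram(data)` writes the image. For exponential data the first bins should be tall and the counts should fall away quickly to the right, while a uniform sample would give roughly level bins and a normal sample a central hump.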

Locating which (per mille) quantile a given X falls into:
If you need to determine the distribution of data:

But it is known that this method works just fine.
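One possible way to locate the per-mille band of a given X is sketched below using only the standard library. This is an assumption about how the lookup could be done, not the script from the original post: it ranks X against the sorted sample with `bisect`, whereas `scipy.stats.mstats.mquantiles` could instead be used to compute the cut points. The names `per_mille_rank` and `is_outlier`, and the 990 cutoff, are hypothetical.

```python
from bisect import bisect_right


def per_mille_rank(sorted_data, x):
    """Return the share of observations <= x, in thousandths (0..1000)."""
    n = len(sorted_data)
    if n == 0:
        raise ValueError("need at least one observation")
    return 1000 * bisect_right(sorted_data, x) // n


def is_outlier(sorted_data, x, cutoff=990):
    """Flag x when it sits above the chosen per-mille cutoff (hypothetical rule)."""
    return per_mille_rank(sorted_data, x) > cutoff


observations = sorted(range(1, 1001))  # placeholder data: 1..1000
print(per_mille_rank(observations, 990))
print(is_outlier(observations, 995))
```

The list must already be sorted in ascending order, since `bisect_right` relies on that ordering.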

Using an empirical model to find the quantile at x%:
1) get the data as a tuple/list of N values
2) sort the list in ascending order
3) calculate: index=trunc(x%*(N-1))
4) get sorted_list[index]
5) this is the value of the x% quantile
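The steps above can be sketched as a small function. This is a minimal illustration rather than the script from the gist; the function name `empirical_quantile` is made up here, and x% is passed as a fraction between 0 and 1.

```python
import math


def empirical_quantile(values, fraction):
    """Return the value at the given quantile using index = trunc(fraction * (N - 1))."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)  # step 2: sort ascending
    if not ordered:
        raise ValueError("need at least one value")
    index = math.trunc(fraction * (len(ordered) - 1))  # step 3
    return ordered[index]  # steps 4 and 5


sample = [5, 1, 4, 2, 3]
print(empirical_quantile(sample, 0.5))
```

Because the index is truncated rather than interpolated, the result is always one of the observed values, which keeps the method simple at the cost of some precision on small samples.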

See this gist for a Python script.
