Data-Driven Security: Elements of Security Data Science

Jay Jacobs & Bob Rudis
May 13, 2014

Data-Driven Security

Find us at:

And, on Twitter:

@jayjacobs
@hrbrmstr
@ddsecblog

Elements

(h/t @drewconway => https://l.datadrivensecurity.info/1nbwaDB)

Skills: The Hacker

The Coder (Scripting/programming)
The Data Munger (Slice & dice data)
The Thinker (Critical & algorithmic thinking)
The Visualizer (Communication skills)

Skills: The Security Domain Expert

Process

Tools

R (www.r-project.org)
RStudio (www.rstudio.com)
Python (www.python.org)
IPython (ipython.org)
Tableau (www.tableausoftware.com)
Excel (yes, Excel)

More at https://l.datadrivensecurity.info/1kywuJW

CC BY 2.0 | Windell Oskay | www.flickr.com/photos/oskay/

Hello, world!

date,l_ipn,r_asn,f
2006-07-01,0,701,1
2006-07-01,0,714,1
2006-07-01,0,1239,1
2006-07-01,0,1680,1
2006-07-01,0,2514,1
2006-07-01,0,3320,1
2006-07-01,0,3561,13
2006-07-01,0,4134,3
2006-07-01,0,5617,2
2006-07-01,0,6478,1
2006-07-01,0,6713,1
2006-07-01,0,7132,1
2006-07-01,0,9105,1
2006-07-01,0,10738,1
2006-07-01,0,10994,1
2006-07-01,0,12334,1
2006-07-01,0,12524,1
2006-07-01,0,12542,1
2006-07-01,0,13343,1

Plotting workstation
network flow data from:

statweb.stanford.edu/~sabatti/data.html

Hello, world!

Three lines of code:

# load plyr (for count) and ggplot2 (for plotting)
library(plyr)
library(ggplot2)

# read in workstation data
ips.dat <- read.csv("data/cs448b_ipasn.csv",
                colClasses=c("Date", "character",
                             "character", "numeric"))

# count all the flows per local IP, weighting each row by its flow count f
pivot <- count(ips.dat, .(l_ipn), wt_var=.(f))

# plot the data with bars
qp <- ggplot(pivot, aes(x=l_ipn, y=freq)) +    # data to use
      geom_bar(stat="identity") +              # bar length = freq
      ggtitle("Title # flows / IP") +          # title of chart
      coord_flip() +                           # horizontal bars
      theme_bw() +                             # remove chart junk
      theme(text=element_text(size=20),        # BIG text (for pres)
            axis.text=element_text(size=30))   # moar BIG text
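
For readers without plyr installed, the counting step can be sketched in base R. This is a minimal illustration, not the authors' method: it swaps in aggregate() for plyr's count(), uses only a few rows copied from the data excerpt shown earlier, and the names sample_csv, flows, and freq_by_ip are made up for this example.

```r
# A few rows copied from the data excerpt shown earlier (not the full file)
sample_csv <- "date,l_ipn,r_asn,f
2006-07-01,0,701,1
2006-07-01,0,3561,13
2006-07-01,0,4134,3
2006-07-01,0,5617,2
2006-07-01,0,6478,1"

# Same column classes as the read.csv() call in the slides
flows <- read.csv(text = sample_csv,
                  colClasses = c("Date", "character",
                                 "character", "numeric"))

# Sum the flow counts (f) for each local IP (l_ipn)
freq_by_ip <- aggregate(f ~ l_ipn, data = flows, FUN = sum)

# Rename the summed column to "freq", the column name the plot code expects
names(freq_by_ip)[names(freq_by_ip) == "f"] <- "freq"

print(freq_by_ip)
```

All five sample rows belong to local IP 0, so the result is a single row whose freq is the sum of the f column.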

Hello, world!

The result:

[Figure: horizontal bar chart of flow counts per local IP (chunk hello_ips)]

Visualization

	x1	y1	x2	y2	x3	y3	x4	y4
1	10.00	8.04	10.00	9.14	10.00	7.46	8.00	6.58
2	8.00	6.95	8.00	8.14	8.00	6.77	8.00	5.76
3	13.00	7.58	13.00	8.74	13.00	12.74	8.00	7.71
4	9.00	8.81	9.00	8.77	9.00	7.11	8.00	8.84
5	11.00	8.33	11.00	9.26	11.00	7.81	8.00	8.47
6	14.00	9.96	14.00	8.10	14.00	8.84	8.00	7.04
7	6.00	7.24	6.00	6.13	6.00	6.08	8.00	5.25
8	4.00	4.26	4.00	3.10	4.00	5.39	19.00	12.50
9	12.00	10.84	12.00	9.13	12.00	8.15	8.00	5.56
10	7.00	4.82	7.00	7.26	7.00	6.42	8.00	7.91
11	5.00	5.68	5.00	4.74	5.00	5.73	8.00	6.89

Visualization

sapply(anscombe, mean)  # SAME mean

   x1    x2    x3    x4    y1    y2    y3    y4 
9.000 9.000 9.000 9.000 7.501 7.501 7.500 7.501

sapply(anscombe, sd)  # SAME standard deviation

   x1    x2    x3    x4    y1    y2    y3    y4 
3.317 3.317 3.317 3.317 2.032 2.032 2.030 2.031

sapply(anscombe, var)  # SAME variance

    x1     x2     x3     x4     y1     y2     y3     y4 
11.000 11.000 11.000 11.000  4.127  4.128  4.123  4.123

for (i in 1:4) cat(cor(anscombe[, i], anscombe[, i + 4]), "\n")
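
The comparison can be pushed one step further by fitting a straight line to each x/y pair. The sketch below assumes only the anscombe data frame that ships with base R; summary_stats is an illustrative name, and the choice of lm() is ours rather than something shown in the slides.

```r
# anscombe is part of the datasets package that loads with base R
summary_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)                       # ordinary least-squares line
  c(cor       = cor(x, y),               # Pearson correlation
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]))
})
colnames(summary_stats) <- paste0("set", 1:4)

# One column per data set, one row per statistic
print(round(summary_stats, 3))
```

The four columns come out nearly identical, which is the point of the data set: the summary numbers alone cannot tell the four sets apart.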

Visualization

[Figure: plots of the four Anscombe data sets (chunk anscombevis)]
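
The code behind this figure is not included in the slides. One way to draw something similar with base R graphics is sketched below; plot_anscombe and panels are hypothetical names, and the axis limits, point style, and red fit line are our own choices.

```r
# Draw the four Anscombe data sets in a 2x2 grid with a fitted line each.
# Returns (invisibly) the number of panels drawn.
plot_anscombe <- function() {
  old_par <- par(mfrow = c(2, 2))        # 2x2 panel layout
  on.exit(par(old_par))                  # restore settings afterwards
  for (i in 1:4) {
    x <- anscombe[[paste0("x", i)]]
    y <- anscombe[[paste0("y", i)]]
    plot(x, y,
         xlim = c(3, 20), ylim = c(2, 14),   # shared axes for comparison
         pch = 19, main = paste("Data set", i))
    abline(lm(y ~ x), col = "red")           # least-squares line
  }
  invisible(4)
}

panels <- plot_anscombe()
```

Using the same axis limits in every panel makes it easy to see that the fitted lines are nearly the same while the point clouds look nothing alike.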

Visualization

plot of chunk hello_ips2

plot of chunk moravis

“The human visual system is a pattern seeker of enormous power and subtlety. The eye and the visual cortex of the brain form a massively parallel processor that provides the highest bandwidth channel into human cognitive centers.”

– Colin Ware

Visualization - Science

Visualization: Preattentive Processing

Count the X’s in the picture:

Visualization: Saccadic Eye Movement

Preattentively driven
Ballistic

Visual suppression
Cannot change

Visualization: Saccadic Eye Movement

https://bodyinmovement.se/?pid=57&sub=41&sub2=37

Visualization - Science

Visualization - Working Memory

https://www4.symantec.com/mktginfo/whitepaper/053013_GL_NA_WP_Ponemon-2013-Cost-of-a-Data-Breach-Report_daiNA_cta72382.pdf

Visualization - With Data

Visualization - Science

Some botnets are so big...

…you can see them from space
(or at least, Google Earth)

Some botnets are so big...

…We can learn from the data

Some botnets are so big...

…We can get more data from Symantec.