Data-Driven Security: Elements of Security Data Science

Jay Jacobs & Bob Rudis
May 13, 2014

Data-Driven Security

Find us at:

And, on Twitter:

  • @jayjacobs
  • @hrbrmstr
  • @ddsecblog

Elements

Skills: The Hacker

  • The Coder (Scripting/programming)

  • The Data Munger (Slice & dice data)

  • The Thinker (Critical & algorithmic thinking)

  • The Visualizer (Communication skills)

Skills: The Security Domain Expert

domains

Process

flow

Tools

tools

More at https://l.datadrivensecurity.info/1kywuJW

CC BY 2.0 | Windell Oskay | www.flickr.com/photos/oskay/

Hello, world!

date,l_ipn,r_asn,f
2006-07-01,0,701,1
2006-07-01,0,714,1
2006-07-01,0,1239,1
2006-07-01,0,1680,1
2006-07-01,0,2514,1
2006-07-01,0,3320,1
2006-07-01,0,3561,13
2006-07-01,0,4134,3
2006-07-01,0,5617,2
2006-07-01,0,6478,1
2006-07-01,0,6713,1
2006-07-01,0,7132,1
2006-07-01,0,9105,1
2006-07-01,0,10738,1
2006-07-01,0,10994,1
2006-07-01,0,12334,1
2006-07-01,0,12524,1
2006-07-01,0,12542,1
2006-07-01,0,13343,1

Plotting workstation
network flow data from:

statweb.stanford.edu/~sabatti/data.html

Hello, world!

Three lines of code:

# load the packages used below
library(plyr)      # provides count()
library(ggplot2)   # provides qplot()

# read in workstation data
ips.dat <- read.csv("data/cs448b_ipasn.csv", 
                colClasses=c("Date", "character", 
                             "character", "numeric"))

# count all the flows (sums f for each local IP)
pivot <- count(ips.dat, .(l_ipn), wt_var=.(f))

# plot the data with bars
# (assumption: newer ggplot2 releases may need geom="col" here, since y is supplied)
qp <- qplot(x=l_ipn, y=freq, data=pivot,       # data to use
            geom="bar",                        # make a bar chart
            main="Title # flows / IP") +       # title of chart
      coord_flip() +                           # horizontal bars
      theme_bw() +                             # remove chart junk
      theme(text=element_text(size=20),        # BIG text (for pres)
            axis.text=element_text(size=30))   # moar BIG text
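
The counting step is the least obvious of the three, so here is a minimal base R sketch of what it computes: the sum of the flow counts (f) for each local IP. The data frame is illustrative only. Its columns mirror the sample file, but the rows with l_ipn = 1 are hypothetical, added so there is more than one group to sum.

```r
# A few rows shaped like the sample data. The l_ipn = "1" rows are
# hypothetical and exist only to give the example a second group.
flows <- data.frame(
  l_ipn = c("0", "0", "0", "1", "1"),
  r_asn = c("701", "3561", "4134", "701", "714"),
  f     = c(1, 13, 3, 2, 5),
  stringsAsFactors = FALSE
)

# Sum f within each local IP, then rename the summed column to "freq"
# so the result has the same column names the plotting step expects.
pivot <- aggregate(f ~ l_ipn, data = flows, FUN = sum)
names(pivot)[names(pivot) == "f"] <- "freq"

print(pivot)
```

The result has one row per local IP with its total flow count, which is the shape the bar chart needs.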

Hello, world!

The result:

plot of chunk hello_ips

Visualization

      x1    y1    x2    y2    x3    y3    x4    y4
 1 10.00  8.04 10.00  9.14 10.00  7.46  8.00  6.58
 2  8.00  6.95  8.00  8.14  8.00  6.77  8.00  5.76
 3 13.00  7.58 13.00  8.74 13.00 12.74  8.00  7.71
 4  9.00  8.81  9.00  8.77  9.00  7.11  8.00  8.84
 5 11.00  8.33 11.00  9.26 11.00  7.81  8.00  8.47
 6 14.00  9.96 14.00  8.10 14.00  8.84  8.00  7.04
 7  6.00  7.24  6.00  6.13  6.00  6.08  8.00  5.25
 8  4.00  4.26  4.00  3.10  4.00  5.39 19.00 12.50
 9 12.00 10.84 12.00  9.13 12.00  8.15  8.00  5.56
10  7.00  4.82  7.00  7.26  7.00  6.42  8.00  7.91
11  5.00  5.68  5.00  4.74  5.00  5.73  8.00  6.89

Visualization

sapply(anscombe, mean)  # SAME mean
   x1    x2    x3    x4    y1    y2    y3    y4 
9.000 9.000 9.000 9.000 7.501 7.501 7.500 7.501 
sapply(anscombe, sd)  # SAME standard deviation
   x1    x2    x3    x4    y1    y2    y3    y4 
3.317 3.317 3.317 3.317 2.032 2.032 2.030 2.031 
sapply(anscombe, var)  # SAME variance
    x1     x2     x3     x4     y1     y2     y3     y4 
11.000 11.000 11.000 11.000  4.127  4.128  4.123  4.123 
for (i in 1:4) cat(cor(anscombe[, i], anscombe[, i + 4]), "\n")
0.8164 
0.8162 
0.8163 
0.8165 
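
The same point can be pushed one step further with a least-squares fit per pair. This is a small sketch, not part of the original slides. It uses the anscombe data frame that ships with base R, and the names fits, set1 to set4, intercept and slope are chosen here for illustration. The four fitted lines should come out nearly identical, with an intercept close to 3 and a slope close to 0.5.

```r
# Fit y_i ~ x_i for each of the four x/y pairs and collect the coefficients.
fits <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  coef(lm(y ~ x))
})

# Label the 2 x 4 result so it prints readably.
rownames(fits) <- c("intercept", "slope")
colnames(fits) <- paste0("set", 1:4)

print(round(fits, 2))
```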

Visualization

plot of chunk anscombevis
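
A chart along these lines can be reproduced with base graphics. The sketch below is an assumption about the general shape of the figure (four scatterplots, each with its fitted line), not the code behind the original slide. plot_quartet is a hypothetical helper name.

```r
# Draw the four Anscombe pairs as a 2 x 2 grid of scatterplots,
# each with its least-squares line overlaid.
plot_quartet <- function() {
  op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
  on.exit(par(op))  # restore the previous layout when done
  for (i in 1:4) {
    x <- anscombe[[paste0("x", i)]]
    y <- anscombe[[paste0("y", i)]]
    plot(x, y,
         xlim = c(3, 20), ylim = c(2, 14),  # shared axes make the panels comparable
         xlab = paste0("x", i), ylab = paste0("y", i),
         main = paste("Set", i), pch = 19)
    abline(lm(y ~ x), col = "red")
  }
  invisible(NULL)
}

plot_quartet()
```

Sharing the axis limits across panels matters here: with identical scales, the differences in shape are what stand out, not differences in range.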

Visualization

plot of chunk hello_ips2

plot of chunk moravis

“The human visual system is a pattern seeker of enormous power and subtlety. The eye and the visual cortex of the brain form a massively parallel processor that provides the highest bandwidth channel into human cognitive centers.”

– Colin Ware

Visualization - Science

Visualization: Preattentive Processing

Count the X’s in the picture:
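
The slide images are not reproduced here, so the following is a hedged sketch of one way to build this kind of exercise in R, assuming the usual form of the demonstration: a grid of letters drawn once in a single color and once with the X's picked out in red. make_grid and draw_grid are hypothetical helper names, and the letter set, grid size and seed are arbitrary.

```r
set.seed(42)  # arbitrary seed so the layout is reproducible

# Build a grid of random letters, one row of the data frame per cell.
make_grid <- function(n_row = 8, n_col = 20) {
  data.frame(
    x  = rep(seq_len(n_col), times = n_row),
    y  = rep(seq_len(n_row), each = n_col),
    ch = sample(c("X", "K", "Y", "V", "N", "Z"), n_row * n_col, replace = TRUE),
    stringsAsFactors = FALSE
  )
}

# Draw the grid; with highlight = TRUE the X's are colored red.
# Returns the number of X's invisibly.
draw_grid <- function(grid, highlight = FALSE) {
  cols <- rep("grey40", nrow(grid))
  if (highlight) cols[grid$ch == "X"] <- "red"
  plot.new()
  plot.window(xlim = range(grid$x) + c(-1, 1), ylim = range(grid$y) + c(-1, 1))
  text(grid$x, grid$y, labels = grid$ch, col = cols)
  invisible(sum(grid$ch == "X"))
}

grid <- make_grid()
op <- par(mfrow = c(2, 1), mar = c(1, 1, 1, 1))
draw_grid(grid, highlight = FALSE)  # counting requires scanning every letter
draw_grid(grid, highlight = TRUE)   # color lets the X's pop out
par(op)
```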

Visualization: Saccadic Eye Movement

 

 

  • Preattentively driven
  • Ballistic
    • Visual suppression
    • Cannot change

Visualization: Saccadic Eye Movement

https://bodyinmovement.se/?pid=57&sub=41&sub2=37

Visualization - Science

Visualization - Working Memory

https://www4.symantec.com/mktginfo/whitepaper/053013_GL_NA_WP_Ponemon-2013-Cost-of-a-Data-Breach-Report_daiNA_cta72382.pdf

Visualization - With Data

Visualization - Science

Some botnets are so big...

…you can see them from space
(or at least, Google Earth)

Some botnets are so big...

…we can learn from the data

Some botnets are so big...

…we can get more data from Symantec.
