Jay Jacobs & Bob Rudis
May 13, 2014
Find us at:
And, on Twitter:
@jayjacobs @hrbrmstr @ddsecblog
https://l.datadrivensecurity.info/1nbwaDB
The Coder (Scripting/programming)
The Data Munger (Slice & dice data)
The Thinker (Critical & algorithmic thinking)
The Visualizer (Communication skills)
RStudio (www.rstudio.com)
Python (www.python.org)
IPython (ipython.org)
Tableau (www.tableausoftware.com)
Excel (yes, Excel)
https://l.datadrivensecurity.info/1kywuJW
www.flickr.com/photos/oskay/
date,l_ipn,r_asn,f
2006-07-01,0,701,1
2006-07-01,0,714,1
2006-07-01,0,1239,1
2006-07-01,0,1680,1
2006-07-01,0,2514,1
2006-07-01,0,3320,1
2006-07-01,0,3561,13
2006-07-01,0,4134,3
2006-07-01,0,5617,2
2006-07-01,0,6478,1
2006-07-01,0,6713,1
2006-07-01,0,7132,1
2006-07-01,0,9105,1
2006-07-01,0,10738,1
2006-07-01,0,10994,1
2006-07-01,0,12334,1
2006-07-01,0,12524,1
2006-07-01,0,12542,1
2006-07-01,0,13343,1
Plotting workstation network flow data from: statweb.stanford.edu/~sabatti/data.html
Three lines of code:
# load the packages that provide count(), .() and qplot()
library(plyr)
library(ggplot2)
# read in workstation data
ips.dat <- read.csv("data/cs448b_ipasn.csv", 
                    colClasses=c("Date", "character", 
                                 "character", "numeric"))
# count all the flows per local IP, weighting each row by its flow count f
pivot <- count(ips.dat, .(l_ipn), wt_var=.(f))
# plot the data with bars
qp <- qplot(x=l_ipn, y=freq, data=pivot,       # data to use
            geom="col",                        # bars of pre-computed counts
            main="Title # flows / IP") +       # title of chart
      coord_flip() +                           # horizontal bars
      theme_bw() +                             # remove chart junk
      theme(text=element_text(size=20),        # BIG text (for pres)
            axis.text=element_text(size=30))   # moar BIG text
print(qp)                                      # draw the chart
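The `wt_var=.(f)` argument makes `count()` sum the flow column `f` for each local IP instead of just counting rows. A minimal base-R sketch of the same weighted count, run on the 19-row excerpt shown earlier embedded inline so it needs neither plyr nor the CSV file on disk. The names `flows.csv` and `rows.per.ip` are illustrative only, and `aggregate()` is swapped in for plyr's `count()`:

```r
# 19-row excerpt of the workstation flow data, embedded inline (illustrative;
# the full data set lives in data/cs448b_ipasn.csv)
flows.csv <- "date,l_ipn,r_asn,f
2006-07-01,0,701,1
2006-07-01,0,714,1
2006-07-01,0,1239,1
2006-07-01,0,1680,1
2006-07-01,0,2514,1
2006-07-01,0,3320,1
2006-07-01,0,3561,13
2006-07-01,0,4134,3
2006-07-01,0,5617,2
2006-07-01,0,6478,1
2006-07-01,0,6713,1
2006-07-01,0,7132,1
2006-07-01,0,9105,1
2006-07-01,0,10738,1
2006-07-01,0,10994,1
2006-07-01,0,12334,1
2006-07-01,0,12524,1
2006-07-01,0,12542,1
2006-07-01,0,13343,1"

ips.dat <- read.csv(text=flows.csv,
                    colClasses=c("Date", "character", "character", "numeric"))

# Weighted count: sum f within each local IP, like count(..., wt_var=.(f))
pivot <- aggregate(f ~ l_ipn, data=ips.dat, FUN=sum)
names(pivot)[names(pivot) == "f"] <- "freq"   # plyr names this column "freq"

# Unweighted count: just the number of rows per local IP
rows.per.ip <- as.data.frame(table(l_ipn=ips.dat$l_ipn), responseName="rows")

print(pivot)
print(rows.per.ip)
```

Every row in the excerpt belongs to local IP 0, so it yields 34 flows from only 19 rows; that gap is exactly what weighting by `f` captures.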
The result:
Anscombe's quartet (the `anscombe` data set that ships with R):
| obs | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 |
|---|---|---|---|---|---|---|---|---|
| 1 | 10.00 | 8.04 | 10.00 | 9.14 | 10.00 | 7.46 | 8.00 | 6.58 | 
| 2 | 8.00 | 6.95 | 8.00 | 8.14 | 8.00 | 6.77 | 8.00 | 5.76 | 
| 3 | 13.00 | 7.58 | 13.00 | 8.74 | 13.00 | 12.74 | 8.00 | 7.71 | 
| 4 | 9.00 | 8.81 | 9.00 | 8.77 | 9.00 | 7.11 | 8.00 | 8.84 | 
| 5 | 11.00 | 8.33 | 11.00 | 9.26 | 11.00 | 7.81 | 8.00 | 8.47 | 
| 6 | 14.00 | 9.96 | 14.00 | 8.10 | 14.00 | 8.84 | 8.00 | 7.04 | 
| 7 | 6.00 | 7.24 | 6.00 | 6.13 | 6.00 | 6.08 | 8.00 | 5.25 | 
| 8 | 4.00 | 4.26 | 4.00 | 3.10 | 4.00 | 5.39 | 19.00 | 12.50 | 
| 9 | 12.00 | 10.84 | 12.00 | 9.13 | 12.00 | 8.15 | 8.00 | 5.56 | 
| 10 | 7.00 | 4.82 | 7.00 | 7.26 | 7.00 | 6.42 | 8.00 | 7.91 | 
| 11 | 5.00 | 5.68 | 5.00 | 4.74 | 5.00 | 5.73 | 8.00 | 6.89 | 
sapply(anscombe, mean)  # SAME mean
   x1    x2    x3    x4    y1    y2    y3    y4 
9.000 9.000 9.000 9.000 7.501 7.501 7.500 7.501 
sapply(anscombe, sd)  # SAME standard deviation
   x1    x2    x3    x4    y1    y2    y3    y4 
3.317 3.317 3.317 3.317 2.032 2.032 2.030 2.031 
sapply(anscombe, var)  # SAME variance
    x1     x2     x3     x4     y1     y2     y3     y4 
11.000 11.000 11.000 11.000  4.127  4.128  4.123  4.123 
for (i in 1:4) cat(cor(anscombe[, i], anscombe[, i + 4]), "\n")
0.8164 
0.8162 
0.8163 
0.8165
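Matching means, variances and correlations do not make the four data sets alike, which is why the plot matters. A short base-R sketch on the built-in `anscombe` data that fits a least-squares line to each x/y pair and draws the four scatterplots in a 2x2 grid (`fits` is an illustrative name). Even the fitted lines come out nearly identical, at roughly y = 3 + 0.5x; only the pictures show how different the sets are:

```r
# Fit y_i ~ x_i for each of the four pairs; each row holds (Intercept), slope
fits <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  coef(lm(y ~ x))
}))
rownames(fits) <- paste0("set", 1:4)
print(round(fits, 3))

# Draw the four data sets side by side, each with its fitted line
op <- par(mfrow=c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main=paste("Set", i), xlim=c(3, 20), ylim=c(3, 13), pch=19)
  abline(a=fits[i, 1], b=fits[i, 2])
}
par(op)
```

Shared axis limits are deliberate: with identical scales, the line, the curve, the outlier and the single high-leverage point are easy to compare at a glance.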