Jay Jacobs & Bob Rudis
May 13, 2014
Find us at:
And, on Twitter:
@jayjacobs
@hrbrmstr
@ddsecblog
https://l.datadrivensecurity.info/1nbwaDB
The Coder (Scripting/programming)
The Data Munger (Slice & dice data)
The Thinker (Critical & algorithmic thinking)
The Visualizer (Communication skills)
RStudio (www.rstudio.com)
Python (www.python.org)
IPython (ipython.org)
Tableau (www.tableausoftware.com)
Excel (yes, Excel)
https://l.datadrivensecurity.info/1kywuJW
www.flickr.com/photos/oskay/
date,l_ipn,r_asn,f
2006-07-01,0,701,1
2006-07-01,0,714,1
2006-07-01,0,1239,1
2006-07-01,0,1680,1
2006-07-01,0,2514,1
2006-07-01,0,3320,1
2006-07-01,0,3561,13
2006-07-01,0,4134,3
2006-07-01,0,5617,2
2006-07-01,0,6478,1
2006-07-01,0,6713,1
2006-07-01,0,7132,1
2006-07-01,0,9105,1
2006-07-01,0,10738,1
2006-07-01,0,10994,1
2006-07-01,0,12334,1
2006-07-01,0,12524,1
2006-07-01,0,12542,1
2006-07-01,0,13343,1
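A minimal sketch of what "counting flows" means for rows like these, using base R only. It assumes (this is our reading of the column names, not something the slide states) that l_ipn identifies the local workstation, r_asn the remote ASN, and f the number of flows that day. Four of the sample rows above are inlined so the snippet runs without the data file.

```r
# A few of the sample rows shown above, inlined so the sketch is self-contained
csv_text <- "date,l_ipn,r_asn,f
2006-07-01,0,701,1
2006-07-01,0,3561,13
2006-07-01,0,4134,3
2006-07-01,0,5617,2"

# Same column classes as the slide code: Date, two identifiers, numeric count
flows <- read.csv(text = csv_text,
                  colClasses = c("Date", "character", "character", "numeric"))

# Sum the flow counts (f) per local workstation (l_ipn)
per_ip <- aggregate(f ~ l_ipn, data = flows, FUN = sum)
print(per_ip)
```

With the full file, the same aggregate() call should give one row per workstation, which is the shape the bar chart needs.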
Plotting workstation
network flow data from:
statweb.stanford.edu/~sabatti/data.html
Three lines of code:
library(plyr)    # provides count() and .()
library(ggplot2) # provides qplot() and the theme helpers
# read in workstation data
ips.dat <- read.csv("data/cs448b_ipasn.csv",
                    colClasses=c("Date", "character",
                                 "character", "numeric"))
# count all the flows
pivot <- count(ips.dat, .(l_ipn), wt_var=.(f))
# plot the data with bars
qp <- qplot(x=l_ipn, y=freq, data=pivot,     # data to use
            geom="col",                      # make a bar chart from the counts
            main="Title # flows / IP") +     # title of chart
  coord_flip() +                             # horizontal bars
  theme_bw() +                               # remove chart junk
  theme(text=element_text(size=20),          # BIG text (for pres)
        axis.text=element_text(size=30))     # moar BIG text
The result:
row | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 |
---|---|---|---|---|---|---|---|---|
1 | 10.00 | 8.04 | 10.00 | 9.14 | 10.00 | 7.46 | 8.00 | 6.58 |
2 | 8.00 | 6.95 | 8.00 | 8.14 | 8.00 | 6.77 | 8.00 | 5.76 |
3 | 13.00 | 7.58 | 13.00 | 8.74 | 13.00 | 12.74 | 8.00 | 7.71 |
4 | 9.00 | 8.81 | 9.00 | 8.77 | 9.00 | 7.11 | 8.00 | 8.84 |
5 | 11.00 | 8.33 | 11.00 | 9.26 | 11.00 | 7.81 | 8.00 | 8.47 |
6 | 14.00 | 9.96 | 14.00 | 8.10 | 14.00 | 8.84 | 8.00 | 7.04 |
7 | 6.00 | 7.24 | 6.00 | 6.13 | 6.00 | 6.08 | 8.00 | 5.25 |
8 | 4.00 | 4.26 | 4.00 | 3.10 | 4.00 | 5.39 | 19.00 | 12.50 |
9 | 12.00 | 10.84 | 12.00 | 9.13 | 12.00 | 8.15 | 8.00 | 5.56 |
10 | 7.00 | 4.82 | 7.00 | 7.26 | 7.00 | 6.42 | 8.00 | 7.91 |
11 | 5.00 | 5.68 | 5.00 | 4.74 | 5.00 | 5.73 | 8.00 | 6.89 |
sapply(anscombe, mean) # SAME mean
x1 x2 x3 x4 y1 y2 y3 y4
9.000 9.000 9.000 9.000 7.501 7.501 7.500 7.501
sapply(anscombe, sd) # SAME standard deviation
x1 x2 x3 x4 y1 y2 y3 y4
3.317 3.317 3.317 3.317 2.032 2.032 2.030 2.031
sapply(anscombe, var) # SAME variance
x1 x2 x3 x4 y1 y2 y3 y4
11.000 11.000 11.000 11.000 4.127 4.128 4.123 4.123
for (i in 1:4) cat(cor(anscombe[, i], anscombe[, i + 4]), "\n")
0.8164
0.8162
0.8163
0.8165
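One way to see why these matching summaries can mislead is to fit a line to each pair and then plot them. The sketch below is ours, not from the slides. It uses only base R and the built-in anscombe data frame; the names fits and coefs are illustrative.

```r
# anscombe ships with base R (package "datasets")
# Fit y_i ~ x_i separately for each of the four pairs
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})

# Collect intercept and slope per pair into a 4 x 2 matrix
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))

# Plot the four pairs on shared axes, each with its fitted line
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       xlim = c(3, 20), ylim = c(2, 14), pch = 19)
  abline(fits[[i]], col = "red")
}
par(op)
```

The fitted lines come out nearly identical, yet the four scatterplots look nothing alike, which is the argument for plotting data before trusting summary statistics.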