A podcast on the journey to discovery through data in information security by Jay Jacobs and Bob Rudis. Episode 10: Community Discussion with John Langton & Alex Baker of VisiTrend
Episode 10
In this episode, Jay & Bob have a community discussion with John Langton & Alex Baker about their security data analysis & visualization startup, VisiTrend, and take a look at what's made the headlines in the data science community since the last show.
Resources / people featured in the show:
- VisiTrend - visitrend (twitter)
- Data science can't be point and click
- In-depth introduction to machine learning in 15 hours of expert videos
- Data Playlists
- Running RStudio via Docker in the Cloud
- Building a DGA Classifier (in R) - Part 1
- Building a DGA Classifier (in R) - Part 2
- Building a DGA Classifier (in R) - Part 3
Link Insights from VisiTrend
VERIS/VCDB general vis - we have a tree map version of the actors, actions, assets, and attributes breakdown which better shows the distribution of events (description on snapshot).
Snapshot - can be posted and viewed without logging in
Actual analysis and data you can load after signing up and logging in
VERIS/VCDB clustering - each square is an event in the data set. Squares are first grouped based on # of employees (e.g. companies with 1k employees will be grouped together), and then based on industry. Squares are colored based on clustering output - we found 7 clusters. We will provide more detail on what defines these clusters in a blog post. It’s interesting to see that particular industries do have particular attack types according to clustering, shown by blocks of similar color.
Snapshot - Actual analysis and data
Honeypot overview - this is really cool (I think). Black, square nodes are the honeypots. Node size is based on the # of packets they’re sending. Computers that use more distinct ports are colored red (the big red guy doing a massive port scan drowns out the others). The force-directed layout clusters nodes if they hit the same honeypots. For instance, click a node in an “outer ring” twice to highlight the honeypots it’s hitting, and there will be just one. All other nodes in that ring hit the same one. Double click one of the center nodes and you’ll see they’re hitting all of the honeypots. The treemap groups nodes according to subnet addressing. The timeline view shows a time-based histogram of incoming packets, colored by destination port. The red guy is selected in the snapshot, so you can see that he blasts all the honeypots at roughly the same time.
Snapshot - Actual analysis and data
Honeypot port highlighting - Square nodes are attackers, and circle nodes are ports. The size of a port node is how many times packets were sent to that port. Mouse over the big purple circle and you see port 1433 is the most popular. You could double click it to see all machines hitting that port. There are two color layers for the node-link graph, and you can toggle between them. They both show a version of variability over time (more red = more variable port usage). The treemap shows subnet addressing again but colors a green heat map based on the # of different ports each machine uses. Size is based on the # of packets they send.
Snapshot - Actual analysis and data
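For anyone who wants to play with the same kind of honeypot data outside the tool, here is a minimal sketch of the aggregates the two honeypot views above are described as using: packets per source (node size), distinct ports per source (red coloring and the treemap heat map), and hits per port (port node size). The record layout, the sample values, and the `summarize` helper are all made up for illustration; this is not VisiTrend's code or data model.

```python
from collections import Counter, defaultdict

# Hypothetical packet records: (source_ip, honeypot, destination_port).
# The field layout and values are assumptions, not the real data set.
packets = [
    ("10.0.0.5", "hp-a", 1433),
    ("10.0.0.5", "hp-b", 1433),
    ("10.0.0.5", "hp-a", 22),
    ("10.0.0.9", "hp-a", 1433),
    ("10.0.1.7", "hp-b", 3389),
]


def summarize(records):
    """Return (packets per source, hits per port, distinct ports per source)."""
    packets_per_source = Counter()
    hits_per_port = Counter()
    ports_per_source = defaultdict(set)
    for src, _honeypot, port in records:
        packets_per_source[src] += 1      # drives attacker node size
        hits_per_port[port] += 1          # drives port node size
        ports_per_source[src].add(port)   # drives the "more ports = redder" color
    distinct_ports = {src: len(ports) for src, ports in ports_per_source.items()}
    return packets_per_source, hits_per_port, distinct_ports


packets_per_source, hits_per_port, distinct_ports = summarize(packets)
print(hits_per_port.most_common(1))  # most frequently targeted port in the sample
```

On real data you would swap the hard-coded list for whatever your honeypot logs export, then feed the three summaries into the plotting library of your choice.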
Finally, a great mentor and visionary pioneer of InfoVis named Matt Ward passed away last weekend. He wrote the most recent, comprehensive InfoVis book with some other really big names in the field, including Keim and Grinstein. Link to the book.
In This Episode
Time Index | Title |
00:00:00 | Data Driven Security |
00:00:30 | Intro + SIRAcon recap |
00:03:18 | Introducing John Langton and Alex Baker of VisiTrend |
00:07:03 | What is VisiTrend? |
00:12:50 | What types of analytics back-ends does VisiTrend support? |
00:17:33 | Were you surprised at the state of infosec-specific data exploration tools? |
00:21:59 | Building the VisiTrend Community |
00:24:06 | The science behind visualization |
00:27:34 | Comparing visualizations on the RSA floor |
00:29:51 | Balance of visualization for analysis and visualization for communication |
00:31:49 | How do you handle concerns over putting production data in the cloud? |
00:34:16 | How can we improve visualization literacy? |
00:37:32 | Do folks need to be "hadoop experts" to work with your tool? |
00:41:28 | Data Science Headlines |
00:41:41 | Building a DGA Classifier three-fer |
00:44:01 | Data Science Can't be Point & Click |
00:48:59 | Machine Learning in 15 hours |
00:50:49 | Data Playlists |
00:52:45 | Docker for Data Science |
00:56:32 | Wrapping up |