DDS Dataset Collection

One of the reasons we wrote the book Data Driven Security and started the DDS blog & podcast was to provide security-related analysis and visualization examples in a data world full of flowers & dead bodies.

If you’re looking for risky data to play with, bookmark this page and check back often for updates. We’ll add the data sets we feature on the blog, along with pointers to other publicly available security-related data sets hosted on other sites.

Honeypots

  • marx.csv [4MB] : A tar/gzip’d CSV file from a collection of AWS honeypots. See Jay’s blog post for more information.
  • marx-geo.csv [7.2MB] : A tar/gzip’d CSV file from a collection of AWS honeypots, with IPv4 addresses in both long integer and string form and full geolocation information (via MaxMind GeoIP2).
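
The long integer and string forms of an IPv4 address mentioned above are two views of the same 32-bit value. As a minimal sketch of converting between them, the following uses only Python's standard `ipaddress` module. The helper names `ip_to_long` and `long_to_ip` are made up for this illustration, and the sample address comes from the reserved documentation range, not from the data set.

```python
import ipaddress


def ip_to_long(ip_string: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(ip_string))


def long_to_ip(ip_long: int) -> str:
    """Convert a 32-bit integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(ip_long))


# 192.0.2.1 is a documentation-only address (an assumption for the demo).
sample = "192.0.2.1"
as_long = ip_to_long(sample)
print(as_long)              # 3221225985
print(long_to_ip(as_long))  # 192.0.2.1
```

Having both forms in one file is convenient: the integer form sorts and range-checks cheaply, while the string form is easier to read and join against other sources.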

Malware Domains

  • legit-dga_domains.csv [1.6MB] : A zip’d CSV file of domains, each with a high-level classification of “dga” or “legit” and a subclass of “legit”, “cryptolocker”, “goz” or “newgoz”. See Jay’s blog series for more information.
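
A quick first look at a labeled file like this is usually a tally of rows per class and subclass. The sketch below shows one way to do that with Python's standard `csv` module. The column names `domain`, `class` and `subclass` are assumptions (check the real header after unzipping), and the three sample rows are placeholders written for this example, not records from the data set.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, class_col="class", subclass_col="subclass"):
    """Tally rows per (class, subclass) pair from an open CSV file.

    The default column names are assumptions, not the file's confirmed header.
    """
    reader = csv.DictReader(csv_file)
    return Counter((row[class_col], row[subclass_col]) for row in reader)


# Placeholder rows for illustration only; not taken from the real data set.
sample = io.StringIO(
    "domain,class,subclass\n"
    "example.com,legit,legit\n"
    "example.org,legit,legit\n"
    "placeholder-one.example,dga,cryptolocker\n"
)

counts = count_labels(sample)
for (cls, sub), n in sorted(counts.items()):
    print(cls, sub, n)
```

To run it against the real file, pass `open(path, newline="")` instead of the in-memory sample and adjust the column names to match the actual header.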