By Jay Jacobs (@jayjacobs)
Sat 07 November 2015 | tags: blog, statistics

There is a lot of misperception around sample sizes, and the confusion happens on both sides of the research. A common question when researchers are starting out is, “How big should my sample size be?” To help with that, there are handy calculators all over the Internet. But the more ...

By Bob Rudis (@hrbrmstr)
Wed 07 October 2015 | tags: blog, r, rstats

We have some strange data in cybersecurity. One of the (IMO) stranger data files is a Domain Name System (DNS) zone file. This file contains mappings between domain names and IP addresses (and other things) represented by “resource records”.

Here’s an example for the dummy/example domain ...

By Bob Rudis (@hrbrmstr)
Sun 23 August 2015 | tags: blog, r, rstats, python, javascript, html, phantomjs, mhn

This was (initially) going to be a blog post announcing the new mhn R package (more on what that is in a bit) but somewhere along the way we ended up taking a left turn at Albuquerque (as we often do here at ddsec hq) and had an adventure in ...

By Bob Rudis (@hrbrmstr)
Sun 09 August 2015 | tags: blog, r, rstats

We just did a github release for an R package that provides an interface to the DomainTools API. It provides access to the core API functions that aren’t restricted (i.e. the ones we have access to):

By Bob Rudis (@hrbrmstr)
Fri 07 August 2015 | tags: blog, r, rstats

For those not involved with all things “cyber”, let me start with a description of what Shodan is (though visiting the site is probably the best introduction to what secrets it holds).

Shodan is, at its core, a search engine. Unlike Google, Shodan indexes what I’ll call “cyber ...

By Bob Rudis (@hrbrmstr)
Mon 27 July 2015 | tags: blog, r, rstats

UPDATE: RBerkeley is now on CRAN

If you made it to Chapter 8 of Data-Driven Security after ~October 2014 and tried to run the BerkeleyDB R example, you were greeted with:

Warning in install.packages :
  package ‘RBerkely’ is not available (for R version [YOUR_R_VERSION])

That’s due to the fact ...

By Bob Rudis (@hrbrmstr)
Wed 22 July 2015 | tags: blog, r, rstats, graph, bots

The R world has come a long way since Jay & I wrote Data-Driven Security. We had to make a conscious decision to stick with R 2.14.0 (R is at version 3.2.1 now) and packages such as knitr and dplyr either didn’t exist or were in ...

By Bob Rudis (@hrbrmstr)
Tue 14 July 2015 | tags: blog, r, rstats, time series, r101

We were asked a question on how to (in R) aggregate quarterly data from what I believe was a daily time series. This is a pretty common task and there are many ways to do this in R, but we’ll focus on one method using the zoo and dplyr ...

By Bob Rudis (@hrbrmstr)
Thu 09 July 2015 | tags: blog, r, rstats, xml, xslt, webscraping

Sometimes you just need the salient text from a web site, often as a first step towards natural language processing (NLP) or classification. There are many ways to achieve this, but XSLT (eXtensible Stylesheet Language Transformations) was purpose-built for slicing, dicing and transforming XML (and, hence, HTML), so it can make ...

By Jay Jacobs (@jayjacobs)
Tue 07 July 2015 | tags: blog

I was recently asked for advice on hiring someone for a data science role. I gave some quick answers but thought the topic deserved more thought because I’ve had the experience not only of hiring for data science but also of interviewing (I have recently changed jobs - hello BitSight!). So ...
