 Vectorizing IPv4 Address Conversions - Part 1

By Bob Rudis (@hrbrmstr)
Fri 16 May 2014 | tags: rstats, r, rcpp

Our previous post showed how to speed up the conversion of IPv4 addresses to/from integer format by taking advantage of a simple `Rcpp` wrapper to “boosted” native functions. However, to convert more than one IP address, you need to wrap those functions in one of the R `*apply` functions, which does the job but is not an optimal solution. Ideally, you could pass in a vector (with more than one element) of character IP addresses or of integer-format IP addresses and know that the function will “just work”.
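
For reference, the `*apply` pattern described above looks something like the sketch below. Since `rinet_pton()` needs the `sourceCpp` step from the previous post, the example uses a hypothetical pure-R stand-in, `ip_to_int_scalar()`. That name and its implementation are assumptions made only so the snippet runs on its own; it is not the `Rcpp` function.

```r
# Hypothetical pure-R stand-in for rinet_pton(): converts ONE dotted-quad
# string to its numeric value (an assumption for illustration only).
ip_to_int_scalar <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

ips <- c("10.0.0.0", "146.178.58.99", "213.186.42.8")

# The scalar function has to be wrapped in an *apply call at every call site
ints <- sapply(ips, ip_to_int_scalar, USE.NAMES = FALSE)
print(ints)
```

It works, but every caller has to remember the `sapply()` wrapper (and the `USE.NAMES` argument).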

In this post we’ll introduce a shortcut method of vectorization with the `Vectorize()` function. Then, in the second and final part of the series, we’ll look at implementing the necessary code at the `Rcpp` layer to perform the vectorization at the C++ level and show some benchmarks for each method.

### The Vectorize() Shortcut

At the end of our previous exercise, we had two functions: `rinet_pton()` and `rinet_ntop()`. Each took a single argument (the former a single-element character vector and the latter a single-element numeric vector) and returned a single-element vector as a result. Let’s vectorize each one using the `Vectorize()` function:

```
# the following code assumes you've already done the "sourceCpp" in the previous article

ip_to_long <- Vectorize(rinet_pton)
long_to_ip <- Vectorize(rinet_ntop)
```

Yes, that’s all it takes. Now we can pass in a vector of one or more elements and each function will return a vector of the same size as a result. The proof is in the output, so let’s give them a go, first with the original single-element vector use case:

```
# try a single IP address first

ip_to_long("10.0.0.0")
##  10.0.0.0
## 167772160

long_to_ip(167772160)
## [1] "10.0.0.0"
```

So far, so good, except that `Vectorize()` by default produces a named vector when a character vector is passed in, which is probably not what we want. We’ll tweak the call to `Vectorize()` for each function:

```
ip_to_long <- Vectorize(rinet_pton, USE.NAMES=FALSE)
long_to_ip <- Vectorize(rinet_ntop, USE.NAMES=FALSE)

ip_to_long("10.0.0.0")
## [1] 167772160

long_to_ip(167772160)
## [1] "10.0.0.0"
```

Now, let’s test it with more than one element:

```
srcIp <- c("146.178.58.99", "174.5.172.152", "146.178.58.99", "213.186.42.8",
           "146.178.58.99", "170.138.152.142", "170.138.152.142", "174.5.172.152",
           "146.178.58.99", "213.186.42.8")

srcInt <- c(2461153891, 2919607448, 2461153891, 3585747464, 2461153891,
            2861209742, 2861209742, 2919607448, 2461153891, 3585747464)

ip_to_long(srcIp)
##  [1] 2461153891 2919607448 2461153891 3585747464 2461153891 2861209742
##  [7] 2861209742 2919607448 2461153891 3585747464

long_to_ip(srcInt)
##  [1] "146.178.58.99"   "174.5.172.152"   "146.178.58.99"
##  [4] "213.186.42.8"    "146.178.58.99"   "170.138.152.142"
##  [7] "170.138.152.142" "174.5.172.152"   "146.178.58.99"
## [10] "213.186.42.8"
```

Everything works as expected and we can now use those conversion routines without resorting to `*apply` calls.
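
One quick sanity check for vectorized converters like these is a round trip: convert to integers, convert back, and compare with the input. The sketch below uses hypothetical pure-R stand-ins (`ip_to_int_scalar()` and `int_to_ip_scalar()`, both assumptions for illustration) so it runs without the `Rcpp` code. With the previous article's functions loaded, the same check should apply to `ip_to_long()` and `long_to_ip()`.

```r
# Hypothetical pure-R stand-ins for the Rcpp-backed scalar functions
# (assumptions for illustration only; each handles ONE value).
ip_to_int_scalar <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}
int_to_ip_scalar <- function(n) {
  octets <- (n %/% 256^(3:0)) %% 256
  paste(octets, collapse = ".")
}

# Vectorize both, dropping names as in the article
to_long <- Vectorize(ip_to_int_scalar, USE.NAMES = FALSE)
to_ip   <- Vectorize(int_to_ip_scalar, USE.NAMES = FALSE)

srcIp <- c("146.178.58.99", "174.5.172.152", "170.138.152.142", "213.186.42.8")

# Converting to integers and back should reproduce the input exactly
stopifnot(identical(to_ip(to_long(srcIp)), srcIp))
```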

To see what `Vectorize()` does under the covers, just enter `ip_to_long` or `long_to_ip` at an R console prompt without the parentheses. This will show the source of the wrapper functions that `Vectorize()` built. Try to build your own vectorized versions by trimming down what’s in the generated source code.
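
As a starting point for that exercise, here is a minimal sketch of what a trimmed-down version might look like for a one-argument function. `Vectorize()` is documented as a wrapper around `mapply()`, so the first function calls `mapply()` directly; the second uses `vapply()` as a stricter alternative. The scalar function `ip_to_int_scalar()` is a hypothetical pure-R stand-in for `rinet_pton()`, and the wrapper names are made up for this example.

```r
# Hypothetical pure-R stand-in for rinet_pton() (assumption for illustration)
ip_to_int_scalar <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Trimmed-down wrapper: call mapply() directly on the single argument
ip_to_long_manual <- function(ips) {
  mapply(ip_to_int_scalar, ips, SIMPLIFY = TRUE, USE.NAMES = FALSE)
}

# Stricter alternative: vapply() insists on exactly one numeric per input
ip_to_long_strict <- function(ips) {
  vapply(ips, ip_to_int_scalar, numeric(1), USE.NAMES = FALSE)
}

ips <- c("10.0.0.0", "174.5.172.152")
print(ip_to_long_manual(ips))
print(ip_to_long_strict(ips))
```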

We’ll see how to perform the same vectorization task at the `Rcpp` level in the next post and put each version in a head-to-head benchmark test.

NOTE: Using `Rcpp` with R markdown takes some extra steps, and I’ve posted a gist that shows some of the options you need to set to ensure the `Rcpp` code compiles and links properly and also the wicked-cool way you can embed `Rcpp` code right in markdown documents.