By Bob Rudis (@hrbrmstr)
Thu 09 July 2015
tags: blog, r, rstats, xml, xslt, webscraping
Sometimes you just need the salient text from a web site, often as a first step towards natural language processing (NLP) or classification. There are many ways to achieve this, but XSLT (eXtensible Stylesheet Language Transformations) was purpose-built for slicing, dicing and transforming XML (and, hence, HTML), so it can make ...