Initial version November 2020; Updated August 2021
The tidyCpp package offers a simple, small and clean C++ layer over the C API offered by R. As of version 0.0.4, it also adds a (truly minimal) numeric vector class for C++. This vignette walks through a few usage examples, often taken from the Writing R Extensions manual that comes with R, to highlight some features.
tidyCpp has no further dependencies on any other package. It can, however, be used with Rcpp simply to take advantage of its helper functions cppFunction() or sourceCpp().
tidyCpp is still a fairly small package. Please feel free to contribute by making suggestions, or by sending bug fixes or extension proposals.
This example comes from Writing R Extensions, Section 5.9.4, which highlights attribute setting from the C API.
It takes two (named) numeric vectors, computes the outer product matrix and uses the names to set row and column names. Note that we modified the existing example ever so slightly by defining R_NO_REMAP (as is frequently done) so that R's short symbol names are not remapped. For example, length (which can clash easily with existing symbols in the global namespace) is now Rf_length. We also added an export tag for Rcpp simply to facilitate integration into R. No Rcpp header or data structures are used; we simply rely on its logic in getting C or C++ source into R.
#define R_NO_REMAP
#include <R.h>
#include <Rinternals.h>

// [[Rcpp::export]]
SEXP out(SEXP x, SEXP y)
{
    int nx = Rf_length(x), ny = Rf_length(y);
    SEXP ans = PROTECT(Rf_allocMatrix(REALSXP, nx, ny));
    double *rx = REAL(x),
           *ry = REAL(y),
           *rans = REAL(ans);

    for(int i = 0; i < nx; i++) {
        double tmp = rx[i];
        for(int j = 0; j < ny; j++)
            rans[i + nx*j] = tmp * ry[j];
    }

    SEXP dimnames = PROTECT(Rf_allocVector(VECSXP, 2));
    SET_VECTOR_ELT(dimnames, 0, Rf_getAttrib(x, R_NamesSymbol));
    SET_VECTOR_ELT(dimnames, 1, Rf_getAttrib(y, R_NamesSymbol));
    Rf_setAttrib(ans, R_DimNamesSymbol, dimnames);

    UNPROTECT(2);
    return ans;
}
#include <tidyCpp>
// [[Rcpp::depends(tidyCpp)]]

// [[Rcpp::export]]
SEXP out(SEXP x, SEXP y)
{
    int nx = R::length(x), ny = R::length(y);
    R::Protect ans(R::allocMatrixReal(nx, ny));
    double *rx = R::numericPointer(x),
           *ry = R::numericPointer(y),
           *rans = R::numericPointer(ans);

    for(int i = 0; i < nx; i++) {
        double tmp = rx[i];
        for(int j = 0; j < ny; j++)
            rans[i + nx*j] = tmp * ry[j];
    }

    R::Protect dimnames(R::allocVectorList(2));
    R::setVectorElement(dimnames, 0, R::getNames(x));
    R::setVectorElement(dimnames, 1, R::getNames(y));
    R::setDimNames(ans, dimnames);

    return ans;
}
Some key differences:
- a single header, tidyCpp: simple and clean;
- no PROTECT and UNPROTECT with manual counting of the number of calls made: C++ takes care of that for us via Protect, which is a modernized (and simplified) version of class Shield in Rcpp (and see below for more discussion of Protect);
- no Rf_* calls: everything used comes from a clean new namespace R and is easily identified;
- no upper-case macros: only class names such as Protect are capitalized; and
- clearer accessors such as R::getNames(x) instead of Rf_getAttrib(x, R_NamesSymbol).
Note that the use of Rcpp::export does not imply use of Rcpp data structures. We simply take advantage of the tried and true code generation to make it easy to call the example from R. You can copy either example into a temporary file and use Rcpp::sourceCpp("filenameHere") on it to run the example.
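The index expression i + nx*j in both versions reflects R's column-major matrix storage. As a minimal sketch of that layout, the following uses only the C++ standard library (no R headers) and runs the same double loop over a std::vector<double>. The function name outerProduct is a hypothetical helper chosen for illustration; it is not part of tidyCpp or of the C API for R.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of tidyCpp or the R API): outer product of
// x and y, stored column-major so that element (i, j) sits at i + nx*j.
std::vector<double> outerProduct(const std::vector<double>& x,
                                 const std::vector<double>& y) {
    const std::size_t nx = x.size(), ny = y.size();
    std::vector<double> ans(nx * ny);
    for (std::size_t i = 0; i < nx; i++) {
        const double tmp = x[i];
        for (std::size_t j = 0; j < ny; j++)
            ans[i + nx * j] = tmp * y[j];   // same indexing as in out()
    }
    return ans;
}
```

Calling outerProduct({1, 2}, {3, 4, 5}) fills a 2 x 3 matrix one column after the other, which is the order in which R itself would read the values back.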
This example comes from Writing R Extensions, Section 5.10.1, which introduces the .Call() interface of the C API for R.
It takes two numeric vectors and computes a convolution. Note that, as above, we modified the existing example ever so slightly by defining R_NO_REMAP (as is frequently done) so that symbols are not remapped, by once again adding an export tag for Rcpp simply to facilitate integration into R, and by changing whitespace. No Rcpp header or data structures are used; we simply rely on its logic in getting C or C++ source into R.
#define R_NO_REMAP
#include <R.h>
#include <Rinternals.h>

// [[Rcpp::export]]
SEXP convolve2(SEXP a, SEXP b)
{
    int na, nb, nab;
    double *xa, *xb, *xab;
    SEXP ab;

    a = PROTECT(Rf_coerceVector(a, REALSXP));
    b = PROTECT(Rf_coerceVector(b, REALSXP));
    na = Rf_length(a);
    nb = Rf_length(b);
    nab = na + nb - 1;
    ab = PROTECT(Rf_allocVector(REALSXP, nab));
    xa = REAL(a);
    xb = REAL(b);
    xab = REAL(ab);
    for(int i = 0; i < nab; i++)
        xab[i] = 0.0;
    for(int i = 0; i < na; i++)
        for(int j = 0; j < nb; j++)
            xab[i + j] += xa[i] * xb[j];
    UNPROTECT(3);
    return ab;
}
#include <tidyCpp>
// [[Rcpp::depends(tidyCpp)]]

// [[Rcpp::export]]
SEXP convolve2(SEXP a, SEXP b)
{
    int na, nb, nab;
    double *xa, *xb, *xab;

    R::Protect pa(R::coerceVectorNumeric(a));
    R::Protect pb(R::coerceVectorNumeric(b));
    na = R::length(pa);
    nb = R::length(pb);
    nab = na + nb - 1;
    R::Protect ab(R::allocVectorNumeric(nab));
    xa = R::numericPointer(pa);
    xb = R::numericPointer(pb);
    xab = R::numericPointer(ab);
    for(int i = 0; i < nab; i++)
        xab[i] = 0.0;
    for(int i = 0; i < na; i++)
        for(int j = 0; j < nb; j++)
            xab[i + j] += xa[i] * xb[j];
    return ab;
}
Like the previous example, the new version operates without macros, does not require manual counting in PROTECT and UNPROTECT and is, to our eyes, a little more readable.
As the existing example from Writing R Extensions, Section 5.10.1 used PROTECT on the two incoming SEXP objects (whereas the previous example, from the same source, does not), we need to allocate two temporary objects pa and pb, with the explicit C++ ownership providing the protect and unprotect pairing. Because pa and pb go out of scope at the end of the function, the destructor will then unprotect correctly.
(And as above, the two 'tags' for Rcpp use are present only to facilitate use via the Rcpp::sourceCpp() function.)
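To check the arithmetic of the two loops independently of R, here is a minimal standalone sketch using only the C++ standard library. The function name convolve and the guard for empty inputs are assumptions added for illustration; neither is taken from tidyCpp or from Writing R Extensions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standalone helper mirroring the loops of convolve2().
// The empty-input guard is an addition for this sketch only.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};
    // The result has na + nb - 1 elements, all initialised to zero.
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); i++)
        for (std::size_t j = 0; j < b.size(); j++)
            ab[i + j] += a[i] * b[j];
    return ab;
}
```

For the inputs (1, 2, 3) and (4, 5, 6) this yields 4, 13, 28, 27, 18, the same values that the R session output further below reports for the convolution example.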
For the third example, we use an unrelated package: uchardet, which provides R bindings to the eponymous C++ library to detect character encodings. This example cannot be sourced simply into R as it requires the underlying C++ library. One can, however, download the R package and then replace the file src/detect.cpp with the content below, and add LinkingTo: tidyCpp to the DESCRIPTION file.
#include <tidyCpp>
#include <R_ext/Visibility.h>

#include <fstream>
#include <uchardet.h>

#define BUFFER_SIZE 65536

char buffer[BUFFER_SIZE];

SEXP attribute_hidden get_charset(uchardet_t handle) {
    const char* ans = uchardet_get_charset(handle);
    if (strlen(ans) == 0) {
        // Rf_warning("Can not detect encoding.");
        return NA_STRING;
    }
    return R::mkChar(ans);
}

SEXP attribute_hidden do_detect_sexp(SEXP x, uchardet_t handle) {
    R_xlen_t x_len = R::xlength(x);
    if (x_len == 0) {
        return NA_STRING;
    }

    const char* x_data;
    switch(R::Typeof(x)) {
        case CHARSXP: {
            if (x == NA_STRING) {
                return NA_STRING;
            }
            x_data = R::charPointer(x);
            break;
        }
        case RAWSXP: {
            x_data = (const char*) R::rawPointer(x);
            break;
        }
        default: {
            R::warning("Unsupported data type '%s'.", R::type2char(R::Typeof(x)));
            return NA_STRING;
        }
    }

    int retval = uchardet_handle_data(handle, x_data, x_len);
    uchardet_data_end(handle);
    if (retval != 0) {
        // Rf_warning("Can not handling data.");
        return NA_STRING;
    }
    return get_charset(handle);
}

SEXP attribute_hidden do_detect_file(SEXP x, uchardet_t handle) {
    if (x == NA_STRING) {
        return NA_STRING;
    }

    const char* fname = R::charPointer(x);
    std::ifstream fs(R_ExpandFileName(fname), std::ios::binary);
    if (!fs.is_open()) {
        R::warning("Can not open file '%s'.", fname);
        return NA_STRING;
    }

    while (!fs.eof()) {
        fs.read(buffer, BUFFER_SIZE);
        std::size_t len = fs.gcount();
        uchardet_handle_data(handle, buffer, len);
    }
    uchardet_data_end(handle);
    fs.close();

    return get_charset(handle);
}

template<typename T>
SEXP attribute_hidden do_detect_vec(SEXP x, T fun) {
    if (R::Typeof(x) != STRSXP) {
        R::error("'x' must be character vector.");
    }
    R_xlen_t n = R::xlength(x);
    R::Protect res(R::allocVectorCharacter(n));
    uchardet_t handle = uchardet_new();
    for (R_len_t i = 0; i < n; ++i) {
        R::setStringElement(res, i, fun(R::stringElement(x, i), handle));
        uchardet_reset(handle);
    }
    uchardet_delete(handle);
    return res;
}

SEXP detect_character(SEXP x) {
    return do_detect_vec(x, do_detect_sexp);
}

SEXP detect_file(SEXP x) {
    return do_detect_vec(x, do_detect_file);
}

SEXP detect_raw(SEXP x) {
    if (R::Typeof(x) != RAWSXP) {
        R::error("'x' must be raw vector.");
    }
    R::Protect res(R::allocVectorCharacter(1));
    uchardet_t handle = uchardet_new();
    R::setStringElement(res, 0, do_detect_sexp(x, handle));
    uchardet_delete(handle);
    return res;
}
For the fourth example, we modified a function from an older version of the package ichimoku. It implements a rolling minimum and maximum operator (in an earlier version of the package). Its use of Rcpp is fairly standard; however, the package chose to make changes because of compilation time, so we took a look too. The version below deploys tidyCpp and its new numvec header and class instead (and we tested this in a local fork of the package).
To make the code fit in the dual display below, we added line breaks on the left and adjusted whitespace. Our code on the right also slightly changes the interface by simplifying the implementation: without the enum, the struct Args and the dual callers, we add a third argument to determine whether we operate as min or max, which saves the two extra functions at the bottom. We also altered whitespace away from our preferred use of four spaces; see the original function, which also contains full copyright headers and more.
The key point, however, is immediately apparent. The two versions are essentially identical (though the Rcpp version will have more type checks, exception handling, wrapper generation and all the other reasons why we often use Rcpp).
#include <deque>
#include <utility>
#include <Rcpp.h>
using namespace Rcpp;

// types of calculations
enum CalcType {MIN, MAX};

// function arguments for non-data
struct Args {
  int window;
  CalcType ctype;
};

// calculates rolling window for {min, max}
NumericVector
roll_minmax(const NumericVector& x, Args a) {

  int n = x.length();
  NumericVector rollx(n);

  std::deque<std::pair<long double, int>> deck;
  for (int i = 0; i < x.size(); ++i) {
    if(a.ctype == MIN) {
      while (!deck.empty() &&
             deck.back().first >= x[i])
        deck.pop_back();
    } else {
      while (!deck.empty() &&
             deck.back().first <= x[i])
        deck.pop_back();
    }
    deck.push_back(std::make_pair(x[i], i));

    while(deck.front().second <= i - a.window)
      deck.pop_front();

    long double min = deck.front().first;
    if (i < a.window - 1) {
      rollx[i] = NA_REAL;
    } else {
      rollx[i] = min;
    }
  }
  return rollx;
}

// [[Rcpp::export]]
NumericVector maxOver(const SEXP& x,
                      int window) {
  Args a;
  a.window = window;
  a.ctype = MAX;
  return roll_minmax(x, a);
}

// [[Rcpp::export]]
NumericVector minOver(const SEXP& x,
                      int window) {
  Args a;
  a.window = window;
  a.ctype = MIN;
  return roll_minmax(x, a);
}
#include <deque>
#include <utility>
#include <tidyCpp>

extern "C" {

  // forward declaration
  tidy::NumVec rollMinMax(tidy::NumVec x,
                          int window, bool isMin);

  // this SEXP variant is referenced from init.c
  SEXP _rollMinMax(SEXP x, SEXP win, SEXP isMin){
    return rollMinMax(x, R::asInteger(win),
                      R::asLogical(isMin));
  }

  // Calculates rolling window for {min, max}
  tidy::NumVec
  rollMinMax(tidy::NumVec x, int win, bool isMin){

    int n = R::length(x);
    tidy::NumVec rollx(n);

    std::deque<std::pair<long double, int>> deck;
    for (int i = 0; i < n; ++i) {
      if (isMin) {
        while (!deck.empty() &&
               deck.back().first >= x[i])
          deck.pop_back();
      } else {
        while (!deck.empty() &&
               deck.back().first <= x[i])
          deck.pop_back();
      }
      deck.push_back(std::make_pair(x[i], i));

      while(deck.front().second <= i - win)
        deck.pop_front();

      long double min = deck.front().first;
      if (i < win - 1) {
        rollx[i] = NA_REAL;
      } else {
        rollx[i] = min;
      }
    }
    return rollx;
  }
} // extern "C"
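Both versions rely on the same idea: a double-ended queue that is kept monotonic, so that its front always holds the minimum (or maximum) of the current window. The sketch below restates that logic with only the C++ standard library so it can be compiled without R. The name rollMinMaxSketch, the use of std::vector<double>, the quiet NaN standing in for NA_REAL, and the guard for win < 1 are all assumptions made for illustration and are not part of tidyCpp or ichimoku.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical standalone variant of rollMinMax(); a quiet NaN plays the
// role of NA_REAL for positions without a full window.
std::vector<double> rollMinMaxSketch(const std::vector<double>& x,
                                     int win, bool isMin) {
    const double na = std::numeric_limits<double>::quiet_NaN();
    const int n = static_cast<int>(x.size());
    std::vector<double> rollx(x.size(), na);
    if (win < 1) return rollx;              // guard added for this sketch

    std::deque<std::pair<double, int>> deck;   // (value, index) pairs
    for (int i = 0; i < n; ++i) {
        // Drop candidates from the back that x[i] makes irrelevant.
        while (!deck.empty() &&
               (isMin ? deck.back().first >= x[i]
                      : deck.back().first <= x[i]))
            deck.pop_back();
        deck.emplace_back(x[i], i);

        // Drop candidates from the front that have left the window.
        while (deck.front().second <= i - win)
            deck.pop_front();

        if (i >= win - 1)
            rollx[i] = deck.front().first;
    }
    return rollx;
}
```

Each element is pushed and popped at most once, so the whole pass is linear in the length of x regardless of the window size.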
The R::Protect class ensures proper PROTECT wrapping for the lifetime of an object. A very important, yet easy-to-overlook, detail is that assigning to a SEXP variable looks correct but is in fact incorrect. For example, in SEXP ans = R::Protect(R::allocMatrixReal(nx, ny)); the class will correctly construct around the result from R::allocMatrixReal(). But because the result is assigned to a SEXP variable, the R::Protect object is only a temporary: after operator SEXP() is called, its destructor runs essentially immediately.
So the correct use is to write R::Protect ans(R::allocMatrixReal(nx, ny)); which turns this into a named instance of the Protect class. That instance lives to the end of the scope and only calls UNPROTECT via the destructor at the end of the scope.
The example file snippets/protectExamples.cpp illustrates this by using RcppSpdlog to log invocation of the three relevant parts (constructor, destructor and operator SEXP()) for both the correct and incorrect form of the convolution() example. When sourceCpp()-ed into an R session, and skipping the first example in the file, we may see the following result (with, of course, different timestamps).
R> convolveIncorrect(c(1, 2, 3), c(4, 5, 6))
[11:25:56.418260] starting convolveIncorrect
[11:25:56.418286] entered ctor
[11:25:56.418288] entered SEXP()
[11:25:56.418290] entered dtor
[11:25:56.418291] entered ctor
[11:25:56.418293] entered SEXP()
[11:25:56.418295] entered dtor
[11:25:56.418297] entered ctor
[11:25:56.418298] entered SEXP()
[11:25:56.418300] entered dtor
[11:25:56.418302] ending convolveIncorrect
[1]  4 13 28 27 18
R> convolveCorrect(c(1, 2, 3), c(4, 5, 6))
[11:25:56.418431] starting convolveCorrect
[11:25:56.418434] entered ctor
[11:25:56.418435] entered ctor
[11:25:56.418436] entered SEXP()
[11:25:56.418438] entered SEXP()
[11:25:56.418439] entered ctor
[11:25:56.418441] entered SEXP()
[11:25:56.418442] entered SEXP()
[11:25:56.418444] entered SEXP()
[11:25:56.418445] ending convolveCorrect
[11:25:56.418447] entered SEXP()
[11:25:56.418449] entered dtor
[11:25:56.418450] entered dtor
[11:25:56.418452] entered dtor
[1]  4 13 28 27 18
In the first example, we see that the triplet constructor/SEXP()/destructor is called twice when the two vectors are coerced, and then a third time when the result vector is allocated. This shows the incorrect behavior: destructors run essentially immediately after constructors, leaving the object unprotected, which is not what was intended.
The second example shows the correct behavior. Two constructor calls, then two SEXP() calls (when the lengths are determined), another constructor followed by three SEXP() calls from the three numericPointer() calls, as well as a final SEXP() call for the return result and three destructors at the end. This is the intended behavior of protecting the three objects during their lifetime.
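The ordering of constructor, conversion and destructor calls discussed above follows from ordinary C++ lifetime rules and can be reproduced without R. The sketch below uses a hypothetical class Guard that merely records these events in a global vector. It is a stand-in written for this illustration and is not the implementation of R::Protect; the names Guard, events, incorrectForm and correctForm are likewise assumptions.

```cpp
#include <string>
#include <vector>

// Event log shared by all Guard objects in this sketch.
std::vector<std::string> events;

// Hypothetical stand-in for a protecting wrapper: it only records when it
// is constructed, converted and destroyed. It is NOT tidyCpp's R::Protect.
class Guard {
public:
    explicit Guard(int v) : value_(v) { events.push_back("ctor"); }
    ~Guard() { events.push_back("dtor"); }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
    operator int() const { events.push_back("conv"); return value_; }
private:
    int value_;
};

// Incorrect form: the Guard is a temporary and is destroyed at the end of
// the full expression, before the value is used.
void incorrectForm() {
    int raw = Guard(42);
    events.push_back("use");
    (void) raw;
}

// Correct form: the named Guard lives until the end of the function.
void correctForm() {
    Guard g(42);
    int raw = g;
    events.push_back("use");
    (void) raw;
}
```

After incorrectForm() the log reads ctor, conv, dtor, use, whereas after correctForm() it reads ctor, conv, use, dtor. This mirrors the pattern in the two log excerpts: only the named object is still alive while its value is in use.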
tidyCpp provides a cleaner layer on top of the C API for R (as well as a so-far still minimal C++ class layer). That has its advantages: we find it more readable. It conceivably has possible disadvantages. Those familiar with the C API for R may not need this, and may find it an unnecessary new dialect. Time will tell if new adoption and use may outweigh possible hesitation by others. In the meantime, the package "does no harm", has no further dependencies and can be used, or dropped, at will.
The tidyCpp package provides a simplifying layer on top of the time-tested but somewhat crusty C API for R.