Skip to content

Rcpp is not using R_PreserveObject/R_ReleaseObject` as intended #1081

@ltierney

Description

@ltierney

This is based on an issue found by @bhive01 and discussed with him
and @clauswilke in wilkelab/gridtext#9.

Running the code included at the end in R versions up through 4.0.0
with the C stack set to 1M will segfault with a C stack overflow. The
same happens (for me on Ubuntu at least) with the default 8M stack if
you change the value of N to 1000.

Running this code as is (i.e. N == 500) and with an 8M stack will not
fail, but takes 14 minutes due to several multi-minute-long pauses in
the second loop.

The problem is that this code, via calls to R_PreserveObject in
Rcpp, puts around 300K objects into the preserved object list. This is
a simple linked list, so has to be searched linearly to remove objects
pushed on early. That is the reason for the long pauses. The segfault
happens because the code for R_ReleaseObject was using recursion to
work down the list. R-devel and R-patched now use iteration and no
longer segfault, but that doesn't help anyone running R 4.0.0 or
earlier.

The R Extensions Manual says about the
R_PreserveObject/R_ReleaseObject mechanism:

It is less efficient than the normal protection mechanism, and
should be used sparingly.

The linked list data structure, and even the recursive delete, are
reasonable for this usage but are not suited for handling 300K objects
that may need to be removed in random or first in/first-out order.

Assuming you need to keep alive R objects, a better way to do this is
to create an appropriate data structure that you control, and protect
only that data structure with R_PreserveObject (and maybe release it
on unload with R_ReleseObject). That way you can use a more
appropriate data structure, such as a hash table. Using a hash table
the code that currently takes 14 minutes would only take about 1.5
minutes. The improvement is even larger for N == 1000.

Even if you don't want to go to a hash table now it would be a good
idea to switch to using your own list so users of R 4.0.0 or earlier
don't run the risk of a segfault.

For using your own list you need something like this

static SEXP Rcpp_precious = NULL;
	
void R_init_Rcpp(DllInfo* dllinfo) {
    Rcpp_precious = CONS(R_NilValue, R_NilValue);
	R_PreserveObject(Rcpp_precious);
   ...
}

Then replace R_PreserveObject and R_ReleaseObject with

void Rcpp_PreserveObject(SEXP object)
{
    SETCDR(Rcpp_precious, CONS(object, CDR(Rcpp_precious)));
}

void Rcpp_ReleaseObject(SEXP object)
{
    SETCDR(Rcpp_precious, DeleteFromList(object, CDR(Rcpp_precious)));
}

where (this is now in R_devel/R_patched)

static SEXP DeleteFromList(SEXP object, SEXP list)
{
    if (CAR(list) == object)
		return CDR(list);
    else {
        SEXP last = list;
	    for (SEXP head = CDR(list); head != R_NilValue; head = CDR(head)) {
	        if (CAR(head) == object) {
                SETCDR(last, CDR(head));
                return list;
	        }
	        else last = head;
	    }
	    return list;
    }
}

This maintains your own list, which is protected by being in the CDR
field of the protected Rcpp_precious list cell. This will give you
the same reliability and performance across all R versions. [Code is
not tested, so there may be typos, but that is design I recommend].

If you want better performance for a use pattern as exhibited in the
example here you could look at using a hash table stored in a VECSXP
placed in one of the cells of Rcpp_precious. In a quick and dirty
experiment a simple hash table did much better, but the code is of
course more complicated.

All that said, I have a feeling that it should be possible to do
better by using a combination of the protected fields in external
pointer objects and weak references. If you have a document describing
the design of your memory management I could take a look and see if I
can suggest something.

Here is the code to run, from @clauswilke in wilkelab/gridtext#9.

library(ggplot2)
library(ggtext)
library(grid)

plot_grob <- function(df) {
  p <- ggplot(data.frame(x = 1, y = 1)) +
    geom_point(aes(x, y)) +
    labs(caption = "<br>Pink dots represent outliers and are removed from downstream analyses.<br>Error bars represent the standard error of the mean.<br>ANOVA model significance *p* < 0.05.<br>Treatments with the same letter are not significantly different at *α* = 0.05 according to Tukey's HSD. ") + 
    theme(plot.caption =  element_textbox_simple())
  ggplotGrob(p)
}

N <- 500

l <- list()
for (i in 1:N) {
  cat(i, " ")
  g <- plot_grob()
  grid.newpage()
  grid.draw(g)
  l[[i]] <- g
}

l <- NULL
l <- list()
for (i in 1:N) {
  cat(i, " ")
  g <- plot_grob()
  grid.newpage()
  grid.draw(g)
  l[[i]] <- g
}

Activity

eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

Thanks for the detailed post, and the worked suggestion using Rcpp_precious (love the name ;-) ) and a simple list. For disruptive changes I appreciate the minimalism. If this works (as expected) we could look into a hashed structure later.

Now, is there by chance a truly minimally reproducible example that would not involve the ggplot2 stack, tibbles, and a number of other packages (including one not on CRAN) ?

ltierney

ltierney commented on May 17, 2020

@ltierney
Author
eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

Use of R_PreserveObject (and the matching release) goes back 15 years to the very first C++ class wrappers around SEXP objects. By using C++ facilities like the constructor and destructor we can generally ensure proper resource management (cf the RTTI idiom in C++ parlance).

And we rather happily relied on this base R feature all those years.

But presumably all use cases up to now were only 'lower-case s' stress testing where @clauswilke and @bhive01 seem to have found a way to put a capital S there. Or, quite possibly, other use cases that may have seen failures were less determined to drill down. You all now did, so thanks for that.

Resource management in the context of a dynamically managed system can have its challenges. Relying on this mechanism worked really well so far. Let's see if you can turn it up one notch. Looking at the code now, requires some tweaks.

ltierney

ltierney commented on May 17, 2020

@ltierney
Author
kevinushey

kevinushey commented on May 17, 2020

@kevinushey
Contributor

This would probably function as a reproducible example (depending on stack size + n). Save this to a file and then call Rcpp::sourceCpp("file.cpp"):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void barf(std::size_t n) {

  std::vector<RObject> objects(n);
  for (int i = 0; i < n; i++)
    objects[n - i - 1] = Rf_ScalarInteger(i);

  // destructor for 'objects' runs: uh oh?
}

/*** R
barf(1E6L)
*/

(Note: reproducibility may depend on whether your C++ standard library implementation destructs elements from first-to-last, or last-to-first. AFAIK this isn't explicitly mandated by the standard)

The problem occurs because of order of destruction for vector elements doesn't match the order in which they were created, leading to this deep recursion into the precious list.

Ultimately, the goal of the Rcpp constructors and destructors is to protect the associated object from the GC for the lifetime of that object, and unprotect it once collection is safe.

I think using a hash table is worth considering just because it would insulate us from other issues (e.g. performance) related to order-of-destruction issues.

eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

Same for the 20-or-so-year old recursive code in R_ReleaseObject.

Correct, and we were on the same calling path.

eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

While the suggested design can be added to the package fairly easily, it does not account for use from sourceCpp() et al so this will need a refinement. Not as quick a change as I had hoped.

clauswilke

clauswilke commented on May 17, 2020

@clauswilke

I have been trying to create a simpler reprex, but I'm not sure I've been successful. The code below attempts to simulate what I think is causing the issues in the ggplot2 example, and it crashes my R session if I run it twice. But the error is segfault: 'memory not mapped', which seems different from stack overflow. Posting it here in case it's helpful.

#include <Rcpp.h>
#include <vector>
using namespace Rcpp;

class A {
  std::vector<Rcpp::String> s;
public:
  A() {
    for (int i = 0; i <50000; i++) {
      s.push_back("A");
    }
  }
};

// [[Rcpp::export]]
XPtr<A> test() {
  XPtr<A> a(new A());
  return a;
}

/*** R
N <- 1000

l <- list()
for (i in 1:N) {
  cat(i, " ")
  l[[i]] <- test()
}

l <- list()
for (i in 1:N) {
  cat(i, " ")
  l[[i]] <- test()
}
*/
eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

(But that is slightly different. XPtr means "user knows what he is doing, we leave it alone". When you as a user pick one and allocate with via new, you are also either supposed to delete (or ensure it is done for you). Plus we're now entering a whole new element to the story by adding an STL vector too.)

clauswilke

clauswilke commented on May 17, 2020

@clauswilke

Ah, Ok. I'm the user here, but I may not know what I'm doing. :-) The gridtext package definitely wraps things into XPtrs. The STL vector may not be needed for the example. I'll try to refine the reprex.

clauswilke

clauswilke commented on May 17, 2020

@clauswilke

@eddelbuettel Could you provide a minimal correct example using XPtr? How does the C++ code know when the R object holding the pointer goes out of scope? I always assumed delete is called automatically when the last reference to the pointer disappears.

eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

Sorry, I need to rephrase that. The XPtr generally has a default finaliser too so I stated that poorly, my bad. I use them a lot myself when I also rely on a default cleanup. It's just that maybe in this example ensuring a cleanup at end of scope may help. To be seen.

clauswilke

clauswilke commented on May 17, 2020

@clauswilke

In my reprex, you could just replace std::vector<Rcpp::String> s; with Rcpp::StringVector s; to avoid the STL. It just gets very slow, and I'm not sure it triggers the stack overflow either.

eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

Simplifying is good. And apologies again for the comment two up---I also rely on the default finalizer (see inst/include/Rcpp/XPtr.h but that is of course gnarly code -- key is standard_delete_finalizer() ). In your example, though. you write to the same long list twice overwriting elements. Methinks those are then lost and never cleaned so that looks like a code smell to me.

Other (standard) XPtr use (I do a ton myself) is to ... do some work, set up some data structures or pointers to persist ... do some other stuff ... have a code block relying on the XPtr allocated objects ... some other things ... and then end-of-scope (end of subroutine say) to clean things up. That works of course as intended.

"It just gets very slow": Yes, we always say that R objects we wrap should not use push_back() because by the semantics of R vectors each of those is a full copy. So to that extend you were right in that code piece using a STL vector. Anyway, maybe we can go back the example by @kevinushey .

eddelbuettel

eddelbuettel commented on May 17, 2020

@eddelbuettel
Member

Ok, needed to take a break, clear my head, work on something else ... to then come back and now have the (basic idea and solution of) Luke's proposal implemented (in a slightly different way because of the way Rcpp is called etc pp). Still passes unit tests, but (once I installed package ggtext) it does not appear to run the example above any better: it still slows to halt in loop and then spends minutes sitting there. (R 4.0.0, Ubuntu 19.10, everything current from CRAN).

45 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @bhive01@eddelbuettel@kevinushey@s-u@ltierney

      Issue actions

        Rcpp is not using `R_PreserveObject`/R_ReleaseObject` as intended · Issue #1081 · RcppCore/Rcpp