Skip to content

Conversation

alanpaxton
Copy link
Contributor

@alanpaxton alanpaxton commented Sep 26, 2022

Overview

Experimental re-imagining of how the Java API maintains consistency with internal RocksDB objects.

The current RocksObject / AbstractImmutableNativeReference / AbstractNativeReference implementation expects a single Java object to be the single owning handle (isOwningHandle()) for an underlying C++ object, for instance a database handle or a column family handle.

Multiple Java objects sharing the same underlying object can exist, and then the owning protocol must be observed in order that neither double-deallocation nor a resource leak occurs.

We want to do away with the problem by introducing a form of reference counting so that the last owner of an object always close()s it for real, and no other owner does.

Our idea is that we can do this by changing the handle from being a C++ T* to being a reference to a C++ shared_ptr *, where each Java object has its own companion shared_ptr. On closing a Java object, the shared pointer is reset() and deallocated.

There is a pull request with the changes WIP - #10736

Implementation Detail (Java)

We have introduced a new class, RocksNative which wraps the native reference to a RocksDB C++ object
in a modified way to the old RocksObject.

  • Naming of methods is systematically different; we can adopt an incremental transition of some classes but not all, for instance we have not yet converted options classes to derive from RocksNative.
  • All accesses are performed behind an AtomicBoolean isOpen. There is no concept of having disowned the native reference at the Java side; ownership and lifetime is dealt with entirely by smart pointers on the C++ side.
  • The class hierarchy is simplified, we don't need to distinguish AbstractImmutableNativeReference because all references are necessarily to an immutable shadow object.

Implementation Detail (C++)

In rocksdb repo branch we have added a number of API files to the java/rocksjni directory, named api_<item>.h thus

  • api_base.h - common support functions
  • api_rocksdb.h - the shadow database handle
  • api_columnfamilyhandle.h - the shadow column family handle
  • api_iterator.h - the shadow iterator handle
  • api_wrapper.h - a standard wrapper for less commonly used objects which have shadows

These shadow objects all implement variations of the same pattern:

api_rocksdb.h - the shadow database handle

The shadow database handle contains shared pointers to the RocksDB itself (TDatabase is usually ROCKSDB_NAMESPACE::DB)
and to the column family handles open within the database.

std::shared_ptr<TDatabase> db;
std::vector<std::shared_ptr<ROCKSDB_NAMESPACE::ColumnFamilyHandle>>
      columnFamilyHandles;
  ...

The shadow database handle is created on a database open() JNI call, and its reference is cast to long and returned as the handle
in the parallel Java object (class RocksDB in RocksDB.java). Previously, a pointer to the ROCKSDB_NAMESPACE::DB was returned.

When the RocksDB class, is accessed in native code, the handle is cast to the shadow object, so we define the
shorthand in C++ using API_DB = APIRocksDB<ROCKSDB_NAMESPACE::DB>; and then

auto* dbAPI = reinterpret_cast<API_DB*>(jdb_handle);

this replaces the previous common idiom

auto* db = reinterpret_cast<ROCKSDB_NAMESPACE::DB*>(jdb_handle);

The shadow database handle also defines the standard pointer methods on itself,
so that it can be used as if it were a pointer

TDatabase* operator->() const { return db.get(); }
std::shared_ptr<TDatabase>& operator*() { return db; }
TDatabase* get() const { return db.get(); }

and this leads to the idiom (*dbAPI)->Method(...) being used as a replacement for db->Method(...)
throughout the changed codebase.

The shadow object is deleted in the Java_org_rocksdb_RocksDB_nativeClose method.
When this happens, reference counts on the std::shared_ptr members are reduced,
and by the magic of shared pointers the ROCKSDB_NAMESPACE::DB and ROCKSDB_NAMESPACE::ColumnFamilyHandle
objects are deleted (with the C++ delete() method) on their final reference.

Crucially, however, these references will persist when the shared pointers have been copied into other
shadow objects. This gives us the guarantees that we will be able to access certain structures (e.g. iterators)
where their lifetime persists beyond that of the database, while still cleaning them up when all references
to them are closed.

This means that we can use idioms like closing objects out of nested order, or passing objects out of a loop, while still being secure that our calls will work, or at the very worst fail cleanly with an obvious java exception.
What they won't ever do is crash in C++ when invoked from Java (practically speaking, not never but much less frequently and in much more corner cases).

api_iterator.h - the shadow iterator handle

Iterators are used with shadow objects in a similar way to databases. Because iterators are created with a
database in hand, we can copy the database (and if there is one, column family handle) from the RocksDB shadow
object into the iterator shadow object. The shadow object for an iterator then consists of the following:

std::shared_ptr<TDatabase> db;
std::shared_ptr<ROCKSDB_NAMESPACE::ColumnFamilyHandle> cfh;
std::unique_ptr<TIterator> iterator;

the std::unique_ptr<TIterator> iterator is all that is necessary because we never have multiple reference to a single iterator. The std::shared_ptrs maintain the lifetime of the database and column family handle for the
lifetime of the shadow object, so that the iterator is always valid until it is closed. The only requirement on the
user is to close() the iterator when they are done with it, most often just by using Java's try-with-resources
mechanism, but reserving the option to close() it manually if its lifetime is more sophisticated.

api_columnfamilyhandle.h - the shadow column family handle

Column family handles also have shadow objects. We choose to make their lifetime different from that of iterators.
When the corresponding database (and all other objects which implicitly use it) are closed, the column family handle becomes invalid. We take this view because usually a column family is accessed with a database in hand anyway,
so it is wrong to do so if that database is closed.

When the column family handle is used closed, we return a clean Java exception.

This behaviour is achieved using the C++ std::weak_ptr mechanism. Each weak pointer is associated with a
std::shared_ptr and can be lock()-ed to yield a std::shared_ptr as long as there are other outstanding copies of the shared pointer (its reference count has not fallen to 0). A successful lock() yields a std::shared_ptr<ROCKSDB_NAMESPACE::ColumnFamilyHandle> which is valid at least until it goes out of scope, and then in C++ we dereference that to make the same calls to the database as before. An unsuccessful lock() of the database weak pointer tells us that all database and iterator handles are already closed.

std::weak_ptr<TDatabase> db;
std::weak_ptr<ROCKSDB_NAMESPACE::ColumnFamilyHandle> cfh;

So when accessing column families in client code, we must access the column family while we have an open database.
If the database has been closed, we will fail with a RocksDBException with the message "column family already closed".

Performance

The new code does what might be considered as some extra work in 2 ways

  1. checking the AtomicBoolean isOpen before every access of the native handle
  2. Dereference of RocksDB objects via a std::shared_ptr or a std::weak_ptr rather than a raw pointer.

For smart pointers, we devised a worst case performance test based on existing JMH tests for JNI performance of alternative implementations of get() methods.
The RocksDB JNI Performance branch has been
adapted to provide an extra test which simulates the access pattern using a std::shared_ptr to access a
simulated database object, and a std::weak_ptr to access a simulated column family object.

The performance of 4 separate implementations of an implementation of get() which uses the JNI SetRegion methods
to return data in a java byte[] are compared. A distinct 5th implementation using SetRegion to get into an indirect
ByteBuffer is compared for reference.

The 4 implementations are:

  1. Use a compiled static reference to the model database in get() calls
  2. Pass a reference to the model database to get() calls
  3. Pass a shadow object containing a std::shared_ptr holding a reference to the model database.
  4. Pass a shadow object containing std::shared_ptr and std::weak_ptr references; fetch both,
    check that they are equal (of course this is true), and then use the reference.
  • As the buffer size grows, get() performance becomes indistinguishable between all the SetRegion into byte[] implementations.

  • SetRegion into an indirect ByteBuffer is significantly slower.

  • SetRegion into an indirect ByteBuffer is significantly slower.

  • For small buffer sizes there is a statistically significant but still extremely small difference between (1) and
    (2,3,4). This may be attributable to optimizations possible when the address of the mock database is a known, static value or it may be due to passing 1 less parameter to the get() method.

Otherwise, dereferencing a shadow object via a smart pointer, even where locking is involved, seems to have negligible performance impact. Bear in mind that the performance being tested is entirely stripped down to the
nuts and bolts of buffer transfer, so it would exaggerate any performance difference compared to what we would
see in RocksDB, which has to go through a longer code path before it can access buffers.

Status

A large number of RocksObject classes have been converted to RocksNative, and the corresponding C++ code updated. All Java unit tests are passing in this branch. As well as the fundamental classes (RockDB, ColumnFamilyHandle, Iterator) this has necessitated the conversion of a large number of derivative or related classes.

The classes that remain to be converted include the various Options classes.

TODO

  • Identify objects that we have not yet converted
  • Code Review
  • Broader testing - major public Java API clients

opening a db.
Now we need to close it, use CFs from it successfully (before DB close) and throwing an expcetion (after DB close).
Then similarly with iterators.
Then check performance.
Add some C++ wrapper to turn the C++ handle into a C++ RCA (Reference Counted API) object.
Pull on the cascading changes necessary.
Nearly in a position to run RefCountTest in the new world, at least for opening up a DB.
And remove all the shadow (org.rocksdb.api package) code as that path isn't the one we need to take.
Just a few.. lots still to do.. working out how it works for the moment.
It seems better to just incrementally change in place the existing classes.
RocksNative is now in the org.rocksdb package, org.rocksdb.api package seems unnecessary.
Further steps on the way to the new reference counted API.
Introduce some operator->() jiggery pokery
All the iterator.cc methods
Some more DB methods
Plenty more cases not fixed yet.
There is a single default_cf_handle_ which gets deleted as part of the deletion of the DB itself. So we can't put it in a shared_ptr or it will get deleted with the final shared_ptr instance, which is too soon, rather than explicitly deleted only when the DB is finally deleted.
So move it into that class which we use to confirm the issues with ref counts, or which we want to fix with ref counts.
nondefault is an unowned reference
rocksdb_get_helper converted to new model, along with the methods that call it. That's just the indirect methods, we will do rocksdb_get_helper_direct next.

As part of this, add/revise some helper methods for API objects (getters, lockers) that make code look more consistent where we access stuff via the new model.
We can still improve things by adding a WeakDB to the DB, and using it to verify that the C++ rocks database itself has truly been closed.
JNI name mangling nonsense causing a problem. Can't figure out why the weakdb names are mangled. Bah.
Now linking the C++ method with the correct name.
Added closed DB checks.
Removed a ReadOptions test which wasn't relevant to ref count testing.
@alanpaxton alanpaxton marked this pull request as draft September 26, 2022 16:05
fix put(), enableAutoCompaction()
Convert parameters to use the new-style handles.
Provide a helper function to copy a vector of CF handles.
@alanpaxton alanpaxton force-pushed the eb-843-ref-counted-objects branch from cd88951 to b1b0b1a Compare September 29, 2022 18:31
More template params for iterators so that they cover the WBWI iterators.

WBWI iterator already delete()s the BaseDeltaIterator internally (C++ Rocks level), so we can’t run it as a shared_ptr. Best we can manage is a unique_ptr but we leave a hole where the base iterator Java object may crash if used after the derived is closed.

Some polish/tweaks around references and constructors.
@alanpaxton alanpaxton force-pushed the eb-843-ref-counted-objects branch from 5775970 to 5dd112f Compare October 3, 2022 16:44
WriteBatchWithIndex passing all existing tests in new shared reference model.
CF->lockDBCheck() is too complicated, would need more context when dealing with CF used against base DB extracted from OptimisticTransactionDB (which should be valid). There’s some checking internally anyway, so I think what we’re doing may count as unnecessary.

Some small bits of localized code improvement/polish, and comments, where I came across them.
@alanpaxton alanpaxton force-pushed the eb-843-ref-counted-objects branch from e1acf5a to cf0eafa Compare October 5, 2022 13:43
@alanpaxton alanpaxton force-pushed the eb-843-ref-counted-objects branch from 0bcf435 to 53dee7f Compare October 6, 2022 11:35
remove booby traps
Convert to ref-counted API
Add test
@alanpaxton alanpaxton force-pushed the eb-843-ref-counted-objects branch from 0c9fddc to ae52ca9 Compare October 10, 2022 11:10
@alanpaxton alanpaxton changed the title Eb 843 ref counted objects - very much a WIP Evolved Binary (EB-843) reference counted objects - Java interface proposal Oct 13, 2022
@alanpaxton alanpaxton force-pushed the eb-843-ref-counted-objects branch from f6e9e9c to ae52ca9 Compare October 13, 2022 16:28
alanpaxton added a commit to alanpaxton/rocksdb that referenced this pull request Feb 2, 2023
ComparatorOptions is AutoCloseable.

AbstractComparator does not hold a reference to its ComparatorOptions, but the native C++ ComparatorJniCallback holds a reference to the ComparatorOptions’ native C++ options structure. This gets deleted when the ComparatorOptions is closed, either explicitly, or as part of try-with-resources.

Later, the deleted C++ options structure gets used by the callback and the comparator options are effectively random.

The original bug report facebook#8715 was caused by a GC-initiated finalization closing the still-in-use ComparatorOptions. As of 7.0, finalization of RocksDB objects no longer closes them, which worked round the reported bug, but still left ComparatorOptions with a potentially broken lifetime.

The solution implemented here is to copy the native C++ options object into the ComparatorJniCallback. Then the deletion of the native object held by ComparatorOptions is *correctly* deleted when its scope is closed in try/finally.

There are probably other instances of the same problem. They could all be addressed using the model proposed/prototyped in facebook#10736

In the meantime, we will look for and fix other instances using the same copy technique.
facebook-github-bot pushed a commit that referenced this pull request Feb 17, 2023
Summary:
The problem
-------------
ComparatorOptions is AutoCloseable.

AbstractComparator does not hold a reference to its ComparatorOptions, but the native C++ ComparatorJniCallback holds a reference to the ComparatorOptions’ native C++ options structure. This gets deleted when the ComparatorOptions is closed, either explicitly, or as part of try-with-resources.

Later, the deleted C++ options structure gets used by the callback and the comparator options are effectively random.

The original bug report #8715 was caused by a GC-initiated finalization closing the still-in-use ComparatorOptions. As of 7.0, finalization of RocksDB objects no longer closes them, which worked round the reported bug, but still left ComparatorOptions with a potentially broken lifetime.

In any case, we encourage API clients to use the try-with-resources model, and so we need it to work. And if they don't use it, they leak resources.

The solution
-------------
The solution implemented here is to make a copy of the native C++ options object into the ComparatorJniCallback, rather than a reference. Then the deletion of the native object held by ComparatorOptions is *correctly* deleted when its scope is closed in try/finally.

Testing
-------
We added a regression unit test based on the original test for the reported ticket.

This checkin closes #8715

We expect that there are more instances of "lifecycle" bugs in the Java API. They are a major source of support time/cost, and we note that they could be addressed as a whole using the model proposed/prototyped in #10736

Pull Request resolved: #11176

Reviewed By: cbi42

Differential Revision: D43160885

Pulled By: pdillinger

fbshipit-source-id: 60b54215a02ad9abb17363319650328c00a9ad62
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants