[RFC] Data Structure Fork Interface

**phantomotap** · 02-29-2012

This is strictly a request for opinions on different interfaces.

I'm not sure how much information is really relevant so I'll try to be brief while still providing enough context.

I'm finally well enough to rewrite a library I've built for years. A very old version of the library built "commit/rollback" semantics into the algorithms. (So for example, `Sort', `StableSort', `AtomicSort', and `AtomicStableSort' were the names of some interfaces.) Unfortunately, this meant maintaining a lot of nearly duplicate implementations for a lot of different algorithms. (I had twelve different versions of the standard "Quicksort" sorting algorithm.) A more recent version moved the "commit/rollback" semantics into the data structures. Even though I could prove that this wasn't costly if no forks were made, it was still an issue because some algorithms had to initialize a fork even if it would never have been used (because success was otherwise guaranteed). For the rewrite, I plan to further generalize the notion of the relevant bits (like iterator traits and container types) into more categories so that support for "commit/rollback" semantics can be coded into the implementation for the data structures, iterators, and algorithms without having any real cost when the data structure doesn't already have support for data forks. (Naturally, you could always get "commit/rollback" manually; I'm talking about an "automagic" situation.)

Before I can start developing such fancy data structures and code the relevant wrappers for use by the algorithms I need to decide on an interface for the fork operations.

Okay, so, all that crap out of the way, here are the options I'm comfortable with:

Code:

// Always Fork:
SparseTable lData;
// ...
SparseTable lOverlappedView(lData);
// ...
ChangeSomeData(lData); // `lOverlappedView' is unchanged.
// ...
UpdateView(lOverlappedView); // `lData' is unchanged.
// ...

I think this method makes the most sense, but people don't usually get how `lData' and `lOverlappedView' are related. (The old version of the library used this for an early file mapped "B+Tree".) I see why. It looks like a copy. One could certainly argue that it should be a copy. My argument has always been that it is less costly to make a fork than an entire copy so why not share the resources.

Code:

// Explicit Fork
SparseTable lData;
// ...
SparseTable lOverlappedView(lData.createAnonymousFork());
// ...
ChangeSomeData(lData); // `lOverlappedView' is unchanged.
// ...
UpdateView(lOverlappedView); // `lData' is unchanged.
// ...

*shrug*

This also makes perfect sense. (I've just gotten too close to this issue.) With a properly named method, it would be hard to confuse what is really happening. However, programmers, especially those coming from other languages, have a tendency to assume that the method creates a reference to the original object (and not the members of that object). That little assumption convinces people that `lData' needs to live at least as long as any forks derived from it. I know it isn't my job to babysit. That's not the point. A free function might even muddy the waters for C++ programmers. (That is, `CreateAnonymousFork(lData)' clearly looks like a "C-ism" creating a reference of some kind to `lData'.) I'd like this to be as obvious as possible.

Code:

// Immutable Parents
SparseTable lData;
// ...
SparseTable lDataClone;
SparseTable lOverlappedView;
// ...
lData.CreateAnonymousFork(lDataClone, lOverlappedView);
// ...
// ChangeSomeData(lData); // Fails because `lData' has become an immutable object.
// ...
ChangeSomeData(lDataClone); // `lOverlappedView' is unchanged.
// ...
UpdateView(lOverlappedView); // `lDataClone' is unchanged.
// ...

I don't really like this version, but I can see the intent. I know some libraries and even a couple of "DSEL" that do it exactly like this. Heck, depending on the implementation the data structure that is what would happen internally regardless of what the interface looks like.

Soma

**CornedBee** · 03-01-2012

So these forks look like copy-on-write to me. If that is so, implicit forking is the way to go. A copy is, as you say, a copy, and I don't really care how it is implemented behind the scenes.

So the real question is, do "forked" objects behave differently from real copies in any semantically observable way? If they do, the copy constructor is the wrong syntax. Otherwise, it's the right syntax.

The third way looks weird and unclear to me.

**phantomotap** · 03-01-2012

So these forks look like copy-on-write to me.

Well, that's certainly an option for other data structure, but what I'm working on at the moment works by shadowing the shared part of the tree.

So the real question is, do "forked" objects behave differently from real copies in any semantically observable way?

Nope. Well, not without violating the private interface.

Soma

**phantomotap** · 03-06-2012

Well, I've made a design on this and got going again.

Thanks for the comments CornedBee.

Cheers,
Soma

View Poll Results: Which interface is looks best?

Thread: [RFC] Data Structure Fork Interface

Thread Tools

Search Thread

Display

[RFC] Data Structure Fork Interface

Similar Threads

difference between data object and data structure?

Reading data from SPI Interface

Data structure for storing serial port data in firmware

data structure design for data aggregation

structure of a tcp driven qt interface