New archive support in Pothos

New serialization support has been merged into the Pothos framework. This replaces the existing pothos-serialization project.

Some background

When I first started this project I needed a serialization library for C++, and specifically one that didn't need additional tools or source generation. Compare how something like boost::serialization works (plain C++ headers) with protocol buffers (a separate code generator). I really liked the interface of boost::serialization, but I didn't want to make boost a forced dependency for Pothos since it's very large and sometimes difficult to get installed.

To solve this, I created the pothos-serialization project. I wrote a Python script to traverse the boost source tree, pick out only the headers needed for boost::serialization, and rename everything (namespaces, macros, and files) from boost to pothos so as not to conflict with an existing installation of boost.

Here are some stats that show how deep the boost tree really goes:

  • Only 44 C++ sources were copied...
  • but 1869 header files were dragged in!

The problem

If you thought that extracting a piece of boost rather than just using it as is was a little "cringy", then I agree. I was really prioritizing having no boost dependency and an all-C++11 implementation. Also, from a packaging perspective (e.g. Debian), you don't want an additional copy of an existing library that isn't being maintained. Consider security updates, bug fixes, and compiler patches. So let's get Pothos ready for Debian inclusion!

The replacement

Since making pothos-serialization, I have learned a ton about C++11, templates, template meta-programming, and techniques like SFINAE. And boost::serialization really isn't even that large a project; the thousands of headers are mostly there to implement missing C++ features and to support obscure/old compilers that not even Pothos itself supports. With that in mind, I felt confident that I could create an in-library solution that implements an API similar to boost::serialization using C++11 and only a handful of C++ sources and headers.

The result

Say hello to the new Pothos::Archive API!

A quick demo of what the API looks like:

//The object that we want to serialize
MyDataType x;

//serialize
std::stringstream so;
Pothos::Archive::OStreamArchiver ao(so);
ao << x;

//deserialize
std::stringstream si(so.str());
MyDataType y;
Pothos::Archive::IStreamArchiver ai(si);
ai >> y;

Serialize functions

How do the StreamArchiver classes know how to work with a given type?

  • Most primitive types are implemented in the Archive headers,
  • or custom types have a member function called serialize,
  • or custom types have a free function called serialize,
  • or custom types have two member functions called save/load,
  • or custom types have two free functions called save/load.

Clearly there are a lot of options for custom types. That flexibility lets the programmer pick the best fitting option for a given type. Sometimes it's possible to write a single serialize function, and sometimes separate save and load functions are needed because the two directions are implemented differently. Same idea with member functions: some types can be given a new member function, but others may come from an external library that cannot be changed. In that case we need to write free functions to complement the type.

A member serialize for example:

struct MyDataType
{
    template <typename Archive>
    void serialize(Archive &a, const unsigned int)
    {
        a & foo;
        a & bar;
    }

    int foo;
    std::string bar;
};
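For comparison, here is a rough sketch of the free save/load option for a type that cannot be modified, using std::complex<double> as the external type. ToyOArchive, ToyIArchive, and roundTrip are hypothetical stand-ins written just for this example (plain text, doubles only); they are not the Pothos archivers, and the free-function signatures are only assumed to mirror the member serialize above.

```cpp
#include <complex>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical toy archivers (not Pothos classes): whitespace-separated text.
struct ToyOArchive
{
    explicit ToyOArchive(std::ostream &stream) : os(stream) {}
    ToyOArchive &operator<<(const double v) { os << v << ' '; return *this; }
    std::ostream &os;
};

struct ToyIArchive
{
    explicit ToyIArchive(std::istream &stream) : is(stream) {}
    ToyIArchive &operator>>(double &v) { is >> v; return *this; }
    std::istream &is;
};

// Free save: std::complex is read through its accessors.
template <typename Archive>
void save(Archive &ar, const std::complex<double> &t, const unsigned int)
{
    ar << t.real() << t.imag();
}

// Free load: the value is rebuilt through the constructor,
// which is why one shared serialize function would not fit here.
template <typename Archive>
void load(Archive &ar, std::complex<double> &t, const unsigned int)
{
    double re = 0.0, im = 0.0;
    ar >> re >> im;
    t = std::complex<double>(re, im);
}

// Round trip through a string stream using the toy archivers.
std::complex<double> roundTrip(const std::complex<double> &in)
{
    std::stringstream ss;
    ToyOArchive ao(ss);
    save(ao, in, 0);

    ToyIArchive ai(ss);
    std::complex<double> out;
    load(ai, out, 0);
    return out;
}
```

The split is useful here because writing goes through real()/imag() while reading goes through the constructor, so the two directions do not share a body.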

Invoking templates

So we have established a variety of ways to define serialization methods, but we actually need to invoke said methods from the archiver classes.

Basically, we are going to have to call serialization functions without having the definitions for those functions present. Maybe it's not obvious that this is a problem, but we are trying to compile a templated function that calls serialize before the specific serialize implementation has been declared in the code. That's a problem because the compiler (at least GCC) will spew errors.

The solution is to force two-phase template resolution (two-phase name lookup, in standard terms). This technique adds a layer of type indirection, here a small wrapper type around the version number, so that the compiler postpones resolving the call to serialize until the template is instantiated. By that point the rest of the file has been seen, and the order of declaration no longer matters. Sounds a bit mischievous, but it's how boost does it as well, and that has been reliable...
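To make the idea concrete, here is a minimal sketch of deferred lookup. All names (app::Recorder, demo::VersionType, demo::invoke, lookupDemo) are made up for this example and are not part of Pothos. The generic caller is defined before any serialize overload exists; the call still resolves because its arguments depend on template parameters, and because the wrapper type pulls namespace demo into argument-dependent lookup at instantiation time.

```cpp
#include <string>

namespace app {
// The archive lives in an unrelated namespace on purpose.
struct Recorder { std::string log; };
}

namespace demo {

// Hypothetical wrapper: a class type declared next to the serialize overloads.
struct VersionType
{
    explicit VersionType(const unsigned int v) : ver(v) {}
    operator unsigned int(void) const { return ver; }
    unsigned int ver;
};

// The generic caller comes first; no serialize() has been declared yet.
template <typename Archive, typename T>
void invoke(Archive &ar, T &value, const unsigned int ver)
{
    const VersionType vt(ver);
    // Dependent, unqualified call: resolved when invoke is instantiated.
    // Passing vt adds namespace demo to the argument-dependent lookup.
    serialize(ar, value, vt);
}

// Declared after invoke(), yet found when invoke<Recorder, int> is instantiated.
inline void serialize(app::Recorder &ar, int &value, const unsigned int ver)
{
    ar.log += "int:" + std::to_string(value) + ":v" + std::to_string(ver);
}

} // namespace demo

// Exercises the deferred call and returns what the overload recorded.
std::string lookupDemo()
{
    app::Recorder rec;
    int number = 7;
    demo::invoke(rec, number, 3);
    return rec.log;
}
```

Note that neither app::Recorder nor int would lead the compiler to namespace demo on their own; in this sketch it is the wrapper argument that makes the later overload reachable.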

Invoking templates (cont...)

The other oddity about invoking serialize is that, given a type, it's not clear whether that type has a member function or a free function implementation. We need to use template meta-programming to check whether the type has a member serialize.

I found several different styles for detecting a member function. The style I settled on worked on VC12 and uses SFINAE: the type is matched against a test function template, and whether it fits that template definition yields true or false at compile time.
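One common way to write such a detector is sketched below. This is not necessarily the style used in Pothos (it relies on expression SFINAE, which older Visual C++ releases may not handle); only the name hasSerialize and its argument order are taken from the code in this post, while DummyArchive, WithMember, and WithoutMember are hypothetical test types.

```cpp
#include <type_traits>
#include <utility>

template <typename T, typename Archive>
struct hasSerialize
{
    // Preferred overload: only viable if t.serialize(ar, version) compiles.
    template <typename U>
    static auto test(int) -> decltype(
        std::declval<U &>().serialize(std::declval<Archive &>(), 0u),
        std::true_type());

    // Fallback overload: always viable, picked when the one above drops out.
    template <typename U>
    static std::false_type test(...);

    static const bool value = decltype(test<T>(0))::value;
};

// Hypothetical types used only to exercise the detector.
struct DummyArchive {};

struct WithMember
{
    template <typename Archive>
    void serialize(Archive &, const unsigned int) {}
};

struct WithoutMember {};

static_assert(hasSerialize<WithMember, DummyArchive>::value, "member detected");
static_assert(!hasSerialize<WithoutMember, DummyArchive>::value, "no member");
static_assert(!hasSerialize<int, DummyArchive>::value, "non-class types too");
```

Passing 0 to test prefers the int overload when both are viable, so the fallback is chosen only when substitution into the first one fails.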

Putting it all together, two-phase resolution and member detection:

template <typename Archive, typename T>
typename std::enable_if<hasSerialize<T, Archive>::value>::type
invokeSerialize(Archive &ar, T &value, const unsigned int ver)
{
    value.serialize(ar, ver);
}

//create a type that implicitly converts to unsigned int
struct VersionType
{
    VersionType(const unsigned int ver);
    operator const unsigned int &(void) const;
    operator unsigned int &(void);
};

template <typename Archive, typename T>
typename std::enable_if<!hasSerialize<T, Archive>::value>::type
invokeSerialize(Archive &ar, T &value, const unsigned int ver)
{
    const VersionType vt(ver); //force two-phase resolution
    serialize(ar, value, vt);
}

Polymorphic support

Another major hurdle for the Archive library is supporting polymorphic types. Suppose we have a pointer to class MyBaseType; the goal is to serialize the derived class, whatever that may be, and in the reverse direction, to create an instance of that derived class and load it with the serialized data. Further, the code invoking the archiver may never have seen the implementation of any of the derived classes. (This is the basis for serializing the Pothos::Object container.)

The basic idea behind the implementation is that

  • 1) the library that defines a custom derived type exports that type,
  • and 2) the serialize implementation can look up these exports at runtime.

In this case, exporting entails calling a macro provided by Pothos::Archive for every type of interest. This macro instantiates a save and load function for that type and creates a table entry mapping the data type to those functions.

All the archiver must do now to support polymorphic types is to write out the identity of the data type so that the type can be recovered on load, and to invoke the relevant save/load functions registered for that type.

Here is a quick example showing the polymorphic support:

//export somewhere in the library source
POTHOS_CLASS_EXPORT(MyDerivedType)

//pointer of base type to serialize
MyBaseType *x = new MyDerivedType();

//serialize
std::stringstream so;
Pothos::Archive::OStreamArchiver ao(so);
ao << x;

//deserialize
std::stringstream si(so.str());
MyBaseType *y(nullptr);
Pothos::Archive::IStreamArchiver ai(si);
ai >> y;

//y now contains a pointer holding MyDerivedType
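The export table idea can be sketched in a few dozen lines. This is only an illustration of the mechanism described above, not the Pothos implementation: Base, Derived, ExportEntry, exportClass, savePolymorphic, loadPolymorphic, and polymorphicRoundTrip are all hypothetical names, the format is plain text, and exportClass assumes the exported type has an int member named value.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct Base { virtual ~Base() {} };
struct Derived : Base { int value = 0; };

// One table entry per exported type (hypothetical layout).
struct ExportEntry
{
    std::string name;
    std::function<void(std::ostream &, const Base &)> save;
    std::function<std::unique_ptr<Base>(std::istream &)> load;
};

// Function-local statics avoid static initialization order problems.
std::map<std::type_index, ExportEntry> &entriesByType()
{
    static std::map<std::type_index, ExportEntry> table;
    return table;
}

std::map<std::string, ExportEntry> &entriesByName()
{
    static std::map<std::string, ExportEntry> table;
    return table;
}

// Roughly what an export macro might expand to: instantiate save/load
// for T and register them. Assumes T has an int member named value.
template <typename T>
bool exportClass(const std::string &name)
{
    ExportEntry e;
    e.name = name;
    e.save = [](std::ostream &os, const Base &b)
    {
        os << static_cast<const T &>(b).value << ' ';
    };
    e.load = [](std::istream &is) -> std::unique_ptr<Base>
    {
        std::unique_ptr<T> p(new T());
        is >> p->value;
        return std::unique_ptr<Base>(std::move(p));
    };
    entriesByType()[std::type_index(typeid(T))] = e;
    entriesByName()[name] = e;
    return true;
}

// Runs before main(), like an export placed in a library source file.
static const bool derivedExported = exportClass<Derived>("Derived");

// Write the type name first so the loader can recover the type.
void savePolymorphic(std::ostream &os, const Base &obj)
{
    const ExportEntry &e = entriesByType().at(std::type_index(typeid(obj)));
    os << e.name << ' ';
    e.save(os, obj);
}

// Read the type name, then let the registered loader build the object.
std::unique_ptr<Base> loadPolymorphic(std::istream &is)
{
    std::string name;
    is >> name;
    return entriesByName().at(name).load(is);
}

// Saves a Derived through a Base reference and loads it back.
int polymorphicRoundTrip(const int value)
{
    Derived d;
    d.value = value;
    std::stringstream ss;
    savePolymorphic(ss, d);
    std::unique_ptr<Base> p = loadPolymorphic(ss);
    const Derived *loaded = dynamic_cast<const Derived *>(p.get());
    return loaded ? loaded->value : -1;
}
```

The calling code only ever names Base; the derived type is recovered from the name stored in the stream, which is the property the polymorphic support needs.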

Missing features

The Pothos::Archive API is complete enough for the existing library, GUI, and examples. But compared to its boost predecessor, it's not as full-featured. Pothos::Archive could use more pre-defined serialization headers for STL types, and it could use some additional archive formats. Currently we just have a portable binary format (which is all Pothos ever used), but boost offers XML and text formats as well.

Have fun with Pothos::Archive!

Last edited: Sat, Dec 31 2016 - 11:20PM