Biggy: A Fresh Look at High-Performance, Synchronized In-Memory Persistence for .NET


Image by Elif Ayiter | Some Rights Reserved

About a month ago, Rob Conery popped off with the following tweet:

[Image: Rob Conery’s tweet announcing Biggy]

And hence, Biggy was born. Rob’s “most ridiculous thing he has ever created” has quickly morphed into a high-performance document/relational query tool that is fully LINQ-compliant.

Biggy is an open source project hosted on GitHub, and is in the early stages of active development. Things have evolved rapidly, and I am proud to say I have contributed some real code and fixes almost from the beginning. This is an exciting project with some interesting, convention-challenging aspects, led by some really, really knowledgeable people, and I am just damn stoked to be a part of it.

Look, ma! I’m an open source contributor!


What the Hell is it?

Well, we’re not precisely sure . . . ok, we kind of are now. Biggy began as Rob’s experiment with a flat-file JSON store, materialized into an in-memory, queryable list structure and synchronized with the flat-file store for writes (inserts and updates).

Soon enough, he began experimenting with integrating document-style persistence of data as JSON with standard relational database structures. In Biggy Basics, Part I, Rob discusses his thoughts on appropriate persistence models for different types of data:

Our store is a simple process – the “input” data (Products and Customers) generate “output”, or “record” data. The input data changes fairly often – Customers logging in and changing things, store owners changing prices, etc.

The output data doesn’t change much – in analytical terms this is called “slowly changing over time” – you might go in and tweak an order here and there, but mostly it’s a matter of historical record and should never be changed.

To me the “input” stuff (Products/Customers) is perfect for a looser, document structure. The output stuff should not only be in a relational structure – it should be denormalized, ready for analytical export to CSV or some other reporting system.

It was around this time that I jumped in. I’ve been looking for an OSS project to jump into, for a variety of reasons, and this seemed perfect. The size and scope were such that one could easily grasp the whole of the project. Further, I cut my coding teeth on database stuff, so this was right up my alley.

Ultimately, what Biggy does is combine many of the best characteristics of several data management and persistence models in one package. Biggy is a high-performance, synchronized in-memory query and persistence tool for .NET.

Or, as the README file on GitHub states, Biggy is “A Very Fast Document/Relational Query Tool with Full LINQ Compliance.”

Fast, In-Memory Queries

One of the fundamental aspects of Biggy as a data access tool is that your backing store is represented completely in memory using your domain object model. Where this truly shines is in query performance. Once your data is loaded up, queries are executed, via LINQ, against this in-memory object model.

For example, as things sit currently we could “new up” our data in-memory from the Chinook sample database, and run the following code, with a multi-join LINQ query against it (the Biggy API is evolving as we speak, but this code works against the current master branch available from the Biggy repo as of 3/22/2014):

Example Multi-Join LINQ Query against a Biggy Relational Store:
var _artists = new SQLServerList<Artist>(_connectionStringName, "artist");
var _albums = new SQLServerList<Album>(_connectionStringName, "album");
var _tracks = new SQLServerList<Track>(_connectionStringName, "track");

var recordsLoaded = _artists.Count() + _albums.Count() + _tracks.Count();

var actracks = from ar in _artists
               join a in _albums on ar.ArtistId equals a.ArtistId
               join t in _tracks on a.AlbumId equals t.AlbumId
               where ar.Name == "AC/DC"
               select t;

foreach (var track in actracks)
{
    Console.WriteLine("\t-{0}", track.Name);
}

Note that when we “new up” the table data into memory for the first time, we take a little hit as the connection pool is initialized. Not much of one, though! From there, things are FAAASSSST. Here are some really rough performance numbers from my machine (your results will differ from one machine to the next):

[Image: console output showing the Chinook data loading in 117 ms and the AC/DC track query returning in 1 ms]

Indeed, that triple-join query returned in 1 millisecond. As for the 117 ms load time, the bulk of that was spent opening the connection in the first place.
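If you want to take your own rough measurements, a simple Stopwatch around the load and the query will do. This is just an illustrative harness of my own, not part of Biggy:

Example Timing Harness (illustrative):
var sw = System.Diagnostics.Stopwatch.StartNew();
var _artists = new SQLServerList<Artist>(_connectionStringName, "artist");
var _albums = new SQLServerList<Album>(_connectionStringName, "album");
var _tracks = new SQLServerList<Track>(_connectionStringName, "track");
sw.Stop();
Console.WriteLine("Loaded {0} records in {1} ms",
    _artists.Count() + _albums.Count() + _tracks.Count(), sw.ElapsedMilliseconds);

sw.Restart();
// ToList() forces enumeration, so the query actually runs before we stop the clock:
var trackNames = (from ar in _artists
                  join a in _albums on ar.ArtistId equals a.ArtistId
                  join t in _tracks on a.AlbumId equals t.AlbumId
                  where ar.Name == "AC/DC"
                  select t.Name).ToList();
sw.Stop();
Console.WriteLine("Queried {0} tracks in {1} ms", trackNames.Count, sw.ElapsedMilliseconds);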

The idea here is that (within sensible parameters determined by our application needs), we can load our application data into memory, and eschew round trips to the database except to perform writes, keeping our in-memory model in sync with the back-end.

Structured, Normalized Relational Storage

Let’s face it. Despite the advent and popularity of the many flavors of NoSQL, the relational model of data persistence is not going anywhere. Further, Structured Query Language (SQL) happens to be an excellent way to talk to a database. SQL is often reviled by programmers, but I say, man up. When it comes to describing what you want from your data store, SQL beats any “Map-Reduce” style DSL hands down for ease of use and readability.

Also, for a good many types of data, relational data storage is simply (as of this writing, anyway) the best available model, time-tested and mature.

Biggy plays nicely with relational databases. The current implementation focuses on SQL Server and PostgreSQL, but as the project evolves, it becomes ever easier to port the implementation to other SQL database platforms.

As data is added, updated, or deleted in memory, Biggy maintains sync with the backing store without having to reload the in-memory cache.
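For example, here is a minimal sketch of that write-through behavior. I’m assuming the Add(), Update(), and Remove() methods currently exposed by SQLServerList<T>; as noted, the API is still in flux:

Example Synchronized Writes (sketch):
var _artists = new SQLServerList<Artist>(_connectionStringName, "artist");

// Hypothetical record, for illustration only:
var newArtist = new Artist { Name = "Example Band" };

_artists.Add(newArtist);      // INSERTs the row; the in-memory list already has it

newArtist.Name = "Example Band (Renamed)";
_artists.Update(newArtist);   // UPDATEs the backing row in place

_artists.Remove(newArtist);   // DELETEs the row; no reload of the list required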

Loosely Structured, De-normalized Document Storage

For more modest persistence needs (small web applications and the like), Biggy offers an evolved version of Rob’s original flat-file JSON store concept. For tables of small to moderate size, performance is strong, and managing your data is as simple as pointing Biggy at the directory you want to use as your database. Biggy will deserialize your JSON data into memory as domain objects. New data is appended to the end of the file as it is added to the in-memory list, and updates are written by flushing all data back to the file.
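A minimal sketch of what that looks like follows. Note that the BiggyList<T> class name, the dbPath argument, and the Product class are my assumptions for illustration; the file-store API is evolving along with the rest of the project:

Example Flat-File Store Usage (sketch):
public class Product
{
    public string Sku { get; set; }
    public string Name { get; set; }
}

// The list type and constructor argument here are assumptions:
var products = new BiggyList<Product>(dbPath: @"C:\MyApp\Data");

// Adding appends the serialized record to the end of the JSON file:
products.Add(new Product { Sku = "WIDGET-1", Name = "Widget" });

// Queries run against the in-memory list; no file I/O involved:
var widgets = from p in products
              where p.Name.StartsWith("Widget")
              select p;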

Biggy also brings the document storage model to your relational database. Domain objects can be persisted as documents within the relational structure, taking advantage of the existing indexing capabilities of the relational Db, and the efficient, flexible record structure of a document Db. Records are saved to a document table indexed with a standard Primary Key, with the actual data serialized as JSON in a “body” field.

This is especially well-suited to PostgreSQL, with its JSON data type. However, SQL Server does just fine with JSON serialized as text. As you might expect, you can save complex objects as a JSON document, and deserialize them later with any parent/child relationships intact.

For example, consider two tables in our database, Artist and Album. We represent these in our domain as you would expect:

Example Artist and Album Classes:
public class Artist 
{
    public int ArtistId { get; set; }
    public string Name { get; set; }
}
  
public class Album 
{
    public int AlbumId { get; set; }
    public string Title { get; set; }
    public int ArtistId { get; set; }
}

Using Biggy’s hybrid relational/document features, we might decide we need a handy way to store and retrieve artist data in a de-normalized format, such that the artist and their album catalog are persisted as a single JSON document, ready for retrieval. We can simply add a handy document container class to our model:

Extending the Artist Class for Document Persistence:
public class ArtistDocument : Artist 
{
    public ArtistDocument() 
    {
        this.Albums = new List<Album>();
    }
    public List<Album> Albums { get; set; }
}

Now, we could write something akin to the following code:

var _artists = new SQLServerList<Artist>(_connectionStringName, "artist");
var _albums = new SQLServerList<Album>(_connectionStringName, "album");

var list = new List<ArtistDocument>();
foreach (var artist in _artists)
{
    var artistAlbums = from a in _albums
                       where a.ArtistId == artist.ArtistId
                       select a;
    var newArtistWithAlbums = new ArtistDocument()
    {
        ArtistId = artist.ArtistId,
        Name = artist.Name,
        Albums = artistAlbums.ToList()
    };
    list.Add(newArtistWithAlbums);
}
var _artistDocuments = new SQLDocumentList<ArtistDocument>(_connectionStringName);
_artistDocuments.AddRange(list);

In the code above, we new up our Artist and Album data from the Chinook backing store, and use it to hydrate our document container class, ArtistDocument. We then create a new SQLDocumentList<ArtistDocument> instance.

But wait, John, you say, there’s no table for this in the Chinook Db. Well, you’re correct, there’s not. But Biggy knows this, and takes care of it for us. When we initialize a SQLDocumentList<T>, Biggy creates the table for us if one does not already exist.

Once that’s done, we just drop our list of complex objects into the AddRange() method and we’re done. If we look at our backing store, we see the following table data:

Artist Document Data Persisted in SQL Server as Complex JSON:

[Image: the new SQL Server document table, with an integer primary key column and a JSON body column for each ArtistDocument]

The image above may be hard to see in detail, but we have persisted each ArtistDocument record with a serial integer Primary Key, and a body of JSON which includes the artist data, as well as a JSON array representing the album catalog for each artist.

A closer look reveals standard JSON:

{"Albums":[
    {"AlbumId":163,"Title":"From The Muddy Banks Of The Wishkah [Live]","ArtistId":110},
    {"AlbumId":164,"Title":"Nevermind","ArtistId":110}],
"ArtistId":110,"Name":"Nirvana"}

And yet, in our in-memory list, each artist will be deserialized back into its proper domain model, with the parent/child relation between artist and albums intact.
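For example, something like the following should work against the current API (again, a sketch; the lookup itself is just standard LINQ over the in-memory list):

Example Document Retrieval (sketch):
// Re-initializing the document list materializes each JSON body back
// into an ArtistDocument, with its Albums collection re-populated:
var _artistDocuments = new SQLDocumentList<ArtistDocument>(_connectionStringName);

var nirvana = _artistDocuments.FirstOrDefault(a => a.Name == "Nirvana");
if (nirvana != null)
{
    foreach (var album in nirvana.Albums)
    {
        Console.WriteLine(album.Title);   // "Nevermind", etc.
    }
}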

Picking Up Where Massive Left Off

Underlying the relational database integration in Biggy is a re-worked version of another of Rob’s projects, Massive. We’ve adapted some of the internals, and added strong support for static types (the original Massive was all dynamic). Much of the speed and power of Biggy with respect to relational data comes from the innate flexibility of Massive, coupled with some customization to meet the needs of Biggy.

In a way, one might say that Biggy is the next layer that Massive was always waiting for. With the addition of static type handling and easy, synchronized in-memory persistence, Biggy helps take Massive to the next level.
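To see the difference, compare Massive’s dynamic access with Biggy’s typed lists. The Massive snippet below uses Massive’s public DynamicModel API; the Biggy side uses the list API shown earlier:

Example Dynamic vs. Static Typing (illustrative):
// Massive-style access: rows come back as dynamic objects,
// with members resolved at runtime.
var artistTable = new DynamicModel("Chinook", tableName: "Artist", primaryKeyField: "ArtistId");
IEnumerable<dynamic> rows = artistTable.All();
foreach (var row in rows)
{
    Console.WriteLine(row.Name);   // no compile-time checking
}

// Biggy-style access: the list is strongly typed end to end.
var artists = new SQLServerList<Artist>(_connectionStringName, "artist");
Artist first = artists.First();    // checked at compile time
Console.WriteLine(first.Name);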

Check it Out, but Be Warned

There’s a lot going on with Biggy right now, as I write this. We are exploring a new architecture proposed by none other than K. Scott Allen (@OdeToCode on Twitter), and things are in flux. That said, if you enjoy messing about with innovative persistence and data access technologies, do visit GitHub and check out the code: pull it down, and play with it. We need to know how it works “in the wild” and under conditions we haven’t thought of.

But, be warned. As I said, things are moving fast, and as far as I know, the API is not locked in yet.

What’s in it for Me?

This is the first open source project I have really jumped into. As I mentioned earlier, the scope was right, the technology is one I know well, and my views on data access jibe well with those of the project owner – Rob does interesting things that challenge convention, and I totally subscribe to that. I have been looking for a project to jump in on for a while, and this was perfect.

I have been coding for myself (and on some odd projects related to my day job) for a few years, and until now have never really worked with someone else’s code. I recognized early on that the thing to do is to ascertain, and stay in tune with, the project owner’s direction. In fact, this is one of the more interesting aspects of participation. I am accustomed to working my projects out my own way, to suit my needs; I have never really had to follow someone else’s “vision” beyond my own “how can I make this work” approach.

In the few short weeks I have been contributing to Biggy, I have already grown as a coder, and I intend to keep it up.

What’s Next?

Over the next few posts, I’ll be taking a look at some of the more interesting things I have run into on this project, and some of the things I have learned. I hope you check out Biggy, and follow along (we can ALWAYS use GOOD bug reports – at present, I am certain there are plenty to be had!).

Comments

Matt R.:

I think I stumbled across Biggy on Rob’s blog after listening to him on .NET Rocks, IIRC.

I’ve actually been fooling around with a home project revolving around helping me manage my wife’s and my music libraries, to keep them from duplicating or stepping on each other (yes, I know there are packaged solutions, but it was something fun). Biggy was plopped straight into the daemon, and the ease of use is astounding.

I’d love to see someone build a pub-sub layer on top of it to essentially create a .NET replacement for REDIS – just because.

John (Author):

Hey Matt –

Thanks for reading. Indeed, a pub-sub layer has been discussed, I believe. Ahem . . . we are happily taking PRs, and/or even your thoughts as an issue on GitHub. :-)

Would love to see what you cooked up for your music library. Also, seeing Biggy used in a “real-world” setting would be informative. Did you run into anything where you were thinking “Damn, if only it did . . .”?

Thanks for taking the time to comment! Cheers!

-J