Biggy: Evolving the Architecture

Recently, K. Scott Allen proposed a re-thinking of the fundamental architecture for Rob Conery’s Biggy project. Over the past month, the project has grown from a simple, flat-file, in-memory JSON store into a more ambitious, LINQ-compliant, high-performance document/relational query tool. The existing structure worked, but it is clearly becoming brittle in places.

Biggy is a high-performance, synchronized in-memory document/relational query tool for .NET. The project attempts to combine the most desirable features of document and relational data stores, along with some ORM-like features.

As of this writing, the existing implementation is heavily inheritance-driven. K. Scott Allen’s solution proposes an interface-centric approach that more cleanly separates two concerns: data access against the backing store, and management of the list-based, in-memory representation of the data as domain models. Of the existing architecture, Mr. Allen states, and I agree, that:

Looking over the Biggy implementation, every different data store becomes coupled to an InMemoryList<T> class through inheritance. The coupling isn’t necessarily wrong, but it does complicate the implementation of each new data store.

Having worked rather extensively with the code base on the relational database implementation, I found exactly this to be the case. Not that the current architecture was bad, but simply that the project had grown outward from Rob’s early implementations, to the point where what was once simple was becoming increasingly complex.

Hiding Implementation with Interface Abstraction

Under the proposed architecture, Biggy would utilize a few interfaces to take some of the pain out of extending the implementation to support various data stores.

In his post, Scott Allen proposes, first and foremost, separating Lists from stores through interface implementation. The strength of Biggy as a library results from delivering an in-memory list representation of application data as domain objects.

K. Scott’s approach cleanly separates the responsibilities of the store (fetch/push data to the back-end database) from those of the in-memory list (query/manipulate data within the domain model). It abstracts the store functionality behind a set of interfaces based on IBiggyStore<T>, so that different backing store implementations can be easily ported into the library. Further, the in-memory list implementation is abstracted behind its own interface, IBiggy<T>, into which an instance of IBiggyStore<T> is injected, thereby completing the de-coupling of store from list.

As I undertook to implement the proposed structure, I made a few minor adjustments, finally arriving at the following interface for the in-memory IBiggy<T> abstraction:

The IBiggy<T> Interface and Associated Sub-Classes:

public interface IBiggy<T> : IEnumerable<T>
{
    void Clear();
    int Count();
    T Update(T item);
    T Remove(T item);
    List<T> Remove(List<T> items);
    T Add(T item);
    List<T> Add(List<T> items);
    IQueryable<T> AsQueryable();
  
    event EventHandler<BiggyEventArgs<T>> ItemRemoved;
    event EventHandler<BiggyEventArgs<T>> ItemAdded;
    event EventHandler<BiggyEventArgs<T>> ItemsAdded;
  
    event EventHandler<BiggyEventArgs<T>> Changed;
    event EventHandler<BiggyEventArgs<T>> Loaded;
    event EventHandler<BiggyEventArgs<T>> Saved;
}

The primary differences between my implementation above and K. Scott’s proposed structure are the addition of a Remove(List<T>) method to remove a range of items from the list, and the change from IEnumerable<T> to List<T> throughout. The reason for the latter was that we had variations between List<T>, IEnumerable<T>, IList<T>, and others scattered about the API. While I can see a case for moving back to IEnumerable<T>, I was having to do a whole lot of myEnumerable.ToList() and such. We’ll see what happens. For now, I made everything I could List<T>.

Abstracting the Backing Store

One of the principal drivers behind the architectural changes was to separate the responsibilities of in-memory list management from those of data transfer to and from the backing store. Per K. Scott’s original proposal, the core IBiggyStore<T> interface is a simple, brute-force affair. Again, I have made some minor modifications which may be reversed before this reaches production, but for now, the basic interface looks like this, along with the IUpdateableBiggyStore<T> and IQueryableBiggyStore<T> variants:

The Biggy Store Interfaces:

public interface IBiggyStore<T>
{
    List<T> Load();
    void SaveAll(List<T> items);
    void Clear();     
    T Add(T item);
    List<T> Add(List<T> items);
}
  
public interface IUpdateableBiggyStore<T> : IBiggyStore<T>
{
    T Update(T item);
    T Remove(T item);
    List<T> Remove(List<T> items);
}
  
public interface IQueryableBiggyStore<T> : IBiggyStore<T>
{
    IQueryable<T> AsQueryable();
}

In the above, we have abstracted the basic store functionality behind a set of interfaces. This allows us to swap backing stores with ease, while ensuring code which consumes instances of IBiggy will continue to function properly.
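To make the "swap backing stores" point concrete, here is a minimal sketch of a second store. InMemoryStore<T> and StoreDemo are hypothetical names invented for illustration, not part of Biggy; the interface is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Copy of the IBiggyStore<T> interface shown above, repeated so the sketch is self-contained.
public interface IBiggyStore<T>
{
    List<T> Load();
    void SaveAll(List<T> items);
    void Clear();
    T Add(T item);
    List<T> Add(List<T> items);
}

// Hypothetical store that keeps its "rows" in a private list instead of a database.
public class InMemoryStore<T> : IBiggyStore<T>
{
    private readonly List<T> _rows = new List<T>();

    // Hand back a copy so callers cannot mutate the store's internal state.
    public List<T> Load() { return new List<T>(_rows); }

    public void SaveAll(List<T> items)
    {
        var copy = new List<T>(items);
        _rows.Clear();
        _rows.AddRange(copy);
    }

    public void Clear() { _rows.Clear(); }

    public T Add(T item)
    {
        _rows.Add(item);
        return item;
    }

    public List<T> Add(List<T> items)
    {
        _rows.AddRange(items);
        return items;
    }
}

public static class StoreDemo
{
    // Depends only on the interface, so any IBiggyStore<string> can be passed in.
    public static int CountAfterSeeding(IBiggyStore<string> store)
    {
        store.Clear();
        store.Add("AC/DC");
        store.Add(new List<string> { "Accept", "Aerosmith" });
        return store.Load().Count;
    }
}
```

Because CountAfterSeeding only sees the interface, the same call would work unchanged against a SQL Server or Postgres store, which is the whole point of the abstraction.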

In my current implementation, I have defined a base class, BiggyRelationalStore, which contains the code common to any relational database implementation. It also declares abstract methods which require concrete implementation in platform-specific subclasses.
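The base-class-plus-abstract-methods arrangement might look roughly like the sketch below. This is not the actual BiggyRelationalStore code: the class names, QuoteIdentifier, LastInsertedIdSql, and BuildInsertSql are all assumptions made up for illustration. Only the SQL fragments (bracket and double-quote delimiters, SCOPE_IDENTITY(), lastval()) are standard features of the respective platforms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base class: shared SQL-building logic lives here, while
// platform-specific details are left abstract for each subclass to supply.
public abstract class RelationalStoreSketch
{
    // How this platform delimits table and column names.
    protected abstract string QuoteIdentifier(string name);

    // Statement that returns the key generated by the most recent insert.
    protected abstract string LastInsertedIdSql { get; }

    // Common to every platform: assemble a parameterized INSERT statement.
    public string BuildInsertSql(string table, IList<string> columns)
    {
        string columnList = string.Join(", ", columns.Select(c => QuoteIdentifier(c)));
        string paramList = string.Join(", ", columns.Select((c, i) => "@" + i));
        return string.Format("INSERT INTO {0} ({1}) VALUES ({2}); {3}",
            QuoteIdentifier(table), columnList, paramList, LastInsertedIdSql);
    }
}

public class SqlServerStoreSketch : RelationalStoreSketch
{
    protected override string QuoteIdentifier(string name) { return "[" + name + "]"; }
    protected override string LastInsertedIdSql { get { return "SELECT SCOPE_IDENTITY()"; } }
}

public class PostgresStoreSketch : RelationalStoreSketch
{
    protected override string QuoteIdentifier(string name) { return "\"" + name + "\""; }
    protected override string LastInsertedIdSql { get { return "SELECT lastval()"; } }
}
```

The base class never needs to know which platform it is running against; each subclass answers only the two questions that actually differ.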

A concrete implementation of IBiggy<T>, such as BiggyList<T>, can call out to the backing store through the various store interface methods. As K. Scott says in his post:

“…the implementation of an actual data store doesn’t need to call into a base class or worry about raising events. The store only does what it is told…”

A Single Concrete List Implementation

The injection of the data store as an instance of IBiggyStore<T> allows us to create a single implementation class for the basic BiggyList. BiggyList is, of course, the business end of the Biggy library – its raison d’etre if you will. By injecting an abstract IBiggyStore instance into the constructor of the BiggyList class, we are able to get all of the code which previously managed database platform-specific store interaction out of the BiggyList class and safely stick it behind the IBiggyStore interface.
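As a rough illustration of constructor injection, the sketch below shows a trimmed-down list that delegates persistence to whatever store it is handed. SketchList<T> and ListBackedStore<T> are hypothetical names; the real BiggyList<T> implements the full IBiggy<T> interface and uses EventHandler<BiggyEventArgs<T>> rather than the plain EventHandler assumed here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Same shape as the IBiggyStore<T> interface from the article.
public interface IBiggyStore<T>
{
    List<T> Load();
    void SaveAll(List<T> items);
    void Clear();
    T Add(T item);
    List<T> Add(List<T> items);
}

// Hypothetical, simplified list: it owns the in-memory items and raises events,
// and knows nothing about which database (if any) sits behind the store.
public class SketchList<T> : IEnumerable<T>
{
    private readonly IBiggyStore<T> _store;
    private readonly List<T> _items;

    // Simplified event signature (an assumption, see the lead-in).
    public event EventHandler ItemAdded;

    public SketchList(IBiggyStore<T> store)
    {
        _store = store;
        _items = _store.Load();   // materialize the backing data once
    }

    public int Count() { return _items.Count; }

    public T Add(T item)
    {
        _store.Add(item);         // push to the backing store first
        _items.Add(item);         // then update the in-memory copy
        var handler = ItemAdded;
        if (handler != null) handler(this, EventArgs.Empty);
        return item;
    }

    public void Clear()
    {
        _store.Clear();
        _items.Clear();
    }

    public IEnumerator<T> GetEnumerator() { return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

// Hypothetical stand-in store, used only to exercise the sketch without a database.
public class ListBackedStore<T> : IBiggyStore<T>
{
    public readonly List<T> Rows = new List<T>();
    public List<T> Load() { return new List<T>(Rows); }
    public void SaveAll(List<T> items) { var copy = new List<T>(items); Rows.Clear(); Rows.AddRange(copy); }
    public void Clear() { Rows.Clear(); }
    public T Add(T item) { Rows.Add(item); return item; }
    public List<T> Add(List<T> items) { Rows.AddRange(items); return items; }
}
```

Note that the store never raises events and the list never builds SQL, which matches the division of labor described in the quote above.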

Sometimes, Abstraction Comes with a Price

On the whole, K. Scott’s new architecture cleaned up the Biggy code base significantly. However, this does not come without a price. Store abstraction and injection now require the following to initialize a new BiggyList<T> instance:

Initializing a new BiggyList Instance with Store Injection:

// Initialize a Store Instance:
IBiggyStore<Artist> _artistStore = new SQLServerStore<Artist>("chinook");
  
// Inject the store into the BiggyList Constructor:
IBiggy<Artist> _artists = new BiggyList<Artist>(_artistStore);

We have added a bit of ceremony on the front side, in that we have to “new up” a store instance (in this case, a SQL Server store) for injection into our list constructor. Of course, we could do this in-line:

Initializing a new BiggyList Instance with Inline Store Injection:

// Inject a new store into the BiggyList Constructor:
IBiggy<Artist> _artists = new BiggyList<Artist>(new SQLServerStore<Artist>("chinook"));

But there is still some additional cognitive overhead (and a lot of repetitive type arguments to deal with!).

Also, because each concrete IBiggyStore<T> implementation is tied to a specific type argument <T> (and by extension, a specific database table, depending upon the backing store), we need to initialize a new store for each type-specific list we wish to consume:

Initializing Multiple BiggyList Instances:

// Initialize several Store Instances:
IBiggyStore<Artist> _artistStore = new SQLServerStore<Artist>("chinook");
IBiggyStore<Album> _albumStore = new SQLServerStore<Album>("chinook");
IBiggyStore<Track> _trackStore = new SQLServerStore<Track>("chinook");
// Inject a new store into each BiggyList Constructor:
IBiggy<Artist> _artists = new BiggyList<Artist>(_artistStore);
IBiggy<Album> _albums = new BiggyList<Album>(_albumStore);
IBiggy<Track> _tracks = new BiggyList<Track>(_trackStore);

Hmmmm …

Now here we are looking at some repetitive coding, particularly since we need to specify the type argument <T> for each list no fewer than four times . . .

However, since the whole purpose of Biggy is to get your data into memory for fast performance, while keeping things in sync with the backing store, you should mostly be able to do this once within your application, and then you’re off and running.

We shall see how things evolve. As I am learning, there is a balance between a friendly, easy-to-use API and “proper architecture” that is not always clear-cut.

Cache Schema Information

Biggy has a relatively sophisticated system for matching domain objects and properties with database tables and columns. Also, specific to relational database stores, there is the issue of primary keys, and whether or not these are auto-incrementing (“Identity” columns in SQL Server, and “serial” column types under Postgres).

Since Biggy relies heavily upon mapping database object names to domain object names to do its job, it made sense to pull as much schema information from the actual database as possible, then map objects and properties accordingly. We can accomplish this by hitting INFORMATION_SCHEMA once for a list of tables in the database, and again for a list of all the columns in the database. We then map columns to tables in memory, and make these mappings available for comparison to object names and properties.
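The column-to-table mapping step could be sketched as follows. Everything here is an assumption for illustration: ColumnRow, SchemaCacheSketch, and the naive "type name or type name plus s" matching rule are invented, and Biggy's actual DbCache and name-matching logic are not shown in this post and are likely more involved. The rows are supplied by hand rather than read from INFORMATION_SCHEMA so the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: one (table, column) pair, as a query against
// INFORMATION_SCHEMA.COLUMNS would return it.
public class ColumnRow
{
    public string TableName { get; set; }
    public string ColumnName { get; set; }
}

// Hypothetical domain object used to demonstrate the lookup.
public class Artist
{
    public int ArtistId { get; set; }
    public string Name { get; set; }
}

// Hypothetical cache: built once, then shared by every store that needs it.
public class SchemaCacheSketch
{
    private readonly Dictionary<string, List<string>> _columnsByTable;

    public SchemaCacheSketch(IEnumerable<ColumnRow> rows)
    {
        // Group columns under their table, ignoring case in table names.
        _columnsByTable = rows
            .GroupBy(r => r.TableName, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => g.Select(r => r.ColumnName).ToList(),
                StringComparer.OrdinalIgnoreCase);
    }

    // Naive matching rule (an assumption): the type name, or the type name plus "s".
    // Returns the table name as stored in the database, or null if nothing matches.
    public string FindTableFor(Type type)
    {
        return _columnsByTable.Keys.FirstOrDefault(t =>
            string.Equals(t, type.Name, StringComparison.OrdinalIgnoreCase) ||
            string.Equals(t, type.Name + "s", StringComparison.OrdinalIgnoreCase));
    }

    public bool HasColumn(string table, string column)
    {
        List<string> columns;
        return _columnsByTable.TryGetValue(table, out columns)
            && columns.Contains(column, StringComparer.OrdinalIgnoreCase);
    }
}
```

Since the dictionary is built once in the constructor, any number of stores can share one instance without going back to the database for schema details.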

This cache of schema information can be retrieved during initialization of an IBiggyStore<T> instance, or passed to a constructor overload, depending on application requirements. If we needed just a single table’s data, we might just initialize our store using the database connection string name, as previously:

IBiggyStore<Artist> _artistStore = new SQLServerStore<Artist>("chinook");
IBiggy<Artist> _artists = new BiggyList<Artist>(_artistStore);

No harm, no foul. It still seems a little clunkier than previous versions of Biggy, but not too bad.

Behind the scenes, during store initialization, the cache is still being retrieved, but only for this instance. In the constructors for the sub-class SQLServerStore<T> we find:

public SQLServerStore(DbCache dbCache) : base(dbCache) { }
public SQLServerStore(string connectionString) 
    : base(new SQLServerCache(connectionString)) { }

As we can see, the second constructor overload simply initializes a new SQLServerCache instance and passes it to the constructor of the base class, BiggyRelationalStore.

If, on the other hand, we need to spin up several tables (as in our earlier example), it makes more sense to grab our schema stuff once at the start, and pass a reference in to each of our store objects:

var schema = new SQLServerCache("chinook");
  
// Initialize several Store Instances, but pass the cached schema info in:
IBiggyStore<Artist> _artistStore = new SQLServerStore<Artist>(schema);
IBiggyStore<Album> _albumStore = new SQLServerStore<Album>(schema);
IBiggyStore<Track> _trackStore = new SQLServerStore<Track>(schema); 
  
IBiggy<Artist> _artists = new BiggyList<Artist>(_artistStore);
IBiggy<Album> _albums = new BiggyList<Album>(_albumStore);
IBiggy<Track> _tracks = new BiggyList<Track>(_trackStore);

Yup. That looks pretty clunky.

All the abstraction/injection has made the Biggy code base more robust (much more, in my mind), but has made it less friendly from an API perspective.

What to do?

Simpler, More Friendly API or Stronger, More Flexible Library Structure?

The architecture proposed by K. Scott Allen most definitely improved the code organization, created better separation of concerns between the BiggyList and the backing store, and in general has created a code base which is more extensible. However, it has also introduced a good deal more ceremony from an API usage standpoint.

Do we decide, from a project standpoint, to wrap it all up somehow such that the API is simplified, but less extensible? Or do we provide the library as-is, and allow the consumer to decide how to best wrap it up in the context of their project?

I love the new structure, and while there is room for it to evolve (and I am CERTAIN I have missed some easy ways to make it more friendly!), I think the basics are there, and for the moment, the tradeoff is worth it. But that’s just my opinion, and the simple, no-ceremony API Biggy was born with is no longer so simple. And utter simplicity is one of Biggy’s strong points.

Wrap it Up in a Factory of Sorts . . .

Of course, depending on your application requirements, solutions to the simplicity conundrum might be easy to find. For example, if I were whipping up an application today, I might add a thingamajig like so:

Example Biggy Implementation Wrapper:

public class MyDatabase 
{
    DbCache _cache;
    public MyDatabase(string connectionStringName) 
    {
        _cache = new SQLServerCache(connectionStringName);
    }
  
    public IBiggyStore<T> CreateStoreFor<T>() where T : new()
    {
        return new SQLServerStore<T>(_cache);
    }
  
    public IBiggy<T> CreateBiggyList<T>() where T : new() 
    {
        return new BiggyList<T>(CreateStoreFor<T>());
    }
}

The above could then be called like so:

Consuming the Example Wrapper:

var _db = new MyDatabase("chinook");
  
var artists = _db.CreateBiggyList<Artist>();
foreach (var artist in artists) 
{
    Console.WriteLine(artist.Name);
}

. . . Or Wrap it Up in a Context

Alternatively, one could borrow a page from Entity Framework, and go the “context” route:

Example Biggy Context Wrapper:

public class MyDatabaseContext : MyDatabase 
{
    public MyDatabaseContext(string connectionStringName) 
        : base(connectionStringName) 
    {
        this.Artists = this.CreateBiggyList<Artist>();
        this.Albums = this.CreateBiggyList<Album>();
        this.Tracks = this.CreateBiggyList<Track>();
    }
    public IBiggy<Artist> Artists { get; set; }
    public IBiggy<Album> Albums { get; set; }
    public IBiggy<Track> Tracks { get; set; }
}

The above makes possible the materialization of your data store into memory immediately upon initialization. Data can then be consumed directly, like so:

Consuming the Example Context Wrapper:

var _db = new MyDatabaseContext("chinook");
foreach (var artist in _db.Artists) 
{
    Console.WriteLine(artist.Name);
}

The above are two similar, but slightly different ways one might consume Biggy within an application. While the new architecture adds a degree of (probably) undesirable ceremony to the simplest use case, it does enable some flexibility and extensibility which may have been more difficult previously.

It could be we wrap the new architecture into something like the above as part of the API for the library. I’m interested to see what others come up with.

Pushed Breaking Changes

At the suggestion of the project owner, I’ve just pushed my changes to the master repo. This is going to break things for anyone who has built out around the previous structure.

I’ve revised and adapted the tests and the perf demos to work with the new architecture, and now await feedback from the project owner as to what is acceptable, and what may need additional work.

More will be revealed . . .