Biggy

Biggy: Interface Semantics and The Interface Segregation Principle


Image by Dennis Yang  |  Some Rights Reserved

Over the past few months I’ve been working hard on Rob Conery’s Biggy project, a high-performance, in-memory query and persistence tool for .NET.

Recently, the architecture of Biggy underwent a major overhaul, implementing an interface-driven structure proposed by K. Scott Allen. The new structure created a much cleaner separation between the core, in-memory list function of Biggy itself and the backing data store.

At its core, Biggy provides an in-memory, synchronized abstraction over various types of backing store. The current architecture is such that store implementations are represented by one or more interfaces and injected into the memory list as a constructor argument. Abstracting the backing store behind a set of interfaces isolates the in-memory list implementation from changes to the backing store, and also allows the list to call into the backing store without worrying about store implementation.

It all works pretty darn well. However, I wonder if we might be approaching things from the wrong perspective in designing our interface structure.

NOTE: For additional background on Biggy, see the following:

Abstracting Backing Stores – Some Background from Biggy

In its current form, the Biggy architecture presents a set of interfaces to represent a backing data store:

Interfaces to Represent Backing Stores in Biggy:
public interface IBiggyStore<T>
{
    List<T> Load();
    void Clear();     
    T Add(T item);
    List<T> Add(List<T> items);
}
  
public interface IUpdateableBiggyStore<T> : IBiggyStore<T>
{
    T Update(T item);
    T Remove(T item);
    List<T> Remove(List<T> items);
}
  
public interface IQueryableBiggyStore<T> : IBiggyStore<T>
{
    IQueryable<T> AsQueryable();
}

The interfaces above represent the potential capabilities of different backing stores. When we work with databases, file stores, and other persistence mechanisms, we generally seek to perform the basic CRUD functions: Create new records, Read records from the store, Update existing records, and Delete records.

Different Backing Stores Afford Different Levels of Compatibility

However, not all persistence engines afford all of the above capabilities directly. For example, with a file-based JSON store, we can read the records from the file by reading the file data into memory. However, the file store itself is not directly queryable in the same sense that a relational database might be.

Nor can we (technically) Update or Delete a specific record in the middle of the file – the mechanics of the file system are such that we cannot do so without essentially writing an actual database system (and dealing with some ugly, low-level file system stuff while we’re at it). Since the reason for utilizing a file-based store is to keep things simple (and often, keep the data in a human-readable format), this is at cross purposes. We might as well scale up to a relational system at that point.

With a file-based store, we can Read the data from the file, and we can write data to the file, either in bulk, or by appending records to the end of the file. We can also clear the entire file. These functions are effectively represented by the IBiggyStore<T> interface. Lacking are methods to Update records, or to Remove (Delete) records. In order to Update or Remove an existing record, we are essentially stuck flushing the entire modified file back to disk.
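To make that asymmetry concrete, here is a minimal sketch of the two write paths. It is not Biggy's actual file store: the LineFileSketch class, its method names, and the one-record-per-line format are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Biggy: one record per line in a text file.
public static class LineFileSketch {

    // Adding is cheap: new records are appended to the end of the file.
    public static void Append(string path, IEnumerable<string> records) {
        File.AppendAllLines(path, records);
    }

    // "Updating" one record means reading everything, changing the copy
    // in memory, and writing the whole file back to disk.
    public static void ReplaceRecord(string path, string oldRecord, string newRecord) {
        List<string> all = File.ReadAllLines(path).ToList();
        int index = all.IndexOf(oldRecord);
        if (index < 0) {
            throw new InvalidOperationException("Record not found: " + oldRecord);
        }
        all[index] = newRecord;
        File.WriteAllLines(path, all);
    }
}
```

Append touches only the tail of the file, while ReplaceRecord pays for a full read and a full write no matter how small the change is.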

Contrast this with a Relational store, which tends to include all of the CRUD functionality: We can query specific records or sets of records, we can INSERT, UPDATE, and DELETE records, etc. These functions are represented by the IBiggyStore<T> interface, and the additional IUpdateableBiggyStore<T> and (potentially) IQueryableBiggyStore<T> interfaces used in conjunction.

Meaningful Backing Store Abstractions

In proposing the interface-driven architecture, K. Scott Allen indicates he was thinking in terms of the Interface Segregation Principle, as he explains in a discussion on a Github issue from the Biggy repo:

For example, Mongo [DB] is an IQueryable data source, so it might make sense to allow for efficient queries by “advertising” with an IQueryableBiggyStore. The only reason to implement this interface is if the behind the scenes store is IQueryable. A text file of data isn’t IQueryable, so I would not implement the interface on a text file store. Same with the updateable interface – it makes sense for data stores that know how to update and persist individual items, which I think . . . would be tricky with a file full of JSON.

In Allen’s view, different stores present different potential capabilities with respect to data access and manipulation, and the interfaces implemented should properly reflect (or “advertise”) this to the client. This makes total sense, and is a solid example of the interface representing the actual capabilities available for a given store.

Under this scenario we might define a file-based store something like this:

Simplified Example of a File-Based Store Implementation:
public class ExampleFileStore<T> : IBiggyStore<T> {
  
    // ...
    // ... Local methods supporting the core file store functionality
    // ...
  
    List<T> IBiggyStore<T>.Load() {
        // ... Code to read all data from a file
    }
  
    void IBiggyStore<T>.Clear() {
        // ... Code to clear data from a file
    }
  
    T IBiggyStore<T>.Add(T item) {
        // ... Code to append a record to the end of a file
    }
  
    List<T> IBiggyStore<T>.Add(List<T> items) {
        // ... Code to append many records to the end of a file
    }
}

And we might similarly define a store against a relational database like so:

A Simplified Relational Store Implementation:
public class ExampleRelationalStore<T> 
    : IBiggyStore<T>, 
    IUpdateableBiggyStore<T>, 
    IQueryableBiggyStore<T> {
    // IBIGGY IMPLEMENTATION:
    List<T> IBiggyStore<T>.Load() {
        // ... Code to read all records from a table
    }
    void IBiggyStore<T>.Clear() {
        // ... Code to clear all records from a table
    }
    T IBiggyStore<T>.Add(T item) {
        // ... Code to INSERT a record into a table
    }
    List<T> IBiggyStore<T>.Add(List<T> items) {
        // ... Code to INSERT many records into a table
    }
    // UPDATEABLE IMPLEMENTATION:
    T IUpdateableBiggyStore<T>.Update(T item) {
        // ... Code to UPDATE a specific record in a table
    }
    T IUpdateableBiggyStore<T>.Remove(T item) {
        // ... Code to DELETE a specific record from a table
    }
    List<T> IUpdateableBiggyStore<T>.Remove(List<T> items) {
        // ... Code to DELETE a set of specific records from a table
    }
    // QUERYABLE IMPLEMENTATION:
    IQueryable<T> IQueryableBiggyStore<T>.AsQueryable() {
        // ... Code to return an instance of IQueryable representing table records
    }
}

As we can see here, a relational store implementation will generally make use of all the available interface methods with the possible exception of IQueryableBiggyStore<T>. In order to properly return a queryable instance, we would need a provider which returns IQueryable<T>, such as LinqToSql, or we would need to write our own.

Our two example cases present interface-based APIs which accurately reflect the capabilities of each store, and client code can implement the functionality required by implementing the proper combination of interfaces.

A Closer Look at the Interface Segregation Principle

The Interface Segregation Principle (ISP), described as part of Robert “Uncle Bob” Martin’s SOLID principles of Object-Oriented Design, states that:

“No client should be forced to depend on methods it does not use”

A potential corollary to ISP which is often cited, but for which I cannot find a direct source, is:

“Interfaces belong to the client, not the implementation”

From the perspective of defining an interface to abstract various data store implementations, the separations implied by IBiggyStore<T>, IUpdateableBiggyStore<T>, and IQueryableBiggyStore<T> make sense. Different clients can consume the features represented by various store implementations, and the dependencies created between client and interface will indeed be consistent with ISP.

Within Biggy, though, our objective is to represent a data store as an in-memory list structure, and synchronize the data in the in-memory abstraction with that persisted in the actual store. Further, one of our primary goals in separating store implementation from in-memory list implementation is to isolate the memory list implementation from changes to the store implementation.

In other words, once we have built out an application which consumes Biggy as an in-memory data source, we want to be able to swap out (or more commonly, scale up) the backing store without impacting (or at least, minimally impacting) our application code.

Defining the BiggyList API

For the Biggy project, we have defined another interface, IBiggy<T>, which represents the core feature set required in any given BiggyList<T> implementation:

The IBiggy Interface:
public interface IBiggy<T> : IEnumerable<T>
{
    void Clear();
    int Count();
    T Update(T item);
    T Remove(T item);
    List<T> Remove(List<T> items);
    T Add(T item);
    List<T> Add(List<T> items);
    IQueryable<T> AsQueryable();
    bool InMemory { get; set; }
  
    event EventHandler<BiggyEventArgs<T>> ItemRemoved;
    event EventHandler<BiggyEventArgs<T>> ItemAdded;
    event EventHandler<BiggyEventArgs<T>> ItemsAdded;
  
    event EventHandler<BiggyEventArgs<T>> Changed;
    event EventHandler<BiggyEventArgs<T>> Loaded;
    event EventHandler<BiggyEventArgs<T>> Saved;
}

As we can see, any BiggyList implementation requires all of the CRUD features we saw defined previously in multiple BiggyStore interfaces. In other words, while not all store implementations will offer all required features, any Biggy list will expect some way to perform all of these actions anyway.

One assumption behind the current interface architecture is that a store is injected into the BiggyList implementation as a constructor argument. A simplified, generic example might look something like this:

Simplified BiggyList Implementation:
public class BiggyListExample<T> : IBiggy<T> {
  
    IBiggyStore<T> _store;
    IUpdateableBiggyStore<T> _updatableStore;
    IQueryableBiggyStore<T> _queryableStore;
  
    List<T> _items;
  
    // . . . Code for various Event Hooks . . . 
  
    // ... Other Implementation Code ...
  
    public BiggyListExample(IBiggyStore<T> store) {
        _store = store;
        _updatableStore = store as IUpdateableBiggyStore<T>;
        _queryableStore = store as IQueryableBiggyStore<T>;
  
        _items = _store.Load();
    }
  
    public void Clear() {
        _store.Clear();
        _items.Clear();
    }
  
    public int Count() {
        return _items.Count;
    }
  
    public T Update(T item) {
        return _updatableStore.Update(item);
    }
  
    public T Remove(T item) {
        _items.Remove(item);
        return _updatableStore.Remove(item);
    }
  
    public List<T> Remove(List<T> items) {
        foreach (var item in items) {
            _items.Remove(item);
        }
        _updatableStore.Remove(items);
        return items;
    }
  
    public T Add(T item) {
        _items.Add(item);
        return _store.Add(item);
    }
  
    public List<T> Add(List<T> items) {
        _items.AddRange(items);
        return _store.Add(items);
    }
  
    public IQueryable<T> AsQueryable() {
        return _queryableStore.AsQueryable();
    }
  
    public IEnumerator<T> GetEnumerator() {
        return _items.GetEnumerator();
    }
  
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
        return _items.GetEnumerator();
    }
}

Given the example above is a very simplified implementation, it would appear that this class could actually represent a core, base-class type implementation. Except that, under our current assumptions about stores and interfaces discussed above, not all stores will offer up all of the features required for the class above to function properly. For example, if we pass in the ExampleFileStore class defined previously, the "as" casts in the constructor yield null, and any call to Update(), Remove(), or AsQueryable() fails with a NullReferenceException, since ExampleFileStore does not implement IUpdateableBiggyStore<T> or IQueryableBiggyStore<T>.
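The failure is easy to reproduce in isolation. The sketch below uses cut-down versions of the interfaces and hypothetical class names (AppendOnlyStore, ListSketch, Demo) that are not part of Biggy; it only demonstrates how a failed "as" cast surfaces later as a NullReferenceException.

```csharp
using System;
using System.Collections.Generic;

// Cut-down versions of the store interfaces, for illustration only.
public interface IBiggyStore<T> {
    List<T> Load();
    T Add(T item);
}

public interface IUpdateableBiggyStore<T> : IBiggyStore<T> {
    T Update(T item);
}

// Hypothetical stand-in for a file-based store: no update support.
public class AppendOnlyStore<T> : IBiggyStore<T> {
    private readonly List<T> _saved = new List<T>();
    public List<T> Load() { return new List<T>(_saved); }
    public T Add(T item) { _saved.Add(item); return item; }
}

public class ListSketch<T> {
    private readonly IUpdateableBiggyStore<T> _updatableStore;

    public ListSketch(IBiggyStore<T> store) {
        // "as" does not throw on a failed cast; it quietly yields null.
        _updatableStore = store as IUpdateableBiggyStore<T>;
    }

    public T Update(T item) {
        // Throws NullReferenceException when the store is not updateable.
        return _updatableStore.Update(item);
    }
}

public static class Demo {
    // Returns true if Update blows up as described in the text.
    public static bool UpdateFails() {
        var list = new ListSketch<string>(new AppendOnlyStore<string>());
        try {
            list.Update("anything");
            return false;
        } catch (NullReferenceException) {
            return true;
        }
    }
}
```

Note that the constructor succeeds; the problem only shows up when a client first calls a method the store cannot back.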

We could (and did, in the current iteration of Biggy) add null checks against the store instances, so that code won’t fail if we try to execute an interface method that is not present:

Checking for Null Before Calling Potentially Missing Interface Method:
public T Update(T item) {
    if (_updatableStore != null) {
        return _updatableStore.Update(item);
    }
    // Store is not updateable: nothing is persisted, and the caller is not told
    return item;
}

This is even more evil, though. Now we think we have updated some data, but nothing happened.

Originally, we had defined a SaveAll() method as part of the IBiggyStore<T> interface. This made some sense when working against a file-based store, in that updating a record essentially requires re-writing the entire file to disk. With a SaveAll() method defined as part of IBiggyStore<T>, the above could be re-written (roughly):

Modified Null Check Example:
public T Update(T item) {
    if (_updatableStore != null) {
        return _updatableStore.Update(item);
    } else {
        _store.SaveAll(_items);
        return item;
    }
}

The code above assumes that, by passing the entire modified list back to the store, the store contents will be over-written reflecting any changes, including updates and deletions. This works well for the file-based store. For other stores where explicit Update and/or Delete methods are available, the first part of the conditional will execute.

The trouble with this is two-fold. First off, one of the core assumptions implied by the Biggy library is that the in-memory list and the backing store are kept in sync in “Real-time” (or it should appear that way to the client, in any case). Including a SaveAll() method as part of the public interface implies that it is possible to make a series of changes before saving, or “flushing” back to the store. In other words, NOT in real time.

Secondly, SaveAll() implies a “Unit of Work” architecture due to widespread use in popular ORMs. The project owner, Rob, has explicitly opted against Unit of Work in favor of a composed transaction model, and we would be prudent to avoid using this term as part of the public store API. The goal is that Biggy will present a more transactional persistence model for writes to the store, and “Save All” implies something else.

We could, of course, create implementations for IBiggy<T> which take into account the limitations of certain stores, and accomplish through brute force what cannot be provided by the basic interface. But this, to me, limits the effectiveness of our interface architecture.

In a perfect world, it seems we should be able to create a single, basic BiggyList implementation which is unaffected by moving from one store to the next. Special cases, such as stores and lists which are capable of using IQueryable, could then be created using specialized interfaces. But we should be able to assume core CRUD functionality, and compatibility within Biggy for all the various stores.

Re-thinking the Interface Structure from a Different Perspective

It seems like we have some decisions to make here. On the one hand, defining the store interfaces according to the available functionality makes 100% sense, and is probably in keeping with “proper” Object-Oriented Design principles. After all, if it is not technically possible (for example) to UPDATE a specific item in the store without re-writing the entire list back to disk, then should we be advertising that capability with an Update(item) method?

This would make sense if we were defining our stores as a straight-forward data-access library. However, we are explicitly defining store interfaces to provide compatibility with the Biggy library. Since “the interface belongs to the client, not the implementation” perhaps we should consider meeting the needs of the client first.

The over-arching premise of Biggy as a library is to provide an in-memory abstraction over a data store, along with all of the things we might expect to do with such a store. When we access data from any source, we expect some version of the basic CRUD functionality to be available.

Different stores may present other capabilities beyond the basic CRUD (the IQueryable returned by Mongo and/or LinqToSql is an example), but the basic ability to Create, Read, Update, and Delete records is something we expect from the context of a Biggy in-memory list. In cases where a certain store does not directly support one or more of the basic CRUD features, we might instead opt to implement the brute-force approach at the store implementation level. In other words, in order to be consumable by Biggy, any store must be compliant with the minimum feature set required by IBiggy<T>.

I propose combining the former IBiggyStore<T> and IUpdateableBiggyStore<T>:

Proposed Store Interface Organization:
public interface IBiggyStore<T> {
    List<T> Load();
    void Clear();
    T Add(T item);
    List<T> Add(List<T> items);
    T Update(T item);
    T Remove(T item);
    List<T> Remove(List<T> items);
}
  
    
public interface IQueryableBiggyStore<T> : IBiggyStore<T> {
    IQueryable<T> AsQueryable();
}
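As a rough illustration of what "brute force at the store level" could look like against the combined interface, here is a sketch of a JSON file store. It is not Biggy's implementation: the BruteForceFileStore name is hypothetical, System.Text.Json is assumed as the serializer, records are assumed to be matched through T.Equals, and for simplicity even Add rewrites the whole file rather than appending.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// The combined store interface proposed above.
public interface IBiggyStore<T> {
    List<T> Load();
    void Clear();
    T Add(T item);
    List<T> Add(List<T> items);
    T Update(T item);
    T Remove(T item);
    List<T> Remove(List<T> items);
}

// Hypothetical file store, not Biggy's own. Every write rewrites the file.
public class BruteForceFileStore<T> : IBiggyStore<T> {
    private readonly string _path;

    public BruteForceFileStore(string path) { _path = path; }

    public List<T> Load() {
        if (!File.Exists(_path)) return new List<T>();
        string json = File.ReadAllText(_path);
        if (string.IsNullOrWhiteSpace(json)) return new List<T>();
        return JsonSerializer.Deserialize<List<T>>(json) ?? new List<T>();
    }

    public void Clear() { Flush(new List<T>()); }

    public T Add(T item) {
        var all = Load();
        all.Add(item);
        Flush(all);
        return item;
    }

    public List<T> Add(List<T> items) {
        var all = Load();
        all.AddRange(items);
        Flush(all);
        return items;
    }

    // Assumes T.Equals identifies "the same record" (for example by key).
    public T Update(T item) {
        var all = Load();
        int index = all.IndexOf(item);
        if (index < 0) throw new InvalidOperationException("Item not found in store.");
        all[index] = item;
        Flush(all);
        return item;
    }

    public T Remove(T item) {
        var all = Load();
        all.Remove(item);
        Flush(all);
        return item;
    }

    public List<T> Remove(List<T> items) {
        var all = Load();
        foreach (var item in items) all.Remove(item);
        Flush(all);
        return items;
    }

    // The brute-force step: serialize the full list and overwrite the file.
    private void Flush(List<T> all) {
        File.WriteAllText(_path, JsonSerializer.Serialize(all));
    }
}
```

The cost of each Update or Remove grows with the size of the file, but the list consuming the store no longer needs to know that.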

We can leave the IQueryableBiggyStore<T> interface as a distinct and separate item, since:

A. Most store options do not directly return an IQueryable anyway.

B. The premise of Biggy is that we are working with data directly in memory, and not querying directly against the backing store. IQueryable<T> is explicitly contrary to this notion, in that you are creating queries which will execute in a deferred manner against the backing store.

C. Including AsQueryable() as part of the basic IBiggy<T> interface implies that it will always be possible to work with the store via an IQueryable instance, when in fact this is not necessarily possible (at least in any practical sense). This might be considered a violation of the Interface Segregation Principle, in theory if not in fact.

While we can take a brute-force approach to implement (for example) an Update() method with a file-based store by writing the entire file contents to disk (including any updated data) then reading the updated file back from disk into memory, we can’t, in a practical sense, write an effective IQueryable<T> provider against a file-based store that isn’t doing exactly what the file-based store already does – read file contents into an in-memory list.
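To see why a file-backed IQueryable<T> buys nothing, consider what the most obvious implementation amounts to. The QueryableSketch helper below is hypothetical; it relies only on the standard Queryable.AsQueryable extension method from System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryableSketch {
    // Queryable.AsQueryable wraps any IEnumerable<T>. The resulting query
    // runs in memory via LINQ to Objects, not against the file itself.
    public static IQueryable<T> OverLoadedItems<T>(List<T> loadedItems) {
        return loadedItems.AsQueryable();
    }
}
```

A call such as QueryableSketch.OverLoadedItems(items).Where(...) filters the list that has already been read from disk, which is exactly what the in-memory BiggyList does anyway.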

This doesn’t mean an IQueryableBiggyStore is not useful, nor that using IQueryable from a BiggyList is not a desirable option. It simply means that doing so may be a case where one should “opt-in” through use of a specialized store, interface, and, potentially, a specialized implementation of IBiggy<T>.

In cases where the IQueryable option is desired, it almost makes sense to create a specialized IBiggy<T> implementation for this purpose, since the usage may differ significantly from the standard implementation.

In keeping with the above, we might then find the following Biggy List interfaces:

Modified IBiggy<T> with Separate IQueryableBiggy Interface:
public interface IBiggy<T> : IEnumerable<T>
{
    void Clear();
    int Count();
    T Update(T item);
    T Remove(T item);
    List<T> Remove(List<T> items);
    T Add(T item);
    List<T> Add(List<T> items);
    bool InMemory { get; set; }
  
    event EventHandler<BiggyEventArgs<T>> ItemRemoved;
    event EventHandler<BiggyEventArgs<T>> ItemAdded;
    event EventHandler<BiggyEventArgs<T>> ItemsAdded;
  
    event EventHandler<BiggyEventArgs<T>> Changed;
    event EventHandler<BiggyEventArgs<T>> Loaded;
    event EventHandler<BiggyEventArgs<T>> Saved;
}
  
  
public interface IQueryableBiggy<T> : IBiggy<T> {
    IQueryable<T> AsQueryable();
}
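With every store guaranteed to support the core CRUD methods, the list implementation no longer needs casts or null checks. The following is a sketch under that assumption, using a trimmed copy of the proposed store interface (bulk overloads, events, and enumeration omitted) and hypothetical FakeStore and BiggyListSketch classes that are not part of Biggy.

```csharp
using System;
using System.Collections.Generic;

// Trimmed copy of the proposed store interface (bulk overloads omitted).
public interface IBiggyStore<T> {
    List<T> Load();
    void Clear();
    T Add(T item);
    T Update(T item);
    T Remove(T item);
}

// Hypothetical in-memory store used only to exercise the list below.
public class FakeStore<T> : IBiggyStore<T> {
    public readonly List<T> Saved = new List<T>();
    public List<T> Load() { return new List<T>(Saved); }
    public void Clear() { Saved.Clear(); }
    public T Add(T item) { Saved.Add(item); return item; }
    public T Update(T item) {
        int index = Saved.IndexOf(item);   // relies on T.Equals to find the record
        if (index >= 0) Saved[index] = item;
        return item;
    }
    public T Remove(T item) { Saved.Remove(item); return item; }
}

// Simplified list: one store field, no casts, no null checks.
public class BiggyListSketch<T> {
    private readonly IBiggyStore<T> _store;
    private readonly List<T> _items;

    public BiggyListSketch(IBiggyStore<T> store) {
        _store = store;
        _items = _store.Load();
    }

    public int Count { get { return _items.Count; } }

    public T Add(T item) {
        _store.Add(item);
        _items.Add(item);
        return item;
    }

    public T Update(T item) {
        _store.Update(item);
        int index = _items.IndexOf(item);
        if (index >= 0) _items[index] = item;
        return item;
    }

    public T Remove(T item) {
        _store.Remove(item);
        _items.Remove(item);
        return item;
    }

    public void Clear() {
        _store.Clear();
        _items.Clear();
    }
}
```

Swapping FakeStore for any other IBiggyStore<T> implementation leaves BiggyListSketch untouched, which is the isolation the interface structure is meant to provide.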

Summing Up

If we were creating our stores and attendant interfaces in the context of designing a straight-forward data-access/querying library, for direct consumption by client code, the original interface structure would likely be the right choice. The interface semantics would correctly describe the functionality each store implementation is capable of providing.

In Biggy, we have created an additional abstraction layer, the in-memory list representation of a store, and we might better define our basic store interface in terms of what a minimal BiggyList implementation will require. Then, we can build out store implementations to meet the needs of the client – in this case, the BiggyList.

I would be fascinated to hear contrary views on this in the comments, or you can email me at the address in the “About the Author” section.

Additional Resources and Items of Interest

C#
Splitting and Merging Pdf Files in C# Using iTextSharp
ASP.Net
ASP.NET MVC: Configuring ASP.NET MVC 4 Membership with a SQL CE Database
CodeProject
Use Postgres JSON Type and Aggregate Functions to Map Relational Data to JSON
  • jatten

    Matt – Terrific feedback thanks! Glad to know I am not alone in my perspective on this. :-)


  • Matt R.

    I think your proposed approach makes more sense – especially keeping in mind Rob's goal: working against the in-memory data – not the backing store. The backing store is secondary. It's just how the memory store is persisted/initially populated.

    The only reason I can really see for BackingStore details leaking into the front-facing IBiggyStore is if you're implementing some sort of synchronization for out-of-process updates to the backing store getting pushed to the in-memory list.

    Otherwise, as someone using the API, I just want to have all of my code be the same, and if I switch backing stores, that should be something I change via config, not via code.

    My $0.02.