Image by Dee Ashley | Some Rights Reserved
I’ve recently been hard at work contributing to an exciting new open source project, Rob Conery’s Biggy. Biggy is just getting off the ground, so things are evolving rapidly, and getting things working, for the moment, trumps elegance. At this point, we are refining things and evolving the structure and capabilities of the project.
Biggy offers a fresh take on maintaining an in-memory, synchronized representation of your data, along with a hybrid document/relational persistence model.
The engine Biggy currently uses to talk to relational databases is a tuned-up, modified version of Massive. Originally a highly dynamic micro ORM, for Biggy we have added strong support for static types, and customized things so that Biggy can materialize relational data into domain POCO objects in a high-performance way.
Biggy supports multiple database platforms. While currently the focus is on SQL Server and Postgresql, the notion is that ultimately, Biggy should be able to support any platform which implements the correct IBiggyStore
interface (see K. Scott Allen’s recent post positing our upcoming new architecture in this regard).
One of the major problems we needed to solve was the mapping of database objects to domain objects. Yes, this is one of the ever-present issues with ORM design (note that Biggy is NOT an ORM, although it shares some handy features with ORMs).
Mapping Domain Object Properties Using Type Introspection
The original Massive, and Biggy as well, use a wee bit of type introspection to identify object property names. Massive would originally assume that each property would correspond to a like-named database field.
Of course, this is not a safe assumption, especially in cases where one is handed an existing database with names which don’t follow convention. Also, certain Db platforms (Postgresql among them) are are case-sensitive with respect to object names, and expect, by convention, that multi-part names will be separated using underscores.
Consider a domain object Artist
:
And example Artist Class:
public class Artist { public int ArtistId { get; set; } public string LastName { get; set; } public string FirstName { get; set; } }
SQL Server is a fairly forgiving, and makes no distinction between a column named LastName
and a column named lastName
, or a column named lastname
. SQL Server doesn’t care about case. SQL Server will require delimiters for a column named Last Name
, so this would have to be wrapped up like so: [Last Name]
.
Postgres, however, is a different story. Postgres comes from a Unix heritage, where all the things are case-sensitive. Postgres expects database object names to be all lower-case unless you tell it otherwise by using double-quotes as delimiters. On top of that, when you pass raw SQL in which includes mixed-case object names, Postgres “helpfully” down-cases everything for you, unless specific object names are wrapped in delimiters. This is all fine and dandy, until you have actual database columns which include upper-case characters.
In Postgres-land, the object naming convention is to lower-case names, and separate words with an underscore. We expect to see columns such as last_name
and first_name
, and we don’t often see things like LastName
or lastName
. None of which is to say it doesn’t happen, of course. In fact, the default Postgres version of our chosen test database, Chinook, arrives out-of-the-box with proper-cased table and column names. For example, the Artist
table contains the columns ArtistId
and Name
.
NOTE: For those interested, I have created a modified version of the Chinook database with object names which conform to the Postgres convention, and which implements serial primary keys. The version available on Codeplex has neither (though it is a terrific sample database nonetheless!).
If you have an Artist
table in Postgres with the columns ArtistId
and Name
, the following SQL will fail:
This SQL Statement will Fail Against Postgres:
SELECT ArtistId, Name FROM Artist
In order for the above query to work, it would need to be re-written as follows:
This SQL will Work Against the Artist Table, but YUCK!
SELECT "ArtistId", "Name" FROM "Artist"
Lastly, if one is using Postgres platform conventions, dutifully using all-lower-case table and column names, properly separated with underscores, then under the original implementation of Massive and Biggie, we would have to name our domain objects and object properties accordingly:
Example Artist Class With Postgres-Compatible Object and Property Names:
public class artist { public int artist_id { get; set; } public string last_name { get; set; } public string first_name { get; set; } }
For the exceptionally simple artist
class, this may not seem such a big deal. However, with a number of more complex classes, a bunch of property names which include underscores as word separators might become is simply painful. Not to mention, idiomatic C# code utilizes either proper-casing or camel-casing. It is nice to respect the conventions of whatever platform, language or framework one is using, even when they mix and clash.
Solving Two Distinct Problems
As we see in the examples above, we are actually faced with two similar, yet overlapping problems:
- Properly mapping domain objects and properties to the appropriate, corresponding database entity or field.
- Properly handling cases where database object names need to be delimited in a platform-specific manner when serialized into dynamic SQL.
Since Biggy dynamically assembles SQL based on object properties, it was clear that we had a few choices to make. There were two potential approaches to this problem:
The “Opinionated” Approach
Decide which conventions Biggy will support, and encourage users to use these as standard defaults. Provide a mechanism to override the defaults, but throw until they either explicitly apply either our default convention, or the override mechanism. The “Rails” approach, if you will.
The “Kitchen Sink” approach
Assume that client code will sensibly use a few accepted conventions, and try to map those first. If this fails, then attempt to resolve the mapping through progressive permutation. If that fails, then look for an override mechanism. Fail only if we are unable to find a match using magic. More like the ASP.NET MVC approach, if you will.
I was surprised to find Rob was in favor of the second, “Kitchen Sink” approach. I fully expected him to favor a more opinionated policy. In any case, it was decided we would do everything we reasonably could to resolve column names to object properties.
It was decided Biggy would attempt to resolve name mappings along the following lines:
- See if there is a direct match between the column name and the property name. If so, we’re done – map it.
- See if, by ignoring case and/or stripping spaces and underscores out of the Db Column name, a match could be determined. If so, use it.
- If neither of the above successfully produces a match, look for a custom attribute on the property in question which specifies the Db Column to map to.
- Throw an informative exception indicating the source of the problem.
It would have been easy to decide that seeking out a custom attribute would be the first thing to do, and under what I consider the “opinionated” approach, this would have been the next course of action failing an initial match. However, in one of my favorite observations this year, Rob pointed out that using custom attributes in such a manner is distasteful because “It throws the DB stuff right into your model and it’s the ultimate push of the database right through the abstraction.”
“I really dislike attributes because it throws the DB stuff right into your model and it’s the ultimate push of the database right through the abstraction.”
-Rob Conery
I couldn’t agree more. And I wish *I* had said that.
Get the Schema from the Source – Information Schema
All that said, at some level, the code needs to know some things about the database in order to use it. Just not at the domain model level.
Since Biggy is purpose-built to maintain an in-memory representation of application data and sync to the store as needed (on writes and updates), I figured, why not go straight to the source for a canonical schema against which to compare property names to columns. To my way of thinking, standard relational databases already provide access to this canon for schema information – the ASNI Standard collection of Views known as INFORMATION_SCHEMA.
At some level, Biggy needs to understand and know about the database schema in order to do its job. We can query Information_Schema, and then cache the result at the lowest level possible. That being the case, we introduced some new class structures to the Biggy code base.
A Column Mapping Class
At the lowest level, I wanted to be able to represent everything Biggy would need to know about a database column and how it maps to its associated object property. Initially, I created the DBColumnMapping
class, as follows:
The DBColumnMapping class:
public class DbColumnMapping { string _delimeterFormatString; public DbColumnMapping(string delimiterFormatString) { _delimeterFormatString = delimiterFormatString; this.IsAutoIncementing = false; this.IsPrimaryKey = false; } public bool IsAutoIncementing { get; set; } public bool IsPrimaryKey { get; set; } public Type DataType { get; set; } public string TableName { get; set; } public string ColumnName { get; set; } public string PropertyName { get; set; } public string DelimitedColumnName { get { return string.Format(_delimeterFormatString, this.ColumnName); } } }
Notice that the database platform delimiter is passed in as a format string. For example, in the case of SQL Server, the format string would look like this
SQL Server Delimiter Format String:
"[{0}]"
As you can see from the DBColumnMapping
code, the DelimitedColumnName
property uses the string.Format()
method to wrap the actual column name in the delimiter characters specific to the platform before returning it to the client code.
The two properties related to Primary key status are set to sensible default values. Most columns are NOT primary keys, and most columns are NOT auto-incrementing.
Next, I wanted a way to easily retrieve the column name for a specific property, or the property name from a specific column. To this end, I created the DBColumnMappingLookup
class. I composed this around dual Dictionaries which contain references to the same set of DBColumnMapping
objects, but which refer to them using different sets of keys. One is keyed with the property name associated with each DBColumnMapping
object, and the other uses the column name as the key. In both cases, the actual DBColumnMapping
instance contains all the information we might need about the association, once retrieved.
The DBColumnMappingLookup Class
public class DbColumnMappingLookup { Dictionary<string, DbColumnMapping> ByProperty; Dictionary<string, DbColumnMapping> ByColumn; string _delimiterFormatString; public DbColumnMappingLookup(string NameDelimiterFormatString) { _delimiterFormatString = NameDelimiterFormatString; this.ByProperty = new Dictionary<string, DbColumnMapping> (StringComparer.InvariantCultureIgnoreCase); this.ByColumn = new Dictionary<string, DbColumnMapping> (StringComparer.InvariantCultureIgnoreCase); } public int Count() { return this.ByProperty.Count(); } public DbColumnMapping Add(string columnName, string propertyName) { string delimited = string.Format(_delimiterFormatString, columnName); var mapping = new DbColumnMapping(_delimiterFormatString); mapping.ColumnName = columnName; mapping.PropertyName = propertyName; // add the same instance to both dictionaries: this.ByColumn.Add(mapping.ColumnName, mapping); this.ByProperty.Add(mapping.PropertyName, mapping); return mapping; } public DbColumnMapping Add(DbColumnMapping mapping) { this.ByColumn.Add(mapping.ColumnName, mapping); this.ByProperty.Add(mapping.PropertyName, mapping); return mapping; } public DbColumnMapping FindByColumn(string columnName) { DbColumnMapping mapping; this.ByColumn.TryGetValue(columnName, out mapping); return mapping; } public DbColumnMapping FindByProperty(string propertyName) { DbColumnMapping mapping; this.ByProperty.TryGetValue(propertyName, out mapping); return mapping; } public bool ContainsPropertyName(string propertyName) { return this.ByProperty.ContainsKey(propertyName); } public bool ContainsColumnName(string columnName) { return this.ByColumn.ContainsKey(columnName); } }
The Add()
method adds each new DBColumn
instance to both dictionaries, using the appropriate property value as the key. There is also an overridden Add()
method by which a new DBColumn
can be added implicitly by passing a column name and a property name as arguments.
Given the above, now, if I have a particular column name and I need to find the matching property, I can simply pass the column name as the argument to the FindByColumn()
method and the DBColumn Object will be returned, from which I can grab the associated property name (and any other column details I need).
Likewise, I can test to see if either a specific property, or a specific column is represented in the lookup, by passing the appropriate string to the ContainsColumnName()
or ContainsPropertyName()
methods.
But wait, john – what if the same column or property name exists for more than one table or object?
Well, that’s where the DBTableMapping
class comes in.
Mapping Columns to Tables, and Tables to Domain Types
The final piece in the puzzle, the DBTableMapping
class gathers all of the columns together for each table, as well as performing similar name mapping function for tables and domain objects.
The DBTableMapping Class:
public class DBTableMapping { string _delimiterFormatString; public DBTableMapping(string delimiterFormatString) { _delimiterFormatString = delimiterFormatString; this.ColumnMappings = new DbColumnMappingLookup(_delimiterFormatString); this.PrimaryKeyMapping = new List<DbColumnMapping>(); } public string DBTableName { get; set; } public string MappedTypeName { get; set; } public string DelimitedTableName { get { return string.Format(_delimiterFormatString, this.DBTableName); } } public List<DbColumnMapping> PrimaryKeyMapping { get; set; } public DbColumnMappingLookup ColumnMappings { get; set; } }
Once again, we initialize DBTableMapping
with the database platform delimiter format string. Also not that the DBTableMapping
contains a List<DBColumnMapping>
specifically to hold references to the primary key(s) for the table in question. While not common, composite PKs do occur in the wild.
Cache Schema Information and Inject into the Store
Coming down the pike very quickly is a revamped architecture for Biggy. We will see the in-memory list functionality cleanly separated from the backing store implementation. At the same time, we might add a “DBCache” object which would be (optionally) injected into the store. The DBCache object would allow a single round trip to the database to retrieve schema info, into the structures described above, which could be passed by reference to store instances at initialization.
The store itself would be concerned with managing the serialization and deserialization of data between the database, the domain objects consumed by Biggy, and your application code. Sitting on top of the store would be an instance of IBiggy
itself. The concrete implementation of an IBiggy
would be concerned only with managing the in-memory data, and calling into the store to retrieve, write, or synchronize data.
All that said, the DBCache
would be the ideal place to cache all of our database schema info. With two quick pokes at Information_Schema at initialization, we can grab everything we need. Let’s take a look at how this would work in a SQL Server implementation.
First, the cache (I’ve omitted non-relevant code to focus on retrieving and loading our column mapping data, and I’ve simplified things a bit). The cache consists of a base class, (in our case, the DBCache
), and a subclass specific to the database platform:
The BiggyRelationalContext Class (Simplified):
public abstract class DBCache { public abstract string DbDelimiterFormatString { get; } protected abstract void LoadDbColumnsList(); protected abstract void LoadDbTableNames(); public string ConnectionString { get; set; } public List<DbColumnMapping> DbColumnsList { get; set; } public List<string> DbTableNames { get; set; } public DBCache(string connectionStringName) { ConnectionString = ConfigurationManager .ConnectionStrings[connectionStringName].ConnectionString; this.LoadSchemaInfo(); } public void LoadSchemaInfo() { this.LoadDbTableNames(); this.LoadDbColumnsList(); } // ... A bunch of other code here unrelated to mapping }
In the above, you can see we have a List<DBColumnMapping>
and a List<string>
representing database columns and table names, respectively. We are going to use the call to LoadSchemaInfo()
to pull in the table names, and the column info from the database using Information_Schema. However, LoadSchemaInfo()
in turn calls two abstract methods which are defined on our subclass. In this case, SQLServerCache
:
The SQLSeverCache Subclass of DBCache:
public class SQLServerCache : DbCache { public SQLServerCache(string connectionStringName) : base(connectionStringName) { } public override string DbDelimiterFormatString { get { return "[{0}]"; } } public override DbConnection OpenConnection() { var conn = new SqlConnection(this.ConnectionString); conn.Open(); return conn; } protected override void LoadDbColumnsList() { this.DbColumnsList = new List<DbColumnMapping>(); var sql = "" + "SELECT c.TABLE_NAME, c.COLUMN_NAME, " + " CASE tc.CONSTRAINT_TYPE WHEN 'PRIMARY KEY' THEN CAST(1 AS BIt) ELSE CAST(0 AS Bit) END AS IsPrimaryKey, " + " CASE (COLUMNPROPERTY(object_id(tc.TABLE_NAME), kcu.COLUMN_NAME, 'IsIdentity')) WHEN 1 THEN CAST(1 AS Bit) ELSE CAST(0 AS Bit) END as IsAuto " + "FROM INFORMATION_SCHEMA.COLUMNS c " + "LEFT OUTER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu " + "ON c.TABLE_SCHEMA = kcu.CONSTRAINT_SCHEMA AND c.TABLE_NAME = kcu.TABLE_NAME AND c.COLUMN_NAME = kcu.COLUMN_NAME " + "LEFT OUTER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc " + "ON kcu.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA AND kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME"; using (var conn = this.OpenConnection()) { using (var cmd = conn.CreateCommand()) { cmd.CommandText = sql; var dr = cmd.ExecuteReader(); while (dr.Read()) { var clm = dr["COLUMN_NAME"] as string; var newColumnMapping = new DbColumnMapping(this.DbDelimiterFormatString) { TableName = dr["TABLE_NAME"] as string, ColumnName = clm, PropertyName = clm, IsPrimaryKey = (bool)dr["IsPrimaryKey"], IsAutoIncementing = (bool)dr["IsAuto"] }; this.DbColumnsList.Add(newColumnMapping); } } } } protected override void LoadDbTableNames() { this.DbTableNames = new List<string>(); var sql = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo'"; using (var conn = this.OpenConnection()) { using (var cmd = conn.CreateCommand()) { cmd.CommandText = sql; var dr = cmd.ExecuteReader(); while (dr.Read()) { this.DbTableNames.Add(dr.GetString(0)); } } } } }
In the subclass SQLServerCache
, we find the implementation for LoadDBTableNames()
and LoadDbColumnsList()
, which fetch the table name and column info we need, respectively. That big, ugly blob of raw SQL retrieves all the columns in our database, and includes all the information required for each to properly populate a DBColumnMapping
object. Except, of course, the mapped property name. That’s next. For now, we just fill it in with the column name. We will overwrite it later as appropriate.
Output from Big Ugly SQL from Information_Schema:
Mapping Table and Column Data at the Table Level
Now, after all that ceremony, we get to the heart of the matter. We will add a method to the base class DBCache
which accepts a type argument <T>, and attempts to map that type to a database table, and then maps the table columns to the type properties.
Ready for a monolithic wall of code? Good! There are some potential refactorings in here, but for the most part, we have a lot of layered type introspection going in, and for now it made as much (more) sense to leave it all together. Here it is, I give you, the getTableMappingForT()
method:
The GetTableMappingForT Method:
public virtual DBTableMapping getTableMappingFor<T>() where T : new() { // This is what we want to return to the caller: var tableMapping = new DBTableMapping(this.DbDelimiterFormatString); // Always cache the results of reflection-based calls where possible: var item = new T(); var itemType = item.GetType(); var properties = itemType.GetProperties(); string replaceString = "[^a-zA-Z1-9]"; var rgx = new Regex(replaceString); // Get a down-cased version of the type name string flattenedItemTypeName = rgx.Replace(itemType.Name.ToLower(), ""); // Get a pluralized version, in case the table is pluralized: string plural = Inflector.Inflector .Pluralize(flattenedItemTypeName); // Does the fully-lower-case type name match any fully-lower-case table name? var dbTableName = this.DbTableNames .FirstOrDefault(t => rgx.Replace(t.ToLower(), "") == flattenedItemTypeName); if (dbTableName == null) { // If not, does the plural match any table name? dbTableName = this.DbTableNames .FirstOrDefault(t => rgx.Replace(t.ToLower(), "") == plural); } else { // Is the type decorated with a DBTableName Attribute? var tableNameAttribute = itemType.GetCustomAttributes(false) .FirstOrDefault(a => a.GetType() == typeof(DbTableAttribute)) as DbTableAttribute; if (tableNameAttribute != null) { dbTableName = tableNameAttribute.Name; } } tableMapping.DBTableName = dbTableName; tableMapping.MappedTypeName = itemType.Name; // Find the column info for this table in the local DBColumnsList: var dbColumnInfo = from c in this.DbColumnsList where c.TableName == dbTableName select c; foreach (var property in properties) { // Downcase the property name string flattenedPropertyName = rgx.Replace(property.Name.ToLower(), ""); // does the down-cased property name match any donw-cased column name? DbColumnMapping columnMapping = dbColumnInfo .FirstOrDefault(c => rgx.Replace(c.ColumnName.ToLower(), "") == flattenedPropertyName); if (columnMapping != null) { // use it as-is columnMapping.PropertyName = property.Name; columnMapping.DataType = itemType; } else { // Look for a custom column name attribute: DbColumnAttribute mappedColumnAttribute = null; var attribute = property.GetCustomAttributes(false) .FirstOrDefault(a => a.GetType() == typeof(DbColumnAttribute)); if (attribute != null) { // Use the column name found in the attribute: mappedColumnAttribute = attribute as DbColumnAttribute; string matchColumnName = mappedColumnAttribute.Name; columnMapping = dbColumnInfo .FirstOrDefault(c => c.ColumnName == matchColumnName); columnMapping.PropertyName = property.Name; columnMapping.DataType = itemType; } } if (columnMapping != null) { // Add the column mapping instance to the // DBTableMapping.ColumnMappings list: tableMapping.ColumnMappings.Add(columnMapping); if (columnMapping.IsPrimaryKey) { // Add the DBColumnMapping Instance to // the list of primary key columns for the current table: tableMapping.PrimaryKeyMapping.Add(columnMapping); } } } return tableMapping; }
The getTableMappingForT
method above returns a complete mapping of the table represented by the type <T> to the caller (in this case, an instance of SQLServerStore
). The Store can then use the mapping to rapidly retrieve the proper database object mappings for various domain objects as needed to build SQL on-the fly, serialize/de-serialize objects, and ensure that SQL pushed into the database is properly delimited by default.
Custom Attributes as a Last Resort
Notice in the above code there is a point where, while attempting to match a table name, if we have not found a match by any other means, we look for any custom attributes of type DBTableAttribute
. Likewise when trying to match column names, we eventually end up looking for a DBColumnAttribute
.
In other words, when all else fails, we look to see if the user has decorated a class or property with a custom attribute which provides a name mapping. In reality, the down-cases name comparison, the pluralized comparison, and the regex which allows comparison with any odd characters stripped out, we have likely covered 95% of the name mismatch cases.
But sometimes you may need to map a property to a column name which, for whatever reason, is not going to match. in this case, you can break out the EF style Custom attributes we added for just this purpose.
Custom Attributes for when Nothing Else Works:
public class DbColumnAttribute : Attribute { public string Name { get; protected set; } public DbColumnAttribute(string name) { this.Name = name; } } public class DbTableAttribute : Attribute { public string Name { get; protected set; } public DbTableAttribute(string name) { this.Name = name; } }
Putting it All Together
Well, gee whiz, John, that was a long rambling tour. Why do I care?
Well, now Biggy can provide automatic table and column mapping out of the box, 95% of the time. In those edge cases where Biggy can’t automagically figure out how to map database objects to domain objects, you can always add some custom attributes to make it work.
For example, something which happens frequently under the hood is that Biggy needs to de-serialize data incoming from the backing store into a proper domain object. If we take a look at some code from the BiggyRelationalStore
class, we can see that after initialization, the store reaches into the injected context to retrieve a DBTableMapping
instance.
Code Excepted from Base Class BiggieRelationalStore:
// Constructor for Base Class: public BiggyRelationalStore(DbCache dbCache) { this.Cache = dbCache; this.tableMapping = this.getTableMappingForT(); // Is there an auto PK? If so, set the member variable: if(this.tableMapping.PrimaryKeyMapping.Count == 1) { if(this.tableMapping.PrimaryKeyMapping[0].IsAutoIncementing) { this.PrimaryKeyMapping = this.tableMapping.PrimaryKeyMapping[0]; } } } public virtual DBTableMapping getTableMappingForT() { return this.Cache.getTableMappingFor<T>(); } // A bunch of other code... internal T MapReaderToObject<T>(IDataReader reader) where T : new() { var item = new T(); var props = item.GetType().GetProperties(); foreach (var property in props) { if (this.PropertyColumnMappings.ContainsPropertyName(property.Name)) { string mappedColumn = this.PropertyColumnMappings .FindByProperty(property.Name).ColumnName; int ordinal = reader.GetOrdinal(mappedColumn); var val = reader.GetValue(ordinal); if (val.GetType() != typeof(DBNull)) { property.SetValue(item, reader.GetValue(ordinal)); } } } return item; }
After initialization, the DBTableMappin
g instance is available locally, so that methods such as MapReaderToObject
can map object properties to database columns, retrieve the data from a DataReader, and hydrate a domain object instance.
Explore the Code
Much of the code you see above is in the final stages of development, but has not yet been pushed to the master branch of the Github repomaster branch of the Github repo. I expect it will be soon. In the meantime, I strongly encourage you to pull down the current code and see what we’ve got happening. The project is young, and it will be evolving rapidly in the weeks to come. There is a lot to see up there, and I hope you both find something useful, and also shoot us some feedback. Especially bug reports!
Additional Resources and Items of Interest
- Biggy: A Fresh Look at High-Performance, Synchronized In-Memory Persistence for .NET
- Biggy Basics, Part 1: Documents
- Biggy Basics Part 2: Getting Relational
- ASP.NET MVC 5 Identity: Implementing Group-Based Permissions Management
- ASP.NET MVC 5 Identity: Extending and Modifying Roles
- Git: Combine and Organize Messy Commits Using Interactive Rebase
Comments