At Salsify, we manage the flow of product information between manufacturers and distributors through to large retail outlets such as Walmart and Google Shopping. Our flexible schema allows users to describe their products however they see fit. We store product attributes in a single table, but each product attribute can have a different data type and its own application-specific logic.
When we first introduced the concept of product attributes, we only supported text and picklist data types. The initial implementation lived in a single ActiveRecord model class. As we added data types, especially ones with unique functionality such as images, we quickly realized that the "one size fits all" attribute model was not going to scale. A common way to model a hierarchy of classes with varying application logic is single table inheritance (STI).
STI typically comes with a handful of caveats and gotchas, but there are still cases, like our attributes table, that fit its intended use case perfectly. Rails' STI implementation relies on a column (type by default) that stores the class name of each record's subclass as a string. Our product attribute data model did not store a class name anywhere, so it looked like we were resigned to writing a migration to support the move to single table inheritance.
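For contrast, here is a rough, framework-free approximation of what classic STI does when it loads a row. Rails resolves the name through its own internal helpers, so treat this only as an illustration; the instantiate method and the TextProductAttribute class are hypothetical stand-ins. The point is that the persisted value is a Ruby constant name, which couples stored data to code.

```ruby
# Framework-free illustration of classic STI: the "type" column stores a
# class name, which is turned back into a constant when a row is loaded.
class ProductAttribute
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Roughly what STI does on load: resolve the stored class name, falling
  # back to the base class when the column is empty.
  def self.instantiate(row)
    type_name = row["type"].to_s
    klass = type_name.empty? ? self : Object.const_get(type_name)
    # Reject constants that are not part of this hierarchy.
    raise ArgumentError, "#{klass} is not a #{self}" unless klass <= self

    klass.new(row)
  end
end

class TextProductAttribute < ProductAttribute; end
class ImageProductAttribute < ProductAttribute; end

# The persisted row carries a Ruby class name: code details live in the data.
row = { "id" => 1, "type" => "ImageProductAttribute" }
p ProductAttribute.instantiate(row).class # ImageProductAttribute
```

Renaming ImageProductAttribute in this scheme would break every stored row that still carries the old name, which is why the choice of what to store matters.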
How do we most effectively introduce a hierarchy of model classes to our largest and most widely used table?
Migrations are usually simple to write and only a little more complicated to deploy, but we like to weigh our options whenever one looks necessary. Here are the approaches we came up with:
- Introduce another column to store the subclass:
  - Requires a migration to add and populate the column on our largest table.
  - Requires error-prone application logic to keep the data type and class type in sync.
  - Stores code details (class names) in the database, an undesirable coupling.
- Rewrite application code to store the class name in the data type column:
  - Requires migrating stored data types to class names, with the same undesirable code-database coupling.
  - Requires rewriting a fair amount of metaprogramming that relies on the data type (or adding application code to convert a class back to a data type).
- Leverage the fact that our data type column already contains all the information needed to determine a subclass, and skip Rails STI:
  - Reinventing framework concepts can be error-prone and may add friction to future upgrades.
  - Can be done without any changes to the data model.
  - Does not require us to store or maintain code details in the database.
After some thought, we decided to pursue the third option. Adding a complete class hierarchy without changing the data model (or rewriting a bunch of application logic) was very appealing. Plus, it wouldn't require downtime or tricky online data migrations!
The concept is simple: specify a model attribute that determines the desired instance type, and hook into Rails model instantiation to make sure we build the appropriate type. The ActiveRecord::Inheritance module (active_record/inheritance.rb) provided most of the interfaces we needed (discriminate_class_for_record, subclass_from_attributes), and after a little work we ended up with a general-purpose concern, which we call discriminable model.
In practice, it ends up looking something like this:
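A minimal sketch of what such a concern could look like, written in plain Ruby so it runs without Rails. The module name DiscriminableModel, the discriminate macro and the instantiate entry point are hypothetical; in a real ActiveRecord model the override would go on the private discriminate_class_for_record class method, which ActiveRecord calls with the loaded attributes.

```ruby
# Hypothetical, framework-free sketch of a "discriminable model" concern.
# In ActiveRecord the override would target the private class method
# discriminate_class_for_record; here a plain instantiate method calls it so
# the example runs without Rails.
module DiscriminableModel
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare the column that selects the subclass and the block that maps
    # that column's value to a class.
    def discriminate(column, &block)
      @discriminator_column = column.to_s
      @discriminator = block
    end

    # Stand-in for loading a row: pick the class first, then build it.
    def instantiate(row)
      discriminate_class_for_record(row).new(row)
    end

    private

    # Call the configured block with the discriminator column's value,
    # falling back to the base class for unmapped values.
    def discriminate_class_for_record(row)
      return self unless @discriminator

      klass = @discriminator.call(row[@discriminator_column]) || self
      raise ArgumentError, "#{klass} is not a #{self}" unless klass <= self

      klass
    end
  end

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class ProductAttribute
  include DiscriminableModel

  # The existing data_type column already identifies the subclass.
  discriminate :data_type do |data_type|
    case data_type
    when "image" then ImageProductAttribute
    when "text" then TextProductAttribute
    end
  end
end

class TextProductAttribute < ProductAttribute; end
class ImageProductAttribute < ProductAttribute; end

rows = [
  { "id" => 1, "data_type" => "image" },
  { "id" => 2, "data_type" => "text" },
  { "id" => 3, "data_type" => "date" }
]
p rows.map { |row| ProductAttribute.instantiate(row).class }
# [ImageProductAttribute, TextProductAttribute, ProductAttribute]
```

Unmapped values such as "date" fall back to the base class in this sketch; a stricter variant could raise instead.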
After Rails loads the data from the database, it runs a series of hooks, one of which lives in the inheritance module mentioned above. Our concern overrides discriminate_class_for_record and calls the configured discriminator with the value of the defined type column. The discriminator can be any block that takes a single argument; in our case we discriminate on a data type such as 'image' and return a class, e.g. ImageProductAttribute. Our implementation (linked below) includes a couple of other features you might expect, such as being able to look up only image attributes by querying through the subclass.
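Because the discriminator is an arbitrary block, a subclass query needs some way to know which stored values map to it. One hypothetical approach, assuming the set of data types is finite and known up front, is to run the discriminator over every known value and keep the ones that resolve to the subclass; the resulting list could feed an IN condition through ActiveRecord's where. This is a sketch, not necessarily how discriminable model does it, and KNOWN_DATA_TYPES, DISCRIMINATOR and the helper methods are illustrative names.

```ruby
# Hypothetical sketch: scope a query to one subclass by inverting the
# discriminator over a known, finite set of data types.
class ProductAttribute; end
class TextProductAttribute < ProductAttribute; end
class ImageProductAttribute < ProductAttribute; end

# Assumption: every value that can appear in data_type is listed here.
KNOWN_DATA_TYPES = %w[text image picklist date].freeze

# Same shape as the discriminator block: data type in, class out.
DISCRIMINATOR = lambda do |data_type|
  case data_type
  when "image" then ImageProductAttribute
  when "text" then TextProductAttribute
  else ProductAttribute
  end
end

# Data types whose rows would load as klass or one of its subclasses.
def data_types_for(klass)
  KNOWN_DATA_TYPES.select { |data_type| DISCRIMINATOR.call(data_type) <= klass }
end

# Conditions a subclass scope could hand to ActiveRecord's where, which
# turns an array value into an IN (...) clause.
def scope_conditions_for(klass)
  { "data_type" => data_types_for(klass) }
end

rows = [
  { "id" => 1, "data_type" => "image" },
  { "id" => 2, "data_type" => "text" },
  { "id" => 3, "data_type" => "image" }
]

# In-memory stand-in for running the scoped query.
allowed = scope_conditions_for(ImageProductAttribute)["data_type"]
p rows.select { |row| allowed.include?(row["data_type"]) }.map { |row| row["id"] } # [1, 3]
```

Querying through the base class matches every known data type, which mirrors how an STI base-class query is unscoped.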
We've been using discriminable model for a couple of months now and have noticed some additional capabilities that we didn't initially anticipate.
- Moving or renaming a model is less error-prone and does not require a migration, just an update to the discriminator.
- Improved agility when prototyping, since we can quickly add a discriminated subclass to a model without changing the data model. Want a separate type for records that do or don't store an ID for an association? You can do that!
- More efficient indices on discriminator columns: an integer enum is more compact and faster to compare than stringified class names.
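To make the rename and index points concrete, here is a hypothetical sketch in which the data_type column stores an integer enum code and a single lookup table maps codes to classes. The codes, the TextAttribute rename and the constant names are illustrative assumptions, not part of the actual implementation.

```ruby
# Hypothetical sketch: the discriminator column holds a small integer code
# (an enum) rather than a class name, and one lookup table maps codes to
# classes. Renaming a class changes only this table, never stored rows.
class ProductAttribute; end
class TextAttribute < ProductAttribute; end # hypothetical rename of TextProductAttribute
class ImageProductAttribute < ProductAttribute; end

# Assumed integer codes persisted in the data_type column.
DATA_TYPE_CODES = { "text" => 0, "image" => 1, "picklist" => 2 }.freeze

# Code -> class lookup; codes without an entry fall back to the base class.
CLASS_FOR_CODE = {
  DATA_TYPE_CODES.fetch("text") => TextAttribute,
  DATA_TYPE_CODES.fetch("image") => ImageProductAttribute
}.freeze

DISCRIMINATOR = ->(code) { CLASS_FOR_CODE.fetch(code, ProductAttribute) }

stored_row = { "id" => 7, "data_type" => DATA_TYPE_CODES.fetch("image") }
p DISCRIMINATOR.call(stored_row["data_type"]) # ImageProductAttribute

# For comparison, classic STI would store this string in every such row.
p "ImageProductAttribute".bytesize # 21
```

The rename from TextProductAttribute to TextAttribute touched one hash entry, while the stored code 0 stayed the same.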
To date, we support nine data types, including images, text, picklists, dates, numerics and HTML. Our data type-specific logic is properly encapsulated in subclasses, our database is free from any knowledge of our code, and all of this was accomplished without any changes to our data model. We want discriminable model to get some more burn-in before we turn it into a gem, but if you'd like to take a look at the source, we'd be happy to hear your thoughts!