Wednesday, 07 September 2011

Has de-normalisation had its day?

Ever since the relational database became king there has been a mantra in IT and information design: de-normalisation is critical to the effective use of information in both transactional and, particularly, analytical systems.  The reason for de-normalisation is read performance in relational models.  De-normalisation always increases complexity over the business information model, and it is done for performance reasons alone.
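As a rough illustration of that trade-off (the structures and names below are mine, purely for the sketch, not from any particular system): the normalised shape mirrors the business view, while the de-normalised copy duplicates customer fields onto every order row just to save a join on read.

```python
# Normalised: one customer record, orders reference it by key.
customers = {
    "C1": {"name": "Acme Ltd", "country": "UK"},
}
orders = [
    {"order_id": "O1", "customer_id": "C1", "total": 120.0},
    {"order_id": "O2", "customer_id": "C1", "total": 80.0},
]

# De-normalised: customer fields copied onto every order row.
# One fewer lookup on read, but the business model is obscured and
# every copy must be kept in sync when the customer changes.
orders_denormalised = [
    {"order_id": "O1", "customer_name": "Acme Ltd", "country": "UK", "total": 120.0},
    {"order_id": "O2", "customer_name": "Acme Ltd", "country": "UK", "total": 80.0},
]

# Reading "orders with customer name" from the normalised form needs a lookup...
joined = [
    {**o, "customer_name": customers[o["customer_id"]]["name"]} for o in orders
]
# ...which is exactly the read cost de-normalisation tries to avoid.
print(joined)
```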

But do we need that anymore?  For three reasons I think the answer is, if not already no, then rapidly becoming no.  The first is the evolution of information itself and the addition of caching technologies.  De-normalisation's performance creed is becoming less and less viable in a world where it's actually the middle tier that drives read performance via caching and the OO or hierarchical structures those caches normally take.  This also matters because the usage of information changes, and so the previous optimisation becomes a limitation when a new set of requirements comes along.  Email addresses were often added, for performance reasons, as child records rather than using a proper "POLE" model.  That was great... until email became a primary channel.  So as new information types are added, the focus on short-term performance optimisations causes issues down the road directly because of de-normalisation.
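Here is a rough sketch of that email example (the structures and function names are hypothetical, not from any specific product).  In the de-normalised shape, email is baked onto the customer record; in the more normalised "party / contact point" shape, email is just one channel among many, so adding a new channel later needs no redesign, and the middle tier can still serve reads quickly by caching the assembled hierarchical object.

```python
# De-normalised: email baked in as a field on the customer record.
customer_denorm = {"id": "C1", "name": "Acme Ltd", "email": "info@example.com"}

# Normalised: contact points are their own records, typed by channel.
parties = {"C1": {"id": "C1", "name": "Acme Ltd"}}
contact_points = [
    {"party_id": "C1", "channel": "email", "value": "info@example.com"},
    {"party_id": "C1", "channel": "sms", "value": "+44 7700 900000"},  # added later, no schema change
]

# Read performance is won in the middle tier: assemble the hierarchical
# view once, then serve subsequent reads from the cache.
cache = {}

def get_party_view(party_id):
    if party_id not in cache:
        cache[party_id] = {
            **parties[party_id],
            "contacts": [c for c in contact_points if c["party_id"] == party_id],
        }
    return cache[party_id]

print(get_party_view("C1"))
```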

The second reason is Big Data taking over in the analytical space.  Relational models are getting bigger, but so are approaches such as Hadoop, which encourage you to split the work up to enable independent processing.  I'd argue this suits a 'normalised', or as I like to think of it 'understandable', approach for two reasons.  Firstly, the big challenge is often how to break the problem, the analytics, down into individual elements, and that is easier to do when you have a simple-to-understand model.  Secondly, groupings done for relational performance don't make sense if you are not using a relational approach to Big Data.
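A minimal map/reduce-style sketch of "split the work up" (plain Python for illustration, not actual Hadoop code, and the record fields are invented): each normalised record can be processed independently, which is what makes it easy to spread across workers; the groupings done for relational performance add nothing here.

```python
from collections import defaultdict

orders = [
    {"order_id": "O1", "customer_id": "C1", "country": "UK", "total": 120.0},
    {"order_id": "O2", "customer_id": "C2", "country": "FR", "total": 80.0},
    {"order_id": "O3", "customer_id": "C1", "country": "UK", "total": 40.0},
]

# Map: each record is processed on its own, anywhere.
def map_order(order):
    return (order["country"], order["total"])

# Reduce: partial results are combined per key.
def reduce_totals(pairs):
    totals = defaultdict(float)
    for country, total in pairs:
        totals[country] += total
    return dict(totals)

print(reduce_totals(map(map_order, orders)))  # {'UK': 160.0, 'FR': 80.0}
```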

The final reason is flexibility.  De-normalisation optimises information for a specific purpose, which was great if you knew exactly what transactions or analytical questions would be asked, but it is proving less and less viable in a world where we are seeing ever more complex and dynamic ways of interacting with that information.  Having a database schema optimised for a specific purpose makes no sense when the questions being asked within analytics change constantly.  This is different from information evolution, which is about new information being added; this is about the changing consumption of the same information.  The two are certainly linked, but I think it's worth viewing them separately.  The first says that de-normalisation is a bad strategy in a world where new information sources arrive all the time; the latter says it's a bad idea if you want to use your current information in multiple ways.

In a world where Moore's Law, Big Data, Hadoop, columnar databases etc. are all in play, isn't it time to start from the assumption that you don't de-normalise, and instead model information from a business perspective and then realise that business model as closely as possible within IT?  Doing this will save you money as new sources become available and as new uses for information are discovered or required, and because in many cases a relational model is no longer appropriate.

Let's have information stored in the way that makes sense to the business so it can evolve as the business needs, rather than constraining the business for the want of a few SSDs and CPUs.
