One Size Does Not Fit All - The Truth About Data Dictionaries

Jan 07, 2015

Data has the potential to be powerful, valuable, and versatile to an institution. Depending on how it is handled, it also has the potential to look like this:

In modern parlance: a hot mess.

Complexity and volume can be overwhelming when not organized properly – all of the threads can become horribly tangled and knotted together. The desire to throw the whole mess out and start over with a single thread is understandable but ultimately shortsighted and unsustainable.

When an institution is in the process of implementing a Business Intelligence system overhaul, it often decries the lack of a 'comprehensive data dictionary' in its existing environment. And it's true – a data dictionary is an essential tool in effective database management. However, there seems to be a misconception about what a comprehensive data dictionary should really look like, and it pivots on the word comprehensive. There is a tendency to call for a pared-down list of data definitions, as though the current dictionary (if one exists) just needs to be tidied up a bit and the ideal result will be fewer definitions than before.

But that is not necessarily the case. Institutions are complex and their data and reporting needs reflect that. For example, higher education institutions are understandably concerned about student enrollment. However, the definition of 'student enrollment' might vary from department to department or even from report to report within a single department. What constitutes a 'student'? What constitutes 'enrolled'? Are these only degree-seeking students? Credit-seeking? Registered students or only students who are paid in full? Do dual-enrolled students count? All of these? None of these? A truly comprehensive data dictionary would capture all of these variations and more.

Databases can grow unwieldy if not properly maintained, and there is certainly room for some streamlining in the process – getting rid of stale definitions and redundant terms, for example – but it is unrealistic to expect that there would be a single definition for every data element. One size simply does NOT fit all. The desire to simplify is understandable: confusion over data definitions can create real problems. If one department defines student enrollment as those students who are paid in full, another defines it as those who are registered for classes, and still another excludes Continuing Education students, their reports will obviously show entirely different totals. An administrator, an auditor, a member of the public, or any other user, upon viewing these different totals for the same term, would be confused at best and assume the institution is incompetent at worst. This can easily lead to finger-pointing and a cry for ONE definition to rule them all and in the darkness bind them. However, the situation actually calls for an open discussion:

  • Do we need all of these definitions? Are they all still in use or are some of them holdovers from retired reporting requirements?
  • Is the current name actually appropriate for the measure? Should it be renamed to better reflect what it captures?
  • Do we need to create subcategories for this term?
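
To make the enrollment example concrete, here is a minimal Python sketch of what naming each variation separately might look like. Everything in it is an assumption made for illustration: the `Student` fields, the five sample records, and the definition names are hypothetical and do not come from any real student information system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Hypothetical student record; the fields are illustrative only."""
    student_id: int
    registered: bool
    paid_in_full: bool
    continuing_ed: bool


# Invented sample data, not drawn from any real institution.
STUDENTS = [
    Student(1, registered=True, paid_in_full=True, continuing_ed=False),
    Student(2, registered=True, paid_in_full=False, continuing_ed=False),
    Student(3, registered=True, paid_in_full=True, continuing_ed=True),
    Student(4, registered=True, paid_in_full=False, continuing_ed=False),
    Student(5, registered=False, paid_in_full=False, continuing_ed=False),
]

# Each variation gets its own name instead of competing for the
# single label "student enrollment".
ENROLLMENT_DEFINITIONS = {
    "enrollment_registered": lambda s: s.registered,
    "enrollment_paid_in_full": lambda s: s.paid_in_full,
    "enrollment_excluding_continuing_ed": (
        lambda s: s.registered and not s.continuing_ed
    ),
}


def enrollment_totals(students):
    """Count the students that satisfy each named definition."""
    return {
        name: sum(1 for s in students if rule(s))
        for name, rule in ENROLLMENT_DEFINITIONS.items()
    }


totals = enrollment_totals(STUDENTS)
for name, count in totals.items():
    print(f"{name}: {count}")
```

The same five records produce three different totals. None of them is wrong; each simply answers a different question, which is why the name attached to each measure matters as much as the number.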

Technology increasingly allows us to capture and slice data down to the finest grain, finally making it possible to explore specific and intricate questions. Institutions have always had those questions and the tools are at last catching up, which is a wonderful development and full of possibilities. Tools alone cannot realize these possibilities, however. Nor can an approach that fails to recognize an institution's specific context. A one-size-fits-all approach to data dictionaries will fail for the same reason an out-of-the-box reporting solution will fail: they both sell a simplicity that does not exist.

Institutions are complex, and data will and should reflect that, as should our data dictionaries. That means expanding our concept of a 'definition'; it is not just the term that must be defined but also its owner and its usage. Once the discussion above takes place, there is still more work to be done. Who is the user of each definition? Why do they need it (that is, which reports use the definition)? We have to let our data dictionaries reflect the complexity of the data, as frustrating and daunting as that task may be, because you need ALL the data threads to weave a complete tapestry – or, since the raw material is versatile and winter is coming, maybe crochet a nice thick scarf: