After that post, I received some follow-up questions in both online and offline conversations. In fact, you could say that our post lacked all the necessary context, so I wanted to provide a bit more structure around some of FlockData’s views on context.
Let’s take a look at a row in a database from a project that we have been working on with a customer (a public data set, shared with the customer’s permission):
Pretty useless, right? And yet, this is what’s often provided in database exports, CSV exchanges of data, etc.
Some of you might have the needed background information to understand what’s going on here; most of you probably won’t. I certainly didn’t.
What if I add the column names?
Any better? Again, maybe for some. It is only partly better because the column names still don’t provide all the context you might need. Databases are optimized for efficiency and performance, and that includes compromising between fully descriptive column names and short names that take little space and are easy to type in SQL queries. Column names end up somewhere in between: long enough for people who know the application to understand what data each column holds, but no longer.
Let’s illuminate the data a bit so that hopefully everyone can understand what it describes:
- Zip code: 00601
- State: 72 – maps to Puerto Rico
- County: 001 – maps to Adjuntas
- Tract: The specific geographic tract in question
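The decoding step above can be sketched in a few lines of Python. This is only an illustration: the short column names (`zip`, `st`, `cnty`) and the `describe` helper are hypothetical, since the real export's column names are not reproduced here, and the lookup tables contain only the two mappings mentioned in the list. The tract is left out because its value is not given.

```python
# Hypothetical short column names; the real export's names are assumptions here.
raw_row = {"zip": "00601", "st": "72", "cnty": "001"}

# Lookup tables holding only the mappings mentioned in the text.
STATE_NAMES = {"72": "Puerto Rico"}
# County codes are only meaningful together with their state code.
COUNTY_NAMES = {("72", "001"): "Adjuntas"}


def describe(row):
    """Return a new dict with readable names added next to the raw codes."""
    state = row["st"]
    county = row["cnty"]
    return {
        "zip_code": row["zip"],
        "state_code": state,
        "state_name": STATE_NAMES.get(state, "unknown"),
        "county_code": county,
        "county_name": COUNTY_NAMES.get((state, county), "unknown"),
    }


described = describe(raw_row)
print(described)
```

Keeping the original codes alongside the names means nothing is lost: the codes stay usable for joins, while the names make the row readable to people outside the project.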
And let’s add a little bit more information to finalize the description:
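- The data comes from the US census, which explains the tract: census tracts are small geographic subdivisions of a county used for reporting census statistics.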
- The census was performed in 2010
- The data was obtained in January 2015
- The data was obtained via US government open data services
Does that help? Context really enlightens and improves your data.
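One way to keep the source details listed above from getting lost is to store them with the record itself. The following is a minimal sketch under stated assumptions, not how FlockData does it: the `Provenance` class, its field names, and the `with_context` helper are all hypothetical, and the values are taken directly from the list above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Provenance:
    """Hypothetical container for where a record came from."""
    source: str
    survey_year: int
    obtained: str  # year and month the data was retrieved
    channel: str


CENSUS_2010 = Provenance(
    source="US census",
    survey_year=2010,
    obtained="2015-01",
    channel="US government open data services",
)


def with_context(record, provenance):
    """Return a copy of the record with its provenance attached."""
    return {**record, "provenance": asdict(provenance)}


record = {"zip_code": "00601", "state": "Puerto Rico", "county": "Adjuntas"}
enriched = with_context(record, CENSUS_2010)
print(enriched)
```

Because the provenance travels with the record, anyone who receives an export or CSV built from it can still see what the numbers refer to and how current they are.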
Photo credit: http://www.flickr.com/photos/42826854@N00/9574385695