The Challenges of Date Scoping in Enterprise Search
I enjoyed this post from Martin White about the single thing that would make enterprise/intranet search better.
He says it’s the ability to specify a date range.
In the enterprise there is a constant need to define a specific date (‘the 2011 corporate social responsibility statement’), a date range (‘projects undertaken in Germany between 2009 and 2011’) or a date representation (‘the Q3 and Q4 sales reports for alternators’).
If you think about it, very few pieces of information in an enterprise are date-agnostic. Virtually everything short of perennials like the mission statement or dress code has (or should have) some date or date range as a critical piece of metadata.
Martin concentrates on some of the technical implementation issues – lack of consistent date parsing and tokenizing, user interface problems, etc. – but my concern is more with the underlying logic: how do you standardize what “date” means in relation to something? At the very least, how do you determine scoping?
For example, the 2011 Annual Report. What date goes on this? January 1, 2011? Or February 7, 2011 – the date it was actually published? 2011 as a year – the date to which it applies? Does there need to be a way to specify a date range – “All of 2011”?
When you search and specify a date range, are you saying “find me things published in this range,” or are you saying “find me things that discuss topics relevant during this range”? If I search for “1942,” do I want things published in 1942, or would a book about World War 2 be relevant? The latter is arguably more valuable, but how do you come up with a governance and standardization framework to get people to add consistent metadata around this?
Indeed, the training challenges alone in getting all publishers in an enterprise to understand and apply the abstract concept of “date relevancy” consistently would be daunting.
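To make the two readings of a date range concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Document` fields (`published`, `covers_start`, `covers_end`), the `search` function, and the two sample items are hypothetical and not taken from any real search product. It stores a publication date and a separate "coverage" range on each item, then lets the searcher say which one the date range applies to.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical record with two distinct kinds of date metadata."""
    title: str
    published: date      # when the item was released
    covers_start: date   # first day of the period the content is about
    covers_end: date     # last day of that period (inclusive)


def search(docs, start, end, scope):
    """Return titles matching the inclusive range [start, end].

    scope="published": the publication date falls inside the range.
    scope="coverage":  the period the item discusses overlaps the range.
    """
    if scope == "published":
        return [d.title for d in docs if start <= d.published <= end]
    if scope == "coverage":
        # Two inclusive ranges overlap when each starts before the other ends.
        return [d.title for d in docs
                if d.covers_start <= end and start <= d.covers_end]
    raise ValueError(f"unknown scope: {scope!r}")


# Made-up sample data, for illustration only.
docs = [
    Document("Wartime newsletter", date(1942, 6, 1),
             date(1942, 5, 1), date(1942, 5, 31)),
    Document("History of World War 2", date(1995, 3, 1),
             date(1939, 9, 1), date(1945, 9, 2)),
]

year_start, year_end = date(1942, 1, 1), date(1942, 12, 31)
print(search(docs, year_start, year_end, "published"))
print(search(docs, year_start, year_end, "coverage"))
```

With this sample data, a "published" search for 1942 finds only the newsletter, while a "coverage" search also finds the history book. The code is the easy part; the sketch only works if every publisher fills in the coverage fields consistently, which is exactly the governance problem.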