Dates are a pain
Dates are a pain.
A number of months ago I created a demonstration library catalog using MyLibrary. The implementation was certainly not perfect. One of the glaring omissions was dates. None of the records had them.
Yesterday I took another look at the application. Using MARC::Record I extracted the values of field 260 subfield c from each of my MARC records. Some of the resulting values were four characters long. Some were longer. Some contained non-digit characters. Some did not. After doing my best to normalize the dates (years), I had to fudge the values by appending “-01-01” since the underlying MyLibrary database uses a date in the form of YYYY-MM-DD for storage.
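To make the normalization step concrete, here is a minimal sketch in Python. It is not the code used in the catalog (that was written against MARC::Record); the function name `normalize_260c` and the "take the first run of four digits" rule are assumptions made for illustration.

```python
import re

def normalize_260c(raw):
    """Turn a raw 260$c value into a YYYY-01-01 string, or None.

    Hypothetical helper: it keeps the first run of four ASCII digits
    and pads it with a placeholder month and day so the value fits a
    YYYY-MM-DD column.
    """
    match = re.search(r"[0-9]{4}", raw)
    if match is None:
        # No complete year found, e.g. "197?"; the date is dropped.
        return None
    return match.group(0) + "-01-01"
```

With this rule a value such as "c1998." becomes "1998-01-01", "[1543]" becomes "1543-01-01" with its brackets silently dropped, and "197?" yields nothing at all.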
The indexing process was just as challenging. Once I got the date out of the resource, I had to remove the “-01-01” values since they were bogus.
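Undoing the padding before indexing might look something like the following sketch. The name `year_for_index` is hypothetical, and the code assumes every stored date in this catalog carries the placeholder month and day.

```python
PLACEHOLDER_SUFFIX = "-01-01"

def year_for_index(stored_date):
    """Strip the placeholder month and day from a stored date.

    Assumes the suffix is always bogus in this catalog; a record that
    really was dated January 1 would lose its month and day as well.
    """
    if stored_date.endswith(PLACEHOLDER_SUFFIX):
        return stored_date[: -len(PLACEHOLDER_SUFFIX)]
    # Anything else is passed through untouched.
    return stored_date
```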
In this particular MyLibrary implementation searches against the underlying SRU interface return only record numbers. I use these record numbers to look up the content in MyLibrary for display, and once again I need to munge the date values.
The worst part of this process is the data I lose. Some dates (years) in MARC records are unknown or estimates. Examples include 197? and [1543]. My computer program is not able to handle this ambiguity, and consequently, in the first example, the date is lost completely. In the second example, the date is expressed as a known value, not an estimate.
Dates are a pain, and this does not even start to get into time measurements.
One Response to “Dates are a pain”
Could the dates be kept in two fields?
- one varchar(255), with the date as in the MARC record (or as guessed from the page title)
- one integer(4), as parsed or guessed by a script or a human
The original date would serve for displaying the information, and the normalized date would serve for sorting, searches, etc.
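The two-field idea could be sketched roughly as follows, in Python. The names `CatalogDate` and `make_catalog_date` are made up for illustration, and the parsing rule (first run of four digits) is an assumption, not something MyLibrary provides.

```python
import re
from typing import NamedTuple, Optional

class CatalogDate(NamedTuple):
    display: str         # the date exactly as it appears in the record
    year: Optional[int]  # parsed year for sorting and searching, or None

def make_catalog_date(raw: str) -> CatalogDate:
    """Keep the original string and, separately, a best-guess year."""
    match = re.search(r"[0-9]{4}", raw)
    year = int(match.group(0)) if match else None
    return CatalogDate(display=raw, year=year)

# Sort by the parsed year, placing records with no usable year last.
dates = [make_catalog_date(s) for s in ["[1543]", "197?", "c1998."]]
ordered = sorted(dates, key=lambda d: (d.year is None, d.year or 0))
```

Because the display field is never altered, "197?" and "[1543]" still reach the reader as written, even though the first has no sortable year and the second is only an estimate.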