Dates are a pain

Dates are a pain.

A number of months ago I created a demonstration library catalog using MyLibrary. The implementation was certainly not perfect. One of the glaring omissions was dates. None of the records had them.

Yesterday I took another look at the application. Using MARC::Record I extracted the values of field 260 subfield c from each of my MARC records. Some of the resulting values were four characters long. Some were longer. Some contained non-digit characters. Some did not. After doing my best to normalize the dates (years), I had to fudge the values by appending “-01-01” since the underlying MyLibrary database stores dates in the form YYYY-MM-DD.
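A minimal sketch of that kind of normalization might look like the following. It is written in Python rather than the Perl used with MARC::Record, and the function name normalize_260c and the "first four consecutive digits" rule are assumptions made for illustration, not the code actually used.

```python
import re

# Hypothetical rule: the first run of four digits is taken to be the year.
YEAR_PATTERN = re.compile(r"\d{4}")


def normalize_260c(value):
    """Turn a raw 260$c string into 'YYYY-01-01', or None if no year is found."""
    match = YEAR_PATTERN.search(value)
    if match is None:
        # Values such as "197?" contain no four-digit run, so the date is lost.
        return None
    # The month and day are placeholders, added only to satisfy YYYY-MM-DD storage.
    return match.group(0) + "-01-01"
```

Under this rule a value like "c1987." becomes "1987-01-01", and anything without four consecutive digits is dropped.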

The indexing process was just as challenging. After getting the date out of each resource, I had to remove the “-01-01” suffix since it was bogus.
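The reverse step can be sketched in a few lines. This is an illustration in Python, with the hypothetical name year_from_stored_date, and it assumes every stored date in the catalog carries the placeholder suffix.

```python
PLACEHOLDER_SUFFIX = "-01-01"


def year_from_stored_date(stored):
    """Strip the placeholder month and day from a stored YYYY-MM-DD value."""
    if stored.endswith(PLACEHOLDER_SUFFIX):
        return stored[: -len(PLACEHOLDER_SUFFIX)]
    # Anything else is returned unchanged rather than guessed at.
    return stored
```

Note that a date that really did fall on January 1 would be stripped as well, which is only acceptable here because none of the records carry a real month or day.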

In this particular MyLibrary implementation, searches against the underlying SRU interface return only record numbers. I use these record numbers to look up the content in MyLibrary for display, and once again I need to munge the date values.

The worst part of this process is the data I lose. Some dates (years) in MARC records are unknown or estimates. Examples include 197? or [1543]. My computer program is not able to handle this ambiguity, and consequently, in the first example, the date is lost completely. In the second example, the date is expressed as a known value, not an estimate.

Dates are a pain, and this does not even start to get into time measurements.

One Response to “Dates are a pain”

  1. Could the dates be kept in two fields?
    - one varchar(255), with the date as in the MARC record (or as guessed from the page title)
    - one integer(4), as parsed or guessed by a script or a human

    The original date would serve for displaying the information, and the normalized date would serve for sorting, searches, etc.
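The two-field idea could be sketched roughly as follows. The names RecordDate and parse_record_date are hypothetical, and treating a "?" as a zero (so that 197? sorts as 1970) is just one possible guess, not an established rule.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordDate:
    original: str        # the date exactly as it appears in the record, for display
    year: Optional[int]  # a parsed or guessed year, for sorting and searching


def parse_record_date(original):
    """Keep the raw string and derive a sortable year from it when possible."""
    # A digit followed by three characters that are each a digit or "?".
    match = re.search(r"\d[\d?]{3}", original)
    if match is None:
        return RecordDate(original=original, year=None)
    # Assumption: unknown digits are guessed as 0, the start of the decade or century.
    return RecordDate(original=original, year=int(match.group(0).replace("?", "0")))
```

With something like this, 197? would still display as "197?" but sort as 1970, and [1543] would keep its brackets on screen while sorting as 1543.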
