Supporting search

When you want to provide search against your MyLibrary content, use an indexer.

MyLibrary is great for storing and manipulating the information of digital libraries because it uses a database underneath. Ironically, databases are weak when it comes to search, because every query has to be mapped to one or more specific fields. Moreover, unless they exploit some sort of “-ism”, databases do not support relevance ranking. This is where indexers come in: they do not require you to name the field to search, and they do rank results by relevance.
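
To make the difference concrete, here is a toy inverted index in plain Perl (no MyLibrary or KinoSearch required). Every word from every field of a record points back to that record, so a query names no field, and matching records are ranked by a crude score: the number of times the query words occur in them. The sample records, the `search` function, and the scoring rule are all made up for illustration; real indexers such as KinoSearch use far more refined relevance measures.

```perl
use strict;
use warnings;

# hypothetical sample records standing in for MyLibrary resources
my %records = (
    1 => { title => 'Origami for beginners', note => 'Paper folding basics' },
    2 => { title => 'Advanced paper craft',  note => 'Folding, cutting, and paper sculpture' },
    3 => { title => 'Library databases',     note => 'Relational design' },
);

# build the inverted index: word => { record id => occurrences },
# pooling the words of all fields together
my %index;
for my $id ( keys %records ) {
    for my $value ( values %{ $records{$id} } ) {
        $index{ lc $_ }{$id}++ for $value =~ /(\w+)/g;
    }
}

# search every field at once; rank records by total occurrences of the
# query words, breaking ties by record id
sub search {
    my ($query) = @_;
    my %score;
    for my $word ( map { lc $_ } $query =~ /(\w+)/g ) {
        my $postings = $index{$word} or next;
        $score{$_} += $postings->{$_} for keys %$postings;
    }
    return sort { $score{$b} <=> $score{$a} or $a <=> $b } keys %score;
}

print join( ', ', search('paper folding') ), "\n";    # prints "2, 1"
```

Record 2 outranks record 1 because "paper" occurs in it twice, once in each field. A database query, by contrast, would have to say which column to look in (title or note) and would return the matches in no particular order of usefulness.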

When you want to support search against MyLibrary, write a report against MyLibrary and feed the content to your indexer of choice. While Solr/Lucene seems to be the gold standard these days, I like KinoSearch because it uses the same query language as Lucene, and the Lucene query language is supported by my SRU client.

Here is some code that loops through each MyLibrary resource object, extracts some metadata, and adds it to a KinoSearch index:

# define
use constant INDEX => '../etc/index';

# require/include
use strict;
use warnings;
use KinoSearch::InvIndexer;
use KinoSearch::Analysis::PolyAnalyzer;
use MyLibrary::Core;

# configure
MyLibrary::Config->instance( 'catalog' );

# create an index
my $analyzer   = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $invindexer = KinoSearch::InvIndexer->new(
  invindex => INDEX,
  create   => 1,
  analyzer => $analyzer
);
$invindexer->spec_field( name => 'id' );
$invindexer->spec_field( name => 'fkey' );
$invindexer->spec_field( name => 'title' );
$invindexer->spec_field( name => 'creator' );
$invindexer->spec_field( name => 'subject' );
$invindexer->spec_field( name => 'description' );

# process each resource
foreach my $id ( MyLibrary::Resource->get_ids ) {

  # get this resource
  my $resource = MyLibrary::Resource->new( id => $id );

  # create and fill a document; optional fields are set only when non-empty
  my $doc = $invindexer->new_doc;
  $doc->set_value( id          => $resource->id );
  $doc->set_value( fkey        => $resource->fkey );
  $doc->set_value( title       => $resource->name )    if $resource->name;
  $doc->set_value( creator     => $resource->creator ) if $resource->creator;
  $doc->set_value( subject     => $resource->subject ) if $resource->subject;
  $doc->set_value( description => $resource->note )    if $resource->note;

  # add the document to the index
  $invindexer->add_doc( $doc );

}

# clean up: write the index to disk and optimize it
print "\noptimizing... ";
$invindexer->finish( optimize => 1 );
print "done\n";

# done
exit;
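
The indexing loop renames two attributes on their way into the index (`name` becomes `title`, `note` becomes `description`) and skips empty optional fields. To check that mapping without a MyLibrary database or KinoSearch installed, here is a minimal core-Perl sketch. The `%FIELD_MAP` table and the `resource_to_fields` function are hypothetical, and a plain hash reference stands in for a `MyLibrary::Resource` object. Unlike the truth test in the loop, it keeps a value of `0` and drops only undefined or empty strings.

```perl
use strict;
use warnings;

# index field => resource attribute (hypothetical helper table)
my %FIELD_MAP = (
    id          => 'id',
    fkey        => 'fkey',
    title       => 'name',
    creator     => 'creator',
    subject     => 'subject',
    description => 'note',
);

# turn one resource (here a plain hash ref) into index fields,
# dropping attributes that are undefined or empty
sub resource_to_fields {
    my ($resource) = @_;
    my %fields;
    for my $field ( keys %FIELD_MAP ) {
        my $value = $resource->{ $FIELD_MAP{$field} };
        $fields{$field} = $value if defined $value && length $value;
    }
    return \%fields;
}

my $fields = resource_to_fields( { id => 7, name => 'Origami for beginners', note => '' } );
print "$_ = $fields->{$_}\n" for sort keys %$fields;
```

Keeping the mapping in one table also means that adding an index field is a one-line change, as long as a matching `spec_field` call is added too.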

Here is some code that searches the resulting index:

# define
use constant INDEX => '../etc/index';

# require/include
use strict;
use warnings;
use KinoSearch::Searcher;
use KinoSearch::Analysis::PolyAnalyzer;
use MyLibrary::Core;

# configure
MyLibrary::Config->instance( 'catalog' );

# get the query from the command line, or prompt for it
my $query = shift;
if ( ! $query ) {

  print "Enter a query. ";
  chomp( $query = <STDIN> );

}

# open the index
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $searcher = KinoSearch::Searcher->new(
  invindex => INDEX,
  analyzer => $analyzer
);

# search
my $hits = $searcher->search( $query );

# get the number of hits and report the result
my $total_hits = $hits->total_hits;
print "Your query ($query) found $total_hits record(s).\n\n";

# loop through the results
while ( my $hit = $hits->fetch_hit_hashref ) {

  listOneResource( $hit->{ 'id' } );

}
print "\n";

# print the details of one resource, given its id
sub listOneResource {

  my $id = shift;
  my $resource = MyLibrary::Resource->new( id => $id );
  print "           id = " . $resource->id   . "\n";
  print "         name = " . $resource->name . "\n";
  print "         date = " . $resource->date . "\n";
  print "         note = " . $resource->note . "\n";

  # creators are split on a literal pipe, which must be escaped in the regex
  print "     creators = ";
  foreach ( split /\|/, $resource->creator ) { print "$_; " }
  print "\n";

  # related terms
  my @resource_terms = $resource->related_terms();
  print "      term(s) = ";
  foreach ( @resource_terms ) {

    my $term = MyLibrary::Term->new( id => $_ );
    print $term->term_name, " ($_)", '; ';

  }
  print "\n";

  # locations
  my @locations = $resource->resource_locations();
  print "  location(s) = ";
  foreach ( @locations ) { print $_->location, "; " }
  print "\n\n";

}
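
The `creators` line in `listOneResource` splits the creator value on a pipe character, which suggests (an assumption, not something MyLibrary's documentation is quoted on here) that multiple creators are stored joined by `|`. In a Perl regex an unescaped `|` means alternation, so `split /|/` matches the empty string and breaks the value into single characters; the delimiter has to be written as `/\|/`. The sample creator string below is made up:

```perl
use strict;
use warnings;

# hypothetical creator value with two names joined by "|"
my $creator = 'Doe, Jane|Roe, Richard';

# a bare | is alternation between two empty patterns, so it matches
# everywhere and split returns one piece per character
my @wrong = split /|/, $creator;

# escaping the pipe splits on the literal delimiter instead
my @right = split /\|/, $creator;

printf "unescaped: %d pieces\n", scalar @wrong;
print  "escaped:   ", join( '; ', @right ), "\n";
```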
