<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Open Content Index</title>
		<style type="text/css" title="text/css" media="screen">
			body { font-size: large; margin: 5%; }
			li { margin-bottom: 1em }
			.code { background: silver; font-size: medium }
		</style>
		<style type="text/css" title="text/css" media="print">
			.form { display: none }
			.hidden { display: none }
			a { text-decoration: none; color: black }
			li { margin-bottom: 1em }
			.code { font-size: medium }
		</style>
    </head>
    <body>
        
        <h1 style='text-align: center'>Open Content Index</h1>
        <div class='form' style='text-align: center'>
<p style='font-size: large; color: silver'>This is a search engine for open access content. Select an index, enter a word or phrase, and click Go.</p>
<form action='./' method='get' style='margin-bottom: 2em'>
	<input type='hidden' name='cmd' value='search' />
	<select name='index' style='font-size: large'>
		<option value='wikipedia'>Wikipedia</option>
		<option value='gutenberg'>Project Gutenberg</option>
		<option value='dmoz' >The Open Directory Project</option>
		<option value='oaister' selected='selected'>OAIster</option>
		<option value='oca-all'>OCA (all)</option>
		<option value='oca-americana'>OCA (American libraries)</option>
		<option value='oca-iacl'>OCA (Children's Library)</option>
		<option value='oca-opensource'>OCA (Community contributed books)</option>
		<option value='oca-toronto' >OCA (Canadian libraries)</option>
		<option value='oca-universallibrary'>OCA (Universal Library)</option>
</select>
	<input type='text' size='30' name='query' value='' style='font-size: large' />
	<input type='submit' value='Go' style='font-size: large' />
</form>

<hr style='margin-bottom: 2em' />
</div>

        
        <h2>About the Index</h2>

<p class='hidden'>(Psst. <a href="javascript:window.print()">Print this page</a>. It is designed to be read on a piece of paper.)</p>

<p>Open access content abounds, and all the library profession needs to do to leverage it is: 1) collect it, 2) organize it, 3) index it, and 4) provide access to it. This Web page illustrates one way these ideas can be implemented.</p>

<p>There exists a wide range of open access content apropos to the users of libraries. This content includes, but is not limited to:</p>

<ul>
	<li><a href="http://www.dmoz.org/">Open Directory Project</a> - Human-cataloged web resources. (4,780,821 records)</li>
	<li><a href="http://www.gutenberg.org/">Project Gutenberg</a> - High-quality clean-text ebooks, some audio-books. (20,800 records)</li>
	<li><a href="http://www.oaister.org/">OAIster</a> - A union catalog of digital resources, chiefly open archives of journals, etc. (9,558,628 records)</li>
	<li><a href="http://www.archive.org/details/texts">Open Content Alliance (OCA)</a> - All of the ebooks made available by the Internet Archive as part of the Open Content Alliance (OCA). Includes high-quality, searchable PDFs, online book-readers, audio books, and much more. Excludes the Gutenberg sub-collection, which is available as a separate database. (134,250 records)</li>
	<li><a href="http://www.archive.org/details/americana">OCA American Libraries collection</a> - A sub-collection of the OCA. (48,299 records)</li>
	<li><a href="http://www.archive.org/details/iacl">The Internet Archive Children's Library</a> - Books for children from around the world. (669 records)</li>
	<li><a href="http://www.archive.org/details/opensource">Open Source Books</a> - A collection of community-contributed books at the Internet Archive. (2,489 records)</li>
	<li><a href="http://www.archive.org/details/toronto">OCA Canadian Libraries</a> - A sub-collection of the Open Content Alliance. (36,730 records)</li>
	<li><a href="http://www.archive.org/details/universallibrary">The Universal Library</a> - A digitization project founded at Carnegie Mellon University. Content hosted at the Internet Archive. (30,888 records)</li>
	<li><a href="http://www.wikipedia.org/">Wikipedia</a> - Titles and abstracts from Wikipedia, the open encyclopedia. (1,657,443 records)</li>
</ul>

<p>Besides existing in their native Web interfaces, these sets of content are also available in machine-readable formats. (Think MARC.) After systematically harvesting this content and indexing it, it is possible to provide a myriad of interfaces to it. A long time ago and in a galaxy far far away, Z39.50 used to be <em>the</em> interface. Today there are additional interfaces including SRW/U and OpenSearch.</p>

<p>This page, Open Content Index, is an example of the idea outlined above. A company named <a href="http://www.indexdata.com/">Index Data</a> harvested metadata from the sources above, indexed it, and provided a number of interfaces to its index. Open Content Index takes advantage of the SRW/U interface. Users select a specific index to search, enter a query, and click Go. The query is sent to a client application that converts the query into an SRW/U request and sends it to a server at Index Data. The server searches the selected index and returns a stream of XML. The client application transforms the XML into XHTML and displays the result on the user's screen. The whole client application is written in Perl, and the heart of the application is almost trivial in nature:</p>

<pre class='code'><![CDATA[

  # get the query and convert it to CQL
  my $query = &query_to_cql( $cgi->param( 'query' ));
  
  # build an SRU url
  my $url = SRUTEMPLATE;
  $url =~ s/##QUERY##/$query/e;
  $url =~ s/##INDEX##/$cgi->param( 'index' )/e;
  
  # create a user agent, create a request, send the url, and get a response
  my $ua       = LWP::UserAgent->new;
  my $request  = HTTP::Request->new( GET => $url );
  my $response = $ua->request( $request );
  
  # stop here if the server did not answer successfully
  croak $response->status_line unless $response->is_success;
  
  # transform the response
  my $parser     = XML::LibXML->new;
  my $xslt       = XML::LibXSLT->new;
  my $source     = $parser->parse_string( $response->content ) or croak $!;
  my $style      = $parser->parse_string( &get_sru_to_html )   or croak $!;
  my $stylesheet = $xslt->parse_stylesheet( $style )           or croak $!;
  my $results    = $stylesheet->transform( $source )           or croak $!;
  
  # create the results page; escape the query before echoing it back as HTML
  my $html = &get_template;
  $html =~ s/##FORM##/&get_form/e;
  $html =~ s/##CONTENT##/$stylesheet->output_string( $results )/e;
  $html =~ s/##QUERY##/$cgi->escapeHTML( scalar $cgi->param( 'query' ))/ge;
  
  # done
  &gracefulExit ( $html );
 
]]></pre>
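
<p>The excerpt above relies on two pieces it does not show: the <code>SRUTEMPLATE</code> constant and the <code>query_to_cql</code> routine. What follows is only a sketch of how such pieces might look, not the code found in oci.txt. The host name sru.example.org, the values in the template, and the helper <code>uri_escape_simple</code> are all hypothetical. The parameter names (version, operation, maximumRecords, query) come from the SRU specification. For simplicity the sketch treats everything the user typed as a single quoted CQL phrase.</p>

<pre class='code'><![CDATA[

  use strict;
  use warnings;

  # hypothetical template; the real SRUTEMPLATE is not shown in the excerpt
  use constant SRUTEMPLATE =>
      'http://sru.example.org/##INDEX##?version=1.1&operation=searchRetrieve'
    . '&maximumRecords=10&query=##QUERY##';

  # percent-encode everything except the RFC 3986 unreserved characters
  sub uri_escape_simple {
      my ( $string ) = @_;
      utf8::encode( $string ) if utf8::is_utf8( $string );
      $string =~ s/([^A-Za-z0-9\-._~])/sprintf( '%%%02X', ord $1 )/ge;
      return $string;
  }

  # hypothetical stand-in for query_to_cql: one quoted, URL-safe CQL phrase
  sub query_to_cql {
      my ( $query ) = @_;
      $query =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
      $query =~ s/\s+/ /g;           # collapse runs of whitespace
      $query =~ s/(["\\])/\\$1/g;    # backslash-escape quotes and backslashes
      return uri_escape_simple( qq("$query") );
  }

  # fill in the template for a given index and user query
  sub build_sru_url {
      my ( $index, $query ) = @_;
      my $url = SRUTEMPLATE;
      my $cql = query_to_cql( $query );
      $url =~ s/##INDEX##/uri_escape_simple( $index )/e;
      $url =~ s/##QUERY##/$cql/;
      return $url;
  }

  print build_sru_url( 'oaister', 'open access' ), "\n";

]]></pre>

<p>Under these assumptions, searching the oaister index for the phrase open access produces a URL whose query parameter is %22open%20access%22. Escaping the index name as well as the query keeps stray characters in either form field from altering the rest of the request.</p>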

<p>For more information about what Index Data did and its machine-to-machine interfaces, see <a href="http://www.indexdata.com/opencontent/">www.indexdata.com/opencontent</a>. If you want to see the client source code for Open Content Index, then see: <a href="http://mylibrary.library.nd.edu/oci/oci.txt">mylibrary.library.nd.edu/oci/oci.txt</a>. As you will see, most of the program is about user interface, not searching. (For a good time, it might be fun to add a Did You Mean? function and/or a synonym function to the interface. Hmmm...)</p>

<h2>Summary</h2>

<p>Again, open access content abounds. Much of it is relevant to the needs of library users. By collecting this content, indexing it, and providing sets of services against the index that meet the needs of users, libraries can exploit this thing called the Internet to a greater degree. By enhancing its skills and evolving with the environment, librarianship can retain its traditional missions and at the same time provide useful collections and services to its clientele. The opportunities are almost limitless.</p>

 
        
	<div style='font-size: small; margin-top: 3em'>
		<hr />
		<p>Author: Eric Lease Morgan &lt;<a href="mailto:emorgan@nd.edu">emorgan@nd.edu</a>&gt;<br />
		Date created: 2007-03-22<br />
		Date updated: 2007-03-22<br />
		URL: <a href="http://mylibrary.library.nd.edu/oci/">http://mylibrary.library.nd.edu/oci/</a></p>
	</div>
</body>
</html>
