Beta 1 indexer plans

This page lists summary tasks and progress notes for the beta 1 release of the MKSearch indexer component. It is an archive page.

HTML link element indexing

The current content handlers process only HTML meta elements. The latest Dublin Core in HTML recommendation also allows link elements to be used to express metadata. Supporting them will require a new type of content handler and an XhtmlLinkFilter, ultimately composed into a general-purpose XHTML metadata processor.

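As an illustration of what link element indexing involves, the following sketch collects Dublin Core statements from XHTML link elements using the JDK's SAX parser. It is not MKSearch's XhtmlLinkFilter or LinkTripleWriter: the class LinkElementSketch and its Triple record are hypothetical, and the sketch assumes well-formed XHTML in which a link element with rel="schema.DC" declares the namespace URI for rel values such as "DC.relation".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: DC statements from XHTML link elements. */
public class LinkElementSketch {

    /** One subject-predicate-object statement. */
    public record Triple(String subject, String predicate, String object) {}

    public static List<Triple> extract(String documentUri, String xhtml) {
        Map<String, String> prefixes = new HashMap<>(); // "DC" -> namespace URI
        List<String[]> links = new ArrayList<>();       // raw {rel, href} pairs

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (!"link".equals(localName)) {
                    return; // only link elements are of interest here
                }
                String rel = atts.getValue("rel");
                String href = atts.getValue("href");
                if (rel == null || href == null) {
                    return;
                }
                if (rel.startsWith("schema.")) {
                    // <link rel="schema.DC" href="..."/> declares a prefix.
                    prefixes.put(rel.substring("schema.".length()), href);
                } else {
                    links.add(new String[] {rel, href});
                }
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // populates localName
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }

        // Resolve after parsing, so declarations may appear in any order.
        List<Triple> triples = new ArrayList<>();
        for (String[] link : links) {
            int dot = link[0].indexOf('.');
            if (dot <= 0) {
                continue; // e.g. rel="stylesheet": not a prefixed property
            }
            String namespace = prefixes.get(link[0].substring(0, dot));
            if (namespace != null) {
                triples.add(new Triple(documentUri,
                        namespace + link[0].substring(dot + 1), link[1]));
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\"/>"
                + "<link rel=\"DC.relation\" href=\"http://example.org/parent\"/>"
                + "<link rel=\"stylesheet\" href=\"style.css\"/>"
                + "</head><body/></html>";
        extract("http://example.org/page", xhtml).forEach(System.out::println);
    }
}
```

Prefix declarations are resolved after parsing, so a schema.DC link may appear before or after the links that use it. A production filter would also need to cope with DOCTYPE declarations (which by default make the JDK parser try to fetch the external DTD) and with HTML that is not well-formed.
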
Task progress: Many of the original alpha classes have been refactored, and new interfaces have been introduced to provide more flexible usage and integration.

  • XhtmlLinkFilter completed.
  • LinkTripleWriter completed.
  • LinkRDFStoreWriter completed.
  • Composite XhtmlTripleWriter completed.
  • Composite XhtmlRDFStoreWriter completed.
  • Full Dublin Core in HTML compatibility complete.

e-GIF compatibility

Currently the system indexes only Dublin Core namespace metadata. The content handlers must be extended to cover UK e-GIF Metadata Standard markup. This standard includes the records management fields specified by the National Archives' Requirements for Electronic Records Management Systems and is based on the e-GMS Application Profile Version 1. This will require an e-GIF implementation of com.mkdoc.schema.Schema and a new RDF store writer implementation.

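Supporting e-GIF markup largely means mapping more than one meta name prefix onto namespace URIs. A minimal sketch of such a lookup follows, assuming meta names of the form "DC.title" and "eGMS.mandate". PrefixSchemaSketch is hypothetical and is not the com.mkdoc.schema.Schema interface; the e-GMS namespace URI is a placeholder, and the element names are an illustrative subset rather than the full e-GMS Application Profile.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical multi-prefix lookup: meta name -> predicate URI. */
public class PrefixSchemaSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    // Placeholder only: substitute the namespace URI published for e-GMS.
    static final String EGMS_NS = "http://example.org/egms#";

    private final Map<String, String> namespaces = new HashMap<>();
    private final Map<String, Set<String>> elements = new HashMap<>();

    /** Registers a meta name prefix, its namespace URI and its element names. */
    public PrefixSchemaSketch declare(String prefix, String namespace, String... names) {
        namespaces.put(prefix, namespace);
        elements.put(prefix, Set.of(names));
        return this;
    }

    /** Expands e.g. "DC.title" to a predicate URI; returns null if unknown. */
    public String expand(String metaName) {
        int dot = metaName.indexOf('.');
        if (dot <= 0) {
            return null; // un-prefixed names such as "keywords" are ignored
        }
        String prefix = metaName.substring(0, dot);
        String name = metaName.substring(dot + 1);
        Set<String> known = elements.get(prefix);
        if (known == null || !known.contains(name)) {
            return null; // undeclared prefix or element
        }
        return namespaces.get(prefix) + name;
    }

    /** Illustrative subset only; not the full e-GMS element list. */
    public static PrefixSchemaSketch egifSubset() {
        return new PrefixSchemaSketch()
                .declare("DC", DC_NS, "title", "creator", "date", "subject")
                .declare("eGMS", EGMS_NS, "accessibility", "addressee", "mandate");
    }

    public static void main(String[] args) {
        PrefixSchemaSketch schema = egifSubset();
        System.out.println(schema.expand("DC.title"));
        System.out.println(schema.expand("eGMS.mandate"));
        System.out.println(schema.expand("keywords"));
    }
}
```

Rejecting names that are not declared keeps unrelated meta elements (keywords, generator and so on) out of the RDF store.
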
Task progress:

  • UKeGMS Schema class completed to e-GMS Application Profile Version 1 specification.
  • Custom schema configuration introduced to the XhtmlTripleWriterPlugin class to enable e-GMS indexing. See Crawler development plans.
  • Extended the test document Web site to include all e-GMS elements, refinements and encoding schemes.
  • Custom schema configuration introduced to the XhtmlStoreWriterPlugin class to enable e-GMS indexing. See Crawler development plans.
  • Full e-GIF compatibility complete.

Application profiles

The original indexing system used the Schema interface to expand metadata values to URIs. Extension methods were added to the interface to support the mixed form used in the UK e-GMS schema, which shares elements with the Dublin Core element set and adds its own refinements. An ApplicationProfile interface is required to allow more flexible configuration and to enable dynamic configuration of the Query component.

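The mixed form described above, where a profile reuses Dublin Core elements and adds refinements in its own namespace, can be sketched with a table-driven base class that a more specific profile extends. The names Profile, TableProfile, DcProfile and ExtendedProfile, the placeholder namespace and the refinement names are all hypothetical; they are not the DublinCoreProfile, AbstractApplicationProfile or UKeGMSProfile classes. The fallback from an unknown refinement to its parent element follows the Dublin Core "dumb-down" principle and is a design choice of this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of profiles that share elements and add refinements. */
public class ProfileSketch {

    /** Minimal profile contract: meta name -> predicate URI, or null. */
    interface Profile {
        String expand(String metaName);
    }

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    // Placeholder namespace for the extending profile's own refinements.
    static final String EXT_NS = "http://example.org/profile#";

    /** Table-driven base class, loosely analogous to an abstract profile. */
    static class TableProfile implements Profile {
        protected final Map<String, String> table = new LinkedHashMap<>();

        @Override
        public String expand(String metaName) {
            String name = metaName;
            String uri = table.get(name);
            // Dumb-down: an unknown refinement falls back to its parent element.
            while (uri == null && name.lastIndexOf('.') > name.indexOf('.')) {
                name = name.substring(0, name.lastIndexOf('.'));
                uri = table.get(name);
            }
            return uri;
        }
    }

    /** Plain Dublin Core elements. */
    static class DcProfile extends TableProfile {
        DcProfile() {
            for (String e : new String[] {"title", "creator", "date", "subject"}) {
                table.put("DC." + e, DC_NS + e);
            }
        }
    }

    /** Reuses every DC element and adds refinements in its own namespace. */
    static class ExtendedProfile extends DcProfile {
        ExtendedProfile() {
            table.put("DC.date.acquired", EXT_NS + "dateAcquired");
            table.put("DC.date.closed", EXT_NS + "dateClosed");
        }
    }

    public static void main(String[] args) {
        Profile p = new ExtendedProfile();
        System.out.println(p.expand("DC.title"));         // shared DC element
        System.out.println(p.expand("DC.date.acquired")); // profile's refinement
        System.out.println(p.expand("DC.date.valid"));    // falls back to DC date
    }
}
```
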
ApplicationProfile implementations will also need methods to iterate through all the predicates they contain, so that the Query component can generate search forms dynamically.

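To show how predicate iteration could drive search form generation, the following sketch renders one text input per predicate. The PredicateSource interface and the renderForm method are hypothetical stand-ins for the ApplicationProfile methods the text calls for; using the predicate URI as the form field name is an assumption about how the Query component might receive the values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a search form generated from a profile's predicates. */
public class SearchFormSketch {

    /** A profile that can enumerate its predicates. */
    interface PredicateSource {
        /** Display label -> predicate URI, in the profile's preferred order. */
        Map<String, String> predicates();
    }

    /** Renders one text input per predicate; the field name carries the URI. */
    static String renderForm(PredicateSource profile, String action) {
        StringBuilder html = new StringBuilder();
        html.append("<form method=\"get\" action=\"").append(escape(action)).append("\">\n");
        for (Map.Entry<String, String> p : profile.predicates().entrySet()) {
            html.append("  <label>").append(escape(p.getKey()))
                .append(" <input type=\"text\" name=\"").append(escape(p.getValue()))
                .append("\"/></label>\n");
        }
        html.append("  <input type=\"submit\" value=\"Search\"/>\n</form>\n");
        return html.toString();
    }

    /** Minimal escaping for element text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        PredicateSource dc = () -> {
            Map<String, String> m = new LinkedHashMap<>();
            m.put("Title", "http://purl.org/dc/elements/1.1/title");
            m.put("Creator", "http://purl.org/dc/elements/1.1/creator");
            return m;
        };
        System.out.print(renderForm(dc, "/search"));
    }
}
```

Because the form is derived from the profile, adding a predicate to a profile would add a search field without any change to the form code.
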
Task progress:

  • Refactored the com.mkdoc.schema and com.mkdoc.sax packages to introduce the new ApplicationProfile interface.
  • Introduced DublinCoreProfile and UKeGMSProfile classes in place of the former Schema types.
  • Completed DublinCoreProfile and AbstractApplicationProfile.
  • Completed UKeGMSProfile.
  • Custom application profiles complete (indexing functions).

This document was last modified by Philip Shaw on 2005-08-04 07:41:18
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html