Indexer

Component description

The MKSearch indexer component is responsible for extracting metadata from Web documents. Its current implementation is a set of SAX content handlers and XML filters, which are limited to processing HTML meta elements. The content handlers are used in JSpider plugins and are triggered by document download callback events.
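As a rough illustration of the approach described above, the following sketch shows how a SAX content handler might collect name/content pairs from meta elements. The class name MetaElementHandler and its extract helper are hypothetical and are not taken from the MKSearch code base. The sketch assumes well-formed XHTML input and leaves out the JSpider callback wiring entirely.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler (not MKSearch code): gathers meta name/content pairs. */
class MetaElementHandler extends DefaultHandler {
    // Insertion order is kept so results follow document order.
    private final Map<String, String> metadata = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The parser below is namespace aware, so localName is populated.
        if (!"meta".equals(localName)) {
            return;
        }
        String name = atts.getValue("", "name");
        String content = atts.getValue("", "content");
        // Ignore meta elements without both attributes (e.g. http-equiv forms).
        if (name != null && content != null) {
            metadata.put(name, content);
        }
    }

    Map<String, String> getMetadata() {
        return metadata;
    }

    /** Parses a well-formed XHTML string and returns the collected pairs. */
    static Map<String, String> extract(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            MetaElementHandler handler = new MetaElementHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xhtml)));
            return handler.getMetadata();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }
}
```

Calling MetaElementHandler.extract on a document whose head contains a meta element with name "DC.title" returns a map holding that name and its content value. In a crawler, the handler would presumably be attached to whatever parser the download callback supplies rather than creating its own.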

Development plans

HTML link element indexing
The current content handlers only process HTML meta elements. The latest Dublin Core in HTML recommendation also allows link elements to be used. This will require a new type of content handler and an XhtmlLinkFilter, ultimately composed into a general-purpose XHTML metadata processor.
e-GIF compatibility
Currently the system only indexes Dublin Core namespace metadata. The content handlers must be extended to cover UK e-GIF Metadata Standard markup. This will require an e-GIF com.mkdoc.schema.Schema and a new RDF store writer implementation.
RSS 1.0 processing
The current system only processes (X)HTML document metadata. MKSearch is also required to index RSS 1.0 feed metadata according to the RSS Dublin Core module, which will require a different set of content handlers (and an additional JSpider plugin).
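The link element plan listed above could be approached with a SAX XMLFilterImpl subclass. The sketch below is only one possible shape for such a filter and is not the planned XhtmlLinkFilter itself. The class name LinkElementFilterSketch is hypothetical, and the hard-coded "DC." prefix is a simplifying assumption: Dublin Core in HTML lets a document declare its own prefix through a rel="schema.PREFIX" link, which a fuller implementation would need to resolve.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter (not MKSearch code): records Dublin Core link elements. */
class LinkElementFilterSketch extends XMLFilterImpl {
    private final Map<String, String> links = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("link".equals(localName)) {
            String rel = atts.getValue("", "rel");
            String href = atts.getValue("", "href");
            // Assumption: the Dublin Core prefix is literally "DC".
            // Schema declarations (rel="schema.DC") and stylesheets are skipped.
            if (rel != null && href != null && rel.startsWith("DC.")) {
                links.put(rel, href);
            }
        }
        // Pass the event on so downstream handlers still see the full document.
        super.startElement(uri, localName, qName, atts);
    }

    Map<String, String> getLinks() {
        return links;
    }

    /** Parses a well-formed XHTML string and returns rel/href pairs. */
    static Map<String, String> extract(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            LinkElementFilterSketch filter = new LinkElementFilterSketch();
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(xhtml)));
            return filter.getLinks();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }
}
```

Because the filter forwards every event to its own content handler, it could in principle be chained in front of a meta element handler, which is one way the two might be composed into a single XHTML metadata processor.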

This document was last modified by Philip Shaw on 2005-02-09 05:54:15
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html