Indexer
Component description
The MKSearch indexer component is responsible for extracting metadata from Web documents. Its current implementation consists of a set of SAX content handlers and XML filters. The content handlers are used in JSpider plugins and are triggered by document download callback events.
Completed task information has been moved to the beta 1 indexer plans archive.
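As an illustration of the content handler approach described above, here is a minimal sketch of a SAX handler that collects name/content pairs from XHTML meta elements. It is not the MKSearch source: the class name MetaElementHandler and the extract helper are hypothetical, and the JSpider callback that would supply the downloaded document is left out.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of MKSearch.
class MetaElementHandler extends DefaultHandler {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Metadata in document order: meta name -> content value.
    private final Map<String, String> metadata = new LinkedHashMap<String, String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Only XHTML <meta> elements are of interest.
        if (XHTML_NS.equals(uri) && "meta".equals(localName)) {
            // Unprefixed attributes are in no namespace, hence the empty URI.
            String name = atts.getValue("", "name");
            String content = atts.getValue("", "content");
            if (name != null && content != null) {
                metadata.put(name, content);
            }
        }
    }

    Map<String, String> getMetadata() {
        return metadata;
    }

    // Convenience helper (an assumption of this sketch): parse a string and return the metadata.
    static Map<String, String> extract(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            MetaElementHandler handler = new MetaElementHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getMetadata();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }
}
```

In a real plugin the input would come from the spider's download event rather than from a string, and a repeated meta name would need a multi-valued map instead of the simple overwrite used here.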
Development plans
- RSS 1.0 processing
The current system only processes (X)HTML document metadata. MKSearch is also required to index RSS 1.0 feed metadata according to the RSS Dublin Core Module and RSS Qualified Dublin Core Module. This will require a different set of content handlers (and an additional JSpider plugin).
Task progress:
- Draft RdfStoreWriterPlugin class and RdfContentTypeOnly rule prepared.
- Test indexing successful, reviewing plugin configuration.
- Draft
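The kind of content handler that RSS 1.0 processing calls for could look roughly like the sketch below, which gathers simple Dublin Core element values for each feed item. This is an assumption-laden illustration, not the RdfStoreWriterPlugin implementation: the class name RssDublinCoreHandler and the extract helper are hypothetical, only the plain Dublin Core element namespace is handled, and qualified terms or nested RDF values are ignored.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of MKSearch.
class RssDublinCoreHandler extends DefaultHandler {
    static final String RSS_NS = "http://purl.org/rss/1.0/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Item URI (rdf:about) -> (DC element local name -> text value).
    private final Map<String, Map<String, String>> items =
            new LinkedHashMap<String, Map<String, String>>();
    private Map<String, String> currentItem; // non-null while inside an <item>
    private StringBuilder text;              // non-null while inside a DC element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (RSS_NS.equals(uri) && "item".equals(localName)) {
            String about = atts.getValue(RDF_NS, "about");
            if (about != null) { // items without rdf:about are skipped
                currentItem = new LinkedHashMap<String, String>();
                items.put(about, currentItem);
            }
        } else if (currentItem != null && DC_NS.equals(uri)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (RSS_NS.equals(uri) && "item".equals(localName)) {
            currentItem = null;
        } else if (text != null && DC_NS.equals(uri)) {
            // A repeated element overwrites the earlier value in this sketch.
            currentItem.put(localName, text.toString().trim());
            text = null;
        }
    }

    // Convenience helper (an assumption of this sketch): parse a feed held in a string.
    static Map<String, Map<String, String>> extract(String rss) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            RssDublinCoreHandler handler = new RssDublinCoreHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(rss)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }
}
```

Keying the result on the item's rdf:about value keeps each set of properties attached to the resource it describes, which is the shape an RDF store would need. Channel-level Dublin Core elements are deliberately ignored here.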
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html