Future directions

This is a page of notes on possible future directions for MKSearch. There are no plans to implement these features in the immediate future.

Open Office document indexing

The Open Office document format has several properties that make its metadata suitable for indexing:

  1. It's all in XML
  2. It's stored as a Java archive (JAR), which is a ZIP-based container
  3. It contains Dublin Core metadata

See chapter 2 of OpenOffice.org XML Essentials.
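
As a rough illustration of the three points above, the following sketch opens a document as a ZIP archive, reads its meta.xml entry, and collects every element in the Dublin Core namespace. It uses only the standard Java library. The class and method names are assumptions made for this example and are not part of MKSearch. External DTD references are deliberately ignored so that parsing does not depend on a DTD file being available.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
public class OpenOfficeMetaReader {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Parses a meta.xml stream and maps each Dublin Core element name to its values. */
    public static Map<String, List<String>> readDublinCore(InputStream metaXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve any external entity (such as a DTD) to an empty document.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(metaXml);

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagNameNS(DC_NS, "*");
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            result.computeIfAbsent(node.getLocalName(), k -> new ArrayList<>())
                  .add(node.getTextContent().trim());
        }
        return result;
    }

    /** Opens the document as a ZIP archive and reads the Dublin Core metadata from meta.xml. */
    public static Map<String, List<String>> readFromDocument(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("meta.xml");
            if (entry == null) {
                throw new IOException("No meta.xml entry in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return readDublinCore(in);
            }
        }
    }
}
```

The resulting map could then be handed to whatever indexing step MKSearch uses; that step is not shown because it depends on MKSearch internals.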

PDF indexing

PDFBox is a free Java library that provides access to embedded XMP metadata, which is serialized RDF.

Leigh Dodds's introductory article, Looking at XMP, outlines the RDF nature of the format.
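
Because an XMP packet is RDF serialized as XML, its Dublin Core properties can be read with a namespace-aware XML parser once the packet has been extracted from the PDF. The sketch below assumes the packet is already available as a stream; obtaining it through PDFBox is not shown. The class name is an assumption for this example. It handles only properties written as elements, including values wrapped in rdf:li items, and not properties written as attributes on rdf:Description.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named for this example only.
public class XmpDublinCore {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Maps each Dublin Core property found in the packet to its list of values. */
    public static Map<String, List<String>> extract(InputStream xmpPacket) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xmpPacket);

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagNameNS(DC_NS, "*");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            List<String> values =
                result.computeIfAbsent(prop.getLocalName(), k -> new ArrayList<>());
            // Values are often wrapped in rdf:Alt, rdf:Seq or rdf:Bag containers of rdf:li items.
            NodeList items = prop.getElementsByTagNameNS(RDF_NS, "li");
            if (items.getLength() == 0) {
                values.add(prop.getTextContent().trim());
            } else {
                for (int j = 0; j < items.getLength(); j++) {
                    values.add(items.item(j).getTextContent().trim());
                }
            }
        }
        return result;
    }
}
```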

File system indexing

It should be reasonably easy to walk a file system directory tree and use MKSearch to find and index the supported document types. This could make document metadata available on an intranet, so that people know whom to ask for a copy or can retrieve it directly.
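
The directory walk itself can be sketched with the standard java.nio.file API. The example below collects regular files whose extension is in a given set. The class name, the method names and the choice of matching by file extension are assumptions made for illustration; handing each file to an indexer is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, named for this example only.
public class DocumentFinder {

    /** Returns all regular files under root whose lower-cased extension is in the given set. */
    public static List<Path> findDocuments(Path root, Set<String> extensions) throws IOException {
        // Files.walk holds open directory handles, so the stream is closed explicitly.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> extensions.contains(extensionOf(p)))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Returns the text after the last dot in the file name, lower-cased, or "" if there is none. */
    static String extensionOf(Path path) {
        String name = path.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Matching on extension is only a cheap first filter; a real indexer would probably still need to cope with files that have the right extension but are not valid documents.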

JSpider features

Alternative configuration schemes
JSpider currently uses static factory-based configuration loaders with Java property files. These work fine but cause some difficulties in unit testing. This is not a critical issue, but an alternative form of configuration may be devised.
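
One possible alternative, shown here purely as an illustration, is to pass configuration to components through their constructors instead of having them call a static loader. A unit test can then supply a stub configuration without touching property files. None of the names below (SpiderConfig, PropertiesConfig, FetchPolicy, the crawl.maxDepth key) exist in JSpider; they are all hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch; none of these types are part of JSpider.
public class InjectedConfigSketch {

    /** Minimal configuration abstraction that components depend on. */
    public interface SpiderConfig {
        String get(String key, String defaultValue);
    }

    /** Adapter so that existing Java property files could still back the configuration. */
    public static class PropertiesConfig implements SpiderConfig {
        private final Properties props;

        public PropertiesConfig(Properties props) {
            this.props = props;
        }

        @Override
        public String get(String key, String defaultValue) {
            return props.getProperty(key, defaultValue);
        }
    }

    /** Example component that receives its configuration instead of loading it statically. */
    public static class FetchPolicy {
        private final int maxDepth;

        public FetchPolicy(SpiderConfig config) {
            // "crawl.maxDepth" is an invented key used only for this example.
            this.maxDepth = Integer.parseInt(config.get("crawl.maxDepth", "3"));
        }

        public boolean allows(int depth) {
            return depth <= maxDepth;
        }
    }
}
```

In a test, a lambda such as `(key, def) -> "1"` can stand in for the configuration, which avoids the shared static state that makes the current loaders awkward to test.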

Document Links

OpenOffice.org XML Essentials
A draft book about the Open Office document format by J. David Eisenberg
http://books.evc-cit.info/book.html
XMP metadata
The Adobe XMP format specification (PDF)
http://partners.adobe.com/public/developer/en/xmp/sdk/xmpspecification.pdf
PDFBox
PDFBox metadata processing features
http://www.pdfbox.org/userguide/metadata.html
Looking at XMP
An overview of the RDF nature of XMP format
http://www.ldodds.com/blog/archives/000261.html
This document was last modified on 2005-12-13 07:16:01.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html