Future directions

This is a page of notes on possible future directions for MKSearch. There are no plans to implement these features in the immediate future.

Open Office document indexing

The Open Office document format has metadata that would be suitable for indexing:

It's all in XML
It's stored as a ZIP archive, the same container format as a Java archive (JAR)
It contains Dublin Core metadata (in the meta.xml file)

See chapter 2 of OpenOffice.org XML Essentials.
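As a rough illustration, the Dublin Core fields could be read with nothing but the JDK: open the archive, find the meta.xml entry, and collect every element in the Dublin Core namespace. This is a minimal sketch, not MKSearch code. The class name OpenOfficeMeta is hypothetical, and the sketch assumes the metadata lives in meta.xml and that the office.dtd reference some files carry can safely be skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of MKSearch
class OpenOfficeMeta {
    // Namespace of the Dublin Core elements (dc:title, dc:creator, ...) in meta.xml
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Reads the meta.xml entry of an OpenOffice file and maps each dc: element name to its text. */
    static Map<String, String> readDublinCore(InputStream officeFile) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(officeFile)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("meta.xml".equals(entry.getName())) {
                    // Buffer the entry so the XML parser cannot close the underlying zip stream
                    byte[] xml = zip.readAllBytes();
                    return parseDublinCore(new ByteArrayInputStream(xml));
                }
            }
        }
        return new LinkedHashMap<>(); // no meta.xml entry found
    }

    static Map<String, String> parseDublinCore(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
        // meta.xml may reference office.dtd; skip loading it (this also avoids external fetches)
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(xml);

        Map<String, String> result = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagNameNS(DC_NS, "*"); // every element in the DC namespace
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            result.put(e.getLocalName(), e.getTextContent().trim()); // a repeated name overwrites
        }
        return result;
    }
}
```

If an element name repeats, the last value wins in this sketch; an indexer that cares about multi-valued fields would probably keep a list per name instead.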

PDF indexing

PDFBox is a free Java library that provides access to a PDF's embedded XMP metadata, which is serialized as RDF/XML.

This introductory article by Leigh Dodds, Looking at XMP, outlines the RDF nature of the format.
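PDFBox exposes the raw XMP packet through PDDocumentCatalog.getMetadata(). The sketch below assumes those packet bytes have already been extracted and shows one way the RDF inside them might be read with only the JDK's DOM parser. The class name XmpDublinCore is hypothetical. It handles the two common RDF/XML shapes for Dublin Core: simple values written as attributes of rdf:Description, and arrays (rdf:Seq, rdf:Bag, rdf:Alt) whose items are rdf:li elements. A full RDF parser would cope with more of the syntax.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of PDFBox or MKSearch
class XmpDublinCore {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Maps each Dublin Core property (e.g. "title", "creator") found in an XMP packet to its values. */
    static Map<String, List<String>> read(InputStream xmpPacket) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xmpPacket);
        Map<String, List<String>> result = new LinkedHashMap<>();

        // Abbreviated form: simple values as attributes, e.g. dc:format="application/pdf"
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            NamedNodeMap attrs = ((Element) descriptions.item(i)).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Attr a = (Attr) attrs.item(j);
                if (DC_NS.equals(a.getNamespaceURI())) {
                    add(result, a.getLocalName(), a.getValue());
                }
            }
        }

        // Element form: arrays hold one value per rdf:li; otherwise use the element's text
        NodeList props = doc.getElementsByTagNameNS(DC_NS, "*");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList items = prop.getElementsByTagNameNS(RDF_NS, "li");
            if (items.getLength() == 0) {
                add(result, prop.getLocalName(), prop.getTextContent().trim());
            }
            for (int j = 0; j < items.getLength(); j++) {
                add(result, prop.getLocalName(), items.item(j).getTextContent().trim());
            }
        }
        return result;
    }

    private static void add(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}
```

Keeping a list per property matters here because XMP routinely stores several creators in one rdf:Seq, and several language variants of a title in one rdf:Alt.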

File system indexing

It should be reasonably easy to walk a file system directory tree and use MKSearch to find and index the supported document types. This could make document metadata available on an intranet, so that people know whom to ask for a copy of a document or can retrieve it directly.
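A directory walk of this kind is straightforward with java.nio.file.Files.walk. The sketch below only collects candidate files; handing each one to MKSearch's indexer is left out. The class name DocumentFinder and the set of "supported" extensions are assumptions for illustration, not MKSearch's actual list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of MKSearch
class DocumentFinder {
    // Assumed extensions: OpenOffice.org 1.x (.sxw/.sxc/.sxi), OpenDocument (.odt/.ods/.odp) and PDF
    static final Set<String> SUPPORTED = Set.of("sxw", "sxc", "sxi", "odt", "ods", "odp", "pdf");

    /** Walks the tree under root (not following symbolic links) and returns supported regular files. */
    static List<Path> find(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // the stream holds open directories, so close it
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> SUPPORTED.contains(extension(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Lower-cased text after the last dot of the file name, or "" if there is none. */
    static String extension(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Matching on extensions is cheap but naive; a more careful crawler might inspect file contents (for example, the mimetype entry of an OpenOffice archive) before passing a file to the indexer.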

JSpider features

Alternative configuration schemes: JSpider currently uses static, factory-based configuration loaders that read Java property files. These work well, but they make unit testing difficult, because a test cannot easily substitute its own settings for the globally loaded ones. This is not a critical issue, but an alternative form of configuration may be devised.
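One possible alternative, sketched here under assumptions, is to pass configuration into components through their constructors behind a small interface, so that a unit test can supply values directly instead of relying on a static loader. SpiderConfig, PropertiesConfig, Crawler and the crawler.maxDepth key are all hypothetical names; none of them is part of JSpider.

```java
import java.util.Properties;

/** Hypothetical abstraction over configuration values (not JSpider's API). */
interface SpiderConfig {
    String get(String key, String defaultValue);
}

/** Backs SpiderConfig with a Properties object, e.g. one loaded from an existing .properties file. */
class PropertiesConfig implements SpiderConfig {
    private final Properties props;

    PropertiesConfig(Properties props) {
        this.props = props;
    }

    @Override
    public String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}

/** Hypothetical component that receives its configuration instead of looking it up statically. */
class Crawler {
    private final SpiderConfig config;

    Crawler(SpiderConfig config) {
        this.config = config;
    }

    int maxDepth() {
        // "crawler.maxDepth" and its default of 3 are illustrative only
        return Integer.parseInt(config.get("crawler.maxDepth", "3"));
    }
}
```

In production the Properties object could still be loaded from the existing property files, while a test can pass a lambda such as (key, def) -> def, since SpiderConfig has a single abstract method, without touching any files or global state.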

Document Links

OpenOffice.org XML Essentials: A draft book about the Open Office document format by J. David Eisenberg
http://books.evc-cit.info/book.html

XMP metadata: The Adobe XMP format specification (PDF)
http://partners.adobe.com/public/developer/en/xmp/sdk/xmpspecification.pdf

PDFBox: PDFBox metadata processing features
http://www.pdfbox.org/userguide/metadata.html

Looking at XMP: An overview of the RDF nature of the XMP format
http://www.ldodds.com/blog/archives/000261.html

This document was last modified on 2005-12-13 07:16:01.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html