Future directions
This is a page of notes on possible future directions for MKSearch. There are no plans to implement these features in the immediate future.
Open Office document indexing
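The OpenOffice.org document format contains metadata that would be well suited to indexing:
- The document content and metadata are stored as XML files
- These XML files are packaged in a ZIP archive, the same container format as a Java archive (JAR)
- The metadata file (meta.xml) uses Dublin Core elements such as title, creator and date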
See chapter 2 of OpenOffice.org XML Essentials.
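Because the format is a ZIP archive of XML files, the Dublin Core metadata can be read with nothing more than the standard Java library. The class below is a minimal sketch, not MKSearch code: the name `OdfMetadataReader` and its method are hypothetical, and it assumes the metadata lives in a `meta.xml` entry at the root of the archive, with Dublin Core elements in the `http://purl.org/dc/elements/1.1/` namespace.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of MKSearch: reads Dublin Core values from meta.xml.
class OdfMetadataReader {
    // Dublin Core element namespace (bound to the "dc" prefix in meta.xml)
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns Dublin Core values keyed by local name, e.g. "title" or "creator". */
    static Map<String, String> readDublinCore(String path) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("meta.xml");
            if (entry == null) {
                return result; // no metadata entry in this archive
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // meta.xml may declare an external DTD that is not inside the archive;
            // resolve every external entity to an empty document instead of failing.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            try (InputStream in = zip.getInputStream(entry)) {
                Document doc = builder.parse(in);
                // "*" matches every element in the Dublin Core namespace
                NodeList nodes = doc.getElementsByTagNameNS(DC_NS, "*");
                for (int i = 0; i < nodes.getLength(); i++) {
                    Node node = nodes.item(i);
                    // A repeated element keeps only its last value in this sketch.
                    result.put(node.getLocalName(), node.getTextContent().trim());
                }
            }
        }
        return result;
    }
}
```

The returned map could then be passed to whatever component builds the index; that step is not shown here.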
PDF indexing
PDFBox is a free Java library that provides access to metadata embedded in PDF documents in Adobe's XMP format, which is serialized as RDF/XML.
Leigh Dodds's introductory article, Looking at XMP, outlines the RDF nature of the format.
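Since an XMP packet is RDF/XML, its Dublin Core properties can be pulled out with an ordinary namespace-aware XML parser once the packet has been extracted from the PDF. The sketch below assumes the packet is already available as a string (with PDFBox 2.x it would come from the stream returned by `PDDocument.getDocumentCatalog().getMetadata()`, which is not shown). The class name `XmpDublinCore` is hypothetical, and the sketch only handles properties written as elements, not the abbreviated attribute form that RDF/XML also allows.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts Dublin Core properties from an XMP packet.
class XmpDublinCore {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Maps each dc property (e.g. "title") to its values, in document order. */
    static Map<String, List<String>> extract(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xmp)));
        Map<String, List<String>> out = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagNameNS(DC_NS, "*");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            List<String> values = out.computeIfAbsent(prop.getLocalName(), k -> new ArrayList<>());
            // Array-valued properties (rdf:Seq, rdf:Bag, rdf:Alt) hold their values in rdf:li items.
            NodeList items = prop.getElementsByTagNameNS(RDF_NS, "li");
            if (items.getLength() == 0) {
                values.add(prop.getTextContent().trim()); // simple literal value
            } else {
                for (int j = 0; j < items.getLength(); j++) {
                    values.add(items.item(j).getTextContent().trim());
                }
            }
        }
        return out;
    }
}
```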
File system indexing
It should be reasonably easy to walk a file system directory tree, find documents of the types MKSearch supports and index them. This could make document metadata available on an intranet, so that people know whom to ask for a copy of a document, or can retrieve it directly.
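The directory walk itself is straightforward with `java.nio.file.Files.walk`. This is a minimal sketch that only collects candidate files: the class name `DocumentFinder` and the set of "supported" extensions are assumptions for illustration, and handing each file to an indexer is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds files whose extension suggests a supported document type.
class DocumentFinder {
    // Assumed extensions; the real list would depend on which formats MKSearch can index.
    static final Set<String> SUPPORTED = Set.of("sxw", "odt", "pdf");

    static List<Path> findDocuments(Path root) throws IOException {
        // Files.walk is lazy and holds directory handles, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(DocumentFinder::isSupported)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    static boolean isSupported(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Compare extensions case-insensitively, e.g. "REPORT.PDF" matches "pdf".
        return dot > 0 && SUPPORTED.contains(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Filtering on file extension is the simplest choice; a more robust crawler might inspect file contents instead.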
JSpider features
- Alternative configuration schemes
  - JSpider currently uses static, factory-based configuration loaders that read Java property files. These work, but they make unit testing somewhat difficult. This is not a critical issue, but an alternative form of configuration may be devised.
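One possible alternative is to pass configuration into components as an object rather than having them call a static loader, so a unit test can supply properties in memory. The sketch below illustrates that idea only; `SpiderConfig` and the property keys used with it are hypothetical and are not part of JSpider's API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical configuration holder, not part of JSpider: components receive an
// instance of it instead of calling a static, factory-based loader.
class SpiderConfig {
    private final Properties props;

    SpiderConfig(Properties props) {
        this.props = props;
    }

    // Production code can still read a property file, but through an instance.
    static SpiderConfig fromStream(InputStream in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new SpiderConfig(p);
    }

    int getInt(String key, int defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    String getString(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}
```

In a test, a `Properties` object built in code can be passed to the constructor directly, so no property file or static state is involved.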
Document Links
- OpenOffice.org XML Essentials: a draft book about the OpenOffice.org document format by J. David Eisenberg
  http://books.evc-cit.info/book.html
- XMP metadata: the Adobe XMP format specification (PDF)
  http://partners.adobe.com/public/developer/en/xmp/sdk/xmpspecification.pdf
- PDFBox: PDFBox metadata processing features
  http://www.pdfbox.org/userguide/metadata.html
- Looking at XMP: an overview of the RDF nature of the XMP format
  http://www.ldodds.com/blog/archives/000261.html
This document was last modified on 2005-12-13 07:16:01.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html