Beta 1 crawler plans
This page lists the tasks and progress notes for the beta 1 release of the MKSearch crawler component. This is an archive page.
Development plans
- Custom schema plugin types
  The default metadata schema support includes Dublin Core elements and qualifiers. However, government Web sites may have their own metadata schemas, such as the UK e-Government Metadata Standard, which need to be integrated with these schemes. The crawler plugins need to allow for custom Schema support.
  Task progress: Re-factored the plugin class hierarchy to share more common functionality.
  - XhtmlTripleWriterPlugin completed.
  - XhtmlStoreWriterPlugin completed.
  - Custom schema support complete.
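As an illustration of custom Schema support, the following is a minimal sketch of how a table-driven schema could map HTML meta names such as "DC.title" to property URIs. The Schema interface shown here, the SimpleSchema class, the resolve helper and the e-GMS namespace URI are all assumptions made for the example; they are not taken from the MKSearch source.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class CustomSchemaSketch {

    /** Hypothetical stand-in for the crawler's Schema type. */
    interface Schema {
        String prefix();
        Optional<String> propertyUri(String elementName);
    }

    /** A schema defined by a prefix, a namespace and a set of element names. */
    static final class SimpleSchema implements Schema {
        private final String prefix;
        private final String namespace;
        private final Set<String> elements;

        SimpleSchema(String prefix, String namespace, String... elements) {
            this.prefix = prefix;
            this.namespace = namespace;
            this.elements = new LinkedHashSet<>(Arrays.asList(elements));
        }

        @Override
        public String prefix() {
            return prefix;
        }

        @Override
        public Optional<String> propertyUri(String elementName) {
            return elements.contains(elementName)
                    ? Optional.of(namespace + elementName)
                    : Optional.empty();
        }
    }

    /** Resolves a meta name of the form "prefix.element" against the given schemas. */
    static Optional<String> resolve(String metaName, List<Schema> schemas) {
        int dot = metaName.indexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String prefix = metaName.substring(0, dot);
        String element = metaName.substring(dot + 1);
        for (Schema schema : schemas) {
            if (schema.prefix().equalsIgnoreCase(prefix)) {
                return schema.propertyUri(element);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Schema> schemas = Arrays.asList(
                new SimpleSchema("DC", "http://purl.org/dc/elements/1.1/", "title", "creator"),
                // Placeholder namespace: the real e-GMS namespace URI is not given in this page.
                new SimpleSchema("eGMS", "http://example.org/egms/", "accessibility", "mandate"));
        System.out.println(resolve("DC.title", schemas));
        System.out.println(resolve("eGMS.mandate", schemas));
        System.out.println(resolve("DC.unknown", schemas));
    }
}
```

In this sketch a custom schema is just another entry in the list, so a plugin could in principle be configured with additional schemas without code changes.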
- Custom application profile support
  The dynamic interface of the Query component has brought forward the need for an ApplicationProfile interface to adapt various Schema types. The initial implementation of these types is compatible with the custom Schema configuration mechanism in the JSpider plugins, but should ultimately change to custom ApplicationProfile.
  Task progress: Refactored AbstractRdfContentHandler, concrete types and plugins to the new interface.
  - Custom application profile support complete.
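One way to picture an ApplicationProfile that adapts several Schema types is sketched below. The method names, the SchemaListProfile class and the fromSchema bridge are hypothetical and only illustrate the idea of presenting several schemas through one interface while still accepting a single custom Schema from existing configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ApplicationProfileSketch {

    /** Hypothetical minimal Schema: a namespace plus the element names it defines. */
    static final class Schema {
        final String namespace;
        final List<String> elements;

        Schema(String namespace, String... elements) {
            this.namespace = namespace;
            this.elements = Arrays.asList(elements);
        }
    }

    /** Hypothetical ApplicationProfile: a single view over the properties of several schemas. */
    interface ApplicationProfile {
        List<String> propertyUris();
        boolean defines(String propertyUri);
    }

    /** Builds the profile by flattening each schema into full property URIs. */
    static final class SchemaListProfile implements ApplicationProfile {
        private final List<String> uris = new ArrayList<>();

        SchemaListProfile(List<Schema> schemas) {
            for (Schema schema : schemas) {
                for (String element : schema.elements) {
                    uris.add(schema.namespace + element);
                }
            }
        }

        @Override
        public List<String> propertyUris() {
            return Collections.unmodifiableList(uris);
        }

        @Override
        public boolean defines(String propertyUri) {
            return uris.contains(propertyUri);
        }
    }

    /** Bridge for configuration that still supplies a single custom Schema. */
    static ApplicationProfile fromSchema(Schema schema) {
        return new SchemaListProfile(Collections.singletonList(schema));
    }

    public static void main(String[] args) {
        Schema dc = new Schema("http://purl.org/dc/elements/1.1/", "title", "creator");
        // Placeholder namespace used only for illustration.
        Schema custom = new Schema("http://example.org/custom/", "audience");
        ApplicationProfile profile = new SchemaListProfile(Arrays.asList(dc, custom));
        System.out.println(profile.propertyUris());
        System.out.println(fromSchema(dc).defines("http://purl.org/dc/elements/1.1/title"));
    }
}
```

The fromSchema bridge reflects the compatibility point above: code written against the profile interface would keep working while the configuration still names a single Schema.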
- Custom Rule types
  The standard rule set for JSpider includes a "parse only text/html" rule used for standard Web content. MKSearch will require other rules for RDF and RSS content types, and perhaps PDF and other document types.
  Task progress:
  - Draft RdfContentTypeOnly rule prepared for testing.
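The idea behind a content-type rule can be sketched as follows. This example does not use JSpider's own rule interface; the ContentTypeRule interface, the Decision enum and the ContentTypeOnly class are hypothetical names chosen for illustration. It assumes the rule is given the raw Content-Type header value and accepts only the listed media types, with application/rdf+xml as the RDF/XML type.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ContentTypeRuleSketch {

    enum Decision { ACCEPT, IGNORE }

    /** Hypothetical rule type; JSpider's own rule interface is not reproduced here. */
    interface ContentTypeRule {
        Decision apply(String contentTypeHeader);
    }

    /** Accepts a resource only if its media type is in the configured set. */
    static final class ContentTypeOnly implements ContentTypeRule {
        private final Set<String> accepted = new HashSet<>();

        ContentTypeOnly(String... mediaTypes) {
            for (String type : mediaTypes) {
                accepted.add(type.toLowerCase(Locale.ROOT));
            }
        }

        @Override
        public Decision apply(String contentTypeHeader) {
            if (contentTypeHeader == null) {
                return Decision.IGNORE;
            }
            // Drop parameters such as "; charset=utf-8" before comparing.
            int semicolon = contentTypeHeader.indexOf(';');
            String mediaType = (semicolon < 0
                    ? contentTypeHeader
                    : contentTypeHeader.substring(0, semicolon))
                    .trim()
                    .toLowerCase(Locale.ROOT);
            return accepted.contains(mediaType) ? Decision.ACCEPT : Decision.IGNORE;
        }
    }

    /** A rule in the spirit of RdfContentTypeOnly: parse RDF/XML and nothing else. */
    static ContentTypeRule rdfOnly() {
        return new ContentTypeOnly("application/rdf+xml");
    }

    public static void main(String[] args) {
        ContentTypeRule rule = rdfOnly();
        System.out.println(rule.apply("application/rdf+xml; charset=utf-8"));
        System.out.println(rule.apply("text/html"));
    }
}
```

Rules for RSS or PDF content could reuse the same class with a different set of media types, for example the commonly used application/rss+xml or application/pdf.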
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html