Validator

Component description

The MKSearch validator component ensures that source documents are well-formed, valid XML; to that end it converts HTML documents to XHTML on the fly. The validator consists largely of JTidy, which checks and corrects common HTML markup errors, and a validating XML parser.
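
As a rough illustration of the XML parsing half of the validator, the sketch below checks a document for well-formedness with the SAX parser bundled in the JDK. It is not MKSearch code: the class name WellFormedCheck and the method isWellFormed are hypothetical, the JTidy clean-up step is omitted, and full DTD validation (which would need the XHTML DTD to be resolvable) is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of MKSearch or JTidy.
class WellFormedCheck {

    // Returns true if the text parses as well-formed XML, false otherwise.
    // Assumption: in a real pipeline the input would already have been
    // converted from HTML to XHTML (for example by JTidy) before this step.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so malformed markup
            // surfaces here as a SAXException.
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The document is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Environment problem rather than a document problem.
            throw new IllegalStateException(e);
        }
    }
}
```

A caller might use WellFormedCheck.isWellFormed(text) as a gate before indexing, skipping any document for which it returns false.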

Beta 2 development plans

Integrated exception handling
The beta version of MKSearch handles validation problems at the XML parsing stage: when a source document fails to parse, a SAXException is thrown and the document is not indexed. The beta 2 validator will also have to report such problems to the checker component so that the repository can be purged of any existing records for the problem document. The next release of JTidy is expected to implement a MessageListener interface that can be used to monitor the parse (see "Upgrade to release r8 of JTidy" below).
Upgrade to release r8 of JTidy
A significant number of bugs have been reported against the current r7 release of JTidy, many of which are expected to be corrected in the next release. None are known to affect MKSearch, but system testing has been limited to date, so it would be better to work with a cleaner version.
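
The integrated exception handling planned above could look something like the following sketch, in which a parse failure is reported to a listener instead of simply aborting the indexing of the document. Everything named here is an assumption made for illustration: ValidationProblemListener, documentRejected and ReportingValidator are hypothetical and do not come from MKSearch or JTidy, and only the JDK's own SAX parser is used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical callback that a checker component might implement in order
// to purge existing repository records for a rejected document.
interface ValidationProblemListener {
    void documentRejected(String uri, String reason);
}

// Hypothetical validator wrapper, not the MKSearch implementation.
class ReportingValidator {
    private final ValidationProblemListener listener;

    ReportingValidator(ValidationProblemListener listener) {
        this.listener = listener;
    }

    // Parses the document text. On a SAXException the listener is told
    // which document failed and why, and false is returned so that the
    // caller can skip indexing.
    boolean validate(String uri, String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            listener.documentRejected(uri, e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Environment problem rather than a document problem.
            throw new IllegalStateException(e);
        }
    }
}
```

In this arrangement the checker would pass itself (or a small adapter) as the listener, so that the validator needs no direct knowledge of the repository.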

Document Links

JTidy
The JTidy project home page on SourceForge
http://jtidy.sourceforge.net/
This document was last modified on 2005-08-04 08:34:04.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html