Checker
Component description
The MKSearch checker component is an integration layer between the data acquisition system and the repository, and between the query component and the repository. In the alpha version of MKSearch, the checker does not yet exist as an independent component.
Development plans
- Repository management interfaces
- A wrapper layer around the repository is required to maintain its contents dynamically and so permit incremental indexing. The wrapper will need to handle exception messages from the crawler, validator and indexer components and purge the affected invalid records from the repository. It will also need to add the RDF statements generated by the indexer and, ultimately, purge stale entries from the result cache.
- Check un-linked documents
- At present, the crawler component drives the whole data acquisition process by following published hyperlinks, and it creates a new repository for each session. With an incremental indexing scheme, however, previously indexed documents may be deleted or un-linked between sessions. Because the crawler only reaches documents through hyperlinks, it will never revisit such documents and so cannot discover that they are obsolete. The checker component therefore needs to check periodically whether "old" source documents still exist.
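The repository wrapper described under "Repository management interfaces" could take roughly the following shape. This is a minimal Python sketch, not MKSearch code: the in-memory store, the `ComponentError` message type, the `RepositoryWrapper` class and all of its method names are hypothetical stand-ins for the real repository and component interfaces. It also assumes that RDF statements are grouped by the source document they came from, and that the result cache maps a query string to the set of source URLs in its results.

```python
from dataclasses import dataclass
from typing import Optional

# An RDF statement as a (subject, predicate, object) triple of strings.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class ComponentError:
    """Hypothetical exception message sent by the crawler, validator or indexer."""
    component: str   # e.g. "crawler", "validator" or "indexer"
    source_url: str  # the document the error refers to
    reason: str


class RepositoryWrapper:
    """Sketch of a wrapper that keeps an RDF store current for incremental indexing."""

    def __init__(self) -> None:
        # Statements grouped by the source document they were extracted from.
        self._statements: dict[str, set[Triple]] = {}
        # Result cache: query string -> set of source URLs in its result list.
        self._cache: dict[str, set[str]] = {}

    def add_statements(self, source_url: str, triples: list[Triple]) -> None:
        """Store the indexer's statements for a document, replacing older ones."""
        self._statements[source_url] = set(triples)
        self._purge_cache(source_url)

    def handle_error(self, error: ComponentError) -> None:
        """Purge every record for a document a component reported as invalid."""
        self._statements.pop(error.source_url, None)
        self._purge_cache(error.source_url)

    def cache_result(self, query: str, urls: set[str]) -> None:
        self._cache[query] = set(urls)

    def cached(self, query: str) -> Optional[set[str]]:
        return self._cache.get(query)

    def statements(self, source_url: str) -> set[Triple]:
        return set(self._statements.get(source_url, ()))

    def _purge_cache(self, source_url: str) -> None:
        # Drop every cached result list that mentions the changed document.
        stale = [q for q, urls in self._cache.items() if source_url in urls]
        for query in stale:
            del self._cache[query]
```

For example, after `add_statements(url, triples)` and `cache_result("q", {url})`, calling `handle_error(ComponentError("validator", url, "invalid metadata"))` removes both the document's statements and the cached result list. The sketch replaces, rather than appends to, a document's statements when it is re-indexed so that outdated statements do not accumulate, and it invalidates only the cache entries that mention the changed document instead of flushing the whole cache.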
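The periodic check for un-linked documents might look like the following minimal Python sketch. It assumes the checker keeps a hypothetical `last_checked` map from each indexed URL to the time its existence was last confirmed, and that the crawler reports which URLs it reached in the current session; only documents the crawler missed and that have not been confirmed recently are probed. `http_exists` uses a standard-library HTTP `HEAD` request and treats only 404 and 410 as proof that a document is gone. The function names and parameters are illustrative, not part of MKSearch.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_exists(url: str, timeout: float = 10.0) -> Optional[bool]:
    """Probe url with an HTTP HEAD request.

    Returns True if the document answers, False if the server reports it
    gone (404 Not Found or 410 Gone), and None if the result is inconclusive.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # Other HTTP errors (e.g. 500, 503) may be transient.
        return False if err.code in (404, 410) else None
    except OSError:
        # URLError, timeouts and connection errors are all OSError subclasses.
        return None


def find_obsolete(
    last_checked: dict[str, float],
    seen_this_session: set[str],
    exists: Callable[[str], Optional[bool]] = http_exists,
    max_age: float = 7 * 24 * 3600,
    now: Optional[float] = None,
) -> list[str]:
    """Return indexed URLs whose source documents no longer exist.

    Updates last_checked for every document confirmed to still exist.
    """
    now = time.time() if now is None else now
    obsolete = []
    for url, checked_at in list(last_checked.items()):
        if url in seen_this_session:
            last_checked[url] = now  # the crawler just reached it, so it exists
            continue
        if now - checked_at < max_age:
            continue  # confirmed recently; skip the probe
        result = exists(url)
        if result is False:
            obsolete.append(url)
        elif result is True:
            last_checked[url] = now
        # result is None: inconclusive, so the document is retried on the next run
    return obsolete
```

The URLs returned by `find_obsolete` would then be handed to the repository wrapper for purging. Passing a fake `exists` function, as a unit test would, keeps the check independent of the network.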
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html