WebLech

WebLech is released under the Open Source MIT Licence. The release version indicates a very early stage of development; the package has only 12 classes.

Initial review notes

WebLech uses a scheme similar to Arachnid and Spindle: a Runnable spider class and a supporting HTML parser. The parser uses an iterative tag matching scheme to find attributes with URL content, rather than trying to capture the overall structure of the document. The status of a longer trawl can be saved to and restored from "checkpoint" files.
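The iterative tag matching approach can be illustrated with a short sketch. This is not WebLech's actual parser; the class name LinkScanner, the method extractUrls and the choice of the href and src attributes are assumptions made for illustration. It walks over each opening tag in turn and collects quoted URL-bearing attribute values, without building a document tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only, not taken from the WebLech source.
class LinkScanner {
    // An opening tag: '<', a letter, then everything up to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[a-zA-Z][^>]*>");

    // A quoted href or src attribute preceded by whitespace (assumed attribute set).
    private static final Pattern URL_ATTR = Pattern.compile(
        "\\s(?:href|src)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
        Pattern.CASE_INSENSITIVE);

    // Returns attribute values in document order; no DOM is constructed.
    static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher tags = TAG.matcher(html);
        while (tags.find()) {
            // Look for URL attributes only inside the current tag.
            Matcher attrs = URL_ATTR.matcher(tags.group());
            while (attrs.find()) {
                String value = attrs.group(1) != null ? attrs.group(1) : attrs.group(2);
                urls.add(value);
            }
        }
        return urls;
    }
}
```

A scan of this kind tolerates malformed markup, since it never needs matching close tags, but it will miss unquoted attribute values and cannot tell a commented-out tag from a live one.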

The WebLech spider can be configured using a plain text file and is multi-threaded by default. The Apache logging component is integrated with most of the key classes; for MKSearch it would have to be replaced by an interface to a dynamically loaded logging system.
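One possible shape for such a logging interface is sketched below. All names here (SpiderLog, ConsoleLog, LogLoader) are hypothetical and appear in neither WebLech nor MKSearch; the sketch only assumes that the implementation class name is supplied at run time, for example from a configuration file.

```java
// Hypothetical logging facade; names are illustrative assumptions.
interface SpiderLog {
    void info(String message);
    void error(String message, Throwable cause);
}

// Fallback implementation that writes to the standard streams.
class ConsoleLog implements SpiderLog {
    public void info(String message) {
        System.out.println("INFO " + message);
    }
    public void error(String message, Throwable cause) {
        System.err.println("ERROR " + message + ": " + cause);
    }
}

final class LogLoader {
    private LogLoader() {}

    // Instantiates the named class via its no-argument constructor;
    // falls back to ConsoleLog if it cannot be loaded or is not a SpiderLog.
    static SpiderLog load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (SpiderLog) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new ConsoleLog();
        }
    }
}
```

With this arrangement the spider classes would depend only on the interface, so the concrete logging library could be changed without touching them.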

The relatively early development stage of this project and its MIT licence make it a less attractive prospect for the MKSearch project.

This document was last modified on 2004-11-04 08:05:40.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html