WebLech
WebLech is released under the Open Source MIT Licence. Its release version indicates a very early stage of development; the package has only 12 classes.
- Several classes depend on the org.apache.log4j package, which is released under the Apache Software License, version 2.0.
Initial review notes
WebLech uses a scheme similar to those of Arachnid and Spindle: a Runnable spider class and a supporting HTML parser. The parser uses an iterative tag-matching scheme to find attributes with URL content, rather than trying to capture the overall structure of the document. The status of a longer trawl can be saved to, and restored from, "checkpoint" files.
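The iterative tag-matching approach can be illustrated with a short Java sketch. It is not WebLech's parser: the class name `TagUrlExtractor` and the set of URL-bearing attributes (`href`, `src`, `action`, `background`) are assumptions chosen for illustration. Each opening tag is matched on its own and only its attributes are inspected, so no document tree is ever built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of iterative tag matching for URL extraction.
 * Not WebLech's actual code; the attribute list is an assumption.
 */
public class TagUrlExtractor {

    // One opening tag, e.g. <a href="x.html" class=y>; group 1 = tag name, group 2 = attributes.
    private static final Pattern TAG =
        Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)\\b([^>]*)>");

    // A URL-bearing attribute with a double-quoted, single-quoted or unquoted value.
    private static final Pattern URL_ATTR = Pattern.compile(
        "\\b(?:href|src|action|background)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    /** Returns URL attribute values in document order, without resolving them. */
    public static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {                       // closing tags and comments are skipped
            Matcher attr = URL_ATTR.matcher(tag.group(2));
            while (attr.find()) {
                // Exactly one of the three quoting alternatives matched.
                String value = attr.group(1) != null ? attr.group(1)
                             : attr.group(2) != null ? attr.group(2)
                             : attr.group(3);
                urls.add(value);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"next.html\">Next</a>"
                    + "<img src='logo.png' alt=x><form action=/search></form></body></html>";
        System.out.println(extractUrls(page)); // prints [next.html, logo.png, /search]
    }
}
```

Because no structure is tracked, malformed markup does not stop the scan, but context is lost: a `<base>` element is extracted like any other link rather than applied to the rest, and a `>` inside a quoted attribute value would cut a tag short.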
The WebLech spider can be configured using a plain-text file and is multi-threaded by default. The Apache logging component is integrated into most of the key classes and would have to be replaced by an interface to a dynamically loaded logging system.
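One way to decouple the spider classes from log4j is a small logging interface whose implementation is chosen by class name at run time. This is a hypothetical sketch: `LoggingShim`, `SpiderLog`, the `spider.log.impl` property and the `com.example.Log4jSpiderLog` adapter are invented names, and the fallback policy (warnings to standard error, debug output dropped) is an assumption rather than anything WebLech provides.

```java
/**
 * Hypothetical sketch: spider classes call SpiderLog instead of importing
 * org.apache.log4j directly; the backend is loaded by name at run time.
 */
public class LoggingShim {

    /** The only logging surface the spider classes would see. */
    public interface SpiderLog {
        void debug(String msg);
        void warn(String msg);
    }

    /** Fallback used when no backend class can be loaded. */
    public static final class StderrLog implements SpiderLog {
        public void debug(String msg) { /* debug output dropped in the fallback */ }
        public void warn(String msg) { System.err.println("WARN: " + msg); }
    }

    /**
     * Instantiates the named class via its no-argument constructor and uses it
     * as the backend; falls back to StderrLog if the class is missing, cannot be
     * instantiated, or does not implement SpiderLog.
     */
    public static SpiderLog load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (SpiderLog) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new StderrLog();
        }
    }

    public static void main(String[] args) {
        // Both the property name and the adapter class are hypothetical.
        SpiderLog log = load(System.getProperty("spider.log.impl", "com.example.Log4jSpiderLog"));
        log.warn("checkpoint file not found; starting a fresh trawl");
    }
}
```

With no adapter on the classpath, `main` falls back to `StderrLog` and writes the warning to standard error. A log4j-backed `SpiderLog` class named in the property would restore log4j output without any spider class depending on the log4j package at compile time.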
The relatively early development stage of this project and its MIT licence make it a less attractive prospect for the MKSearch project.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html