Arachnid

Arachnid is released under the GPL. It appears to be a tool for mapping a link structure, and is a small package of 8 classes.

Initial review notes

Arachnid is of a similar nature to Spindle, but its structure is more modular and extensive. The package has a layered scheme for creating spiders by extending the abstract Arachnid base class. Subclasses implement template methods for handling bad links, IO exceptions, unrecognised links and external links.
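The layered scheme described above is an instance of the template method pattern. The following is a minimal sketch of that pattern only. The class names (SketchSpider, CollectingSpider) and the hook names are hypothetical and are not taken from Arachnid's actual API. It also assumes every link is an absolute URL and it does no network IO, so the IO exception hook is left out.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a spider base class; not Arachnid's real API. */
abstract class SketchSpider {
    /** Fixed traversal logic: classify each link, then dispatch to a hook. */
    public final void traverse(String siteHost, List<String> links) {
        for (String link : links) {
            URI uri;
            try {
                uri = new URI(link);
            } catch (URISyntaxException e) {
                handleBadLink(link);            // could not be parsed at all
                continue;
            }
            String scheme = uri.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            if (!isHttp) {
                handleUnrecognisedLink(link);   // e.g. mailto: or ftp:
            } else if (!siteHost.equalsIgnoreCase(uri.getHost())) {
                handleExternalLink(link);       // different (or missing) host
            } else {
                handleLink(link);               // ordinary on-site link
            }
        }
    }

    // Hooks that concrete spiders must implement.
    protected abstract void handleLink(String link);
    protected abstract void handleBadLink(String link);
    protected abstract void handleUnrecognisedLink(String link);
    protected abstract void handleExternalLink(String link);
}

/** Example subclass that simply records what it was given. */
class CollectingSpider extends SketchSpider {
    final List<String> internal = new ArrayList<>();
    final List<String> bad = new ArrayList<>();
    final List<String> unrecognised = new ArrayList<>();
    final List<String> external = new ArrayList<>();

    @Override protected void handleLink(String link) { internal.add(link); }
    @Override protected void handleBadLink(String link) { bad.add(link); }
    @Override protected void handleUnrecognisedLink(String link) { unrecognised.add(link); }
    @Override protected void handleExternalLink(String link) { external.add(link); }
}

class SpiderSketchDemo {
    public static void main(String[] args) {
        CollectingSpider spider = new CollectingSpider();
        spider.traverse("example.com", Arrays.asList(
                "http://example.com/a",
                "http://other.org/",
                "mailto:x@example.com",
                "http://exa mple.com/"));
        System.out.println("internal: " + spider.internal);
        System.out.println("external: " + spider.external);
    }
}
```

The base class owns the traversal loop and subclasses only decide what to do with each category of link, which is what makes a design of this kind easy to extend.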

Arachnid is not multi-threaded, but the Arachnid base class can be used to create threaded applications. It uses an HTML tokenizer that appears slightly more sophisticated than the one used by Spindle, but it may not handle all cases of invalid markup. The PageInfo class uses a WebPageXtractor to get document content.
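One way a single-threaded spider might be used in a threaded application is to give each worker its own spider instance, so that no crawl state is shared between threads. The sketch below illustrates that idea with the standard java.util.concurrent classes. SingleThreadedCrawler and ParallelCrawls are hypothetical names standing in for a spider subclass and its driver. They are not part of Arachnid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Hypothetical single-threaded crawler; stands in for a spider subclass. */
interface SingleThreadedCrawler {
    /** Crawls from startUrl and returns the number of pages visited. */
    int crawl(String startUrl);
}

final class ParallelCrawls {
    private ParallelCrawls() { }

    /** Runs one independent crawl per start URL on a fixed-size thread pool. */
    static List<Integer> runAll(List<String> startUrls,
                                Supplier<SingleThreadedCrawler> factory,
                                int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String url : startUrls) {
                // A fresh crawler per task: instances are never shared.
                SingleThreadedCrawler crawler = factory.get();
                Callable<Integer> task = () -> crawler.crawl(url);
                futures.add(pool.submit(task));
            }
            // Collect results in the same order as the start URLs.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while crawling", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a crawl failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping one instance per task avoids the need for any locking inside the spider itself, at the cost of not sharing a visited-URL set between crawls.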

There are no problematic package dependencies.

Limitations

Arachnid allows a fixed delay between URL requests, but it does not support the robots exclusion protocol (robots.txt).
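For illustration, a fixed delay between requests can be enforced with a small throttle that sleeps until the configured interval has passed since the previous request. This is a sketch under assumptions: FixedDelayThrottle is a hypothetical class, not Arachnid's implementation, and it does nothing about robots.txt.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical throttle enforcing a minimum gap between successive requests. */
final class FixedDelayThrottle {
    private final long delayNanos;
    private long lastRequestNanos;
    private boolean firstRequest = true;

    FixedDelayThrottle(long delayMillis) {
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delay must not be negative");
        }
        this.delayNanos = TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    /**
     * Blocks until at least the configured delay has elapsed since the
     * previous call returned. The first call returns immediately.
     */
    synchronized void awaitTurn() {
        if (!firstRequest) {
            long remaining = delayNanos - (System.nanoTime() - lastRequestNanos);
            if (remaining > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(remaining);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and stop waiting.
                    Thread.currentThread().interrupt();
                }
            }
        }
        firstRequest = false;
        lastRequestNanos = System.nanoTime();
    }
}

class ThrottleDemo {
    public static void main(String[] args) {
        FixedDelayThrottle throttle = new FixedDelayThrottle(200);
        String[] urls = { "http://example.com/a", "http://example.com/b" };
        for (String url : urls) {
            throttle.awaitTurn();
            // A real spider would fetch the URL here.
            System.out.println("would fetch " + url);
        }
    }
}
```

A spider would call awaitTurn() immediately before each fetch. Honouring robots.txt would need a separate check before the URL is queued.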

This document was last modified on 2004-11-04 03:14:48.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html