JoBo

JoBo is described as "free software", and its SourceForge project page lists it as GPL-licensed, but the source code itself contains no explicit licence terms.

Initial review notes

JoBo looks a strong candidate because the core code is only loosely coupled to the Apache classes. Two changes would be needed. First, the logging dependencies would have to be removed from the WebRobot, FormFiller, HtmlDocument and HttpTool classes; logging could instead be handled dynamically through an interface adapter. Second, the RegExpRule and RegExpURLCheck classes, which use the Apache regular expression package, would have to be switched to the GNU RegExp package.
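A minimal sketch of the interface-adapter approach, assuming the core classes are handed a logger rather than creating an Apache one themselves. `CrawlerLog`, `NullLog`, `StderrLog` and `Robot` are hypothetical names for illustration and are not part of JoBo; `Robot` stands in for a class such as WebRobot.

```java
// Hypothetical adapter interface: crawler classes depend only on this,
// not on a particular logging library.
interface CrawlerLog {
    void debug(String message);
    void warn(String message);
}

// Default adapter that discards messages, so the core code runs with no
// logging library on the classpath.
class NullLog implements CrawlerLog {
    @Override public void debug(String message) { }
    @Override public void warn(String message) { }
}

// Example adapter writing to standard error; an adapter for another logging
// package would wrap that package's logger in the same way.
class StderrLog implements CrawlerLog {
    @Override public void debug(String message) { System.err.println("DEBUG " + message); }
    @Override public void warn(String message) { System.err.println("WARN " + message); }
}

// Stand-in for a class such as WebRobot: the logger is injected at run time
// rather than obtained from a hard-wired logging framework.
class Robot {
    private CrawlerLog log = new NullLog();

    void setLog(CrawlerLog log) {
        this.log = (log != null) ? log : new NullLog();
    }

    void visit(String url) {
        log.debug("visiting " + url);
        // ... fetching and parsing would happen here ...
    }
}

public class LogAdapterDemo {
    public static void main(String[] args) {
        Robot robot = new Robot();
        robot.visit("http://example.com/");   // discarded by NullLog
        robot.setLog(new StderrLog());
        robot.visit("http://example.com/");   // prints "DEBUG visiting http://example.com/" to stderr
    }
}
```

With a no-op default, the core classes need no logging package at all, and a deployment that does want logging plugs in a small adapter at run time.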

One of the strengths of JoBo is that it is already partly integrated with JTidy and has an HttpDocManager interface for post-processing documents. The default implementation saves each retrieved document to its own file.
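The post-processing hook can be pictured as follows. This is a sketch only: `DocManager` is a hypothetical interface standing in for HttpDocManager and does not reproduce its actual methods, and the host/path file layout used by `FileDocManager` is an assumption about how per-document files might be arranged.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an HttpDocManager-style hook: the robot passes
// every fetched document here instead of deciding itself what to do with it.
interface DocManager {
    void processDocument(URL url, byte[] content) throws IOException;
}

// Mirrors the default behaviour described above: one file per document,
// stored under baseDir/host/path (the layout is an assumption).
class FileDocManager implements DocManager {
    private final Path baseDir;

    FileDocManager(Path baseDir) {
        this.baseDir = baseDir.normalize();
    }

    Path pathFor(URL url) throws IOException {
        String path = url.getPath();
        if (path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path += "index.html";   // directory URLs need a concrete file name
        }
        Path target = baseDir.resolve(url.getHost()).resolve(path.substring(1)).normalize();
        if (!target.startsWith(baseDir)) {
            // Reject "../" segments that would escape the mirror directory.
            throw new IOException("refusing to write outside " + baseDir + ": " + url);
        }
        return target;
    }

    @Override
    public void processDocument(URL url, byte[] content) throws IOException {
        Path target = pathFor(url);
        Files.createDirectories(target.getParent());
        Files.write(target, content);
    }
}

// Alternative post-processor: records document sizes in memory, standing in
// for code that would feed pages into another system.
class MemoryDocManager implements DocManager {
    final Map<String, Integer> sizes = new LinkedHashMap<>();

    @Override
    public void processDocument(URL url, byte[] content) {
        sizes.put(url.toString(), content.length);
    }
}

public class DocManagerDemo {
    public static void main(String[] args) throws IOException {
        URL url = URI.create("http://example.com/docs/").toURL();
        byte[] page = "<html></html>".getBytes(StandardCharsets.UTF_8);

        // Swapping the manager changes what happens to fetched pages
        // without touching the crawling code.
        MemoryDocManager memory = new MemoryDocManager();
        DocManager manager = memory;
        manager.processDocument(url, page);
        System.out.println(memory.sizes);   // {http://example.com/docs/=13}

        // Where the file-based manager would store the same page.
        System.out.println(new FileDocManager(Paths.get("mirror")).pathFor(url));
    }
}
```

Because the robot only sees the interface, replacing file output with, for example, loading pages into another system means writing one class.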

JoBo appears to have advanced HTTP support, including cookies and form handling, and it respects the robots exclusion protocol. The spidering rate can also be throttled to limit the load on origin servers.
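Throttling of this kind usually comes down to enforcing a minimum delay between requests to the same host. The class below is a hypothetical sketch of that idea, not JoBo's implementation; the per-host bookkeeping and the millisecond interval are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host throttle (not JoBo's API): enforces a minimum gap
// between successive requests to the same server.
class HostThrottle {
    private final long minIntervalMillis;
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Reserves the next free slot for host and returns how many milliseconds
    // the caller must wait, given the current time nowMillis.
    synchronized long reserve(String host, long nowMillis) {
        long start = Math.max(nextAllowed.getOrDefault(host, nowMillis), nowMillis);
        nextAllowed.put(host, start + minIntervalMillis);
        return start - nowMillis;
    }

    // Blocks until a request to host is allowed.
    void acquire(String host) throws InterruptedException {
        long waitMillis = reserve(host, System.currentTimeMillis());
        if (waitMillis > 0) {
            Thread.sleep(waitMillis);
        }
    }
}

public class ThrottleDemo {
    public static void main(String[] args) {
        HostThrottle throttle = new HostThrottle(1000);
        System.out.println(throttle.reserve("example.com", 0));    // 0
        System.out.println(throttle.reserve("example.com", 200));  // 800
        System.out.println(throttle.reserve("example.org", 200));  // 0
    }
}
```

Separating `reserve` from `acquire` keeps the scheduling arithmetic testable without sleeping, and keying on the host means one slow server does not hold up requests to others.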

This document was last modified on 2004-11-04 07:10:00.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html