JoBo
JoBo is described as "free software" and the project page on SourceForge says it is GPL, but there are no explicit licence terms in the project source code.
- Several classes depend on the Apache Log4j logging component, org.apache.log4j.*, released under the Apache Software License.
- Two classes depend on the Apache Regular Expressions package, org.apache.regexp, released under the Apache Software License.
- Two classes depend on the Castor XML framework packages, org.exolab.castor.mapping and org.exolab.castor.xml, which are released under a "BSD-like" licence; see the master licence.
- Several classes depend on the org.w3c.dom package, which is released under the W3C® Software Notice and License.
- Two classes depend on the JTidy package, org.w3c.tidy, which is released under the W3C® Software Notice and License.
- Various classes depend on the packages javax.swing and javax.swing.table, which may not be implemented by GNU Classpath.
Initial review notes
JoBo looks like a strong candidate because the Apache classes are only loosely coupled to the core code. One issue would be removing the logging dependencies from the WebRobot, FormFiller, HtmlDocument and HttpTool classes; logging could instead be handled dynamically through an interface adapter. Secondly, the Apache regular expression handling in the RegExpRule and RegExpURLCheck classes would have to be switched to use the GNU RegExp package.
One of the strengths of JoBo is that it is already part-integrated with JTidy and has an HttpDocManager interface for post-processing documents. The default document interface saves all content to individual files.
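To make the post-processing idea concrete, here is a minimal sketch of a document callback that writes each fetched document to its own file. It does not reproduce the real HttpDocManager signature, which is not shown on this page: DocumentHandler, FileSavingHandler and fileNameFor are hypothetical names, and the URL-to-file-name mapping is an assumption chosen for simplicity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical callback invoked once per fetched document (not JoBo's actual interface). */
interface DocumentHandler {
    void handle(String url, byte[] content) throws IOException;
}

/** Saves every document to an individual file inside one directory. */
final class FileSavingHandler implements DocumentHandler {
    private final Path directory;

    FileSavingHandler(Path directory) {
        this.directory = directory;
    }

    /** Replaces every character outside letters, digits, '.' and '-' with '_'. */
    static String fileNameFor(String url) {
        return url.replaceAll("[^A-Za-z0-9.-]", "_");
    }

    public void handle(String url, byte[] content) throws IOException {
        Files.createDirectories(directory);
        Files.write(directory.resolve(fileNameFor(url)), content);
    }
}
```

Because the mapping is lossy, two different URLs can end up with the same file name and overwrite each other; a real implementation would need a collision-free scheme. Other handlers, for example one that indexes the content instead of storing it, could implement the same interface.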
JoBo appears to have advanced support for HTTP methods including cookies and form handling, and respects the robots exclusion protocol. The rate of spidering can also be throttled to moderate the load on the origin servers.
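As an illustration of what throttling the spidering rate involves, the sketch below enforces a minimum interval between requests. It is not taken from JoBo: RequestThrottle, reserve and acquire are hypothetical names, and a fixed interval is just one simple policy among several.

```java
/** Hypothetical fixed-interval throttle: at most one request per interval. */
final class RequestThrottle {
    private final long minIntervalMillis;
    private long nextAllowedMillis = Long.MIN_VALUE;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Books the next request slot and returns how long the caller must wait,
     * in milliseconds, given the current time. Kept separate from sleeping so
     * the arithmetic can be checked without a real clock.
     */
    synchronized long reserve(long nowMillis) {
        long start = Math.max(nowMillis, nextAllowedMillis);
        nextAllowedMillis = start + minIntervalMillis;
        return start - nowMillis;
    }

    /** Blocks the calling thread until its reserved slot arrives. */
    void acquire() throws InterruptedException {
        long wait = reserve(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

A crawler loop would call acquire() before each request to a given host. Keeping one throttle per host rather than one global throttle would limit the load on each origin server without slowing the crawl as a whole.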
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html