Metis

Metis is oriented towards Open Source Security Testing and is released under the GPL licence, although its project home page says it will be changed to the BSD licence. It depends on the following third-party packages:

  • The faust.sacha.web.util.FlashFileInfoGetter class depends on the com.iv.flash.api, com.iv.flash.api.action, com.iv.flash.api.button, com.iv.flash.parser and com.iv.flash.util packages from the JGenerator project, released under the Apache Software License version 2.0. JGenerator in turn depends on Sun's com.sun.image.codec.jpeg package, which is not part of the standard Java 2 Platform API.
  • Many Metis packages depend on the org.apache.commons.httpclient package, released under the Apache Software License. (JGenerator also depends on the Apache Xerces XML parser and the Xalan XSLT processor.)
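One way to reproduce a dependency survey like the one above is to scan each source file's import statements for the third-party package prefixes. The following is a minimal sketch, assuming plain-text Java sources; the class ThirdPartyImportScanner and its method are hypothetical illustrations, not part of Metis, and a regular-expression scan will miss fully qualified names that are used without an import.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Metis): finds imports of listed third-party packages. */
class ThirdPartyImportScanner {

    // Package prefixes taken from the dependency list; extend as needed.
    static final String[] THIRD_PARTY_PREFIXES = {
        "com.iv.flash.",                  // JGenerator
        "com.sun.image.codec.jpeg.",      // Sun JPEG codec, required by JGenerator
        "org.apache.commons.httpclient."  // Apache HTTP client
    };

    // Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.m;" at line start.
    private static final Pattern IMPORT = Pattern.compile(
        "^\\s*import\\s+(?:static\\s+)?([\\w.*]+)\\s*;", Pattern.MULTILINE);

    /** Returns the imported names that start with one of the third-party prefixes. */
    static List<String> thirdPartyImports(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            String name = m.group(1);
            for (String prefix : THIRD_PARTY_PREFIXES) {
                if (name.startsWith(prefix)) {
                    found.add(name);
                    break; // one match per import is enough
                }
            }
        }
        return found;
    }
}
```

Run over every file in a source tree, a scan like this gives a rough measure of how much code would have to change to remove the JGenerator and Apache HTTP client dependencies.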

Metis also has dependencies on standard Java extensions, for which compatible GNU implementations should be available:

  • The org.ideahamster.metis.Metis class depends on the gnu.getopt.Getopt class.
  • Various packages depend on the javax.swing.text, javax.swing.text.html and javax.swing.text.html.parser packages, which may not be fully implemented in the GNU Classpath library.
  • Various packages depend on the javax.xml.parsers and javax.xml.transform packages.
  • Three classes in the faust.sacha.web.bot.spider.customlogin package depend on the org.w3c.dom and org.xml.sax packages.
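In a spider, the javax.swing.text.html.parser packages are typically used to pull links out of fetched pages. The sketch below shows that pattern using only JDK classes; it is not taken from the Metis source, the LinkExtractor name is hypothetical, and it collects only the href values of <a> tags without resolving relative URLs. Code of this kind is what would have to run unchanged on GNU Classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper (not from Metis): collects the href values of <a> tags. */
class LinkExtractor {

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();

        // The parser reports each tag to this callback as it reads the document.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) { // skip anchors such as <a name="...">
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // true: ignore any character set declared in the page's META tags
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader will not fail here; a network Reader could.
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

A working spider would resolve each value against the page's own URL, for example with java.net.URI.resolve, before adding it to the crawl queue.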

Initial review notes

Metis is much more extensive than the basic spidering tools Arachnid, Spindle and HouseSpider. It includes facilities for recovering some data from Flash movies and for authenticating requests (through the Apache HTTP client), and it delegates responsibilities to a range of supporting classes.

Regrettably, there is very little API documentation in the source, which makes the design of the system quite difficult to deduce, and Metis appears to be closely integrated with the Apache components.

Although Metis is more functionally sophisticated than the other spidering tools, it would be quite difficult to free it from its dependence on the Apache packages and develop a working tool. Unless the other candidates prove unworkable, Metis is not recommended for the MKSearch project, though it would probably be preferable to the basic spidering tools.


This document was last modified by Philip Shaw on 2004-11-04 08:39:43
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html