WebLech

WebLech is released under the open source MIT Licence. Its release version indicates a very early stage of development; the package has only 12 classes.

  • Several classes depend on the org.apache.log4j package, released under the Apache Software License version 2.0.

Initial review notes

WebLech uses a scheme similar to that of Arachnid and Spindle: a Runnable spider class and a supporting HTML parser. The parser uses an iterative tag-matching scheme to find attributes with URL content, rather than trying to capture the overall structure of the document. The state of a longer trawl can be saved to, and restored from, "checkpoint" files.
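The tag-matching approach described above can be illustrated with a minimal Java sketch. It is not WebLech's own parser: the class name TagLinkScanner, the method extractUrls and both regular expressions are assumptions made for illustration only. The sketch steps through opening tags one at a time and collects the values of href and src attributes, without building any document tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of iterative tag matching; not WebLech code. */
class TagLinkScanner {

    // One opening tag at a time: group 1 is the tag name, group 2 the raw attribute text.
    // Assumption: attribute values do not contain '>' (good enough for a sketch).
    private static final Pattern TAG =
        Pattern.compile("<\\s*([a-zA-Z][a-zA-Z0-9]*)\\b([^>]*)>");

    // href= or src= with a double-quoted, single-quoted or unquoted value.
    // The lookbehind skips names such as data-src.
    private static final Pattern URL_ATTR = Pattern.compile(
        "(?<![\\w-])(?:href|src)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
        Pattern.CASE_INSENSITIVE);

    /** Returns URL-valued attribute contents in document order. */
    static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {                          // iterate over tags
            Matcher attr = URL_ATTR.matcher(tag.group(2));
            while (attr.find()) {                     // iterate over URL attributes
                for (int g = 1; g <= 3; g++) {        // exactly one alternative matched
                    if (attr.group(g) != null) {
                        urls.add(attr.group(g));
                        break;
                    }
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"/docs/index.html\">docs</a> <img src='logo.png'></p>";
        System.out.println(extractUrls(html));
    }
}
```

A scanner of this kind tolerates badly nested markup because it never needs the document structure, but it can be misled by tag-like text inside comments or scripts, which a structural parser would skip.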

The WebLech spider can be configured using a plain text file and is multi-threaded by default. The Apache logging component is integrated with most of the key classes and would have to be replaced with an interface to a dynamically loaded logging system.
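One way such a substitution might look is sketched below in Java. Every name here (SpiderLogging, SpiderLog, NullLog, MemoryLog, load) is hypothetical and comes from neither WebLech nor log4j. The idea is that the spider classes refer only to a small interface, and the concrete implementation is loaded by class name at run time, so no logging library is needed at compile time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical logging facade; none of these names come from WebLech or log4j. */
class SpiderLogging {

    /** The only logging type the spider classes would reference. */
    interface SpiderLog {
        void info(String message);
        void error(String message, Throwable cause);
    }

    /** Fallback implementation that discards every message. */
    static final class NullLog implements SpiderLog {
        public void info(String message) { }
        public void error(String message, Throwable cause) { }
    }

    /** In-memory implementation, used here only to demonstrate loading. */
    static final class MemoryLog implements SpiderLog {
        final List<String> lines = new ArrayList<>();
        public void info(String message) { lines.add("INFO " + message); }
        public void error(String message, Throwable cause) {
            lines.add("ERROR " + message + ": " + cause);
        }
    }

    /**
     * Loads a SpiderLog implementation by class name. Falls back to NullLog
     * if the class is missing, cannot be instantiated, or is not a SpiderLog.
     */
    static SpiderLog load(String className) {
        if (className == null || className.isEmpty()) {
            return new NullLog();
        }
        try {
            Class<?> type = Class.forName(className);
            return (SpiderLog) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new NullLog();
        }
    }

    public static void main(String[] args) {
        // In practice the class name could come from the spider's configuration file.
        SpiderLog log = load(MemoryLog.class.getName());
        log.info("fetching http://example.org/");
        System.out.println(log.getClass().getSimpleName());
    }
}
```

An adapter that forwards to log4j could implement the same interface and be named in the configuration, which would keep the dependency optional rather than compiled into the key classes.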

The relatively early development stage of this project and its MIT licence make it a less attractive prospect for the MKSearch project.

This document was last modified by Philip Shaw on 2004-11-04 08:05:40
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html