Heritrix
Heritrix is the Internet Archive's Web crawler. It is released under the GPL, but depends on many other packages whose licence terms may not be compatible with it. In many cases a dependency only provides handling for a specific content type and may not be needed for (X)HTML-only retrieval.
- Classes in the org.archive.crawler.extractor package depend on packages in com.anotherbigidea.flash.*. These Flash parsing packages are released under a BSD License that is compatible with the GPL; see the JavaSWF2-BSD License.
- Classes in the org.archive.crawler.extractor package depend on packages in com.lowagie.text.pdf.*. This iText PDF parsing package is released under both the Library General Public License (see the master version) and the Mozilla Public License.
org.archive.util.GateSync
, depends on the classEDU.oswego.cs.dl.util.concurrent.Sync
. The package as a whole is released to the public domain, but theCopyOnWriteArrayList
andConcurrentReaderHashMap
classes are released under a special licence from Sun Microsystems, see the TECHNOLOGY LICENSE FROM SUN MICROSYSTEMS, INC. TO DOUG LEA (PDF). Dependency on these classes has not been established. - Various classes depend on Apache packages released under the Apache Software License version 2.0:
- The Command Line Interface (CLI) package,
org.apache.commons.cli
. - The Commons Collections package,
org.apache.commons.collections
. - The Commons HTTP Client package,
org.apache.commons.httpclient
. - The Commons Logging package,
org.apache.commons.logging
. - The Commons Net package,
org.apache.commons.net
. - The Commons Pool package,
org.apache.commons.pool
. - The Jakarta POI package,
org.apache.poi.hdf.extractor
.
- Classes in the org.archive.crawler package depend on classes in the org.mortbay.http and org.mortbay.jetty packages. These packages are released under the Apache Software License version 2.0 with special restrictions.
- Classes in various packages depend on the Java DNS package, org.xbill.DNS, released under the BSD License.
- JUnit tests depend on the junit.extensions and junit.framework packages; see secondary dependencies on JUnit below.
- Classes in org.archive.util and org.archive.datamodel depend on classes in the st.ata.util package, which does not appear to be maintained except by the Heritrix project. Its source code contains neither licence information nor a copyright statement.
Heritrix also depends on standard Java extensions that may not be fully implemented by the GNU Classpath extensions:
- Classes in many packages depend on the javax.management package.
- Classes in several packages depend on the javax.net and javax.net.ssl packages.
- Classes in the org.archive.crawler package depend on the javax.xml.parsers and javax.xml.transform packages, which should be compatible with GNU JAXP.
- Classes in the org.archive.settings package depend on classes in org.xml.sax, which should be compatible with GNU JAXP.
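Whether a given runtime (for example, one built on GNU Classpath) can actually supply the packages listed above can be checked by trying to load one representative class from each. The following Java sketch does this with Class.forName, without initializing the classes. The DependencyProbe class is hypothetical, and the probe class names are illustrative assumptions chosen from each package; only EDU.oswego.cs.dl.util.concurrent.Sync is named in the lists above. None of this is part of Heritrix itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Heritrix): reports whether representative
 * classes from the packages Heritrix depends on can be loaded in this runtime.
 */
final class DependencyProbe {

    // Illustrative probe classes, one per package; any class from the same
    // package would serve. Only the EDU.oswego Sync class is named in the text.
    static final String[] PROBES = {
        "EDU.oswego.cs.dl.util.concurrent.Sync",
        "com.lowagie.text.pdf.PdfReader",
        "org.apache.commons.httpclient.HttpClient",
        "org.mortbay.jetty.Server",
        "org.xbill.DNS.Lookup",
        "javax.management.MBeanServer",
        "javax.net.ssl.SSLContext",
        "javax.xml.parsers.DocumentBuilderFactory",
        "org.xml.sax.XMLReader",
    };

    /** Returns true if the named class can be located, without running its static initializers. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, DependencyProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is absent, or it is present but one of its
            // own dependencies (e.g. a superclass) cannot be resolved.
            return false;
        }
    }

    /** Maps each class name to its availability, preserving the input order. */
    static Map<String, Boolean> report(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isAvailable(name));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> e : report(PROBES).entrySet()) {
            System.out.println((e.getValue() ? "present  " : "MISSING  ") + e.getKey());
        }
    }
}
```

Run it with the Heritrix JARs on the classpath. A MISSING line only means the class could not be found; it does not tell you whether the features you use need that package, so for an (X)HTML-only crawl some missing content-type handlers may be acceptable.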
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html