Build notes for FC4

This document is a series of working notes about building MKSearch on Fedora Core 4, using the GNU Compiler for Java (GCJ) version 4.0. It may be removed or superseded.

Introduction

The MKSearch system was developed using GCJ versions 3.3.3 and 3.3.4, which did not include the Java API for XML Processing (JAXP). A pure-Java subset of the GNU JAXP classes was therefore imported and built separately as part of the project's (free) build scheme.

GCJ 4.0 now includes a later version of GNU JAXP, which raises some warnings and some abstract class compilation errors (classes that no longer implement every method of their interfaces) for MKSearch. These notes record the main changes, so that a workaround can be created that works with Fedora Core 4 and remains backward-compatible with FC3 and other platforms.

Classpath bugs

The version of Classpath included in GCJ 4.0 for Fedora Core 4 has a number of bugs in the HttpURLConnection class that affect the operation of the JSpider crawler module. The issues and their workarounds are noted below for reference.

HTTP header field key

When the header index passed to the getHeaderFieldKey method is out of range, the method throws a NoSuchElementException instead of returning null. This problem affects the JSpider classes HttpHeaderUtil and CookieUtil, which have been amended to work around the error.

See the GCJ Bugzilla entry for the HTTP header field key bug.
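The getHeaderFieldKey workaround can be sketched as a small wrapper that treats NoSuchElementException the same as the documented null return. This is a minimal sketch, not the actual JSpider change: HeaderKeys and its methods are hypothetical names, and the assumption that index 0 is the status line follows common HttpURLConnection behaviour rather than anything recorded in these notes.

```java
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper illustrating the getHeaderFieldKey workaround. */
final class HeaderKeys {

    private HeaderKeys() {
    }

    /**
     * Returns the key of the n-th header field, or null when n is out of range,
     * whether the runtime returns null (the documented behaviour) or throws
     * NoSuchElementException (the GCJ 4.0 Classpath behaviour on FC4).
     */
    static String headerFieldKey(HttpURLConnection conn, int n) {
        try {
            return conn.getHeaderFieldKey(n);
        } catch (NoSuchElementException e) {
            return null;
        }
    }

    /** Collects header field keys until the first null key. */
    static List<String> headerFieldKeys(HttpURLConnection conn) {
        List<String> keys = new ArrayList<String>();
        // Index 0 is assumed to be the status line, which has a null key in
        // common implementations, so the scan starts at index 1.
        for (int i = 1; ; i++) {
            String key = headerFieldKey(conn, i);
            if (key == null) {
                return keys;
            }
            keys.add(key);
        }
    }
}
```

Because the loop ends on the first null key, the same code terminates on both the buggy and the conforming runtime.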

This fix also brought to light an HTTP header map bug, in which the map of header fields includes the request protocol and status code as if they were a header entry.
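A defensive filter for the header map bug might look like the following. The exact shape of the spurious entry is not recorded in these notes, so this sketch assumes it appears either under a null key or under a key beginning with "HTTP/" (for example "HTTP/1.1 200 OK"); HeaderMapFilter is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: strips status-line entries from a header field map. */
final class HeaderMapFilter {

    private HeaderMapFilter() {
    }

    /** Returns a copy of headers without entries assumed to hold the status line. */
    static Map<String, List<String>> withoutStatusLine(Map<String, List<String>> headers) {
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String key = e.getKey();
            // Assumed shapes of the spurious entry: null key, or "HTTP/..." key.
            if (key == null || key.startsWith("HTTP/")) {
                continue;
            }
            result.put(key, e.getValue());
        }
        return result;
    }
}
```

In use, the argument would be the result of HttpURLConnection.getHeaderFields(); the copy preserves the original iteration order of the remaining entries.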

Content-Encoding: gzip

On sites that use gzip encoding, the input stream obtained from the HttpURLConnection is truncated before the content is complete. This problem affects the JSpider classes FetchRobotsTXTTaskImpl and SpiderHttpURLTask, which have been changed to send an Accept-Encoding header that precludes any alternative (compressed) encoding.

See the GCJ Bugzilla entry for the gzip encoding bug.
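The gzip workaround amounts to setting a request header before the connection is made. The exact header value used in the amended JSpider classes is not recorded here; this sketch assumes "identity", and IdentityEncoding is a hypothetical helper name.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: request an uncompressed (identity) response body. */
final class IdentityEncoding {

    private IdentityEncoding() {
    }

    /**
     * Opens, but does not yet connect, an HTTP connection whose request asks
     * the server not to apply any content coding such as gzip or deflate.
     */
    static HttpURLConnection openIdentity(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Listing only "identity" (with no "*") makes other content codings
        // unacceptable under the HTTP/1.1 Accept-Encoding rules.
        conn.setRequestProperty("Accept-Encoding", "identity");
        return conn;
    }
}
```

The header must be set before getInputStream() or connect() is called; after that, request properties can no longer be changed.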

It is not known whether this bug also affects deflate streams; if that can be tested, it may be possible to permit this form of compression.

JTidy compilation errors

JTidy and GNU JAXP compatibility

The latest JTidy CVS version (see below) is not compatible with the snapshot of GNU JAXP used for the MKSearch build. The project is switching to the GNU JAXP 1.3 release, which should solve the problem.

Abstract class compilation errors

These are critical compilation problems for JTidy: its DOM classes do not implement the current version of the W3C Document Object Model (DOM Level 3 Core).

Two main options exist: to force GCJ to use the earlier JAXP implementation, or to upgrade JTidy's JAXP implementation. As a first step, a CVS snapshot of JTidy was taken on 26 August 2005 to check JAXP compatibility.

org.w3c.tidy.DOMTextImpl

This class is declared to implement the interface org.w3c.dom.Text, but several (new) methods are not implemented:

  • isElementContentWhitespace()
  • getWholeText()
  • replaceWholeText(java.lang.String)
  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)

org.w3c.tidy.DOMProcessingInstructionImpl

This class is declared to implement the interface org.w3c.dom.ProcessingInstruction, but several (new) methods are not implemented:

  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)

org.w3c.tidy.DOMNodeImpl

This class is declared to implement the interface org.w3c.dom.Node, but several (new) methods are not implemented:

  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)
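The same twelve Node methods recur in every list in this section, which suggests they could be supplied once in a common base class, assuming (as appears to be the case in the JTidy source) that the other DOM classes extend DOMNodeImpl. The sketch below is a hypothetical abstract class, DOMNodeLevel3Stubs, that implements only these DOM Level 3 Core methods with minimal behaviour: unsupported operations throw a DOMException with NOT_SUPPORTED_ERR, the namespace lookups behave as if no namespace declarations are in scope, and user data is kept in a HashMap with the UserDataHandler ignored. It uses modern Java syntax and has not been checked against GCJ 4.0.

```java
import java.util.HashMap;
import java.util.Map;

import org.w3c.dom.DOMException;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

/**
 * Hypothetical base class supplying DOM Level 3 Core stubs. It stays abstract,
 * so the Level 1/2 Node methods still come from the subclass (for example the
 * existing DOMNodeImpl code).
 */
abstract class DOMNodeLevel3Stubs implements Node {

    // Backing store for setUserData/getUserData, created on first use.
    private Map<String, Object> userData;

    static DOMException notSupported(String method) {
        return new DOMException(DOMException.NOT_SUPPORTED_ERR,
                method + " is not supported by this DOM implementation");
    }

    public String getBaseURI() {
        return null; // DOM Level 3 permits null when no base URI is known
    }

    public short compareDocumentPosition(Node other) throws DOMException {
        throw notSupported("compareDocumentPosition");
    }

    public String getTextContent() throws DOMException {
        throw notSupported("getTextContent");
    }

    public void setTextContent(String textContent) throws DOMException {
        throw notSupported("setTextContent");
    }

    public boolean isSameNode(Node other) {
        return this == other; // identity comparison
    }

    // The three namespace lookups behave as if no declarations are in scope.
    public String lookupPrefix(String namespaceURI) {
        return null;
    }

    public boolean isDefaultNamespace(String namespaceURI) {
        return false;
    }

    public String lookupNamespaceURI(String prefix) {
        return null;
    }

    public boolean isEqualNode(Node arg) {
        throw notSupported("isEqualNode");
    }

    public Object getFeature(String feature, String version) {
        return null; // no specialized object for any feature
    }

    public Object setUserData(String key, Object data, UserDataHandler handler) {
        // The handler is ignored in this sketch; null data removes the key.
        if (userData == null) {
            userData = new HashMap<String, Object>();
        }
        return data == null ? userData.remove(key) : userData.put(key, data);
    }

    public Object getUserData(String key) {
        return userData == null ? null : userData.get(key);
    }
}
```

Type-specific additions such as getWholeText() or isId() would still need to be added to the individual subclasses.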

org.w3c.tidy.DOMElementImpl

This class is declared to implement the interface org.w3c.dom.Element, but several (new) methods are not implemented:

  • getSchemaTypeInfo()
  • setIdAttribute(java.lang.String,boolean)
  • setIdAttributeNS(java.lang.String,java.lang.String,boolean)
  • setIdAttributeNode(org.w3c.dom.Attr,boolean)
  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)

org.w3c.tidy.DOMDocumentTypeImpl

This class is declared to implement the interface org.w3c.dom.DocumentType, but several (new) methods are not implemented:

  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)
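
org.w3c.tidy.DOMDocumentImpl

This class is declared to implement the interface org.w3c.dom.Document, but several (new) methods are not implemented:

  • getInputEncoding()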
  • getXmlEncoding()
  • getXmlStandalone()
  • setXmlStandalone(boolean)
  • getXmlVersion()
  • setXmlVersion(java.lang.String)
  • getStrictErrorChecking()
  • setStrictErrorChecking(boolean)
  • getDocumentURI()
  • setDocumentURI(java.lang.String)
  • adoptNode(org.w3c.dom.Node)
  • getDomConfig()
  • normalizeDocument()
  • renameNode(org.w3c.dom.Node,java.lang.String,java.lang.String)
  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)

org.w3c.tidy.DOMCommentImpl

This class is declared to implement the interface org.w3c.dom.Comment, but several (new) methods are not implemented:

  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)

org.w3c.tidy.DOMCharacterDataImpl

This class is declared to implement the interface org.w3c.dom.CharacterData, but several (new) methods are not implemented:

  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)

org.w3c.tidy.DOMCDATASectionImpl

This class is declared to implement the interface org.w3c.dom.CDATASection, but several (new) methods are not implemented:

  • isElementContentWhitespace()
  • getWholeText()
  • replaceWholeText(java.lang.String)
  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)

org.w3c.tidy.DOMAttrImpl

This class is declared to implement the interface org.w3c.dom.Attr, but several (new) methods are not implemented:

  • getSchemaTypeInfo()
  • isId()
  • getBaseURI()
  • compareDocumentPosition(org.w3c.dom.Node)
  • getTextContent()
  • setTextContent(java.lang.String)
  • isSameNode(org.w3c.dom.Node)
  • lookupPrefix(java.lang.String)
  • isDefaultNamespace(java.lang.String)
  • lookupNamespaceURI(java.lang.String)
  • isEqualNode(org.w3c.dom.Node)
  • getFeature(java.lang.String,java.lang.String)
  • setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
  • getUserData(java.lang.String)
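Lists like the ones above can be regenerated with reflection, for example by loading a class compiled against the older DOM interfaces alongside the newer ones. The sketch below is a hypothetical tool, not part of the MKSearch build; Greeter and HalfGreeter are toy types standing in for an interface and a partial implementation.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical tool: lists interface methods that a class leaves unimplemented. */
final class MissingMethods {

    private MissingMethods() {
    }

    /** Returns sorted signatures in the style used in these notes, e.g. "isId()". */
    static List<String> missing(Class<?> impl, Class<?> iface) {
        List<String> result = new ArrayList<String>();
        for (Method m : iface.getMethods()) {
            try {
                Method found = impl.getMethod(m.getName(), m.getParameterTypes());
                // An abstract result means only the interface declaration was found.
                if (Modifier.isAbstract(found.getModifiers())) {
                    result.add(signature(m));
                }
            } catch (NoSuchMethodException e) {
                result.add(signature(m)); // impl does not declare or inherit it at all
            }
        }
        Collections.sort(result);
        return result;
    }

    private static String signature(Method m) {
        StringBuilder sb = new StringBuilder(m.getName()).append('(');
        Class<?>[] params = m.getParameterTypes();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // e.g. "java.lang.String" or "boolean"; array types print in JVM form
            sb.append(params[i].getName());
        }
        return sb.append(')').toString();
    }
}

// Toy types for illustration only.
interface Greeter {
    String hello(String name);
    String bye();
    void wave(int times);
}

abstract class HalfGreeter implements Greeter {
    public String hello(String name) {
        return "hi " + name;
    }
}
```

Here MissingMethods.missing(HalfGreeter.class, Greeter.class) reports bye() and wave(int). Applied to a JTidy class such as DOMAttrImpl with org.w3c.dom.Attr, it would be expected to report entries like isId(), though this has not been run against the CVS snapshot.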

JTidy source encoding

The org.w3c.tidy.Lexer class in the latest JTidy CVS source contains bytes that are not valid UTF-8, so GCJ must be invoked with an explicit Latin-1 (ISO-8859-1) source encoding. See $mk_home/bin/compile-jtidy.sh.
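To locate the offending bytes before choosing an encoding, a source file can be checked with a strict UTF-8 decoder. GCJ's option for declaring the source encoding is --encoding=NAME; the invocation actually used in compile-jtidy.sh is not reproduced here. The helper class Utf8Check below is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: finds the first byte sequence that is not valid UTF-8. */
final class Utf8Check {

    private Utf8Check() {
    }

    /** Returns the offset of the first malformed UTF-8 sequence, or -1 if none. */
    static int firstInvalidUtf8Offset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }
}
```

For example, passing the result of java.nio.file.Files.readAllBytes(path) for Lexer.java returns the byte offset of the first non-UTF-8 sequence, or -1 if the whole file decodes cleanly. A Latin-1 "é" (byte 0xE9) is a typical culprit.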

Deprecation warnings

Non-critical warnings raised when compiling the earlier version of GNU JAXP.

javax.xml.parsers.SAXParser

Multiple warnings that the following classes are deprecated:

  • org.xml.sax.HandlerBase
  • org.xml.sax.Parser
  • org.xml.sax.AttributeList
  • org.xml.sax.DocumentHandler

gnu.xml.aelfred2.JAXPFactory

Single warning that org.xml.sax.Parser has been deprecated.

gnu.xml.aelfred2.SAXDriver

Single warning that org.xml.sax.AttributeList has been deprecated.
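These warnings come from the SAX1 API, which SAX2 replaced: HandlerBase by org.xml.sax.helpers.DefaultHandler, Parser by XMLReader, AttributeList by Attributes, and DocumentHandler by ContentHandler. JAXP's SAXParser still declares SAX1 overloads for backward compatibility, which presumably explains why compiling it triggers the warnings. For code outside the JAXP implementation, a minimal SAX2-only sketch looks like this; Sax2ElementCounter is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: counts elements using only SAX2 types. */
final class Sax2ElementCounter {

    private Sax2ElementCounter() {
    }

    static int countElements(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        final int[] count = {0};
        // DefaultHandler replaces the deprecated SAX1 HandlerBase.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) { // Attributes replaces AttributeList
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }
}
```

Using the parse(InputSource, DefaultHandler) overload keeps calling code free of the deprecated types, even though SAXParser itself still references them.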

Document Links

HTTP header field key bug
The GCJ Bugzilla entry for this bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24084

HTTP header map bug
The GCJ Bugzilla entry for this bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24104

gzip encoding bug
The GCJ Bugzilla entry for this bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24124

This document was last modified on 2005-09-29 09:21:13.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html