Build notes for FC4
This document is a series of working notes about building MKSearch on Fedora Core 4, using the GNU Compiler for Java (GCJ) version 4.0. It may be removed or superseded.
Introduction
The MKSearch system was developed using GCJ versions 3.3.3 and 3.3.4, which did not include the Java API for XML Processing (JAXP). A pure-Java subset of the GNU JAXP classes was imported and built separately as part of the (free) build scheme for the project.
GCJ 4.0 now includes a later version of GNU JAXP, which raises some warnings and abstract class compilation failures for MKSearch. These notes record the main changes, so that a workaround can be created that works on Fedora Core 4 and remains backward-compatible with FC 3 and other platforms.
Classpath bugs
The version of Classpath included in GCJ 4.0 for Fedora Core 4 has a number of bugs associated with the HttpURLConnection class that affect operation of the JSpider crawler module. The issues and the workarounds are noted below for reference.
HTTP header field key
When the header index passed to the getHeaderFieldKey method is out of range, it throws a NoSuchElementException instead of returning null. This problem affects the JSpider classes HttpHeaderUtil and CookieUtil, which have been amended to work around the error.
See the GCJ Bugzilla entry for the HTTP header field key bug.
This fix also brought to light an HTTP header map bug, in which the request protocol and status code are included in the mapping.
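The header field key workaround can be pictured as a small wrapper that restores the documented "null when out of range" behaviour. This is only a sketch: the class name HeaderKeys and the method safeHeaderFieldKey are hypothetical and are not taken from the amended HttpHeaderUtil or CookieUtil classes.

```java
import java.net.URLConnection;
import java.util.NoSuchElementException;

/** Hypothetical helper for illustration; not the actual JSpider code. */
final class HeaderKeys {

    private HeaderKeys() {
    }

    /**
     * Returns the header field key at the given index, or null when the
     * index is out of range, whichever way the runtime reports that case.
     */
    static String safeHeaderFieldKey(URLConnection connection, int index) {
        try {
            return connection.getHeaderFieldKey(index);
        } catch (NoSuchElementException e) {
            // Classpath in GCJ 4.0 on FC4 throws here instead of returning null.
            return null;
        }
    }
}
```

A caller that loops over header indices until the key is null can then use the wrapper unchanged on FC 3, FC4 and other platforms.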
Content-Encoding: gzip
On sites that use gzip encoding, the input stream obtained from the HttpURLConnection is truncated before the content is complete. This problem affects the JSpider classes FetchRobotsTXTTaskImpl and SpiderHttpURLTask, which have been set to pass an Accept-Encoding header that precludes any alternative encoding.
See the GCJ Bugzilla entry for the gzip encoding bug.
It is not known whether this bug also affects deflate streams, but it may be possible to permit this form of compression if it can be tested.
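The Accept-Encoding workaround might look like the sketch below. The header value "identity" is an assumption (it is the HTTP/1.1 name for "no content coding"); these notes do not record the exact value used. The class and method names are also hypothetical.

```java
import java.net.URLConnection;

/** Hypothetical helper for illustration; not the actual JSpider code. */
final class PlainEncoding {

    private PlainEncoding() {
    }

    /**
     * Asks the server not to apply any content coding such as gzip.
     * Must be called before the connection is opened.
     */
    static void requestIdentityEncoding(URLConnection connection) {
        // Assumed value: "identity" means the response body is sent as-is.
        connection.setRequestProperty("Accept-Encoding", "identity");
    }
}
```

If deflate streams turn out to be unaffected, the value could be widened to allow them, but that would need testing first.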
JTidy compilation errors
JTidy and GNU JAXP compatibility
The latest JTidy CVS version (see below) is not compatible with the snapshot of GNU JAXP used for the MKSearch build. The project is switching to the GNU JAXP 1.3 release, which should solve the problem.
Abstract class compilation errors
These are critical compilation problems for JTidy, which does not implement the current version of the W3C Document Object Model (DOM).
Two main options exist: to force GCJ to use the earlier JAXP implementation, or to upgrade JTidy's JAXP implementation. Initially, a CVS snapshot of JTidy was taken (26 August 2005) to check JAXP compatibility.
org.w3c.tidy.DOMTextImpl
This class is declared to implement the interface org.w3c.dom.Text, but several (new) methods are not implemented:
- isElementContentWhitespace()
- getWholeText()
- replaceWholeText(java.lang.String)
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMProcessingInstructionImpl
This class is declared to implement the interface org.w3c.dom.ProcessingInstruction, but several (new) methods are not implemented:
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMNodeImpl
This class is declared to implement the interface org.w3c.dom.Node, but several (new) methods are not implemented:
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMElementImpl
This class is declared to implement the interface org.w3c.dom.Element, but several (new) methods are not implemented:
- getSchemaTypeInfo()
- setIdAttribute(java.lang.String,boolean)
- setIdAttributeNS(java.lang.String,java.lang.String,boolean)
- setIdAttributeNode(org.w3c.dom.Attr,boolean)
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMDocumentTypeImpl
This class is declared to implement the interface org.w3c.dom.DocumentType, but several (new) methods are not implemented:
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMDocumentImpl
This class is declared to implement the interface org.w3c.dom.Document, but several (new) methods are not implemented:
- getInputEncoding()
- getXmlEncoding()
- getXmlStandalone()
- setXmlStandalone(boolean)
- getXmlVersion()
- setXmlVersion(java.lang.String)
- getStrictErrorChecking()
- setStrictErrorChecking(boolean)
- getDocumentURI()
- setDocumentURI(java.lang.String)
- adoptNode(org.w3c.dom.Node)
- getDomConfig()
- normalizeDocument()
- renameNode(org.w3c.dom.Node,java.lang.String,java.lang.String)
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMCommentImpl
This class is declared to implement the interface org.w3c.dom.Comment, but several (new) methods are not implemented:
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMCharacterDataImpl
This class is declared to implement the interface org.w3c.dom.CharacterData, but several (new) methods are not implemented:
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMCDATASectionImpl
This class is declared to implement the interface org.w3c.dom.CDATASection, but several (new) methods are not implemented:
- isElementContentWhitespace()
- getWholeText()
- replaceWholeText(java.lang.String)
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
org.w3c.tidy.DOMAttrImpl
This class is declared to implement the interface org.w3c.dom.Attr, but several (new) methods are not implemented:
- getSchemaTypeInfo()
- isId()
- getBaseURI()
- compareDocumentPosition(org.w3c.dom.Node)
- getTextContent()
- setTextContent(java.lang.String)
- isSameNode(org.w3c.dom.Node)
- lookupPrefix(java.lang.String)
- isDefaultNamespace(java.lang.String)
- lookupNamespaceURI(java.lang.String)
- isEqualNode(org.w3c.dom.Node)
- getFeature(java.lang.String,java.lang.String)
- setUserData(java.lang.String,java.lang.Object,org.w3c.dom.UserDataHandler)
- getUserData(java.lang.String)
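Twelve org.w3c.dom.Node methods recur in every list above. If the option of upgrading JTidy were pursued, one possible approach is a shared base class that stubs those methods so the DOM classes compile against the newer interfaces. The sketch below is an assumption about how that could be done, not the fix adopted by JTidy or MKSearch; the class name Dom3NodeStubs is hypothetical, and the stubs do not provide real DOM Level 3 behaviour.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

/**
 * Hypothetical base class, not part of JTidy. It supplies the twelve
 * DOM Level 3 Node methods and leaves the older methods abstract.
 */
abstract class Dom3NodeStubs implements Node {

    private static DOMException unsupported(String method) {
        return new DOMException(DOMException.NOT_SUPPORTED_ERR,
                method + " is not supported");
    }

    // Queries with a harmless "nothing known" answer.
    public String getBaseURI() { return null; }
    public String lookupPrefix(String namespaceURI) { return null; }
    public boolean isDefaultNamespace(String namespaceURI) { return false; }
    public String lookupNamespaceURI(String prefix) { return null; }
    public Object getFeature(String feature, String version) { return null; }

    // Identity comparison only; structural equality is not attempted.
    public boolean isSameNode(Node other) { return this == other; }
    public boolean isEqualNode(Node other) { return this == other; }

    // Operations that cannot be faked safely report NOT_SUPPORTED_ERR.
    public short compareDocumentPosition(Node other) {
        throw unsupported("compareDocumentPosition");
    }
    public String getTextContent() {
        throw unsupported("getTextContent");
    }
    public void setTextContent(String textContent) {
        throw unsupported("setTextContent");
    }
    public Object setUserData(String key, Object data, UserDataHandler handler) {
        throw unsupported("setUserData");
    }
    public Object getUserData(String key) {
        throw unsupported("getUserData");
    }
}
```

The interface-specific methods, such as getWholeText() or getSchemaTypeInfo(), would still need to be added to the individual classes.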
JTidy source encoding
The org.w3c.tidy.Lexer class in the latest JTidy CVS source contains bytes that are not valid in the UTF-8 encoding. The GCJ compiler must be called with explicit Latin 1 encoding. See $mk_home/bin/compile-jtidy.sh.
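To see why the encoding has to be stated explicitly, the sketch below checks whether a byte sequence is valid UTF-8. It is illustrative only: the class EncodingCheck is hypothetical, it is not part of the MKSearch build, and it assumes a Java runtime newer than the GCJ versions discussed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the MKSearch build. */
final class EncodingCheck {

    private EncodingCheck() {
    }

    /** Returns true only if the bytes decode as UTF-8 without any error. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // At least one byte sequence is not legal UTF-8.
            return false;
        }
    }
}
```

A single Latin 1 accented character, such as the byte 0xE9, is enough to make a file fail this check, whereas every byte sequence is decodable as Latin 1.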
Deprecation warnings
Non-critical warnings are raised when compiling the earlier version of GNU JAXP.
javax.xml.parsers.SAXParser
Multiple warnings that the following classes are deprecated:
- org.xml.sax.HandlerBase
- org.xml.sax.Parser
- org.xml.sax.AttributeList
- org.xml.sax.DocumentHandler
gnu.xml.aelfred2.JAXPFactory
Single warning that org.xml.sax.Parser has been deprecated.
gnu.xml.aelfred2.SAXDriver
Single warning that org.xml.sax.AttributeList has been deprecated.
Document Links
- HTTP header field key bug
  The GCJ Bugzilla entry for this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24084
- HTTP header map bug
  The GCJ Bugzilla entry for this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24104
- gzip encoding bug
  The GCJ Bugzilla entry for this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24124
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html