Run the MKSearch indexer

The MKSearch system has been designed to work with the GNU Compiler for Java (GCJ). These notes explain how to index Web content with two of the default configuration sets provided with the project.

Environment settings

The MKSearch build and execution scripts use variable substitution to run from an arbitrary installation directory. It is assumed that the MKSearch source is installed in a single base directory and reflects the original structure of the project in the Subversion repository.

Before running the scripts, four environment variables must be set; see the instructions below.

GNU/Linux environment settings

You can set these variables in your .bash_profile script, for instance:

  export mk_build=/home/mksearch/build
  export mk_home=/home/mksearch
  export CLASSPATH=/usr/share/java/libgcj-3.4.1.jar

  • Substitute the actual path to your MKSearch installation for the mk_home variable.

  • The path for the temporary build directory may be outside the MKSearch home path.

  • Include the actual path of your core Java class repository in the CLASSPATH variable.

Exit your current session and log in again to apply the changes. To check the settings have been applied, use the env command piped through less:

$ env | less

Use the down arrow key to scroll down. You should see three lines that look like this:

mk_build=/home/mksearch/build
mk_home=/home/mksearch
CLASSPATH=/usr/share/java/libgcj-3.4.1.jar

Press Q to exit less.
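
As an alternative to scanning the env output by eye, the check can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name check_mk_env is hypothetical and is not part of the MKSearch scripts. It tests only the three variables shown in the example above.

```shell
# Hypothetical helper, not part of MKSearch: report which of the
# variables used in this guide are unset or empty.
check_mk_env() {
    missing=0
    for name in mk_build mk_home CLASSPATH; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        fi
    done
    # Exit status 0 when everything is set, 1 otherwise.
    return "$missing"
}

if check_mk_env; then
    echo "MKSearch environment looks complete"
fi
```

The function returns a non-zero status when a variable is missing, so it can also guard a wrapper script before the indexer is started.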

Java compatibility

The example commands below assume you have installed the JPackage compatibility package for GCJ and can call the java command as if the Sun JVM were installed. If not, an equivalent set of scripts is available in the $mk_home/bin directory with a gij- prefix.
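
The choice between the two script sets can be sketched as a small shell function. This is an illustration only: the function name jspider_prefix is hypothetical, the test for a java command on the PATH is a rough stand-in for "the compatibility package is installed", and the assumption that the gij- variant of java-jspider.sh is called gij-jspider.sh is inferred from the prefix rule above rather than confirmed.

```shell
# Hypothetical helper: choose the script prefix depending on whether
# a `java` command can be found on the PATH.
jspider_prefix() {
    if command -v java >/dev/null 2>&1; then
        echo "java-"
    else
        # Assumption: the gij- scripts mirror the java- script names.
        echo "gij-"
    fi
}

# Show which script would be used; $mk_home is printed literally here.
echo "script to run: \$mk_home/bin/$(jspider_prefix)jspider.sh"
```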

N-Triple index example

The MKSearch project includes a static test site that is used to check the correct operation of the indexer. For simplicity, the "triple" configuration indexes a set of Web pages and generates an N-Triple output file for each page on the local file system. The example below runs the MKSearch indexer on the test site using the triple configuration.

  $mk_home/bin/java-jspider.sh http://test.mksearch.mkdoc.org/ triple

This run generates a new directory structure at $mk_home/output/org.mkdoc.mksearch.test.
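
A quick way to confirm that the run produced output is to count the files under that directory. The sketch below assumes only that the indexer writes regular files somewhere beneath the output path; the function name count_output_files is hypothetical, and no particular file extension is assumed.

```shell
# Hypothetical helper: count regular files below a directory.
count_output_files() {
    # tr strips the padding that some wc implementations add.
    find "$1" -type f | wc -l | tr -d ' '
}

# Only report when the output directory actually exists.
out_dir="${mk_home:-}/output/org.mkdoc.mksearch.test"
if [ -d "$out_dir" ]; then
    echo "files written: $(count_output_files "$out_dir")"
fi
```

Since the triple configuration writes one N-Triple file per indexed page, the count should be roughly in line with the number of pages on the test site.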

Sesame RDF repository example

After basic operation of the indexer has been confirmed using the N-Triple configuration, you can run the "rdfstore" configuration to build a file-based Sesame repository of the test site metadata. The Sesame repository is stored as a single XML-serialised RDF file on the file system.

$mk_home/bin/java-jspider.sh http://test.mksearch.mkdoc.org/ rdfstore

This run will generate a single XML/RDF file at $mk_home/output/com.mkdoc.jspider.XhtmlStoreWriterPlugin.rdf

Indexing performance

At the time of writing, the MKSearch static test site had 210 test pages and these index configurations were set to use a single thread with a throttle of 500 milliseconds between requests. A complete run takes between 5 and 20 minutes, depending on other applications that may be running and general network traffic.
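
The throttle alone sets a floor on the run time, which can be worked out with shell arithmetic. This sketch assumes one throttle delay per page on a single thread; the function name throttle_floor_seconds is hypothetical.

```shell
# Lower bound on run time from the request throttle alone:
# pages x delay in milliseconds, converted to whole seconds.
throttle_floor_seconds() {
    pages=$1
    delay_ms=$2
    echo $(( pages * delay_ms / 1000 ))
}

throttle_floor_seconds 210 500   # prints 105
```

With the figures above, the throttle accounts for about 105 seconds, so most of a 5 to 20 minute run is spent fetching and processing pages rather than waiting between requests.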


This document was last modified by Philip Shaw on 2005-03-30 08:07:30
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html