Initial review notes
Grunk turns out to be more of a content parser than a web spider (crawler) per se. It analyses the structure of source data and applies a suitable parsing tool to the content.
Grunk uses layered sets of Importer, Scanner and Preprocessor components to identify an appropriate parsing scheme for a source and then apply it. As a result the system has a large number of classes, and it is biased towards plain-text formats rather than HTML or XML. Grunk appears able to handle extremely large input sources.
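These notes do not say how the three kinds of component divide the work. One plausible reading is a detect-then-dispatch pipeline, sketched below in Python. A scanner recognises a format, a preprocessor normalises the source, and an importer ties the two to a parser. The class names, fields, the `import_source` function and the role given to each component are all hypothetical illustrations, not Grunk's actual API. The sketch only shows why one importer per supported format quickly multiplies the class count.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins for the three component roles; Grunk's real
# classes and interfaces are not described in these notes.


@dataclass
class Scanner:
    """Inspects raw source text and reports whether it recognises the format."""
    name: str
    matches: Callable[[str], bool]


@dataclass
class Preprocessor:
    """Normalises the source before parsing (here: any str -> str function)."""
    transform: Callable[[str], str]


@dataclass
class Importer:
    """Pairs a scanner and a preprocessor with the parser for one format."""
    scanner: Scanner
    preprocessor: Preprocessor
    parse: Callable[[str], list]


def import_source(source: str, importers: list[Importer]) -> Optional[list]:
    # Layer 1: pick the first importer whose scanner recognises the source.
    for importer in importers:
        if importer.scanner.matches(source):
            # Layer 2: preprocess, then apply the chosen parsing scheme.
            cleaned = importer.preprocessor.transform(source)
            return importer.parse(cleaned)
    return None  # no importer recognised this source


# Example importers (hypothetical): comma-separated text, then a catch-all.
csv_like = Importer(
    Scanner("csv", lambda s: "," in s.splitlines()[0] if s else False),
    Preprocessor(str.strip),
    lambda s: [line.split(",") for line in s.splitlines()],
)
plain = Importer(
    Scanner("text", lambda s: True),
    Preprocessor(str.strip),
    lambda s: s.split(),
)

print(import_source("a,b\nc,d\n", [csv_like, plain]))  # [['a', 'b'], ['c', 'd']]
print(import_source("hello world", [csv_like, plain]))  # ['hello', 'world']
```

Because the first matching scanner wins, the order of the importer list matters. A catch-all plain-text importer has to come last, or it would claim every source.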