XML Indexer
Highlights
XMLIndexer is a command-line Java tool. Some highlights:
- Java-based: platform independent.
- Creates an index as an XML file from arbitrary XML files.
- Batch conversion of multiple files is possible.
- The format of the output index is configurable using XSLT.
- Exclusion files can be defined that list words which must not appear in the index.
Installation
Binary Distribution
Installation is simple: unzip the archive into an arbitrary directory. The jar file contains all necessary classes, including the XML libraries. A Java 2 virtual machine is required, so please install an appropriate version, e.g. the J2RE from Sun Microsystems.
Start XMLIndexer with the following command from the directory where you unpacked the xmlindexer.jar file:
java -cp xmlindexer.jar info.schatten.xmlindexer.XmlIndexer config-filename.xml
Source Distribution
The source distribution contains all Java sources and JBuilder project files as well as Apache Ant build files, so you are free to choose your development tools. Unzip the archive into an arbitrary folder.
The source distribution does not contain the XML libraries, so you also have to download and install the JDOM libraries.
Downloads
XMLIndexer is available in two forms: a binary distribution and a source distribution.
How to Modify Config File
Steps...
To build indices, you have to write an XML config file that defines all parameters for index generation. The generation is performed in two steps:
- XMLIndexer builds the index (in memory) and generates an XML representation of the result.
- This result can be written into an XML file as is (standard output) or can be transformed with an XSL stylesheet.
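The second step can be sketched in a few lines of Java. This is only an illustration, not XMLIndexer's own code: it uses the JDK's standard javax.xml.transform API, and the class name OutputStepSketch, the sample index XML and the sample stylesheet are all hypothetical (the real default output format is not described here).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputStepSketch {

    // Hypothetical index XML; the real default output may look different.
    static final String SAMPLE_INDEX =
        "<index><entry word='xml'><ref>ch1</ref></entry></index>";

    // Hypothetical stylesheet that turns each entry into a "word: ref" text line.
    static final String SAMPLE_STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='index/entry'>"
        + "<xsl:value-of select='@word'/>: <xsl:value-of select='ref'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Returns the index XML unchanged when no stylesheet is given,
    // otherwise the result of applying the stylesheet to it.
    static String render(String indexXml, String stylesheet) {
        if (stylesheet == null || stylesheet.isEmpty()) {
            return indexXml;
        }
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(indexXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(SAMPLE_INDEX, null));              // untransformed XML
        System.out.println(render(SAMPLE_INDEX, SAMPLE_STYLESHEET)); // transformed text
    }
}
```

Passing null as the stylesheet corresponds to writing the untransformed result; passing a stylesheet corresponds to the customized output.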
Config File: Main Tags
The root tag is XMLINDEXER, so the config file has to start with this tag and end with its closing tag. Inside the root tag there are three main tags:
| ENCODING | Define the XML encoding of the output index here. |
| EXCLUSION | This tag holds the (optional) filenames of exclusion files. These files are simple text files holding the words to exclude from the index, one word per line. Two files are included in the distribution, "deutsch.txt" and "english.txt"; they hold the sets of German and English words I use for exclusion. |
| INDICES | This is the main section: inside this section you define the indices to be generated. |
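To illustrate the exclusion file format (one word per line), here is a minimal Java sketch. It is not taken from the XMLIndexer sources: the class ExclusionSketch and its methods are hypothetical, and the case-insensitive comparison is an assumption, since the matching rules are not stated here.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ExclusionSketch {

    // Reads an exclusion list: one word per line, blank lines are ignored.
    // Words are lower-cased (assumption: matching is case-insensitive).
    static Set<String> loadExclusions(BufferedReader reader) {
        Set<String> words = new HashSet<>();
        reader.lines()
            .map(line -> line.trim().toLowerCase(Locale.ROOT))
            .filter(word -> !word.isEmpty())
            .forEach(words::add);
        return words;
    }

    // Keeps only the candidate words that are not in the exclusion set.
    static List<String> filter(List<String> candidates, Set<String> exclusions) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (!exclusions.contains(candidate.toLowerCase(Locale.ROOT))) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Inline stand-in for the contents of a file such as "english.txt".
        BufferedReader reader = new BufferedReader(new StringReader("the\nand\n\nof\n"));
        Set<String> exclusions = loadExclusions(reader);
        // Prints [index, words]
        System.out.println(filter(List.of("The", "index", "of", "words"), exclusions));
    }
}
```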
Config File: Define Index Generation: inside INDICES
Inside the INDICES tag you write the definitions of all indices to be generated. Each index definition goes inside an Index tag, in which you can define the following parameters:
| xmlSourceFilename | Filename of the XML source file from which the index is generated. |
| indexTag | This is the tag in the XML source file that is indexed. Text inside this tag, including text inside its child tags, is indexed. |
| referenceTag | This is also a tag in the XML source file. It must be a parent tag of indexTag. One attribute of this tag is used as the reference for the index words generated from indexTag. |
| referenceAttribute | This is an attribute of the referenceTag. Its value is used as the reference for the index words. |
| xmlOutputFilename | This is the filename of the XML index output file. |
| xslOutputFilename | All tags above are required; this tag is optional. If it is left empty, a default output is generated. If you enter the filename of an XSL file here, the default output is transformed into an arbitrary output using Xalan and your XSLT script. |
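The relationship between referenceTag, referenceAttribute and indexTag may be easier to see in code. The following Java sketch is an assumption-laden illustration using the JDK's DOM parser, not XMLIndexer's actual implementation (which builds on JDOM). The class IndexSketch, the sample document, the lower-casing and the word-splitting rule are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class IndexSketch {

    // Hypothetical source document: "chapter" plays the role of referenceTag,
    // "id" of referenceAttribute and "para" of indexTag.
    static final String SAMPLE =
        "<book>"
        + "<chapter id='c1'><para>XML <b>index</b> tools</para></chapter>"
        + "<chapter id='c2'><para>Index files</para></chapter>"
        + "</book>";

    // Maps each word found inside an indexTag element to the set of
    // referenceAttribute values of the enclosing referenceTag elements.
    static Map<String, Set<String>> buildIndex(
            String xml, String referenceTag, String referenceAttribute, String indexTag) {
        Map<String, Set<String>> index = new TreeMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList refs = doc.getElementsByTagName(referenceTag);
            for (int i = 0; i < refs.getLength(); i++) {
                Element ref = (Element) refs.item(i);
                String reference = ref.getAttribute(referenceAttribute);
                NodeList indexed = ref.getElementsByTagName(indexTag);
                for (int j = 0; j < indexed.getLength(); j++) {
                    // getTextContent() also includes the text of child elements.
                    String text = indexed.item(j).getTextContent().toLowerCase(Locale.ROOT);
                    // Assumption: words are runs of letters and digits.
                    for (String word : text.split("[^\\p{L}\\p{N}]+")) {
                        if (!word.isEmpty()) {
                            index.computeIfAbsent(word, k -> new TreeSet<>()).add(reference);
                        }
                    }
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML source", e);
        }
        return index;
    }

    public static void main(String[] args) {
        // Prints {files=[c2], index=[c1, c2], tools=[c1], xml=[c1]}
        System.out.println(buildIndex(SAMPLE, "chapter", "id", "para"));
    }
}
```

In this sketch the word "index" appears in both chapters, so it is listed with both references, including the occurrence nested inside the child tag.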
The principles are simple: please take a look at the example testindex.xml file and, if you are interested, also at the xsl directory, where you can find an XSLT example showing how to customize the output. Simply try it!
Contact
If you have bug reports or suggestions, or if you make significant further developments to XMLIndexer, please send me an email.