BPLEX - An Algorithm for Tree Compression



BPLEX
is an algorithm for generating small pointer representations for (ordered, ranked) trees.
The idea of the algorithm is to detect repetitions of tree patterns (connected subgraphs of a tree)
and to replace them by appropriate pointers to one single instance of the pattern.
In this way a smaller representation of the tree is obtained.

BPLEX can generate a small memory representation of the tree structure of an XML document.
Here you find an implementation of BPLEX which was intended to test the performance of
BPLEX on tree structures of XML documents.

The implementation is in a preliminary state. It is written in C, using the expat XML parser.
Presently it supports the following tasks:

Load an XML document and represent its tree structure as

The program generates one of the above structures, presents information about its size,
and then writes the structure to disk in two different formats (one human-readable
text file, and one binary file for further processing).

Here is a technical report about BPLEX, and here a shorter, more recent version.



In the future we would like to use the BPLEX in order to get a complete picture on the
performance of XML memory mappings.