The reverseXSL software is written in the Java language and provides both a command-line interface and an Application Programming Interface (API).
It is articulated around two packages:
- A parser package, containing an original Parser component capable of transforming any character-based structure into XML;
- A transformer package, with the usual TransformerFactory and Transformer classes, that combine the functions in JAXP (Java API for XML Processing, notably containing XSLT) with those from the reverseXSL Parser such as to extend the transformation capabilities natively found in Java runtimes.
As illustrated on the figure, the java.util.regex and javax.xml.transform packages natively contained in the Java Runtime Environment will be much solicited.
The command line programs are:
- RegexCheck, a small application with a graphical interface (Swing) for the interactive design and test of regular expressions
- Parse, a command line interface over the reverseXSL Parser API
- Transform, a command line interface over the reverseXSL Transformer API
The main flow of execution through the reverseXSL Transformer class is illustrated in the second figure. It is self explanatory.
The TransformerFactory provides means to enforce a specific transformation profile comprising none, one or both of a Parsing step, followed by an XSL Transformation step. But by default, the TransformerFactory will load a Mapping Decision Table and dynamically match input messages against an adequate transformation profile.
1. A TransformerFactory is first used to set transformation parameters and indicate the source of meta-data. The classloader will be sollicited by default for loading transformation meta-data. The TransformerFactory is then used to instantiate one (or several) Transformer (in parallel threads).
2. The Transformer is invoked with an input byte stream, and returns transformed results as an output byte stream along with a count of parsing errors. Exceptions are thrown in case of major transformation failures.
Parser features
In fact, the Parser is capable of tolerance to syntaxical errors, as well as skipping bad data in the input message. The thresholds of tolerance are set via Parsing parameters related to Warning/Fatal and Throw/Record exception handling codes in message DEFinition files. In case thresholds (which could also be set to zero) are exceeded, the Parser, and then the Transformer, throws an exception.
On the other hand, the XSL Transformation step based on standard JAXP libraries does not offer the same tolerance mechanisms and the first XSL execution problem will immediately yield a Transformer exception.
The functional parsing power of the reverseXSL Parser matches that of regular expressions which are used in succession to conduct four different tasks:
(i) 'identify' structures and/or data
(s) 'segment' the message segments/structures into sub-segments/structures. This task is enriched with numerous built-in functions that greatly simplify and speed up the most common data cutting schemes (record delimiters, new lines, repeated separator character, fixed-size records, etc.)
(e) 'extract' values, thus separating true data from the message syntax.
(v) 'validate' the syntax of data values themselves (alpha, digits, structured...). This task is too enriched with numerous built-in functions that simplify the usual data validation against character sets.
Additional tips about Parser operations can be found within the article "The end of XML-to-anything asymmetry" and of course in the javadoc and Software Manuals.
As explained in the overview of Message DEFinition files, the Parser makes use of only five elements (Groups, Segments, Data, Marks and Conditions) to describe in detail the entire syntax of an arbitrary message format.
The Parser can check all forms of interdependencies:
- the presence of structures dependent from values
- structures dependent from other structures
- values dependent from other values
The Parser can expand or collapse nesting levels with regard to the original message structure. XML element tag names can be arbitrarily choosen for every element. Elements are easily promoted to become attribute nodes instead of child nodes by assigning a name starting with '@'.
The Parser can be instructed to ignore any part of the source message and pass it over as RAW Character-Data elements, or suppress it entirely from the XML output. The latter notably permits the incremental development of new Parsing DEFinitions.
The Parser always verifies the 'standard' min/max counts of repeated structures (and can record errors accordingly) and at the same time can be instructed to overload the 'official' values with variant optional/mandatory constraints. It can also accept more repetitions than formally specified. These features support behaviors like: "we processed your 1563 records successfully but actually the limit is at 1000!", "we got your order right but miss a delivery point address".
The Parser can too evaluate conditions based on the presence of structures or of certain values in the message, and generate original XML data elements that explicitly mark the outcome (these are MARKs!). This mechanism is useful in transforming structural information to values, translating coded data element values, and providing explicit default values on the XML side.
However, the Parser does not verify numerical ranges (e.g. ID > 5 and ID < 50), does not compute or verify totals (e.g. checking a sum of quantities) does not validate date values (e.g. 30/2/2009) but can validate the format, and does not reorder data elements in the message. All these tasks are very easily implemented in the XSL transformation step that can automatically follow the parsing step in proper, according to the transformation profile selected in the Mapping Decision Table.