com.reverseXSL.message
Class Data

java.lang.Object
  extended by com.reverseXSL.message.Data

public class Data
extends java.lang.Object

Wraps byte-oriented collections or structures into an object enriched with numerous methods capable of normalizing interchange data.

The original data piece (e.g. a ByteBuffer) is only wrapped, neither cloned nor copied. Therefore any later change to the argument data piece may jeopardize operations.
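The wrap-without-copy hazard described above can be reproduced with the JDK alone. The following is a minimal sketch that uses java.nio.ByteBuffer directly rather than the Data class; the class name WrapAliasingSketch and the helper decode are illustrative names, not part of reverseXSL.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrates wrap-without-copy semantics using only the JDK (not reverseXSL itself).
final class WrapAliasingSketch {

    // Decodes the whole buffer content as UTF-8 without disturbing the buffer's own position.
    static String decode(ByteBuffer bb) {
        return StandardCharsets.UTF_8.decode(bb.duplicate()).toString();
    }

    public static void main(String[] args) {
        byte[] raw = "ABC".getBytes(StandardCharsets.UTF_8);
        ByteBuffer wrapped = ByteBuffer.wrap(raw);         // shares storage with raw
        ByteBuffer copied = ByteBuffer.wrap(raw.clone());  // defensive copy

        raw[0] = (byte) 'X';                               // later change to the source array

        System.out.println(decode(wrapped)); // the wrapper sees the change
        System.out.println(decode(copied));  // the copy does not
    }
}
```

Callers that cannot guarantee the source array or buffer stays untouched may therefore prefer to hand over a clone.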

Raw data received via some communication channel or read from various media is often affected by additional control characters, spurious record delimiters or other 'pollution' of the canonical formats required for fully automated processing. This Data class is used to wrap such raw data and yield clean, de-polluted, streams of bytes or characters for message processing.

NOTE: full use of this class is for future functional extensions of the reverseXSL software.


Field Summary
static int _1NewLineAtEnd
          arg for getConvertedData(int) : suppress trailing empty lines and ensure that the very last data line bears a single line terminator.
static int _ASCII7bits
          arg for getConvertedData(int) : control characters (value<32) are discarded except for tabs, carriage returns and line feeds.
static int _NoBlankLine
          arg for getConvertedData(int) : suppress blank lines everywhere in the original data.
static int _NoCRLFBytes
          arg for getConvertedData(int) : suppress all CR's and LF's.
static int _NoCtrlBytes
          arg for getConvertedData(int) : control BYTES (value<32) are discarded except for tabs, carriage returns and line feeds.
static int _NoCtrlChars
          arg for getConvertedData(int) : control characters (value<32) are discarded except for tabs, carriage returns and line feeds.
static int _NONE
          arg for getConvertedData(int) : case of no conversion requested.
static int _ToCRLF
          arg for getConvertedData(int) : convert standalone LF's to CRLF's, and preserve existing CRLF's.
static int _ToLF
          arg for getConvertedData(int) : remove CR's.
static int _ToUPPER
          arg for getConvertedData(int) : convert all characters to their uppercase equivalents (based on built-in java String methods).
static int _TrimNBSP
          arg for getConvertedData(int) : Trim Non-Breaking SPaces (i.e. space chars and tabs) at the beginning and end of each line in the message body.
static int _UnfoldPSCRMRemarks
          arg for getConvertedData(int) : IATA PSCRM messages still generated by older systems can enforce the 69 chars limit of the even older TELEX transmission system by cutting lines in the middle of remarks elements; this flag restores the canonical long line.
 
Constructor Summary
Data(byte[] ba)
          Instantiate a Data object from a byte array, assuming UTF-8 as charset for character oriented operations on this data.
Data(byte[] ba, java.nio.charset.Charset cs)
          Instantiate a Data object from a byte array, with the explicit charset that must be assumed for character oriented operations on this data.
Data(java.nio.ByteBuffer bb)
          Instantiate a Data object from a byte buffer, assuming UTF-8 as charset for character oriented operations on this data.
Data(java.nio.ByteBuffer bb, java.nio.charset.Charset cs)
          Instantiate a Data object from a byte buffer, with the explicit charset that must be assumed for character oriented operations on this data.
Data(java.io.InputStream inS, java.nio.charset.Charset cs)
          Instantiate a Data object from an input stream, with the explicit charset that must be assumed for character oriented operations on this data.
 
Method Summary
 byte[] getArray()
          Get the backing byte array.
 byte[] getBytes()
           
 java.lang.StringBuffer getConvertedData(int conversions)
          Converts the Data bytes to characters while filtering and normalizing the data.
 DataFormat getFormat()
          Get the data format type.
 DataFormat identify()
          Inspect data and set the data format type.
static DataFormat identify(java.lang.String msg)
          Attempts an identification of the data format based on a short string.
 int length()
          Get the actual data length.
static int tokenValue(java.lang.String opt)
          Utility method to convert named conversion tokens into the corresponding conversion token value.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_1NewLineAtEnd

public static final int _1NewLineAtEnd
arg for getConvertedData(int) : suppress trailing empty lines and ensure that the very last data line bears a single line terminator. The line terminator is either CR or CRLF according to other conversions or the existing data contents (by default).

See Also:
Constant Field Values

_ASCII7bits

public static final int _ASCII7bits
arg for getConvertedData(int) : control characters (value<32) are discarded except for tabs, carriage returns and line feeds. Character values above 127 are replaced by a '?'.

Note that this method operates on characters, not bytes, and thus also properly replaces all multibyte characters (whose Unicode values are always >127) with a single '?'.
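A character-level filter of this kind can be sketched as follows. This is not the reverseXSL implementation, only an illustration of the described behaviour under the assumption that the data is UTF-8; Ascii7BitsSketch and toAscii7 are hypothetical names. Because the bytes are decoded before filtering, the two-byte UTF-8 sequence for 'é' yields a single '?'. A limitation of this sketch: characters outside the Basic Multilingual Plane occupy two Java chars and would yield two '?'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a 7-bit ASCII filter working on characters, not bytes.
final class Ascii7BitsSketch {

    // Drops control chars except TAB, LF and CR; replaces chars above 127 with '?'.
    static String toAscii7(byte[] data) {
        String decoded = new String(data, StandardCharsets.UTF_8); // decode first, then filter
        StringBuilder out = new StringBuilder(decoded.length());
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            if (c < 32) {
                if (c == '\t' || c == '\n' || c == '\r') {
                    out.append(c);
                }
            } else if (c > 127) {
                out.append('?');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "café", a BEL control character, then " ok" and a line feed
        byte[] raw = "caf\u00e9\u0007 ok\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(toAscii7(raw));
    }
}
```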

See Also:
Constant Field Values

_NoBlankLine

public static final int _NoBlankLine
arg for getConvertedData(int) : suppress blank lines everywhere in the original data. Precisely, both the true empty lines and those containing only spaces or tabs are removed.

This is a byte-oriented method, applied before decoding bytes into characters!

See Also:
Constant Field Values

_NoCRLFBytes

public static final int _NoCRLFBytes
arg for getConvertedData(int) : suppress all CR's and LF's. In other words, consider the original data as a very long line.

This is a byte-oriented method, applied before decoding bytes into characters!

It is most useful when added to _NoCtrlBytes, in which case tabs are the only control characters (value<32) that are preserved.

See Also:
Constant Field Values

_NoCtrlBytes

public static final int _NoCtrlBytes
arg for getConvertedData(int) : control BYTES (value<32) are discarded except for tabs, carriage returns and line feeds. 8-bit values (above 127) are preserved.

This is a byte-oriented method, applied before decoding bytes into characters!

Compared with _NoCtrlChars, the supporting function operates on bytes and not characters, and thus may discard bytes actually belonging to multibyte character encodings, thus scrambling the original data!

However, the function is particularly useful whenever legacy single-byte character encodings are expected (e.g. ISO-8859) and must be de-polluted.
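The difference between safe and unsafe use of a byte-level filter can be sketched with the JDK alone. The code below is an illustration of the described rule, not the library's code; NoCtrlBytesSketch and stripControlBytes are hypothetical names. It is harmless on ISO-8859-1, but on UTF-16BE the leading 0x00 byte of every ASCII character is a "control byte" and is discarded. UTF-8 is not affected in this way, since every byte of a UTF-8 multibyte sequence has a value of 128 or more.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a control-byte filter applied before any decoding.
final class NoCtrlBytesSketch {

    // Drops byte values below 32 except TAB (9), LF (10) and CR (13); keeps values above 127.
    static byte[] stripControlBytes(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (byte b : data) {
            int v = b & 0xFF; // Java bytes are signed; mask to obtain 0..255
            if (v >= 32 || v == 9 || v == 10 || v == 13) {
                out.write(v);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // ISO-8859-1: one byte per character, so the filter only removes the BEL byte.
        byte[] latin1 = "caf\u00e9\u0007\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.print(new String(stripControlBytes(latin1), StandardCharsets.ISO_8859_1));

        // UTF-16BE: 'A' is encoded as 0x00 0x41; the 0x00 bytes are discarded.
        byte[] utf16 = "AB".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(utf16.length + " bytes in, " + stripControlBytes(utf16).length + " bytes out");
    }
}
```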

See Also:
Constant Field Values

_NoCtrlChars

public static final int _NoCtrlChars
arg for getConvertedData(int) : control characters (value<32) are discarded except for tabs, carriage returns and line feeds. All other character values are preserved.

Compared with _NoCtrlBytes, the supporting function preserves character values that would be encoded as 8-bit values in ISO-8859, and all multibyte characters in UTF-16 Unicode Transformation Formats.

See Also:
Constant Field Values

_NONE

public static final int _NONE
arg for getConvertedData(int) : case of no conversion requested.

See Also:
Constant Field Values

_ToCRLF

public static final int _ToCRLF
arg for getConvertedData(int) : convert standalone LF's to CRLF's, and preserve existing CRLF's.

See Also:
Constant Field Values

_ToLF

public static final int _ToLF
arg for getConvertedData(int) : remove CR's.
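The two line-terminator conversions, _ToCRLF and _ToLF, can be sketched on plain strings as follows. This only mirrors the behaviour stated in the field descriptions; LineEndingSketch, toCrLf and toLf are hypothetical names and the regular expression is an assumption about one possible implementation.

```java
// Hypothetical sketch of the two line-terminator normalizations.
final class LineEndingSketch {

    // Turns every LF not already preceded by CR into CRLF; existing CRLF pairs are untouched.
    static String toCrLf(String s) {
        return s.replaceAll("(?<!\r)\n", "\r\n");
    }

    // Removes every CR, which turns each CRLF into a plain LF.
    static String toLf(String s) {
        return s.replace("\r", "");
    }

    public static void main(String[] args) {
        String mixed = "a\nb\r\nc\n";
        System.out.println(toCrLf(mixed).length()); // each standalone LF gained one CR
        System.out.println(toLf(mixed).length());   // the single CR was removed
    }
}
```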

See Also:
Constant Field Values

_ToUPPER

public static final int _ToUPPER
arg for getConvertedData(int) : convert all characters to their uppercase equivalents (based on built-in java String methods).

See Also:
Constant Field Values

_TrimNBSP

public static final int _TrimNBSP
arg for getConvertedData(int) : Trim Non-Breaking SPaces (i.e. space chars and tabs) at the beginning and end of each line in the message body.

Note that if you combine this trim operation with _NoCRLFBytes, only the NBSP at the very start and very end of the entire data are trimmed, because _NoCRLFBytes first transforms the whole data into a single big line!
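The ordering effect in the note above can be made concrete with a small sketch. It is not the library's code; TrimOrderSketch, trimEachLine and dropCrLf are hypothetical stand-ins for the _TrimNBSP and _NoCRLFBytes behaviours, operating on strings for brevity.

```java
// Hypothetical sketch showing why removing CR/LF first changes what per-line trimming does.
final class TrimOrderSketch {

    // Trims spaces and tabs at both ends of every line, keeping the line terminators.
    static String trimEachLine(String s) {
        return s.replaceAll("(?m)^[ \\t]+|[ \\t]+$", "");
    }

    // Removes all CR and LF characters, leaving one long line.
    static String dropCrLf(String s) {
        return s.replaceAll("[\\r\\n]", "");
    }

    public static void main(String[] args) {
        String data = " a \n b ";
        // Trimming alone works line by line.
        System.out.println("[" + trimEachLine(data) + "]");
        // Removing CR/LF first leaves the inner spaces untouched by the trim.
        System.out.println("[" + trimEachLine(dropCrLf(data)) + "]");
    }
}
```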

See Also:
Constant Field Values

_UnfoldPSCRMRemarks

public static final int _UnfoldPSCRMRemarks
arg for getConvertedData(int) : IATA PSCRM messages still generated by older systems can enforce the 69 chars limit of the even older TELEX transmission system by cutting lines in the middle of remarks elements, e.g.
1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG 05KG-1BIMMEL/LMRS
becomes:
1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG
.RN/05KG-1BIMMEL/LMRS
thus breaking the .R/CKIN check-in luggage segment that normally runs to the end of line. This flag restores the canonical long line.
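The unfolding can be sketched from the example above alone. The sketch assumes that a folded continuation always starts with ".RN/" at the beginning of a line and that the two pieces are rejoined with a single space; the real library may apply further rules. UnfoldRemarksSketch and unfold are hypothetical names.

```java
// Hypothetical sketch of unfolding remarks cut at the TELEX line limit.
final class UnfoldRemarksSketch {

    // Joins a continuation line starting with ".RN/" back onto the previous line with one space.
    static String unfold(String message) {
        return message.replaceAll("\\r?\\n\\.RN/", " ");
    }

    public static void main(String[] args) {
        String folded = "1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG\n"
                + ".RN/05KG-1BIMMEL/LMRS";
        System.out.println(unfold(folded));
    }
}
```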

See Also:
Constant Field Values
Constructor Detail

Data

public Data(byte[] ba)
Instantiate a Data object from a byte array, assuming UTF-8 as charset for character oriented operations on this data.

Parameters:
ba - the byte array wrapped as Data, which is NOT copied (later changes to the byte array may adversely affect this Data)

Data

public Data(byte[] ba,
            java.nio.charset.Charset cs)
Instantiate a Data object from a byte array, with the explicit charset that must be assumed for character oriented operations on this data.

Parameters:
ba - the byte array wrapped as Data, which is NOT copied (later changes to the byte array may adversely affect this Data)
cs - if null defaults back to UTF-8

Data

public Data(java.nio.ByteBuffer bb)
Instantiate a Data object from a byte buffer, assuming UTF-8 as charset for character oriented operations on this data.

Parameters:
bb - the byte buffer wrapped as Data, which is rewound but NOT copied (later changes to the ByteBuffer may adversely affect this Data)

Data

public Data(java.nio.ByteBuffer bb,
            java.nio.charset.Charset cs)
Instantiate a Data object from a byte buffer, with the explicit charset that must be assumed for character oriented operations on this data.

Parameters:
bb - the byte buffer wrapped as Data, which is rewound but NOT copied (later changes to the ByteBuffer may adversely affect this Data)
cs - if null defaults back to UTF-8

Data

public Data(java.io.InputStream inS,
            java.nio.charset.Charset cs)
     throws java.io.IOException
Instantiate a Data object from an input stream, with the explicit charset that must be assumed for character oriented operations on this data.

Parameters:
inS - the source of byte-oriented data
cs - if null defaults back to UTF-8
Throws:
java.io.IOException - as would result from read errors from the argument input stream
Method Detail

getArray

public byte[] getArray()
Get the backing byte array. Note that its size is often greater than the actual data.

Returns:
backing array of bytes.
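One plausible way for a backing array to be larger than the actual data is a buffer allocated with spare capacity, as is common when reading from a stream. The sketch below shows this with a plain java.nio.ByteBuffer; it is an assumption about the cause, not a description of the internals of Data, and BackingArraySketch and load are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: backing array capacity versus actual data length.
final class BackingArraySketch {

    // Fills a buffer that is larger than the payload, as a read buffer typically would be.
    static ByteBuffer load(byte[] payload, int capacity) {
        ByteBuffer bb = ByteBuffer.allocate(capacity);
        bb.put(payload);
        bb.flip(); // limit = number of bytes written, position = 0
        return bb;
    }

    public static void main(String[] args) {
        ByteBuffer bb = load("ABC".getBytes(StandardCharsets.UTF_8), 16);
        System.out.println(bb.array().length); // size of the backing array
        System.out.println(bb.limit());        // actual data length
    }
}
```

Code that processes the backing array should therefore stop at the value returned by length() rather than at the end of the array.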

getBytes

public byte[] getBytes()

getConvertedData

public java.lang.StringBuffer getConvertedData(int conversions)
Converts the Data bytes to characters while filtering and normalizing the data.

The character set specified at instantiation (or default UTF-8) is used to interpret bytes into characters.

Parameters:
conversions - either the value _NONE, or the sum of one or more of the constants _ToCRLF, _ToLF, _1NewLineAtEnd, _ToUPPER, _ASCII7bits, _TrimNBSP, _NoCtrlBytes, _NoCRLFBytes, _NoBlankLine, _NoCtrlChars.
Returns:
string buffer
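Combining conversions by addition suggests that each constant occupies its own bit. The sketch below illustrates that pattern with HYPOTHETICAL values and names (ConversionFlagsSketch, TO_LF, TO_UPPER, TRIM); the real values are those listed under Constant Field Values and may differ, and the order in which the library applies conversions is not documented here.

```java
import java.util.Locale;

// Hypothetical sketch of additive conversion flags; values are NOT those of reverseXSL.
final class ConversionFlagsSketch {
    static final int NONE = 0;
    static final int TO_LF = 1;     // assumed value
    static final int TO_UPPER = 2;  // assumed value
    static final int TRIM = 4;      // assumed value

    // Tests whether a given flag is present in the combined value.
    static boolean isSet(int conversions, int flag) {
        return (conversions & flag) != 0;
    }

    // Applies each requested conversion in a fixed order chosen for this sketch.
    static String convert(String s, int conversions) {
        if (isSet(conversions, TO_LF)) {
            s = s.replace("\r", "");
        }
        if (isSet(conversions, TRIM)) {
            s = s.replaceAll("(?m)^[ \\t]+|[ \\t]+$", "");
        }
        if (isSet(conversions, TO_UPPER)) {
            s = s.toUpperCase(Locale.ROOT);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.print(convert(" ab \r\n", TO_LF + TRIM + TO_UPPER));
    }
}
```

With distinct bits, addition and bitwise OR give the same result as long as each flag is added only once; adding the same flag twice would silently select a different conversion.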

getFormat

public DataFormat getFormat()
Get the data format type.

Returns:
one of ANY, IATA, CSV, TEXT, XML, EDIFACT, X12, TRADACOMS, SWIFT, PROPRIETARY, BINARY

identify

public DataFormat identify()
Inspect data and set the data format type.

Returns:
one of ANY, IATA, CSV, TEXT, XML, EDIFACT, X12, TRADACOMS, SWIFT, PROPRIETARY, BINARY

identify

public static DataFormat identify(java.lang.String msg)
Attempts an identification of the data format based on a short string. Only the first 100 characters are actually inspected.

Parameters:
msg - typically, a short string
Returns:
one of ANY, IATA, CSV, TEXT, XML, EDIFACT, X12, TRADACOMS, SWIFT, PROPRIETARY, BINARY

length

public int length()
Get the actual data length.

Returns:
length in bytes, as an integer, i.e. up to 2 Gbytes

tokenValue

public static final int tokenValue(java.lang.String opt)
Utility method to convert named conversion tokens into the corresponding conversion token value.

Parameters:
opt - the named value
Returns:
the matching integer value, -1 if not found
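A name-to-value lookup with a -1 fallback can be sketched as below. The token names and values are HYPOTHETICAL, as are TokenValueSketch and its table; the documentation above does not state which spellings the real tokenValue accepts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a named-token lookup returning -1 for unknown names.
final class TokenValueSketch {
    private static final Map<String, Integer> TOKENS = new HashMap<>();
    static {
        // Assumed names and values, for illustration only.
        TOKENS.put("NONE", 0);
        TOKENS.put("ToLF", 1);
        TOKENS.put("ToUPPER", 2);
    }

    // Returns the value registered for the name, or -1 if the name is null or unknown.
    static int tokenValue(String name) {
        Integer v = (name == null) ? null : TOKENS.get(name);
        return (v == null) ? -1 : v;
    }

    public static void main(String[] args) {
        System.out.println(tokenValue("ToLF"));
        System.out.println(tokenValue("NoSuchToken"));
    }
}
```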