LICENSE
The contents of this file are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at "http://www.mozilla.org/MPL/"
Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.
The Original Code is "ParserUtils.pas".
The Initial Developer of the Original Code is Dieter Köhler (Heidelberg, Germany, "http://www.philo.de/"). Portions created by the Initial Developer are Copyright (C) 2003-2004 Dieter Köhler. All Rights Reserved.
Alternatively, the contents of this file may be used under the terms of the GNU General Public License Version 2 or later (the "GPL"), in which case the provisions of the GPL are applicable instead of those above. If you wish to allow use of your version of this file only under the terms of the GPL, and not to allow others to use your version of this file under the terms of the MPL, indicate your decision by deleting the provisions above and replace them with the notice and other provisions required by the GPL. If you do not delete the provisions above, a recipient may use your version of this file under the terms of any one of the MPL or the GPL.
The TUtilsCustomReader and TUtilsCustomWriter classes are based on code written by Robert Marquardt.
The Parser Utilities Library contains general classes for parsing a byte stream. The latest version of this software is available at <http://www.philo.de/xml/>.
The Parser Utilities Library does not contain any components to be registered, so using it in your own projects is simple: add "ParserUtils" to the uses clause of your unit and make sure that the directory containing ParserUtils.pas is included in Delphi's list of library paths. To add it, go to the Library section of Delphi's Environment Options dialog (menu item "Tools/Environment Options ...").
These strings are used for the error messages of exceptions.
TUtilsUCS4CharData is a record structure used to store the location information of a single character in a Unicode input stream. It is used by the TUtilsUCS4Reader class.
The byte index of the last byte of the character in the context of the stream.
The character index of the character in the context of the stream.
The code point of the character.
The line number of the character in the context of the stream.
The number of non-TAB characters in the line before the character (including the character itself) in the context of the stream.
The number of bytes used to encode the character.
The number of TAB characters (#$09) in the line before the character (including the character itself) in the context of the stream.
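The following Python sketch illustrates how the fields of a TUtilsUCS4CharData record relate to each other for a UTF-8 encoded stream. It is not part of ParserUtils. The class UCS4CharData, the function char_data and the field names are hypothetical. The starting values (0-based byte and character indices, 1-based line numbers) are assumptions, because the library takes its initial values from InitialUCS4CharData.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UCS4CharData:
    # Hypothetical Python mirror of the TUtilsUCS4CharData fields.
    code_point: int
    byte_count: int        # bytes used to encode the character
    last_byte_index: int   # index of the character's last byte in the stream
    char_index: int        # index of the character in the stream
    line: int              # line number
    tab_count: int         # TABs in the line up to and including the character
    non_tab_count: int     # non-TABs in the line up to and including the character


def char_data(data: bytes) -> List[UCS4CharData]:
    """Compute location records for UTF-8 encoded bytes.

    Assumptions: byte and character indices are 0-based, line numbers are
    1-based, and a LINE FEED belongs to the line it terminates.
    """
    records = []
    byte_pos = 0
    line, tabs, non_tabs = 1, 0, 0
    for index, ch in enumerate(data.decode("utf-8")):
        size = len(ch.encode("utf-8"))
        byte_pos += size
        if ch == "\t":
            tabs += 1
        else:
            non_tabs += 1
        records.append(UCS4CharData(ord(ch), size, byte_pos - 1, index,
                                    line, tabs, non_tabs))
        if ch == "\n":
            # Column counters restart on the next line.
            line, tabs, non_tabs = line + 1, 0, 0
    return records


for rec in char_data("a\tä\nb".encode("utf-8")):
    print(rec)
```

The two-byte character "ä" shows why the byte index of the last byte and the character index diverge after the first multi-byte character.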
Use TUtilsCustomReader as a base class when defining a class for buffered input of stream data.
Returns the size of the buffer as specified in the constructor.
The initial position of the associated Stream when the constructor of this TUtilsCustomReader object was called.
Position is used to track the reader's position within the stream relative to InitialStreamPosition. The value of Position will always be inside the most recent buffer block read. Thus, for reading, Position will always be less than the stream's Position.
This function is called by the Position property to get the reader's position, relative to InitialStreamPosition, within the stream.
Return Value:
Attempts to read up to Count bytes from the associated stream into Buf.
Parameters:
Return Value:
This procedure is called by the Position property to specify the reader's position, relative to InitialStreamPosition, within the stream.
Parameters:
Creates a new TUtilsCustomReader object. Create allocates memory for a TUtilsCustomReader object, and associates it with the stream passed in the Stream parameter, with a buffer of size BufSize.
Parameters:
Exceptions:
Destroys the TUtilsCustomReader instance and frees its memory. Do not call Destroy directly in an application. Call Free instead, which checks for a nil reference before calling Destroy.
FlushBuffer synchronizes the reader's buffer with the associated stream by setting the stream's Position to match the reader's Position, relative to InitialStreamPosition.
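Because the reader fetches whole blocks of BufSize bytes, the associated stream runs ahead of the reader's Position, which is why FlushBuffer has to move the stream back. The Python sketch below models this bookkeeping under that reading of the description. BufferedReaderModel and its members are hypothetical names, io.BytesIO stands in for a Delphi TStream, and the actual Delphi implementation may differ in detail.

```python
import io


class BufferedReaderModel:
    """Hypothetical model of TUtilsCustomReader's position bookkeeping."""

    def __init__(self, stream, buf_size):
        self.stream = stream
        self.buf_size = buf_size
        self.initial_stream_position = stream.tell()
        self._buf = b""
        self._buf_start = 0   # reader position of the first byte in _buf
        self._buf_pos = 0     # offset of the next unread byte in _buf

    @property
    def position(self):
        # Reader position relative to InitialStreamPosition.
        return self._buf_start + self._buf_pos

    def read(self, count):
        out = bytearray()
        while len(out) < count:
            if self._buf_pos == len(self._buf):
                # Buffer exhausted: fetch the next block from the stream.
                self._buf_start += len(self._buf)
                self._buf = self.stream.read(self.buf_size)
                self._buf_pos = 0
                if not self._buf:
                    break     # end of stream
            take = min(count - len(out), len(self._buf) - self._buf_pos)
            out += self._buf[self._buf_pos:self._buf_pos + take]
            self._buf_pos += take
        return bytes(out)

    def flush_buffer(self):
        # Move the stream back to the reader's logical position; drop the buffer.
        self.stream.seek(self.initial_stream_position + self.position)
        self._buf_start = self.position
        self._buf = b""
        self._buf_pos = 0


stream = io.BytesIO(b"XXhello world")
stream.seek(2)                      # InitialStreamPosition = 2
reader = BufferedReaderModel(stream, buf_size=4)
print(reader.read(3), reader.position, stream.tell())  # b'hel' 3 6
reader.flush_buffer()
print(stream.tell())                # 5
```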
Use TUtilsCustomWriter as a base class when defining a class for buffered output of stream data.
Returns the size of the buffer as specified in the constructor.
The initial position of the associated Stream when the constructor of this TUtilsCustomWriter object was called.
Position is used to track the writer's position within the stream relative to InitialStreamPosition. The value of Position will always be inside the most recent buffer block written. Thus, for writing, Position will always be greater than the stream's Position.
This function is called by the Position property to get the writer's position, relative to InitialStreamPosition, within the stream.
Return Value:
This procedure is called by the Position property to specify the writer's position, relative to InitialStreamPosition, within the stream.
Parameters:
Writes Count bytes from Buf to the associated stream.
Parameters:
Exceptions:
Creates a new TUtilsCustomWriter object. Create allocates memory for a TUtilsCustomWriter object, and associates it with the stream passed in the Stream parameter, with a buffer of size BufSize.
Parameters:
Exceptions:
Destroys the TUtilsCustomWriter instance and frees its memory. Do not call Destroy directly in an application. Call Free instead, which checks for a nil reference before calling Destroy.
Before the TUtilsCustomWriter instance is destroyed, all data in its buffer is written to the stream.
FlushBuffer synchronizes the writer's buffer with the associated stream by setting the stream's Position to match the writer's Position, relative to InitialStreamPosition. All data in the writer's buffer is written to the stream.
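For the writer the relationship is reversed: bytes collect in the buffer before they reach the stream, so the writer's Position runs ahead of the stream's Position until the buffer is flushed. The following minimal Python model assumes that a block is written out whenever BufSize bytes have accumulated. BufferedWriterModel and its members are hypothetical names, not the library's API.

```python
import io


class BufferedWriterModel:
    """Hypothetical model of TUtilsCustomWriter's buffering."""

    def __init__(self, stream, buf_size):
        self.stream = stream
        self.buf_size = buf_size
        self.initial_stream_position = stream.tell()
        self._buf = bytearray()
        self._flushed = 0  # bytes already written to the stream

    @property
    def position(self):
        # Writer position relative to InitialStreamPosition, including buffered bytes.
        return self._flushed + len(self._buf)

    def write(self, data):
        self._buf += data
        # Assumption: full blocks are written immediately; a partial block stays buffered.
        while len(self._buf) >= self.buf_size:
            self.stream.write(self._buf[:self.buf_size])
            del self._buf[:self.buf_size]
            self._flushed += self.buf_size

    def flush_buffer(self):
        # Write all pending bytes so the stream position matches the writer position.
        self.stream.write(self._buf)
        self._flushed += len(self._buf)
        self._buf.clear()

    def close(self):
        # Mirrors the documented destructor: buffered data is written first.
        self.flush_buffer()


stream = io.BytesIO()
writer = BufferedWriterModel(stream, buf_size=4)
writer.write(b"abcdef")
print(writer.position, stream.tell())  # 6 4
writer.close()
print(stream.getvalue())               # b'abcdef'
```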
TUtilsCustomBOMReader is a TUtilsCustomReader descendant which can autodetect UCS-4, UTF-8 or UTF-16 encodings when initialized with a stream starting with a byte order mark (BOM).
Returns the size of the buffer as specified in the constructor.
Returns the size of the Byte Order Mark of the input stream, or '0' if no Byte Order Mark has been detected.
The codec class corresponding to the type of the Byte Order Mark (if any) of the input stream. This is one of the following values (the notation ## is used to denote any byte value, except that two consecutive ##s cannot both be 00):
The initial position of the associated Stream when the constructor of this TUtilsCustomBOMReader object was called.
Position is used to track the reader's position within the stream relative to (InitialStreamPosition + ByteOrderMarkSize). The value of Position will always be inside the most recent buffer block read. Thus, for reading, Position will always be less than the stream's Position.
This function is called by the Position property to get the reader's position, relative to (InitialStreamPosition + ByteOrderMarkSize), within the stream.
Return Value:
Attempts to read up to Count bytes from the associated stream into Buf.
Parameters:
Return Value:
This procedure is called by the Position property to specify the reader's position, relative to (InitialStreamPosition + ByteOrderMarkSize), within the stream.
Parameters:
Constructs and initializes an instance of TUtilsCustomBOMReader with the specified Stream. If the specified Stream starts with a UCS-4, UTF-8 or UTF-16 byte order mark, the byte order mark is skipped.
Parameters:
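Byte order mark detection can be sketched as a prefix test. The four-byte UCS-4 marks must be tested before the two-byte UTF-16 marks because the little-endian UCS-4 mark FF FE 00 00 begins with the UTF-16LE mark FF FE; this appears to be the reason for the restriction on consecutive ##s in the notation above. The Python function detect_bom is hypothetical. It returns Python codec names instead of Delphi codec classes and only covers the big- and little-endian UCS-4 byte orders.

```python
def detect_bom(head: bytes):
    """Return (python_codec_name, bom_size) for a leading BOM, or (None, 0)."""
    marks = [
        (b"\x00\x00\xfe\xff", "utf-32-be"),  # UCS-4 big-endian
        (b"\xff\xfe\x00\x00", "utf-32-le"),  # UCS-4 little-endian, tested before FF FE
        (b"\xef\xbb\xbf", "utf-8"),
        (b"\xfe\xff", "utf-16-be"),
        (b"\xff\xfe", "utf-16-le"),
    ]
    for mark, codec in marks:
        if head.startswith(mark):
            return codec, len(mark)
    return None, 0


# Skipping the BOM, as the constructor does, means decoding from data[size:].
data = b"\xfe\xff" + "hi".encode("utf-16-be")
codec, size = detect_bom(data)
print(codec, size, data[size:].decode(codec))  # utf-16-be 2 hi
```

Note that under this rule a UTF-16LE stream that starts with a BOM followed by U+0000 is indistinguishable from a UCS-4 little-endian stream.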
TUtilsBOMReader is a TUtilsCustomBOMReader descendant which declares some of the protected properties of TUtilsCustomBOMReader as public.
Returns the size of the buffer as specified in the constructor.
The initial position of the associated Stream when the constructor of this TUtilsBOMReader object was called.
Returns the size of the Byte Order Mark of the input stream, or '0' if no Byte Order Mark has been detected.
The codec class corresponding to the type of the Byte Order Mark (if any) of the input stream. This is one of the following values (the notation ## is used to denote any byte value, except that two consecutive ##s cannot both be 00):
Position is used to track the reader's position within the stream relative to (InitialStreamPosition + ByteOrderMarkSize). The value of Position will always be inside the most recent buffer block read. Thus, for reading, Position will always be less than the stream's Position.
This function is called by the Position property to get the reader's position, relative to (InitialStreamPosition + ByteOrderMarkSize), within the stream.
Return Value:
Attempts to read up to Count bytes from the associated stream into Buf.
Parameters:
Return Value:
This procedure is called by the Position property to specify the reader's position, relative to (InitialStreamPosition + ByteOrderMarkSize), within the stream.
Parameters:
Constructs and initializes an instance of TUtilsBOMReader with the specified Stream. If the specified Stream starts with a UCS-4, UTF-8 or UTF-16 byte order mark, the byte order mark is skipped.
Parameters:
TUtilsUCS4Reader encapsulates information about a character stream input source in a single object.
The internal TUtilsBOMReader object used to access the associated stream.
Returns the codec class used by default if no codec class was specified in the constructor and no byte order mark was found. The default codec class for TUtilsUCS4Reader objects is TUTF8Codec. Derived classes may override the protected GetDefaultCodecClass function to return a different codec class as default.
InitialUCS4CharData specifies the values used to initialize the character location information. Derived classes may change it to provide offset values.
If 'lrNormalize' (the default), line breaks are normalized to Linux-style breaks consisting of a single LINE FEED, i.e. a sequence of CARRIAGE RETURN ($0D) + LINE FEED ($0A) or a single CARRIAGE RETURN is replaced by a single LINE FEED ($0A). If 'lrPass', no normalization takes place.
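With 'lrNormalize', each CR LF pair and each lone CR yields exactly one LINE FEED. The hypothetical Python function below shows the rule on an in-memory string. It scans character by character, similar to how a streaming reader sees its input, and is equivalent to text.replace("\r\n", "\n").replace("\r", "\n").

```python
def normalize_line_breaks(text: str) -> str:
    """Model of 'lrNormalize': CR LF and a lone CR both become a single LF."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            out.append("\n")
            # Swallow the LF of a CR LF pair so the pair yields only one LF.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(normalize_line_breaks("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```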
The reset position for the internal TUtilsBOMReader object.
'True' if the input source is at its start position, i.e. the value of the current code point is $98 (START OF STRING); 'False' otherwise.
Returns the size of the buffer as specified in the constructor.
The number of bytes used to encode a byte order mark. If no byte order mark was used, ByteOrderMarkSize returns '0'.
Returns the codec class corresponding to the byte order mark of the stream, or 'nil' if the stream has no byte order mark.
Returns the codec class corresponding to the character encoding scheme of the input stream. The codec class was either specified in the constructor, autodetected from the input stream's byte order mark (if any), or set to the default TUTF8Codec class if neither a codec class was specified nor a byte order mark was found.
Returns a record structure that contains the Unicode codepoint and location information of the current character.
'True' if the end of the input stream was reached, i.e. the value of the code point of the current character is $9C (STRING TERMINATOR); 'False' otherwise.
Returns a record structure that contains the Unicode codepoint and location information of the next character.
Returns a record structure that contains the Unicode codepoint and location information of the previous character.
Constructs and initializes an instance of TUtilsUCS4Reader with the specified Stream.
Parameters:
Exceptions:
Advances the current code point as far as the following content of the input stream matches the specified WideString. After calling Match, if the specified WideString completely matched the following content of the input stream, the position of the current code point is that of the last matched character. If the following content of the input stream did not completely match the specified WideString, the position of the current code point after calling Match is that of the first mismatched character.
Hint: If the input stream contains a character of code point $9C (STRING TERMINATOR), the TUtilsUCS4Reader object cannot advance the current character beyond this character. The Match function may nevertheless test for STRING TERMINATOR, but it must then appear at the end of the specified WideString; otherwise the match can never succeed.
Parameters:
Return Value:
Exceptions:
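The cursor semantics of Match can be modelled in a few lines of Python. The sketch assumes that matching starts with the character following the current one, and it uses $98 (START OF STRING) and $9C (STRING TERMINATOR) for the start and end states described above. ReaderModel, next and match are hypothetical names, not the Delphi API.

```python
SOS = 0x98  # START OF STRING: code point of the initial state
ST = 0x9C   # STRING TERMINATOR: code point at the end of the input


class ReaderModel:
    """Hypothetical in-memory model of the TUtilsUCS4Reader cursor."""

    def __init__(self, text: str):
        self._cps = [ord(c) for c in text]
        self._pos = -1          # index of the current character; -1 = start
        self.current = SOS

    def next(self):
        if self.current == ST:
            return              # cannot advance beyond STRING TERMINATOR
        self._pos += 1
        if self._pos < len(self._cps):
            self.current = self._cps[self._pos]
        else:
            self.current = ST   # end of input reached

    def match(self, s: str) -> bool:
        # Assumption: comparison starts with the character after the current one.
        for ch in s:
            self.next()
            if self.current != ord(ch):
                return False    # current is the first mismatched character
        return True             # current is the last matched character


reader = ReaderModel("abc")
print(reader.match("abd"), chr(reader.current))  # False c
```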
Advances the current character to the next character (if any) of the input stream. If the code point of the current character is $9C (STRING TERMINATOR) calling Next has no effect. If the end of the input stream is reached the code point of the current character is set to $9C (STRING TERMINATOR).
Hint: If the input stream contains a character of code point $9C (STRING TERMINATOR), the TUtilsUCS4Reader object cannot advance the current character beyond this character. Note also that if the value of the current character is $9C, the code point returned by the NextChar property is always $9C, regardless of whether the end of the input stream was reached.
Exceptions:
Resets the input source to its initial position and state.
Exceptions:
Advances the current character to the next character (if any) of the input stream while skipping any UCS-2 character contained in Ucs2Str. If the code point of the current character is $9C (STRING TERMINATOR) calling SkipNext has no effect. If the end of the input stream is reached the code point of the current character is set to $9C (STRING TERMINATOR).
Hint: If the input stream contains a character of code point $9C (STRING TERMINATOR), the TUtilsUCS4Reader object cannot advance the current character beyond this character. Including $9C in the Ucs2Str parameter has no effect. Note also that if the value of the current character is $9C, the code point returned by the NextChar property is always $9C, regardless of whether the end of the input stream was reached.
Parameters:
Return Value:
Exceptions:
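SkipNext can be read as: advance once, then keep advancing while the current character is contained in Ucs2Str, never moving past STRING TERMINATOR. The following Python function models that interpretation on a list of code points. The function name, its index-based interface and the returned index are hypothetical; the documentation above does not describe the Delphi return value.

```python
SOS = 0x98  # START OF STRING
ST = 0x9C   # STRING TERMINATOR


def skip_next(cps, pos, ucs2_str):
    """Hypothetical model of SkipNext on a list of code points.

    pos is the index of the current character: -1 stands for the start
    state and len(cps) for the end of input. Returns the new index.
    """
    skip = {ord(c) for c in ucs2_str} - {ST}  # $9C in Ucs2Str has no effect

    def code_at(i):
        if i < 0:
            return SOS
        return cps[i] if i < len(cps) else ST

    if code_at(pos) == ST:
        return pos                  # no effect at STRING TERMINATOR
    pos += 1                        # always advance at least once
    while code_at(pos) in skip:     # ST is never in skip, so the loop stops
        pos += 1
    return pos


text = "a  \tb"
cps = [ord(c) for c in text]
pos = skip_next(cps, 0, " \t")
print(pos, text[pos])  # 4 b
```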
TUtilsCustomTranscoder is the abstract base class for transcoder classes. It provides basic methods to transcode a sequence of characters from one encoding to another. Do not use instances of TUtilsCustomTranscoder directly in your application. Instead, use one of the classes derived from TUtilsCustomTranscoder.
Returns 'True' during a transcoding session, otherwise 'False'.
Returns the codec class corresponding to the character encoding scheme of the input events.
The name of the character encoding scheme for the input characters.
Exceptions on setting:
Defines how line breaks are normalized. Line breaks are either a single LINE FEED ($0A), a single CARRIAGE RETURN ($0D) or a sequence of CARRIAGE RETURN + LINE FEED. If LineBreakOpt is 'lbCRLF' they are normalized into a sequence of CARRIAGE RETURN + LINE FEED; if LineBreakOpt is 'lbCR' they are normalized into a single CARRIAGE RETURN; if LineBreakOpt is 'lbLF' they are normalized into a single LINE FEED; if LineBreakOpt is 'lbNone' no normalization takes place.
Attention: Attempts to change LineBreakOpt during a transcoding session are ignored.
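The four LineBreakOpt values can be illustrated with a regular expression that treats CR LF, a lone CR and a lone LF each as one line break. In this Python sketch the option names are passed as plain strings, and apply_line_break_opt is a hypothetical helper rather than part of the library.

```python
import re

# One line break: CR LF, a lone CR or a lone LF (CR LF is tried first).
_LINE_BREAK = re.compile(r"\r\n|\r|\n")

_TARGETS = {"lbCRLF": "\r\n", "lbCR": "\r", "lbLF": "\n"}


def apply_line_break_opt(text: str, opt: str) -> str:
    """Hypothetical model of the LineBreakOpt normalization."""
    if opt == "lbNone":
        return text                 # no normalization
    return _LINE_BREAK.sub(_TARGETS[opt], text)


for opt in ("lbCRLF", "lbCR", "lbLF", "lbNone"):
    print(opt, repr(apply_line_break_opt("a\r\nb\rc\nd", opt)))
```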
Returns the codec class corresponding to the character encoding scheme of the output events.
The name of the character encoding scheme for the output characters.
Exceptions on setting:
TNotifyEvent = procedure(Sender: TObject) of object;
Triggered during a transcoding session after each individual character has been transcoded.
Parameters:
The event handler for the OnRead event of an internal TCustomUnicodeCodec object. This codec object requests byte values to be transcoded during a transcoding session, similar to the Delphi VCL TStream.read function. By default the CodecReadEventHandler procedure returns 'False' in the Ok parameter and does not write input byte values to the Buf parameter. Derived classes should override this procedure to provide input byte values.
Parameters:
The event handler for the OnWrite event of an internal TCustomUnicodeCodec object. This codec object sends byte values of transcoded characters during a transcoding session, similar to the Delphi VCL TStream.write function. By default the CodecWriteEventHandler procedure does nothing. Derived classes should override this procedure to process output byte values.
Parameters:
DoProgress triggers an OnProgress event.
Creates a new instance of the TUtilsCustomTranscoder class.
Destroys the TUtilsCustomTranscoder instance and frees its memory. Do not call Destroy directly in an application. Call Free instead, which checks for a nil reference before calling Destroy.
Performs a transcoding session. Line breaks are normalized according to the value of the LineBreakOpt property. During a transcoding session the TUtilsCustomTranscoder object requests input bytes through its protected CodecReadEventHandler procedure and provides output bytes through its protected CodecWriteEventHandler procedure. After each individual character has been transcoded, an OnProgress event is triggered.
Derived classes should override the CodecReadEventHandler and CodecWriteEventHandler procedures to provide input byte values and to process output byte values.
A transcoding session terminates if an OnRead event either returns 'False' in its 'Ok' parameter or returns a byte (or byte sequence) representing the Unicode character STRING TERMINATOR (Unicode code point: $9C). To abort a transcoding session, raise an EConvert exception in the OnProgress event (the exception is then propagated through the Transcode procedure).
Exceptions:
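The control flow of a transcoding session can be sketched in Python. The sketch models only the documented behaviour: the read handler supplies no input by default, the session ends when Ok is 'False' or a STRING TERMINATOR is decoded, progress is reported after each character, and an exception raised in the progress hook propagates out of transcode. Line break normalization is omitted, encodings are Python codec names, Delphi's var parameters are modelled as returned tuples, and all class and method names are hypothetical.

```python
import codecs

ST = "\x9c"  # STRING TERMINATOR (U+009C)


class CustomTranscoderModel:
    """Hypothetical Python model of a TUtilsCustomTranscoder session."""

    def __init__(self, input_encoding="utf-8", output_encoding="utf-16-le"):
        self.input_encoding = input_encoding    # Python codec names, not Delphi codec classes
        self.output_encoding = output_encoding
        self.transcoding = False

    def codec_read(self, count):
        # Base-class default: no input, Ok = False (returned as a tuple here).
        return b"", False

    def codec_write(self, data):
        # Base-class default: output bytes are discarded.
        pass

    def do_progress(self):
        # OnProgress hook; raising an exception here aborts the session.
        pass

    def transcode(self):
        decoder = codecs.getincrementaldecoder(self.input_encoding)()
        self.transcoding = True
        try:
            while True:
                data, ok = self.codec_read(1)   # request one byte at a time
                if not ok:
                    return                      # Ok = False ends the session
                for ch in decoder.decode(data):
                    if ch == ST:
                        return                  # STRING TERMINATOR ends the session
                    self.codec_write(ch.encode(self.output_encoding))
                    self.do_progress()
        finally:
            self.transcoding = False


class BytesTranscoder(CustomTranscoderModel):
    """Example descendant that overrides the read and write handlers."""

    def __init__(self, data, **kwargs):
        super().__init__(**kwargs)
        self._source = data
        self.output = bytearray()

    def codec_read(self, count):
        chunk, self._source = self._source[:count], self._source[count:]
        return chunk, len(chunk) == count

    def codec_write(self, data):
        self.output += data


t = BytesTranscoder("hé\x9cignored".encode("utf-8"))
t.transcode()
print(bytes(t.output))  # "hé" in UTF-16LE; nothing after $9C is written
```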
TUtilsStandardTranscoder is used to transcode a sequence of characters from one encoding to another by using OnRead and OnWrite events to request input code and provide output code.
Returns 'True' during a transcoding session, otherwise 'False'.
The name of the character encoding scheme for the input characters.
Exceptions on setting:
Defines how line breaks are normalized. Line breaks are either a single LINE FEED ($0A), a single CARRIAGE RETURN ($0D) or a sequence of CARRIAGE RETURN + LINE FEED. If LineBreakOpt is 'lbCRLF' they are normalized into a sequence of CARRIAGE RETURN + LINE FEED; if LineBreakOpt is 'lbCR' they are normalized into a single CARRIAGE RETURN; if LineBreakOpt is 'lbLF' they are normalized into a single LINE FEED; if LineBreakOpt is 'lbNone' no normalization takes place.
Attention: Attempts to change LineBreakOpt during a transcoding session are ignored.
The name of the character encoding scheme for the output characters.
Exceptions on setting:
TNotifyEvent = procedure(Sender: TObject) of object;
Triggered during a transcoding session after each individual character has been transcoded.
Parameters:
TCodecReadEvent = procedure(Sender: TObject; var Buf; Count: Longint; var Ok: Boolean) of object;
Requests byte values to be transcoded during a transcoding session, similar to the Delphi VCL TStream.read function.
Parameters:
TCodecWriteEvent = procedure(Sender: TObject; const Buf; Count: Longint) of object;
Sends byte values of transcoded characters during a transcoding session, similar to the Delphi VCL TStream.write function.
Parameters:
The event handler for the OnRead event of an internal TCustomUnicodeCodec object. This codec object requests byte values to be transcoded during a transcoding session, similar to the Delphi VCL TStream.read function. The CodecReadEventHandler procedure triggers an OnRead event to request the required input bytes. If no OnRead event handler was specified it returns 'False' in its Ok parameter.
Parameters:
The event handler for the OnWrite event of an internal TCustomUnicodeCodec object. This codec object sends byte values of transcoded characters during a transcoding session, similar to the Delphi VCL TStream.write function. The CodecWriteEventHandler procedure triggers an OnWrite event to send the transcoded characters.
Parameters:
DoProgress triggers an OnProgress event.
Creates a new instance of the TUtilsStandardTranscoder class.
Destroys the TUtilsStandardTranscoder instance and frees its memory. Do not call Destroy directly in an application. Call Free instead, which checks for a nil reference before calling Destroy.
Performs a transcoding session. Line breaks are normalized according to the value of the LineBreakOpt property. During a transcoding session the TUtilsStandardTranscoder object requests input bytes through its OnRead event and provides output bytes through its OnWrite event. After each individual character has been transcoded, an OnProgress event is triggered.
A transcoding session terminates if an OnRead event either returns 'False' in its 'Ok' parameter or returns a byte (or byte sequence) representing the Unicode character STRING TERMINATOR (Unicode code point: $9C). To abort a transcoding session, raise an EConvert exception in the OnProgress event (the exception is then propagated through the Transcode procedure).
Exceptions:
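In TUtilsStandardTranscoder the handlers forward to the OnRead and OnWrite events. The following Python model shows that wiring, including the documented fallback of reporting Ok = 'False' when no OnRead handler is assigned. All names are hypothetical, and the (bytes, Ok) tuple stands in for the var Buf and var Ok parameters of TCodecReadEvent.

```python
import codecs
import io
from typing import Callable, Optional, Tuple

ST = "\x9c"  # STRING TERMINATOR (U+009C)


class StandardTranscoderModel:
    """Hypothetical event-driven model of TUtilsStandardTranscoder."""

    def __init__(self, input_encoding="utf-8", output_encoding="utf-8"):
        self.input_encoding = input_encoding      # Python codec names
        self.output_encoding = output_encoding
        # Stand-ins for the OnRead, OnWrite and OnProgress events.
        self.on_read: Optional[Callable[[int], Tuple[bytes, bool]]] = None
        self.on_write: Optional[Callable[[bytes], None]] = None
        self.on_progress: Optional[Callable[[], None]] = None

    def _codec_read(self, count):
        # Without an OnRead handler the read reports Ok = False.
        if self.on_read is None:
            return b"", False
        return self.on_read(count)

    def _codec_write(self, data):
        if self.on_write is not None:
            self.on_write(data)

    def transcode(self):
        decoder = codecs.getincrementaldecoder(self.input_encoding)()
        while True:
            data, ok = self._codec_read(1)
            if not ok:
                return
            for ch in decoder.decode(data):
                if ch == ST:
                    return
                self._codec_write(ch.encode(self.output_encoding))
                if self.on_progress is not None:
                    self.on_progress()


# Usage: transcode Latin-1 bytes from a BytesIO into a UTF-8 bytearray.
source = io.BytesIO("Grüße".encode("latin-1"))
result = bytearray()


def read_event(count):
    chunk = source.read(count)
    return chunk, len(chunk) == count  # Ok is False once the source is exhausted


transcoder = StandardTranscoderModel("latin-1", "utf-8")
transcoder.on_read = read_event
transcoder.on_write = result.extend
transcoder.transcode()
print(result.decode("utf-8"))
```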
Use TUtilsStreamTranscoder to transcode a sequence of characters from one encoding to another by using an input and an output stream. Reading from and writing to the streams is buffered.
Returns 'True' during a transcoding session, otherwise 'False'.
The name of the character encoding scheme for the input characters.
Exceptions on setting:
Defines how line breaks are normalized. Line breaks are either a single LINE FEED ($0A), a single CARRIAGE RETURN ($0D) or a sequence of CARRIAGE RETURN + LINE FEED. If LineBreakOpt is 'lbCRLF' they are normalized into a sequence of CARRIAGE RETURN + LINE FEED; if LineBreakOpt is 'lbCR' they are normalized into a single CARRIAGE RETURN; if LineBreakOpt is 'lbLF' they are normalized into a single LINE FEED; if LineBreakOpt is 'lbNone' no normalization takes place.
Attention: Attempts to change LineBreakOpt during a transcoding session are ignored.
The name of the character encoding scheme for the output characters.
Exceptions on setting:
TNotifyEvent = procedure(Sender: TObject) of object;
Triggered during a transcoding session after each individual character has been transcoded.
Parameters:
The event handler for the OnRead event of an internal TCustomUnicodeCodec object. This codec object requests byte values to be transcoded during a transcoding session, similar to the Delphi VCL TStream.read function. The CodecReadEventHandler procedure reads the requested number of bytes from the associated input stream.
Parameters:
The event handler for the OnWrite event of an internal TCustomUnicodeCodec object. This codec object sends byte values of transcoded characters during a transcoding session, similar to the Delphi VCL TStream.write function. The CodecWriteEventHandler procedure writes the transcoded characters to the associated output stream.
Parameters:
DoProgress triggers an OnProgress event.
Parameters:
Creates a new instance of the TUtilsStreamTranscoder class.
Destroys the TUtilsStreamTranscoder instance and frees its memory. Do not call Destroy directly in an application. Call Free instead, which checks for a nil reference before calling Destroy.
Performs a transcoding session. Line breaks are normalized according to the value of the LineBreakOpt property. During a transcoding session the TUtilsStreamTranscoder object reads input bytes from the associated input stream and writes output bytes to the associated output stream. After each individual character has been transcoded, an OnProgress event is triggered.
A transcoding session terminates if the input stream returns a byte (or byte sequence) representing the Unicode character STRING TERMINATOR (Unicode code point: $9C). To abort a transcoding session, raise an EConvert exception in the OnProgress event (the exception is then propagated through the Transcode procedure).
Exceptions:
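A stream-to-stream session can be approximated in Python with incremental codecs, which carry incomplete multi-byte sequences over from one buffer block to the next. transcode_stream is a hypothetical helper: it reads fixed-size blocks, stops at the first STRING TERMINATOR or at the end of the input, and ignores LineBreakOpt and OnProgress for brevity. A truncated multi-byte sequence at the very end of the input raises UnicodeDecodeError.

```python
import codecs
import io

ST = "\x9c"  # STRING TERMINATOR (U+009C)


def transcode_stream(src, dst, in_enc="utf-8", out_enc="utf-16-le", buf_size=4096):
    """Hypothetical model of a TUtilsStreamTranscoder session.

    src and dst are binary file-like objects; input is read in blocks of
    buf_size bytes and each decoded block is written as one encoded block.
    """
    decoder = codecs.getincrementaldecoder(in_enc)()
    encoder = codecs.getincrementalencoder(out_enc)()
    while True:
        block = src.read(buf_size)
        # An empty block means end of input, so the decoder is flushed.
        text = decoder.decode(block, final=not block)
        end = text.find(ST)
        if end >= 0:
            text = text[:end]              # STRING TERMINATOR ends the session
        dst.write(encoder.encode(text, final=end >= 0 or not block))
        if end >= 0 or not block:
            return


src = io.BytesIO("añb\x9czzz".encode("utf-8"))
dst = io.BytesIO()
transcode_stream(src, dst, buf_size=2)   # "ñ" is split across two blocks
print(dst.getvalue().decode("utf-16-le"))  # añb
```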