Unicode
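Function Summary
string | toUtf8 (in string: stri) | Convert a string to a UTF-8 encoded string of bytes.
string | fromUtf8 (in string: utf8) | Convert a string with bytes in UTF-8 encoding to UTF-32.
string | toUtf16Be (in string: stri) | Convert a string to a UTF-16BE encoded string of bytes.
string | fromUtf16Be (in string: utf16Be) | Convert a UTF-16BE encoded string of bytes to UTF-32.
string | toUtf16Le (in string: stri) | Convert a string to a UTF-16LE encoded string of bytes.
string | fromUtf16Le (in string: utf16Le) | Convert a UTF-16LE encoded string of bytes to UTF-32.
string | replaceUtf16SurrogatePairs (in string: stri) | Return a string in which all surrogate pairs are replaced by single chars.
string | fromNullTerminatedUtf16Be (in string: stri, in integer: startPos) | Convert a null terminated UTF-16BE encoded string of bytes to UTF-32.
string | fromNullTerminatedUtf16Le (in string: stri, in integer: startPos) | Convert a null terminated UTF-16LE encoded string of bytes to UTF-32.
string | getNullTerminatedUtf16Be (in string: stri, inout integer: currPos) | Read a null terminated UTF-16BE encoded string of bytes and convert it to UTF-32.
string | getNullTerminatedUtf16Be (inout file: inFile) | Read a null terminated UTF-16BE encoded string of bytes and convert it to UTF-32.
string | getNullTerminatedUtf16Le (in string: stri, inout integer: currPos) | Read a null terminated UTF-16LE encoded string of bytes and convert it to UTF-32.
string | getNullTerminatedUtf16Le (inout file: inFile) | Read a null terminated UTF-16LE encoded string of bytes and convert it to UTF-32.
string | fromUtf7 (in string: stri7) |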
Function Detail |
toUtf8
const func string: toUtf8 (in string: stri)
-
Convert a string to a UTF-8 encoded string of bytes.
toUtf8("€") returns "â\130;¬"
Surrogate pairs are converted into a CESU-8 encoded string:
toUtf8("\16#d834;\16#dd1e;") returns "\237;\160;\180;\237;\180;\158;" (surrogate pair)
This function accepts unpaired surrogate characters.
toUtf8("\16#dc00;") returns "\16#ed;\16#b0;\16#80;" (unpaired surrogate char)
Note that a Unicode string should not contain surrogate characters. If the string contains surrogate pairs, use
toUtf8(replaceUtf16SurrogatePairs(stringWithSurrogatePairs))
to create a correct (not CESU-8 encoded) UTF-8 string.
- Parameters:
- stri - Normal (UTF-32) string to be converted to UTF-8.
- Returns:
- stri converted to a string of bytes with UTF-8 encoding.
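The difference between CESU-8 and proper UTF-8 described above can be checked outside Seed7. The following is a minimal Python sketch, not part of the Seed7 library: the helper name to_utf8_bytes is hypothetical, and Python's "surrogatepass" error handler is assumed to mimic the way toUtf8 lets surrogate characters through.

```python
def to_utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8, encoding surrogate code points like any other
    three-byte character (assumed to mirror the behavior described for toUtf8)."""
    return text.encode("utf-8", errors="surrogatepass")


# U+20AC (euro sign) becomes the three bytes E2 82 AC.
euro = to_utf8_bytes("\u20ac")

# A surrogate pair is encoded surrogate by surrogate: 6 bytes of CESU-8.
cesu8 = to_utf8_bytes("\ud834\udd1e")

# An unpaired surrogate is accepted as well.
lone = to_utf8_bytes("\udc00")

# The same character as a single code point gives 4 bytes of proper UTF-8.
proper = to_utf8_bytes("\U0001d11e")

print(list(euro), list(cesu8), list(lone), list(proper))
```

The six-byte result for the surrogate pair is why the text recommends replacing surrogate pairs before the conversion when a correct UTF-8 string is wanted.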
fromUtf8
const func string: fromUtf8 (in string: utf8)
-
Convert a string with bytes in UTF-8 encoding to UTF-32.
fromUtf8("â\130;¬") returns "€"
Surrogate pairs from a CESU-8 encoded string are kept intact:
fromUtf8("\237;\160;\180;\237;\180;\158;") returns "\16#d834;\16#dd1e;" (surrogate pair)
To decode a CESU-8 encoded string use:
replaceUtf16SurrogatePairs(fromUtf8(cesu8String))
Overlong encodings and unpaired surrogate chars are accepted.
fromUtf8("\16#c0;\16#80;") returns "\0;" (overlong encoding)
fromUtf8("\16#ed;\16#b0;\16#80;") returns "\16#dc00;" (unpaired surrogate char)
- Parameters:
- utf8 - String of bytes encoded with UTF-8.
- Returns:
- utf8 converted to a normal (UTF-32) string.
- Raises:
- RANGE_ERROR - If utf8 contains a char beyond '\255;' or if utf8 is not encoded with UTF-8.
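To make the lenient decoding rules concrete, here is a small hand-written decoder in Python. It is a sketch under stated assumptions, not the Seed7 implementation: the name from_utf8_lenient is hypothetical, ValueError stands in for RANGE_ERROR, and the exact set of inputs that Seed7 rejects may differ. Python's built-in decoder is not used because it rejects overlong encodings.

```python
def from_utf8_lenient(data: bytes) -> str:
    """Decode UTF-8 bytes, accepting overlong encodings and surrogates.

    Raises ValueError (standing in for RANGE_ERROR) on malformed input.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte determines the payload bits and the number of
        # continuation bytes that must follow.
        if lead < 0x80:
            code, extra = lead, 0
        elif 0xC0 <= lead < 0xE0:
            code, extra = lead & 0x1F, 1
        elif 0xE0 <= lead < 0xF0:
            code, extra = lead & 0x0F, 2
        elif 0xF0 <= lead < 0xF8:
            code, extra = lead & 0x07, 3
        else:
            raise ValueError("invalid lead byte at index %d" % i)
        tail = data[i + 1:i + 1 + extra]
        if len(tail) != extra or any(b & 0xC0 != 0x80 for b in tail):
            raise ValueError("bad continuation bytes at index %d" % i)
        for b in tail:
            code = (code << 6) | (b & 0x3F)
        if code > 0x10FFFF:
            raise ValueError("code point out of range at index %d" % i)
        # No check for overlong forms or surrogates: both are accepted.
        chars.append(chr(code))
        i += 1 + extra
    return "".join(chars)


print(repr(from_utf8_lenient(b"\xc0\x80")))      # overlong encoding of NUL
print(repr(from_utf8_lenient(b"\xed\xb0\x80")))  # unpaired surrogate
```

A Python bytes object cannot hold values above 255, so the "char beyond '\255;'" error case has no counterpart in this sketch.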
toUtf16Be
const func string: toUtf16Be (in string: stri)
-
Convert a string to a UTF-16BE encoded string of bytes.
- Parameters:
- stri - Normal (UTF-32) string to be converted to UTF-16BE.
- Returns:
- stri converted to a string of bytes with UTF-16BE encoding.
- Raises:
- RANGE_ERROR - If a character is not representable as UTF-16 or a surrogate character is present.
fromUtf16Be
const func string: fromUtf16Be (in string: utf16Be)
-
Convert a UTF-16BE encoded string of bytes to UTF-32.
- Parameters:
- utf16Be - String of bytes encoded with UTF-16 in big endian byte order.
- Returns:
- utf16Be converted to a normal (UTF-32) string.
- Raises:
- RANGE_ERROR - If the length of utf16Be is odd or if utf16Be contains a char beyond '\255;' or if utf16Be contains an invalid surrogate pair.
toUtf16Le
const func string: toUtf16Le (in string: stri)
-
Convert a string to a UTF-16LE encoded string of bytes.
- Parameters:
- stri - Normal (UTF-32) string to be converted to UTF-16LE.
- Returns:
- stri converted to a string of bytes with UTF-16LE encoding.
- Raises:
- RANGE_ERROR - If a character is not representable as UTF-16 or a surrogate character is present.
fromUtf16Le
const func string: fromUtf16Le (in string: utf16Le)
-
Convert a UTF-16LE encoded string of bytes to UTF-32.
- Parameters:
- utf16Le - String of bytes encoded with UTF-16 in little endian byte order.
- Returns:
- utf16Le converted to a normal (UTF-32) string.
- Raises:
- RANGE_ERROR - If the length of utf16Le is odd or if utf16Le contains a char beyond '\255;' or if utf16Le contains an invalid surrogate pair.
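The two byte orders and the error cases listed for the UTF-16 functions can be illustrated with Python's strict UTF-16 codecs. This is only a sketch of the encoding, not of the Seed7 library: the helper names to_utf16 and from_utf16 are hypothetical, and Python's UnicodeEncodeError and UnicodeDecodeError (both subclasses of ValueError) stand in for RANGE_ERROR.

```python
def to_utf16(text: str, big_endian: bool = True) -> bytes:
    """Encode text as UTF-16 without a byte order mark.

    The strict codec raises UnicodeEncodeError for surrogate characters.
    """
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return text.encode(codec)


def from_utf16(data: bytes, big_endian: bool = True) -> str:
    """Decode UTF-16 bytes without a byte order mark.

    The strict codec raises UnicodeDecodeError for an odd number of
    bytes or an invalid surrogate pair.
    """
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return data.decode(codec)


clef = "\U0001d11e"                       # a character outside the BMP
be = to_utf16(clef, big_endian=True)      # bytes D8 34 DD 1E
le = to_utf16(clef, big_endian=False)     # bytes 34 D8 1E DD
print(be.hex(), le.hex())
print(from_utf16(be, True) == clef, from_utf16(le, False) == clef)
```

The same code point yields the same two 16-bit units in both encodings; only the order of the bytes inside each unit differs.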
replaceUtf16SurrogatePairs
const func string: replaceUtf16SurrogatePairs (in string: stri)
-
Return a string in which all surrogate pairs are replaced by single chars.
replaceUtf16SurrogatePairs("\16#d834;\16#dd1e;") returns "\16#1d11e;"
This function can be used to decode CESU-8 encoded strings:
replaceUtf16SurrogatePairs(fromUtf8(cesu8String))
In CESU-8, a Unicode code point from the Basic Multilingual Plane (BMP) is encoded in the same way as in UTF-8. A Unicode code point outside the BMP is first represented as a surrogate pair, as in UTF-16, and then each surrogate code point is encoded in UTF-8.
- Parameters:
- stri - String of UTF-16 or UTF-32 Unicode characters, which may contain surrogate pairs.
- Returns:
- stri with all surrogate pairs replaced by single UTF-32 chars.
- Raises:
- RANGE_ERROR - If an invalid surrogate pair is present.
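The arithmetic behind combining a surrogate pair can be sketched in a few lines of Python. This is an illustration, not the Seed7 implementation: the name replace_surrogate_pairs is hypothetical, ValueError stands in for RANGE_ERROR, and treating every unpaired surrogate as an invalid pair is an assumption of this sketch.

```python
def replace_surrogate_pairs(text: str) -> str:
    """Replace each high/low surrogate pair by the code point it denotes."""
    result = []
    i = 0
    while i < len(text):
        unit = ord(text[i])
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(text) or not 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
                raise ValueError("high surrogate without low surrogate")
            low = ord(text[i + 1])
            # Each surrogate carries 10 bits; the pair is offset by 0x10000.
            code = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            result.append(chr(code))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # Assumption of this sketch: a lone low surrogate is an error.
            raise ValueError("low surrogate without high surrogate")
        else:
            result.append(text[i])
            i += 1
    return "".join(result)


combined = replace_surrogate_pairs("\ud834\udd1e")
print(hex(ord(combined)))
```

For the pair D834, DD1E the calculation is 0x10000 + (0x34 << 10) + 0x11E = 0x1D11E, which matches the example given for replaceUtf16SurrogatePairs.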
fromNullTerminatedUtf16Be
const func string: fromNullTerminatedUtf16Be (in string: stri, in integer: startPos)
-
Convert a null terminated UTF-16BE encoded string of bytes to UTF-32. The UTF-16BE encoded string starts at startPos and ends with a UTF-16BE encoded null ('\0;') character. If there is no null character, the UTF-16BE encoded string is assumed to extend to the end of stri.
- Parameters:
- stri - UTF-16BE encoded string of bytes (starting from startPos).
- startPos - Start position for the UTF-16BE encoded null terminated string.
- Returns:
- the string found in UTF-32 encoding without the null ('\0;') character.
- Raises:
- RANGE_ERROR - If the conversion from UTF-16BE to UTF-32 fails.
fromNullTerminatedUtf16Le
const func string: fromNullTerminatedUtf16Le (in string: stri, in integer: startPos)
-
Convert a null terminated UTF-16LE encoded string of bytes to UTF-32. The UTF-16LE encoded string starts at startPos and ends with a UTF-16LE encoded null ('\0;') character. If there is no null character, the UTF-16LE encoded string is assumed to extend to the end of stri.
- Parameters:
- stri - UTF-16LE encoded string of bytes (starting from startPos).
- startPos - Start position for the UTF-16LE encoded null terminated string.
- Returns:
- the string found in UTF-32 encoding without the null ('\0;') character.
- Raises:
- RANGE_ERROR - If the conversion from UTF-16LE to UTF-32 fails.
getNullTerminatedUtf16Be
const func string: getNullTerminatedUtf16Be (in string: stri, inout integer: currPos)
-
Read a null terminated UTF-16BE encoded string of bytes and convert it to UTF-32. The UTF-16BE encoded string starts at currPos and ends with a UTF-16BE encoded null ('\0;') character. The position currPos is advanced past the null ('\0;') character. If there is no null character, the UTF-16BE encoded string is assumed to extend to the end of stri. In this case currPos is advanced beyond the length of stri.
- Parameters:
- stri - UTF-16BE encoded string of bytes (starting from currPos).
- currPos - Start position for the UTF-16BE encoded null terminated string. The function advances currPos to refer to the position just after the terminating null ('\0;') character.
- Returns:
- the string found in UTF-32 encoding without the null ('\0;') character.
- Raises:
- RANGE_ERROR - If the conversion from UTF-16BE to UTF-32 fails.
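The scan-and-advance behavior of the currPos variant can be sketched in Python as follows. This is an illustration under assumptions, not the Seed7 implementation: the function name is hypothetical, positions are 0-based byte indexes (Seed7 strings are 1-based), the new position is returned instead of being updated in place, and the scan is assumed to proceed in steps of two bytes.

```python
from typing import Tuple


def get_null_terminated_utf16be(data: bytes, pos: int) -> Tuple[str, int]:
    """Decode UTF-16BE starting at pos up to a two-byte null terminator.

    Returns the decoded text and the position just after the terminator.
    Without a terminator the text extends to the end of data and the
    returned position is len(data).
    """
    end = pos
    # Step through the data one 16-bit unit at a time.
    while end + 1 < len(data) and data[end:end + 2] != b"\x00\x00":
        end += 2
    if end + 1 < len(data):
        # Terminator found at end: skip its two bytes.
        return data[pos:end].decode("utf-16-be"), end + 2
    # No terminator: decode everything that is left.
    return data[pos:].decode("utf-16-be"), len(data)


# Two strings stored back to back; only the first one is terminated.
buffer = "ab".encode("utf-16-be") + b"\x00\x00" + "\u20ac".encode("utf-16-be")
first, pos = get_null_terminated_utf16be(buffer, 0)
second, pos_after = get_null_terminated_utf16be(buffer, pos)
print(first, pos, second, pos_after)
```

Returning the updated position lets a caller read consecutive null terminated strings from one buffer, which is the use case the inout parameter currPos serves.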
getNullTerminatedUtf16Be
const func string: getNullTerminatedUtf16Be (inout file: inFile)
-
Read a null terminated UTF-16BE encoded string of bytes and convert it to UTF-32. The reading ends when a UTF-16BE encoded null ('\0;') character has been read.
- Parameters:
- inFile - File with UTF-16BE encoded bytes.
- Returns:
- the string read in UTF-32 encoding without the null ('\0;') character.
- Raises:
- RANGE_ERROR - If the conversion from UTF-16BE to UTF-32 fails.
getNullTerminatedUtf16Le
const func string: getNullTerminatedUtf16Le (in string: stri, inout integer: currPos)
-
Read a null terminated UTF-16LE encoded string of bytes and convert it to UTF-32. The UTF-16LE encoded string starts at currPos and ends with a UTF-16LE encoded null ('\0;') character. The position currPos is advanced past the null ('\0;') character. If there is no null character, the UTF-16LE encoded string is assumed to extend to the end of stri. In this case currPos is advanced beyond the length of stri.
- Parameters:
- stri - UTF-16LE encoded string of bytes (starting from currPos).
- currPos - Start position for the UTF-16LE encoded null terminated string. The function advances currPos to refer to the position just after the terminating null ('\0;') character.
- Returns:
- the string found in UTF-32 encoding without the null ('\0;') character.
- Raises:
- RANGE_ERROR - If the conversion from UTF-16LE to UTF-32 fails.
getNullTerminatedUtf16Le
const func string: getNullTerminatedUtf16Le (inout file: inFile)
-
Read a null terminated UTF-16LE encoded string of bytes and convert it to UTF-32. The reading ends when a UTF-16LE encoded null ('\0;') character has been read.
- Parameters:
- inFile - File with UTF-16LE encoded bytes.
- Returns:
- the string read in UTF-32 encoding without the null ('\0;') character.
- Raises:
- RANGE_ERROR - If the conversion from UTF-16LE to UTF-32 fails.
fromUtf7
const func string: fromUtf7 (in string: stri7)