Seed7 Library: Charsets

Charsets

Constant Summary

string

cp_437
	Conversion table from code page 437 to Unicode.

string

cp_708
	Conversion table from code page 708 to Unicode.

string

cp_720
	Conversion table from code page 720 to Unicode.

string

cp_737
	Conversion table from code page 737 to Unicode.

string

cp_775
	Conversion table from code page 775 to Unicode.

string

cp_850
	Conversion table from code page 850 to Unicode.

string

cp_852
	Conversion table from code page 852 to Unicode.

string

cp_855
	Conversion table from code page 855 to Unicode.

string

cp_857
	Conversion table from code page 857 to Unicode.

string

cp_858
	Conversion table from code page 858 to Unicode.

string

cp_860
	Conversion table from code page 860 to Unicode.

string

cp_861
	Conversion table from code page 861 to Unicode.

string

cp_862
	Conversion table from code page 862 to Unicode.

string

cp_863
	Conversion table from code page 863 to Unicode.

string

cp_864
	Conversion table from code page 864 to Unicode.

string

cp_865
	Conversion table from code page 865 to Unicode.

string

cp_866
	Conversion table from code page 866 to Unicode.

string

cp_869
	Conversion table from code page 869 to Unicode.

string

cp_874
	Conversion table from code page 874 to Unicode.

string

cp_907
	Conversion table from code page 907 to Unicode.

string

cp_909
	Conversion table from code page 909 to Unicode.

string

cp_1125
	Conversion table from code page 1125 to Unicode.

string

cp_1250
	Conversion table from code page 1250 to Unicode.

string

cp_1251
	Conversion table from code page 1251 to Unicode.

string

cp_1252
	Conversion table from code page 1252 to Unicode.

string

cp_1253
	Conversion table from code page 1253 to Unicode.

string

cp_1254
	Conversion table from code page 1254 to Unicode.

string

cp_1255
	Conversion table from code page 1255 to Unicode.

string

cp_1256
	Conversion table from code page 1256 to Unicode.

string

cp_1257
	Conversion table from code page 1257 to Unicode.

string

cp_1258
	Conversion table from code page 1258 to Unicode.

string

iso_8859_1
	Conversion table from ISO-8859-1 (Latin-1) to Unicode.

string

iso_8859_2
	Conversion table from ISO-8859-2 (Latin-2) to Unicode.

string

iso_8859_3
	Conversion table from ISO-8859-3 (Latin-3) to Unicode.

string

iso_8859_4
	Conversion table from ISO-8859-4 (Latin-4) to Unicode.

string

iso_8859_5
	Conversion table from ISO-8859-5 to Unicode.

string

iso_8859_6
	Conversion table from ISO-8859-6 to Unicode.

string

iso_8859_7
	Conversion table from ISO-8859-7 to Unicode.

string

iso_8859_8
	Conversion table from ISO-8859-8 to Unicode.

string

iso_8859_9
	Conversion table from ISO-8859-9 (Latin-5) to Unicode.

string

iso_8859_10
	Conversion table from ISO-8859-10 (Latin-6) to Unicode.

string

iso_8859_11
	Conversion table from ISO-8859-11 to Unicode.

string

iso_8859_13
	Conversion table from ISO-8859-13 (Latin-7) to Unicode.

string

iso_8859_14
	Conversion table from ISO-8859-14 (Latin-8) to Unicode.

string

iso_8859_15
	Conversion table from ISO-8859-15 (Latin-9) to Unicode.

string

iso_8859_16
	Conversion table from ISO-8859-16 (Latin-10) to Unicode.

string

mac_os_roman
	Conversion table from Mac OS Roman encoding to Unicode.

string

koi8_r
	Conversion table from KOI8-R encoding to Unicode.

string

koi8_u
	Conversion table from KOI8-U encoding to Unicode.

string

mik
	Conversion table from MIK encoding to Unicode.

string

tis_620
	Conversion table from TIS-620 encoding to Unicode.

string

armscii_8
	Conversion table from ArmSCII-8 encoding to Unicode.

string

geostd8
	Conversion table from GEOSTD8 encoding to Unicode.

string

jis_x_0201
	Conversion table from JIS X 0201 encoding to Unicode.

string

viscii
	Conversion table from VISCII encoding to Unicode.

string

ns_4551_1
	Conversion table from NS 4551-1 encoding to Unicode.

string

cp_037
	Conversion table from code page 37 to Unicode.

string

cp_273
	Conversion table from code page 273 to Unicode.

string

cp_277
	Conversion table from code page 277 to Unicode.

string

cp_280
	Conversion table from code page 280 to Unicode.

string

cp_285
	Conversion table from code page 285 to Unicode.

string

cp_297
	Conversion table from code page 297 to Unicode.

string

cp_500
	Conversion table from code page 500 to Unicode.

string

cp_1047
	Conversion table from code page 1047 to Unicode.

Function Summary

void

conv2unicode (inout string: stri, in string: codePage)
	Convert a string with bytes from a code page encoding to UTF-32.

void

conv2unicodeByName (inout string: stri, in var string: charset)
	Convert a string from a charset encoding to UTF-32.

Constant Detail

cp_437

const string: cp_437

Conversion table from code page 437 to Unicode. Code page 437 is the character set of the original IBM PC.

cp_708

const string: cp_708

Conversion table from code page 708 to Unicode. Code page 708 was defined by ASMO for writing Arabic.

cp_720

const string: cp_720

Conversion table from code page 720 to Unicode. The MS-DOS code page 720 is used to write Arabic.

cp_737

const string: cp_737

Conversion table from code page 737 to Unicode. The MS-DOS code page 737 is used to write Greek.

cp_775

const string: cp_775

Conversion table from code page 775 to Unicode. The MS-DOS code page 775 is used to write the Estonian, Lithuanian and Latvian languages.

cp_850

const string: cp_850

Conversion table from code page 850 to Unicode. The MS-DOS code page 850 is used to write Western European languages.

cp_852

const string: cp_852

Conversion table from code page 852 to Unicode. The MS-DOS code page 852 is used to write Central European languages that use Latin script, such as Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian and Slovak.

cp_855

const string: cp_855

Conversion table from code page 855 to Unicode. The MS-DOS code page 855 is used to write Cyrillic script.

cp_857

const string: cp_857

Conversion table from code page 857 to Unicode. The MS-DOS code page 857 is used to write Turkish.

cp_858

const string: cp_858

Conversion table from code page 858 to Unicode. The MS-DOS code page 858 is used to write Western European languages.

cp_860

const string: cp_860

Conversion table from code page 860 to Unicode. The MS-DOS code page 860 is used to write Portuguese.

cp_861

const string: cp_861

Conversion table from code page 861 to Unicode. The MS-DOS code page 861 is used to write Icelandic.

cp_862

const string: cp_862

Conversion table from code page 862 to Unicode. The MS-DOS code page 862 is used to write Hebrew.

cp_863

const string: cp_863

Conversion table from code page 863 to Unicode. The MS-DOS code page 863 is used to write Canadian French.

cp_864

const string: cp_864

Conversion table from code page 864 to Unicode. The MS-DOS code page 864 is used to write Arabic.

cp_865

const string: cp_865

Conversion table from code page 865 to Unicode. The MS-DOS code page 865 is used to write Nordic languages.

cp_866

const string: cp_866

Conversion table from code page 866 to Unicode. The MS-DOS code page 866 is used to write Cyrillic script.

cp_869

const string: cp_869

Conversion table from code page 869 to Unicode. The MS-DOS code page 869 is used to write Greek.

cp_874

const string: cp_874

Conversion table from code page 874 to Unicode. The Windows code page 874 is used for the Thai language.

cp_907

const string: cp_907

Conversion table from code page 907 to Unicode. Code page 907 is used for encoding APL symbols.

cp_909

const string: cp_909

Conversion table from code page 909 to Unicode. Code page 909 is used for encoding APL symbols.

cp_1125

const string: cp_1125

Conversion table from code page 1125 to Unicode. Code page 1125 is used for the Ukrainian language.

cp_1250

const string: cp_1250

Conversion table from code page 1250 to Unicode. The Windows code page 1250 encodes the Latin alphabet for Central and Eastern European languages that use Latin script. It can be used for encoding German, Polish, Czech, Slovak, Hungarian, Slovene, Bosnian, Croatian, Serbian, Romanian and Albanian.

cp_1251

const string: cp_1251

Conversion table from code page 1251 to Unicode. The Windows code page 1251 encodes the Latin/Cyrillic alphabet. It can be used for encoding Russian, Bulgarian, Serbian and Macedonian.

cp_1252

const string: cp_1252

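Conversion table from code page 1252 to Unicode. The Windows code page 1252 encodes the Latin alphabet for Western European languages. It is a superset of the printable characters of ISO 8859-1: it assigns additional printable characters, such as the euro sign, to the byte range 128 to 159, where ISO-8859-1 (as mapped to Unicode) has the C1 control codes.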
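To see where cp_1252 and iso_8859_1 diverge, the minimal Seed7 sketch below converts the same two bytes with both tables and prints the resulting code points. It assumes the library is included as "charsets.s7i"; adjust the include if your installation names the file differently.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # assumed file name of this library

const proc: main is func
  local
    # The same two raw bytes, to be interpreted with two different tables.
    var string: asLatin1 is "\128;\233;";
    var string: asCp1252 is "\128;\233;";
  begin
    conv2unicode(asLatin1, iso_8859_1);
    conv2unicode(asCp1252, cp_1252);
    # Print the Unicode code points that each table produced.
    writeln("iso_8859_1: " & str(ord(asLatin1[1])) & " " & str(ord(asLatin1[2])));
    writeln("cp_1252:    " & str(ord(asCp1252[1])) & " " & str(ord(asCp1252[2])));
  end func;
```

With the standard mappings, byte 128 becomes U+0080 (128) with iso_8859_1 but the euro sign U+20AC (8364) with cp_1252, while byte 233 is é (233) in both.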

cp_1253

const string: cp_1253

Conversion table from code page 1253 to Unicode. The Windows code page 1253 encodes the Latin/Greek alphabet.

cp_1254

const string: cp_1254

Conversion table from code page 1254 to Unicode. The Windows code page 1254 covers the Turkish language.

cp_1255

const string: cp_1255

Conversion table from code page 1255 to Unicode. The Windows code page 1255 encodes the Latin/Hebrew alphabet.

cp_1256

const string: cp_1256

Conversion table from code page 1256 to Unicode. The Windows code page 1256 encodes the Latin/Arabic alphabet.

cp_1257

const string: cp_1257

Conversion table from code page 1257 to Unicode. The Windows code page 1257 covers the Baltic languages.

cp_1258

const string: cp_1258

Conversion table from code page 1258 to Unicode. The Windows code page 1258 covers the Vietnamese language.

iso_8859_1

const string: iso_8859_1

Conversion table from ISO-8859-1 (Latin-1) to Unicode. ISO-8859-1 is the character set for Western European languages. The first 256 code points of Unicode are identical to ISO-8859-1. It can be used for encoding Afrikaans, Albanian, Basque, Breton, Catalan, Danish, English, Faroese, Galician, German, Icelandic, Malay, Irish, Italian, Latin, Leonese, Luxembourgish, Norwegian, Occitan, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swahili, Swedish and Walloon.

iso_8859_2

const string: iso_8859_2

Conversion table from ISO-8859-2 (Latin-2) to Unicode. ISO-8859-2 is the character set for Eastern European languages. It can be used for encoding Bosnian, Croatian, Czech, German, Hungarian, Polish, Serbian, Slovak, Slovene and Sorbian.

iso_8859_3

const string: iso_8859_3

Conversion table from ISO-8859-3 (Latin-3) to Unicode. ISO-8859-3 is the character set for South European languages. It can be used for encoding Turkish, Maltese and Esperanto.

iso_8859_4

const string: iso_8859_4

Conversion table from ISO-8859-4 (Latin-4) to Unicode. ISO-8859-4 is the character set for North European languages. It can be used for encoding Estonian, Latvian, Lithuanian, Greenlandic and Sami.

iso_8859_5

const string: iso_8859_5

Conversion table from ISO-8859-5 to Unicode. ISO-8859-5 is the character set for the Latin/Cyrillic alphabet. It can be used for encoding Bulgarian, Belarusian, Russian, Serbian and Macedonian.

iso_8859_6

const string: iso_8859_6

Conversion table from ISO-8859-6 to Unicode. ISO-8859-6 is the character set for the Latin/Arabic alphabet.

iso_8859_7

const string: iso_8859_7

Conversion table from ISO-8859-7 to Unicode. ISO-8859-7 is the character set for the Latin/Greek alphabet.

iso_8859_8

const string: iso_8859_8

Conversion table from ISO-8859-8 to Unicode. ISO-8859-8 is the character set for the Latin/Hebrew alphabet.

iso_8859_9

const string: iso_8859_9

Conversion table from ISO-8859-9 (Latin-5) to Unicode. ISO-8859-9 is the character set to cover the Turkish language.

iso_8859_10

const string: iso_8859_10

Conversion table from ISO-8859-10 (Latin-6) to Unicode. ISO-8859-10 is the character set to cover the Nordic languages.

iso_8859_11

const string: iso_8859_11

Conversion table from ISO-8859-11 to Unicode. ISO-8859-11 is the character set for the Latin/Thai alphabet.

iso_8859_13

const string: iso_8859_13

Conversion table from ISO-8859-13 (Latin-7) to Unicode. ISO-8859-13 is the character set to cover the Baltic languages.

iso_8859_14

const string: iso_8859_14

Conversion table from ISO-8859-14 (Latin-8) to Unicode. ISO-8859-14 is the character set to cover the Celtic languages. It can be used for encoding Irish, Manx, Scottish Gaelic, Welsh, Cornish and Breton.

iso_8859_15

const string: iso_8859_15

Conversion table from ISO-8859-15 (Latin-9) to Unicode. ISO-8859-15 is the character set for Western European languages. It can be used for encoding Afrikaans, Albanian, Breton, Catalan, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Kurdish, Latin, Luxembourgish, Malay, Norwegian, Occitan, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Scots, Spanish, Swahili, Swedish, Tagalog and Walloon.

iso_8859_16

const string: iso_8859_16

Conversion table from ISO-8859-16 (Latin-10) to Unicode. ISO-8859-16 is the character set for South-Eastern European languages. It can be used for encoding Albanian, Croatian, Hungarian, Polish, Romanian, Serbian and Slovenian, but also French, German, Italian and Irish Gaelic.

mac_os_roman

const string: mac_os_roman

Conversion table from Mac OS Roman encoding to Unicode. Mac OS Roman is the character set used by the classic Mac OS for Western European languages.

koi8_r

const string: koi8_r

Conversion table from KOI8-R encoding to Unicode. KOI8-R is an encoding used for Russian and Bulgarian.
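Because KOI8-R and code page 1251 both encode Russian but place the letters at different byte values, choosing the wrong table silently yields wrong text. The minimal Seed7 sketch below, assuming the library is included as "charsets.s7i", converts the byte 193 with koi8_r and with cp_1251 and prints both code points.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # assumed file name of this library

const proc: main is func
  local
    # One raw byte that stands for different letters in different Cyrillic encodings.
    var string: asKoi8r is "\193;";
    var string: asCp1251 is "\193;";
  begin
    conv2unicode(asKoi8r, koi8_r);
    conv2unicode(asCp1251, cp_1251);
    writeln("koi8_r:  " & str(ord(asKoi8r[1])));
    writeln("cp_1251: " & str(ord(asCp1251[1])));
  end func;
```

Under the standard mappings, koi8_r yields 1072 (Cyrillic small letter a) and cp_1251 yields 1041 (Cyrillic capital letter be).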

koi8_u

const string: koi8_u

Conversion table from KOI8-U encoding to Unicode. KOI8-U is an encoding used for Ukrainian and Belarusian.

mik

const string: mik

Conversion table from MIK encoding to Unicode. MIK is an encoding used for the Bulgarian language.

tis_620

const string: tis_620

Conversion table from TIS-620 encoding to Unicode. TIS-620 is the Thai Industrial Standard encoding for the Thai language.

armscii_8

const string: armscii_8

Conversion table from ArmSCII-8 encoding to Unicode. ArmSCII-8 is an encoding for the Armenian alphabet.

geostd8

const string: geostd8

Conversion table from GEOSTD8 encoding to Unicode. GEOSTD8 is an encoding for the Georgian language.

jis_x_0201

const string: jis_x_0201

Conversion table from JIS X 0201 encoding to Unicode. JIS X 0201 is a Japanese Industrial Standard that combines a variant of ASCII, in which the yen sign and the overline replace backslash and tilde, with half-width katakana.
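The following minimal Seed7 sketch, assuming the library is included as "charsets.s7i", converts the two ASCII positions that JIS X 0201 redefines plus one byte from the half-width katakana range, and prints the resulting code points.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # assumed file name of this library

const proc: main is func
  local
    # Byte 92 (backslash in ASCII), byte 126 (tilde in ASCII), byte 177 (katakana range).
    var string: stri is "\92;\126;\177;";
    var integer: index is 0;
  begin
    conv2unicode(stri, jis_x_0201);
    # Print the resulting Unicode code points separated by spaces.
    for index range 1 to length(stri) do
      write(str(ord(stri[index])) & " ");
    end for;
    writeln;
  end func;
```

With the standard JIS X 0201 assignments, the output is 165 (yen sign), 8254 (overline) and 65393 (HALFWIDTH KATAKANA LETTER A).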

viscii

const string: viscii

Conversion table from VISCII encoding to Unicode. VISCII is the Vietnamese Standard Code for Information Interchange.

ns_4551_1

const string: ns_4551_1

Conversion table from NS 4551-1 encoding to Unicode. NS 4551 version 1 is the national variant of ISO 646 for Norway.
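As a national ISO 646 variant, NS 4551-1 reuses some ASCII punctuation positions for Norwegian letters. The minimal Seed7 sketch below, assuming the library is included as "charsets.s7i", converts the bytes that hold [, \ and ] in ASCII.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # assumed file name of this library

const proc: main is func
  local
    # Bytes 91, 92 and 93: '[', '\' and ']' in plain ASCII.
    var string: stri is "\91;\92;\93;";
    var integer: index is 0;
  begin
    conv2unicode(stri, ns_4551_1);
    # Print the resulting Unicode code points separated by spaces.
    for index range 1 to length(stri) do
      write(str(ord(stri[index])) & " ");
    end for;
    writeln;
  end func;
```

With the standard NS 4551-1 assignments these bytes become Æ (198), Ø (216) and Å (197).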

cp_037

const string: cp_037

Conversion table from code page 37 to Unicode. Code page 37 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Australia, Brazil, Canada, New Zealand, Portugal, South Africa and the USA.
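EBCDIC places letters at entirely different byte values than ASCII. This minimal Seed7 sketch, assuming the library is included as "charsets.s7i", decodes the word "Hello" from its code page 37 bytes.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # assumed file name of this library

const proc: main is func
  local
    # "Hello" in EBCDIC code page 37: bytes 16#C8, 16#85, 16#93, 16#93, 16#96.
    var string: stri is "\200;\133;\147;\147;\150;";
  begin
    conv2unicode(stri, cp_037);
    writeln(stri);
  end func;
```

The program prints Hello.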

cp_273

const string: cp_273

Conversion table from code page 273 to Unicode. Code page 273 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Austria and Germany.

cp_277

const string: cp_277

Conversion table from code page 277 to Unicode. Code page 277 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Denmark and Norway.

cp_280

const string: cp_280

Conversion table from code page 280 to Unicode. Code page 280 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Italy.

cp_285

const string: cp_285

Conversion table from code page 285 to Unicode. Code page 285 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Ireland and the United Kingdom.

cp_297

const string: cp_297

Conversion table from code page 297 to Unicode. Code page 297 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in France.

cp_500

const string: cp_500

Conversion table from code page 500 to Unicode. Code page 500 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is intended for international use.

cp_1047

const string: cp_1047

Conversion table from code page 1047 to Unicode. Code page 1047 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used for Open Systems.

Function Detail

conv2unicode

const proc: conv2unicode (inout string: stri, in string: codePage)

Convert a string with bytes from a code page encoding to UTF-32. On entry, stri is assumed to be a string of bytes encoded with the code page given by codePage, which is typically one of the conversion tables listed above (for example cp_437). On return, stri contains a UTF-32 Unicode string.
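A minimal Seed7 sketch of a typical call, assuming the library is included as "charsets.s7i": a byte string as it might be read from an MS-DOS text file is converted in place with cp_437, and the resulting code points are printed.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # assumed file name of this library

const proc: main is func
  local
    # Raw bytes: "Gr", byte 129, "n".
    var string: stri is "Gr\129;n";
    var integer: index is 0;
  begin
    # Convert in place: each byte is replaced by its Unicode character.
    conv2unicode(stri, cp_437);
    for index range 1 to length(stri) do
      write(str(ord(stri[index])) & " ");
    end for;
    writeln;
  end func;
```

Since byte 129 is ü in code page 437, the program prints 71 114 252 110. Because stri is an inout parameter, the original bytes are overwritten; keep a copy if you still need them.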

conv2unicodeByName

const proc: conv2unicodeByName (inout string: stri, in var string: charset)

Convert a string from a charset encoding to UTF-32. On entry, stri is assumed to be a string of bytes encoded with the specified charset. On return, stri contains a UTF-32 Unicode string. The charset is specified by its IANA/MIME charset name, which makes the function suitable for converting encoded data from Internet protocols such as NNTP.

Raises: RANGE_ERROR if the charset is unknown.
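Because conv2unicodeByName raises RANGE_ERROR for unknown names, callers that take the charset name from external data (such as a message header) will usually want to catch it. The minimal Seed7 sketch below assumes the library is included as "charsets.s7i" and that "KOI8-R" is among the names the implementation recognizes; the helper convertAndShow is hypothetical.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # assumed file name of this library

# Hypothetical helper: convert raw bytes by charset name and print the
# code points, or report that the charset name is not recognized.
const proc: convertAndShow (in string: rawBytes, in string: charset) is func
  local
    var string: stri is "";
    var integer: index is 0;
  begin
    stri := rawBytes;
    block
      conv2unicodeByName(stri, charset);
      write(charset & ":");
      for index range 1 to length(stri) do
        write(" " & str(ord(stri[index])));
      end for;
      writeln;
    exception
      catch RANGE_ERROR:
        writeln(charset & ": unknown charset");
    end block;
  end func;

const proc: main is func
  begin
    convertAndShow("\193;", "KOI8-R");
    convertAndShow("\193;", "x-unknown-charset");
  end func;
```

The second call uses a private "x-" name that is not expected to be registered, so it should take the RANGE_ERROR branch.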