Language Utilities Library 1.0.1

Delphi 5, 6, 7, and Kylix Implementation

Dieter Köhler

LICENSE

The contents of the Extended Document Object Model files are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at "http://www.mozilla.org/MPL/"

Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.

The Original Code is "LangUtils.pas".

The Initial Developer of the Original Code is Dieter Köhler (Heidelberg, Germany, "http://www.philo.de/"). Portions created by the Initial Developer are Copyright (C) 1999-2003 Dieter Köhler. All Rights Reserved.

Alternatively, the contents of this file may be used under the terms of the GNU General Public License Version 2 or later (the "GPL"), in which case the provisions of the GPL are applicable instead of those above. If you wish to allow use of your version of this file only under the terms of the GPL, and not to allow others to use your version of this file under the terms of the MPL, indicate your decision by deleting the provisions above and replacing them with the notice and other provisions required by the GPL. If you do not delete the provisions above, a recipient may use your version of this file under the terms of any one of the MPL or the GPL.

2004


Table of Contents

Introduction
Using the Unit
ISO 639 Constants
TIso639LanguageCode
TIso639LanguageCodeSet
ISO 639 Translation
TIso639Info
Helper Functions
IsRFC3066LanguageTag
IsSubLanguage

Introduction

The Language Utilities Library contains helper functions for ISO 639, ISO 639-2 and RFC 3066 language code processing. For further information about these standards see the relevant specifications:

  • [ISO 639] ISO 639:1988 (E/F) - Code for the representation of names of languages - The International Organization for Standardization, 1st edition, 1988-04-01, prepared by ISO/TC 37 - Terminology (principles and coordination).

  • [ISO 639-2] ISO 639-2:1998 - Codes for the representation of names of languages -- Part 2: Alpha-3 code - edition 1, 1998-11-01, 66 pages, prepared by a Joint Working Group of ISO TC46/SC4 and ISO TC37/SC2.

  • [RFC 3066] Alvestrand, H.: "Tags for the Identification of Languages", RFC 3066, January 2001, see <http://www.ietf.org/rfc/rfc3066.txt>.

The latest version of this software is available at <http://www.philo.de/xml/>.

Using the Unit

The Language Utilities Library does not contain any components that need to be registered. To use it in your own projects, add "LangUtils" to the uses clause of your unit and make sure that the directory containing the LangUtils.pas file is included in Delphi's list of library paths. To add it, go to the Library section of Delphi's Environment Options dialog (see the menu item: "Tools/Environment Options ...").
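
A minimal sketch of a console program that pulls in the unit is shown here. The program name is hypothetical; the only assumption about the library is that LangUtils.pas is reachable via the library path and exports the IsSubLanguage function described later in this document.

```pascal
program LangUtilsDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils,
  LangUtils; // LangUtils.pas must be reachable via the library path

begin
  // IsSubLanguage is one of the helper functions of the unit.
  if IsSubLanguage('en-US', 'en') then
    WriteLn('LangUtils is available.');
end.
```

If the compiler reports that LangUtils cannot be found, the library path is the first thing to check.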

ISO 639 Constants

ISO 639 defines a set of two-character codes for the representation of names of languages. The library represents each of these codes by a corresponding TIso639LanguageCode constant.

TIso639LanguageCode

Defines the following constants for the representation of names of languages using the codes of ISO 639:


iso639_aa, // Afar
iso639_ab, // Abkhazian
iso639_af, // Afrikaans
iso639_am, // Amharic
iso639_ar, // Arabic
iso639_as, // Assamese
iso639_ay, // Aymara
iso639_az, // Azerbaijani

iso639_ba, // Bashkir
iso639_be, // Byelorussian
iso639_bg, // Bulgarian
iso639_bh, // Bihari
iso639_bi, // Bislama
iso639_bn, // Bengali; Bangla
iso639_bo, // Tibetan
iso639_br, // Breton

iso639_ca, // Catalan
iso639_co, // Corsican
iso639_cs, // Czech
iso639_cy, // Welsh

iso639_da, // Danish
iso639_de, // German
iso639_dz, // Bhutani

iso639_el, // Greek
iso639_en, // English
iso639_eo, // Esperanto
iso639_es, // Spanish
iso639_et, // Estonian
iso639_eu, // Basque

iso639_fa, // Persian
iso639_fi, // Finnish
iso639_fj, // Fiji
iso639_fo, // Faeroese
iso639_fr, // French
iso639_fy, // Frisian

iso639_ga, // Irish
iso639_gd, // Scots Gaelic
iso639_gl, // Galician
iso639_gn, // Guarani
iso639_gu, // Gujarati

iso639_ha, // Hausa
iso639_hi, // Hindi
iso639_hr, // Croatian
iso639_hu, // Hungarian
iso639_hy, // Armenian

iso639_ia, // Interlingua
iso639_ie, // Interlingue
iso639_ik, // Inupiak
iso639_in, // Indonesian
iso639_is, // Icelandic
iso639_it, // Italian
iso639_iw, // Hebrew

iso639_ja, // Japanese
iso639_ji, // Yiddish
iso639_jw, // Javanese

iso639_ka, // Georgian
iso639_kk, // Kazakh
iso639_kl, // Greenlandic
iso639_km, // Cambodian
iso639_kn, // Kannada
iso639_ko, // Korean
iso639_ks, // Kashmiri
iso639_ku, // Kurdish
iso639_ky, // Kirghiz

iso639_la, // Latin
iso639_ln, // Lingala
iso639_lo, // Laothian
iso639_lt, // Lithuanian
iso639_lv, // Latvian, Lettish

iso639_mg, // Malagasy
iso639_mi, // Maori
iso639_mk, // Macedonian
iso639_ml, // Malayalam
iso639_mn, // Mongolian
iso639_mo, // Moldavian
iso639_mr, // Marathi
iso639_ms, // Malay
iso639_mt, // Maltese
iso639_my, // Burmese

iso639_na, // Nauru
iso639_ne, // Nepali
iso639_nl, // Dutch
iso639_no, // Norwegian

iso639_oc, // Occitan
iso639_om, // (Afan) Oromo
iso639_or, // Oriya

iso639_pa, // Punjabi
iso639_pl, // Polish
iso639_ps, // Pashto, Pushto
iso639_pt, // Portuguese

iso639_qu, // Quechua

iso639_rm, // Rhaeto-Romance
iso639_rn, // Kirundi
iso639_ro, // Romanian
iso639_ru, // Russian
iso639_rw, // Kinyarwanda

iso639_sa, // Sanskrit
iso639_sd, // Sindhi
iso639_sg, // Sangro
iso639_sh, // Serbo-Croatian
iso639_si, // Singhalese
iso639_sk, // Slovak
iso639_sl, // Slovenian
iso639_sm, // Samoan
iso639_sn, // Shona
iso639_so, // Somali
iso639_sq, // Albanian
iso639_sr, // Serbian
iso639_ss, // Siswati
iso639_st, // Sesotho
iso639_su, // Sundanese
iso639_sv, // Swedish
iso639_sw, // Swahili

iso639_ta, // Tamil
iso639_te, // Telugu
iso639_tg, // Tajik
iso639_th, // Thai
iso639_ti, // Tigrinya
iso639_tk, // Turkmen
iso639_tl, // Tagalog
iso639_tn, // Setswana
iso639_to, // Tonga
iso639_tr, // Turkish
iso639_ts, // Tsonga
iso639_tt, // Tatar
iso639_tw, // Twi

iso639_uk, // Ukrainian
iso639_ur, // Urdu
iso639_uz, // Uzbek

iso639_vi, // Vietnamese
iso639_vo, // Volapuk

iso639_wo, // Wolof

iso639_xh, // Xhosa

iso639_yo, // Yoruba

iso639_zh, // Chinese
iso639_zu  // Zulu
  

TIso639LanguageCodeSet

TIso639LanguageCodeSet = set of TIso639LanguageCode;

TIso639LanguageCodeSet defines a set of TIso639LanguageCode values.
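
Because TIso639LanguageCodeSet is an ordinary Delphi set type, the standard set operations apply to it. The following sketch shows this under that assumption; the program and variable names are illustrative only.

```pascal
program LanguageSetDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  LangUtils;

var
  Selected: TIso639LanguageCodeSet; // illustrative variable

begin
  // Build a set from individual language codes.
  Selected := [iso639_en, iso639_fr, iso639_es];

  // Standard set procedures work as for any other Delphi set.
  Include(Selected, iso639_it);
  Exclude(Selected, iso639_es);

  // Membership tests use the "in" operator.
  if iso639_fr in Selected then
    WriteLn('French is in the set.');
  if not (iso639_es in Selected) then
    WriteLn('Spanish is not in the set.');
end.
```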

ISO 639 Translation

TIso639Info

TIso639Info = class(TPersistent)

A TIso639Info object translates ISO 639 symbols (e.g. 'en'), codes (e.g. iso639_en), and readable WideStrings (e.g. 'English') into each other. A TIso639Info object can be assigned to a TStrings or TUtilsWideStringList object, which then contains a list of all defined ISO 639 languages, named in the language specified by the NameLanguage property. If the AppendSymbolToName property is set to True, the ISO 639 symbol is appended to each name in square brackets. The associated objects of the items contain the index of the corresponding TIso639LanguageCode.
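
The assignment described above might be used as in the following sketch. It assumes that the list class meant is a member of Delphi's TStrings family (a TStringList is used here) and that the standard Assign method is the way to perform the assignment; the program and variable names are hypothetical.

```pascal
program ListLanguagesDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, LangUtils;

var
  Info: TIso639Info;
  Names: TStringList;
  I: Integer;

begin
  Info := TIso639Info.Create;
  try
    Names := TStringList.Create;
    try
      Info.NameLanguage := iso639_en;   // English is documented as supported
      Info.AppendSymbolToName := True;  // list entries such as 'English [en]'

      // Assumption: assignment is done through the standard Assign method.
      Names.Assign(Info);

      for I := 0 to Names.Count - 1 do
        WriteLn(Names[I]);
    finally
      Names.Free;
    end;
  finally
    Info.Free;
  end;
end.
```

A list filled this way is a convenient source for a combo box or list box that lets the user pick a language.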

Properties

AppendSymbolToName (Boolean): If set to True, the CodeToName function appends the ISO 639 symbol in square brackets to the returned WideString (e.g. 'English [en]'). The default value is False.

NameLanguage (TIso639LanguageCode): Specifies the language used by the CodeToName and NameToCode functions.

Exceptions on Setting

  • EConvertError: Raised if the specified language is not supported (see the description of the SupportedLanguages property).

SupportedLanguages (TIso639LanguageCodeSet) (readonly): The set of languages supported by the CodeToName and NameToCode functions. Currently the only supported language is English (iso639_en). Derived classes may add support for further languages.
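
The interplay of the three properties can be sketched as follows. The program name is hypothetical; the sketch only relies on the behavior described above, namely that setting NameLanguage to an unsupported language raises EConvertError.

```pascal
program NameLanguageDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, LangUtils;

var
  Info: TIso639Info;

begin
  Info := TIso639Info.Create;
  try
    // Check SupportedLanguages first to avoid the exception altogether.
    if iso639_de in Info.SupportedLanguages then
      Info.NameLanguage := iso639_de
    else
      Info.NameLanguage := iso639_en;

    // Alternatively, set the property directly and handle the exception.
    try
      Info.NameLanguage := iso639_fr;
    except
      on E: EConvertError do
        WriteLn('French names are not supported: ', E.Message);
    end;
  finally
    Info.Free;
  end;
end.
```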

Public Methods

constructor Create;

Creates a new TIso639Info object.

function CodeToName(const Value: TIso639LanguageCode): WideString; virtual;

Translates the specified TIso639LanguageCode into a readable WideString. The language of the returned WideString is determined by the NameLanguage property. If the AppendSymbolToName flag is set to True, the ISO 639 symbol is appended in square brackets.

Example: If NameLanguage is set to iso639_en and AppendSymbolToName is True, calling CodeToName(iso639_de) returns 'German [de]'.

Parameters

  • Value: The TIso639LanguageCode to be translated.

Return Value

  • A human readable WideString containing the name of the specified language.
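
A short usage sketch of CodeToName follows; the program and variable names are hypothetical. The result is copied into a string variable before printing so that the example does not depend on how a particular Delphi version writes WideStrings to the console.

```pascal
program CodeToNameDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, LangUtils;

var
  Info: TIso639Info;
  S: string;

begin
  Info := TIso639Info.Create;
  try
    Info.NameLanguage := iso639_en;

    // With the flag set, the symbol is appended: 'German [de]'
    // according to the example in the description above.
    Info.AppendSymbolToName := True;
    S := Info.CodeToName(iso639_de);
    WriteLn(S);

    // With the flag cleared, only the language name is returned.
    Info.AppendSymbolToName := False;
    S := Info.CodeToName(iso639_de);
    WriteLn(S);
  finally
    Info.Free;
  end;
end.
```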

function CodeToSymbol(const Value: TIso639LanguageCode): WideString; virtual;

Translates the specified TIso639LanguageCode into a WideString containing the corresponding ISO 639 symbol, e.g. iso639_en is translated into 'en'.

Parameters

  • Value: The TIso639LanguageCode to be translated.

Return Value

  • A WideString containing the ISO 639 symbol of the specified language.
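
As a sketch, CodeToSymbol can be used to print the symbols of a few language codes. The program and variable names are hypothetical.

```pascal
program CodeToSymbolDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, LangUtils;

var
  Info: TIso639Info;
  S: string;

begin
  Info := TIso639Info.Create;
  try
    // iso639_en is translated into 'en', as stated in the description.
    S := Info.CodeToSymbol(iso639_en);
    WriteLn(S);

    // The same call works for every other TIso639LanguageCode value.
    S := Info.CodeToSymbol(iso639_zu);
    WriteLn(S);
  finally
    Info.Free;
  end;
end.
```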

function NameToCode(const Value: WideString): TIso639LanguageCode;

Translates the specified human readable WideString into its corresponding TIso639LanguageCode. The language of the specified WideString is determined by the NameLanguage property.

The specified WideString is evaluated in the following way: If it contains a semicolon (';'), only the text up to the semicolon is evaluated. If it contains no semicolon, but contains the character '[' and the character before it is a SPACE, then only the text up to this SPACE is evaluated. Otherwise, the whole text is evaluated.

The evaluation is case-sensitive, i.e. the first character of the specified WideString must be upper-case, the rest lower-case.

Example: If NameLanguage is set to iso639_en, calling NameToCode('German'), NameToCode('German; Deutsch'), and NameToCode('German [de]') will all return iso639_de. In contrast, calling NameToCode('german'), NameToCode('German ; Deutsch'), NameToCode('de'), NameToCode('[de]'), NameToCode('German '), or NameToCode('Deutsch; German') will raise an exception.

Parameters

  • Value: The human readable WideString to be translated.

Return Value

  • The TIso639LanguageCode of the specified language.

Exceptions

  • EConvertError: Raised if an invalid ISO 639 language name was specified.
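
The accepted and rejected inputs listed in the example above can be tried out with a sketch like the following. TryName is a hypothetical helper written for this illustration and is not part of the library.

```pascal
program NameToCodeDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, LangUtils;

var
  Info: TIso639Info;

// Hypothetical helper: translates a name and reports the outcome.
procedure TryName(const Name: string);
var
  Symbol: string;
begin
  try
    Symbol := Info.CodeToSymbol(Info.NameToCode(Name));
    WriteLn('"', Name, '" is accepted, symbol: ', Symbol);
  except
    on E: EConvertError do
      WriteLn('"', Name, '" is rejected: ', E.Message);
  end;
end;

begin
  Info := TIso639Info.Create;
  try
    Info.NameLanguage := iso639_en;

    // Documented as returning iso639_de:
    TryName('German');
    TryName('German; Deutsch');
    TryName('German [de]');

    // Documented as raising an exception:
    TryName('german');
    TryName('German ; Deutsch');
    TryName('[de]');
  finally
    Info.Free;
  end;
end.
```

Catching EConvertError is advisable whenever the name comes from user input, since any deviation in case or spacing is rejected.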

function SymbolToCode(const Value: WideString): TIso639LanguageCode;

Translates the specified symbol into its corresponding TIso639LanguageCode, e.g. 'en' into iso639_en. This function is case-sensitive, i.e. only lower-case values are accepted.

Parameters

  • Value: The ISO 639 symbol to be translated.

Return Value

  • The TIso639LanguageCode of the specified language.

Exceptions

  • EConvertError: Raised if an invalid ISO 639 symbol was specified. Because the function is case-sensitive, this includes symbols that are not entirely lower-case.
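
A sketch of SymbolToCode in use, including the case-sensitivity described above, might look like this. The program name is hypothetical.

```pascal
program SymbolToCodeDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, LangUtils;

var
  Info: TIso639Info;

begin
  Info := TIso639Info.Create;
  try
    // 'en' is translated into iso639_en, as stated in the description.
    if Info.SymbolToCode('en') = iso639_en then
      WriteLn('''en'' maps to iso639_en.');

    // Only lower-case symbols are accepted, so 'EN' is expected to fail.
    try
      Info.SymbolToCode('EN');
      WriteLn('''EN'' was accepted.');
    except
      on E: EConvertError do
        WriteLn('''EN'' was rejected: ', E.Message);
    end;
  finally
    Info.Free;
  end;
end.
```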

Helper Functions

IsRFC3066LanguageTag

function IsRFC3066LanguageTag(const S: string): Boolean;

Tests whether the specified string is a valid language tag according to [RFC 3066].

Parameters

  • S: The US-ASCII string to test.

Return Value

  • True if the specified string is a valid language tag according to RFC 3066; False otherwise.
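
The following sketch runs the function over a few sample tags. By the grammar of RFC 3066 (subtags of one to eight characters, separated by hyphens, with only letters allowed in the first subtag), 'en' and 'en-US' are well-formed while 'en_US' and the empty string are not; the sketch prints what the function actually returns instead of assuming the outcome. The program name and the sample tags are illustrative only.

```pascal
program LanguageTagDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, LangUtils;

const
  // Illustrative sample tags; the last two violate the RFC 3066 grammar.
  Tags: array[0..3] of string = ('en', 'en-US', 'en_US', '');

var
  I: Integer;

begin
  for I := Low(Tags) to High(Tags) do
    if IsRFC3066LanguageTag(Tags[I]) then
      WriteLn('"', Tags[I], '" is reported as a valid language tag.')
    else
      WriteLn('"', Tags[I], '" is reported as invalid.');
end.
```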

IsSubLanguage

function IsSubLanguage(const Sublanguage, Superlanguage: WideString): Boolean;

Tests whether the specified Sublanguage is the same as or is a sublanguage of the language specified by the Superlanguage parameter according to [RFC 3066].

Parameters

  • Sublanguage: The sublanguage to test.

  • Superlanguage: The superlanguage to test.

Return Value

  • True if Superlanguage is equal to Sublanguage ignoring case, or if there is some suffix starting with '-' such that Superlanguage is equal to Sublanguage ignoring that suffix of Sublanguage and ignoring case; False otherwise. For example, IsSubLanguage('en-US','EN') returns True.
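
The rule above can be exercised with a sketch like the following. Only the first call is taken from the documented example; the expectations noted for the other calls are derived from the stated rule and are not documented results. The program name is hypothetical.

```pascal
program SubLanguageDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, LangUtils;

begin
  // Documented example: returns True (case is ignored).
  WriteLn(IsSubLanguage('en-US', 'EN'));

  // Identical tags: the rule treats a language as a sublanguage of itself.
  WriteLn(IsSubLanguage('de', 'de'));

  // Reversed arguments: 'en' has no '-' suffix to drop, so by the rule
  // 'en' is not a sublanguage of 'en-US'.
  WriteLn(IsSubLanguage('en', 'en-US'));

  // Unrelated languages: no suffix of 'de-CH' can be removed to yield 'en'.
  WriteLn(IsSubLanguage('de-CH', 'en'));
end.
```

Note that the argument order matters: the more specific tag comes first, the more general one second.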