path: root/libjava/classpath/doc/unicode/ReadMe-2.1.1.txt
Diffstat (limited to 'libjava/classpath/doc/unicode/ReadMe-2.1.1.txt')
-rw-r--r-- libjava/classpath/doc/unicode/ReadMe-2.1.1.txt | 344
1 files changed, 0 insertions, 344 deletions
diff --git a/libjava/classpath/doc/unicode/ReadMe-2.1.1.txt b/libjava/classpath/doc/unicode/ReadMe-2.1.1.txt
deleted file mode 100644
index 506f155a762..00000000000
--- a/libjava/classpath/doc/unicode/ReadMe-2.1.1.txt
+++ /dev/null
@@ -1,344 +0,0 @@
-
-UNICODE 2.1 CHARACTER DATABASE
-
-Copyright (c) 1991-1998 Unicode, Inc.
-All Rights reserved.
-
-DISCLAIMER
-
-The Unicode Character Database "UNIDAT21.TXT" is provided as-is by
-Unicode, Inc. (The Unicode Consortium). No claims are made as to fitness for any
-particular purpose. No warranties of any kind are expressed or implied. The
-recipient agrees to determine applicability of information provided. If this
-file has been purchased on magnetic or optical media from Unicode, Inc.,
-the sole remedy for any claim will be exchange of defective media within
-90 days of receipt.
-
-This disclaimer is applicable for all other data files accompanying the
-Unicode Character Database, some of which have been compiled by the
-Unicode Consortium, and some of which have been supplied by other vendors.
-
-LIMITATIONS ON RIGHTS TO REDISTRIBUTE THIS DATA
-
-Recipient is granted the right to make copies in any form for internal
-distribution and to freely use the information supplied in the creation of
-products supporting the Unicode (TM) Standard. This file can be redistributed
-to third parties or other organizations (whether for profit or not) as long
-as this notice and the disclaimer notice are retained.
-
-EXPLANATORY INFORMATION
-
-The Unicode Character Database defines the default Unicode character
-properties, and internal mappings. Particular implementations may choose to
-override the properties and mappings that are not normative. If that is done,
-it is up to the implementer to establish a protocol to convey that
-information. For more information about character properties and mappings,
-see "The Unicode Standard, Worldwide Character Encoding, Version 2.0",
-published by Addison-Wesley. For information about other data files
-accompanying the Unicode Character Database, see the section of the
-Unicode Standard they were extracted from, or the explanatory readme
-files and/or header sections with those files.
-
-The Unicode Character Database has been updated to reflect Version 2.1
-of the Unicode Standard, with two additional characters added to those
-published in Version 2.0:
-
- U+20AC EURO SIGN
- U+FFFC OBJECT REPLACEMENT CHARACTER
-
-A number of corrections have also been made to case mappings or other
-errors in the database noted since the publication of Version 2.0. And
-a few normative bidirectional properties have been modified to reflect
-decisions of the Unicode Technical Committee.
-
-The Unicode Character Database is a plain ASCII text file consisting of lines
-containing fields terminated by semicolons. Each line represents the data for
-one encoded character in the Unicode Standard, Version 2.1. Every encoded
-character has a data entry, with the exception of certain special ranges, as
-detailed below.
-
-There are five special ranges of characters that are represented only by
-their start and end characters, since the properties in the file are uniform,
-except for code values (which are all sequential and assigned). The names of CJK
-ideograph characters and Hangul syllable characters are algorithmically
-derivable. (See the Unicode Standard for more information). Surrogate
-characters and private use characters have no names.
-
-The exact ranges represented by start and end characters are:
-
- The CJK Ideographs Area (U+4E00 - U+9FFF)
- The Hangul Syllables Area (U+AC00 - U+D7A3)
- The Surrogates Area (U+D800 - U+DFFF)
- The Private Use Area (U+E000 - U+F8FF)
- CJK Compatibility Ideographs (U+F900 - U+FAFF)
-
-The following table describes the format and meaning of each field in a
-data entry in the Unicode Character Database. Fields which contain
-normative information are so indicated.
-
-Field Explanation
------ -----------
-
- 0 Code value in 4-digit hexadecimal format.
- This field is normative.
-
- 1 Unicode 2.1 Character Name. These names match exactly the
- names published in Chapter 7 of the Unicode Standard, Version
- 2.0, except for the two additional characters.
- This field is normative.
-
- 2 General Category. This is a useful breakdown into various "character
- types" which can be used as a default categorization in implementations.
- Some of the values are normative, and some are informative.
- See below for a brief explanation.
-
- 3 Canonical Combining Classes. The classes used for the
- Canonical Ordering Algorithm in the Unicode Standard. These
- classes are also printed in Chapter 4 of the Unicode Standard.
- This field is normative. See below for a brief explanation.
-
- 4 Bidirectional Category. See the list below for an explanation of the
- abbreviations used in this field. These are the categories required
- by the Bidirectional Behavior Algorithm in the Unicode Standard.
- These categories are summarized in Chapter 4 of the Unicode Standard.
- This field is normative.
-
- 5 Character Decomposition. In the Unicode Standard, not all of
- the decompositions are full decompositions. Recursive
- application of look-up for decompositions will, in all cases, lead to
- a maximal decomposition. The decompositions match exactly the
- decompositions published with the character names in Chapter 7
- of the Unicode Standard. This field is normative.
-
- 6 Decimal digit value. This is a numeric field. If the character
- has the decimal digit property, as specified in Chapter 4 of
- the Unicode Standard, the value of that digit is represented
- with an integer value in this field. This field is normative.
-
- 7 Digit value. This is a numeric field. If the character represents a
- digit, not necessarily a decimal digit, the value is here. This
- covers digits which do not form decimal radix forms, such as the
- compatibility superscript digits. This field is informative.
-
- 8 Numeric value. This is a numeric field. If the character has the
- numeric property, as specified in Chapter 4 of the Unicode
- Standard, the value of that character is represented with an
- integer or rational number in this field. This includes fractions as,
- e.g., "1/5" for U+2155 VULGAR FRACTION ONE FIFTH.
- Also included are numerical values for compatibility characters
- such as circled numbers. This field is normative.
-
- 9 If the character has been identified as a "mirrored" character in
- bidirectional text, this field has the value "Y"; otherwise "N".
- The list of mirrored characters is also printed in Chapter 4 of
- the Unicode Standard. This field is normative.
-
- 10 Unicode 1.0 Name. This is the old name as published in Unicode 1.0.
- This name is only provided when it is significantly different from
- the Unicode 2.1 name for the character. This field is informative.
-
- 11 10646 Comment field. This field is informative.
-
- 12 Upper case equivalent mapping. If a character is part of an
- alphabet with case distinctions, and has an upper case equivalent,
- then the upper case equivalent is in this field. See the explanation
- below on case distinctions. These mappings are always one-to-one,
- not one-to-many or many-to-one. This field is informative.
-
- 13 Lower case equivalent mapping. Similar to 12. This field is informative.
-
- 14 Title case equivalent mapping. Similar to 12. This field is informative.
-
-GENERAL CATEGORY
-
-The values in this field are abbreviations for the following. Some of the
-values are normative, and some are informative. For more information, see
-the Unicode Standard. Note: the standard does not assign information to
-control characters (except for TAB in the Bidirectional Algorithm).
-Implementations will generally also assign categories to certain control
-characters, notably CR and LF, according to platform conventions.
-
-
-Normative
- Mn = Mark, Non-Spacing
- Mc = Mark, Spacing Combining
- Me = Mark, Enclosing
-
- Nd = Number, Decimal Digit
- Nl = Number, Letter
- No = Number, Other
-
- Zs = Separator, Space
- Zl = Separator, Line
- Zp = Separator, Paragraph
-
- Cc = Other, Control
- Cf = Other, Format
- Cs = Other, Surrogate
- Co = Other, Private Use
- Cn = Other, Not Assigned
-
-Informative
- Lu = Letter, Uppercase
- Ll = Letter, Lowercase
- Lt = Letter, Titlecase
- Lm = Letter, Modifier
- Lo = Letter, Other
-
- Pc = Punctuation, Connector
- Pd = Punctuation, Dash
- Ps = Punctuation, Open
- Pe = Punctuation, Close
- Po = Punctuation, Other
-
- Sm = Symbol, Math
- Sc = Symbol, Currency
- Sk = Symbol, Modifier
- So = Symbol, Other
-
-BIDIRECTIONAL PROPERTIES
-
-Please refer to the Unicode Standard for an explanation of the algorithm for
-Bidirectional Behavior and an explanation of the significance of these categories.
-These values are normative.
-
-Strong types:
- L Left-Right; Most alphabetic, syllabic, and logographic
- characters (e.g., CJK ideographs)
- R Right-Left; Arabic, Hebrew, and
- punctuation specific to those scripts
-Weak types:
- EN European Number
- ES European Number Separator
- ET European Number Terminator
- AN Arabic Number
- CS Common Number Separator
-
-Separators:
- B Block Separator
- S Segment Separator
-
-Neutrals:
- WS Whitespace
- ON Other Neutrals ; All other characters: punctuation, symbols
-
-CHARACTER DECOMPOSITION TAGS
-
-The decomposition is a normative property of a character. The tags supplied
-with certain decompositions generally indicate formatting information.
-Where no such tag is given, the decomposition is designated as canonical.
-Conversely, the presence of a formatting tag also indicates
-that the decomposition is a compatibility decomposition and not a canonical
-decomposition. In the absence of other formatting information in a
-compatibility decomposition, the tag <compat> is used to distinguish it from
-canonical decompositions.
-
-In some instances a canonical decomposition or a compatibility decomposition
-may consist of a single character. For a canonical decomposition, this
-indicates that the character is a canonical equivalent of another single
-character. For a compatibility decomposition, this indicates that the
-character is a compatibility equivalent of another single character.
-
-The compatibility formatting tags used are:
-
- <font> A font variant (e.g. a blackletter form).
- <noBreak> A no-break version of a space or hyphen.
- <initial> An initial presentation form (Arabic).
- <medial> A medial presentation form (Arabic).
- <final> A final presentation form (Arabic).
- <isolated> An isolated presentation form (Arabic).
- <circle> An encircled form.
- <super> A superscript form.
- <sub> A subscript form.
- <vertical> A vertical layout presentation form.
- <wide> A wide (or zenkaku) compatibility character.
- <narrow> A narrow (or hankaku) compatibility character.
- <small> A small variant form (CNS compatibility).
- <square> A CJK squared font variant.
- <fraction> A vulgar fraction form.
- <compat> Otherwise unspecified compatibility character.
-
-CANONICAL COMBINING CLASSES
-
- 0: Spacing, enclosing, reordrant, and surrounding
- 1: Overlays and interior
- 6: Tibetan subjoined Letters
- 7: Nuktas
- 8: Hiragana/Katakana voiced marks
- 9: Viramas
- 10: Start of fixed position classes
-199: End of fixed position classes
-200: Below left attached
-202: Below attached
-204: Below right attached
-208: Left attached (reordrant around single base character)
-210: Right attached
-212: Above left attached
-214: Above attached
-216: Above right attached
-218: Below left
-220: Below
-222: Below right
-224: Left (reordrant around single base character)
-226: Right
-228: Above left
-230: Above
-232: Above right
-234: Double above
-
-Note: some of the combining classes in this list do not currently have
-members but are specified here for completeness.
-
-CASE MAPPINGS
-
-In addition to uppercase and lowercase, because of the inclusion of certain
-composite characters for compatibility, such as "01F1;LATIN CAPITAL LETTER
-DZ", there is a third case, called titlecase, which is used where the first
-character of a word is to be capitalized (e.g. UPPERCASE, Titlecase,
-lowercase). An example of such a character is "01F2;LATIN CAPITAL LETTER D
-WITH SMALL LETTER Z".
-
-The uppercase, titlecase and lowercase fields are only included for characters
-that have a single corresponding character of that type. Composite characters
-(such as "339D;SQUARE CM") that do not have a single corresponding character
-of that type can be cased by decomposition.
-
-The case mapping is an informative, default mapping. Certain languages, such
-as Turkish, German, French, or Greek may have small deviations from the
-default mappings listed in the Unicode Character Database.
-
-MODIFICATION HISTORY
-
-Modifications made in updating the Unicode Character Database for
-the Unicode Standard, Version 2.1 (from Version 2.0) are:
-* Added two characters (U+20AC and U+FFFC).
-* Amended bidi properties for U+0026, U+002E, U+0040, U+2007.
-* Corrected case mappings for U+018E, U+019F, U+01DD, U+0258, U+0275,
- U+03C2, U+1E9B.
-* Changed combining order class for U+0F71.
-* Corrected canonical decompositions for U+0F73, U+1FBE.
-* Changed decomposition for U+FB1F from compatibility to canonical.
-* Added compatibility decompositions for U+FBE8, U+FBE9, U+FBF9..U+FBFB.
-* Corrected compatibility decompositions for U+2469, U+246A, U+3358.
-
-
-Some of the modifications made in updating the Unicode Character Database
-for the Unicode Standard, Version 2.0 are:
-* Fixed decompositions with TONOS to use correct NSM: 030D.
-* Removed old Hangul Syllables; mapping to new characters are
- in a separate table.
-* Marked compatibility decompositions with additional tags.
-* Changed old tag names for clarity.
-* Revision of decompositions to use first-level decomposition, instead
- of maximal decomposition.
-* Correction of all known errors in decompositions from earlier versions.
-* Added control code names (as old Unicode names).
-* Added Hangul Jamo decompositions.
-* Added Number category to match properties list in book.
-* Fixed categories of Koranic Arabic marks.
-* Fixed categories of precomposed characters to match decomposition where possible.
-* Added Hebrew cantillation marks and the Tibetan script.
-* Added place holders for ranges such as CJK Ideographic Area and the
- Private Use Area.
-* Added categories Me, Sk, Pc, Nl, Cs, Cf, and rectified a number of mistakes in the
- database.
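
The field table in the removed ReadMe (fields 0 through 14) can be read with
a few lines of code. The following is a minimal sketch in Python, not part of
the removed file or of Classpath. It assumes each data line holds exactly 15
semicolon-separated fields in the order the table gives. The names Entry,
parse_entry and SAMPLE are hypothetical, and the sample line is an
illustration built from the ReadMe's own example (the value "1/5" for U+2155),
not a quotation from the data file.

```python
from fractions import Fraction
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """One data line, with fields named after the ReadMe's field table."""
    code: int                        # field 0, hexadecimal code value
    name: str                        # field 1
    category: str                    # field 2, e.g. "Lu", "Nd", "No"
    combining_class: int             # field 3, decimal number
    bidi: str                        # field 4, e.g. "L", "R", "ON"
    decomp_tag: Optional[str]        # field 5, tag such as "<compat>" if any
    decomposition: Tuple[int, ...]   # field 5, code values after the tag
    decimal: Optional[int]           # field 6
    digit: Optional[int]             # field 7
    numeric: Optional[Fraction]      # field 8, integer or rational
    mirrored: bool                   # field 9, "Y" or "N"
    old_name: str                    # field 10, Unicode 1.0 name
    comment: str                     # field 11, 10646 comment
    upper: Optional[int]             # field 12
    lower: Optional[int]             # field 13
    title: Optional[int]             # field 14


def _hex(text: str) -> Optional[int]:
    """Convert a hexadecimal field to int; an empty field becomes None."""
    return int(text, 16) if text else None


def parse_entry(line: str) -> Entry:
    """Split one data line into its 15 fields (assumed layout, see above)."""
    fields = line.rstrip("\r\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")

    # A leading <tag> marks a compatibility decomposition; no tag means
    # the decomposition is canonical.
    parts = fields[5].split()
    tag = None
    if parts and parts[0].startswith("<"):
        tag, parts = parts[0], parts[1:]

    return Entry(
        code=int(fields[0], 16),
        name=fields[1],
        category=fields[2],
        combining_class=int(fields[3]),
        bidi=fields[4],
        decomp_tag=tag,
        decomposition=tuple(int(p, 16) for p in parts),
        decimal=int(fields[6]) if fields[6] else None,
        digit=int(fields[7]) if fields[7] else None,
        numeric=Fraction(fields[8]) if fields[8] else None,
        mirrored=fields[9] == "Y",
        old_name=fields[10],
        comment=fields[11],
        upper=_hex(fields[12]),
        lower=_hex(fields[13]),
        title=_hex(fields[14]),
    )


# Illustrative line only (assumed, not copied from the data file).
SAMPLE = ("2155;VULGAR FRACTION ONE FIFTH;No;0;ON;"
          "<fraction> 0031 2044 0035;;;1/5;N;FRACTION ONE FIFTH;;;;")

entry = parse_entry(SAMPLE)
print(f"U+{entry.code:04X}", entry.name, entry.numeric, entry.decomp_tag)
```

The five ranges that the data file gives only by their start and end
characters would need separate handling, which this sketch does not attempt.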