Here is the whole set! a character dead at every word.
—Richard Brinsley Sheridan, The School for Scandal, Act 2, scene 2
Objects of type Character represent primitive values of type char.
public final class Character {
    public static final char MIN_VALUE = '\u0000';
    public static final char MAX_VALUE = '\uffff';
    public static final int MIN_RADIX = 2;
    public static final int MAX_RADIX = 36;
    public Character(char value);
    public String toString();
    public boolean equals(Object obj);
    public int hashCode();
    public char charValue();
    public static boolean isDefined(char ch);
    public static boolean isLowerCase(char ch);
    public static boolean isUpperCase(char ch);
    public static boolean isTitleCase(char ch);
    public static boolean isDigit(char ch);
    public static boolean isLetter(char ch);
    public static boolean isLetterOrDigit(char ch);
    public static boolean isJavaLetter(char ch);
    public static boolean isJavaLetterOrDigit(char ch);
    public static boolean isSpace(char ch);
    public static char toLowerCase(char ch);
    public static char toUpperCase(char ch);
    public static char toTitleCase(char ch);
    public static int digit(char ch, int radix);
    public static char forDigit(int digit, int radix);
}
Many of the methods of class Character
are defined in terms of a "Unicode attribute table" that specifies a name for every defined Unicode character as well as other possible attributes, such as a decimal value, an uppercase equivalent, a lowercase equivalent, and/or a titlecase equivalent. Prior to Java 1.1, these methods were internal to the Java compiler and based on Unicode 1.1.5, as described here. The most recent versions of these methods should be used in Java compilers that are to run on Java systems that do not yet include these methods.
The Unicode 1.1.5 attribute table is available on the World Wide Web as:
ftp://unicode.org/pub/MappingTables/UnicodeData-1.1.5.txt
However, this file contains a few errors. The term "Unicode attribute table" in the following sections refers to the contents of this file after the following corrections have been applied:
03D0;GREEK BETA SYMBOL;Ll;0;L;;;;;N;GREEK SMALL LETTER CURLED BETA;;0392;;0392
03D1;GREEK THETA SYMBOL;Ll;0;L;;;;;N;GREEK SMALL LETTER SCRIPT THETA;;0398;;0398
03D5;GREEK PHI SYMBOL;Ll;0;L;;;;;N;GREEK SMALL LETTER SCRIPT PHI;;03A6;;03A6
03D6;GREEK PI SYMBOL;Ll;0;L;;;;;N;GREEK SMALL LETTER OMEGA PI;;03A0;;03A0
03F0;GREEK KAPPA SYMBOL;Ll;0;L;;;;;N;GREEK SMALL LETTER SCRIPT KAPPA;;039A;;039A
03F1;GREEK RHO SYMBOL;Ll;0;L;;;;;N;GREEK SMALL LETTER TAILED RHO;;03A1;;03A1
FF10;FULLWIDTH DIGIT ZERO;Nd;0;EN;0030;0;0;0;N;;;;;
FF11;FULLWIDTH DIGIT ONE;Nd;0;EN;0031;1;1;1;N;;;;;
FF12;FULLWIDTH DIGIT TWO;Nd;0;EN;0032;2;2;2;N;;;;;
FF13;FULLWIDTH DIGIT THREE;Nd;0;EN;0033;3;3;3;N;;;;;
FF14;FULLWIDTH DIGIT FOUR;Nd;0;EN;0034;4;4;4;N;;;;;
FF15;FULLWIDTH DIGIT FIVE;Nd;0;EN;0035;5;5;5;N;;;;;
FF16;FULLWIDTH DIGIT SIX;Nd;0;EN;0036;6;6;6;N;;;;;
FF17;FULLWIDTH DIGIT SEVEN;Nd;0;EN;0037;7;7;7;N;;;;;
FF18;FULLWIDTH DIGIT EIGHT;Nd;0;EN;0038;8;8;8;N;;;;;
FF19;FULLWIDTH DIGIT NINE;Nd;0;EN;0039;9;9;9;N;;;;;
03DA;GREEK LETTER STIGMA;Lu;0;L;;;;;N;GREEK CAPITAL LETTER STIGMA;;;;
03DC;GREEK LETTER DIGAMMA;Lu;0;L;;;;;N;GREEK CAPITAL LETTER DIGAMMA;;;;
03DE;GREEK LETTER KOPPA;Lu;0;L;;;;;N;GREEK CAPITAL LETTER KOPPA;;;;
03E0;GREEK LETTER SAMPI;Lu;0;L;;;;;N;GREEK CAPITAL LETTER SAMPI;;;;
03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3
It is anticipated that these problems will be corrected for Unicode version 2.0.
Java 1.1 will include the methods defined here, either based on Unicode 1.1.5 or, we hope, updated versions of the methods that use the newer Unicode 2.0. The character attribute table for Unicode 2.0 is currently available on the World Wide Web as the file:
ftp://unicode.org/pub/MappingTables/UnicodeData-2.0.12.txt
If you are implementing a Java compiler or system, please refer to the page:
which will be updated with information about the Unicode-dependent methods.
The biggest change in Unicode 2.0 is a complete rearrangement of the Korean Hangul characters. There are numerous smaller improvements as well.
It is our intention that Java will track Unicode as it evolves over time. Given that full Unicode support is just emerging in the marketplace, and that changes in Unicode are in areas which are not yet widely used, this should cause minimal problems and further Java's goal of worldwide language support.
20.5.1 public static final char MIN_VALUE = '\u0000';
The constant value of this field is the smallest value of type char.
[This field is scheduled for introduction in Java version 1.1.]
20.5.2 public static final char MAX_VALUE = '\uffff';
The constant value of this field is the largest value of type char.
[This field is scheduled for introduction in Java version 1.1.]
20.5.3 public static final int MIN_RADIX = 2;
The constant value of this field is the smallest value permitted for the radix argument in radix-conversion methods such as the digit method (§20.5.23), the forDigit method (§20.5.24), and the toString method of class Integer (§20.7).
20.5.4 public static final int MAX_RADIX = 36;
The constant value of this field is the largest value permitted for the radix argument in radix-conversion methods such as the digit method (§20.5.23), the forDigit method (§20.5.24), and the toString method of class Integer (§20.7).
20.5.5 public Character(char value)
This constructor initializes a newly created Character object so that it represents the primitive value that is the argument.
20.5.6 public String toString()
The result is a String whose length is 1 and whose sole component is the primitive char value represented by this Character object.
Overrides the toString method of Object (§20.1.2).
20.5.7 public boolean equals(Object obj)
The result is true if and only if the argument is not null and is a Character object that represents the same char value as this Character object.
Overrides the equals method of Object (§20.1.3).
20.5.8 public int hashCode()
The result is the primitive char value represented by this Character object, cast to type int.
Overrides the hashCode method of Object (§20.1.4).
20.5.9 public char charValue()
The primitive char value represented by this Character object is returned.
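The instance methods described in §20.5.6 through §20.5.9 can be checked together in a few lines. The following is a minimal sketch, not part of the specification: the class name CharacterObjectDemo and the helper wrapperBehavesAsDocumented are hypothetical, and it assumes a later Java release in which the factory method Character.valueOf is available as an alternative to the constructor of §20.5.5.

```java
final class CharacterObjectDemo {
    // Hypothetical helper: returns true when the documented properties of
    // toString, equals, hashCode and charValue all hold for the given char.
    static boolean wrapperBehavesAsDocumented(char c) {
        // Assumption: Character.valueOf exists (it was added after Java 1.0).
        Character boxed = Character.valueOf(c);
        Character same = Character.valueOf(c);
        Object notACharacter = String.valueOf(c);
        return boxed.charValue() == c                 // charValue returns the wrapped value
            && boxed.toString().length() == 1         // toString has length 1 ...
            && boxed.toString().charAt(0) == c        // ... and its sole component is the char
            && boxed.hashCode() == (int) c            // hashCode is the char cast to int
            && boxed.equals(same)                     // equal to a Character with the same value
            && !boxed.equals(null)                    // never equal to null
            && !boxed.equals(notACharacter);          // never equal to a non-Character object
    }

    public static void main(String[] args) {
        System.out.println(wrapperBehavesAsDocumented('A'));
        System.out.println(Character.valueOf('A').hashCode());
    }
}
```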
20.5.10 public static boolean isDefined(char ch)
The result is true if and only if the character argument is a defined Unicode character.
A character is a defined Unicode character if and only if at least one of the following is true:
- It has an entry in the Unicode attribute table.
- Its value is not less than \u3040 and not greater than \u9FA5.
- Its value is not less than \uF900 and not greater than \uFA2D.
It follows, then, that for Unicode 1.1.5 as corrected above, the defined Unicode characters are exactly those with codes in the following list, which contains both single codes and inclusive ranges: 0000-01F5, 01FA-0217, 0250-02A8, 02B0-02DE, 02E0-02E9, 0300-0345, 0360-0361, 0374-0375, 037A, 037E, 0384-038A, 038C, 038E-03A1, 03A3-03CE, 03D0-03D6, 03DA, 03DC, 03DE, 03E0, 03E2-03F3, 0401-040C, 040E-044F, 0451-045C, 045E-0486, 0490-04C4, 04C7-04C8, 04CB-04CC, 04D0-04EB, 04EE-04F5, 04F8-04F9, 0531-0556, 0559-055F, 0561-0587, 0589, 05B0-05B9, 05BB-05C3, 05D0-05EA, 05F0-05F4, 060C, 061B, 061F, 0621-063A, 0640-0652, 0660-066D, 0670-06B7, 06BA-06BE, 06C0-06CE, 06D0-06ED, 06F0-06F9, 0901-0903, 0905-0939, 093C-094D, 0950-0954, 0958-0970, 0981-0983, 0985-098C, 098F-0990, 0993-09A8, 09AA-09B0, 09B2, 09B6-09B9, 09BC, 09BE-09C4, 09C7-09C8, 09CB-09CD, 09D7, 09DC-09DD, 09DF-09E3, 09E6-09FA, 0A02, 0A05-0A0A, 0A0F-0A10, 0A13-0A28, 0A2A-0A30, 0A32-0A33, 0A35-0A36, 0A38-0A39, 0A3C, 0A3E-0A42, 0A47-0A48, 0A4B-0A4D, 0A59-0A5C, 0A5E, 0A66-0A74, 0A81-0A83, 0A85-0A8B, 0A8D, 0A8F-0A91, 0A93-0AA8, 0AAA-0AB0, 0AB2-0AB3, 0AB5-0AB9, 0ABC-0AC5, 0AC7-0AC9, 0ACB-0ACD, 0AD0, 0AE0, 0AE6-0AEF, 0B01-0B03, 0B05-0B0C, 0B0F-0B10, 0B13-0B28, 0B2A-0B30, 0B32-0B33, 0B36-0B39, 0B3C-0B43, 0B47-0B48, 0B4B-0B4D, 0B56-0B57, 0B5C-0B5D, 0B5F-0B61, 0B66-0B70, 0B82-0B83, 0B85-0B8A, 0B8E-0B90, 0B92-0B95, 0B99-0B9A, 0B9C, 0B9E-0B9F, 0BA3-0BA4, 0BA8-0BAA, 0BAE-0BB5, 0BB7-0BB9, 0BBE-0BC2, 0BC6-0BC8, 0BCA-0BCD, 0BD7, 0BE7-0BF2, 0C01-0C03, 0C05-0C0C, 0C0E-0C10, 0C12-0C28, 0C2A-0C33, 0C35-0C39, 0C3E-0C44, 0C46-0C48, 0C4A-0C4D, 0C55-0C56, 0C60-0C61, 0C66-0C6F, 0C82-0C83, 0C85-0C8C, 0C8E-0C90, 0C92-0CA8, 0CAA-0CB3, 0CB5-0CB9, 0CBE-0CC4, 0CC6-0CC8, 0CCA-0CCD, 0CD5-0CD6, 0CDE, 0CE0-0CE1, 0CE6-0CEF, 0D02-0D03, 0D05-0D0C, 0D0E-0D10, 0D12-0D28, 0D2A-0D39, 0D3E-0D43, 0D46-0D48, 0D4A-0D4D, 0D57, 0D60-0D61, 0D66-0D6F, 0E01-0E3A, 0E3F-0E5B, 0E81-0E82, 0E84, 0E87-0E88, 0E8A, 0E8D, 0E94-0E97, 0E99-0E9F, 0EA1-0EA3, 0EA5, 0EA7, 
0EAA-0EAB, 0EAD-0EB9, 0EBB-0EBD, 0EC0-0EC4, 0EC6,
0EC8-0ECD, 0ED0-0ED9, 0EDC-0EDD, 10A0-10C5, 10D0-10F6, 10FB, 1100-1159, 115F-11A2, 11A8-11F9, 1E00-1E9A, 1EA0-1EF9, 1F00-1F15, 1F18-1F1D, 1F20-1F45, 1F48-1F4D, 1F50-1F57, 1F59, 1F5B, 1F5D, 1F5F-1F7D, 1F80-1FB4, 1FB6-1FC4, 1FC6-1FD3, 1FD6-1FDB, 1FDD-1FEF, 1FF2-1FF4, 1FF6-1FFE, 2000-202E, 2030-2046, 206A-2070, 2074-208E, 20A0-20AA, 20D0-20E1, 2100-2138, 2153-2182, 2190-21EA, 2200-22F1, 2300, 2302-237A, 2400-2424, 2440-244A, 2460-24EA, 2500-2595, 25A0-25EF, 2600-2613, 261A-266F, 2701-2704, 2706-2709, 270C-2727, 2729-274B, 274D, 274F-2752, 2756, 2758-275E, 2761-2767, 2776-2794, 2798-27AF, 27B1-27BE, 3000-3037, 303F, 3041-3094, 3099-309E, 30A1-30FE, 3105-312C, 3131-318E, 3190-319F, 3200-321C, 3220-3243, 3260-327B, 327F-32B0, 32C0-32CB, 32D0-32FE, 3300-3376, 337B-33DD, 33E0-33FE, 3400-9FA5, F900-FA2D, FB00-FB06, FB13-FB17, FB1E-FB36, FB38-FB3C, FB3E, FB40-FB41, FB43-FB44, FB46-FBB1, FBD3-FD3F, FD50-FD8F, FD92-FDC7, FDF0-FDFB, FE20-FE23, FE30-FE44, FE49-FE52, FE54-FE66, FE68-FE6B, FE70-FE72, FE74, FE76-FEFC, FEFF, FF01-FF5E, FF61-FFBE, FFC2-FFC7, FFCA-FFCF, FFD2-FFD7, FFDA-FFDC, FFE0-FFE6, FFE8-FFEE, FFFD
.
[This method is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5.]
20.5.11 public static boolean isLowerCase(char ch)
The result is true if and only if the character argument is a lowercase character.
A character is considered to be lowercase if and only if all of the following are true:
- ch is not in the range \u2000 through \u2FFF.
It follows, then, that for Unicode 1.1.5 as corrected above, the lowercase Unicode characters are exactly those with codes in the following list, which contains both single codes and inclusive ranges: 0061-007A, 00DF-00F6, 00F8-00FF, 0101-0137 (odds only), 0138-0148 (evens only), 0149-0177 (odds only), 017A-017E (evens only), 017F-0180, 0183, 0185, 0188, 018C-018D, 0192, 0195, 0199-019B, 019E, 01A1-01A5 (odds only), 01A8, 01AB, 01AD, 01B0, 01B4, 01B6, 01B9-01BA, 01BD, 01C6, 01C9, 01CC-01DC (evens only), 01DD-01EF (odds only), 01F0, 01F3, 01F5, 01FB-0217 (odds only), 0250-0261, 0263-0269, 026B-0273, 0275, 0277-027F, 0282-028E, 0290-0293, 029A, 029D-029E, 02A0, 02A3-02A8, 0390, 03AC-03CE, 03D0-03D1, 03D5-03D6, 03E3-03EF (odds only), 03F0-03F1, 0430-044F, 0451-045C, 045E-045F, 0461-0481 (odds only), 0491-04BF (odds only), 04C2, 04C4, 04C8, 04CC, 04D1-04EB (odds only), 04EF-04F5 (odds only), 04F9, 0561-0587, 1E01-1E95 (odds only), 1E96-1E9A, 1EA1-1EF9 (odds only), 1F00-1F07, 1F10-1F15, 1F20-1F27, 1F30-1F37, 1F40-1F45, 1F50-1F57, 1F60-1F67, 1F70-1F7D, 1F80-1F87, 1F90-1F97, 1FA0-1FA7, 1FB0-1FB4, 1FB6-1FB7, 1FC2-1FC4, 1FC6-1FC7, 1FD0-1FD3, 1FD6-1FD7, 1FE0-1FE7, 1FF2-1FF4, 1FF6-1FF7, FB00-FB06, FB13-FB17, FF41-FF5A.
Of the first 128 Unicode characters, exactly 26 are considered to be lowercase:
abcdefghijklmnopqrstuvwxyz
[This specification for the method isLowerCase is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5. In previous versions of Java, this method returns false for all arguments larger than \u00FF.]
20.5.12 public static boolean isUpperCase(char ch)
The result is true if and only if the character argument is an uppercase character.
A character is considered to be uppercase if and only if all of the following are true:
- ch is not in the range \u2000 through \u2FFF.
It follows, then, that for Unicode 1.1.5 as corrected above, the uppercase Unicode characters are exactly those with codes in the following list, which contains both single codes and inclusive ranges: 0041-005A, 00C0-00D6, 00D8-00DE, 0100-0136 (evens only), 0139-0147 (odds only), 014A-0178 (evens only), 0179-017D (odds only), 0181-0182, 0184, 0186, 0187, 0189-018B, 018E-0191, 0193-0194, 0196-0198, 019C-019D, 019F-01A0, 01A2, 01A4, 01A7, 01A9, 01AC, 01AE, 01AF, 01B1-01B3, 01B5, 01B7, 01B8, 01BC, 01C4, 01C7, 01CA, 01CD-01DB (odds only), 01DE-01EE (evens only), 01F1, 01F4, 01FA-0216 (evens only), 0386, 0388-038A, 038C, 038E, 038F, 0391-03A1, 03A3-03AB, 03E2-03EE (evens only), 0401-040C, 040E-042F, 0460-0480 (evens only), 0490-04BE (evens only), 04C1, 04C3, 04C7, 04CB, 04D0-04EA (evens only), 04EE-04F4 (evens only), 04F8, 0531-0556, 10A0-10C5, 1E00-1E94 (evens only), 1EA0-1EF8 (evens only), 1F08-1F0F, 1F18-1F1D, 1F28-1F2F, 1F38-1F3F, 1F48-1F4D, 1F59-1F5F (odds only), 1F68-1F6F, 1F88-1F8F, 1F98-1F9F, 1FA8-1FAF, 1FB8-1FBC, 1FC8-1FCC, 1FD8-1FDB, 1FE8-1FEC, 1FF8-1FFC, FF21-FF3A.
Of the first 128 Unicode characters, exactly 26 are considered to be uppercase:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[This specification for the method isUpperCase is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5. In previous versions of Java, this method returns false for all arguments larger than \u00FF.]
20.5.13 public static boolean isTitleCase(char ch)
The result is true if and only if the character argument is a titlecase character.
The notion of "titlecase" was introduced into Unicode to handle a peculiar situation: there are single Unicode characters whose appearance in each case looks exactly like two ordinary Latin letters. For example, there is a single Unicode character 'LJ' (\u01C7) that looks just like the characters 'L' and 'J' put together. There is a corresponding lowercase letter 'lj' (\u01C9) as well. These characters are present in Unicode primarily to allow one-to-one translations from the Cyrillic alphabet, as used in Serbia, for example, to the Latin alphabet. Now suppose the word "LJUBINJE" (which has six characters, not eight, because two of them are the single Unicode characters 'LJ' and 'NJ', perhaps produced by one-to-one translation from the Cyrillic) is to be written as part of a book title, in capitals and lowercase. The strategy of making the first letter uppercase and the rest lowercase results in "LJubinje", which is most unfortunate. The solution is that there must be a third form, called a titlecase form. The titlecase form of 'LJ' is 'Lj' (\u01C8) and the titlecase form of 'NJ' is 'Nj'. A word for a book title is then best rendered by converting the first letter to titlecase if possible, otherwise to uppercase; the remaining letters are then converted to lowercase.
A character is considered to be titlecase if and only if both of the following are true:
- ch is not in the range \u2000 through \u2FFF.
There are exactly four Unicode 1.1.5 characters for which isTitleCase returns true:
\u01C5  LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
\u01C8  LATIN CAPITAL LETTER L WITH SMALL LETTER J
\u01CB  LATIN CAPITAL LETTER N WITH SMALL LETTER J
\u01F2  LATIN CAPITAL LETTER D WITH SMALL LETTER Z
[This method is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5.]
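The book-title rule described above (titlecase the first letter where possible, lowercase the rest) can be sketched as follows. This is only an illustration: the class TitleCaseSketch and the helper toTitleWord are hypothetical names, and the sketch relies on toTitleCase (§20.5.22) returning the uppercase form for characters that have no separate titlecase form.

```java
final class TitleCaseSketch {
    // Hypothetical helper: convert the first character to titlecase and
    // every remaining character to lowercase.
    static String toTitleWord(String word) {
        if (word.isEmpty()) {
            return word;
        }
        StringBuilder sb = new StringBuilder(word.length());
        sb.append(Character.toTitleCase(word.charAt(0)));
        for (int i = 1; i < word.length(); i++) {
            sb.append(Character.toLowerCase(word.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "LJUBINJE" written with the single characters LJ (01C7) and NJ (01CA).
        String word = "\u01C7UBI\u01CAE";
        String title = toTitleWord(word);
        System.out.println(title.length());                       // still six characters
        System.out.println(Character.isTitleCase(title.charAt(0)));
    }
}
```

Applying only toUpperCase to the first character would leave it as the uppercase digraph, which is exactly the "LJubinje" outcome the text warns about.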
20.5.14 public static boolean isDigit(char ch)
The result is true if and only if the character argument is a digit.
A character is considered to be a digit if and only if both of the following are true:
- ch is not in the range \u2000 through \u2FFF.
- Its name in the Unicode attribute table contains the word DIGIT.
The digits are those characters with the following codes:
0030-0039  ISO-Latin-1 (and ASCII) digits ('0'-'9')
0660-0669  Arabic-Indic digits
06F0-06F9  Eastern Arabic-Indic digits
0966-096F  Devanagari digits
09E6-09EF  Bengali digits
0A66-0A6F  Gurmukhi digits
0AE6-0AEF  Gujarati digits
0B66-0B6F  Oriya digits
0BE7-0BEF  Tamil digits (there are only nine of these; there is no zero digit)
0C66-0C6F  Telugu digits
0CE6-0CEF  Kannada digits
0D66-0D6F  Malayalam digits
0E50-0E59  Thai digits
0ED0-0ED9  Lao digits
FF10-FF19  Fullwidth digits
Of the first 128 Unicode characters, exactly 10 are considered to be digits:
0123456789
[This specification for the method isDigit is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5. In previous versions of Java, this method returns false for all arguments larger than \u00FF.]
20.5.15 public static boolean isLetter(char ch)
The result is true if and only if the character argument is a letter.
A character is considered to be a letter if and only if it is a letter or digit (§20.5.16) but is not a digit (§20.5.14).
[This method is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5.]
20.5.16 public static boolean isLetterOrDigit(char ch)
The result is true if and only if the character argument is a letter-or-digit.
A character is considered to be a letter-or-digit if and only if it is a defined Unicode character (§20.5.10) and its code lies in one of the following ranges:
0030-0039  ISO-Latin-1 (and ASCII) digits ('0'-'9')
0041-005A  ISO-Latin-1 (and ASCII) uppercase Latin letters ('A'-'Z')
0061-007A  ISO-Latin-1 (and ASCII) lowercase Latin letters ('a'-'z')
00C0-00D6  ISO-Latin-1 supplementary letters
00D8-00F6  ISO-Latin-1 supplementary letters
00F8-00FF  ISO-Latin-1 supplementary letters
0100-1FFF  Latin extended-A, Latin extended-B, IPA extensions, spacing modifier letters, combining diacritical marks, basic Greek, Greek symbols and Coptic, Cyrillic, Armenian, Hebrew extended-A, Basic Hebrew, Hebrew extended-B, Basic Arabic, Arabic extended, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Basic Georgian, Georgian extended, Hangul Jamo, Latin extended additional, Greek extended
3040-9FFF  Hiragana, Katakana, Bopomofo, Hangul compatibility Jamo, CJK miscellaneous, enclosed CJK characters and months, CJK compatibility, Hangul, Hangul supplementary-A, Hangul supplementary-B, CJK unified ideographs
F900-FDFF  CJK compatibility ideographs, alphabetic presentation forms, Arabic presentation forms-A
FE70-FEFE  Arabic presentation forms-B
FF10-FF19  Fullwidth digits
FF21-FF3A  Fullwidth Latin uppercase
FF41-FF5A  Fullwidth Latin lowercase
FF66-FFDC  Halfwidth Katakana and Hangul
It follows, then, that for Unicode 1.1.5 as corrected above, the Unicode letters and digits are exactly those with codes in the following list, which contains both single codes and inclusive ranges:
0030-0039, 0041-005A, 0061-007A, 00C0-00D6, 00D8-00F6, 00F8-01F5, 01FA-0217, 0250-02A8, 02B0-02DE, 02E0-02E9, 0300-0345, 0360-0361, 0374-0375, 037A, 037E, 0384-038A, 038C, 038E, 038F-03A1, 03A3-03CE, 03D0-03D6, 03DA-03E2, 03DA, 03DC, 03DE, 03E0, 03E2-03F3, 0401-040C, 040E-044F, 0451-045C, 045E-0486, 0490-04C4, 04C7-04C8, 04CB-04CC, 04D0-04EB, 04EE-04F5, 04F8-04F9, 0531-0556, 0559-055F, 0561-0587, 0589, 05B0-05B9, 05BB-05C3, 05D0-05EA, 05F0-05F4, 060C, 061B, 061F, 0621, 0622-063A, 0640-0652, 0660-066D, 0670-06B7, 06BA-06BE, 06C0-06CE, 06D0-06ED, 06F0-06F9, 0901-0903, 0905-0939, 093C-094D, 0950-0954, 0958-0970, 0981-0983, 0985-098C, 098F-0990, 0993-09A8, 09AA-09B0, 09B2, 09B6-09B9, 09BC, 09BE, 09BF-09C4, 09C7-09C8, 09CB-09CD, 09D7, 09DC-09DD, 09DF-09E3, 09E6-09FA, 0A02, 0A05-0A0A, 0A0F-0A10, 0A13-0A28, 0A2A-0A30, 0A32-0A33, 0A35-0A36, 0A38-0A39, 0A3C, 0A3E, 0A3F-0A42, 0A47-0A48, 0A4B-0A4D, 0A59-0A5C, 0A5E, 0A66-0A74, 0A81-0A83, 0A85-0A8B, 0A8D, 0A8F, 0A90-0A91, 0A93-0AA8, 0AAA-0AB0, 0AB2-0AB3, 0AB5-0AB9, 0ABC-0AC5, 0AC7-0AC9, 0ACB-0ACD, 0AD0, 0AE0, 0AE6-0AEF, 0B01-0B03, 0B05-0B0C, 0B0F-0B10, 0B13-0B28, 0B2A-0B30, 0B32-0B33, 0B36-0B39, 0B3C-0B43, 0B47-0B48, 0B4B-0B4D, 0B56-0B57, 0B5C-0B5D, 0B5F-0B61, 0B66-0B70, 0B82-0B83, 0B85-0B8A, 0B8E-0B90, 0B92-0B95, 0B99-0B9A, 0B9C, 0B9E, 0B9F, 0BA3-0BA4, 0BA8-0BAA, 0BAE-0BB5, 0BB7-0BB9, 0BBE-0BC2, 0BC6-0BC8, 0BCA-0BCD, 0BD7, 0BE7-0BF2, 0C01-0C03, 0C05-0C0C, 0C0E-0C10, 0C12-0C28, 0C2A-0C33, 0C35-0C39, 0C3E-0C44, 0C46-0C48, 0C4A-0C4D, 0C55-0C56, 0C60-0C61, 0C66-0C6F, 0C82-0C83, 0C85-0C8C, 0C8E-0C90, 0C92-0CA8, 0CAA-0CB3, 0CB5-0CB9, 0CBE-0CC4, 0CC6-0CC8, 0CCA-0CCD, 0CD5-0CD6, 0CDE, 0CE0, 0CE1, 0CE6-0CEF, 0D02-0D03, 0D05-0D0C, 0D0E-0D10, 0D12-0D28, 0D2A-0D39, 0D3E-0D43, 0D46-0D48, 0D4A-0D4D, 0D57, 0D60-0D61, 0D66-0D6F, 0E01-0E3A, 0E3F-0E5B, 0E81-0E82, 0E84, 0E87-0E88, 0E8A, 0E8D, 0E94-0E97, 0E99-0E9F, 0EA1-0EA3, 0EA5, 0EA7, 0EAA-0EAB, 0EAD-0EB9, 0EBB-0EBD, 0EC0-0EC4, 0EC6, 0EC8, 0EC9-0ECD, 0ED0-0ED9, 0EDC-0EDD, 10A0-10C5, 10D0-10F6, 
10FB, 1100-1159, 115F-11A2, 11A8-11F9, 1E00-1E9A, 1EA0-1EF9, 1F00-1F15, 1F18-1F1D, 1F20-1F45, 1F48-1F4D, 1F50-1F57, 1F59, 1F5B, 1F5D, 1F5F-1F7D, 1F80-1FB4, 1FB6-1FC4, 1FC6-1FD3, 1FD6-1FDB, 1FDD-1FEF, 1FF2-1FF4, 1FF6-1FFE, 3041-3094, 3099-309E, 30A1-30FE, 3105-312C, 3131-318E, 3190-319F, 3200-321C, 3220-3243, 3260-327B, 327F-32B0, 32C0-32CB, 32D0-32FE, 3300-3376, 337B-33DD, 33E0-33FE, 3400-9FA5, F900-FA2D, FB00-FB06, FB13-FB17, FB1E-FB36, FB38-FB3C, FB3E, FB40, FB41, FB43, FB44, FB46, FB47-FBB1, FBD3-FD3F, FD50-FD8F, FD92-FDC7, FDF0-FDFB, FE70-FE72, FE74, FE76, FE77-FEFC, FF10-FF19, FF21-FF3A, FF41-FF5A, FF66-FFBE, FFC2-FFC7, FFCA-FFCF, FFD2-FFD7, FFDA-FFDC.
[This method is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5.]
20.5.17 public static boolean isJavaLetter(char ch)
The result is true if and only if the character argument is a character that can begin a Java identifier.
A character is considered to be a Java letter if and only if it is a letter (§20.5.15) or is the dollar sign character '$' (\u0024) or the underscore ("low line") character '_' (\u005F).
[This method is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5.]
20.5.18 public static boolean isJavaLetterOrDigit(char ch)
The result is true if and only if the character argument is a character that can occur in a Java identifier after the first character.
A character is considered to be a Java letter-or-digit if and only if it is a letter-or-digit (§20.5.16) or is the dollar sign character '$' (\u0024) or the underscore ("low line") character '_' (\u005F).
[This method is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5.]
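The two definitions in §20.5.17 and §20.5.18 combine naturally into a check of whether a string has the shape of a Java identifier. The sketch below is an illustration under stated assumptions: the class IdentifierSketch and its three methods are hypothetical, it builds on Character.isLetter and Character.isLetterOrDigit rather than calling isJavaLetter and isJavaLetterOrDigit directly, and it does not reject keywords. The library's letter tables may also reflect a newer Unicode version than the one described here.

```java
final class IdentifierSketch {
    // Mirrors the definition of a Java letter: a letter, '$', or '_'.
    static boolean isJavaLetterSketch(char ch) {
        return Character.isLetter(ch) || ch == '$' || ch == '_';
    }

    // Mirrors the definition of a Java letter-or-digit: a letter-or-digit, '$', or '_'.
    static boolean isJavaLetterOrDigitSketch(char ch) {
        return Character.isLetterOrDigit(ch) || ch == '$' || ch == '_';
    }

    // Hypothetical helper: true if the first character can begin an identifier
    // and every later character can occur after the first one.
    static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty() || !isJavaLetterSketch(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isJavaLetterOrDigitSketch(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("foo_1"));
        System.out.println(looksLikeIdentifier("1foo"));
    }
}
```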
20.5.19 public static boolean isSpace(char ch)
The result is true if the argument ch is one of the following characters:
'\t'  \u0009  HT  HORIZONTAL TABULATION
'\n'  \u000A  LF  LINE FEED (also known as NEW LINE)
'\f'  \u000C  FF  FORM FEED
'\r'  \u000D  CR  CARRIAGE RETURN
' '   \u0020  SP  SPACE
Otherwise, the result is false.
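Because the set of characters accepted by isSpace is fixed and small, the rule can be restated directly in code. The following is a sketch of that rule under a hypothetical name, isSpaceSketch; it is written from the table above and does not call the library method.

```java
final class IsSpaceSketch {
    // Returns true exactly for HT, LF, FF, CR and SP, as listed in the table.
    static boolean isSpaceSketch(char ch) {
        switch (ch) {
            case '\t':
            case '\n':
            case '\f':
            case '\r':
            case ' ':
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSpaceSketch(' '));
        System.out.println(isSpaceSketch('x'));
    }
}
```

Note that other characters sometimes treated as white space, such as vertical tabulation (code 000B) or the no-break space (code 00A0), are not in the table and therefore yield false.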
20.5.20 public static char toLowerCase(char ch)
If the character ch has a lowercase equivalent specified in the Unicode attribute table, then that lowercase equivalent character is returned. Otherwise, the argument ch is returned.
The lowercase equivalents specified in the Unicode attribute table, for Unicode 1.1.5 as corrected above, are as follows, where character codes to the right of arrows are the lowercase equivalents of character codes to the left of arrows: 0041-005A→0061-007A, 00C0-00D6→00E0-00F6, 00D8-00DE→00F8-00FE, 0100-012E→0101-012F (evens to odds), 0132-0136→0133-0137 (evens to odds), 0139-0147→013A-0148 (odds to evens), 014A-0176→014B-0177 (evens to odds), 0178→00FF, 0179-017D→017A-017E (odds to evens), 0181→0253, 0182→0183, 0184→0185, 0186→0254, 0187→0188, 018A→0257, 018B→018C, 018E→0258, 018F→0259, 0190→025B, 0191→0192, 0193→0260, 0194→0263, 0196→0269, 0197→0268, 0198→0199, 019C→026F, 019D→0272, 01A0-01A4→01A1-01A5 (evens to odds), 01A7→01A8, 01A9→0283, 01AC→01AD, 01AE→0288, 01AF→01B0, 01B1→028A, 01B2→028B, 01B3→01B4, 01B5→01B6, 01B7→0292, 01B8→01B9, 01BC→01BD, 01C4→01C6, 01C5→01C6, 01C7→01C9, 01C8→01C9, 01CA→01CC, 01CB-01DB→01CC-01DC (odds to evens), 01DE-01EE→01DF-01EF (evens to odds), 01F1→01F3, 01F2→01F3, 01F4→01F5, 01FA-0216→01FB-0217 (evens to odds), 0386→03AC, 0388-038A→03AD-03AF, 038C→03CC, 038E→03CD, 038F→03CE, 0391-03A1→03B1-03C1, 03A3-03AB→03C3-03CB, 03E2-03EE→03E3-03EF (evens to odds), 0401-040C→0451-045C, 040E→045E, 040F→045F, 0410-042F→0430-044F, 0460-0480→0461-0481 (evens to odds), 0490-04BE→0491-04BF (evens to odds), 04C1→04C2, 04C3→04C4, 04C7→04C8, 04CB→04CC, 04D0-04EA→04D1-04EB (evens to odds), 04EE-04F4→04EF-04F5 (evens to odds), 04F8→04F9, 0531-0556→0561-0586, 10A0-10C5→10D0-10F5, 1E00-1E94→1E01-1E95 (evens to odds), 1EA0-1EF8→1EA1-1EF9 (evens to odds), 1F08-1F0F→1F00-1F07, 1F18-1F1D→1F10-1F15, 1F28-1F2F→1F20-1F27, 1F38-1F3F→1F30-1F37, 1F48-1F4D→1F40-1F45, 1F59→1F51, 1F5B→1F53, 1F5D→1F55, 1F5F→1F57, 1F68-1F6F→1F60-1F67, 1F88-1F8F→1F80-1F87, 1F98-1F9F→1F90-1F97, 1FA8-1FAF→1FA0-1FA7, 1FB8→1FB0, 1FB9→1FB1, 1FBA→1F70, 1FBB→1F71, 1FBC→1FB3, 1FC8-1FCB→1F72-1F75, 1FCC→1FC3, 1FD8→1FD0, 1FD9→1FD1, 1FDA→1F76, 1FDB→1F77, 1FE8→1FE0, 1FE9→1FE1, 1FEA→1F7A, 1FEB→1F7B, 1FEC→1FE5, 1FF8→1F78, 1FF9→1F79, 1FFA→1F7C, 1FFB→1F7D, 1FFC→1FF3, 2160-216F→2170-217F, 24B6-24CF→24D0-24E9, FF21-FF3A→FF41-FF5A.
Note that the method isLowerCase (§20.5.11) will not necessarily return true when given the result of the toLowerCase method.
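The simplest illustration of this note is a character with no lowercase equivalent: toLowerCase returns the argument unchanged, and the argument need not be a lowercase character. A minimal sketch, using the hypothetical names LowerCaseNote and roundTripIsLowerCase:

```java
final class LowerCaseNote {
    // Hypothetical helper: is the result of toLowerCase itself a lowercase character?
    static boolean roundTripIsLowerCase(char ch) {
        return Character.isLowerCase(Character.toLowerCase(ch));
    }

    public static void main(String[] args) {
        // 'Q' has a lowercase equivalent, so the round trip yields a lowercase letter.
        System.out.println(roundTripIsLowerCase('Q'));
        // '7' has no lowercase equivalent; toLowerCase returns '7', which is a digit.
        System.out.println(roundTripIsLowerCase('7'));
    }
}
```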
[This specification for the method toLowerCase is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5. In previous versions of Java, this method returns its argument for all arguments larger than \u00FF.]
20.5.21 public static char toUpperCase(char ch)
If the character ch has an uppercase equivalent specified in the Unicode attribute table, then that uppercase equivalent character is returned. Otherwise, the argument ch is returned.
The uppercase equivalents specified in the Unicode attribute table, for Unicode 1.1.5 as corrected above, are as follows, where character codes to the right of arrows are the uppercase equivalents of character codes to the left of arrows:
0061-007A→0041-005A, 00E0-00F6→00C0-00D6, 00F8-00FE→00D8-00DE, 00FF→0178, 0101-012F→0100-012E (odds to evens), 0133-0137→0132-0136 (odds to evens), 013A-0148→0139-0147 (evens to odds), 014B-0177→014A-0176 (odds to evens), 017A-017E→0179-017D (evens to odds), 017F→0053, 0183-0185→0182-0184 (odds to evens), 0188→0187, 018C→018B, 0192→0191, 0199→0198, 01A1-01A5→01A0-01A4 (odds to evens), 01A8→01A7, 01AD→01AC, 01B0→01AF, 01B4→01B3, 01B6→01B5, 01B9→01B8, 01BD→01BC, 01C5→01C4, 01C6→01C4, 01C8→01C7, 01C9→01C7, 01CB→01CA, 01CC→01CA, 01CE-01DC→01CD-01DB (evens to odds), 01DF-01EF→01DE-01EE (odds to evens), 01F2→01F1, 01F3→01F1, 01F5→01F4, 01FB-0217→01FA-0216 (odds to evens), 0253→0181, 0254→0186, 0257→018A, 0258→018E, 0259→018F, 025B→0190, 0260→0193, 0263→0194, 0268→0197, 0269→0196, 026F→019C, 0272→019D, 0283→01A9, 0288→01AE, 028A→01B1, 028B→01B2, 0292→01B7, 03AC→0386, 03AD-03AF→0388-038A, 03B1-03C1→0391-03A1, 03C2→03A3, 03C3-03CB→03A3-03AB, 03CC→038C, 03CD→038E, 03CE→038F, 03D0→0392, 03D1→0398, 03D5→03A6, 03D6→03A0, 03E3-03EF→03E2-03EE (odds to evens), 03F0→039A, 03F1→03A1, 0430-044F→0410-042F, 0451-045C→0401-040C, 045E→040E, 045F→040F, 0461-0481→0460-0480 (odds to evens), 0491-04BF→0490-04BE (odds to evens), 04C2→04C1, 04C4→04C3, 04C8→04C7, 04CC→04CB, 04D1-04EB→04D0-04EA (odds to evens), 04EF-04F5→04EE-04F4 (odds to evens), 04F9→04F8, 0561-0586→0531-0556, 1E01-1E95→1E00-1E94 (odds to evens), 1EA1-1EF9→1EA0-1EF8 (odds to evens), 1F00-1F07→1F08-1F0F, 1F10-1F15→1F18-1F1D, 1F20-1F27→1F28-1F2F, 1F30-1F37→1F38-1F3F, 1F40-1F45→1F48-1F4D, 1F51→1F59, 1F53→1F5B, 1F55→1F5D, 1F57→1F5F, 1F60-1F67→1F68-1F6F, 1F70→1FBA, 1F71→1FBB, 1F72-1F75→1FC8-1FCB, 1F76→1FDA, 1F77→1FDB, 1F78→1FF8, 1F79→1FF9, 1F7A→1FEA, 1F7B→1FEB, 1F7C→1FFA, 1F7D→1FFB, 1F80-1F87→1F88-1F8F, 1F90-1F97→1F98-1F9F, 1FA0-1FA7→1FA8-1FAF, 1FB0→1FB8, 1FB1→1FB9, 1FB3→1FBC, 1FC3→1FCC, 1FD0→1FD8, 1FD1→1FD9, 1FE0→1FE8, 1FE1→1FE9, 1FE5→1FEC, 1FF3→1FFC, 2170-217F→2160-216F, 24D0-24E9→24B6-24CF, FF41-FF5A→FF21-FF3A.
Note that the method isUpperCase (§20.5.12) will not necessarily return true when given the result of the toUpperCase method.
[This specification for the method toUpperCase is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5. In previous versions of Java, this method returns its argument for all arguments larger than \u00FE. Note that although \u00FF is a lowercase character, its uppercase equivalent is \u0178; toUpperCase in versions of Java prior to version 1.1 simply does not consistently handle or use Unicode character codes above \u00FF.]
20.5.22 public static char toTitleCase(char ch)
If the character ch has a titlecase equivalent specified in the Unicode attribute table, then that titlecase equivalent character is returned; otherwise, the argument ch is returned.
Note that the method isTitleCase (§20.5.13) will not necessarily return true when given the result of the toTitleCase method. The Unicode attribute table always has the titlecase attribute equal to the uppercase attribute for characters that have uppercase equivalents but no separate titlecase form.
Example: Character.toTitleCase('a') returns 'A'
Example: Character.toTitleCase('Q') returns 'Q'
Example: Character.toTitleCase('lj') returns 'Lj'
where 'lj' is the Unicode character \u01C9 and 'Lj' is its titlecase equivalent character \u01C8.
[This method is scheduled for introduction in Java version 1.1.]
20.5.23 public static int digit(char ch, int radix)
Returns the numeric value of the character ch considered as a digit in the specified radix. If the value of radix is not a valid radix, or the character ch is not a valid digit in the specified radix, then -1 is returned.
A radix is valid if and only if its value is not less than Character.MIN_RADIX (§20.5.3) and not greater than Character.MAX_RADIX (§20.5.4).
A character is a valid digit if and only if one of the following is true:
- The method isDigit returns true for the character, and the decimal digit value of the character, as specified in the Unicode attribute table, is less than the specified radix. In this case, the decimal digit value is returned.
- The character is one of the uppercase Latin letters 'A'-'Z' (\u0041-\u005A) and its code is less than radix+'A'-10. In this case ch-'A'+10 is returned.
- The character is one of the lowercase Latin letters 'a'-'z' (\u0061-\u007A) and its code is less than radix+'a'-10. In this case ch-'a'+10 is returned.
[This specification for the method digit is scheduled for introduction in Java version 1.1, either as defined here, or updated for Unicode 2.0; see §20.5. In previous versions of Java, this method returns -1 for all character codes larger than \u00FF.]
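A typical use of digit is to accumulate the value of a numeral one character at a time, treating a result of -1 as a signal to stop. The following is a minimal sketch under stated assumptions: the class DigitDemo and the helper parse are hypothetical, the helper handles only non-negative numerals, and it makes no attempt to detect overflow.

```java
final class DigitDemo {
    // Hypothetical helper: returns the value of text in the given radix, or -1
    // if the text is empty, the radix is invalid, or any character is not a
    // valid digit in that radix. Overflow is not checked.
    static long parse(String text, int radix) {
        if (text.isEmpty()) {
            return -1;
        }
        long value = 0;
        for (int i = 0; i < text.length(); i++) {
            int d = Character.digit(text.charAt(i), radix);
            if (d < 0) {
                return -1;   // invalid digit, or invalid radix
            }
            value = value * radix + d;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("ff", 16));   // lowercase letters are accepted
        System.out.println(parse("FF", 16));   // so are uppercase letters
        System.out.println(parse("8", 8));     // '8' is not a valid octal digit
    }
}
```

Because an invalid radix makes digit return -1 for every character, the helper needs no separate radix check.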
20.5.24 public static char forDigit(int digit, int radix)
Returns a character that represents the given digit in the specified radix. If the value of radix is not a valid radix, or the value of digit is not a valid digit in the specified radix, the null character '\u0000' is returned.
A radix is valid if and only if its value is not less than Character.MIN_RADIX (§20.5.3) and not greater than Character.MAX_RADIX (§20.5.4).
A digit is valid if and only if it is nonnegative and less than the radix.
If the digit is less than 10, then the character value '0'+digit is returned; otherwise, 'a'+digit-10 is returned. Thus, the digits produced by forDigit, in increasing order of value, are the ASCII characters:
0123456789abcdefghijklmnopqrstuvwxyz
(these are '\u0030' through '\u0039' and '\u0061' through '\u007a'). If uppercase letters are desired, the toUpperCase method may be called on the result:
Character.toUpperCase(Character.forDigit(digit, radix))
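The rules for forDigit are short enough to restate directly in code. The sketch below is an illustration only: the class ForDigitSketch and its methods are hypothetical, and the comparison loop simply checks the restated rules against Character.forDigit over a small range of arguments.

```java
final class ForDigitSketch {
    // Restates the rules above: validate the radix and digit, then map
    // 0-9 to '0'-'9' and 10-35 to 'a'-'z'.
    static char forDigitSketch(int digit, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            return '\u0000';   // invalid radix
        }
        if (digit < 0 || digit >= radix) {
            return '\u0000';   // invalid digit for this radix
        }
        return (char) (digit < 10 ? '0' + digit : 'a' + digit - 10);
    }

    // Compares the sketch with the library method for a range of arguments,
    // including invalid radices and digits.
    static boolean agreesWithLibrary() {
        for (int radix = 0; radix <= 40; radix++) {
            for (int digit = -2; digit <= 40; digit++) {
                if (forDigitSketch(digit, radix) != Character.forDigit(digit, radix)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(forDigitSketch(11, 16));
        System.out.println(Character.toUpperCase(forDigitSketch(11, 16)));
        System.out.println(agreesWithLibrary());
    }
}
```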