net.sourceforge.wohenchan.encoding
Class GB2312Encoding
java.lang.Object
|
+--net.sourceforge.wohenchan.encoding.GB2312Encoding
- All Implemented Interfaces:
- EncodingInfoInterface
- public class GB2312Encoding
- extends java.lang.Object
- implements EncodingInfoInterface
Information about the GB2312-1980 encoding.
The following is taken from
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
2.2.2: GB 2312-80
This basic (simplified) Chinese character set standard
enumerates 7,445 characters, 6,763 of which are hanzi separated into
two levels. Hanzi in the first level are arranged by reading, and
those in the second level are arranged by radical then total number of
(remaining) strokes. GB 2312-80 is also known as the "Primary Set,"
GB0 (zero), or just GB.
o Row 1: 94 symbols
o Row 2: 72 numerals
o Row 3: 94 full-width GB 1988-89 characters (see Section 2.2.1)
o Row 4: 83 hiragana
o Row 5: 86 katakana
o Row 6: 48 uppercase and lowercase Greek alphabet
o Row 7: 66 uppercase and lowercase Cyrillic (Russian) alphabet
o Row 8: 26 Pinyin and 37 Bopomofo characters
o Row 9: 76 line-drawing elements (09-04 through 09-79)
o Rows 16 through 55: 3,755 hanzi (Level 1 Hanzi; last is 55-89)
o Rows 56 through 87: 3,008 hanzi (Level 2 Hanzi; last is 87-94)
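The row layout above can be sketched as a small lookup. This is an illustrative sketch only and not part of GB2312Encoding: the class name Gb2312Rows and its methods are hypothetical, and it assumes the usual 94 x 94 row-cell grid. Rows that the list does not mention are reported as "not listed".

```java
// Hypothetical helper, not part of the wohenchan API.
class Gb2312Rows {
    /** Returns a short label for a GB 2312-80 row number, following the list above. */
    static String describeRow(int row) {
        if (row < 1 || row > 94) { // assumption: 94 rows in the grid
            throw new IllegalArgumentException("row out of range: " + row);
        }
        switch (row) {
            case 1: return "symbols";
            case 2: return "numerals";
            case 3: return "full-width GB 1988-89 characters";
            case 4: return "hiragana";
            case 5: return "katakana";
            case 6: return "Greek";
            case 7: return "Cyrillic";
            case 8: return "Pinyin and Bopomofo";
            case 9: return "line-drawing elements";
            default: break;
        }
        if (row >= 16 && row <= 55) return "Level 1 hanzi";
        if (row >= 56 && row <= 87) return "Level 2 hanzi";
        return "not listed";
    }

    /** True if the row-cell position falls inside the hanzi ranges given above. */
    static boolean isHanzi(int row, int cell) {
        if (cell < 1 || cell > 94) return false;
        if (row >= 16 && row <= 54) return true;
        if (row == 55) return cell <= 89; // Level 1 ends at 55-89
        return row >= 56 && row <= 87;    // Level 2 ends at 87-94
    }

    public static void main(String[] args) {
        // Count the hanzi positions to cross-check the totals quoted above.
        int count = 0;
        for (int row = 1; row <= 94; row++) {
            for (int cell = 1; cell <= 94; cell++) {
                if (isHanzi(row, cell)) count++;
            }
        }
        System.out.println(describeRow(4) + ", hanzi positions: " + count);
    }
}
```

Counting the positions accepted by isHanzi gives 3,755 + 3,008 = 6,763, which matches the totals quoted above.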
Compare some of the structure with JIS X 0208-1990, and you will find
many similarities, such as:
o Hiragana, katakana, Greek, and Cyrillic characters are in Rows 4, 5,
6, and 7, respectively
o Chinese characters begin at Row 16
o Chinese characters are separated into two levels
o Level 1 arranged by reading
o Level 2 arranged by radical then total number of strokes
The Japanese standard, JIS C 6226-1978, came out in 1978, which means
that it pre-dates GB 2312-80. The above similarities cannot be
coincidental; they must be by design.
Appendix G (pp 318-344) of "Developing International Software
for Windows 95 and Windows NT" by Nadine Kano illustrates the GB 2312-80
character set standard by EUC code (Microsoft calls this Code Page
936). Code Page 936 incorporates the correction of the hanzi at 79-81,
and the correction of the order of 07-22 and 07-23 (see Section 2.2.3
for more details).
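For readers unfamiliar with the EUC code mentioned above, the following sketch shows how a row-cell position such as 79-81 maps to its two-byte EUC form. It relies on the standard EUC-CN convention of adding 0xA0 to both the row and the cell, which is not stated on this page. The class Gb2312Euc and its methods are hypothetical and not part of GB2312Encoding.

```java
// Hypothetical helper, not part of the wohenchan API.
class Gb2312Euc {
    /** Converts a row-cell position (each 1..94) to its two EUC byte values. */
    static int[] toEuc(int row, int cell) {
        if (row < 1 || row > 94 || cell < 1 || cell > 94) {
            throw new IllegalArgumentException("position out of range: " + row + "-" + cell);
        }
        // EUC-CN convention: each byte is the row or cell number plus 0xA0.
        return new int[] { row + 0xA0, cell + 0xA0 };
    }

    /** Converts two EUC byte values (each 0xA1..0xFE) back to a row-cell pair. */
    static int[] fromEuc(int lead, int trail) {
        if (lead < 0xA1 || lead > 0xFE || trail < 0xA1 || trail > 0xFE) {
            throw new IllegalArgumentException("not a two-byte EUC sequence");
        }
        return new int[] { lead - 0xA0, trail - 0xA0 };
    }

    /** Formats the EUC bytes of a row-cell position as four hex digits. */
    static String toHex(int row, int cell) {
        int[] b = toEuc(row, cell);
        return String.format("%02X%02X", b[0], b[1]);
    }

    public static void main(String[] args) {
        System.out.println("79-81 -> " + toHex(79, 81));
        System.out.println("07-22 -> " + toHex(7, 22));
    }
}
```

Under this convention the corrected hanzi at 79-81 sits at EUC bytes EF F1, and 07-22 at A7 B6.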
- Version:
- $Name: $ $Date: 2003/06/22 21:51:37 $
- Author:
- $Author: wtanaka $
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CANONICAL
public static final java.lang.String CANONICAL
- See Also:
- Constant Field Values
s_singleton
static final GB2312Encoding s_singleton
GB2312Encoding
private GB2312Encoding()
getInstance
public static GB2312Encoding getInstance()
getCanonicalString
public java.lang.String getCanonicalString()
- Specified by:
getCanonicalString
in interface EncodingInfoInterface
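The members listed above (a private constructor, the s_singleton field, and getInstance) suggest a singleton. The sketch below imitates that shape with stand-in types so that it compiles on its own; it is not the library's source. The names SingletonSketch, EncodingInfo, and Gb2312Stub are hypothetical, the value "GB2312" for CANONICAL is a placeholder because this page does not show the constant's value, and returning CANONICAL from getCanonicalString is an assumption.

```java
// Stand-in types that mirror the member names listed above; not the real classes.
class SingletonSketch {
    /** Stand-in for EncodingInfoInterface; only the one method shown on this page. */
    interface EncodingInfo {
        String getCanonicalString();
    }

    static final class Gb2312Stub implements EncodingInfo {
        // Placeholder value: the real constant is not shown on this page.
        public static final String CANONICAL = "GB2312";

        // Single shared instance, created when the class is initialized.
        static final Gb2312Stub s_singleton = new Gb2312Stub();

        // Private constructor prevents callers from creating more instances.
        private Gb2312Stub() {
        }

        public static Gb2312Stub getInstance() {
            return s_singleton;
        }

        // Assumption: the canonical string is the CANONICAL constant.
        public String getCanonicalString() {
            return CANONICAL;
        }
    }

    public static void main(String[] args) {
        EncodingInfo info = Gb2312Stub.getInstance();
        System.out.println(info.getCanonicalString());
        // Both calls return the same object.
        System.out.println(Gb2312Stub.getInstance() == Gb2312Stub.getInstance());
    }
}
```

With the real class, the equivalent call would presumably be GB2312Encoding.getInstance().getCanonicalString().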