net.sourceforge.wohenchan.encoding
Class Big5Encoding
java.lang.Object
|
+--net.sourceforge.wohenchan.encoding.Big5Encoding
- All Implemented Interfaces:
- EncodingInfoInterface
public class Big5Encoding
extends java.lang.Object
implements EncodingInfoInterface
Information about the Big5 encoding.
The following is taken from
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
2.3.1: BIG FIVE
The Big Five character set is composed of 94 rows of 157
characters each (the 157 characters of each row are encoded in an
initial group of 63 codes followed by the remaining 94 codes). The
following is a breakdown of its contents:
o Row 1: 157 symbols
o Row 2: 157 symbols
o Row 3: 94 symbols
o Rows 4 through 38: 5,401 hanzi (Level 1 Hanzi; last is 38-63)
o Rows 41 through 89: 7,652 hanzi (Level 2 Hanzi; last is 89-116)
This forms what I consider to be the basic Big Five set. Actually, two
of the hanzi in Level 2 are duplicates, so there are only
7,650 unique hanzi in Level 2.
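The row-cell notation used in the breakdown above (for example "38-63") can be mapped onto Big5 byte pairs. The following is a minimal sketch, not part of this library, assuming the usual Big5 layout: row r uses lead byte 0xA0 + r, the first 63 cells of a row use trail bytes 0x40-0x7E, and the remaining 94 cells use trail bytes 0xA1-0xFE. The class `Big5RowCell` and its methods are hypothetical helpers for illustration only.

```java
/**
 * Hypothetical helper: converts between Big Five row-cell positions
 * (as written in cjk.inf, e.g. "38-63") and two-byte Big5 codes.
 * ASSUMPTION: row r uses lead byte 0xA0 + r; cells 1..63 use trail
 * bytes 0x40..0x7E and cells 64..157 use trail bytes 0xA1..0xFE.
 */
final class Big5RowCell {
    static final int ROWS = 94;
    static final int CELLS_PER_ROW = 157;
    static final int LOW_GROUP = 63; // initial group of 63 codes

    private Big5RowCell() {}

    /** Returns {lead, trail} for a row in 1..94 and a cell in 1..157. */
    static int[] toBytes(int row, int cell) {
        if (row < 1 || row > ROWS || cell < 1 || cell > CELLS_PER_ROW) {
            throw new IllegalArgumentException("row-cell out of range: " + row + "-" + cell);
        }
        int lead = 0xA0 + row;
        int trail = cell <= LOW_GROUP
                ? 0x40 + (cell - 1)               // first 63 codes
                : 0xA1 + (cell - LOW_GROUP - 1);  // remaining 94 codes
        return new int[] {lead, trail};
    }

    /** Inverse of toBytes: returns {row, cell} for a valid lead/trail pair. */
    static int[] toRowCell(int lead, int trail) {
        if (lead < 0xA1 || lead > 0xFE) {
            throw new IllegalArgumentException("lead byte out of range: " + lead);
        }
        int cell;
        if (trail >= 0x40 && trail <= 0x7E) {
            cell = trail - 0x40 + 1;
        } else if (trail >= 0xA1 && trail <= 0xFE) {
            cell = trail - 0xA1 + LOW_GROUP + 1;
        } else {
            throw new IllegalArgumentException("trail byte out of range: " + trail);
        }
        return new int[] {lead - 0xA0, cell};
    }

    /** Counts positions from cell 1 of firstRow through lastRow-lastCell, inclusive. */
    static int countThrough(int firstRow, int lastRow, int lastCell) {
        return (lastRow - firstRow) * CELLS_PER_ROW + lastCell;
    }

    /** Formats a row-cell position as a four-digit hex code such as "C67E". */
    static String hex(int row, int cell) {
        int[] b = toBytes(row, cell);
        return String.format("%02X%02X", b[0], b[1]);
    }
}
```

Under this assumed layout, `hex(38, 63)` gives `C67E` and `hex(89, 116)` gives `F9D5`, while `countThrough(4, 38, 63)` and `countThrough(41, 89, 116)` reproduce the 5,401 and 7,652 totals listed for Level 1 and Level 2.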
There are two major extensions to Big Five. The first really
has no name, and can be considered part of the basic Big Five set as
specified above. It adds the following characters:
o Rows 38-39: 4 Japanese iteration marks, 83 hiragana, 86 katakana, 66
uppercase and lowercase Cyrillic (Russian) alphabet, 10 circled
digits, and 10 parenthesized digits
The other extension was developed by a company called ETen
Information System in Taiwan, and is actually considered to be the
most widely used version of Big Five. It provides the following
extensions to Big Five (different from the above extension):
o Rows 38-40: 10 circled digits, 10 parenthesized digits, 10 lowercase
Roman numerals, 25 classical radicals, 15 Japanese-specific symbols,
83 hiragana, 86 katakana, 66 uppercase and lowercase Cyrillic
(Russian) alphabet, 3 arrows, 10 radical-like hanzi elements, 40
fraction-like digits, and 7 symbols
o Row 89: 7 hanzi, 33 double-lined line-drawing elements, and a black
box
It is *very* important to note that while these two extensions
have many common portions (in particular, hiragana, katakana, the
Cyrillic alphabet, and so on), they do not share the same code points
for such characters.
Appendix G (pp. 407-450) of "Developing International Software
for Windows 95 and Windows NT" by Nadine Kano illustrates the Big Five
character set standard by Big Five code (Microsoft calls this Code
Page 950). Code Page 950 incorporates some of the ETen extensions,
namely those in Row 89.
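The Row 89 figures in the ETen list can be checked with simple arithmetic: Level 2 ends at cell 116, leaving 157 - 116 = 41 cells, and the ETen additions number 7 + 33 + 1 = 41, so they exactly fill the rest of the row. The sketch below assumes the additions follow Level 2 contiguously and uses the same assumed byte layout as standard Big Five (lead byte 0xA0 + row, cells 64..157 on trail bytes 0xA1..0xFE); `EtenRow89` is a hypothetical class for illustration, not part of this library.

```java
/**
 * Hypothetical arithmetic check for the ETen Row 89 extension.
 * ASSUMPTIONS: 157 cells per row, cells 64..157 use trail bytes
 * 0xA1..0xFE, row r uses lead byte 0xA0 + r, and the ETen additions
 * are placed contiguously right after the last Level 2 hanzi.
 */
final class EtenRow89 {
    static final int ROW = 89;
    static final int CELLS_PER_ROW = 157;
    static final int LOW_GROUP = 63;
    static final int LEVEL2_LAST_CELL = 116;      // "last is 89-116"
    static final int ETEN_ADDITIONS = 7 + 33 + 1; // hanzi + double-lined elements + black box

    private EtenRow89() {}

    /** Cells left in Row 89 after the last Level 2 hanzi. */
    static int freeCells() {
        return CELLS_PER_ROW - LEVEL2_LAST_CELL;
    }

    /** Hex code range the additions occupy under the contiguity assumption. */
    static String codeRange() {
        int lead = 0xA0 + ROW;
        int first = LEVEL2_LAST_CELL + 1;
        int last = LEVEL2_LAST_CELL + ETEN_ADDITIONS;
        return String.format("%02X%02X-%02X%02X",
                lead, 0xA1 + (first - LOW_GROUP - 1),
                lead, 0xA1 + (last - LOW_GROUP - 1));
    }
}
```

Here `freeCells()` and `ETEN_ADDITIONS` are both 41, and `codeRange()` returns `F9D6-F9FE`, the tail of Row 89 that follows the last Level 2 code.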
- Version:
- $Name: $ $Date: 2003/06/22 17:44:44 $
- Author:
- $Author: wtanaka $
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
CANONICAL
public static final java.lang.String CANONICAL
- See Also:
- Constant Field Values
s_singleton
static final Big5Encoding s_singleton
Constructor Detail
Big5Encoding
private Big5Encoding()
Method Detail
getInstance
public static Big5Encoding getInstance()
getCanonicalString
public java.lang.String getCanonicalString()
- Specified by:
- getCanonicalString in interface EncodingInfoInterface
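The members documented above (a public CANONICAL constant, a static final s_singleton field, a private constructor, and a static getInstance()) suggest an eagerly initialized singleton. The following is a minimal sketch of that pattern under stated assumptions, not the actual source: `EncodingInfoSketch` is a hypothetical stand-in for EncodingInfoInterface declaring only the one method documented on this page, that `getCanonicalString()` returns CANONICAL is assumed, and the value "Big5" is a placeholder because the real constant is given only on the Constant Field Values page.

```java
/** Hypothetical stand-in for EncodingInfoInterface (only the method documented here). */
interface EncodingInfoSketch {
    String getCanonicalString();
}

/** Sketch of the singleton shape implied by the documented members of Big5Encoding. */
class Big5EncodingSketch implements EncodingInfoSketch {
    // PLACEHOLDER: the real value is listed under "Constant Field Values".
    public static final String CANONICAL = "Big5";

    // Package-private like s_singleton; created once during class initialization.
    static final Big5EncodingSketch s_singleton = new Big5EncodingSketch();

    // Private constructor: callers must go through getInstance().
    private Big5EncodingSketch() {}

    public static Big5EncodingSketch getInstance() {
        return s_singleton;
    }

    // ASSUMPTION: the canonical string is simply the CANONICAL constant.
    @Override
    public String getCanonicalString() {
        return CANONICAL;
    }
}
```

Callers would then write `Big5EncodingSketch.getInstance().getCanonicalString()`; every call to `getInstance()` returns the same object, so identity comparison (`==`) between two results is safe.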