net.sourceforge.wohenchan.encoding
Class Big5Encoding

java.lang.Object
  |
  +--net.sourceforge.wohenchan.encoding.Big5Encoding
All Implemented Interfaces:
EncodingInfoInterface

public class Big5Encoding
extends java.lang.Object
implements EncodingInfoInterface

Information about the Big5 encoding. The following description is quoted from CJK.INF, available at ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

 2.3.1: BIG FIVE

    The Big Five character set is composed of 94 rows of 157
 characters each (the 157 characters of each row are encoded in an
 initial group of 63 codes followed by the remaining 94 codes). The
 following is a break-down of its contents:

 o Row 1: 157 symbols
 o Row 2: 157 symbols
 o Row 3: 94 symbols
 o Rows 4 through 38: 5,401 hanzi (Level 1 Hanzi; last is 38-63)
 o Rows 41 through 89: 7,652 hanzi (Level 2 Hanzi; last is 89-116)

 This forms what I consider to be the basic Big Five set. Actually, two
 of the hanzi in Level 2 are duplicates, so there are actually only
 7,650 unique hanzi in Level 2.
    There are two major extensions to Big Five. The first really
 has no name, and can be considered part of the basic Big Five set as
 specified above. It adds the following characters:

 o Rows 38-39: 4 Japanese iteration marks, 83 hiragana, 86 katakana, 66
   uppercase and lowercase Cyrillic (Russian) alphabet, 10 circled
   digits, and 10 parenthesized digits

    The other extension was developed by a company called ETen
 Information System in Taiwan, and is actually considered to be the
 most widely used version of Big Five. It provides the following
 extensions to Big Five (different from the above extension):

 o Rows 38-40: 10 circled digits, 10 parenthesized digits, 10 lowercase
   Roman numerals, 25 classical radicals, 15 Japanese-specific symbols,
   83 hiragana, 86 katakana, 66 uppercase and lowercase Cyrillic
   (Russian) alphabet, 3 arrows, 10 radical-like hanzi elements, 40
   fraction-like digits, and 7 symbols
 o Row 89: 7 hanzi, 33 double-lined line-drawing elements, and a black
   box

    It is *very* important to note that while these two extensions
 have many common portions (in particular, hiragana, katakana, the
 Cyrillic alphabet, and so on), they do not share the same code points
 for such characters.
    Appendix G (pp 407-450) of "Developing International Software
 for Windows 95 and Windows NT" by Nadine Kano illustrates the Big Five
 character set standard by Big Five code (Microsoft calls this Code
 Page 950). Code Page 950 incorporates some of the ETen extensions,
 namely those in Row 89.
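
As an illustration of the row layout quoted above, the following is a minimal sketch that maps the two bytes of a Big5 code to a row and cell number. It is not part of this class: the class name Big5RowCell and its methods are hypothetical. The byte ranges it uses (lead byte 0xA1-0xFE; trail byte 0x40-0x7E for the initial 63 codes and 0xA1-0xFE for the remaining 94) are the conventional Big5 ranges and are an assumption here, since the excerpt does not state them.

```java
public class Big5RowCell {
    /** Cells per row: an initial group of 63 codes plus the remaining 94. */
    public static final int CELLS_PER_ROW = 63 + 94; // 157

    /** Returns the 1-based row (1..94) for a lead byte, or -1 if out of range. */
    public static int row(int lead) {
        return (lead >= 0xA1 && lead <= 0xFE) ? lead - 0xA0 : -1;
    }

    /** Returns the 1-based cell (1..157) for a trail byte, or -1 if out of range. */
    public static int cell(int trail) {
        if (trail >= 0x40 && trail <= 0x7E) {
            return trail - 0x40 + 1;   // initial group: cells 1..63
        }
        if (trail >= 0xA1 && trail <= 0xFE) {
            return trail - 0xA1 + 64;  // remaining group: cells 64..157
        }
        return -1;
    }

    /** Counts positions from cell 1 of firstRow through lastCell of lastRow. */
    public static int count(int firstRow, int lastRow, int lastCell) {
        return (lastRow - firstRow) * CELLS_PER_ROW + lastCell;
    }

    public static void main(String[] args) {
        // 0xC67E: last Level 1 hanzi according to the excerpt
        System.out.println(row(0xC6) + "-" + cell(0x7E));  // prints "38-63"
        // 0xF9D5: last Level 2 hanzi according to the excerpt
        System.out.println(row(0xF9) + "-" + cell(0xD5));  // prints "89-116"
        System.out.println(count(4, 38, 63));              // prints "5401"
        System.out.println(count(41, 89, 116));            // prints "7652"
    }
}
```

Under these assumptions the computed positions agree with the "last is 38-63" and "last is 89-116" notes, and the position counts agree with the 5,401 and 7,652 hanzi totals given in the excerpt.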
 

Version:
2003/06/22 17:44:44
Author:
wtanaka

Field Summary
static java.lang.String CANONICAL
           
(package private) static Big5Encoding s_singleton
           
 
Fields inherited from interface net.sourceforge.wohenchan.encoding.EncodingInfoInterface
ALL_ENCODINGS
 
Constructor Summary
private Big5Encoding()
           
 
Method Summary
 java.lang.String getCanonicalString()
           
static Big5Encoding getInstance()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CANONICAL

public static final java.lang.String CANONICAL
See Also:
Constant Field Values

s_singleton

static final Big5Encoding s_singleton

Constructor Detail

Big5Encoding

private Big5Encoding()

Method Detail

getInstance

public static Big5Encoding getInstance()

getCanonicalString

public java.lang.String getCanonicalString()
Specified by:
getCanonicalString in interface EncodingInfoInterface
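
The signatures above (a private constructor, a static final s_singleton field, and a static getInstance() method) suggest a singleton. The following is a minimal stand-in sketch of that pattern, not the actual source of this class. The name Big5EncodingSketch, the value "Big5" for CANONICAL, and the assumption that getCanonicalString() returns CANONICAL are all hypothetical; this page documents none of them.

```java
public class Big5EncodingSketch {
    // Hypothetical value: the real constant is listed under "Constant Field Values".
    public static final String CANONICAL = "Big5";

    // Single shared instance, created once when the class is loaded.
    static final Big5EncodingSketch s_singleton = new Big5EncodingSketch();

    // Private constructor prevents callers from creating further instances.
    private Big5EncodingSketch() {
    }

    /** Returns the shared instance. */
    public static Big5EncodingSketch getInstance() {
        return s_singleton;
    }

    /** Assumed behavior: returns the canonical name constant. */
    public String getCanonicalString() {
        return CANONICAL;
    }

    public static void main(String[] args) {
        Big5EncodingSketch a = Big5EncodingSketch.getInstance();
        Big5EncodingSketch b = Big5EncodingSketch.getInstance();
        System.out.println(a == b);                   // prints "true"
        System.out.println(a.getCanonicalString());   // prints "Big5" in this sketch
    }
}
```

With the real class, the equivalent call would presumably be Big5Encoding.getInstance().getCanonicalString(), with the result depending on the actual value of CANONICAL.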