net.sourceforge.wohenchan.convert
Class UnihanConverterTable

java.lang.Object
  |
  +--net.sourceforge.wohenchan.convert.AbstractConverterTable
        |
        +--net.sourceforge.wohenchan.convert.UnihanConverterTable
All Implemented Interfaces:
ConverterTableInterface

public class UnihanConverterTable
extends AbstractConverterTable

Represents a converter table built from the UNIHAN.TXT file available at http://charts.unicode.org/Unihan.html

Version:
$Name: $ $Date: 2003/09/14 08:28:11 $
Author:
$Author: wtanaka $

Field Summary
private static java.lang.String CACHE_RESOURCE
           
private static int COMMENT
           
private static java.lang.String FILE_SEPARATOR
           
private static int FIRST_CHAR
           
private  java.util.Hashtable m_gbToUnicodeCharacter
          This is a Hashtable of Hashtables.
private  java.util.Hashtable m_pinyinToUnicodeCharacters
          This hashtable maps pinyin, in the input format used by ConverterTableInterface.lookupByPinyin, to a char[] holding the Unicode characters that correspond to that pinyin.
private  java.util.Hashtable m_unicodeCharacterToEnglishDefn
          This hashtable maps Character objects for Unicode characters to their String English definitions.
private  java.util.Hashtable m_unicodeCharacterToPinyin
          This hashtable maps each Unicode character to a Vector of String holding the pinyin for that character.
private static int NEW_LINE
           
private static int PRE_TAG_VALUE
           
private static byte[] s_hexValue
          Lookup table converting the literal bytes '0' through '9', 'a'-'f' and 'A'-'F' into their hexadecimal values.
private static int SECOND_CHAR
           
private static java.io.File SUCKY_HARDCODED_FILE_LOCATION
           
private static int TAG_NAME
           
private static int TAG_VALUE
           
private static int UNICODE_1
           
private static int UNICODE_2
           
private static int UNICODE_3
           
private static int UNICODE_4
           
 
Fields inherited from class net.sourceforge.wohenchan.convert.AbstractConverterTable
 
Constructor Summary
UnihanConverterTable()
          Constructs a UnihanConverterTable.
 
Method Summary
(package private)  void addEnglishDefinition(char unicodeChar, java.lang.String englishDefn)
           
(package private)  void addGB2312Lookup(byte b1, byte b2, char unicode)
          Adds a GB2312 lookup.
(package private)  void addPinyinLookup(java.lang.String pinyin, char unicode)
           
private  void generateBig5FoundEntriesFor(ConverterListener listener, byte[] big5, java.lang.Character unicodeChar, java.lang.String english)
           
private  void generateFoundEntriesFor(ConverterListener listener, java.lang.Character unicodeChar, java.lang.String english)
           
protected  void init(ConverterListener listener)
          Initializes this converter table.
private static java.lang.String normalizeUuSound(java.lang.String pinyin)
           
 void postInitLookup(java.lang.String unicode, LanguageInfoInterface language, ConverterListener listener)
           
 void postInitLookupByEnglishSubstring(java.lang.String str, ConverterListener listener)
          Locates ConverterEntryInterface objects corresponding to a given case insensitive English substring.
 void postInitLookupByPinyin(java.lang.String pinyin, ConverterListener listener)
          Looks up a dictionary entry by pinyin.
 void postInitLookupBySimplifiedChinese(java.lang.String chinese, ConverterListener listener)
          Looks up a converter table entry by Simplified Chinese.
 void postInitLookupByTraditionalChinese(java.lang.String chinese, ConverterListener listener)
          Looks up a converter table entry by Traditional Chinese.
 void postInitLookupByUnicode(java.lang.String unicode, ConverterListener listener)
          Looks up a ConverterEntry by Unicode character value.
 java.lang.String toString()
           
private  java.lang.String utf8BytesToString(java.lang.StringBuffer utf8bytes)
           
 
Methods inherited from class net.sourceforge.wohenchan.convert.AbstractConverterTable
bytesToUnicode, fireDone, fireEntryFound, fireProgressChanged, fireStatus, fireTaskChange, initInOtherThread, lookupByEnglishSubstring, lookupByPinyin, lookupBySimplifiedChinese, lookupByTraditionalChinese, unicodeToBytes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CACHE_RESOURCE

private static final java.lang.String CACHE_RESOURCE
See Also:
Constant Field Values

FILE_SEPARATOR

private static final java.lang.String FILE_SEPARATOR

SUCKY_HARDCODED_FILE_LOCATION

private static final java.io.File SUCKY_HARDCODED_FILE_LOCATION

COMMENT

private static final int COMMENT
See Also:
Constant Field Values

NEW_LINE

private static final int NEW_LINE
See Also:
Constant Field Values

FIRST_CHAR

private static final int FIRST_CHAR
See Also:
Constant Field Values

SECOND_CHAR

private static final int SECOND_CHAR
See Also:
Constant Field Values

UNICODE_1

private static final int UNICODE_1
See Also:
Constant Field Values

UNICODE_2

private static final int UNICODE_2
See Also:
Constant Field Values

UNICODE_3

private static final int UNICODE_3
See Also:
Constant Field Values

UNICODE_4

private static final int UNICODE_4
See Also:
Constant Field Values

TAG_NAME

private static final int TAG_NAME
See Also:
Constant Field Values

PRE_TAG_VALUE

private static final int PRE_TAG_VALUE
See Also:
Constant Field Values

TAG_VALUE

private static final int TAG_VALUE
See Also:
Constant Field Values

s_hexValue

private static final byte[] s_hexValue
Lookup table converting the literal bytes '0' through '9', 'a'-'f' and 'A'-'F' into their hexadecimal values.
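
A table of this kind can be sketched as follows. This is a minimal illustration, not the project's code: the class name HexLookupSketch, the method parseCodePoint, and the use of -1 as a marker for non-hex bytes are all assumptions, as is the choice to parse exactly four digits (the form used by "U+4E2D"-style code points in UNIHAN.TXT).

```java
import java.util.Arrays;

/** Hypothetical sketch of a byte-indexed hex lookup table. */
final class HexLookupSketch {
    /** Indexed by ASCII byte value; -1 marks bytes that are not hex digits (an assumption). */
    private static final byte[] HEX_VALUE = new byte[128];

    static {
        Arrays.fill(HEX_VALUE, (byte) -1);
        for (int i = 0; i < 10; i++) {
            HEX_VALUE['0' + i] = (byte) i;
        }
        for (int i = 0; i < 6; i++) {
            HEX_VALUE['a' + i] = (byte) (10 + i);
            HEX_VALUE['A' + i] = (byte) (10 + i);
        }
    }

    /** Parses four ASCII hex digits starting at offset into a single char. */
    static char parseCodePoint(byte[] ascii, int offset) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            byte b = ascii[offset + i];
            // Negative bytes are outside ASCII and can never be hex digits.
            int digit = (b >= 0) ? HEX_VALUE[b] : -1;
            if (digit < 0) {
                throw new IllegalArgumentException("not a hex digit: " + b);
            }
            value = (value << 4) | digit;
        }
        return (char) value;
    }
}
```

Indexing an array by the raw byte avoids a branch per digit class, which is presumably why a table is used rather than a chain of range checks.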


m_pinyinToUnicodeCharacters

private final java.util.Hashtable m_pinyinToUnicodeCharacters
This hashtable maps pinyin, in the input format used by ConverterTableInterface.lookupByPinyin, to a char[] holding the Unicode characters that correspond to that pinyin.

See Also:
ConverterTableInterface.lookupByPinyin(java.lang.String, net.sourceforge.wohenchan.convert.ConverterListener)

m_unicodeCharacterToPinyin

private final java.util.Hashtable m_unicodeCharacterToPinyin
This hashtable maps each Unicode character to a Vector of String holding the pinyin for that character.
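
The two pinyin tables form a forward and a reverse index over the same data. The sketch below shows one way such a pair could be kept in step. The class PinyinIndexSketch and its methods are hypothetical, and generics are used only for readability; the documented fields are raw Hashtables.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.Vector;

/** Hypothetical sketch of a pinyin-to-characters index and its reverse. */
final class PinyinIndexSketch {
    private final Hashtable<String, char[]> pinyinToChars = new Hashtable<>();
    private final Hashtable<Character, Vector<String>> charToPinyin = new Hashtable<>();

    /** Records that the given character can be read with the given pinyin. */
    void add(String pinyin, char unicode) {
        // Forward index: grow the char[] by one element.
        char[] old = pinyinToChars.get(pinyin);
        char[] grown;
        if (old == null) {
            grown = new char[] { unicode };
        } else {
            grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = unicode;
        }
        pinyinToChars.put(pinyin, grown);

        // Reverse index: append the reading to the character's Vector.
        Character key = Character.valueOf(unicode);
        Vector<String> readings = charToPinyin.get(key);
        if (readings == null) {
            readings = new Vector<>();
            charToPinyin.put(key, readings);
        }
        readings.add(pinyin);
    }

    /** Returns a copy of the characters for a reading, or an empty array. */
    char[] charsFor(String pinyin) {
        char[] found = pinyinToChars.get(pinyin);
        return found == null ? new char[0] : found.clone();
    }

    /** Returns a copy of the readings for a character, or an empty Vector. */
    Vector<String> pinyinFor(char unicode) {
        Vector<String> found = charToPinyin.get(Character.valueOf(unicode));
        return found == null ? new Vector<String>() : new Vector<String>(found);
    }
}
```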


m_unicodeCharacterToEnglishDefn

private final java.util.Hashtable m_unicodeCharacterToEnglishDefn
This hashtable maps Character objects for Unicode characters to their String English definitions.


m_gbToUnicodeCharacter

private final java.util.Hashtable m_gbToUnicodeCharacter
This is a Hashtable of Hashtables. The key to the outer hashtable is the first byte (as a Byte) of the GB character being looked up. The key to the inner hashtable is the second byte (as a Byte). The value is a Character representing the Unicode character for the given GB byte pair.
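
The two-level layout described above can be illustrated like this. It is a sketch under assumptions: the class GbLookupSketch and its add and lookup methods are hypothetical names, and generics are used for readability where the documented field is a raw Hashtable.

```java
import java.util.Hashtable;

/** Hypothetical sketch of a two-level table keyed by the two bytes of a GB character. */
final class GbLookupSketch {
    private final Hashtable<Byte, Hashtable<Byte, Character>> gbToUnicode = new Hashtable<>();

    /** Registers the Unicode character for the GB byte pair (b1, b2). */
    void add(byte b1, byte b2, char unicode) {
        Byte first = Byte.valueOf(b1);
        Hashtable<Byte, Character> inner = gbToUnicode.get(first);
        if (inner == null) {
            // First time this lead byte is seen: create its inner table.
            inner = new Hashtable<>();
            gbToUnicode.put(first, inner);
        }
        inner.put(Byte.valueOf(b2), Character.valueOf(unicode));
    }

    /** Returns the Unicode character for the byte pair, or null if unknown. */
    Character lookup(byte b1, byte b2) {
        Hashtable<Byte, Character> inner = gbToUnicode.get(Byte.valueOf(b1));
        return inner == null ? null : inner.get(Byte.valueOf(b2));
    }
}
```

A lookup costs two hash probes, and a missing first byte is rejected before the second table is consulted.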

Constructor Detail

UnihanConverterTable

public UnihanConverterTable()
                     throws java.io.IOException
Constructs a UnihanConverterTable.

Throws:
java.io.IOException - if either the download or the file read fails.
Method Detail

utf8BytesToString

private java.lang.String utf8BytesToString(java.lang.StringBuffer utf8bytes)

init

protected void init(ConverterListener listener)
             throws java.io.IOException
Initializes this converter table.

Specified by:
init in class AbstractConverterTable
Throws:
java.io.IOException

normalizeUuSound

private static java.lang.String normalizeUuSound(java.lang.String pinyin)

postInitLookupByPinyin

public void postInitLookupByPinyin(java.lang.String pinyin,
                                   ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a dictionary entry by pinyin. This won't be called until after the initialization is completed.

Specified by:
postInitLookupByPinyin in class AbstractConverterTable
Parameters:
pinyin - The input string, specified in pinyin. The string may contain the letters 'a'-'z' (lower case only), the digits '1'-'5', and space (' '). It consists of one or more pinyin syllables, separated by exactly one space each, with no leading or trailing space. Each syllable is composed of at least two letters followed by exactly one digit. The neutral tone is explicitly specified with '5'. A u with umlaut, as in luu2 (donkey, palm tree) or luu3 (drizzle), is specified with uu.
Returns:
a non-null, possibly empty, array of results for this search.
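
The format rules given for the pinyin parameter can be expressed as a single regular expression. The following is a sketch only: the class PinyinFormatSketch and the method isWellFormed are hypothetical and are not part of this API, which documents the format but does not state that it validates it.

```java
import java.util.regex.Pattern;

/** Hypothetical checker for the documented pinyin input format. */
final class PinyinFormatSketch {
    /**
     * One syllable is at least two lower-case letters followed by one tone
     * digit 1-5; syllables are joined by exactly one space.
     */
    private static final Pattern FORMAT =
            Pattern.compile("[a-z]{2,}[1-5]( [a-z]{2,}[1-5])*");

    /** Returns true if the whole string matches the documented format. */
    static boolean isWellFormed(String pinyin) {
        return pinyin != null && FORMAT.matcher(pinyin).matches();
    }
}
```

Under these rules "ni3 hao3" and "luu2" are well formed, while strings with upper-case letters, doubled spaces, a missing tone digit, or a tone digit outside '1'-'5' are not.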

postInitLookupByUnicode

public void postInitLookupByUnicode(java.lang.String unicode,
                                    ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a ConverterEntry by Unicode character value. This won't be called until after the initialization is completed.

Specified by:
postInitLookupByUnicode in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.

addPinyinLookup

void addPinyinLookup(java.lang.String pinyin,
                     char unicode)

addEnglishDefinition

void addEnglishDefinition(char unicodeChar,
                          java.lang.String englishDefn)

addGB2312Lookup

void addGB2312Lookup(byte b1,
                     byte b2,
                     char unicode)
Adds a GB2312 lookup.


generateFoundEntriesFor

private void generateFoundEntriesFor(ConverterListener listener,
                                     java.lang.Character unicodeChar,
                                     java.lang.String english)
                              throws AbortSearchException
Throws:
AbortSearchException

generateBig5FoundEntriesFor

private void generateBig5FoundEntriesFor(ConverterListener listener,
                                         byte[] big5,
                                         java.lang.Character unicodeChar,
                                         java.lang.String english)

postInitLookupBySimplifiedChinese

public void postInitLookupBySimplifiedChinese(java.lang.String chinese,
                                              ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a converter table entry by Simplified Chinese. This won't be called until after the initialization is completed.

Specified by:
postInitLookupBySimplifiedChinese in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.

postInitLookupByTraditionalChinese

public void postInitLookupByTraditionalChinese(java.lang.String chinese,
                                               ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a converter table entry by Traditional Chinese. This won't be called until after the initialization is completed.

Specified by:
postInitLookupByTraditionalChinese in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.

postInitLookup

public void postInitLookup(java.lang.String unicode,
                           LanguageInfoInterface language,
                           ConverterListener listener)

postInitLookupByEnglishSubstring

public void postInitLookupByEnglishSubstring(java.lang.String str,
                                             ConverterListener listener)
Locates ConverterEntryInterface objects corresponding to a given case-insensitive English substring. The current implementation is very inefficient.

Specified by:
postInitLookupByEnglishSubstring in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.
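
For illustration, a case-insensitive substring search over a character-to-definition table might look like the sketch below. It assumes a plain linear scan, which is one plausible reading of the inefficiency noted above but is not confirmed by this page. The class EnglishSubstringSketch and the method find are hypothetical, and the sketch returns a list where the documented method reports results through its ConverterListener.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical linear-scan search over English definitions. */
final class EnglishSubstringSketch {
    /** Returns every character whose definition contains query, ignoring case. */
    static List<Character> find(Hashtable<Character, String> definitions, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<Character> matches = new ArrayList<>();
        // Every definition is lower-cased and scanned on every call, so the
        // cost grows with the total length of all definitions.
        for (Map.Entry<Character, String> entry : definitions.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}
```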

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object