net.sourceforge.wohenchan.convert
Class CedictConverterTable

java.lang.Object
  |
  +--net.sourceforge.wohenchan.convert.AbstractConverterTable
        |
        +--net.sourceforge.wohenchan.convert.CedictConverterTable
All Implemented Interfaces:
ConverterTableInterface
Direct Known Subclasses:
CedictB5ConverterTable, CedictGBConverterTable

public class CedictConverterTable
extends AbstractConverterTable

This class implements the search functions of ConverterTableInterface. It can search by GB2312, Pinyin, and English. The dictionary entry vector is passed in when this class is constructed. This class handles the cedict.GB file and assumes that every line containing both '[' and ']' is a dictionary entry. A line that contains '[' and ']' but is not an entry may therefore cause problems.
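The entry heuristic described above can be illustrated with a minimal sketch. The class name EntryLineSketch and the method looksLikeEntry are hypothetical and are not part of this package; the sample lines are invented for illustration. The sketch only shows the stated assumption and the failure case it leads to.

```java
// Hypothetical sketch, not the library's code: a line is treated as a
// dictionary entry when it contains both '[' and ']'.
final class EntryLineSketch {

    /** Returns true when the line contains both '[' and ']'. */
    static boolean looksLikeEntry(String line) {
        return line.indexOf('[') >= 0 && line.indexOf(']') >= 0;
    }

    public static void main(String[] args) {
        // A typical entry: Chinese word, bracketed pinyin, English definitions.
        System.out.println(looksLikeEntry("\u4e2d\u56fd [zhong1 guo2] /China/")); // true
        // A plain comment line is skipped.
        System.out.println(looksLikeEntry("# header comment"));                   // false
        // A comment that happens to contain brackets is misclassified.
        System.out.println(looksLikeEntry("# see [readme] for details"));         // true
    }
}
```

The last call shows the problem case mentioned in the description: a non-entry line with brackets passes the check.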

Version:
$Name: $ $Date: 2003/09/08 20:56:19 $
Author:
$Author: wtanaka $

Field Summary
private  java.util.Vector cedictEntryVector
           
private  java.io.InputStream cedictFileInput
           
private  char ENG_DEFINITION_SEPARATOR
           
private static java.lang.String FILE_SEPARATOR
           
private  long fileSize
           
private  java.lang.String m_cacheResource
          Name of resource in cache.
private  java.lang.String m_encoding
           
private  java.io.File m_file
          File of source data on local disk.
private  java.lang.String m_readStatusMsg
           
private  java.lang.String m_source
           
private  java.lang.String m_url
          URL for original cedict source data.
private  java.lang.String m_zipentry
           
private  ConverterTableInterface table
           
 
Fields inherited from class net.sourceforge.wohenchan.convert.AbstractConverterTable
 
Constructor Summary
CedictConverterTable(java.lang.String cedictzipurl, java.lang.String zipentry, java.io.File cedictfile, java.lang.String cacheResourceName, java.lang.String encoding, java.lang.String sourceName)
           
 
Method Summary
 void init(ConverterListener listener)
          Initialization method, to be overridden by subclasses.
private  void parseLine(ByteVector linev)
          Parse a line from cedict.GB.
private  boolean pinyinRoughlyMatches(java.lang.String searchString, java.lang.String possibleMatch)
           
private  void postInitLookupByChinese(java.lang.String chinese, ConverterListener listener)
           
 void postInitLookupByEnglishSubstring(java.lang.String str, ConverterListener listener)
          Locates ConverterEntryInterface objects corresponding to a given case insensitive English substring.
 void postInitLookupByPinyin(java.lang.String pinyin, ConverterListener listener)
          Looks up a dictionary entry by pinyin.
 void postInitLookupBySimplifiedChinese(java.lang.String chinese, ConverterListener listener)
          Looks up a converter table entry by simplified Chinese.
 void postInitLookupByTraditionalChinese(java.lang.String chinese, ConverterListener listener)
          Looks up a converter table entry by traditional Chinese.
 void postInitLookupByUnicode(java.lang.String unicode, ConverterListener listener)
          Looks up a ConverterEntry by Unicode character value.
private  void readInFileByLine(ConverterListener listener)
          Assume that each line in cedict.GB is an entry.
 java.lang.String toString()
           
private  void updateProgress(ConverterListener listener, int current, int max)
           
 
Methods inherited from class net.sourceforge.wohenchan.convert.AbstractConverterTable
bytesToUnicode, fireDone, fireEntryFound, fireProgressChanged, fireStatus, fireTaskChange, initInOtherThread, lookupByEnglishSubstring, lookupByPinyin, lookupBySimplifiedChinese, lookupByTraditionalChinese, unicodeToBytes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ENG_DEFINITION_SEPARATOR

private final char ENG_DEFINITION_SEPARATOR
See Also:
Constant Field Values

FILE_SEPARATOR

private static final java.lang.String FILE_SEPARATOR

m_source

private java.lang.String m_source

m_readStatusMsg

private java.lang.String m_readStatusMsg

m_encoding

private java.lang.String m_encoding

m_cacheResource

private java.lang.String m_cacheResource
Name of resource in cache. Examples: cedict.GB, cedict.b5


m_file

private java.io.File m_file
File of source data on local disk. Examples: /tmp/cedict.GB, /tmp/cedict.b5, C:\WINDOWS\DESKTOP\cedict.GB


m_url

private java.lang.String m_url
URL for original cedict source data. Examples: http://www.cs.cmu.edu/~eepeter/cedictgb.zip, http://www.cs.cmu.edu/~eepeter/cedictb5.zip


m_zipentry

private java.lang.String m_zipentry

cedictFileInput

private java.io.InputStream cedictFileInput

cedictEntryVector

private java.util.Vector cedictEntryVector

fileSize

private long fileSize

table

private ConverterTableInterface table
Constructor Detail

CedictConverterTable

public CedictConverterTable(java.lang.String cedictzipurl,
                            java.lang.String zipentry,
                            java.io.File cedictfile,
                            java.lang.String cacheResourceName,
                            java.lang.String encoding,
                            java.lang.String sourceName)
Method Detail

init

public void init(ConverterListener listener)
          throws java.io.IOException
Description copied from class: AbstractConverterTable
Initialization method, to be overridden by subclasses.

Specified by:
init in class AbstractConverterTable
Throws:
java.io.IOException

readInFileByLine

private void readInFileByLine(ConverterListener listener)
                       throws AbortSearchException
Assumes that each line in cedict.GB is an entry. Reads cedict.GB byte by byte and, on reaching the end of a line, parses the line just read into a cedictEntry.

Throws:
AbortSearchException
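As an illustration of the byte-by-byte reading described for readInFileByLine, the following is a minimal sketch under stated assumptions. It uses java.io.ByteArrayOutputStream in place of the project's ByteVector, collects lines in a list instead of parsing them immediately, and omits progress reporting and AbortSearchException. The class name LineReaderSketch and the method readLines are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the library's code: split a stream into lines
// one byte at a time, keeping each line as raw bytes so that decoding
// (for example from GB2312) can happen later, per line.
final class LineReaderSketch {

    static List<byte[]> readLines(InputStream in) throws IOException {
        List<byte[]> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                // End of line reached: hand over the bytes read so far.
                lines.add(current.toByteArray());
                current.reset();
            } else if (b != '\r') {
                // Carriage returns are dropped so CRLF files behave like LF files.
                current.write(b);
            }
        }
        // A final line without a trailing newline is still a line.
        if (current.size() > 0) {
            lines.add(current.toByteArray());
        }
        return lines;
    }
}
```

In a real reader each completed line would be passed to the line parser at the point where this sketch adds it to the list.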

parseLine

private void parseLine(ByteVector linev)
                throws java.io.UnsupportedEncodingException
Parses a line from cedict.GB. When a Chinese word has multiple English definitions, one entry is created for each English definition.

Parameters:
linev - a ByteVector containing one line of cedict.GB
Throws:
java.io.UnsupportedEncodingException
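The one-entry-per-definition behaviour can be sketched as follows. Several things here are assumptions and not taken from this page: the line layout (Chinese word, pinyin in brackets, then English definitions), the use of '/' as the definition separator (the value of ENG_DEFINITION_SEPARATOR is not shown here), and the names ParseLineSketch and Entry. A byte array stands in for ByteVector.

```java
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the library's code.
final class ParseLineSketch {

    // Assumption: definitions are separated by '/'.
    static final char ENG_SEPARATOR = '/';

    /** One (Chinese, pinyin, English) triple. */
    static final class Entry {
        final String chinese;
        final String pinyin;
        final String english;

        Entry(String chinese, String pinyin, String english) {
            this.chinese = chinese;
            this.pinyin = pinyin;
            this.english = english;
        }
    }

    /** Decodes one line and returns one Entry per English definition. */
    static List<Entry> parseLine(byte[] lineBytes, String encoding)
            throws UnsupportedEncodingException {
        List<Entry> entries = new ArrayList<>();
        String text = new String(lineBytes, encoding).trim();
        int open = text.indexOf('[');
        int close = text.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return entries; // not an entry line
        }
        String chinese = text.substring(0, open).trim();
        String pinyin = text.substring(open + 1, close).trim();
        String rest = text.substring(close + 1);
        String separator = Pattern.quote(String.valueOf(ENG_SEPARATOR));
        for (String definition : rest.split(separator)) {
            String english = definition.trim();
            if (!english.isEmpty()) {
                entries.add(new Entry(chinese, pinyin, english));
            }
        }
        return entries;
    }
}
```

Under these assumptions, a line with the definitions /China/Middle Kingdom/ yields two entries that share the same Chinese word and pinyin.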

pinyinRoughlyMatches

private boolean pinyinRoughlyMatches(java.lang.String searchString,
                                     java.lang.String possibleMatch)

updateProgress

private void updateProgress(ConverterListener listener,
                            int current,
                            int max)
                     throws AbortSearchException
Throws:
AbortSearchException

postInitLookupByPinyin

public void postInitLookupByPinyin(java.lang.String pinyin,
                                   ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a dictionary entry by pinyin. This won't be called until after the initialization is completed.

Specified by:
postInitLookupByPinyin in class AbstractConverterTable
Parameters:
pinyin - The input string, specified in pinyin. The string may contain only the lowercase letters 'a'-'z', the digits '1'-'5', and the space character (' '). It consists of one or more pinyin syllables separated by exactly one space, with no leading or trailing space. Each syllable is composed of at least two letters followed by exactly one digit. The neutral tone is explicitly specified with '5'. The letter u with umlaut, as in luu2 (donkey, palm tree) or luu3 (drizzle), is specified with uu.
Returns:
a non-null, possibly empty, array of results for this search.
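The input format required for the pinyin parameter can be expressed as a regular expression. The following is a minimal sketch that checks a query against the rules as written above; the class name PinyinFormatSketch and the method isWellFormed are hypothetical, and nothing on this page indicates that the library performs such a check itself.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the library's code: validates a pinyin query
// against the documented input rules.
final class PinyinFormatSketch {

    // One syllable: at least two lowercase letters, then one tone digit 1-5.
    // A query: one or more syllables separated by exactly one space.
    private static final Pattern QUERY =
            Pattern.compile("[a-z]{2,}[1-5]( [a-z]{2,}[1-5])*");

    static boolean isWellFormed(String pinyin) {
        // matches() requires the whole string to fit, which rules out
        // leading and trailing spaces.
        return QUERY.matcher(pinyin).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("zhong1 guo2"));  // true
        System.out.println(isWellFormed("luu2"));         // true, uu stands for u with umlaut
        System.out.println(isWellFormed("Zhong1 guo2"));  // false, uppercase letter
        System.out.println(isWellFormed("zhong1  guo2")); // false, two spaces
        System.out.println(isWellFormed("ma"));           // false, tone digit missing
    }
}
```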

postInitLookupByEnglishSubstring

public void postInitLookupByEnglishSubstring(java.lang.String str,
                                             ConverterListener listener)
Description copied from class: AbstractConverterTable
Locates ConverterEntryInterface objects corresponding to a given case insensitive English substring. This won't be called until after the initialization is completed.

Specified by:
postInitLookupByEnglishSubstring in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.
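A case-insensitive substring search of the kind described above can be sketched as follows. This is an illustration only: it filters plain strings and returns them, whereas the method on this page takes a ConverterListener and works with ConverterEntryInterface objects. The names EnglishSubstringSketch and matching are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the library's code: keeps the definitions that
// contain the query, ignoring case.
final class EnglishSubstringSketch {

    static List<String> matching(List<String> definitions, String query) {
        // Lowercase both sides with a fixed locale so the comparison does
        // not depend on the default locale of the machine.
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String definition : definitions) {
            if (definition.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(definition);
            }
        }
        return hits;
    }
}
```

For example, the query "CHIN" matches both "China" and "machine" but not "Middle Kingdom", since the match is on any substring and not on whole words.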

postInitLookupByChinese

private void postInitLookupByChinese(java.lang.String chinese,
                                     ConverterListener listener)

postInitLookupByTraditionalChinese

public void postInitLookupByTraditionalChinese(java.lang.String chinese,
                                               ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a converter table entry by traditional Chinese. This won't be called until after the initialization is completed.

Specified by:
postInitLookupByTraditionalChinese in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.

postInitLookupBySimplifiedChinese

public void postInitLookupBySimplifiedChinese(java.lang.String chinese,
                                              ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a converter table entry by simplified Chinese. This won't be called until after the initialization is completed.

Specified by:
postInitLookupBySimplifiedChinese in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.

postInitLookupByUnicode

public void postInitLookupByUnicode(java.lang.String unicode,
                                    ConverterListener listener)
Description copied from class: AbstractConverterTable
Looks up a ConverterEntry by Unicode character value. This won't be called until after the initialization is completed.

Specified by:
postInitLookupByUnicode in class AbstractConverterTable
Returns:
a non-null, possibly empty, array of results for this search.

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object