net.sourceforge.wohenchan.encoding
Class EncodingGuesser

java.lang.Object
  |
  +--net.sourceforge.wohenchan.encoding.EncodingGuesser

public class EncodingGuesser
extends java.lang.Object

Version:
$Name: $ $Date: 2003/06/22 17:40:26 $
Author:
$Author: wtanaka $

Field Summary
private java.lang.String ENUS
private java.lang.String PINYIN
private java.lang.String[] pinyinArray
private java.lang.String UPLUS
private java.lang.String ZHCN
private java.lang.String ZHTW

Constructor Summary
EncodingGuesser()

Method Summary
void addRecognizedEncoding(EncodingInfoInterface encodingInfo)
          Adds the given encodingInfo as a recognized encoding for this Guesser.
private int gbProb(byte[] inputString)
java.lang.String[] guessEncodings(byte[] input)
          Returns the likely encodings for the given byte array.
private boolean lookForPinyin(java.lang.String in)
static void main(java.lang.String[] args)
private int pinyinProb(byte[] inputString)
private int utf8Prob(byte[] inputString)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ENUS

private final java.lang.String ENUS
See Also:
Constant Field Values

ZHCN

private final java.lang.String ZHCN
See Also:
Constant Field Values

ZHTW

private final java.lang.String ZHTW
See Also:
Constant Field Values

PINYIN

private final java.lang.String PINYIN
See Also:
Constant Field Values

UPLUS

private final java.lang.String UPLUS
See Also:
Constant Field Values

pinyinArray

private final java.lang.String[] pinyinArray

Constructor Detail

EncodingGuesser

public EncodingGuesser()

Method Detail

addRecognizedEncoding

public void addRecognizedEncoding(EncodingInfoInterface encodingInfo)
Adds the given encodingInfo as a recognized encoding for this Guesser.


guessEncodings

public java.lang.String[] guessEncodings(byte[] input)
                                  throws NoLikelyEncodingException
Returns the likely encodings for the given byte array. A possible optimization: handle the case where the array passed in has, as a prefix, the array passed to the previous call of this method. Open question: should this return an EncodingInfoInterface[] instead?

Returns:
A non-empty array (WHERE self.length > 0). Currently it contains only the name of the single most likely encoding; it could be changed to return all candidate encodings.
Throws:
NoLikelyEncodingException - if there are no likely encodings for the given input.
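A minimal sketch of the documented contract for addRecognizedEncoding and guessEncodings: keep a list of recognized encodings, score the input against each, and return a one-element array holding the best name, or throw when nothing scores above zero. EncodingInfoInterface's real methods are not documented here, so EncodingScorer (with name() and score()), NoLikelyEncodingSketchException and GuesserSketch are hypothetical stand-ins, not this class's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for EncodingInfoInterface; the real interface's methods are not shown.
interface EncodingScorer {
    String name();            // encoding name to report, e.g. "UTF-8" (assumed)
    int score(byte[] input);  // higher = more likely; 0 = not this encoding (assumed)
}

// Hypothetical stand-in for NoLikelyEncodingException.
class NoLikelyEncodingSketchException extends Exception {
    NoLikelyEncodingSketchException(String message) { super(message); }
}

final class GuesserSketch {
    private final List<EncodingScorer> recognized = new ArrayList<>();

    // Mirrors addRecognizedEncoding: remember one more candidate encoding.
    void addRecognizedEncoding(EncodingScorer info) {
        recognized.add(info);
    }

    // Mirrors the documented return value: a non-empty array holding only the
    // most likely name; throws when no candidate scores above zero.
    String[] guessEncodings(byte[] input) throws NoLikelyEncodingSketchException {
        EncodingScorer best = null;
        int bestScore = 0;
        for (EncodingScorer candidate : recognized) {
            int s = candidate.score(input);
            if (s > bestScore) {  // strict '>' keeps the earlier candidate on ties
                best = candidate;
                bestScore = s;
            }
        }
        if (best == null) {
            throw new NoLikelyEncodingSketchException(
                "no likely encoding for " + input.length + " bytes");
        }
        return new String[] { best.name() };
    }
}
```

Returning every candidate with a positive score, sorted by score, would be the natural extension mentioned under Returns.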

gbProb

private int gbProb(byte[] inputString)
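The page gives no description for gbProb. One plausible reading of the name is a GB2312 likelihood score; the hypothetical sketch below assumes that and scores EUC-CN structure: the percentage of high (0x80 and above) bytes that form valid pairs with a lead byte in 0xA1-0xF7 and a trail byte in 0xA1-0xFE. The 0-100 scale is an assumption.

```java
final class GbProbSketch {
    // Returns 0..100: share of high (>= 0x80) bytes that form well-formed
    // EUC-CN (GB2312) pairs, lead 0xA1-0xF7 and trail 0xA1-0xFE.
    static int gbProb(byte[] input) {
        int highBytes = 0;
        int pairedBytes = 0;
        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF;
            if (b < 0x80) {  // ASCII bytes are neutral
                i++;
                continue;
            }
            highBytes++;
            if (b >= 0xA1 && b <= 0xF7 && i + 1 < input.length) {
                int t = input[i + 1] & 0xFF;
                if (t >= 0xA1 && t <= 0xFE) {
                    highBytes++;  // the trail byte is also a high byte
                    pairedBytes += 2;
                    i += 2;
                    continue;
                }
            }
            i++;  // lone or malformed high byte
        }
        return highBytes == 0 ? 0 : (100 * pairedBytes) / highBytes;
    }
}
```

Pure-ASCII input scores 0, since it carries no evidence for or against GB2312.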

utf8Prob

private int utf8Prob(byte[] inputString)
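utf8Prob is likewise undocumented. Assuming it scores how UTF-8-like the input is, a hypothetical sketch on the same 0-100 scale counts the high bytes that belong to well-formed UTF-8 sequences (lead 0xC2-0xDF, 0xE0-0xEF or 0xF0-0xF4, each followed by the right number of 0x80-0xBF continuation bytes). As a simplification it does not reject overlong or surrogate encodings.

```java
final class Utf8ProbSketch {
    // Returns 0..100: share of high (>= 0x80) bytes inside well-formed UTF-8 sequences.
    // Simplified: checks lead/continuation byte ranges only, not overlong or surrogate forms.
    static int utf8Prob(byte[] input) {
        int highBytes = 0;
        int goodBytes = 0;
        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF;
            if (b < 0x80) {  // ASCII is valid in UTF-8 but not evidence for it
                i++;
                continue;
            }
            int len;
            if (b >= 0xC2 && b <= 0xDF) len = 2;
            else if (b >= 0xE0 && b <= 0xEF) len = 3;
            else if (b >= 0xF0 && b <= 0xF4) len = 4;
            else len = 0;  // stray continuation byte or invalid lead byte
            boolean ok = len > 0 && i + len <= input.length;
            for (int k = 1; ok && k < len; k++) {
                int c = input[i + k] & 0xFF;
                ok = c >= 0x80 && c <= 0xBF;
            }
            if (ok) {
                highBytes += len;
                goodBytes += len;
                i += len;
            } else {
                highBytes++;  // count the bad byte and resynchronize on the next one
                i++;
            }
        }
        return highBytes == 0 ? 0 : (100 * goodBytes) / highBytes;
    }
}
```

Because GB2312 trail bytes (0xA1-0xFE) mostly fall outside the UTF-8 continuation range, GB2312 text tends to score low here, which is what lets the two scores be compared.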

pinyinProb

private int pinyinProb(byte[] inputString)

lookForPinyin

private boolean lookForPinyin(java.lang.String in)
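pinyinProb and lookForPinyin are also undocumented; the pinyinArray field suggests they compare text against a list of pinyin syllables. The hypothetical sketch below assumes that: a word counts as pinyin if it can be split entirely into known toneless syllables (at most six letters each, as in "zhuang"). SYLLABLES is a tiny illustrative subset, not the contents of pinyinArray, and the 0-100 scale is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class PinyinSketch {
    // Hypothetical, tiny subset of toneless pinyin syllables (illustration only).
    private static final Set<String> SYLLABLES = new HashSet<>(Arrays.asList(
        "ni", "hao", "zhong", "guo", "wo", "shi", "ren", "de", "bei", "jing", "xie"));

    // True if the word splits entirely into known syllables (recursive segmentation).
    static boolean isPinyinWord(String word) {
        if (word.isEmpty()) return true;
        for (int end = 1; end <= Math.min(6, word.length()); end++) {
            if (SYLLABLES.contains(word.substring(0, end))
                    && isPinyinWord(word.substring(end))) {
                return true;
            }
        }
        return false;
    }

    // Mirrors lookForPinyin: does any letter run in the text look like pinyin?
    static boolean lookForPinyin(String in) {
        for (String word : in.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty() && isPinyinWord(word)) return true;
        }
        return false;
    }

    // Mirrors pinyinProb: percentage (0..100) of letter runs that segment into syllables.
    // Non-ASCII bytes decode to U+FFFD and therefore act as word separators.
    static int pinyinProb(byte[] input) {
        String text = new String(input, StandardCharsets.US_ASCII).toLowerCase(Locale.ROOT);
        int words = 0;
        int pinyinWords = 0;
        for (String word : text.split("[^a-z]+")) {
            if (word.isEmpty()) continue;
            words++;
            if (isPinyinWord(word)) pinyinWords++;
        }
        return words == 0 ? 0 : (100 * pinyinWords) / words;
    }
}
```

Segmenting whole words rather than matching individual tokens handles run-together spellings such as "zhongguo" and "beijing".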

main

public static void main(java.lang.String[] args)
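main is undocumented; a typical use would be a command-line driver that reads files and prints a guess. The hypothetical sketch below does that with a different, simpler technique than this class's probability scoring: it tries the JDK's strict decoders (CodingErrorAction.REPORT) in an assumed order and reports the first charset that decodes every byte. The candidate list and class name are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

final class GuessMainSketch {
    // Assumed candidate order: the first charset that decodes strictly wins.
    private static final String[] CANDIDATES = { "UTF-8", "GB2312", "Big5" };

    // Returns the first candidate whose strict decoder accepts every byte, or null.
    static String firstCleanDecode(byte[] bytes) {
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) continue;  // some runtimes omit extended charsets
            try {
                Charset.forName(name).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
                return name;
            } catch (CharacterCodingException e) {
                // malformed or unmappable in this charset; try the next candidate
            }
        }
        return null;
    }

    // Usage: java GuessMainSketch file1 [file2 ...]
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            String guess = firstCleanDecode(bytes);
            System.out.println(path + ": " + (guess == null ? "no likely encoding" : guess));
        }
    }
}
```

Strict decoding only says whether a charset is possible, not how likely it is; pure-ASCII input, for example, is accepted by the first candidate. That gap is why a scoring approach like gbProb and utf8Prob is useful.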