Wo3 Hen3 Chan2: Design
Individual data sources are abstracted with a set of interfaces found
in net.sourceforge.wohenchan.convert.
The interfaces are:
- ConverterEntryInterface
- ConverterFactoryInterface
- ConverterTableInterface
The driving idea behind these interfaces is that they be as simple as
possible while still generalizing functionality found in all of the
underlying data sources. A secondary concern is that they make the
requirements easy to implement.
Proposal, attempt #1
While thinking about task 21232, I would like to resolve the following
issues:
- How to allow for external searches, such as to sunrain.net?
- How to allow for results to come back as they are found, rather
than all at once? For example, if we are in the middle of downloading
or reading from disk a large file (such as Unihan.txt). Currently,
this might be done with some crazy overloading of
ConverterEntryInterface, but this would require knowing how many
results there will be ahead of time.
- How to display status and errors for the results of external
searches, while still allowing results to be returned?
I would like to propose the following means of helping deal with these
issues:
- All methods in ConverterTableInterface have their return types
changed to ConverterResultsInterface, described below.
- All methods on ConverterTableInterface are given an extra
parameter of type ConverterListener, also described below.
- Deprecate and remove ConverterFactoryInterface.
- Require all ConverterTableInterface implementations to implement a
default constructor. This constructor must not block on IO or take a
long time to complete.
- Add a method init() to ConverterTableInterface which will be
called in a thread and will do any long-running initialization needed
(downloading files, reading in files, etc.).
- Temporarily, all ConverterFactoryInterface methods are moved into
ConverterTableInterface. These may be deprecated later if it turns
out that we only want to do status through ConverterResultsInterface.
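The constructor and init() rules above could look roughly like the
following. This is only a sketch: SketchTable, startInit() and
awaitReady() are made-up names for illustration, and the real
ConverterTableInterface methods are left out.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a ConverterTableInterface implementation.
class SketchTable {
    private volatile boolean initialized = false;
    private final CountDownLatch ready = new CountDownLatch(1);

    // Default constructor: no IO, returns immediately.
    SketchTable() { }

    // Long-running setup, meant to be called on a worker thread.
    // A real table would download or parse its dictionary file here.
    void init() {
        initialized = true;
        ready.countDown();
    }

    boolean isInitialized() {
        return initialized;
    }

    // Assumed helper: runs init() on a background thread.
    Thread startInit() {
        Thread worker = new Thread(this::init, "table-init");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    // Assumed helper: blocks the caller until init() has finished.
    void awaitReady() {
        try {
            ready.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The caller constructs the table on the UI thread, calls startInit(),
and only blocks (if at all) in awaitReady().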
Here are the proposed contents of ConverterResultsInterface:
public interface ConverterResultsInterface
{
}
Here are the proposed contents of ConverterListener:
public interface ConverterListener
{
    /**
     * Called to inform interested parties that something happened
     * w.r.t. this search.
     **/
    void handleConverterEvent (ConverterEvent evt);
}
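public class ConverterEvent
{
    public static class EventType
    {
        private EventType() {}
        /**
         * Once this event happens, there should be no more events.
         **/
        public static final EventType SEARCH_DONE = new EventType();
        /**
         * This is an update to the percentage progress of the current
         * task. Because a search may involve more than one task, a
         * value of 100 here does not mean SEARCH_DONE is imminent;
         * there may be more tasks.
         **/
        public static final EventType PROGRESS_UPDATE = new EventType();
        /**
         * The current task has changed (e.g. "downloading unihan.txt",
         * "searching", "parsing result page", etc.).
         **/
        public static final EventType TASK_CHANGE = new EventType();
        /**
         * An entry has been found.
         **/
        public static final EventType ENTRY_FOUND = new EventType();
        /**
         * A problem or unusual condition has occurred.
         **/
        public static final EventType STATUS_MESSAGE = new EventType();
    }
    private final EventType type;
    private final ConverterEntryInterface result;
    private final String status;
    private final String taskName;
    private final int progress;
    // Assumption: the event type is passed to the constructor. The
    // proposal does not yet say how the type is set.
    public ConverterEvent (EventType type,
                           ConverterEntryInterface result,
                           String status,
                           String taskName,
                           int progress)
    {
        this.type = type;
        this.result = result;
        this.status = status;
        this.taskName = taskName;
        this.progress = progress;
    }
    public EventType getEventType()
    {
        return type;
    }
    public ConverterEntryInterface getEntry()
    {
        return result;
    }
    public String getStatusMessage()
    {
        return status;
    }
}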
This is, in essence, a union of two streams -- one for
ConverterEntryInterface and one for errors.
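To make the "union of two streams" idea concrete, here is a rough
sketch of a listener that splits the events back into an entry stream
and a message stream. The types are cut-down stand-ins, not the real
ones: EventType is written as a Java enum for brevity, ConverterEvent
carries only a type, an entry and a status, and SplittingListener is a
made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real entry type; its methods are not shown here.
interface ConverterEntryInterface { }

// Assumption: a Java enum in place of the static EventType class.
enum EventType { SEARCH_DONE, PROGRESS_UPDATE, TASK_CHANGE, ENTRY_FOUND, STATUS_MESSAGE }

// Reduced stand-in for ConverterEvent: type, entry and status only.
final class ConverterEvent {
    private final EventType type;
    private final ConverterEntryInterface entry;
    private final String status;

    ConverterEvent(EventType type, ConverterEntryInterface entry, String status) {
        this.type = type;
        this.entry = entry;
        this.status = status;
    }

    EventType getEventType() { return type; }
    ConverterEntryInterface getEntry() { return entry; }
    String getStatusMessage() { return status; }
}

interface ConverterListener {
    void handleConverterEvent(ConverterEvent evt);
}

// Hypothetical listener that separates the two streams again.
class SplittingListener implements ConverterListener {
    final List<ConverterEntryInterface> entries = new ArrayList<>();
    final List<String> messages = new ArrayList<>();
    boolean done = false;

    @Override
    public void handleConverterEvent(ConverterEvent evt) {
        switch (evt.getEventType()) {
            case ENTRY_FOUND:
                entries.add(evt.getEntry());
                break;
            case STATUS_MESSAGE:
                messages.add(evt.getStatusMessage());
                break;
            case SEARCH_DONE:
                done = true;
                break;
            default:
                break; // progress and task changes are ignored in this sketch
        }
    }
}
```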
There are several possible ways to structure the streams:
- separate, blocking. e.g. return a cursor/enumeration with methods
like getNextInfoMessage(), getNextResult(), isDone()
- multiplexed, blocking. e.g. return a cursor/enumeration with
methods like getNextEvent(), isDone()
- nonblocking. e.g. return a cursor/enumeration with methods like
getNextEvent() throws NoEventReadyException, isDone(),
isEventAvailable()
- callback. e.g. pass in a listener interface which receives events
as things happen.
The reasoning behind picking the callback implementation was:
- Callback requires fewer threads in some cases. Because the
processing of results is, by default, happening on the
- Callback puts the thread strategy in the hands of the user of the
interface, not the implementor of the interface
- Callback is easier to implement -- no heavyweight cursor objects
containing all of the dictionary traversal logic, no complex
multi-thread conversions from a method which is visiting elements in
the dictionary to a queue-like producer/consumer cursor object.
- Callback does not require polling, in this particular application.
If we returned a cursor-like object, we would either need to start up
a bunch of threads just to step through the various cursors returned
from different converter tables, or poll each cursor in turn.
Unfortunately,
- Callback will be harder to aggregate later on. It might make
sense to try to keep the interface generic enough such that a pull
based query object may be wrapped around the callback interface in
such a way as to make aggregation of converter tables easier in the
future.
- If the ConverterTable is still being initialized, we can't return
anything to track its status. Actually, this isn't true if we
consider the initialization to be part of the search procedure, just
a longer part for the first searches, until it's done. There might
be a way to make a common base class for Unihan and Cedict, if their
strategies for reading files in during init are going to be similar.
- Callback is not well suited to JProgressBar.
In order to try to get around some of the issues brought up above, we
introduce another object:
public class ConverterCursor implements ConverterListener
{
    /**
     * Blocks until there is a new event available. If we have
     * reached the end, throws SearchFinishedException.
     **/
    public ConverterEvent getNextEvent() throws SearchFinishedException;
}
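One possible way to build such a cursor on top of the callback
interface is a blocking queue: the listener side enqueues, the cursor
side dequeues. The following is a sketch under assumptions, not the
project's implementation. The event types are cut-down stand-ins,
getNextEvent() also declares InterruptedException (which the signature
above does not), and CursorDemo is a made-up driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Reduced stand-ins for the types described in the text.
enum EventType { ENTRY_FOUND, STATUS_MESSAGE, SEARCH_DONE }

final class ConverterEvent {
    private final EventType type;
    private final String status;
    ConverterEvent(EventType type, String status) { this.type = type; this.status = status; }
    EventType getEventType() { return type; }
    String getStatusMessage() { return status; }
}

interface ConverterListener { void handleConverterEvent(ConverterEvent evt); }

class SearchFinishedException extends Exception { }

class ConverterCursor implements ConverterListener {
    private final BlockingQueue<ConverterEvent> queue = new LinkedBlockingQueue<>();
    private boolean finished = false; // touched only by the consuming thread

    // Producer side: called from the converter table's thread.
    @Override
    public void handleConverterEvent(ConverterEvent evt) {
        queue.add(evt);
    }

    // Consumer side: blocks until an event arrives; SEARCH_DONE ends the stream.
    public ConverterEvent getNextEvent() throws SearchFinishedException, InterruptedException {
        if (finished) throw new SearchFinishedException();
        ConverterEvent evt = queue.take();
        if (evt.getEventType() == EventType.SEARCH_DONE) {
            finished = true;
            throw new SearchFinishedException();
        }
        return evt;
    }
}

// Hypothetical driver: a producer thread feeds the cursor, the caller drains it.
class CursorDemo {
    static List<String> drain() {
        ConverterCursor cursor = new ConverterCursor();
        Thread producer = new Thread(() -> {
            cursor.handleConverterEvent(new ConverterEvent(EventType.ENTRY_FOUND, "first"));
            cursor.handleConverterEvent(new ConverterEvent(EventType.ENTRY_FOUND, "second"));
            cursor.handleConverterEvent(new ConverterEvent(EventType.SEARCH_DONE, null));
        });
        producer.start();
        List<String> seen = new ArrayList<>();
        try {
            while (true) {
                seen.add(cursor.getNextEvent().getStatusMessage());
            }
        } catch (SearchFinishedException e) {
            // normal end of the search
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

The queue keeps the thread strategy on the caller's side: the table
only ever calls handleConverterEvent(), and whoever wants pull
semantics pays for the blocking.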
This design will have to satisfy several users on both sides:
- Unihan/Cedict (large initialization cost, index in memory). In
this case, initialization is happening in the background no matter
what, in order to satisfy the requirement that any initialization
involving dictionary data start as soon as possible. Initialization
can happen in a thread started by the constructor of the
ConverterTable.
- init() will be implemented in a common abstract base class. It
will construct a new thread object, which it will maintain a
reference to. Another method on the base class will be used to
register listeners passed in to calls to searches with
the call during initialization.
- currentTask(): during initialization, this would delegate to
the main ConverterTable object. After initialization, this
would not matter too much (since the search should be quick).
But if this assumption is wrong, it will follow the strategy of
Sunrain.
- getPercentProgress(): during initialization, would delegate to
the main ConverterTable object. After initialization, this
shouldn't matter too much but, if it does, will follow the
strategy of Sunrain.
- isDone():
- Sunrain (dynamic lookup, no storage). In this case,
initialization happening as soon as the search method is called is
ideal, in order to minimize the perceived latency.
- Wohenchan (user interface)
Bah. This sucks.
Proposal, attempt #2
A ConverterTableInterface:
- Can receive input. This could be either a bulk populate or an
input from an external source triggered by a particular lookup
invocation.
- Can service lookup requests. Conceptually, a lookup takes as
input a specification for desired items, requests no or some input,
waits for no or some amount of input, and outputs the items matching
the specification, over some course of time.
- Can generate events. These events are associated with particular
lookup requests.
- May cache all or some of the data that passes through it.
Any input task:
- takes some amount of time, perhaps a lot.
- generates some number of events, perhaps zero.
The obvious mapping of this second concept to a class is as follows:
public abstract class InputTask extends Thread
{
public void addConverterListener (ConverterListener listener)
{
}
}
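Filled in a little further, the InputTask idea might look like the
sketch below. Everything beyond addConverterListener() is an
assumption made for illustration: fireEvent(), readInput(), the final
"done" event, the one-field ConverterEvent stand-in, and the
InputTaskDemo driver.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Reduced stand-in: an event that carries only a status message.
final class ConverterEvent {
    private final String status;
    ConverterEvent(String status) { this.status = status; }
    String getStatusMessage() { return status; }
}

interface ConverterListener { void handleConverterEvent(ConverterEvent evt); }

abstract class InputTask extends Thread {
    // Copy-on-write list so listeners can be added while events are firing.
    private final List<ConverterListener> listeners = new CopyOnWriteArrayList<>();

    public void addConverterListener(ConverterListener listener) {
        listeners.add(listener);
    }

    // Assumed helper: delivers one event to every registered listener.
    protected void fireEvent(ConverterEvent evt) {
        for (ConverterListener listener : listeners) {
            listener.handleConverterEvent(evt);
        }
    }

    // Assumed hook: subclasses do the slow reading or downloading here.
    protected abstract void readInput();

    @Override
    public void run() {
        readInput();
        fireEvent(new ConverterEvent("done"));
    }
}

// Hypothetical driver: runs one task and records what its listener saw.
class InputTaskDemo {
    static List<String> runOnce() {
        List<String> seen = new CopyOnWriteArrayList<>();
        InputTask task = new InputTask() {
            @Override
            protected void readInput() {
                fireEvent(new ConverterEvent("reading"));
            }
        };
        task.addConverterListener(evt -> seen.add(evt.getStatusMessage()));
        task.start();
        try {
            task.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```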
Remove from the above proposal the idea of init().
This design will have to satisfy several users on both sides of the
interface boundary:
- Unihan/Cedict (large initialization cost, index in memory). In
this case, initialization is happening in the background no matter
what, in order to satisfy the requirement that any initialization
involving dictionary data start as soon as possible. Initialization
can happen in a thread started by the constructor of the
ConverterTable.
- Sunrain (dynamic lookup, no storage). In this case,
initialization happening as soon as the search method is called is
ideal, in order to minimize the perceived latency.
- Wohenchan (user interface)