CHAPTER 6
IMPLEMENTATION OF THE NEOWORK
A prototype of the NEOWork METEOR2 WFMS has been implemented using NEO, the CORBA product from Sun Microsystems, and has been tested with an example workflow application, Immunization Tracking. The core pieces of the system were implemented in C++ under the Solaris UNIX operating environment. User interfaces for the user tasks and the WFMS were written as Java applets. The primary technical areas of the implementation are CORBA programming with object services, object-oriented software engineering, Java programming, socket data communication, database transactions, multithreaded programming and GUI programming using Java's AWT. The core pieces of the system consist of scheduler factories, scheduler objects, task manager factories, task manager objects, the interface server and the data objects of tasks, as well as their recovery framework components.
There are two types of component factories in the NEOWork architecture: the scheduler factories and the task manager factories. They are implemented as registered persistent servers distributed on different hosts to provide the services for creating the run-time component objects for NEOWork.
6.1.1 THE FACTORY MODULES
Figure 6.1: The scheduler factory module
As discussed in chapter 4, the dynamic components of NEOWork, scheduler objects and task manager objects, have lifecycles. In order to take advantage of the lifecycle object service provided by NEO, the interface definitions of these components are inherited from the lifecycle object interface (see figure 6.1). The lifecycle object interface contains four virtual functions that manage CORBA objects’ lifecycle: create, activate, deactivate and remove. The create function creates a component object by invoking the constructor of the component class. After a new component object is created, the object reference is stored in an ORB database. Activate and deactivate functions allow a timeout mechanism for users to optimize the time a component object can exist in system memory. The deactivate function invalidates the reference for a CORBA object.
The interface for recovery contains the watchme and watch functions. The watchme method is used by the watchdog of the NEOWork system to synchronize the run-time of the factories, catch run-time exceptions and perform the necessary recovery in case of failures. As discussed in the previous chapter, each factory server in turn synchronizes the watchdog run-time and performs the necessary recovery. The watch method provides the operation to recover the watchdog in case it fails.
6.1.2 SERVICE REGISTRATION AND STARTUP
The registration process for CORBA object servers in NEO consists of registering the service interfaces with the Interface Repository (IFR) and registering the service name with the Naming Service. Registration is performed manually through the make register ODF utility. During the registration, an instance of each "service" object is created and registered with the Naming Service. The Naming Service of NEO is a persistence mechanism that maintains a list of registered services referenced by name. Each factory service implementation should have a service = "name" statement in its implementation definition.
The ORB of a NEO system automates the server startup process. When a service request from a client comes in, the ORB locates the server and starts it if it is not in the active state. A static member function can be declared as a startup hook on the server class, and the ORB invokes this function during the startup process. In the implementation of NEOWork, the recovery process can be defined in the startup hook function to perform failure recovery for the component factories.
6.1.3 THE RECOVERY PROCESS
Component factory servers in NEOWork are subject to failures for various reasons, such as hardware failures and system crashes. The startup process for a failure recovery should be different from a normal startup. The Watchdog is responsible for detecting the failures of factory servers and invoking the startup process in a recovery mode. The Watchdog maintains a list of component factory names. The following procedures are performed during a server startup in a recovery mode:
In NEOWork, workflow schedulers are CORBA objects created by a scheduler factory to perform scheduling services for a particular workflow instance. Three major functions are defined in the scheduler interface: schedule(), recover() and activate(). The "schedule" function defines the scheduling process of the scheduler’s engine. The "recover" function is used by the recovery manager (implemented as a thread) that performs failure recovery for the scheduler object. The "activate" function is a static member function used by a task agent (implemented as a thread) to synchronize the run-time of a task manager and perform recovery in case of task manager failure.
Figure 6.2: Algorithm of the scheduler engine
6.2.1 THE SCHEDULER ENGINE
The role of a scheduler engine is to evaluate the inter-task dependencies of a workflow and invoke a task manager when the activation precondition evaluates to true. Figure 6.2 above shows the pseudo code of a scheduler engine.
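The pseudo code of Figure 6.2 is not reproduced in this text. As a rough illustration of the idea only, the following C++ sketch evaluates a simple kind of inter-task dependency (all predecessor tasks must have finished) and invokes a callback standing in for the task manager when the precondition holds. The names TaskState, TaskEntry, preconditionHolds and schedulePass are hypothetical and are not taken from the NEOWork sources; real dependencies in NEOWork may be richer than this.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical task states; the actual NEOWork state set is not shown here.
enum class TaskState { Waiting, Running, Done };

struct TaskEntry {
    std::vector<std::string> predecessors;  // tasks that must be Done first
    TaskState state = TaskState::Waiting;
};

// Activation precondition (assumed form): every predecessor has reached Done.
bool preconditionHolds(const std::map<std::string, TaskEntry>& tasks,
                       const TaskEntry& task) {
    for (const auto& name : task.predecessors) {
        auto it = tasks.find(name);
        if (it == tasks.end() || it->second.state != TaskState::Done) {
            return false;
        }
    }
    return true;
}

// One scheduling pass: start every waiting task whose precondition holds.
// invokeTaskManager stands in for the call that activates a task manager.
// Returns the names of the tasks started in this pass.
std::vector<std::string> schedulePass(
        std::map<std::string, TaskEntry>& tasks,
        const std::function<void(const std::string&)>& invokeTaskManager) {
    std::vector<std::string> started;
    for (auto& kv : tasks) {
        TaskEntry& task = kv.second;
        if (task.state == TaskState::Waiting && preconditionHolds(tasks, task)) {
            task.state = TaskState::Running;
            invokeTaskManager(kv.first);
            started.push_back(kv.first);
        }
    }
    return started;
}
```

In this sketch the engine would call schedulePass again each time a task manager reports a state change, so that successors of a finished task become eligible on the next pass.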
6.2.2 RECOVERY FOR SCHEDULER
Recovery of a scheduler is vital to the recovery framework of NEOWork. During the creation of a scheduler object, a recovery manager (implemented as a thread) is also created by the component factory to watch the execution of the scheduler and perform recovery in case the scheduler fails. Failures of a scheduler can be caught by the synchronized try-catch block. The recovery process of a scheduler object includes restoring the scheduling data from persistent storage, recreating the workflow engine and reconstructing task agents for the currently executing task managers.
While a scheduler is being recovered, task managers cannot communicate with the scheduler to update task states. Therefore, forward recovery for the scheduler is also needed to get the task states from the task managers and apply them to the scheduler’s control data.
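The two recovery steps described above (restore, then forward recovery) can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: the control data is modeled as a map from task name to a state string, and the states reported by the task managers are passed in directly rather than fetched through CORBA calls. The names TaskStates and recoverControlData are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical representation of the scheduler's control data:
// task name -> last known task state.
using TaskStates = std::map<std::string, std::string>;

// Step 1: start from the snapshot restored from persistent storage.
// Step 2 (forward recovery): overwrite stale entries with the states that
// the running task managers report after the scheduler comes back.
TaskStates recoverControlData(const TaskStates& snapshot,
                              const TaskStates& reportedByTaskManagers) {
    TaskStates control = snapshot;
    for (const auto& kv : reportedByTaskManagers) {
        control[kv.first] = kv.second;
    }
    return control;
}
```

Tasks that no task manager reports on keep the state recorded in the snapshot, which matches the intent that only updates missed during the outage need to be replayed.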
6.3 IMPLEMENTATION OF TASK MANAGER OBJECTS
Three types of task managers are implemented in NEOWork: NonTranTM, TranTM and UserTM, together with an abstract base class called BaseTM. Class BaseTM contains pure virtual functions that support a common interface for task managers: {Initialize(), Run(), Done(), Fail()} for the task manager run-time, {Save(), Load()} for data logging, and {Recover()} for the recovery manager. The NonTranTM and TranTM classes inherit from class BaseTM and implement the virtual functions. The UserTM class in turn inherits from the NonTranTM class to support user tasks with GUI interfaces.
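The class hierarchy described above might look roughly like the following C++ sketch. The class and function names come from the text, but the signatures (no parameters, void return), the string-valued state and the bodies are assumptions made for illustration. TranTM is omitted for brevity; it would derive from BaseTM in the same way as NonTranTM.

```cpp
#include <string>

// Abstract base class: the common task manager interface named in the text.
// Signatures are assumed; the original declarations are not shown.
class BaseTM {
public:
    virtual ~BaseTM() = default;
    // Task manager run-time
    virtual void Initialize() = 0;
    virtual void Run() = 0;
    virtual void Done() = 0;
    virtual void Fail() = 0;
    // Data logging
    virtual void Save() = 0;
    virtual void Load() = 0;
    // Recovery manager
    virtual void Recover() = 0;
};

// Illustrative non-transactional task manager; state handling is hypothetical.
class NonTranTM : public BaseTM {
public:
    void Initialize() override { state_ = "initialized"; }
    void Run() override { state_ = "running"; }
    void Done() override { state_ = "done"; }
    void Fail() override { state_ = "failed"; }
    void Save() override { saved_ = state_; }   // stand-in for writing a log
    void Load() override { state_ = saved_; }   // stand-in for reading it back
    void Recover() override { Load(); }         // resume from last saved state
    const std::string& state() const { return state_; }
private:
    std::string state_ = "created";
    std::string saved_ = "created";
};

// User task manager: extends NonTranTM; the flag models handing the task
// to a GUI worklist (an assumption, not the original behavior).
class UserTM : public NonTranTM {
public:
    void Run() override {
        shownToUser_ = true;
        NonTranTM::Run();
    }
    bool shownToUser() const { return shownToUser_; }
private:
    bool shownToUser_ = false;
};
```

Because the run-time, logging and recovery operations are all declared on BaseTM, a scheduler or recovery manager can drive any task manager through a BaseTM reference without knowing its concrete type.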
6.3.1 A STATE-SWITCH APPROACH
Figure 6.3: A state-switch approach for TM run-time
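Figure 6.3 is not available in this text, so the following is only one plausible reading of a state-switch approach: the task manager run-time is a loop around a switch statement that dispatches on the current state and returns the next one. The state names and the functions step and runToCompletion are hypothetical.

```cpp
#include <functional>

// Hypothetical run-time states of a task manager.
enum class TMState { Initial, Executing, Done, Failed };

// Advance the task manager by one transition. `work` stands in for the
// task body and reports success (true) or failure (false).
TMState step(TMState current, const std::function<bool()>& work) {
    switch (current) {
    case TMState::Initial:
        return TMState::Executing;                        // setup finished
    case TMState::Executing:
        return work() ? TMState::Done : TMState::Failed;  // run the task
    case TMState::Done:
    case TMState::Failed:
        return current;                                   // terminal states
    }
    return current;  // not reached; keeps compilers from warning
}

// Drive the switch until a terminal state is reached.
TMState runToCompletion(const std::function<bool()>& work) {
    TMState state = TMState::Initial;
    while (state != TMState::Done && state != TMState::Failed) {
        state = step(state, work);
    }
    return state;
}
```

Keeping each transition in a single switch makes the current state explicit, which is convenient when the state has to be logged after every change and restored during recovery.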
6.3.2 EXCEPTION HANDLING AND RECOVERY
The run-time exceptions of task managers consist of system exceptions and user-defined exceptions. An example of a system exception is a failure, generated by the ORB, to bind to a CORBA object. NEOWork implements two methods to handle system exceptions: retry and alternate-try. Retry repeats the same operation up to a given number of times until it succeeds. Alternate-try executes another, equivalent task, specified when the workflow is designed, to achieve the same result. The following code fragment depicts retry and alternate-try:
for (int Iterator = 0; Iterator < NumOfTries; Iterator++)
{
    try {
        // Bind to the task manager object through the ORB
        ODF_find(tm32, "TaskManager32");
        …
    }
    catch (ODF::Service::Exception& exc)
    {
        if (exc.code == CORBA::Exception::FAIL_COMM)
        {
            if (ErrorHandle == METEOR::ALTERNATE_TRY)
                Alternate(…);   // execute the equivalent alternate task
            else
                continue;       // go back and retry
        }
        …
    }
}
If the system exception cannot be handled by the retry, a recovery process needs to be carried out to restore the task manager. The recovery manager of the task manager performs this process, which was discussed in the previous chapter. Some system exceptions are severe errors that cannot be resolved by the automated recovery process. In such cases, human-assisted recovery is required to bring the task manager back to a consistent state.
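For readers who want to experiment with the retry and alternate-try policies outside a CORBA environment, the following self-contained C++ sketch reproduces the control flow with standard library types only. It does not use the ODF or CORBA APIs: CommFailure, ErrorHandling and runWithRecovery are hypothetical names, and the primary and alternate operations are passed in as callables.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for a communication failure raised by the ORB.
struct CommFailure : std::runtime_error {
    CommFailure() : std::runtime_error("communication failure") {}
};

// Hypothetical policy selector, mirroring retry versus alternate-try.
enum class ErrorHandling { Retry, AlternateTry };

// Runs `primary` up to numTries times. On a communication failure, either
// retries (Retry) or runs the equivalent `alternate` once (AlternateTry).
// Returns true on success; false means the caller must start recovery.
bool runWithRecovery(const std::function<void()>& primary,
                     const std::function<void()>& alternate,
                     ErrorHandling policy, int numTries) {
    for (int attempt = 0; attempt < numTries; ++attempt) {
        try {
            primary();
            return true;                  // operation succeeded
        } catch (const CommFailure&) {
            if (policy == ErrorHandling::AlternateTry) {
                try {
                    alternate();          // equivalent task, same result
                    return true;
                } catch (const CommFailure&) {
                    return false;         // alternate failed as well
                }
            }
            // policy == Retry: fall through to the next loop iteration
        }
    }
    return false;                         // retries exhausted
}
```

A false return value corresponds to the case discussed above in which retry cannot handle the exception and the recovery manager has to take over.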
6.3.3 DATA OBJECTS
In NEOWork, a data object of a task is defined using DDL and associated with its task manager so that the data object can be made persistent. During workflow execution, the task manager logs the data object and its state changes to the log for the task. In case of failure, the task manager can retrieve the data object from the PSM and restore the task. Delivery of data objects among task managers is achieved by passing object references. Input data used to initialize a task is supplied in the task manager’s Initialize() call.
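The logging and restore behavior described above can be illustrated with a small C++ sketch. It does not use DDL or the PSM: an in-memory vector stands in for the persistent task log, the data object is reduced to a string, and the names LogRecord, TaskLog and LoggingTaskManager are hypothetical.

```cpp
#include <string>
#include <vector>

// One log entry: the task state together with the data object's contents.
struct LogRecord {
    std::string state;
    std::string data;
};

// In-memory stand-in for the persistent log of a task.
class TaskLog {
public:
    void append(const LogRecord& record) { records_.push_back(record); }
    bool empty() const { return records_.empty(); }
    const LogRecord& latest() const { return records_.back(); }
private:
    std::vector<LogRecord> records_;
};

// Task manager that logs every state or data change and can restore
// itself from the most recent log record after a failure.
class LoggingTaskManager {
public:
    explicit LoggingTaskManager(TaskLog& log) : log_(log) {}

    void update(const std::string& state, const std::string& data) {
        state_ = state;
        data_ = data;
        log_.append(LogRecord{state_, data_});  // log before continuing
    }

    // Returns false when there is nothing to restore from.
    bool restore() {
        if (log_.empty()) return false;
        state_ = log_.latest().state;
        data_ = log_.latest().data;
        return true;
    }

    const std::string& state() const { return state_; }
    const std::string& data() const { return data_; }
private:
    TaskLog& log_;
    std::string state_;
    std::string data_;
};
```

In the sketch, a task manager constructed after a failure shares the same log as its predecessor and calls restore() to resume from the last logged state, which mirrors the retrieve-and-restore step in the text.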

Figure 6.4: Socket data communication
6.4 GUI AND THE INTERFACE SERVER
In NEOWork, the GUI for user tasks is implemented as Java applets. A common layout for user tasks is a group of worklists, where each worklist contains the names of the user tasks waiting to execute. Worklists are instances of the java.awt.List class. UserTask is the base class defined in Java to represent a task. The class defines public data members that identify a task, such as WorkflowID, WorkflowName, ServerName, ServerPort, TaskName and TaskType. A Connection object runs as a thread dedicated to receiving tasks and task data through socket communication and constructing task objects. When a user clicks on a task in the list, a popup Frame containing the specific task data is displayed and the user can start executing the task. A user can select multiple tasks at the same time. Data concurrency is controlled by the task managers through the Object Locking service. The interface server is the medium that facilitates the socket data communication. Figure 6.4 outlines how the interface server communicates with GUI clients.