CHAPTER 2
WORKFLOW INFRASTRUCTURE
This chapter briefly discusses recent technologies available for building the infrastructure of workflow management systems (WFMSs). With the advent of autonomous computers networked together in the workplace, workflow technology can transform the role of computers in business computing from passive to active. Supporting large intra- and inter-enterprise workflow systems over a variety of distributed, heterogeneous, and autonomous computing environments requires a well-defined and fully functional communication infrastructure that provides interoperability between distributed objects, integration of software applications, and a secure communication model. Open middleware technologies such as CORBA, OLE, OpenDoc, or DCE, which facilitate interoperability among distributed and heterogeneous objects, software applications, and computing environments, satisfy the requirements of workflow systems at different levels. With the rapidly growing popularity of the Internet and electronic commerce, Web-based technology offers a simple and flexible way to connect and integrate distributed computing environments and their users. Web-based technology will therefore play an important role in the communication infrastructure of workflow systems in the near future. In the following sections, we discuss the use of the Object Management Group (OMG)’s CORBA technology, and its integration with Web-based technologies such as CGI scripts and Java Applets, as the communication infrastructure for WFMSs. The technical issues discussed in this chapter are fundamental to the system architecture design presented in the following chapters.
2.1 CORBA TECHNOLOGY
The Common Object Request Broker Architecture (CORBA) was developed out of the need for interoperable solutions that work across distributed and heterogeneous hardware and software platforms [OMG93, OMG95ab]. CORBA is being standardized by the OMG, whose membership includes over 700 hardware and software vendors. The CORBA architecture, particularly Version 2.0, raises interoperability to an unprecedented level by promoting independence of hardware architecture, language, and location. For instance, software services that comply with CORBA can be written in any language (C, C++, Java, Ada, or even FORTRAN), run on any machine (SUN SPARC, Silicon Graphics, PC), use any operating system (Windows NT, UNIX), and be accessed by client software that may in turn be written in any language. CORBA provides mechanisms by which objects in a distributed and heterogeneous environment can communicate with each other through a uniform interface. An interface is a description of the set of operations that a client may request of an object, and it is defined in the Interface Definition Language (IDL). CORBA also automates many common network programming tasks, such as object registration, location, and activation; request demultiplexing; framing and error handling; parameter marshalling and demarshalling; and operation dispatching. This makes it an excellent communication infrastructure for WFMSs.
2.1.1 BASIC CONCEPTS
CORBA 1.0 was the first version of the CORBA architecture adopted by OMG in 1991. The latest update of the CORBA specification was in May 1995 when the OMG released CORBA 2.0 [OMG95a]. Let’s look at some of the basic concepts of CORBA:
2.1.2 THE CORBA ARCHITECTURE
The job of the Object Request Broker (ORB) is simply to provide the communication and activation infrastructure for distributed object applications. Figure 2.1 illustrates the primary components of the CORBA architecture [OMG95b].

Figure 2.1: The CORBA architecture
As mentioned above, the ORB delivers requests from clients to server objects and returns any responses to the clients that made the requests. The key feature of the ORB is the transparency with which it facilitates client/server object communication [Vin97]. It can be summarized as follows:
2.1.3 ADVANCED FEATURES IN NEO
DOE (Distributed Objects Everywhere) is the CORBA 1.2-compliant product released by SUN Microsystems in June 1995. The CORBA 2.0-compliant product (later called NEO) was deployed in the winter of 1995. The NEOWork system described in this thesis was implemented completely in the NEO environment. The following discussion focuses on the programming environment of NEO. Some advanced features that distinguish the product from other CORBA products, such as ORBIX and VisiBroker (formerly ORBeline), are addressed as well.
A NEO application consists of one or more user programs and ORB objects that work together to support a functional need. Developing a NEO application involves specifying the common interface shared by client applications and server objects using standard IDL, and developing the end-user client programs and the server ORB objects that provide functionality for clients. Figure 2.2 shows the development processes for creating a CORBA client and a server object in NEO [Sun95].

Figure 2.2: Procedures to develop NEO client and server
One of the features of the NEO programming environment is the automatic generation of Makefiles. The NEO programming environment provides an "imake" preprocessor called odfimake that takes an "imake" file as input and generates a platform-independent Makefile for compiling source programs. Developers only need to write very simple "imake" files instead of large Makefiles. This feature can greatly speed up the application development process since writing platform-independent Makefiles is generally a tedious job for programmers.
Object Development Framework (ODF) is another feature of NEO that makes developing ORB objects easy [Sun95]. ODF contains components that:
NEO makes it easy to implement ORB objects and to use them in client applications. Unlike other CORBA products, which provide very limited support for development tools, NEO automates many development processes for CORBA users. Using the ORB object called "Queue" as an example, Table 2.1 summarizes the development procedures to implement and register the object "Queue".
| Steps | File | Code Fragments/Command |
| --- | --- | --- |
| 1. Define the object’s interface; | Queue.idl | interface Queue { void enqueue (in long item); …. } |
| 2. Define the object’s persistent data; (optional) | Queue.ddl | module QueueDDL { interface QueueData{ attribute sequence<long> queue; …. } } |
| 3. Define the object’s implementation characteristic; | Queue.impl | implementation QueueImpl:Queue { persistence = QueueDDL::QueueData; creator new_object(); service = "Queue"; } |
| 4. Create the object’s Imakefile to generate Makefile; | imake | odfimake |
| 5. Edit the object’s skeleton file generated by Makefile; | QueueImpl.cc | QueueImpl::enqueue (const long item){ queue.add(item); …. } |
| 6. Compile the implementation files along with libQueue.so generated by IDL compiler to produce the Server Object; | QueueServer (Executable) | make |
| 7. Register the server object with ORB. | | make register |
Table 2.1: Setting up a CORBA service in NEO
After compiling the Queue.idl interface file, the following three files are also generated by the IDL compiler to form the client stubs and to compile with client source files to generate client applications:
To use an ORB object, the client usually issues a bind() call to the ORB; the ORB then finds the object on either the local machine or a remote machine, based on the object skeleton, and returns the object reference to the client. Many CORBA products use this method. In NEO’s environment, however, ORB object references are kept in a repository called the naming context, which is distributed across different NEO server machines. During the object registration phase, object names and their object references are installed in the naming space of the local NEO server. A client can then use the object service called the Naming Service to find the object reference at run-time, based on the ORB object name and location (if specified). Table 2.2 gives an example of finding the Queue object in a client program using the naming service.
| Explanation | Code Fragments |
| --- | --- |
| 1. Declare the object reference variable; | ODF_ObjRef<Queue> queue; |
| 2. Obtain the Queue object reference by looking it up in the naming space; | ODF_find(queue,"Queue"); |
| 3. Request the object service. | queue->enqueue(100); |
Table 2.2: Using a registered CORBA server
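The register-then-look-up pattern summarized in Table 2.2 can be sketched in plain Java. This is only an in-process illustration under stated assumptions, not NEO's API: the names NamingSketch, NamingContext, InMemoryQueue, bind and resolve are hypothetical, and a real naming service is distributed across server machines rather than held in a single map.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; none of these types belong to NEO or CORBA.
final class NamingSketch {
    // Hypothetical stand-in for the interface generated from Queue.idl.
    interface Queue {
        void enqueue(int item);
        int size();
    }

    // Trivial implementation that only counts the items it receives.
    static final class InMemoryQueue implements Queue {
        private int count = 0;
        @Override public void enqueue(int item) { count++; }
        @Override public int size() { return count; }
    }

    // A name-to-reference registry, standing in for a naming context.
    static final class NamingContext {
        private final Map<String, Object> bindings = new ConcurrentHashMap<>();

        // Registration phase: install a name together with its object reference.
        void bind(String name, Object reference) {
            if (bindings.putIfAbsent(name, reference) != null) {
                throw new IllegalStateException("already bound: " + name);
            }
        }

        // Run-time lookup: return the reference bound to the name, cast to the expected type.
        <T> T resolve(String name, Class<T> type) {
            Object reference = bindings.get(name);
            if (reference == null) {
                throw new NoSuchElementException("not bound: " + name);
            }
            return type.cast(reference);
        }
    }

    // Mirrors the three client steps: declare, look up by name, invoke.
    static int demo() {
        NamingContext context = new NamingContext();
        context.bind("Queue", new InMemoryQueue());
        Queue queue = context.resolve("Queue", Queue.class);
        queue.enqueue(100);
        return queue.size();
    }
}
```

The client never constructs the queue itself; it only knows the name "Queue", which is the property the naming service provides.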
To deal with the creation and registration of run-time CORBA objects, NEO’s ODF provides facilities to support a special type of CORBA object called a factory object. A factory object creates new CORBA objects of a specific type for clients at run-time. Typically, clients request new object instances by invoking a create method of the factory object. The factory object is itself a registered object in the naming space, so client programs can find it with a naming look-up. After the client receives the factory’s object reference, it can invoke the creation method to create a new CORBA object of a certain type. The CORBA objects that a factory creates are defined and implemented in the factory object. This feature is very useful in the implementation of WFMSs because task managers need to be created as CORBA objects according to the scheduling of a workflow during execution. Other system components (such as schedulers) may be created at run-time as well. Chapter 4 discusses the design of NEOWork in detail.
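A minimal sketch of the factory idea, written in plain Java rather than against NEO's ODF, may make the run-time creation of task managers more concrete. The names FactorySketch, TaskManager, TaskManagerFactory and schedule, and the "tm-N" identifier format, are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; these are not NEO or NEOWork types.
final class FactorySketch {
    // Hypothetical object type that is created on demand at run-time.
    static final class TaskManager {
        final String id;
        final String task;

        TaskManager(String id, String task) {
            this.id = id;
            this.task = task;
        }
    }

    // The factory is the single long-lived object that clients locate by name;
    // every other instance is produced through its create method.
    static final class TaskManagerFactory {
        private final AtomicInteger created = new AtomicInteger();

        TaskManager create(String task) {
            return new TaskManager("tm-" + created.incrementAndGet(), task);
        }
    }

    // A scheduler asks the factory for one task manager per scheduled task.
    static List<TaskManager> schedule(TaskManagerFactory factory, List<String> tasks) {
        List<TaskManager> managers = new ArrayList<>();
        for (String task : tasks) {
            managers.add(factory.create(task));
        }
        return managers;
    }
}
```

Because only the factory needs to be registered in advance, the number of task managers does not have to be known before the workflow starts executing.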
2.1.4 OBJECT SERVICES IN NEO
Object services are domain-independent interfaces that CORBA applications (either client or server) use to extend the functionality and interoperability of ORB objects. They are packaged as ORB objects with IDL-specified interfaces. Using object services can greatly extend the capabilities of CORBA and help to build robust WFMSs as well as other distributed systems. The following object services are provided in NEO Beta and have been used to implement the workflow management system described in this thesis:
module QueueDDL {
    interface QueueData {
        attribute sequence<long> queue;
        ….
    }
}
The persistence of the data member "queue" is therefore managed throughout the object’s lifetime by the persistence service. The service periodically saves the states of ORB objects in the PSM. Automatic atomic updates to the PSM occur at timed intervals, by default every twenty seconds [Sun95]. The persistence declaration in the implementation definition file controls how frequently updates to the PSM occur and can be changed to fit different needs. Because a completed IDL operation call is considered an atomic unit of change, once a method completes, all changes to DDL-declared data made by that method call are written to disk as part of the next update. The Persistent Object Service is an important object service that supports persistent object states in NEO. NEOWork uses it as the primary logging mechanism to store and maintain the control data for workflow schedulers and the state information for each component of the workflow system, which facilitates workflow recovery at different levels.

Figure 2.3: Object persistency with the PSM
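The update rule described above (changes made by completed method calls become durable only at the next periodic update) can be illustrated with a small Java sketch. It is not the NEO persistence service: the class PersistenceSketch, the in-memory "stored" copy that stands in for the PSM, and the explicit checkpoint() call that stands in for the timer are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; a real persistence service writes to disk on a timer.
final class PersistenceSketch {
    static final class PersistentQueue {
        // Live state, analogous to the DDL-declared data member "queue".
        private final List<Integer> queue = new ArrayList<>();
        // Last saved snapshot, standing in for the persistent store.
        private List<Integer> stored = new ArrayList<>();

        // Each completed call is one atomic unit of change to the live state.
        synchronized void enqueue(int item) {
            queue.add(item);
        }

        // Periodic update: copy the whole live state in one step. In this sketch
        // the caller triggers it; in a real service a timer would.
        synchronized void checkpoint() {
            stored = new ArrayList<>(queue);
        }

        // Recovery sees only what the last update saved.
        synchronized List<Integer> recover() {
            return new ArrayList<>(stored);
        }
    }
}
```

In this sketch, an item enqueued after the most recent checkpoint is absent from recover(), which corresponds to the window of up to one update interval during which completed changes are not yet on disk.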
The Property Service and Relationship Service are also available in NEO. NEO will support the Query Service, Externalization Service, and Transaction Service in the near future. Table 2.3 summarizes the object services discussed in this section.
| Object Service Name | Brief Description | NEO supported | Used in WFMS |
| --- | --- | --- | --- |
| Naming Service | Permits object references to be retrieved through associations between names and objects, and for those associations to be created and destroyed. | yes | yes |
| Persistent Object | Provides common interfaces to support the persistence of an object's state when the object is not active in memory and between application executions. | yes | yes |
| Concurrency Control | Provides interfaces to acquire and release locks that let multiple clients coordinate their access to shared objects in the distributed environment. | yes | yes |
| Life Cycle | Provides operations to support creation, copying, moving, and destruction of objects. | yes | yes |
| Event Channel | Supports the notification of interested parties when program-defined events occur. | yes | yes |
| Property Service | Provides operations to attach attributes at run-time to an ORB object. | yes | no |
| Relationship Service | Provides operations for creating, deleting, navigating, and managing relationships between objects. | yes | no |
| Transaction Service | Provides support for ensuring that a computation consisting of one or more operations on one or more objects satisfies the requirements of atomicity, isolation and durability. | no | no |
| Query Service | Supports operations on sets and collections of objects that have a predicate-based, declarative specification and may result in sets or collections of objects. | no | no |
| Externalization Service | Supports the conversion of object state to a form that can be transmitted between systems by a means other than a request broker. | no | no |
Table 2.3: A summary of object services
2.1.5 ERROR HANDLING IN NEO
Every NEO client request can result in an exception [Sun95]. An exception indicates that an error occurred while performing the operation. NEO exceptions fall into two categories: system exceptions and user exceptions. A system exception is generated when an error occurs in the underlying NEO infrastructure. Programmers can also declare user exceptions in the object’s IDL interface using a raises clause, so that an error is reported when a client program performs an illegal operation or a non-system error occurs while an object is running. The following example declares a user exception for an operation in the IDL interface.
interface Queue
{
    void enqueue(in long item)
        raises (Queue_Exception);
};
Clients can catch server object exceptions by including server method calls in a try-catch block. Error handling is a very important issue in workflow management systems. NEO supports a strong error reporting and handling system. Error handling in NEOWork is based on the NEO exception handling mechanism.
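The client-side try-catch pattern can be sketched in Java as follows. This is an analogy rather than NEO code: the names ExceptionSketch, QueueException, BoundedQueue and tryEnqueue are hypothetical, the capacity limit is an invented error condition, and RuntimeException merely stands in for an infrastructure-level (system) exception.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; these are not NEO or CORBA types.
final class ExceptionSketch {
    // User exception: declared on the operation, much as an IDL raises clause declares it.
    static final class QueueException extends Exception {
        QueueException(String message) {
            super(message);
        }
    }

    // Hypothetical server object with an invented capacity limit.
    static final class BoundedQueue {
        private final Deque<Integer> items = new ArrayDeque<>();
        private final int capacity;

        BoundedQueue(int capacity) {
            this.capacity = capacity;
        }

        void enqueue(int item) throws QueueException {
            if (items.size() >= capacity) {
                throw new QueueException("queue full");
            }
            items.addLast(item);
        }
    }

    // Client side: wrap the call in try-catch and handle the two categories separately.
    static String tryEnqueue(BoundedQueue queue, int item) {
        try {
            queue.enqueue(item);
            return "ok";
        } catch (QueueException e) {
            return "user exception: " + e.getMessage();
        } catch (RuntimeException e) {
            return "system exception: " + e;
        }
    }
}
```

Separating the two catch clauses lets a workflow component react to an application-level failure (for example, by retrying or rerouting a task) differently from an infrastructure failure.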
2.2 WEB-BASED TECHNOLOGY
Today’s companies are looking for ways to harness the ubiquity and power of the World Wide Web to generate new business by reaching out to consumers, to enhance the productivity of their workforce by giving them easier access to useful information, and to improve customer service through better communication. In this section, we give an overview of current Web-based technologies such as CGI scripts and Java Applets, and explain how CORBA objects can be integrated with Java Applets to form a more robust communication infrastructure for workflow management systems.
2.2.1 CGI SCRIPTS AND THEIR LIMITATIONS
The Web is accessible through most commercial on-line services and through popular Web browsers, such as Netscape Navigator and Internet Explorer (IE). The Web browsers provide a point-and-click metaphor for accessing a hyperlinked collection of documents written using the Hypertext Markup Language (HTML). HTML is one of the languages that conform to the Standard Generalized Markup Language (SGML) -- an international standard for specifying neutral-format documents. HTML documents are served by web servers that adhere to the Hypertext Transfer Protocol (HTTP), which was designed to efficiently support multiple independent requests for documents. However, Web servers do not maintain any state information of a request: each request for a document is an independent transaction. To support the dynamic creation of HTML documents, the HTTP servers support a Common Gateway Interface (CGI). The CGI is a simple interface for running external programs, software or gateways under HTTP servers in a platform-independent manner. Typically, the HTTP servers invoke CGI programs -- frequently called CGI scripts -- when requested to serve specific documents like forms. CGI scripts can be written to provide access to, and present information coming from, a variety of sources. Together the HTTP server and the CGI script are responsible for servicing a client request by sending back responses. With this in mind, a workflow management system can be built completely using CGI scripts and their interaction with HTTP servers and database systems as the primary communication infrastructure. A Web-based implementation of the METEOR2 workflow management system has been successfully built with this approach in our LSDIS lab [Pal96] [AKM+96]. Another Web-based workflow system example is the WWWorkflow system developed at the Jet Propulsion Lab [ABM96]. Because of concerns about scalability and reliability (see [SKM+96]), alternatives to the CGI approach are being explored in the LSDIS lab.
2.2.2 JAVA AND JAVA APPLETS
Java, developed by SUN Microsystems, is an object-oriented programming language designed specifically for writing executable programs that can be distributed through networks. Generally, there are two types of Java programs: Java applications and Java Applets. Java applications are standalone programs that require the Java interpreter to run on local machines. They are analogous to C/C++ applications. HotJava, the Web browser from SUN, is in fact an example of a Java application that runs as a user application. Java Applets are relatively small programs that are downloaded to and run on the client’s machine when requested. Unlike static HTML documents, Java Applets are actual application programs that run on a client’s machine and interact with the user at run-time. Java Applets are subject to stricter security controls than Java applications because they are untrusted programs obtained over the network. Many restrictions are applied to Java Applets to prevent them from accessing resources on local systems. For example, unlike applications, untrusted Java Applets cannot access any local drives. Java, as a programming language, has the following characteristics:
In summary, Java, along with Java Applets, offers outstanding features that allow workflow management systems to take advantage of the power and flexibility of the World Wide Web as the communication infrastructure.
2.3 INTEGRATING CORBA WITH JAVA APPLETS
Unlike CGI scripts, which respond to a client’s request by sending back results in a static HTML document, Java Applets are downloaded to the client machine and run locally. Java Applets also serve clients with application-like front-end interfaces and provide real-time interaction. As objects located on the client machine, however, Java Applets still need to communicate with distributed server objects across the network to get data and perform operations on servers; that is where CORBA objects come into play. Conversely, although CORBA is a good choice as the fundamental communication infrastructure for scalable, transaction-oriented, high-performance, and reliable workflow management systems, taking advantage of Web resources requires a technology suited to the Web; that is where Java and HTML come in. In this section, we introduce a way to connect Java Applets with CORBA objects by building TCP/IP socket connections directly between them. With this method, Java Applets can communicate with any ORB object across the network and utilize distributed system resources, while ORB objects gain a more flexible way to communicate with different user clients on the Web.
Figure 2.4 shows the TCP/IP socket connections between CORBA objects and Java Applets.
After being downloaded from the network, Java Applets run as separate threads in the Web browser’s process. Each Java Applet can then open its own TCP/IP socket to send and receive data through a different port. A port is a numbered channel through which an Internet host sends and receives data. At the other end, each CORBA object opens a TCP/IP socket to listen on a port. Different CORBA objects registered on the same ORB server should be associated with different ports since they share the same Internet address. The Internet address and the port number together identify a CORBA object. Java Applets can therefore communicate with different CORBA objects based on their location and port.

Figure 2.4: How applets and CORBA objects communicate
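The address-plus-port scheme can be sketched with the standard java.net socket classes. The sketch below is an assumption-laden illustration, not the NEOWork implementation: a plain Java thread stands in for the server object, the one-integer request and reply format is invented, and both ends run on the loopback address so that the example is self-contained. The names SocketSketch, QueueEndpoint and enqueueRemote are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative sketch only; the wire format here is invented for the example.
final class SocketSketch {
    // Server side: stands in for a server object listening on its own port.
    static final class QueueEndpoint implements Runnable, AutoCloseable {
        private final ServerSocket server;
        private int size = 0;

        QueueEndpoint() throws IOException {
            // Port 0 asks the operating system for any free port.
            server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        }

        int port() {
            return server.getLocalPort();
        }

        // Serves exactly one request: read an item, reply with the new queue length.
        @Override public void run() {
            try (Socket client = server.accept();
                 DataInputStream in = new DataInputStream(client.getInputStream());
                 DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
                in.readInt();       // the item to enqueue (its value is not stored here)
                size++;
                out.writeInt(size);
                out.flush();
            } catch (IOException e) {
                // The listening socket was closed or the client disconnected; nothing to serve.
            }
        }

        @Override public void close() throws IOException {
            server.close();
        }
    }

    // Client side: what an applet would do, given the object's address and port.
    static int enqueueRemote(InetAddress host, int port, int item) throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeInt(item);
            out.flush();
            return in.readInt();
        }
    }

    // Starts the endpoint, sends one request over loopback, and returns the reply.
    static int demo() {
        try (QueueEndpoint endpoint = new QueueEndpoint()) {
            Thread listener = new Thread(endpoint);
            listener.start();
            int reply = enqueueRemote(InetAddress.getLoopbackAddress(), endpoint.port(), 100);
            listener.join();
            return reply;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("loopback demo failed", e);
        }
    }
}
```

Two endpoints created this way on the same host receive different port numbers, so a client distinguishes them by the address and port pair, as described above.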
Applet clients interacting with CORBA objects are faster than CGI scripts because each CGI script generates and transfers a whole HTML page every time a request is made. Applet clients and CORBA objects, on the other hand, generate and transfer data with scalpel-like precision. The data can be in binary form, and only the data that is absolutely necessary is transferred over the Internet. In addition, the data moved between the Applet client and the CORBA object does not pass through the Web server, so it places no extra load on the Web server, as CGI scripts do.
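The size argument can be made concrete with a small Java comparison. The sketch encodes the same three integers once as raw binary and once wrapped in a minimal HTML page; the class PayloadSketch, its method names, and the particular HTML markup are assumptions chosen for illustration, and real pages and protocols carry additional overhead that is not modelled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; compares two encodings of the same data.
final class PayloadSketch {
    // Binary form: each int occupies exactly four bytes.
    static byte[] asBinary(int[] items) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int item : items) {
                out.writeInt(item);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Page form: the same values wrapped in markup, as a CGI script would return them.
    static byte[] asHtmlPage(int[] items) {
        StringBuilder page = new StringBuilder("<html><body><ul>");
        for (int item : items) {
            page.append("<li>").append(item).append("</li>");
        }
        page.append("</ul></body></html>");
        return page.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

For the items 100, 200 and 300, asBinary yields 12 bytes, while asHtmlPage yields several times as many because the markup is sent along with every response.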
In the near future, integrating CORBA objects with Web objects (Java Applets or ORB objects from other CORBA systems) will be even easier and more robust. The CORBA Internet Inter-ORB Protocol (IIOP) will soon be mandatory for all CORBA 2.0-compliant products from the major CORBA vendors such as SUN, Visigenic, and IONA, and Web server vendors such as Netscape Corporation and Oracle have incorporated IIOP into their Web server products. The communication infrastructure will consist of:
Java Applets can be downloaded via Web-based applications. These Java Applets are capable of directly accessing CORBA objects via IIOP. Within the CORBA system, Java clients will be able to invoke CORBA objects, and the ORB will locate and invoke the right service. If the service is outside the CORBA scope, access will take place via the IIOP-to-HTTP gateway (this gateway will also convert from IIOP to other protocols such as FTP). To the CORBA client, the remote object will look like a CORBA object, whether it is a CORBA object from another system or just a Java object.