CHAPTER 1

 

INTRODUCTION

 

Every organization has its own combination of platforms, operating systems, applications, and technologies that make up its information system. The mix contains legacy systems, internally developed applications, and state-of-the-art solutions from a variety of vendors. These products and technologies must operate together to support the organization's daily business processing. The difficulty is compounded by the fact that new technologies are continually being developed and adopted. New products must work with the existing infrastructure and still add value and functionality in ways that did not exist before. Innovation without compatibility and interoperability is not likely to succeed in today's business environment.

Workflow technology cuts across the boundaries of organizations to reach a larger number of users, resources, and tools. At the same time, workflow technology provides the means to cooperate and integrate with existing business applications while preserving the diversity of the specialized functions these applications provide to modern organizations.

However, since workflow technology is new and constantly evolving, many businesses find that the path from promise to performance is long and arduous. Although several hundred commercial workflow management products are available in the market, these workflow systems largely neglect scalability, reliability, and robustness, all of which are critical to commercial software products. These limitations prevent current workflow products from becoming the backbone of corporate computing.

In this thesis we target the problem of scalability and reliability in the context of the METEOR2 WFMS. Integrating CORBA (NEO) and Web-based technologies as the workflow infrastructure, NEOWork, the topic of this thesis, is a centralized CORBA-based workflow engine that provides a scalable and reliable run-time enactment environment for the METEOR2 WFMS.

This chapter introduces the basics of workflow technology, including background on workflows, workflow management systems (WFMSs), and the workflow reference model [WfM97]. A brief overview of this thesis project and a literature review of related work are provided at the end of the chapter.

 

1.1 WORKFLOWS AND WORKFLOW MANAGEMENT

 

Research in office automation in the late seventies is generally considered to be the start of workflow research. In an article published in 1980, the authors Ellis and Nutt explain: "…to allow a forms process to guide itself through various work stations and measure its own progress, utilizing the facilities of particular work stations within their own domains" [EN80]. The word workflow had not been invented yet but the authors were definitely talking about it, albeit ahead of time.

What is a workflow? In recent workflow research, there is still little agreement on the definition of a workflow. Here, we use the definition given in the workflow management review paper [GHS95] and the tutorial materials of [She95]: workflows are activities involving the coordinated execution of multiple tasks performed by different processing entities. A task defines a logical unit of work in a workflow. Workflow tasks are heterogeneous in nature: they may be user tasks in which manual processing is involved, or application tasks that contain only automated machine operations. Processing entities of tasks may include application systems, Transaction Processing Monitors (TP-Monitors), Database Management Systems (DBMSs), printers, or even persons. A workflow also defines task dependencies that specify how tasks in a workflow are coordinated for execution in a semantically correct order [KS95].
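The definition above can be illustrated with a small data-structure sketch. It is a minimal example under stated assumptions, not the METEOR2 model: the `Task` fields, the claim-handling task names, and the restriction to simple "runs after" dependencies are all hypothetical, and the inter-task dependencies discussed in [KS95] are richer than plain precedence.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    name: str
    kind: str    # "user" (manual processing) or "application" (automated)
    entity: str  # processing entity, e.g. a person, a DBMS, a printer
    depends_on: list = field(default_factory=list)  # names of predecessor tasks


def execution_order(tasks):
    """Return one task order that respects every declared dependency."""
    graph = {t.name: set(t.depends_on) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical workflow: every name below is an illustrative assumption.
tasks = [
    Task("enter_claim", "user", "clerk"),
    Task("check_policy", "application", "DBMS", ["enter_claim"]),
    Task("approve", "user", "manager", ["check_policy"]),
    Task("print_letter", "application", "printer", ["approve"]),
]
order = execution_order(tasks)
print(order)
```

Because the example workflow is a simple chain, only one order satisfies the dependencies; with parallel branches several orders would be equally valid.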

Workflow Management is the automated coordination and integrated control of work processes, e.g., a business process [Joo96]. Workflow Management Systems (WFMSs) provide support for modeling, executing and monitoring workflows involving multiple humans and HAD (Heterogeneous, Autonomous, and Distributed) systems [KS95]. WFMSs manage the flow of work among participants according to inter-task dependencies, and coordinate user and system participants, together with the appropriate data resources that may be accessible directly by the system or off-line, to achieve defined objectives, possibly under imposed deadlines. The coordination also involves passing task data from participant to participant in the correct sequence, ensuring that the participants make the required contributions, and taking default actions when necessary. Numerous commercial products in the market claim to be WFMSs. Vendors take the liberty of labeling their products workflow-enabled even if they support only a limited set of WFMS functionality, such as rudimentary e-mail routing capabilities in their applications. However, the difference between such capabilities and workflow automation is comparable to the difference between text editors and word processors. We believe that the WFMS of a workflow product must have the following essential ingredients to qualify as a workflow automation solution:

 

Current research on workflow technology can be categorized into workflow specification and modeling, inter-task dependency and scheduling, workflow management system design, and failure handling and workflow recovery [MSK+96]. The discussion in the following sections focuses on workflow management system design and workflow recovery issues.

 

1.2 THE WORKFLOW REFERENCE MODEL

 

The Workflow Management Coalition (WfMC) defines a WFMS as a system that defines, creates and manages the execution of workflows through the use of software, running on one or more workflow engines, which is able to interpret the process definition, interact with workflow participants and, where required, invoke the use of IT tools and applications [WfM97].

Figure 1.1 shows the workflow reference model for WFMSs introduced by the WfMC [WfM97].

Figure 1.1: The workflow reference model

Process definition tools are used to create the process descriptions for workflows in a computer-processable form. During the execution of a workflow, the process descriptions are interpreted by the workflow enactment service. The workflow enactment service provides the run-time environment in which process instantiation and activation occur, utilizing one or more workflow management engines that are responsible for interpreting and activating part, or all, of the process definition and for interacting with the external resources necessary to process the various activities. A workflow engine is responsible for providing the run-time execution environment for a workflow instance. This includes enforcing inter-task dependencies, controlling the state of the run-time component instances (creation, activation, suspension, and termination), maintaining workflow control data and workflow relevant data, and passing workflow relevant data to and from applications or users. Administration and monitoring tools are used for managing the users and workgroups of a WFMS, auditing the enactment engines as well as other components, and defining policies that can be applied system-wide.
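The state control performed by a workflow engine can be pictured as a small state machine. The sketch below is illustrative only: the four states follow the list in the text (creation, activation, suspension, termination), but the transition table and the class name `ComponentInstance` are assumptions and are not taken from the WfMC reference model.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


# Assumed transition table: which state changes the engine permits.
ALLOWED = {
    State.CREATED: {State.ACTIVE, State.TERMINATED},
    State.ACTIVE: {State.SUSPENDED, State.TERMINATED},
    State.SUSPENDED: {State.ACTIVE, State.TERMINATED},
    State.TERMINATED: set(),
}


class ComponentInstance:
    """A run-time component whose state is controlled by the engine."""

    def __init__(self, name):
        self.name = name
        self.state = State.CREATED
        self.history = [State.CREATED]

    def move_to(self, new_state):
        # Reject any transition the table does not list.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.state = new_state
        self.history.append(new_state)


task = ComponentInstance("approve")
for step in (State.ACTIVE, State.SUSPENDED, State.ACTIVE, State.TERMINATED):
    task.move_to(step)
```

Keeping the permitted transitions in one table makes it easy to audit which state changes an engine accepts, and a terminated instance can no longer be reactivated.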

 

1.3 DESCRIPTION OF THE PROJECT

 

Reliability and scalability are fundamental issues for a WFMS [GHS95]. A typical large-scale WFMS has to support hundreds or thousands of complex, long-duration enterprise processes in a heterogeneous and distributed environment, and must still ensure the correctness and reliability of workflow execution in the presence of failures.

The NEOWork project continues the research on centralized run-time architectures for the METEOR WFMS that were proposed and implemented by [Wan95]. To address the lack of scalability in the former architectures, we have redesigned the centralized architectures so that they are firmly based on object-oriented concepts. The run-time components (schedulers and task managers) of NEOWork are CORBA object instances that can be created dynamically by their component factories. The other objective of the thesis project is to deal with the reliability of WFMSs. Failure handling and recovery are addressed in the context of the METEOR2 WFMS. We have augmented NEOWork with a CORBA-based recovery framework that makes substantial use of CORBA object services to handle various run-time system failures. Using the NEO CORBA system and the Web as its primary communication infrastructures, NEOWork also includes an interface server that allows direct socket data communication between task managers and user tasks with Java interfaces. A prototype implementation of the NEOWork architecture has been completed and demonstrated.
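The idea of creating run-time components dynamically through factories can be sketched in a few lines. This is plain Python with no ORB involved; the class names, the `create` methods, and the choice of one scheduler per workflow instance are hypothetical and do not reproduce the NEOWork IDL interfaces.

```python
class TaskManager:
    """Stand-in for a task manager object (a CORBA object in NEOWork)."""

    def __init__(self, task_name):
        self.task_name = task_name


class Scheduler:
    """Stand-in for a scheduler that owns the task managers of one workflow."""

    def __init__(self, workflow_id):
        self.workflow_id = workflow_id
        self.task_managers = {}


class TaskManagerFactory:
    def __init__(self):
        self.created = 0  # number of task managers handed out so far

    def create(self, task_name):
        self.created += 1
        return TaskManager(task_name)


class SchedulerFactory:
    def __init__(self, tm_factory):
        self.tm_factory = tm_factory
        self.schedulers = {}

    def create(self, workflow_id, task_names):
        # Assumption: each new workflow instance gets its own scheduler.
        scheduler = Scheduler(workflow_id)
        for name in task_names:
            scheduler.task_managers[name] = self.tm_factory.create(name)
        self.schedulers[workflow_id] = scheduler
        return scheduler


tm_factory = TaskManagerFactory()
factory = SchedulerFactory(tm_factory)
first = factory.create("wf-1", ["enter_claim", "approve"])
second = factory.create("wf-2", ["enter_claim", "approve"])
```

Because components are created on demand rather than fixed at start-up, the number of schedulers and task managers grows with the number of workflow instances, which is the scalability property the redesign aims at.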

 

1.4 RELATED WORK

 

This section reviews the architectural design of several current WFMSs, both commercial products and research prototypes. The recovery mechanisms of these WFMSs, where present, are discussed as well.

The initial centralized run-time architectures for the METEOR WFMS were designed and implemented by [Wan95]. The major components of these run-time architectures are the workflow scheduler, task managers (TMs) and tasks [Wan95] [MSK+96]. Each of these components plays a distinct role in the architectures:

 

In the centralized systems, task managers and tasks communicate through CORBA IDL [OMG93] interfaces. CORBA IDL interfaces are functional specifications that clients can use to invoke services located on different hosts. Task managers and tasks can therefore be distributed across different hosts and still communicate with each other. CORBA technology provides the fundamental communication infrastructure for WFMSs [MSK+96] [SKM+96].

ORBWork is a CORBA- and Web-based, fully distributed run-time for the METEOR2 WFMS [Das97]. The ORBWork enactment system uses the ORB infrastructure for communication between, and distribution of, workflow components (task managers, data objects, tasks and recovery components) across host boundaries. Web browsers and CGI scripts are used as a standard mechanism for user interaction with the WFMS. Due to the distributed nature of the workflow engine, ORBWork does not have a centralized scheduling entity. The scheduling mechanism is embedded in each of the task managers. A CORBA-based recovery framework for ORBWork that deals with numerous system failures has been designed and implemented by [Wor97]. The framework includes additional WFMS system components that enable failure detection, persistence, and automated and human-assisted recovery of workflow components. The hierarchical and distributed nature of the recovery framework adds to the availability and robustness of ORBWork.
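Scheduling without a central entity can be sketched as task managers that activate their own successors. The code below illustrates the general idea only and is not the ORBWork design: the `then` and `notify` methods, the counter of pending predecessors, and the rule that a task starts once all of its predecessors have finished are assumptions.

```python
class TaskManager:
    """A task manager that carries its own share of the scheduling logic."""

    def __init__(self, name, log):
        self.name = name
        self.log = log          # shared list recording the execution order
        self.successors = []
        self.waiting_for = 0    # predecessors that have not finished yet

    def then(self, other):
        """Declare that `other` may start only after this task finishes."""
        self.successors.append(other)
        other.waiting_for += 1
        return other

    def notify(self):
        """Called by a predecessor that has just finished."""
        self.waiting_for -= 1
        if self.waiting_for == 0:
            self.run()

    def run(self):
        self.log.append(self.name)  # stands in for executing the task
        for successor in self.successors:
            successor.notify()


log = []
a, b, c, d = (TaskManager(n, log) for n in "abcd")
a.then(b).then(d)  # a -> b -> d
a.then(c).then(d)  # a -> c -> d, so d waits for both b and c
a.run()
```

No object in this sketch sees the whole workflow; each task manager knows only its immediate successors, which is what removes the need for a central scheduler.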

APRICOTS [Sch93] is a prototype implementation of the ConTract project. In the ConTract project, a task or process is defined using sequential, parallel, branched, or nested blocks that can be enclosed in a transaction. Consequently, the state control functions of traditional transactional systems, such as suspend, resume, activate, and restart, are used in the run-time WFMS to control workflow execution. Forward recovery is provided for the WFMS by a recovery mechanism based on "check-pointing of blocks".
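Forward recovery through block-level checkpoints can be illustrated as follows. This is one general reading of the technique, not the ConTract or APRICOTS implementation: the in-memory `checkpoint` dictionary stands in for stable storage, and the `fail_at` parameter exists only to simulate a crash.

```python
def run_with_checkpoints(blocks, checkpoint, fail_at=None):
    """Run blocks in order, recording each completed block in `checkpoint`.

    Blocks already recorded are skipped, so a restart resumes forward from
    the last completed block instead of starting over.
    """
    executed = []
    for name, action in blocks:
        if name in checkpoint["done"]:
            continue  # completed before the failure; do not repeat
        if name == fail_at:
            raise RuntimeError(f"simulated crash before block {name}")
        checkpoint["context"] = action(checkpoint["context"])
        checkpoint["done"].append(name)
        executed.append(name)
    return executed


# Hypothetical blocks that transform a numeric context.
blocks = [
    ("a", lambda ctx: ctx + 1),
    ("b", lambda ctx: ctx * 2),
    ("c", lambda ctx: ctx + 10),
]
checkpoint = {"done": [], "context": 0}  # stand-in for stable storage

try:
    run_with_checkpoints(blocks, checkpoint, fail_at="c")
except RuntimeError:
    pass  # the "system" crashed; the checkpoint survives

resumed = run_with_checkpoints(blocks, checkpoint)
```

After the restart only block `c` runs, and it starts from the context saved after block `b`. In a real system the checkpoint would have to be written atomically together with the effects of the block.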

A discussion of the centralized WFMS architecture of FlowMark can be found in [AAA+95]. The run-time architecture consists of five components: a single ObjectStore database server, the FlowMark server, the runtime client, the buildtime client, and invoked applications. These components correspond to the workflow reference model: the buildtime client provides build-time functions, the runtime client handles run-time interactions, the FlowMark server provides run-time control, and the database server stores persistent data for logic control and recovery. The paper also introduces a message-queue architecture, leading to the design of a distributed system architecture for FlowMark that uses multiple database servers. It further proposes that multiple message queues could be used and made persistent so that system messages survive crashes, making system recovery possible.

The Mentor run-time system uses a client/server architecture, featuring a more open and modular design so that components can easily be added and replaced [WWW+96]. The architecture consists of workflow engines, a communication manager, TP monitors, a log manager, and a worklist manager. Each workflow engine acts as one of the workflow system servers and executes its corresponding partition of workflow activities (tasks). A communication manager manages control flow data for its local workflow server, while a local TP monitor handles data communication across workflow servers with other TP monitors. Transitions of workflow states can be logged permanently by a log manager (implemented on top of an Oracle DBMS) to enable recovery from a server crash. In addition, a worklist manager is introduced to assign work items to different workflow servers.
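Logging state transitions so that they can be replayed after a crash can be sketched as follows. The example is a simplified illustration, not Mentor's log manager: it uses Python's built-in `sqlite3` module as a stand-in for the Oracle DBMS, and the table layout and function names are assumptions.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open the transition log; pass a file path for a log that survives restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transitions ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "workflow TEXT, task TEXT, state TEXT)"
    )
    return conn


def log_transition(conn, workflow, task, state):
    """Append one state transition and commit it before returning."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO transitions (workflow, task, state) VALUES (?, ?, ?)",
            (workflow, task, state),
        )


def recover(conn, workflow):
    """Replay the log in order and return the last known state of each task."""
    rows = conn.execute(
        "SELECT task, state FROM transitions WHERE workflow = ? ORDER BY seq",
        (workflow,),
    )
    latest = {}
    for task, state in rows:
        latest[task] = state  # later entries overwrite earlier ones
    return latest


conn = open_log()
log_transition(conn, "wf-1", "enter_claim", "active")
log_transition(conn, "wf-1", "enter_claim", "terminated")
log_transition(conn, "wf-1", "approve", "active")
state_after_crash = recover(conn, "wf-1")
```

After a server crash, a restarted engine would call `recover` to learn that `enter_claim` has finished and `approve` was still in progress, and then decide how to continue.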

 

1.5 ORGANIZATION OF THE THESIS

 

Chapter 2 reviews the CORBA and Web technologies that have been applied to build the workflow infrastructure of the METEOR WFMS. Chapter 3 briefly introduces the basic concepts of the METEOR2 model as well as the METEOR2 WFMS architecture designs. Chapter 4 presents an in-depth discussion of the design of the workflow engine for the NEOWork enactment system. Chapter 5 provides a general discussion of workflow reliability issues and the design of the NEOWork recovery framework. Chapter 6 contains the implementation details of NEOWork. Finally, Chapter 7 concludes the thesis and suggests possible future work on this project.