Research Statement

Eileen Kraemer

 

The focus of my research is on the development and evaluation of tools for visualization and interaction in support of complex tasks.  As such, it includes work both in human-computer interaction and in the domain of the complex task to be supported.   Early work addressed visualization and interaction in support of program comprehension and performance evaluation of parallel and distributed systems.  Research in techniques for interactive steering built on this early work. 

 

Later work involved the development and evaluation of numerous domain-specific tools employing visualization and interaction, with a strong focus on applications in bioinformatics.  Projects included visualization and interaction in support of network management, micro-array data analysis, gene-finding, visualization of protein-protein interaction data, comparative genomics displays, and more. Developing and evaluating many such systems, each supporting a complex task, has served as a training ground, permitting us to abstract out higher-level principles about the design, implementation, evaluation, and use of such systems.

 

Motivated by the lessons learned in the domain-specific projects, I have recently begun a program of research that attempts to answer questions about what types of displays are useful and usable, and what properties of those displays promote or inhibit understanding and usability.   I am currently performing empirical studies of the effectiveness of visualization and interaction techniques for program visualization, with an emphasis on applications in computer science education.  In addition, I am involved in the development of user interfaces and visualizations for the web-based biological database for Cryptosporidium, CryptoDB.

 

In the sections below I describe a selection of my prior and current work.  I discuss the goals and contributions of these projects, state the questions that remain open, and identify new research questions suggested as a result of this work.

 

PARADE, POLKA, and the Animation Choreographer

 

This project, sponsored in part by the National Science Foundation and supervised by Professor John Stasko at Georgia Tech, was the subject of my PhD thesis research. We sought to discover how visualization could be used to support the debugging of parallel programs.  Our approach centered on a comprehensive environment called PARADE for developing visualizations and animations of parallel and distributed programs. A major sub-project was the development of the animation component of the environment, POLKA, which can be used to visualize concurrent (and sequential) programs written in many different languages.

 

My focus was on the development of an interactive tool called the Animation Choreographer.  This graphical, interactive tool allows users to specify the order in which events in associated animations should be displayed, and to view, manipulate, and explore the set of feasible orderings of the program under study.  Using the Choreographer we were able to investigate whether the choice of an appropriate reordering could produce more comprehensible displays, and whether users could use such displays to discover problematic event sequences.
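
As an illustration only, and not the Choreographer's actual implementation, the notion of a "feasible ordering" can be sketched as any ordering of events that respects a happens-before relation.  The Python below assumes a hypothetical set of event names and happens-before pairs, and enumerates orderings by brute force, which is practical only for very small examples.

```python
from itertools import permutations


def feasible_orderings(events, happens_before):
    """Yield every ordering of `events` that respects the happens-before pairs.

    `happens_before` is a list of (earlier, later) pairs. Brute force over all
    permutations, so this is only suitable for a handful of events.
    """
    def respects(order):
        position = {event: i for i, event in enumerate(order)}
        return all(position[a] < position[b] for a, b in happens_before)

    for order in permutations(events):
        if respects(order):
            yield order


if __name__ == "__main__":
    # Hypothetical two-process trace: p1 sends a message that p2 receives.
    events = ["p1.send", "p1.compute", "p2.recv", "p2.print"]
    happens_before = [
        ("p1.send", "p1.compute"),  # program order on p1
        ("p2.recv", "p2.print"),    # program order on p2
        ("p1.send", "p2.recv"),     # a message is sent before it is received
    ]
    for order in feasible_orderings(events, happens_before):
        print(" -> ".join(order))
```

Each ordering printed is one candidate sequence in which an animation of this small trace could legitimately be displayed.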

 

This project succeeded in developing techniques that permit the visualization of concurrency in parallel and distributed systems.  However, it was not clear that visualizations alone fully supported the users in answering their questions of interest.  Whether different visualizations or additional interactions would help is an open question.

Query-based Visualization

 

The key obstacle to understanding a particular run of a distributed computation is the large volume of data that such a computation can generate.  The state space, distributed across many processors, is typically far too large to depict in its entirety.  Rather, one might want to explore the computation by selecting values for display, tracking them for some time, and then selecting a new set of data values or processes of interest.  Query-based Visualization (QBV) was proposed as a novel exploratory approach to understanding distributed computations through such interactions.

 

The QBV project was performed in collaboration with Gruia-Catalin Roman at Washington University and funded by an NSF grant.  In QBV, queries are used as a device for searching the state space of distributed computations.  Visual presentation techniques borrowed from program animation are also employed, and users are able to navigate through the state space using visual interactions. Query-based visualization treats the state of a running distributed computation as if it were the contents of a distributed database. The user explores this database by issuing queries.  The queries are evaluated and the results visualized. Persistent queries enable continuous visual monitoring of the state of a distributed computation. All views correspond to globally consistent snapshots of the computation. The implementation of query-based visualization provided a uniform interface for learning about PVM-based distributed applications.  Contributions included the development of monitoring and snapshot algorithms for distributed systems, and the development of an architecture for the efficient combining, maintenance, and processing of queries against a running distributed computation.
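
The idea of a persistent query evaluated against successive snapshots can be sketched as follows.  This is a minimal illustration under stated assumptions, not the QBV system's interface: the names PersistentQuery and monitor, the snapshot layout, and the queue_len variable are all hypothetical, and the snapshots are assumed to have already been collected in a globally consistent way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical snapshot layout: process id -> variable name -> value.
ProcessState = Dict[str, int]
Snapshot = Dict[str, ProcessState]


@dataclass
class PersistentQuery:
    """A named predicate that is re-evaluated against every new snapshot."""
    name: str
    predicate: Callable[[str, ProcessState], bool]

    def evaluate(self, snapshot: Snapshot) -> List[str]:
        # Return the ids of the processes whose state satisfies the predicate.
        return sorted(
            pid for pid, state in snapshot.items() if self.predicate(pid, state)
        )


def monitor(
    queries: List[PersistentQuery], snapshots: List[Snapshot]
) -> List[Tuple[int, str, List[str]]]:
    """Evaluate each query on each snapshot, in order of arrival.

    A real system would hand each result to a view for display; here the
    results are simply collected as (snapshot index, query name, process ids).
    """
    results = []
    for step, snapshot in enumerate(snapshots):
        for query in queries:
            results.append((step, query.name, query.evaluate(snapshot)))
    return results


if __name__ == "__main__":
    overloaded = PersistentQuery("overloaded", lambda pid, s: s["queue_len"] > 5)
    snapshots = [
        {"p0": {"queue_len": 2}, "p1": {"queue_len": 7}},
        {"p0": {"queue_len": 9}, "p1": {"queue_len": 1}},
    ]
    for row in monitor([overloaded], snapshots):
        print(row)
```

Treating each snapshot as a small table and each query as a filter mirrors the database analogy described above; the distributed collection of the snapshots themselves is outside the scope of this sketch.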

 

The development and evaluation of the system served to answer the questions of whether such an exploratory style of data collection and visualization could be implemented efficiently and could provide users with an experience that would assist them in better understanding distributed computations.  In this system users interacted through menus and dialog boxes to specify what should be collected and when and how it should be displayed.  Whether better interactions could be designed, perhaps through direct manipulation of on-screen objects representing elements of the computation, remains to be answered.


 

Pathfinder

 

The Pathfinder project was supported by an NSF CAREER award.  Building on QBV, Pathfinder gave users the ability not only to query distributed computations and view animated displays of their execution, but also to perform interactive steering.  Interactive steering permits users to make on-the-fly changes to running distributed computations. This project involved the development of modular algorithms for the collection of snapshots with varying degrees of consistency and of algorithms that permit the consistent application of changes to the executing program, while minimizing the resulting lag and perturbation.

 

These algorithms for consistent monitoring and steering built on existing work on snapshot algorithms, and extended this work from the realm of consistent observation of distributed systems to ensuring consistency in the application of changes to an executing distributed computation, through an innovative application of optimistic computing techniques. We developed an environment for the online steering of distributed computations, implemented both conservative and optimistic approaches, and determined the conditions under which each approach performed better.

 

The steering functionality was employed in several experiments with biological applications, including entropy-minimization, physical mapping, and gene clustering algorithms.  Users found benefits, either in better solutions or in faster time to solution.  However, neither the range of applications for which such techniques might be useful nor the range of interaction techniques for controlling steering behavior has been fully explored.  Further, these particular domain areas have not been explored in depth to determine whether custom visualizations or interactions might exist that would better support these tasks.

 

Network Monitoring, Visualization, and Control (NMVC)

 

This project was funded by NSF and performed at Washington University in St. Louis under the guidance of Guru Parulkar and Jonathan Turner and in collaboration with Doug Schmidt and Ron Cytron.  The contribution of the project was the design, prototype implementation, and demonstration of a highly scalable NMVC system with advanced algorithmic and human-in-the-loop capability. The project addressed the problem of efficient management of high-performance local area and wide area networks, through the construction of efficient and user-friendly network monitoring, visualization, and control (NMVC) systems.  In particular, it addressed the problems of detecting, isolating, correlating, and correcting faults and performance bottlenecks in situations in which algorithmic methods either did not suffice or did not scale.  Through the system, network administrators could calibrate and fine-tune network and application parameters in real time according to observed traffic patterns, with the goal of ensuring adequate quality of service to network users while maintaining high network resource utilization.

 

This project involved surveys of end-users, iterative prototyping, and the development of interactions through which users could alter the appearance of their displays, turn various types of warnings on or off, or seek additional information. Through this work the benefits of task analysis, prototyping, and evaluation became apparent, as each iteration of the prototype better served the needs of target users.

 

 

VizEval

 

Many potential users of program visualization (PV) have a strong intuitive belief that visualization is a valuable tool for communicating information about the state and behavior of programs. Yet, in practice, the use of visualization is less pervasive than the notion that it is useful.  Several reasons may exist for this discrepancy: the effort required to create and refine visualizations, the difficulty of collecting information from running programs, the difficulty viewers have in navigating and refining the views they are presented with, and questions about the true benefits to the user.  Our position is that interactivity is key to addressing many, though not all, of these concerns.

 

This ongoing project is funded by NSF and performed in collaboration with Elizabeth Davis, a perceptual psychologist at Georgia Tech.  In it, we investigate the benefits of program visualization, study characteristics of program visualizations that make them more or less useful, and seek to develop tools and approaches that strive to maximize the benefit that users may derive from the use of such tools.  We investigate the hypothesis that present PV systems have failed to live up to expectations because they have largely ignored the issue of appropriate perceptual properties for effective viewing, and that in order to be effective a PV system must support perceptually appropriate animation, graphical design and layout, as well as good pedagogical design.

 

Our goals are to:

 

 

 

We expect that critical evaluation of the effects of relevant attributes of PVs, the development of metrics for quality of PVs, and the tuning and evaluation of models based on these metrics through empirical studies will serve to provide normalizing parameters for future studies of the benefits of program visualization. More importantly, we hope that this work will serve as the basis for design guidelines for the effective use of PV and other forms of process visualization.

 

In the context of this project we have developed software to support empirical evaluation.  One system supports the creation and conduct of low-level perceptual studies with many trials.  Another system supports the creation and conduct of higher-level studies of program comprehension.  Generalization of these systems to support empirical studies of many types is planned as future work.

 

Bioinformatics applications

 

We have developed and evaluated numerous tools for visualization and interaction in the context of work in bioinformatics.  Our work in the development of these domain-specific tools has served as a training ground, permitting us to abstract out higher-level principles about the design, implementation, evaluation, and use of such systems. 

 

For example, we performed a project in which we developed a gene-finding tool (FFG = Find Fungal Gene) for an organism of particular interest at UGA, Neurospora crassa.  We then compared the performance of this gene-finding tool with that of several other well-known gene finders on a sample for which the genes had been manually annotated.  Through this process we became "domain experts" in the tedious, time-consuming, and error-prone task of evaluating gene-finding programs.  This insight permitted us to develop a spreadsheet-like user interface and supporting Java classes that allow a user to perform in a few hours what had taken us weeks to accomplish in the absence of these analysis, visualization, and interaction tools.

 

In our work with the visualization of protein-protein interaction data, we sought to implement and compare a variety of graph-layout algorithms with the goal of determining the "best" algorithm for such displays.  We also implemented a number of interactions that permitted the user to select particular nodes, clusters, or cluster sizes for display and then to expand that display through various clicks or menu interactions.  The surprising result was that the graph layout algorithm had little effect on the user's ability to locate the clusters of biological interest.  However, the interactions that permitted the users to select particular nodes and to interactively expand or contract the display were found to be essential.

 

Similar results were found in our work with tools to support high-throughput Nuclear Magnetic Resonance (NMR) studies, analysis and visualization of gene expression profiles in support of our collaborative work on ovarian cancer, interactive multiple alignment of retrotransposon sequences, and tools for exploring lateral gene transfer.  In each case, we found that simple graphical displays typically sufficed, provided that the user could perform the "right" interactions to solve the problem.  The challenge in each case was to perform sufficient task analysis to determine exactly what these tasks were and how the user would like to interact with the system to view the available data, to launch additional analyses, and to filter and manipulate intermediate results.  Prototyping and empirical studies were essential in ensuring that the tool that was created actually met the needs of the target users.

 

Our current bioinformatics-related work centers on CryptoDB (www.cryptodb.org), a database for the pathogenic organism Cryptosporidium, and ApiDB (www.apidb.org), an umbrella group for CryptoDB and related organisms, funded under an NIH/NIAID contract.  In this work we designed and implemented the web-based user interface to the biological database and created interactive visualizations that permit the biologist users to explore and understand the results of their database searches.  In addition, we have the opportunity to apply our "lessons learned" from previous projects.  For example, we have developed a tool for rapid prototyping of the user interface for the GUS (Genomics Unified Schema) WDK (Web Development Kit).  The WDK software is designed to permit easy creation of "query-based" websites.  We have developed an XML-based specification and code generation module to simplify the process of creating and modifying the pages that make up such a site.  This tool was used in developing the site seen at www.cryptodb.org.

 

Another application of "lessons learned" is in our implementation of a comparative genomics visualization tool for the site.  The users of the site are accustomed to viewing the results of their single-genome queries in the context of GBrowse displays.  Knowing that these users already understood these displays and could think about their work in terms of them, we extended the GBrowse framework to support comparative genome visualization.  This approach reduced the learning time of users, and likely increased the acceptance rate of the newly developed visualizations.

 

 

Ongoing and Future Work:  Design and Verification of Concurrent Software

 

Concurrency is central to many new software systems for applications such as e-commerce, online financial systems, and other high-assurance applications.  A challenge in the design and implementation of such software is to safely accommodate and optimize concurrency and synchronization.  Thus, support for those who design and verify concurrent systems is increasingly important.

 

Although methods exist to address the problem of verification, they are typically exhaustive approaches that operate by applying reachability analysis or temporal-logic model checking to explicit behavioral models of a system. However, these approaches are limited by the computational intractability of such exhaustive methods.   To address the intractability problems, designers must use explicit and very compact models of concurrency and synchronization.  These models may be specified by the designer or derived from the code.  The models are then analyzed to check that they satisfy safety and liveness requirements (which the designer must also specify).  Unfortunately, the process of generating these specifications introduces the opportunity for fault injection.   The usability of the modeling notations and representations employed by the designers has the potential to affect the likelihood of such fault injection.  Further, we observe that existing notations for specifying models and properties are not particularly usable by the general practitioner.

 

We seek to apply empirical evaluation to discover which modeling metaphors and notations best support the various design and verification tasks that underlie model-based approaches.   Further, we wish to explore the extent to which notations that incorporate assumptions that constrain the allowable interactions among processes simplify the task for the user, reducing the likelihood of error and the time required to generate a concurrent program that satisfies requirements related to concurrency, synchronization, and performance.  Finally, we would like to explore the impact of the use of different notations on the ability of students to learn, apply, and master the tasks of design and verification of concurrent systems.