Research Statement
Eileen Kraemer
My research focuses on
the development and evaluation of tools for visualization and interaction in
support of complex tasks. As such,
it includes work both in human-computer interaction and in the domain of the complex
task to be supported. Early
work addressed visualization and interaction in support of program
comprehension and performance evaluation of parallel and distributed systems. Research in techniques for interactive
steering built on this early work.
Later work involved the development and evaluation of numerous domain-specific tools employing visualization and interaction, with a strong focus on applications in Bioinformatics. Projects included visualization and interaction in support of network management, micro-array data analysis, gene-finding, visualization of protein-protein interaction data, comparative genomics displays, and more. These experiences with the development and evaluation of many systems for visualization and interaction in support of a complex task have served as a training ground, permitting us to abstract out higher-level principles about the design, implementation, evaluation, and use of such systems.
Motivated by the lessons learned
in the domain-specific projects, I have recently begun a program of research
that attempts to answer questions about what types of displays are useful and
usable, and what properties of those displays promote or inhibit understanding
and usability. I am currently performing empirical
studies of the effectiveness of visualization and interaction techniques for
program visualization, with an emphasis on applications in computer science
education. In addition, I am
involved in the development of user interfaces and visualizations for the
web-based biological database for Cryptosporidium, CryptoDB.
In the sections below I describe
a selection of my prior and current work.
I discuss the goals and contributions of these projects, state the
questions that remain open, and identify new research questions suggested as a
result of this work.
PARADE, POLKA, and the Animation Choreographer
This project was sponsored in
part by the National Science Foundation, supervised by Professor John Stasko at
Georgia Tech, and was the subject of my PhD thesis research. We sought to
discover how visualization could be used to support the debugging of parallel
programs. Our approach centered on
a comprehensive environment called PARADE for developing visualizations and animations
of parallel and distributed programs. A major sub-project was the development
of the animation component of the environment, POLKA, which can be used to
visualize concurrent (and sequential) programs from many different
languages.
My focus was on the development
of an interactive tool called the Animation Choreographer. This graphical, interactive tool allows
users to specify the order in which events in associated animations should be
displayed, and to view, manipulate, and explore the set of feasible orderings
of the program under study. Using
the Choreographer we were able to investigate whether the choice of an
appropriate reordering could produce more comprehensible displays, and whether
users could use such displays to discover problematic event sequences.
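As an illustration of what a "set of feasible orderings" means, the following minimal Python sketch enumerates every display order that respects a set of precedence constraints among events. It is not the Choreographer's implementation; the function name, the event names, and the `must_precede` mapping are hypothetical and chosen only for this example.

```python
from typing import Dict, Iterator, List, Set


def feasible_orderings(
    events: List[str], must_precede: Dict[str, Set[str]]
) -> Iterator[List[str]]:
    """Yield every total order of `events` that respects `must_precede`.

    must_precede[e] is the set of events that have to be displayed before e
    (an assumed encoding of program order and message causality).
    """

    def extend(prefix: List[str], remaining: Set[str]) -> Iterator[List[str]]:
        if not remaining:
            yield list(prefix)
            return
        placed = set(prefix)
        for event in sorted(remaining):
            # An event may be shown next only if all its predecessors are placed.
            if must_precede.get(event, set()) <= placed:
                prefix.append(event)
                yield from extend(prefix, remaining - {event})
                prefix.pop()

    yield from extend([], set(events))


# Hypothetical trace: process P sends a message and then does local work;
# process Q receives the message and then replies.
events = ["send", "local", "recv", "reply"]
must_precede = {
    "local": {"send"},  # program order on P
    "recv": {"send"},   # a message is received after it is sent
    "reply": {"recv"},  # program order on Q
}
orderings = list(feasible_orderings(events, must_precede))
for order in orderings:
    print(" -> ".join(order))
```

Exhaustive enumeration grows quickly with the number of concurrent events, so a sketch like this is suitable only for small traces; an interactive tool would more plausibly let the user pick one ordering at a time.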
This project succeeded in
developing techniques that permit the visualization of concurrency in parallel
and distributed systems. However,
it was not clear that visualizations alone fully supported the users in
answering their questions of interest.
Whether different visualizations or additional interactions would help is
an open question.
Query-based Visualization
The key obstacle to understanding
a particular run of a distributed computation is the large volume of data that
such a computation can generate. The
state space, distributed across many processors, is typically far too large to
depict in its entirety. Rather,
one might want to explore the computation by selecting values for display,
tracking them for some time, and then selecting a new set of data values or
processes of interest. Query-based
Visualization (QBV) was proposed as a novel exploratory approach to
understanding distributed computations through such interactions.
The QBV project was performed in
collaboration with Gruia-Catalin Roman at Washington University and funded by
an NSF grant. In QBV, queries are
used as a device for searching the state space of distributed
computations. Visual presentation
techniques borrowed from program animation are also employed, and users are
able to navigate through the state space using visual interactions. Query-based
visualization treats the state of a running distributed computation as if it
were the contents of a distributed database. The user explores this database by
issuing queries. The queries are
evaluated and the results visualized. Persistent queries enable continuous visual monitoring of the state of a
distributed computation. All views correspond to globally consistent snapshots
of the computation. The implementation of query-based visualization provided a
uniform interface for learning about PVM-based distributed applications. Contributions included the development
of monitoring and snapshot algorithms for distributed systems, and the
development of an architecture for the efficient combining, maintenance, and
processing of queries against a running distributed computation.
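The idea of treating computation state as a database and re-evaluating a persistent query against each consistent snapshot can be sketched roughly as follows. This is an assumption-laden toy, not the QBV architecture: the `Snapshot` layout, the `PersistentQuery` class, and the `queue_len` variable are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed snapshot layout: process id -> variable name -> value.
Snapshot = Dict[str, Dict[str, int]]
Row = Tuple[str, Dict[str, int]]


@dataclass
class PersistentQuery:
    """A query that is re-evaluated against every new snapshot."""

    name: str
    select: Callable[[str, Dict[str, int]], bool]
    history: List[List[Row]] = field(default_factory=list)

    def evaluate(self, snapshot: Snapshot) -> List[Row]:
        # Keep only the processes whose state satisfies the predicate.
        rows = [
            (pid, dict(state))
            for pid, state in sorted(snapshot.items())
            if self.select(pid, state)
        ]
        self.history.append(rows)  # results a view could animate over time
        return rows


# Two hypothetical snapshots of a two-process computation.
snapshots: List[Snapshot] = [
    {"p0": {"queue_len": 3}, "p1": {"queue_len": 12}},
    {"p0": {"queue_len": 15}, "p1": {"queue_len": 11}},
]

overloaded = PersistentQuery(
    name="overloaded",
    select=lambda pid, state: state["queue_len"] > 10,
)
for snap in snapshots:
    overloaded.evaluate(snap)
```

Each entry in `overloaded.history` corresponds to one snapshot, which is the kind of per-snapshot result a display could render for continuous monitoring.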
The development and evaluation of
the system served to answer the questions of whether such an exploratory style
of data collection and visualization could be implemented efficiently and could
provide users with an experience that would assist them in better understanding
distributed computations. In this
system users interacted through menus and dialog boxes to specify what should
be collected and when and how it should be displayed. Whether better interactions could be designed, perhaps
through direct manipulation of on-screen objects representing elements of the
computation, remains to be answered.
Pathfinder
The Pathfinder project was
supported by an NSF CAREER award.
Building on QBV, Pathfinder gave users the ability not only to query
distributed computations and view animated displays of their execution, but
also to perform interactive steering.
Interactive steering permits users to make on-the-fly changes to running
distributed computations. This project involved the development of modular
algorithms for the collection of snapshots with varying degrees of consistency
and of algorithms that permit the consistent application of changes to the
executing program, while minimizing resulting lag and perturbation.
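To make the notion of snapshot consistency concrete, the sketch below applies the standard vector-clock test for a consistent cut: no process in the cut may have observed more events of another process than that process itself has recorded. This is a textbook criterion offered as background, not the Pathfinder algorithms, and the function name and example clocks are hypothetical.

```python
from typing import List


def is_consistent_cut(frontier: List[List[int]]) -> bool:
    """Return True if the cut described by `frontier` is consistent.

    frontier[i] is the vector clock of the last event of process i that the
    cut includes. The cut is consistent when, for every pair (i, j), process j
    has seen no more events of process i than process i has itself recorded.
    """
    n = len(frontier)
    return all(
        frontier[j][i] <= frontier[i][i]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical two-process example. Process 1 has received a message that
# process 0 sent as its second event.
inconsistent = [[1, 0], [2, 1]]  # cut omits the send but includes the receive
consistent = [[2, 0], [2, 1]]    # cut includes both the send and the receive

print(is_consistent_cut(inconsistent), is_consistent_cut(consistent))
```

Weaker degrees of consistency would relax this check in some way; how Pathfinder defined those degrees is not stated here, so the sketch covers only the strict case.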
These algorithms for consistent
monitoring and steering built on existing work on snapshot algorithms, and
extended this work from the realm of consistent observation of distributed
systems to ensuring consistency in the application of changes to an executing
distributed computation, through an innovative application of optimistic
computing techniques. We developed an environment for the online steering of
distributed computations, implemented both conservative and optimistic
approaches, and determined the conditions under which the conservative or the
optimistic approach performed better.
The steering functionality was
employed in several experiments with biological applications including
entropy-minimization, physical mapping and gene clustering algorithms. Users found benefits, either in better
solutions or faster time to solution.
However, neither the range of applications for which such techniques
might be useful nor the range of interaction techniques for controlling steering
behavior has been fully explored.
Further, these particular domain areas have not been explored in depth
to determine if custom visualizations or interactions might exist that would
better support these tasks.
Network Monitoring, Visualization, and Control (NMVC)
This project was funded by NSF
and performed at Washington University in St. Louis under the guidance of Guru
Parulkar and Jonathan Turner and in collaboration with Doug Schmidt and Ron
Cytron. The contribution of the
project was the design, prototype implementation, and demonstration of a highly
scalable NMVC system with advanced algorithmic and human-in-the-loop
capability. The project addressed the problem of efficient management of high-performance
local area and wide area networks, through the construction of efficient and
user-friendly network monitoring, visualization, and control (NMVC)
systems. In particular, it
addressed the problems of detecting, isolating, correlating and correcting
faults and performance bottlenecks in situations in which algorithmic methods
either did not suffice or did not scale.
Through the system, network administrators could calibrate and fine-tune
network and application parameters in real-time according to observed traffic
patterns, with the goal of ensuring adequate quality of service to network
users, while maintaining high network resource utilization.
This project involved surveys of
end-users, iterative prototyping and the development of interactions through
which users could alter the appearance of their displays and interact to turn
on or off various types of warnings or to seek additional information. Through
this work the benefits of task analysis, prototyping, and evaluation became
apparent as each iteration of the prototype better served the needs of target
users.
VizEval
Many potential users of program
visualization (PV) have a strong intuitive belief that visualization is a
valuable tool for communicating information about the state and behavior of
programs. Yet, in practice, the use of visualization is less pervasive than the
notion that it is useful. Several
reasons may exist for this discrepancy: the effort required to create and refine
visualizations, the difficulty of collecting information from running programs,
the difficulty viewers have in navigating and refining the views they are
presented with, and questions about the true benefits to the user. Our position is that interactivity is key to addressing many, though not all, of these
concerns.
This ongoing project is funded
by NSF and performed in collaboration with Elizabeth Davis, a perceptual
psychologist at Georgia Tech. In
it, we investigate the benefits of program visualization, study characteristics
of program visualizations that make them more or less useful, and seek to
develop tools and approaches that maximize the benefit that users may
derive from program visualization.
We investigate the hypothesis that present PV systems have failed to
live up to expectations because they have largely ignored the issue of
appropriate perceptual properties for effective viewing, and that in order to
be effective a PV system must support perceptually appropriate animation,
graphical design and layout, as well as good pedagogical design.
Our goals are to:
We expect that critical
evaluation of the effects of relevant attributes of PVs, the development of
metrics for quality of PVs, and the tuning and evaluation of models based on
these metrics through empirical studies will serve to provide normalizing
parameters for future studies of the benefits of program visualization. More
importantly, we hope that this work will serve as the basis for design
guidelines for the effective use of PV and other forms of process
visualization.
In the context of this project we
have developed software to support empirical evaluation. One system supports the creation and
conduct of low-level perceptual studies with many trials. Another system supports the creation
and conduct of higher-level studies of program comprehension. Generalization of these systems to
support empirical studies of many types is planned as future work.
Bioinformatics applications
We have developed and evaluated numerous tools for visualization and interaction in the context of work in bioinformatics. Our work in the development of these domain-specific tools has served as a training ground, permitting us to abstract out higher-level principles about the design, implementation, evaluation, and use of such systems.
For example, we performed a project in which we developed a gene-finding tool (FFG = Find Fungal Gene) for an organism of particular interest at UGA, Neurospora crassa. We then compared the performance of this gene-finding tool with that of several other well-known gene finders on a sample for which the genes had been manually annotated. Through this process we became “domain experts” in the tedious, time-consuming, and error-prone task of evaluating gene-finding programs. This insight permitted us to develop a spreadsheet-like user interface and supporting Java classes that allow a user to perform in a few hours what had taken us weeks to accomplish in the absence of these analysis, visualization, and interaction tools.
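One common way to compare a gene finder's predictions against a manual annotation is to score them at the nucleotide level. The Python sketch below computes sensitivity and specificity using the convention customary in the gene-finding literature, where specificity is the fraction of predicted coding positions that are truly coding. It is an assumption that measures of this kind were among those used in the evaluation described above; the function names and the exon coordinates are hypothetical.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # inclusive (start, end) coordinates on one sequence


def coding_positions(exons: List[Exon]) -> Set[int]:
    """Return the set of positions covered by the given exons."""
    covered: Set[int] = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_accuracy(
    annotated: List[Exon], predicted: List[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = true positives / annotated coding positions
    specificity = true positives / predicted coding positions
    """
    truth = coding_positions(annotated)
    guess = coding_positions(predicted)
    true_positives = len(truth & guess)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(guess) if guess else 0.0
    return sensitivity, specificity


# Hypothetical gene: the prediction finds the first exon exactly and
# only the second half of the second exon.
annotated = [(1, 10), (21, 30)]
predicted = [(1, 10), (26, 30)]
sensitivity, specificity = nucleotide_accuracy(annotated, predicted)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Storing positions in sets keeps the sketch short; for genome-scale sequences, interval arithmetic would be a more economical choice.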
In our work with the visualization of protein-protein interaction data, we sought to implement and compare a variety of graph-layout algorithms with the goal of determining the “best” algorithm for such displays. We also implemented a number of interactions that permitted the user to select particular nodes, clusters, or cluster sizes for display and then to expand that display through various clicks or menu interactions. The surprising result was that the graph layout algorithm had little effect on the user’s ability to locate the clusters of biological interest. However, the interactions that permitted the users to select particular nodes and to interactively expand or contract the display were found to be essential.
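The select-then-expand interaction can be approximated by a breadth-first search that grows the displayed node set outward from the user's selection by a chosen number of hops. The following Python sketch is illustrative only; the function name, the adjacency-list encoding, and the protein labels are hypothetical and do not describe the actual tool.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def expand_selection(
    graph: Dict[str, List[str]], selected: Iterable[str], hops: int
) -> Set[str]:
    """Return the selected nodes plus all nodes within `hops` edges of them."""
    visible: Set[str] = set(selected)
    frontier = deque((node, 0) for node in visible)
    while frontier:
        node, distance = frontier.popleft()
        if distance == hops:
            continue  # do not expand past the requested radius
        for neighbor in graph.get(node, []):
            if neighbor not in visible:
                visible.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return visible


# Hypothetical undirected interaction network, stored as adjacency lists.
interactions = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["A"],
}

one_hop = expand_selection(interactions, ["A"], hops=1)
two_hops = expand_selection(interactions, ["A"], hops=2)
print(sorted(one_hop), sorted(two_hops))
```

Contracting the display is the reverse operation: re-running the function with a smaller `hops` value yields the reduced node set.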
Similar results were found in
our work with tools to support high-throughput Nuclear Magnetic Resonance (NMR)
studies, analysis and visualization of gene expression profiles in support of
our collaborative work on ovarian cancer, interactive multiple alignment of
retrotransposon sequences, and tools for exploring lateral gene transfer. In each case, we found that simple
graphical displays typically sufficed to solve the problem, given that the user
could perform the “right” interactions. The challenge in each case was to
perform sufficient task analysis to determine exactly what these tasks were and
how the user would like to interact with the system to view the available data,
to launch additional analyses, and to filter and manipulate intermediate
results. Prototyping and empirical
studies were essential in ensuring that the tool that was created actually met
the needs of the target users.
The current focus of our bioinformatics-related work is CryptoDB (www.cryptodb.org), which focuses on the pathogenic organism Cryptosporidium, and ApiDB (www.apidb.org), an umbrella group for CryptoDB and related organisms, funded under an NIH/NIAID contract. In this work we designed and implemented the web-based user interface to the biological database and created interactive visualizations that permit the biologist users to explore and understand the results of their database searches. In addition, we have the opportunity to apply our “lessons learned” from previous projects. For example, we have developed a tool for rapid prototyping of the user interface for the GUS (Genomics Unified Schema) WDK (Web Development Kit). The WDK software is designed to permit easy creation of “query-based” websites. We have developed an XML-based specification and code generation module to simplify the process of creating and modifying the pages that make up such a site. This tool was used in developing the site seen at www.cryptodb.org.
Another application of “lessons learned” is in our implementation of a comparative genomics visualization tool for the site. The users of the site are accustomed to viewing the results of their single-genome queries in the context of GBrowse displays. Knowing that these users already understood these displays and could think about their work in terms of them, we extended the GBrowse framework to support comparative genome visualization. This approach reduced the learning time of users, and likely increased the acceptance rate of the newly developed visualizations.
Ongoing and Future Work: Design and Verification of Concurrent Software
Concurrency is an element of many
new software systems for applications such as e-commerce,
online financial systems, and other high-assurance applications. A challenge in the design and
implementation of such software is to safely accommodate and optimize
concurrency and synchronization.
Thus, support for those who design and verify concurrent systems is
increasingly important.
Although methods exist to address
the problem of verification, they are typically exhaustive approaches that
operate by applying reachability analysis or temporal-logic model checking to
explicit behavioral models of a system. However, this approach is limited by
the computational intractability of such exhaustive methods. To address the intractability
problems, designers must use explicit and very compact models of concurrency
and synchronization. These models
may be specified by the designer or derived from the code. The models are then analyzed to check
that they satisfy safety and liveness requirements (which the designer must
also specify). Unfortunately, the
process of generating these specifications introduces the opportunity for
fault injection. The
usability of the modeling notations and representations employed by the
designers has the potential to affect the likelihood of such fault injection. Further, we observe that existing
notations for specifying models and properties are not particularly usable by
the general practitioner.
We seek to apply empirical
evaluation to discover which modeling metaphors and notations best support the
various design and verification tasks that underlie model-based
approaches. Further, we wish
to explore the extent to which notations that incorporate assumptions that
constrain the allowable interactions among processes simplify the task for the
user, reducing the likelihood of error and the time required to generate a
concurrent program that satisfies requirements related to concurrency,
synchronization, and performance.
Finally, we would like to explore the impact of the use of different
notations on the ability of students to learn, apply, and master the tasks of
design and verification of concurrent systems.