Web 2.0

Web 2.0 has been growing rapidly, empowering end-users with a vast set of applications dedicated to improving their experience on the Web. This improvement takes the form of increased personalization, which enables end-users to navigate and search the Web according to their own needs. Mashups are one of the key icons of Web 2.0 applications; they are essentially Web services that are often created by end-users. They aggregate and manipulate data from sources around the World Wide Web using filtering, sorting, truncating, and several other operations, and all of these manipulations are performed according to the end-user's personalized needs. Examples of mashup platforms on the Web are Yahoo Pipes and Intel MashMaker. Surprisingly, mashup performance has received little attention in the research community. My research is targeted towards providing architectures, protocols, and schemes that enhance mashup performance and scalability.



Mashups, while enhancing personalization and end-user participation, also introduce new scalability and performance challenges. Unfortunately, these issues have received little attention from the research community. To the best of our knowledge, no prior work has studied the performance characteristics of mashup platforms or proposed techniques for improving them. Although mashups are conceptually Web services created by end-users, several differences exist between mashups and Web services. First, mashups are designed by end-users. This implies that mashups are highly personalized based on end-user needs. Second, since mashups are designed by end-users, mashup platforms typically host several thousand distinct mashups, whereas the number of distinct Web services in a typical Web services portal is relatively small. Thus, the data generated in a mashup platform is orders of magnitude greater than in its Web services counterpart, whereas the opportunity for data reuse is much lower. Third, mashups fetch data from large numbers of diverse data sources distributed across the Internet. These data sources vary widely with respect to the characteristics of their data. This implies that the cost of executing mashups depends on external conditions over which the mashup platform has little control. Fourth, mashups are designed by non-technical end-users who are likely unaware of the efficiency and performance implications of their designs. Hence, mashups are less structured, and it is unrealistic to expect them to be optimized from a performance standpoint.



These differences between mashups and Web services imply that special attention must be given to mashup platforms in order to improve their performance. To address these challenges, we propose a mashup platform that enhances the efficiency and scalability of executing mashups. Our platform embodies the following contributions:

  1. We model the performance of mashup platforms.

    We present a model for representing mashups and analyzing their performance; this model defines mashup platforms, mashups, and their components. It covers each component's inputs, outputs, representation, and execution cost. We believe we are the first to model and analyze the performance of mashup platforms. As part of our model, we propose a novel index structure for indexing mashup components. This index is used to access mashup components during mashup execution, which improves the performance of mashup execution.



  2. We design an operator merging technique and operator reordering rules.

    Repeated execution of identical mashup operators makes mashup execution inefficient. Detecting identical operators and merging them is necessary so that identical operators are executed only once. Mashups are not optimized at design time because they are designed by end-users who are not aware of mashup execution efficiency. Consequently, we provide a set of operator reordering rules that arrange mashup operators in the most efficient order. Operator reordering also increases the efficiency of the identical-operator detection process. Both operator merging and operator reordering lead to more efficient mashup execution.



  3. We provide a mashup caching framework.

    Our caching technique takes into consideration the operators that are common across mashups, so that cached data are chosen carefully to increase the value of our cache and to make mashup execution more efficient. Our technique is based on a greedy dynamic algorithm that responds to changes in the mashup request rate and to the reusability of components across mashups; in this way, we cache the data that yields the greatest benefit in minimizing the delay of executing mashups. Our caching architecture is also designed to maximize cache utilization by finding partial results of mashups in the cache, which avoids executing part of the mashup workflow.



  4. We design a distributed architecture for executing mashups.

    Ordinary mashup platforms are based on a central server architecture, which limits their scalability. With our distributed architecture, we increase the scalability of mashup platforms. We propose a planning algorithm that distributes the execution of mashups across several nodes in a network. The planner decides which mashup operator is assigned to which node. The assignment of operators to nodes is performed in a way that minimizes mashup execution time. Our architecture also provides load balancing and fault tolerance.
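
The operator reordering and merging ideas in contribution 2 can be illustrated with a minimal sketch. The operator encoding (a pair of kind and argument), the single rule that moves a filter ahead of a sort, and the prefix-based notion of "identical" are all assumptions made for illustration; they are not the platform's actual rules.

```python
from typing import List, Tuple

# Hypothetical operator encoding: (kind, argument), e.g. ("filter", "sports").
Operator = Tuple[str, str]


def reorder(pipeline: List[Operator]) -> List[Operator]:
    """Move each filter ahead of a sort that immediately precedes it.

    Filtering and then sorting yields the same items in the same order as
    sorting and then filtering, but the sort runs on fewer items.
    """
    ops = list(pipeline)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops) - 1):
            if ops[i][0] == "sort" and ops[i + 1][0] == "filter":
                ops[i], ops[i + 1] = ops[i + 1], ops[i]
                changed = True
    return ops


def count_executions(pipelines: List[List[Operator]]) -> int:
    """Count operator executions when shared prefixes are executed once.

    Assumption: two operators are identical only if they and every operator
    before them match, because only then do they receive the same input.
    """
    seen = set()
    for pipeline in pipelines:
        for end in range(1, len(pipeline) + 1):
            seen.add(tuple(pipeline[:end]))
    return len(seen)


# Two hypothetical mashups over the same feed.
mashup_a = [("fetch", "news.rss"), ("sort", "date"), ("filter", "sports")]
mashup_b = [("fetch", "news.rss"), ("filter", "sports"), ("truncate", "10")]

merged_only = count_executions([mashup_a, mashup_b])
reordered_then_merged = count_executions([reorder(mashup_a), reorder(mashup_b)])
```

Executed separately, the two mashups run six operators. Merging alone brings this down to five, and reordering first brings it down to four, because both mashups then share the fetch and filter steps. This is one way to see why reordering helps the detection of identical operators.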

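The greedy selection behind the caching framework in contribution 3 can be sketched along the following lines. The `Candidate` fields, the benefit formula (request rate times saved delay, per unit of storage), and the fixed capacity are assumptions chosen for illustration rather than the actual cost model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str            # intermediate result produced by a component
    request_rate: float  # combined requests/sec of the mashups that reuse it
    saved_delay: float   # seconds of execution avoided per cache hit
    size: int            # storage units needed to cache it (assumed > 0)


def choose_cached(candidates: List[Candidate], capacity: int) -> List[str]:
    """Greedily pick the results with the highest benefit per storage unit."""
    ranked = sorted(
        candidates,
        key=lambda c: c.request_rate * c.saved_delay / c.size,
        reverse=True,
    )
    chosen: List[str] = []
    used = 0
    for c in ranked:
        # Skip anything that no longer fits; smaller items may still fit.
        if used + c.size <= capacity:
            chosen.append(c.name)
            used += c.size
    return chosen


# Hypothetical intermediate results shared by several mashups.
candidates = [
    Candidate("fetch(news)", request_rate=10.0, saved_delay=0.5, size=4),
    Candidate("filter(sports)", request_rate=6.0, saved_delay=0.8, size=2),
    Candidate("sort(date)", request_rate=1.0, saved_delay=2.0, size=5),
]
cached = choose_cached(candidates, capacity=6)
```

In this example the filtered result is chosen first, then the fetched feed, and the sorted result is left out because it no longer fits. To follow changes in request rate, the selection would be rerun whenever the measured rates change.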

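For the planner in contribution 4, a minimal sketch of assigning operators to nodes is given below. It uses the well-known longest-processing-time-first heuristic as a stand-in for the planning algorithm, and it assumes that operators are independent tasks with known costs and that data transfer between nodes is free; the operator names, costs, and node names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def plan(costs: Dict[str, float], nodes: List[str]) -> Dict[str, str]:
    """Assign each operator to the node with the smallest load so far."""
    # Min-heap of (current load, node name).
    heap: List[Tuple[float, str]] = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    assignment: Dict[str, str] = {}
    # Place expensive operators first; ties are broken by operator name.
    for op in sorted(costs, key=lambda o: (-costs[o], o)):
        load, node = heapq.heappop(heap)
        assignment[op] = node
        heapq.heappush(heap, (load + costs[op], node))
    return assignment


# Hypothetical operator costs in seconds, spread over two nodes.
costs = {"fetch": 4.0, "sort": 3.0, "filter": 2.0, "truncate": 1.0}
assignment = plan(costs, ["n1", "n2"])
```

Here both nodes end up with a total load of 5 seconds, which also illustrates the load-balancing property. A planner that respects operator dependencies and network costs would need a richer model than this sketch.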

We believe we are the first to study and analyze mashup performance by proposing the architectures, protocols, and schemes described above.