Distributed Computing Systems (CSCI 4780/6780)

Programming Assignment 3

In this assignment, you are going to develop a collaborative Web caching system. Your system will comprise of two web caches (referred to as CA and CB) that are run on two different machines. Each cache is be multithreaded and is  assumed to support a set of clients. When CA (or CB) receives a request from a client, it first checks whether the document represented by the URL is available locally. If the document is available, it is served directly to the client. Otherwise, CA contacts CB to check whether it has the document. If so, it obtains the document from CB and serves the client. If CB too does not have the document, CA contacts the origin server to obtain the document (note that the cache that originally recevied the client request contacts the origin server and not the other cache). When the document is obtained (in either way) it is stored locally and served to the client.

Each cache is configured to store certain maximum number of documents (MAX_DOCS). If the number of documents currently in the cache is equal to MAX_DOCS and a new document is obtained, one of existing document has to be removed from the cache. You are going to implement the Least Recently Used (LRU) cache replacement strategy, where the document that has not been accessed for the longest duration is going to be replaced.

Your caches should interact with the external clients and the servers as per the HTTP protocol. However, the communication protocol between caches (inter-cache communication protocol) is propreitery to your system (i.e., you choose the message and the format of the messages). Your cache should accept the MAX_DOCS as a configuration parameter.

You will also implement two HTTP clients each of which repeatedly reads URLs from an input file and issues read requests to one cache (one client will issues requests to CA and the other to CB). You may assume that the requests are for simple static html pages (i.e., you do not need to support cookies or queries with embedded parameters).

In addition, you will do a small performance study. You will use two input files (provided by us shortly) to drive the clients. You will measure local hit rate, remote hit rate and miss rate for each cache. Local hit rate is quantified as ratio of  numbers of local hits to the total number of requests. Analogously, remote hit rate is ratio of number of remote hits to the total number of requests. Miss rate is 1.00 - (local hit rate + remote hit rate).

Your proxy should be multithreaded. A thread should be spawned each time a request is received at the cache. Pay close attention to synchronization issues, esepcially when implementing cache replacement.

This assignment can be either implemented in C, C++ or Java. All three undergrads will work on this project together.

Resources:

You can find many good articles on pthreads. There are also some good books on pthreads and multithreaded programming. Here are some good articles available on the web:

  1.  http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
  2.  http://www.llnl.gov/computing/tutorials/pthreads

Submission Deadline:

May 4th 2010, Demo After Exam.