In this assignment, you are going to
develop a collaborative Web caching system. Your system will comprise
of two web caches (referred to as CA and CB) that are run on two
different machines. Each cache is be multithreaded and is assumed
to support a set of clients. When CA (or CB) receives a request from a
client, it first checks whether the document represented by the URL is
available locally. If the document is available, it is served directly
to the client. Otherwise, CA contacts CB to check whether it has the
document. If so, it obtains the document from CB and serves the client.
If CB too does not have the document, CA contacts the origin server to
obtain the document (note that the cache that originally recevied the
client request contacts the origin server and not the other cache).
When the document is obtained (in either way) it is stored locally and
served to the client.
Each
cache is configured to store certain maximum number of documents
(MAX_DOCS). If the number of documents currently in the cache is equal
to MAX_DOCS and a new document is obtained, one of existing document
has to be removed from the cache. You are going to implement the Least
Recently Used (LRU) cache replacement strategy, where the document that
has not been accessed for the longest duration is going to be replaced.
Your caches should interact with the external clients and the
servers as per the HTTP protocol. However, the communication protocol
between caches (inter-cache communication protocol) is propreitery to
your system (i.e., you choose the message and the format of the
messages). Your cache should accept the MAX_DOCS as a configuration
parameter.
You will also implement two HTTP clients each of
which repeatedly reads URLs from an input file and issues read requests
to one cache (one client will issues requests to CA and the other to
CB). You may
assume that the requests are for simple static html pages (i.e., you do
not need to support cookies or queries with embedded parameters).
In
addition, you will do a small performance study. You will use two input
files (provided by us shortly) to drive the clients. You will measure
local hit rate, remote hit rate and miss rate for each cache. Local hit
rate is quantified as ratio of numbers of local hits to the total
number of requests. Analogously, remote hit rate is ratio of number of
remote hits to the total number of requests. Miss rate is 1.00 - (local
hit rate + remote hit rate).
Your proxy should be multithreaded. A thread should be spawned each
time a request is received at the cache. Pay close attention to
synchronization issues, esepcially when implementing cache replacement.
This assignment can be either
implemented in C, C++ or Java. All three undergrads will work on this project together.
Resources:
You can find many good articles on pthreads. There
are also some good books on pthreads and multithreaded programming.
Here are some good articles available on the web:
- http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
- http://www.llnl.gov/computing/tutorials/pthreads
Submission Deadline:
May 4th 2010, Demo After Exam.