CSCI 2720

Data Structures

Spring 2007

Project 3 -- String Compression  
Assigned: Thursday, March 8th
Due: Thursday, April 12th
See the Project 3 Grade Sheet. and questions and answers regarding project 3 . Also, here's a sample program using command-line parameters

Description

In this project you will implement two programs: The compression program is to be implemented using one of three algorithms: For these simulated compression algorithms, you can write out the 0s and 1s representing bits as characters, if you find it more convenient to do so.

For the selected compression algorithm, write the corresponding uncompression algorithm. To test your program we will compress an input file, view the compressed version to verify the format of the compressed file, and the uncompress the file to verify that the compression/uncompression was lossless.

Both the compression and uncompression algorithms should accept command-line parameters specifying the name of the input file and the name of the output file (in that order). You may choose to accept additional command-line parameters. Structure your program so that if a user types in the program name without the parameters, he/she will receive a help message describing the order and meaning of the expected parameters.

A representation of the Huffman Encoding Tree

One way to encode the Huffman Encoding tree is to encode each leaf with a "1" followed by the character that the leaf represents, and to encode each internal node with a "0", followed by an encoding of its left subtree, followed by an encoding of its right subtree. (A preorder traversal). Since each node in a Huffman Encoding tree is either a leaf node or has two non-null subtrees, this will be a unique representation.