Write a function that load the contents of a given text file

Assignment Help Computer Engineering
Reference no: EM131917138

Programming Project Assignment: Data Compression using Huffman Codes

Introduction

This assignment focuses on building a lossless data compression algorithm using Huffman Codes. In 1952, David Huffman developed a way to find a set of optimal prefix codes for a given set of input symbols. Essentially, Huffman Codes are variable-length bit strings which uniquely represent a set of input symbols. The main benefit offered by this scheme is that the more common the input symbol, the fewer the number of bits that are required to represent it. Clearly, this is advantageous when compressing data.

Compressing ASCII Text using Huffman Codes

The algorithm always begins with the source set of symbols to be compressed. For this assignment, we will focus only on using ASCII text but the algorithm is applicable to any set of symbols. Below is an example source string to be compressed:

to be or not to be

The first step is find the frequency of each letter in the source text. These symbol frequencies are a fundamental aspect of the algorithm. Table 1 gives a summary of these frequencies for this example. One approach to compressing this data is to use a block encoding representation that uses the minimum number of bits that permit all the letters to be encoded, i.e. 3 bits per letter for this example. Since there are 18 characters in this string, 54bits (3 x 18) would be required to represent this sentence using block encoding. Using the Huffman Codes shown in Table 2, the same sentence can be represented with just 47bits - a 13% reduction in size.

Letter

Freq

Space

5

o

4

t

3

b

2

e

2

n

1

r

1

Table 1 - Letter Frequencies

The second step of the algorithm is to construct a binary tree using the data in Table 1. The pseudo-code below describes a way to build a binary tree. It makes use of a priority queue to construct the tree, i.e. in this kind of data structure items with higher priority are placed at the front of the queue. In this case the letter frequency is used as the priority indicator with lower letter frequencies having higher priority.

CREATE leaf node for each letter

ADD all leaf nodes to priority queue

LOOP WHILE more than one node in priority queue REMOVE two nodes from priority queue*

CREATE new node with two removed nodes as children

SET new node's priority to sum of two child node priorities ADD new node to priority queue

END LOOP

REMOVE remaining node from priority queue SET node as root

*when more than two items have the same priority, any node pair is permitted

The third step of the algorithm is to traverse the constructed binary tree to build Huffman Codes for each letter. To do this all links within the tree must be labelled 0 or 1. By convention, each left child link is a 0 and each right child link is a 1. Figure 1 gives an example binary tree for the example string.

1887_Figure-1–Example-Binary-Tree.jpg
Figure 1 - Example Binary Tree

Each Huffman Code is found by traversing the tree from the root node to the letter in question. Table 2 shows the result of traversing the tree for each letter in the example string.

Letter

Huffman Code

Space

00

o

11

t

011

b

010

e

100

n

1010

r

1011

Table 2 - Huffman Codes

Assignment Tasks

There are 8 tasks to complete for this assignment worth a total of 100 marks. Each task needs to be completed before the next one can be attempted (except the first task which can be completed at any time). You have been provided with a sign-off sheet to be completed during tutorial sessions by your tutor. Your tutor may give you partial marks for any incomplete tasks and so please try to complete as many as possible. It is therefore essential that you attend the remaining tutorial sessions. After you have completed a task you must ask a tutor to sign it off on your sheet.

NOTE: You should take extra care to keep your sign-off sheet safe because you will need to provide a scan or photograph of it when you submit your assignment to Blackboard.

Task 1

Huffman Coding is one example of a lossless compression algorithm. Write a short 500-750 word academic report describing the similarities and differences between lossless and lossy compression algorithms and under what circumstances you might favour one kind of compression algorithm over the other. Your report should include brief descriptions of one example algorithm that is used for each kind of compression (excluding Huffman Coding). You should try to use diagrams and illustrations where applicable to help your explanations of your chosen compression algorithms. Marks will be given for good UWE Harvard referencing, good grammar and spelling, and a good report structure.

Task 2

Write a function that loads the contents of a given text file into memory. Once loaded into memory you should output the text to the Console Window. You are expected to use the ‘ToCompress.txt' file for this part of the assignment (which is provided for you on Blackboard).

Task 3

The following code represents a letter-frequency pair structure:

struct LetterFrequencyPair
{
char character; int frequency;
};

Using this structure and the lecture slides to help you, develop a Linked List data structure that can be used to store a list of letters-frequency pairs. You should use the following test data to demonstrate to your tutor that you can store letter-frequency pairs in your Linked List:

Character

Frequency

A

2

a

3

{a space character}

5

;

1

1

2

Task 4

For this task, you need to write a function to find the frequency of each letter in the text supplied in the ‘ToCompress.txt' file. Remember this is a lossless algorithm and so your implementation must be case- sensitive and include all punctuation. You will need to make use of the code you produced for Task 2 and Task 3 to complete this task, i.e. a Linked List of letter-frequency pairs. Finally, you should print the letter- frequency pairs you have discovered to the Console Window.

Task 5

The following code represents a binary tree node that can be used to build a Binary Tree:

struct BinaryTreeNode
{
struct LetterFrequencyPair* letter_frequency_pair; struct BinaryTreeNode* leftChild;
struct BinaryTreeNode* rightChild;
};

Write a function that takes two BinaryTreeNode parameters, merges them according to the Huffman Coding Algorithm (i.e. it creates a new node, points the left and right child to the supplied BinaryTreeNode parameters, sets the frequency of the new node to be the sum of the parameter's frequencies and returns the new node). You should demonstrate to your tutor that this function performs correctly by using the following test data:

Character

Frequency

a

2

b

3

NOTE: you will need to initialise the new node objects leftChild and rightChild equal to NULL to indicate that they do not point to anything

Task 6

Write a function that takes a Linked List of BinaryTreeNode objects as a parameter and returns the BinaryTreeNode with the lowest frequency. This BinaryTreeNode should also be removed from the Linked List. You should demonstrate to your tutor that this function performs correctly by using the following test data:

Character

Frequency

x

4

y

2

z

5

w

3

NOTE: you will need to create a Linked List of BinaryTreeNode objects using the four letter-frequency pairs given above

Task 7

Write a function that builds a Huffman Tree making full use of the code produced in the previous 5 tasks (Tasks 2-6). You should make use of the algorithm given in the explanation above when building the Huffman Tree for the supplied ‘ToCompress.txt' file. You should demonstrate to your tutor that your program has correctly built a Huffman Tree for this file.

Task 8

The final task is to create a recursive function that prints out the Huffman Codes using the Huffman Tree created in Task 7. Your function needs to recursively traverse the Huffman Tree whilst building the Huffman Codes as it goes. All Huffman Codes should be output to the Console Window together with the letter they represent, e.g. A: 001

NOTE: you will need to use Dynamic Memory Allocation for your Linked Lists and Huffman Tree. To avoid Memory Leak, you must ensure that all allocated memory is deallocated before the program exits. Extra credit will be given if this is correctly done, i.e. there should be no examples of Memory Leak.

Reference no: EM131917138

Questions Cloud

What is the change in temperature of the solution : The temperature of the resulting solution increases from 22.5 degrees celcius to 32.3 degrees celcius. What is the change in temperature of the solution?
What has become a key competitive : Internal innovation and dialog had always been a big part of Coloplast's way of doing business. This has over time resulted in a number of improvements.
Initiating precipitation of chloride : If a precipitate will not form, what chloride ion concentration will cause a precipitate of silver chloride to form?
What would the minimum penalty cost have to be : Suppose that the company wants to crash the project a total of 4 days. Based on the information provided in the table and in part c) and the results.
Write a function that load the contents of a given text file : Write a function that loads the contents of a given text file into memory. Once loaded into memory you should output the text to the Console Window.
Trade credit-nominal cost and effective cost : The Thompson Corporation projects an increase in sales from $1 million to $3 million, What is the effective, or equivalent, annual cost of the trade credit?
The income tax expense and the income tax currently payable : what are the income tax expense and the income tax currently payable if Abbey and Benjamin file a consolidated tax return as an affiliated group?
What is the concentration of f : What is the concentration of F- in the solution when the reaction is complete?
What is the best use for these common-size : What is the best use for these common-size? when would you use them?

Reviews

Write a Review

Computer Engineering Questions & Answers

  How will astronomy archives survive the data tsunami

Case Study: How Will Astronomy Archives Survive the Data Tsunami? Astronomers collect and generate petabytes of data

  Find out if the graph is connected

show both graphs, then I need to display which portions of the 2 graphs are "connected" or the same. The areas that are the same need to be put into a minimal spanning tree.

  What is the dissimilatries between rfp and rfq

What is the dissimilatries between an RFP and an RFQ? Are they different, or the same? How does RFI associated to them

  Explain why 802.11b is the first popular standard

the first widely popular standard and still by far most used by IT industry today.

  When does quicksort work best and when does it work worst

When does quicksort work best, and when does it work worst? Write a recursive procedure to implement the insertion sort algorithm.

  Questionin following case statement replaces the 14 7 3

questionin following case statement replaces the 14 7 3 with values that are pulled from a table known as

  What do you estimate is the bandwidth of a local loop

The telephone line that connects your house or business to the central office carries your conversation. What do you estimate is the bandwidth of a local loop?

  Validate and correct the errors in the web page file

While it may sound simple to validate your HTML code, it is often times difficult to understand the error messages validation services return. Using the W3C Markup Validation Service, validate and correct the errors in the web page file.

  Write a markup document to provide a form that collect names

Write a markup document to provide a form that collects names and telephone numbers. The phone numbers must be in the format ddd-ddddddd.

  Initialize the given software stack pointer

Write a PIC18F assembly program at address 0x150 to compare two strings of 10 ASCII characters. The first string is stored starting at 0x30.

  Compute the average for each of the usability questions

Compute the average for each of the usability questions in section 2 and present it in a results spreadsheet. Include a couple of analysis statements discussing

  Write an application that includes two additional methods

Write an application that includes two additional methods in addition to the Main( ) method. One method should return a string consisting of four or five lines.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd