Implementing a relatively simple text-compression scheme

Assignment Help JAVA Programming
Reference no: EM13712367

You will be implementing a relatively simple text-compression scheme. your code should work as follows:

1. It should run from the command line, using the command:

java Compress <FILE_NAME>

where the <FILE_NAME> argument is the name of a text-file found in the same directory as the executable program. (This name is passed into your Java program as an argument, and will then be accessible from position 0 in the array args that is input to the main() method.

2. It should read through the text-file, word-by-word (using white-space to delineate words, as usual, so "words" may contain punctuation, etc.). For each word, it should count occurrences of any substring of 3 or more characters. For instance, if it reads in the word "googoo" it should count 2 occurrences of the substring "goo", 1 occurrence of "oog", and 1 occurrence each of "goog", "oogo", "ogoo", "googo", and "oogoo". Finally, it should count one occurrence of 6-letter substring, "googoo".

This counting should occur over the entire file, and it should be handle efficiently using a hash-map structure to map substrings to the number of times they occur. You can use a built-in Java structure here.

3. Once the counting is complete, the hash-map will contain a collection of substrings, each with a corresponding count value (1 or greater). You should then order these substrings:

- Create a new class that can store a substring and its count value.
- Make that class properly Comparable to other things of its type. In the compareTo ordering, each object should be ordered by its impact factor, which is the product of the substring's length and the number of times it occurs. For instance, if the substring "the" occurs 100 times, it has an impact factor of (3 × 100 = 300).

Two of your class objects should be ordered so that things with higher impact factor come before things with a lower impact factor. Thus, if "the" has impact factor 300, and "goo" has impact factor 100, then "the" comes before "goo" in comparison ordering.

- Once you have completed the class, place each of your substring and count value pairs into a priority queue, so that they will be in order, with highest impact factor at the front of the queue. Again, you can use a built-in Java structure here.

4. Now, you can generate a compressed encoding of the highest-impact substrings. For each of the first 52 such substrings (or al l of them, if your input doesn't contain 52 distinct substrings of length 3 or greater), you will encode each such string with a 2-letter coding, consisting of the plus sign '+', followed by a single character (in range a-z or A-Z).

For instance, if "the" is our highest-impact substring, then its encoding would be "+a".

Note 01: when generating codes, you should ignore all substrings that either contain, or are contained by, one you have already encoded. For instance, if you have already encoded "ther", then you should ignore any future substrings like "there", which contains it, or "the", which is contained by it. You should still attempt to generate 52 distinct codings, if that is possible after discarding overlapping substrings.

Note 02: for this to work, it assumes that the text-file does not actually include the special marker character, '+'; if this is not so, another special character would have to be used.

5. Your code should now read through the file again, writing out a compressed version to a second file. To do compression, read in each word again, and replace each substring in the word with its encoding, if such an encoding exists, and write the result.

For instance, if we read in the word "teethes", and we have an encoding of "+a" for "the", then we should re-write the word as "tee+as".

Note 01: if a word contains multiple possible substrings, then you should replace them all. When there are multiple choices, the replacement should happen by order of length of substrings, so that we get maximum compression of each word. For instance, if we have an encoding of "+a" for "the", "+b" for "theo", and "+c" for "log", then the word "theology" would first be encoded as "+blogy", since "theo" is the longest substring; it would then be encoded finally as "+b+cy", and written to the output file. If a word contains multiple possible substrings of the same length, remove them in the order given by Java String sorting (this means that every word has a uniquely defined coding).

Note 02: when writing to the output file, make sure that line-breaks and other white-space are preserved from the original file.

Note 03: the output file should have a name related, but not identical, to that of the original input. In particular, if the input is named "FILE.txt", then the compressed output file should be named "FILE_comp.txt" (where "FILE" can be any file-name).

6. The code above will somewhat compress a text-file, but we will not be able to de-compress it, unless we know the encoding used. Modify your code so that when it creates the output file, it first writes information about the compression into the first part of the file (in a format of your own choosing), and then follows it with the compressed text. Then, write a second program, which can also be called from the command line:

java Decompress <FILE_NAME>

When called, this should open the named file, if possible, and (assuming it is in proper format), de-compress it, producing a new version of the file that is identical to the original input to the compression program.

Reference no: EM13712367

Questions Cloud

Given the demand and cost estimates : Given the demand and cost estimates, what price should you change if you want to maximize your weekly profit?What output should you produce? What is your weekly profit?
Starting with the differential form of the cauchy equation : Starting with the differential form of the Cauchy equation, derive a differential equation for the kinetic energy, k = 1/(2*u^2) , of a moving fluid.
Steam steadily flows through turbine to produce mechanical : Steam steadily flows through a turbine to produce mechanical work. At the inlet, the pressure, temperature and velocity of the steam are 10 MPa, 450 °C and 110 m/s, respectively. The water exits this turbine at pressure of 10 kPa with the quality of ..
The wage-skills relationship in the labor market : 1. In the US, the wage-skills relationship in the labor market before government intervention is wus=100+0.4s, where s is the number of "efficiency units of skill" and w is the weekly wage. In Canada, the wage-skills relationship is WC=250 + 0...
Implementing a relatively simple text-compression scheme : You will be implementing a relatively simple text-compression scheme - It should run from the command line.
Explain the effect of fillers in altering polymer properties : Explain the effect of fillers in altering polymer properties? What are some of the main differences between thermoplastics and thermo sets?
Phil has two periods of work remaining prior to retirement : Phil has two periods of work remaining prior to retirement. Assume that Phil maximizes the present value of his expected lifetime earnings and his discount rate is 10 percent. He is currently employed in a firm that pays him the value of his ma..
Explain glass transition temperature : Explain glass transition temperature? Present some of the key properties and typical uses of Al2O3? What are main ingredients used in making Pig Iron and how is the oxygen removed.
The market demand for train service : (b) Suppose that the firm is a monopolist. Assuming the firm produces a positive level of output, calculate the output and price it sets. Explain why the profit-maximizing price is greater than the monopolist's marginal cost.

Reviews

Write a Review

JAVA Programming Questions & Answers

  What is server-side and client-side scripting?

1. What is Server-side and Client-side scripting? Explain the differences between server-side and client - side scripting languages. Please provide 3 advantages and disadvantages of each. Please provide applicable references in APA style

  A method that takes a two-dimensional array

A method that takes a two-dimensional array of int's as a parameter and searches the array for the second parameter, returning true if the second parameter matches any of the integers in the array, and false otherwise.

  Create your own online store web site selling products of

create your own online store web site selling products of your choice.create pages that allow you to search and buy

  Write the method called print triangle type.

Write the method called printTriangleType. This method accepts three integer arguments representing the lengths of the sides of a triangle and prints the type of triangle that these sides form. Here are some sample calls to printTriangleType

  Implement a shopping cart class with user interface

project will be to implement a shopping cart class with user interface (UI) that contains main() in Net Beans. The UI class will be used to perform user input/output and to invoke the appropriate methods of shopping cart class. When your program star..

  The commission employee inherits

The Commission Employee inherits from the Employee class. A Commission Employee contains a commission rate and a sales amount variable which are used as part of the pay calculation. An explicit value constructor should be provided to set all 3 val..

  Utilizes a good design process

Analyze, design, and document a simple program that utilizes a good design process and incorporates sequential, selection and repetitive programming statements as well as at least one function call and the use of at least one array.

  Write an application that enables users to enter student ID

Write an application that enables users to enter student ID and three exam scores. Provide a method to compute and returnthe overall exam average.  Provide another method that prints all scores and the average value formatted with no digits to the ri..

  Define java implementation to implement the requirements

Produce a Java implementation to implement the requirements of Question 1, that is, to perform the 32-bits two complement and 32-bit floating-point conversion of a given number.

  Write a method that returns the last digit of an integer

Write a method named lastDigit that returns the last digit of an integer - It should work for negative numbers as well.

  Program to track hourly employee arrival and departure time

THE JAVA SOURCE CODEA company hires you to write a program to track hourly employee arrival and departure times from work. In essence, you are tasked to make an online time clock. The time clock shall keep a history of an employee’s hours for a two-w..

  Write a method, insertat that takes four parameters

Write a method, insertAt that takes four parameters

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd