Reference no: EM133680300
Software Engineering
Programming Assignment - Part 1
Assignment Brief
In many situations, we may need to find a word based on one or more letters in it. For example, when making or completing a crossword you may want to find a word that has 4 letters, starts with J and ends with A. In this assignment, the program you will write will create a lexicon of words from various text files.
A word is a sequence of letters. Words will be treated in a case-insensitive manner. A simple option is to store all the words in lower case. Initially, a word is obtained as a sequence of characters separated by whitespace characters. Then numbers and punctuation characters, if any, must be removed. More specifically:
If the word "and" occurs twice in the file, it is only stored once in the lexicon.
If the word "you" also occurs as "YOU", only one version is stored.
Punctuation, such as commas, quotes, hyphens, full stops and question marks, are removed.
Numeric characters are removed.
If the word "don't" or the word "second-hand" appears in the input texts, they will be stored without punctuation (i.e. "dont" and "secondhand" respectively).
The 'word' "1", for example, will be ignored as it contains all numeric characters.
For each word, we also keep the following information:
The frequency: How many times the word appears in the input files.
The list of neighbours: A neighbour of a word w is a word that is of the same length and differs from w by only one letter. For example, if the word w refers to "cat", then:
"cathy" is not w's neighbour, for they are not of the same length.
"man" is not w's neighbour, for they differ by more than one letter.
"can" is a neighbour of w
"fat" is a neighbour of w
"cot" is a neighbour of w.
To simplify the design of this program, this assignment is broken up into a series of tasks for you to complete. By the end of this assignment, you will build up code to:
Load data from two text files and construct a lexicon
Sort the lexicon (implementing two different sorting algorithms)
Write the sorted lexicon data to a text file
Once you have completed this assignment, you will also need to write a brief report describing the sorting algorithms you selected.
Task 1 - Creating a Word Class
Instructions
From the assignment brief, along with the spelling of the word, a word in your lexicon needs to store the following information:
The frequency: How many times the word appears in the input files.
The list of neighbours: A neighbour of a word w is a word that is of the same length and differs from w by only one letter.
The best way this can be implemented is as a Word class, where you can populate your lexicon with Word objects.
Your task in this section is to create a class Word that is used to represent a Word in your lexicon.
When you do this, you will need to:
Think about what instance variables should be defined (and how they should be initialized)
Think about what methods you need to implement for this class
In a future task, you will need to sort your lexicon full of Word objects. In the labs you saw a similar example where you needed to sort a collection of Person objects. It may be useful to refer back to this to see what methods were required.
You may find that once you attempt the following tasks, you need to come back to this class and add additional methods.
Task 2 - Constructing a Lexicon
Instructions
This task involves creating a lexicon that can store your collection of Word objects, and populating it with words loaded from two text files (in1.txt and in2.txt).
More details on each of these steps is provided below, however in short, for this task you should:
Create and initialize a variable that represents your lexicon.
Define a function read_data() that can read words from a text file into your lexicon (Keeping in mind the constraints from the assignment brief).
Call the read_data() function to load all words from the text files in1.txt and in2.txt into your lexicon.
Step 1:
It is up to you to decide on how you want to represent your lexicon. It may be useful to refer back to the labs to see how we represented collections of Person objects for the sorting algorithms that you implemented.
Step 2:
In this step you should define a function read_data() that reads words from a text file into your lexicon.
This function should take two arguments, the lexicon that words should be loaded into, and the name of the file to load.
From the assignment brief, the criteria for a word is:
A word is obtained as a sequence of characters separated by whitespace
Words should be stored in lowercase
Any numbers or punctuation characters should be removed
Unique words should only be stored once in the lexicon (You can use the frequency of the word to show that a word appears twice in the files)
Step 3.
In this step, you should make two calls to your read_data() function to load both in1.txt and in2.txt into your lexicon. When calling these functions, you can hardcode the names of these files.
Task 3 - Defining Your Sorting Algorithms
Instructions
In this task, you should define two sorting algorithms that you can use to sort your lexicon of words. In the next task you will write a short program to use one of the sorting algorithms to sort your lexicon. Following sorting, words in your lexicon should be sorted in alphabetical order.
It is entirely up to you which two sorting algorithms you include, as long as they are in the subject and different regarding time complexity.
You should write one function per-sorting algorithm. Each of these functions should take a single parameter, the lexicon to be sorted. They should sort the lexicon in-place, so they do not need to return anything.
Task 4 - Sorting Your Lexicon
Instructions
Now you have two sorting algorithms defined, in this task you should create a small program that asks the user which sorting algorithm they would like to use, then sort the lexicon with that sorting algorithm.
You should ensure that you test both sorting algorithms to see they give you the same result. When you do this, ensure that you reinitialize your lexicon (Run Task 2 again) to ensure that your lexicon is not already sorted.
Task 5 - Populating the Neighbours
Instructions
Each word in your lexicon should also store a list of neighbour words.
From the assignment brief, a neighbour of a word w is a word that:
Is of the same length as w and,
Differs from w by only one letter.
For example, if the word w refers to "cat", then:
"cathy" is not w's neighbour, for they are not of the same length.
"man" is not w's neighbour, for they differ by more than one letter.
"can" is a neighbour of w
"fat" is a neighbour of w
"cot" is a neighbour of w
In this task, you should define a function add_neighbours() that takes one argument, your lexicon, and for every word in your lexicon, populates the list of neighbours that meet the criteria above.
When adding neighbours, you should also ensure that the list of neighbours for each word is in sorted order.
REMINDER: You are NOT allowed to use any of the builtin sorting functions/methods in Python (e.g. sorted() or .sort()).
Once you have written this function, call it on your lexicon to populate the neighbours for each word in your lexicon.
Task 6 - Writing Your Lexicon to File
Instructions
In this task you will write all words from your lexicon to the text file out.txt.
To do this you should define a function write_data() that takes two arguments, the lexicon that stores all words, and the name of the file to write to. This function should then write all words in the lexicon to file.
When writing to file:
The words must be written in sorted alphabetical order.
The information on each word, which is also written to the text file, includes the frequency and the list of its neighbours.
The list of neighbours must also be in alphabetical order.
To output each word, first the spelling of the word is displayed, followed by the frequency, and then followed by the word's neighbours (in alphabetical order, inside a pair of square brackets).
Steps
Define the function write_data() that takes two arguments, the lexicon that stores all words, and the name of the file to write to. This function should then write all words in the lexicon to file following the guidelines above.
Call the write_data() function to write your lexicon to the file out.txt. You can hardcode the filename when calling the function.
Open the out.txt file (You can find it in the Files tab) to verify that the output is as described above.
Report
Once you have finished the code for this assignment, you must write a report that describes the sorting algorithms you used.
In this report, you must:
Describe which two sorting algorithms you selected, and the reason for your choice
Describe the difference between the two algorithms in terms of time complexity (i.e. Big-O) in the best, worst and average cases
Describe the difference of the two algorithms regarding time usage (i.e. how many seconds) for sorting the list of Word objects
Please submit your report as a .pdf file.