Produce a driver program and the information-retrieval

Assignment Help Data Structure & Algorithms
Reference no: EM13944085

As different authors tend to use different vocabularies and to use common words with differing frequencies. Given an essay or other text, it is interesting to find what distinct words are used and how many times each is used.

I'm trying to produce a driver program and the information-retrieval package using ordinary binary search trees.

I really want to complete this program using the outline I'm presenting you here:

1) Create the data structure (binary search tree).

2) Ask the user for the name of a text file and open it to read.

3) Read the file, split it apart into individual words, and insert the words into the data structure. With each word will be kept a frequency count (how many times the word appears in the input), and when duplicate words are encountered, the frequency count will be increased. The same word will not be inserted twice in the tree.

4) Print the number of comparisons done and the CPU time used in part 3. (By CPU time, I mean the amount of time for which the CPU was used for processing the comparisons (reading the file, splitting & inserting words)).

5) That I can, If I wish, print out all the words in the data structure, in alphabetical order, with their frequency counts.

6) Put everything in parts 2-5 into a do . . . while loop that will run as many times as the user wishes. Thus the user can build the data structure with more than one file if desired. By reading the same file twice, the user can compare time for retrieval with the time for the original insertion. (Let me clarify here, because parts 2 to 5 will be in a "do while" loop, then I can compare time for retrieval with the time for the original insertion).

Some additional specifications I was trying to implement to the program:

* I want the input to the driver will to be a file. The program will be executed with several different files; the name of the file to be used should be requested from the user while the program is running.

* A word is defined as a sequence of letters, together with apostrophes (') and hyphens (-), provided that the apostrophe or hyphen is both immediately preceded and followed by a letter. Uppercase and lowercase letters should be regarded as the same (by translating all letters into either uppercase or lowercase, as you prefer). A word is to be truncated to its first 20 characters (that is, only 20 characters are to be stored in the data structure) but words longer than 20 characters may appear in the text. Non-alphabetic characters (such as digits, blanks, punctuation marks, control characters) may appear in the text file. The appearance of any of these terminates a word, and the next word begins only when a letter appears.

* The program should not allow to be changed at all when I change implementation of data structures later.

Final specifications for the functions to I want implemented first with binary
search trees:
=========================================================
void update(const String &word,
Search_tree< Record > &structure, int &num_comps);

postcondition: If word was not already present in structure, then word has been inserted into structure and its frequency count is 1. If word was already present in structure, then its frequency count has been increased by 1. The variable parameter num_comps is set to the number of comparisons of words done.
=========================================================

void print(const Search_tree< Record > &structure);

postcondition: All words in structure are printed at the terminal in alphabetical order together with their frequency counts

Attachment:- Binary Search Tree.zip

Reference no: EM13944085

Questions Cloud

Method of production results in optimal output : Discussion Post 4 :Honda uses flexible plans in the manufacturing of its cars.  Discuss whether this method of production results in optimal output.For further information, read The Wall Street Journal, September 23, 2008, p. B1.250 words in length, ..
When you try to close the sale : When you try to close the sale, what is your ultimate goal?  What is a market segment?
What was real gdp for 2014 : What was Real GDP for 2014? What does GDP tell us? How did GDP change from 2013? What caused these changes?
Combination of humans and equipment interacting : Human-Machine systems defined as a combination of humans and equipment interacting to achieve some desired result. Types of human-machine systems are Manual systems, Mechanical systems, and automated systems. **Systems Components are The human, Th..
Produce a driver program and the information-retrieval : All words in structure are printed at the terminal in alphabetical order together with their frequency counts
A wide assortment of business experts predicted : In the late 1990s, a wide assortment of business experts predicted that e-business was completely changing the nature of business transactions. This was reflected in the astronomical prices paid for the stocks of dot-com companies that were just begi..
Compare different components of the business perspective : Compare and contrast the different components of the business perspective as a collective whole vs. individual parts. Defend or criticize the required and/or promoted social responsibility of companies
The principle of lime soda process : What is the principle of lime Soda process? Give their functions with chemical reactions. 2. Write notes on reverse osmosis and ultra filtration.
Briefly explain the concept of supply chain management : Briefly explain the concept of supply chain management.

Reviews

Write a Review

Data Structure & Algorithms Questions & Answers

  Convert the following expression in postfix

Convert the following expression in postfix (reverse Polish notation). Remember the rules of precedence for arithmetic operators. To get full credit, you need to show all work done. i.e. sample snapshot of the stack

  Give a worst-case algorithm

The input is an N by N matrix of numbers that is already in memory. Each individual row is increasing from left to right. Each individual column is increasing from top to bottom.

  Write a program that read up to ten letters into an array

Write a program that will read up to 10 letters (characters) into an array and write the letters back to the screen in the reverse order.

  Describe how algorithm works for adjacency matrix

All vertices according to their type (head, neck, shoulder, hip, front foot or rear foot). Note that as adjacency matrix has n2 entries, you can't scan entire matrix in O(n) time. Describe how algorithm works.

  Propose an efficient data structure

Propose an efficient data structure that may hold the tour operator's data using a normalization process. Describe each step of the process that will enable you to have a 2nd Normal Form data structure.

  Describe an algorithm that takes as input a list of n

Describe an algorithm that takes as input a list of n distinct integers and finds the location of the largest even integer in the list or returns 0 if there are no even integers in the list.

  Write algorithm to find median value using queries

As queries are expensive, you would like to calculate median using as few queries as possible. Provide the algorithm which finds median value using at most O(log n) queries.

  Implement iterative version of algorithm heapify

Using any programming language to implement iterative version of algorithm HEAPIFY. Show your algorithm by running it on the array that contain your name characters.

  Prove no asynchronous t-byzantine-robust broadcast exists

Prove that no asynchronous t-Byzantine-robust broadcast algorithm exists for t=N/3. Prove that during the execution of Algorithm 14 .6 at most N(3N + 1) messages are sent by correct processes.

  Developing a new customer order entry system

The system development team at Wilson Corporation is working on developing a new consumer order entry system. In the process on designing the new system,

  Part 1 - report write a 2000-word report that describes a

part 1 - report write a 2000-word report that describes a suitable methodology from the literature for the purpose of

  What is the minimum number of attendants

A nursing home employs attendants who are needed around the clock. Each attendant is paid the same, regardless of when his or her shift begins. Each shift is 8 consecutive hours.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd