Program for searching by indexing text files, Programming Languages

Assignment Help:

Write a program that facilitates searching by indexing a text file according to its words. In this task, you are given a large text file, sample.txt, and you need to index the words stored in it.

To do this, you will separate out the words in the text file and index them according to their frequencies. Your program shall count the unique words, record how often each one occurs, and store them in an appropriate Standard Template Library container. The words are to be normalized to lower case so that case sensitivity is not an issue. Your program will ignore the following:

  • Punctuation
  • Numerals (1, 2, etc.; 'one' and 'two' are to be treated as words)
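
The normalization and counting rules above can be sketched as follows. This is a minimal illustration, not a reference solution: the function name count_words is hypothetical, and it assumes that a word is a maximal run of ASCII letters, so digits and punctuation both act as separators. The assignment does not say how mixed tokens such as "abc123" or "don't" should be treated, so that choice is an assumption.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: read text from a stream and count lower-cased words.
// Assumption: a word is a maximal run of letters; digits and punctuation
// are skipped and also end the current word.
std::map<std::string, int> count_words(std::istream& in) {
    std::map<std::string, int> freq;  // std::map keeps its keys in ascending order
    std::string word;
    char ch;
    while (in.get(ch)) {
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc)) {
            // Normalize to lower case while building the word.
            word += static_cast<char>(std::tolower(uc));
        } else if (!word.empty()) {
            ++freq[word];  // a non-letter ends the current word
            word.clear();
        }
    }
    if (!word.empty()) {
        ++freq[word];  // flush the last word at end of input
    }
    return freq;
}
```

The function accepts any std::istream, so it can be called with a std::ifstream opened on sample.txt or with a std::istringstream for quick checks. std::map is one reasonable container choice here because it stores each word once and iterates in sorted order.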

Next, your program shall generate two output files, index.txt and common.txt. At the start of the program, you shall prompt the user to enter a threshold number. This number determines whether each unique word is stored in index.txt or common.txt.
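
One possible way to handle the prompt is sketched below. The function name read_threshold is hypothetical, and the rule that the threshold must be at least 1 is an assumption; the assignment only says that a threshold number is entered.

```cpp
#include <ios>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical helper: prompt until a usable threshold is read.
// Assumption: a valid threshold is an integer >= 1.
int read_threshold(std::istream& in, std::ostream& out) {
    int threshold = 0;
    out << "Enter threshold number: ";
    while (!(in >> threshold) || threshold < 1) {
        if (in.eof()) {
            throw std::runtime_error("no valid threshold supplied");
        }
        in.clear();  // reset the fail state left by bad input
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // drop the rest of the line
        out << "Enter threshold number: ";
    }
    return threshold;
}
```

In the real program this would be called as read_threshold(std::cin, std::cout). Passing the streams as parameters also lets the function be exercised with string streams.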

Unique words with a frequency greater than or equal to the threshold are to be stored in common.txt. Likewise, unique words with a frequency less than the threshold are to be stored in index.txt.
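
The threshold rule can be sketched like this. The alias WordCounts and the function split_by_threshold are hypothetical names introduced for illustration, and the sketch assumes the word frequencies are already held in a std::map.

```cpp
#include <map>
#include <string>
#include <utility>

using WordCounts = std::map<std::string, int>;  // hypothetical alias: word -> frequency

// Hypothetical helper: split the counts into two maps.
// first  = words destined for index.txt  (frequency <  threshold)
// second = words destined for common.txt (frequency >= threshold)
std::pair<WordCounts, WordCounts> split_by_threshold(const WordCounts& freq, int threshold) {
    WordCounts index;
    WordCounts common;
    for (const auto& entry : freq) {
        if (entry.second >= threshold) {
            common.insert(entry);
        } else {
            index.insert(entry);
        }
    }
    return {index, common};
}
```

Because both results are again std::map objects, each keeps its words in ascending order without a separate sort.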

As an illustration, suppose a text file, sample.txt, contains the following:

Give us a break!  It is a beautiful day.  We do not want to do programming today.  Do you want to go to the beach with us?

At program start:

Enter threshold number: 2

The above indicates that the user enters 2 as the threshold number. Your program shall generate the two output files with the following content (words sorted in ascending order):

index.txt

Total words: 15

beach           1
beautiful       1
break           1
day             1
give            1
go              1
is              1
it              1
not             1
programming     1
the             1
today           1
we              1
with            1
you             1

common.txt

Total words: 5

a               2
do              3
to              3
us              2
want            2
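
A file in the layout shown above could be produced along the following lines. The function name write_report is hypothetical, and the column width of 16 characters is an assumption, since the assignment does not specify the exact spacing between a word and its count.

```cpp
#include <iomanip>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write the "Total words" header followed by one
// "word count" line per entry. Iterating a std::map yields the words in
// ascending order. Assumption: words are left-aligned in a 16-character column.
void write_report(std::ostream& out, const std::map<std::string, int>& words) {
    out << "Total words: " << words.size() << "\n\n";
    for (const auto& entry : words) {
        out << std::left << std::setw(16) << entry.first << entry.second << "\n";
    }
}
```

To create the actual files, the function would be called once with a std::ofstream opened on index.txt and once with one opened on common.txt.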



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd