What changes would you have to make to your design

Assignment Help JAVA Programming
Reference no: EM131940988

Assignment

Implement the index functions. These include adding a file to the index, and removing a file from the index, and reading and writing the index from/to a file. (Updating the index when a file has been changed, can then be done by removing and then re-adding a file.) Other operations include searching the index for a given word, and returning a Set of pairs (document ID and position) for that word.

Finally, you will have to implement the Boolean search functions of the main user interface. (This is complex enough, that it should have been another project!) I suggest you start with an "OR" search, then worry about implementing the "AND" and "PHRASE" search functions.

When building the index, keep in mind you will need to define what you mean by "word". One possibility is to strip out any non-digits or letters, and convert the result to all lowercase, both when you build the inverted index and when you read the search terms entered by the user

Implementing Boolean Search:

The exact method depends in part on how you implement the inverted index. In the suggested implementation (a Map with words as the keys, and a List or Set of (document ID, position) pairs as the values), you could implement the Boolean searches using algorithms similar to the following (you can come up with your own if you wish):

OR Search

This is the easiest one to implement. The general idea is to start with an empty Set of matching files. Then add to that Set, the files containing each search term; Just search the Map for that word, and add each document found (if any). The result is the OR search results, the files that contain any word in the search list. (If user inputs no search words, say " ,.", then no files are considered as matching.)

AND Search

This is done the opposite way from an OR search, and is only a little harder to implement. The idea is to start with a set of all files in the index. Then for each search term, for each file in the Set, make sure that file is contained in the index for that search term. Remove any files from the set that don't contain that word. The resulting final set is the documents matching all search terms. (If user inputs no search words, say " ,.", then all files are considered as matching. If that isn't the behavior you want, you need to treat that as a special case.)

PHRASE Search

This is the hardest search to implement. Unlike the OR and the AND searches, with PHRASE searching, the position of the search terms in the files matters. The algorithm I came up with is:

Create an initially empty Set of Pair objects.

Add to the set the Pair objects for the files that contain the first word of the phrase. This is the easy part: Just lookup that word in the Map, and add all Pair objects found to a set.

The Set now contains Pair objects for just the files that might contain the phrase. Next, loop over the remaining words of the phrase, removing any Pairs from the set that are no longer possible phrase continuations. (Actually, I just build a new Set.)

For each remaining word in the phrase:

Create a new, empty set of Pairs.

For each Pair in the previous set, see if the word appears in the same file, but in the next position. If so, add the Pair object for the word to the new set.

An example may help clarify this. Suppose the search phrase is "big top now". The set initially contains all the Pair objects for the word "big". Let's say for example, that set looks like:

(file1,position7), (file1,position22), (file3,position4)

For each Pair object in that set, you need to see if "top" is in that same file, but the next position. If so, you add the Pair object for that to the new Set. The (inner) loop for this example checks each of the following:

Is a (file1,position8) Pair object in the Map for the word "top"?

Is a (file1,position23) Pair object in the Map for the word "top"?

Is a (file3,position5) Pair object in the Map for the word "top"?

If the answer is "yes", then add that Pair object to the new set. When this loop ends, the new set will contain the Pair objects for the phrase "big top" (pointing to the position of the word "top").

For example, suppose "top" is only found in (file1,position8) and (file3,position5). You replace the first set with this new set:

(file1,position8), (file3,position5)

Repeat for the next word in the phrase, using the set built in the previous loop.

Continue until the set is empty (so phrase not found), or until the last word of the phrase has been processed. The Pair objects remaining in the final set are the ones that contain the phrase; the position will be that of the last word of the phrase. (We only need to display the file name; in this project, the position of the phrase doesn't matter.)

In this part, you must implement the remaining operations of your search engine application: the index operations, and the searching.

Hints:

Keep your code as simple as possible; worry about clever extras in version 2. (You can always add features later, time permitting.) If you start with a complex, hard-to-implement design, you may (will!) run out of time.

The inverted index is naturally a Map, from words (the keys) to a Set of objects (the values). Each of the objects represents a document and a location within that document, where the word was found. I called these objects Pairs, since they are a pair of numbers, but you can use any name for your classes. Note, you will need to be able to go from a document number to a file name, when you display the search results

1 In the event your search turns up multiple "hits", how could you order the results to have the most relevant results listed first? (You don't have to implement this, just think about how it could be done with your design.)

2 What changes would you have to make to your design, to ignore the 30 most common words? (Real search engines do this so words such as "the", "and", "a", and so on don't waste memory or time. This is called a stop list.) Does your solution correctly handle the case when the search terms were all such common words?

3 Suppose you wanted to scale up your search engine, so that the full index doesn't fit completely in memory. How might that be done?

Reference no: EM131940988

Questions Cloud

Find the unknown concentration for the cell : Find the unknown concentration for the following cell. Pb(s)| Pb2+(aq, ?)|| Pb2+(aq, 0.1 M)| Pb(s) E = 0.053 V Answer in units of M.
Error in endpoint determination : how much would you be in error in your endpoint determination? (Report your error in ml of HNO3). The pKa of methyl orange is 4.10
Identify the professional membership organization : Describe some of the benefits of becoming a member of the organization at the professional level and/or the student level.
How does a gc-ms instrument operate : How does a GC-MS instrument operate? The individual steps that go through the machine to get the results given, and how to interpret the given numbers.
What changes would you have to make to your design : What changes would you have to make to your design, to ignore the 30 most common words? Is a (file1,position23) Pair object in the Map for the word top?
Determine the ideal number of clusters : Determine the ideal number of clusters. Using a standard distance formula measure the distance from each data point to each center point.
How the distinction relates to the scenario : Be specific and explain how this distinction relates to the scenario you posted. Also explain how prejudice and bias might create barriers to fulfilling.
Charge balance equations for solution of ammonium acetate : Write the mass balance (for ammonium acetate) and charge balance equations for a solution of ammonium acetate (NH4+CH3COO-).
Identify anapplication life cycle management product : Using Google or another search engine, identify anApplication Life Cycle Management product which could meet the needs of Sifers-Grayson.

Reviews

Write a Review

JAVA Programming Questions & Answers

  Recursive factorial program

Write a class Array that encapsulates an array and provides bounds-checked access. Create a recursive factorial program that prompts the user for an integer N and writes out a series of equations representing the calculation of N!.

  Hunt the wumpus game

Reprot on Hunt the Wumpus Game has Source Code listing, screen captures and UML design here and also, may include Javadoc source here.

  Create a gui interface

Create GUI Interface in java programing with these function: Sort by last name and print all employees info, Sort by job title and print all employees info, Sort by weekly salary and print all employees info, search by job title and print that emp..

  Plot pois on a graph

Write a JAVA program that would get the locations of all the POIs from the file and plot them on a map.

  Write a university grading system in java

University grading system maintains number of tables to store, retrieve and manipulate student marks. Write a JAVA program that would simulate a number of cars.

  Wolves and sheep: design a game

This project is designed a game in java. you choose whether you'd like to write a wolf or a sheep agent. Then, you are assigned to either a "sheep" or a "wolf" team.

  Build a graphical user interface for displaying the image

Build a graphical user interface for displaying the image groups (= cluster) in JMJRST. Design and implement using a Swing interface.

  Determine the day of the week for new year''s day

This assignment contains a java project. Project evaluates the day of the week for New Year's Day.

  Write a java windowed application

Write a Java windowed application to do online quiz on general knowledge and the application also displays the quiz result.

  Input pairs of natural numbers

Java program to input pairs of natural numbers.

  Create classes implement java interface

Interface that contains a generic type. Create two classes that implement this interface.

  Java class, array, link list , generic class

These 14 questions covers java class, Array, link list , generic class.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd