Calculate the ranking score for each of the ten documents

Reference no: EM13841747

1. Vector Model

This question requires you to use the following data. Assume a repository of 10 documents over eight key terms. Table 1 gives the document-term table, which shows the raw frequencies with which the eight key terms appear in each of the 10 documents, as well as the TF values for a query document.

Using the information from Table 1, which documents would be returned by the following queries:

a) Term2 AND Term7

b) Term4 OR Term2

c) (Term2 OR Term7) AND (NOT Term7)

Task

Table 1: A2: Document-Term and Query-Term Table


         Term 1  Term 2  Term 3  Term 4  Term 5  Term 6  Term 7  Term 8
Doc 1         4       8       9       0      10       8       0       9
Doc 2         1       5       0       0      12       0       1       3
Doc 3         0       3       0       0       0       4       2       0
Doc 4         1       0       4       3       9       0       0       0
Doc 5         0       4       0       0       0       5       1       0
Doc 6         1       2       2       0       3       1       0       1
Doc 7         0       5       3       4       0       0       4       2
Doc 8         0       7       0       3       0       0       3       3
Doc 9         0       5       0       0       0       4       1       2
Doc 10        0       3       4       0       0       2       4       0
Query         2       3       1       2       2       0       1       0

Is it possible to rank the documents returned in (a) to (c)? If it is possible, then supply the rankings in each case. If it is not possible, then state why.
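The three Boolean queries can be evaluated mechanically with set operations. The following is a minimal Python sketch, assuming that a term counts as present in a document whenever its raw frequency in Table 1 is non-zero; the names `TF`, `docs_with` and `all_docs` are illustrative and not part of the assignment. Each query evaluates to a plain, unordered set of document numbers.

```python
# Raw term frequencies from Table 1 (rows: Doc 1..Doc 10, columns: Term 1..Term 8).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]


def docs_with(term):
    """Set of 1-based document numbers in which `term` (1-based) occurs.

    Assumption: a non-zero raw frequency means the term is present.
    """
    return {d + 1 for d, row in enumerate(TF) if row[term - 1] > 0}


all_docs = set(range(1, len(TF) + 1))

# a) Term2 AND Term7 -> set intersection
query_a = docs_with(2) & docs_with(7)
# b) Term4 OR Term2 -> set union
query_b = docs_with(4) | docs_with(2)
# c) (Term2 OR Term7) AND (NOT Term7) -> union, then remove Term7 documents
query_c = (docs_with(2) | docs_with(7)) & (all_docs - docs_with(7))

for name, result in [("a", query_a), ("b", query_b), ("c", query_c)]:
    print(name, sorted(result))
```

Sets are used deliberately here: AND, OR and NOT map directly onto intersection, union and complement, and the result carries no score for any document.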

Exercise 1: Answer the following questions.

a) Using the information from Table 1, calculate the ranking score for each of the ten documents based on each of the following query-document similarity measures:

i) dot product, using TF weights for both the document and query vectors

ii) cosine coefficient, using TF weights for both the document and query vectors

b) Compare the rankings that you obtained using the two similarity measures. If there are differences between the rankings, then discuss why you think these differences occurred.
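As a way of checking hand calculations for Exercise 1, the two similarity measures can be sketched in Python as below. This is an illustrative sketch only: the helper names `dot`, `cosine` and `ranking` are assumptions, and ties are broken by table order, which the assignment does not specify.

```python
import math

# Raw term frequencies from Table 1 (rows: Doc 1..Doc 10, columns: Term 1..Term 8).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]  # the Query row of Table 1


def dot(x, y):
    """Dot product: sum of the element-wise products of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Cosine coefficient: dot product divided by the product of the vector lengths."""
    norm = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    return dot(x, y) / norm if norm else 0.0


def ranking(scores):
    """1-based document numbers by descending score (ties keep table order)."""
    return sorted(range(1, len(scores) + 1), key=lambda d: -scores[d - 1])


dot_scores = [dot(doc, QUERY) for doc in TF]
cos_scores = [cosine(doc, QUERY) for doc in TF]

for d, (s_dot, s_cos) in enumerate(zip(dot_scores, cos_scores), start=1):
    print(f"Doc {d:2d}  dot={s_dot:3d}  cosine={s_cos:.4f}")
print("dot ranking:   ", ranking(dot_scores))
print("cosine ranking:", ranking(cos_scores))
```

The only difference between the two functions is the division by the vector lengths, so comparing the two printed rankings isolates the effect of length normalisation.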

Exercise 2: Answer the following questions.

a) Using the information in Table 1, calculate the idf (inverse document frequency) weight vector. Make sure you show how your calculation was performed.

b) Construct a table similar to Table 1, but, instead of raw term frequencies, show the tf-idf weights.

c) Using tf weights for the query vector and tf-idf weights for the document vectors, compute the ranking scores using the cosine coefficient as the similarity measure. Show how your calculations were performed for the first document only.

d) How does this ranking compare with the ranking obtained using the cosine similarity measure in Exercise 1? If there are differences between the rankings, then discuss why you think these differences occurred.
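Parts (a) and (b) of Exercise 2 can be sketched as follows. The assignment does not say which idf formula to use, so this sketch assumes the common form idf_t = log10(N / df_t), where N is the number of documents and df_t is the number of documents containing term t; other variants (natural logarithm, smoothed forms) would give different numbers. The names `df`, `idf` and `tfidf` are illustrative.

```python
import math

# Raw term frequencies from Table 1 (rows: Doc 1..Doc 10, columns: Term 1..Term 8).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]

N = len(TF)            # number of documents in the repository
num_terms = len(TF[0])

# Document frequency: in how many documents does each term occur at least once?
df = [sum(1 for row in TF if row[t] > 0) for t in range(num_terms)]

# Assumed idf variant: log10(N / df). Every term in Table 1 occurs in at
# least one document, so the division is safe here.
idf = [math.log10(N / d) for d in df]

# tf-idf weight of each cell: raw frequency times the idf of its term.
tfidf = [[tf * w for tf, w in zip(row, idf)] for row in TF]

print("df :", df)
print("idf:", [round(w, 4) for w in idf])
for d, row in enumerate(tfidf, start=1):
    print(f"Doc {d:2d}", [round(w, 3) for w in row])
```

A term that appears in almost every document (such as Term 2) receives an idf close to zero under this formula, so its large raw frequencies contribute little to the tf-idf table.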

Exercise 3: Answer the following questions.

a) This time, using tf-idf weights for both the query and document vectors, compute the ranking scores using the cosine coefficient as the similarity measure. Show how your calculations were performed for the first document only.

b) How does this ranking compare with the ranking obtained in Exercise 2? If there are differences between the rankings, then discuss why you think these differences occurred.
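Exercise 2(c) and Exercise 3(a) differ only in how the query vector is weighted, so both can be sketched with one function. The sketch below again assumes idf_t = log10(N / df_t), which the assignment does not prescribe; the function `scores` and its `query_uses_idf` flag are hypothetical names introduced for illustration.

```python
import math

# Raw term frequencies from Table 1 (rows: Doc 1..Doc 10, columns: Term 1..Term 8).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]  # the Query row of Table 1

# Assumed idf variant: log10(N / df).
N = len(TF)
DF = [sum(1 for row in TF if row[t] > 0) for t in range(len(QUERY))]
IDF = [math.log10(N / d) for d in DF]
TFIDF = [[tf * w for tf, w in zip(row, IDF)] for row in TF]


def cosine(x, y):
    """Cosine coefficient between two equal-length vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0


def scores(query_uses_idf):
    """Cosine score of every tf-idf document vector against the query.

    query_uses_idf=False -> query keeps raw tf weights (Exercise 2c)
    query_uses_idf=True  -> query is tf-idf weighted too (Exercise 3a)
    """
    q = [tf * w for tf, w in zip(QUERY, IDF)] if query_uses_idf else QUERY
    return [cosine(doc, q) for doc in TFIDF]


tf_query_scores = scores(query_uses_idf=False)
tfidf_query_scores = scores(query_uses_idf=True)

for d in range(N):
    print(f"Doc {d + 1:2d}  tf query={tf_query_scores[d]:.4f}  "
          f"tf-idf query={tfidf_query_scores[d]:.4f}")
```

Keeping the document vectors fixed and switching only the query weighting makes it easier to attribute any change in ranking to the idf weighting of the query terms.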

Exercise 4: This time, use tf weights for the query vector, tf-idf weights for the document vectors, and the Dice coefficient rather than the Cosine coefficient as the similarity measure.

a) Compute the ranking scores for all documents. Show how your calculations were performed for the first document only.

b) How does the ranking compare with the ranking obtained in Exercise 1? If there are differences between the rankings, then discuss why you think these differences occurred.
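For Exercise 4, a minimal sketch of the Dice coefficient is given below. It assumes the weighted-vector form commonly used in the vector model, Dice(q, d) = 2 Σ q_i d_i / (Σ q_i² + Σ d_i²), and the same idf_t = log10(N / df_t) assumption as before; if your course notes define either quantity differently, the numbers will differ. The function name `dice` is illustrative.

```python
import math

# Raw term frequencies from Table 1 (rows: Doc 1..Doc 10, columns: Term 1..Term 8).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]  # raw tf weights for the query

# Assumed idf variant: log10(N / df); document vectors are tf-idf weighted.
N = len(TF)
DF = [sum(1 for row in TF if row[t] > 0) for t in range(len(QUERY))]
IDF = [math.log10(N / d) for d in DF]
TFIDF = [[tf * w for tf, w in zip(row, IDF)] for row in TF]


def dice(x, y):
    """Dice coefficient (assumed weighted form): 2*sum(x*y) / (sum(x^2) + sum(y^2))."""
    num = 2 * sum(a * b for a, b in zip(x, y))
    den = sum(a * a for a in x) + sum(b * b for b in y)
    return num / den if den else 0.0


dice_scores = [dice(doc, QUERY) for doc in TFIDF]

# 1-based document numbers ordered by descending Dice score.
dice_ranking = sorted(range(1, N + 1), key=lambda d: -dice_scores[d - 1])

for d, s in enumerate(dice_scores, start=1):
    print(f"Doc {d:2d}  dice={s:.4f}")
print("ranking:", dice_ranking)
```

Unlike the cosine coefficient, which divides by the product of the two vector lengths, this form divides by the sum of the squared lengths, so a document whose length is very different from the query's is penalised differently.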

2.2. IR Evaluation

Exercise 5: The following data displays retrieval results for two different algorithms (Algorithm 1 and Algorithm 2) in response to two distinct queries (Query 1 and Query 2). An expert has manually labelled each of the documents as being either relevant or not relevant to the queries.

Algorithm 1 returns the following results:

Query 1: d4, d15, d1, d3, d8, d76, d2, d33, d30, d5, d11, d29, d66, d10
Query 2: d9, d91, d2, d87, d13, d52, d92, d16, d17, d22, d20, d71, d48, d60, d56

Algorithm 2 returns the following results:

Query 1: d8, d29, d6, d5, d15, d17, d20, d65, d2, d33, d44, d41, d7, d77, d13, d14, d90, d80, d70, d4
Query 2: d3, d87, d2, d28, d15, d14, d12, d10, d41, d11, d85, d89, d1, d49, d52, d76, d55, d9, d91, d99, d30, d17, d13, d26, d94, d18, d86, d72, d48, d8, d93, d42, d79, d43, d88, d7, d98, d51, d50, d6

The documents known to be relevant to each query are as follows:

Query 1: d2, d4, d7, d15, d29
Query 2: d1, d2, d3, d7, d8, d9, d11, d12, d13, d15, d16, d20

a) For Algorithm 1, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 1 (all three curves should be on a single chart).

b) For Algorithm 2, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 2 (all three curves should be on a single chart, but a separate chart from that used in part (a)).

c) Plot the averages for Algorithm 1 and Algorithm 2 on a separate chart, and compare the algorithms in terms of precision and recall. Do you think one of the algorithms is superior? Why?
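The values needed for the charts in Exercise 5 can be computed with a short Python sketch such as the one below. It assumes the usual interpolation rule, in which the interpolated precision at recall level r is the highest precision observed at any recall greater than or equal to r, and zero if that recall is never reached. Only the Algorithm 1 data is included to keep the example short; the function name `eleven_point` is illustrative, and the same function applies unchanged to the Algorithm 2 rankings.

```python
def eleven_point(ranking, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one ranked list."""
    total = len(relevant)
    observed = []  # (relevant documents seen so far, precision at that rank)
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            observed.append((hits, hits / rank))
    curve = []
    for level in range(11):  # level / 10 is the standard recall level
        # Keep points whose recall hits/total is >= level/10; the comparison
        # is done in integers to avoid floating-point rounding problems.
        candidates = [p for h, p in observed if 10 * h >= level * total]
        curve.append(max(candidates, default=0.0))
    return curve


# Algorithm 1 rankings and the relevance judgements, as document numbers.
alg1_q1 = [4, 15, 1, 3, 8, 76, 2, 33, 30, 5, 11, 29, 66, 10]
alg1_q2 = [9, 91, 2, 87, 13, 52, 92, 16, 17, 22, 20, 71, 48, 60, 56]
relevant_q1 = {2, 4, 7, 15, 29}
relevant_q2 = {1, 2, 3, 7, 8, 9, 11, 12, 13, 15, 16, 20}

q1_curve = eleven_point(alg1_q1, relevant_q1)
q2_curve = eleven_point(alg1_q2, relevant_q2)

# Average curve: mean of the two queries at each standard recall level.
avg_curve = [(a + b) / 2 for a, b in zip(q1_curve, q2_curve)]

for level, (p1, p2, pa) in enumerate(zip(q1_curve, q2_curve, avg_curve)):
    print(f"recall {level / 10:.1f}  Q1={p1:.3f}  Q2={p2:.3f}  avg={pa:.3f}")
```

The three printed columns correspond to the three curves requested for a single chart; any plotting tool can then draw precision against the eleven recall levels.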
