Compute the ranking scores for all documents

Reference no: EM13815737

Task Description

Vector Model

This question requires you to use the following data. Assume a repository of 10 documents over eight key terms. Table 1 gives the document-term table, which shows the raw frequencies with which the eight key terms appear in each of the 10 documents, as well as the TF values for a query document.

Using the information from Table 1, which documents would be returned by the following queries:

a) Term2 AND Term7

b) Term4 OR Term2

c) (Term2 OR Term7) AND (NOT Term7)

A2: Document-Term and Query-Term Table

 

          Term 1  Term 2  Term 3  Term 4  Term 5  Term 6  Term 7  Term 8
Doc 1          4       8       9       0      10       8       0       9
Doc 2          1       5       0       0      12       0       1       3
Doc 3          0       3       0       0       0       4       2       0
Doc 4          1       0       4       3       9       0       0       0
Doc 5          0       4       0       0       0       5       1       0
Doc 6          1       2       2       0       3       1       0       1
Doc 7          0       5       3       1       0       0       4       2
Doc 8          0       7       0       3       0       0       3       3
Doc 9          0       5       0       0       0       4       1       2
Doc 10         0       3       4       0       0       2       4       0
Query          2       3       1       2       2       0       1       0

Is it possible to rank the documents returned in (a) to (c)? If it is possible, then supply the rankings in each case. If it is not possible, then state why.
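The Boolean queries in (a) to (c) can be checked with set operations. The following is a minimal Python sketch, assuming that a term "occurs" in a document whenever its raw frequency is greater than zero; the names TF, docs_with, and ALL_DOCS are illustrative and not part of the task.

```python
# Raw term frequencies from the document-term table
# (keys: document number, values: frequencies of Term 1..Term 8).
TF = {
    1: [4, 8, 9, 0, 10, 8, 0, 9],
    2: [1, 5, 0, 0, 12, 0, 1, 3],
    3: [0, 3, 0, 0, 0, 4, 2, 0],
    4: [1, 0, 4, 3, 9, 0, 0, 0],
    5: [0, 4, 0, 0, 0, 5, 1, 0],
    6: [1, 2, 2, 0, 3, 1, 0, 1],
    7: [0, 5, 3, 1, 0, 0, 4, 2],
    8: [0, 7, 0, 3, 0, 0, 3, 3],
    9: [0, 5, 0, 0, 0, 4, 1, 2],
    10: [0, 3, 4, 0, 0, 2, 4, 0],
}

ALL_DOCS = set(TF)


def docs_with(term):
    """Set of documents in which the given term (1-based) has frequency > 0."""
    return {doc for doc, row in TF.items() if row[term - 1] > 0}


# AND -> intersection, OR -> union, NOT -> complement within the repository.
query_a = docs_with(2) & docs_with(7)
query_b = docs_with(4) | docs_with(2)
query_c = (docs_with(2) | docs_with(7)) & (ALL_DOCS - docs_with(7))

for label, result in [("a", query_a), ("b", query_b), ("c", query_c)]:
    print(label, sorted(result))
```

Each query yields a plain set of matching documents; the sketch itself assigns no score or order to the members of that set.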

Exercise 20. Answer the following questions.

a) Using the information from Table 2.1, calculate the ranking score for each of the ten documents based on each of the following query-document similarity measures: (i) the dot product, using TF weights for both the document and query vectors; (ii) the cosine coefficient, using TF weights for both the document and query vectors.

b) Compare the rankings that you obtained using the two similarity measures. If there are differences between the rankings, then discuss why you think these differences occurred.
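The two measures in part (a) can be sketched in Python as follows. This is an illustration under the assumption that the TF weight is simply the raw frequency from the table; the helper names (dot, cosine, ranking) are hypothetical.

```python
import math

# Raw term frequencies (Term 1..Term 8) for Doc 1..Doc 10, plus the query.
DOCS = {
    1: [4, 8, 9, 0, 10, 8, 0, 9],
    2: [1, 5, 0, 0, 12, 0, 1, 3],
    3: [0, 3, 0, 0, 0, 4, 2, 0],
    4: [1, 0, 4, 3, 9, 0, 0, 0],
    5: [0, 4, 0, 0, 0, 5, 1, 0],
    6: [1, 2, 2, 0, 3, 1, 0, 1],
    7: [0, 5, 3, 1, 0, 0, 4, 2],
    8: [0, 7, 0, 3, 0, 0, 3, 3],
    9: [0, 5, 0, 0, 0, 4, 1, 2],
    10: [0, 3, 4, 0, 0, 2, 4, 0],
}
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine coefficient: dot product divided by the product of vector lengths."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def ranking(scores):
    """Document numbers ordered by descending score (ties broken by number)."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


dot_scores = {doc: dot(row, QUERY) for doc, row in DOCS.items()}
cos_scores = {doc: cosine(row, QUERY) for doc, row in DOCS.items()}

print("dot product ranking:", ranking(dot_scores))
print("cosine ranking:     ", ranking(cos_scores))
```

Because the cosine coefficient divides by the vector lengths, long documents with many high counts lose the advantage they have under the unnormalised dot product, which is one place to look when the two rankings differ.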

Exercise 21. Answer the following questions.

a) Using the information in Table 2.1, calculate the idf (inverse document frequency) weight vector. Make sure you show how your calculation was performed.

b) Construct a table similar to Table 2.1, but, instead of raw term frequencies, show the tf-idf weights.

c) Using tf weights for the query vector, tf-idf weights for the document vectors, and the cosine coefficient as the similarity measure, compute the ranking scores for all documents. Show how your calculations were performed for the first document only.

d) How does this ranking compare with the ranking obtained using the cosine similarity measure in Exercise 20? If there are differences between the rankings, then discuss why you think these differences occurred.
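The idf and tf-idf steps of Exercise 21 can be sketched as below. The task does not fix an idf formula, so this sketch assumes idf = log10(N / df), where N is the number of documents and df is the number of documents containing the term; a different logarithm base or a smoothed variant would change the numbers. All variable names are illustrative.

```python
import math

# Raw term frequencies (Term 1..Term 8) for Doc 1..Doc 10, plus the query.
DOCS = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 1, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


n_docs = len(DOCS)

# Document frequency: number of documents in which each term occurs.
df = [sum(1 for row in DOCS if row[t] > 0) for t in range(len(QUERY))]

# Assumption: idf = log10(N / df). Every term occurs somewhere, so df > 0.
idf = [math.log10(n_docs / d) for d in df]

# tf-idf table: each raw frequency multiplied by the idf of its term.
tfidf_docs = [[tf * w for tf, w in zip(row, idf)] for row in DOCS]

# Part (c): tf-weighted query against tf-idf-weighted documents.
scores = {i: cosine(row, QUERY) for i, row in enumerate(tfidf_docs, start=1)}

print("df: ", df)
print("idf:", [round(w, 4) for w in idf])
for doc in sorted(scores, key=lambda d: -scores[d]):
    print(f"Doc {doc}: {scores[doc]:.4f}")
```

Under this formula a term that appears in almost every document receives an idf close to zero, so it contributes little to the document vectors.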

Exercise 22. Answer the following questions.

a) This time, using tf-idf weights for both the query and document vectors, and the cosine coefficient as the similarity measure, compute the ranking scores for all documents. Show how your calculations were performed for the first document only.

b) How does this ranking compare with the ranking obtained in Exercise 21? If there are differences between the rankings, then discuss why you think these differences occurred.
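For the variant in which both the query and the documents carry tf-idf weights, a possible sketch is shown below. It again assumes idf = log10(N / df) with df counted over the 10 documents only (the query is not counted as a document); the weight helper is a hypothetical name.

```python
import math

# Raw term frequencies (Term 1..Term 8) for Doc 1..Doc 10, plus the query.
DOCS = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 1, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


n_docs = len(DOCS)
# Assumption: idf = log10(N / df), df taken from the document collection only.
idf = [
    math.log10(n_docs / sum(1 for row in DOCS if row[t] > 0))
    for t in range(len(QUERY))
]


def weight(vector):
    """Turn a raw tf vector into a tf-idf vector."""
    return [tf * w for tf, w in zip(vector, idf)]


query_w = weight(QUERY)
scores = {i: cosine(weight(row), query_w) for i, row in enumerate(DOCS, start=1)}

for doc in sorted(scores, key=lambda d: -scores[d]):
    print(f"Doc {doc}: {scores[doc]:.4f}")
```

Weighting the query as well means each idf factor enters the dot product twice, once through the document and once through the query, which further reduces the influence of widely used terms.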

Exercise 23. This time, use tf weights for the query vector, tf-idf weights for the document vectors, and the Dice coefficient rather than the cosine coefficient as the similarity measure.

a) Compute the ranking scores for all documents. Show how your calculations were performed for the first document only.

b) How does the ranking compare with the ranking obtained in Exercise 21? If there are differences between the rankings, then discuss why you think these differences occurred.
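A sketch of the Dice-based scoring follows. It assumes the usual vector-space form of the Dice coefficient, 2 * (q . d) / (|q|^2 + |d|^2), and the same idf = log10(N / df) assumption for the document weights; if the course material defines either quantity differently, the code needs adjusting.

```python
import math

# Raw term frequencies (Term 1..Term 8) for Doc 1..Doc 10, plus the query.
DOCS = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 1, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dice(u, v):
    """Dice coefficient: twice the dot product over the sum of squared lengths."""
    denom = dot(u, u) + dot(v, v)
    return 2 * dot(u, v) / denom if denom else 0.0


n_docs = len(DOCS)
# Assumption: idf = log10(N / df).
idf = [
    math.log10(n_docs / sum(1 for row in DOCS if row[t] > 0))
    for t in range(len(QUERY))
]
tfidf_docs = [[tf * w for tf, w in zip(row, idf)] for row in DOCS]

# tf-weighted query against tf-idf-weighted documents.
scores = {i: dice(row, QUERY) for i, row in enumerate(tfidf_docs, start=1)}

for doc in sorted(scores, key=lambda d: -scores[d]):
    print(f"Doc {doc}: {scores[doc]:.4f}")
```

Unlike the cosine coefficient, the Dice denominator adds the squared lengths instead of multiplying the lengths, so a large mismatch in length between query and document lowers the score.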

IR Evaluation

Exercise 24. The following data displays retrieval results for two different algorithms (Algorithm 1 and Algorithm 2) in response to two distinct queries (Query 1 and Query 2). An expert has manually labelled each of the documents as being either relevant or not relevant to the queries.

Algorithm 1 returns the following results:

Algorithm 2 returns the following results:

The known relevance judgements are as follows:

a) For Algorithm 1, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 1 (all three curves should be on a single chart).

b) For Algorithm 2, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 2 (all three curves should be on a single chart, but a separate chart from that used in part (a)).

c) Plot the averages for Algorithm 1 and Algorithm 2 on a separate chart, and compare the algorithms in terms of precision and recall. Do you think one of the algorithms is superior? Why?
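The interpolation to the 11 standard recall levels used in parts (a) to (c) can be sketched as follows. The result lists and relevance sets in this example are hypothetical placeholders, not the data of the exercise; the sketch uses the common rule that interpolated precision at recall level r is the highest precision observed at any recall greater than or equal to r.

```python
def interpolated_precision(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query."""
    hits = 0
    points = []  # (recall, precision) after each retrieved document
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Highest precision at any recall >= level; 0.0 if that recall is never reached.
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]


def average_curves(curves):
    """Element-wise mean of several 11-point curves (one per query)."""
    return [sum(values) / len(values) for values in zip(*curves)]


# Hypothetical data: replace with the result lists and relevance labels provided.
ranked_q1 = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant_q1 = {"d3", "d1", "d2"}
ranked_q2 = ["d5", "d8", "d6", "d2"]
relevant_q2 = {"d8", "d6"}

curve_q1 = interpolated_precision(ranked_q1, relevant_q1)
curve_q2 = interpolated_precision(ranked_q2, relevant_q2)
average = average_curves([curve_q1, curve_q2])

for level, p1, p2, avg in zip(range(11), curve_q1, curve_q2, average):
    print(f"recall {level / 10:.1f}: Q1={p1:.3f} Q2={p2:.3f} avg={avg:.3f}")
```

The three lists (curve_q1, curve_q2, average) are the series to plot against the recall levels on a single chart; repeating the computation for the second algorithm gives the averages to compare in part (c).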





All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd