Calculate the idf weight vector

Reference no: EM13794676

Vector Model

This question requires you to use the following data. Assume a repository of 10 documents indexed by eight key terms. Table 2.1 is the document-term table: it gives the raw frequency with which each of the eight key terms appears in each of the 10 documents, as well as the TF values for a query document.

Exercise 1. Using the information from Table 2.1, which documents would be returned by the following queries:

a) Term2 AND Term7
b) Term4 OR Term2
c) (Term2 OR Term7) AND (NOT Term7)

Table 2.1: A2: Document-Term and Query-Term Table

          Term 1  Term 2  Term 3  Term 4  Term 5  Term 6  Term 7  Term 8
Doc 1          4       8       9       0      10       8       0       9
Doc 2          1       5       0       0      12       0       1       3
Doc 3          0       3       0       0       0       4       2       0
Doc 4          1       0       4       3       9       0       0       0
Doc 5          0       4       0       0       0       5       1       0
Doc 6          1       2       2       0       3       1       0       1
Doc 7          0       5       3       4       0       0       4       2
Doc 8          0       7       0       3       0       0       3       3
Doc 9          0       5       0       0       0       4       1       2
Doc 10         0       3       4       0       0       2       4       0
Query          2       3       1       2       2       0       1       0

Is it possible to rank the documents returned in (a) to (c)? If it is possible, then supply the rankings in each case. If it is not possible, then state why.
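A minimal Python sketch of evaluating the three Boolean queries against Table 2.1, assuming a term counts as "present" in a document whenever its raw frequency is greater than zero. The helper names `has` and `matching` are illustrative only.

```python
# Raw term frequencies from Table 2.1: rows are Doc 1..Doc 10, columns Term 1..Term 8.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]


def has(doc, term):
    """True if `term` (1-based) occurs at least once in document row `doc` (0-based)."""
    return TF[doc][term - 1] > 0


def matching(predicate):
    """Return the 1-based numbers of all documents that satisfy `predicate`."""
    return [d + 1 for d in range(len(TF)) if predicate(d)]


query_a = matching(lambda d: has(d, 2) and has(d, 7))
query_b = matching(lambda d: has(d, 4) or has(d, 2))
query_c = matching(lambda d: (has(d, 2) or has(d, 7)) and not has(d, 7))

print("a)", query_a)
print("b)", query_b)
print("c)", query_c)
```

Each query yields only a set of matching documents; the sketch attaches no score to any of them.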

Exercise 2. Answer the following questions.

a) Using the information from Table 2.1, calculate the ranking score for each of the ten documents based on each of the following query-document similarity measures:

i) the dot product, using TF weights for both the document and query vectors;
ii) the cosine coefficient, using TF weights for both the document and query vectors.
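A short Python sketch of both measures on the raw TF vectors of Table 2.1. The dot product is the sum of pairwise products; the cosine coefficient divides it by the product of the two Euclidean vector lengths. Ties in the printed rankings are broken by document number, which is an arbitrary choice made here.

```python
import math

# Table 2.1 raw term frequencies (Doc 1..Doc 10 x Term 1..Term 8) and the query TF vector.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Dot product divided by the product of the Euclidean vector lengths.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def ranking(scores):
    """Document numbers (1-based) by descending score; ties keep document order."""
    return sorted(range(1, len(scores) + 1), key=lambda n: -scores[n - 1])


dot_scores = [dot(doc, QUERY) for doc in TF]
cos_scores = [cosine(doc, QUERY) for doc in TF]

for n in range(1, len(TF) + 1):
    print(f"Doc {n:2d}: dot = {dot_scores[n - 1]:3d}  cosine = {cos_scores[n - 1]:.4f}")
print("dot ranking:   ", ranking(dot_scores))
print("cosine ranking:", ranking(cos_scores))
```

The dot product grows with raw term counts, so documents with many occurrences are favoured; the cosine coefficient divides by both vector lengths and so removes that length effect.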

b) Compare the rankings that you obtained using the two similarity measures. If there are differences between the rankings, then discuss why you think these differences occurred.

Exercise 3. Answer the following questions.

a) Using the information in Table 2.1, calculate the idf (inverse document frequency) weight vector. Make sure you show how your calculation was performed.
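A sketch of the idf calculation. It counts the document frequency df of each term (the number of documents in which the term appears at least once) and then takes idf = log(N / df) with N = 10. The exercise does not fix the logarithm base; base 2 is assumed here, and another base only rescales every idf value by the same constant.

```python
import math

# Table 2.1 raw term frequencies (Doc 1..Doc 10 x Term 1..Term 8).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
N = len(TF)  # 10 documents

# Document frequency: number of documents containing each term at least once.
df = [sum(1 for doc in TF if doc[j] > 0) for j in range(len(TF[0]))]

# Assumption: idf_j = log2(N / df_j).
idf = [math.log2(N / n) for n in df]

for j, (n, w) in enumerate(zip(df, idf), start=1):
    print(f"Term {j}: df = {n}, idf = log2({N}/{n}) = {w:.4f}")
```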

b) Construct a table similar to Table 2.1, but, instead of raw term frequencies, show the tf-idf weights.
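A sketch that builds the tf-idf table by multiplying each raw frequency by the idf of its term, again assuming idf = log2(N / df). The query row is included to mirror the layout of Table 2.1.

```python
import math

# Table 2.1 raw term frequencies (Doc 1..Doc 10 x Term 1..Term 8) and the query TF vector.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY_TF = [2, 3, 1, 2, 2, 0, 1, 0]

N = len(TF)
df = [sum(1 for doc in TF if doc[j] > 0) for j in range(len(QUERY_TF))]
idf = [math.log2(N / n) for n in df]  # assumption: base-2 logarithm

# tf-idf weight = raw term frequency x idf of that term.
TFIDF = [[tf * w for tf, w in zip(doc, idf)] for doc in TF]
QUERY_TFIDF = [tf * w for tf, w in zip(QUERY_TF, idf)]

print(" " * 8 + "".join(f"Term {j}".rjust(9) for j in range(1, 9)))
rows = [(f"Doc {n}", row) for n, row in enumerate(TFIDF, start=1)] + [("Query", QUERY_TFIDF)]
for label, row in rows:
    print(f"{label:<8}" + "".join(f"{w:9.3f}" for w in row))
```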
c) Using tf weights for the query vector, tf-idf weights for the document vectors, and the cosine coefficient as the similarity measure, compute the ranking scores. Show how your calculations were performed for the first document only.
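A sketch of this scoring under the same assumption of a base-2 idf: document vectors carry tf-idf weights, the query keeps its raw tf weights, and the cosine coefficient compares them. It prints the Doc 1 numerator, both vector lengths and the resulting score, followed by the full ranking.

```python
import math

# Table 2.1 raw term frequencies (Doc 1..Doc 10 x Term 1..Term 8) and the query TF vector.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY_TF = [2, 3, 1, 2, 2, 0, 1, 0]

N = len(TF)
df = [sum(1 for doc in TF if doc[j] > 0) for j in range(len(QUERY_TF))]
idf = [math.log2(N / n) for n in df]  # assumption: base-2 logarithm
TFIDF = [[tf * w for tf, w in zip(doc, idf)] for doc in TF]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine coefficient: (u . v) / (|u| * |v|)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# tf weights for the query, tf-idf weights for the documents.
scores = [cosine(doc, QUERY_TF) for doc in TFIDF]

# Worked calculation for Doc 1 only.
d1 = TFIDF[0]
print(f"Doc 1: d.q = {dot(d1, QUERY_TF):.4f}, |d| = {math.sqrt(dot(d1, d1)):.4f}, "
      f"|q| = {math.sqrt(dot(QUERY_TF, QUERY_TF)):.4f}, cosine = {scores[0]:.4f}")

ranking = sorted(range(1, N + 1), key=lambda n: -scores[n - 1])
print("ranking (best first):", ranking)
```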

d) How does this ranking compare with the ranking obtained using the cosine similarity measure in Exercise 2(a)? If there are differences between the rankings, then discuss why you think these differences occurred.

Exercise 4. Answer the following questions.

a) This time, using tf-idf weights for both the query and document vectors, and the cosine coefficient as the similarity measure, compute the ranking scores. Show how your calculations were performed for the first document only.
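A sketch of the Exercise 4 variant, assuming the same base-2 idf. It differs from the Exercise 3(c) setting only in that the query terms are also multiplied by their idf before the cosine coefficient is computed.

```python
import math

# Table 2.1 raw term frequencies (Doc 1..Doc 10 x Term 1..Term 8) and the query TF vector.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY_TF = [2, 3, 1, 2, 2, 0, 1, 0]

N = len(TF)
df = [sum(1 for doc in TF if doc[j] > 0) for j in range(len(QUERY_TF))]
idf = [math.log2(N / n) for n in df]  # assumption: base-2 logarithm
TFIDF = [[tf * w for tf, w in zip(doc, idf)] for doc in TF]
QUERY_TFIDF = [tf * w for tf, w in zip(QUERY_TF, idf)]  # query now idf-weighted too


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine coefficient: (u . v) / (|u| * |v|)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


scores = [cosine(doc, QUERY_TFIDF) for doc in TFIDF]

# Worked calculation for Doc 1 only.
d1 = TFIDF[0]
print(f"Doc 1: d.q = {dot(d1, QUERY_TFIDF):.4f}, |d| = {math.sqrt(dot(d1, d1)):.4f}, "
      f"|q| = {math.sqrt(dot(QUERY_TFIDF, QUERY_TFIDF)):.4f}, cosine = {scores[0]:.4f}")

ranking = sorted(range(1, N + 1), key=lambda n: -scores[n - 1])
print("ranking (best first):", ranking)
```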

b) How does this ranking compare with the ranking obtained in Exercise 3(c)? If there are differences between the rankings, then discuss why you think these differences occurred.

Exercise 5. This time, use tf weights for the query vector, tf-idf weights for the document vectors, and the Dice coefficient rather than the cosine coefficient as the similarity measure.

a) Compute the ranking scores for all documents. Show how your calculations were performed for the first document only.
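A sketch of Dice scoring with tf query weights and tf-idf document weights (base-2 idf assumed). The exercise does not spell out the Dice formula; the weighted form 2(d·q) / (|d|² + |q|²) is assumed here, and other texts use variants.

```python
import math

# Table 2.1 raw term frequencies (Doc 1..Doc 10 x Term 1..Term 8) and the query TF vector.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY_TF = [2, 3, 1, 2, 2, 0, 1, 0]

N = len(TF)
df = [sum(1 for doc in TF if doc[j] > 0) for j in range(len(QUERY_TF))]
idf = [math.log2(N / n) for n in df]  # assumption: base-2 logarithm
TFIDF = [[tf * w for tf, w in zip(doc, idf)] for doc in TF]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dice(u, v):
    """Weighted Dice coefficient (assumed form): 2 (u . v) / (|u|^2 + |v|^2)."""
    return 2 * dot(u, v) / (dot(u, u) + dot(v, v))


scores = [dice(doc, QUERY_TF) for doc in TFIDF]

# Worked calculation for Doc 1 only.
d1 = TFIDF[0]
print(f"Doc 1: 2 * {dot(d1, QUERY_TF):.4f} / ({dot(d1, d1):.4f} + {dot(QUERY_TF, QUERY_TF)}) "
      f"= {scores[0]:.4f}")

ranking = sorted(range(1, N + 1), key=lambda n: -scores[n - 1])
print("ranking (best first):", ranking)
```

Unlike the cosine coefficient, this Dice form is not invariant to rescaling one of the vectors, because the squared lengths are added rather than multiplied.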

b) How does the ranking compare with the ranking obtained in Exercise 3(c)? If there are differences between the rankings, then discuss why you think these differences occurred.

2.2. IR Evaluation

Exercise 6. The following data displays retrieval results for two different algorithms (Algorithm 1 and Algorithm 2) in response to two distinct queries (Query 1 and Query 2). An expert has manually labelled each of the documents as being either relevant or not relevant to the queries.

Algorithm 1 returns the following results:

Query 1: d4, d15, d1, d3, d8, d76, d2, d33, d30, d5, d11, d29, d66, d10
Query 2: d9, d91, d2, d87, d13, d52, d92, d16, d17, d22, d20, d71, d48, d60, d56

Algorithm 2 returns the following results:

Query 1: d8, d29, d6, d5, d15, d17, d20, d65, d2, d33, d44, d41, d7, d77, d13, d14, d90, d80, d70, d4
Query 2: d3, d87, d2, d28, d15, d14, d12, d10, d41, d11, d85, d89, d1, d49, d52, d76, d55, d9, d91, d99, d30, d17, d13, d26, d94, d18, d86, d72, d48, d8, d93, d42, d79, d43, d88, d7, d98, d51, d50, d6

The known relevant documents for each query are as follows:

Query 1: d2, d4, d7, d15, d29
Query 2: d1, d2, d3, d7, d8, d9, d11, d12, d13, d15, d16, d20

a) For Algorithm 1, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 1 (all three curves should be on a single chart).
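A sketch of standard 11-point interpolation applied to Algorithm 1: the interpolated precision at recall level r is the highest precision observed at any rank whose recall is at least r, and a level the ranking never reaches is taken as 0 here. It prints the three series to plot rather than drawing the chart, so no plotting library is assumed; the function and variable names are illustrative.

```python
# Relevance judgements and Algorithm 1 rankings, copied from the lists above.
RELEVANT = {
    1: {"d2", "d4", "d7", "d15", "d29"},
    2: {"d1", "d2", "d3", "d7", "d8", "d9", "d11", "d12", "d13", "d15", "d16", "d20"},
}
ALGO1 = {
    1: "d4 d15 d1 d3 d8 d76 d2 d33 d30 d5 d11 d29 d66 d10".split(),
    2: "d9 d91 d2 d87 d13 d52 d92 d16 d17 d22 d20 d71 d48 d60 d56".split(),
}
LEVELS = [i / 10 for i in range(11)]  # the 11 standard recall levels 0.0 .. 1.0


def interpolated_precision(ranked, relevant):
    """Interpolated precision at each of the 11 standard recall levels.

    At level r this is the highest precision seen at any rank whose recall is
    at least r; a level the ranking never reaches is given precision 0.
    """
    points = []  # (recall, precision) measured at each relevant document retrieved
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return [max((p for r, p in points if r >= level), default=0.0) for level in LEVELS]


curves = {q: interpolated_precision(ALGO1[q], RELEVANT[q]) for q in ALGO1}
# Average curve: mean of the per-query interpolated precisions at each level.
average = [sum(vals) / len(vals) for vals in zip(*curves.values())]

# Three series for one chart: recall on the x-axis, precision on the y-axis.
print("recall  " + " ".join(f"{r:5.1f}" for r in LEVELS))
for q, curve in curves.items():
    print(f"Query {q} " + " ".join(f"{p:5.3f}" for p in curve))
print("Average " + " ".join(f"{p:5.3f}" for p in average))
```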

b) For Algorithm 2, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 2 (all three curves should be on a single chart, but a separate chart from that used in part (a)).

c) Plot the averages for Algorithm 1 and Algorithm 2 on a separate chart, and compare the algorithms in terms of precision and recall. Do you think one of the algorithms is superior? Why?
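A sketch that computes the averaged 11-point interpolated curve for both algorithms so they can be plotted together. It uses the same interpolation rule (maximum precision at recall of at least r, 0 for unreached levels) and also prints the mean over the 11 levels as a single summary number per algorithm; that summary is an addition for convenience, not something the exercise asks for.

```python
RELEVANT = {
    1: {"d2", "d4", "d7", "d15", "d29"},
    2: {"d1", "d2", "d3", "d7", "d8", "d9", "d11", "d12", "d13", "d15", "d16", "d20"},
}
RUNS = {
    "Algorithm 1": {
        1: "d4 d15 d1 d3 d8 d76 d2 d33 d30 d5 d11 d29 d66 d10".split(),
        2: "d9 d91 d2 d87 d13 d52 d92 d16 d17 d22 d20 d71 d48 d60 d56".split(),
    },
    "Algorithm 2": {
        1: "d8 d29 d6 d5 d15 d17 d20 d65 d2 d33 d44 d41 d7 d77 d13 d14 d90 d80 d70 d4".split(),
        2: ("d3 d87 d2 d28 d15 d14 d12 d10 d41 d11 d85 d89 d1 d49 d52 d76 d55 d9 d91 d99 "
            "d30 d17 d13 d26 d94 d18 d86 d72 d48 d8 d93 d42 d79 d43 d88 d7 d98 d51 d50 d6").split(),
    },
}
LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def interpolated_precision(ranked, relevant):
    """Max precision at any rank with recall >= each level (0 if never reached)."""
    points, hits = [], 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return [max((p for r, p in points if r >= level), default=0.0) for level in LEVELS]


averages = {}
for name, run in RUNS.items():
    curves = [interpolated_precision(run[q], RELEVANT[q]) for q in sorted(run)]
    averages[name] = [sum(vals) / len(vals) for vals in zip(*curves)]

print("recall       " + " ".join(f"{r:5.1f}" for r in LEVELS))
for name, avg in averages.items():
    print(f"{name:<12} " + " ".join(f"{p:5.3f}" for p in avg)
          + f"   mean = {sum(avg) / len(avg):.3f}")
```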
