Implement several neural information retrieval methods

Reference no: EM133547399

Information Retrieval and Web Search

Aim

Project aim: The aim of this project is for you to implement several neural information retrieval methods, evaluate them and compare them in the context of a multi-stage ranking pipeline.

The specific objectives of Part 2 are to:

Set up your infrastructure to index the collection and evaluate queries.
Implement neural information retrieval models (only inference).
Examine your ability to perform evaluation and analysis when different neural models are used.

The Information Retrieval Task: Web Passage Ranking

As in Part 1 of the project, in Part 2 we consider the problem of open-domain passage ranking in answer to web queries. In this context, users pose queries to the search engine and expect answers in the form of a ranked list of passages (a maximum of 1000 passages is to be retrieved per query).

The provided queries are actual queries submitted to the Microsoft Bing search engine. There are approximately 8.8 million passages in the collection, and the goal is to rank them based on their relevance to the queries.

What we provide you with:

Files from practical

A collection of 8.8 million text passages extracted from web pages ( collection.tsv - provided in Week 1).
PyTorch file for the ANCE model (refer to week10-prac).
Standard DPR model: use BertModel.from_pretrained("ielabgroup/StandardBERT-DR").eval() to load this model.
Extra files for this project

A query dev file that contains 30 queries for you to perform retrieval experiments ( data/dev_queries.tsv ).
A query dev file that contains the same 30 queries (same query ids as the previous file), but with typos in the query text ( data/dev_typo_queries.tsv ).

A qrel file that contains relevance judgements for you that can be used to tune your methods for dev queries( data/dev.qrels ).

A leaderboard system for you to evaluate how well your system performs.

A test query file that contains 60 queries for you to generate run files to submit to the leaderboard ( data/test_queries.tsv ).

This Jupyter notebook, inside which you will include your implementation, evaluation and report.

An hdf5 file that contains TILDEv2 pre-computed term weights for the collection. Download from this link

Typo-aware DPR model: use BertModel.from_pretrained("ielabgroup/StandardBERT-DR-aug").eval() to load this model.

Put this notebook and the provided files under the same directory.
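
As a starting point for working with the query and qrel files listed above, the following is a minimal sketch of two loaders. The file layouts are assumptions, not something this specification states: queries are assumed to be tab-separated `query_id<TAB>query_text` lines, and qrels are assumed to follow the TREC-style `query_id iteration passage_id relevance` layout. The function names and the sample ids and text are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def read_queries(lines):
    """Parse `query_id<TAB>query_text` lines into a dict (assumed layout)."""
    queries = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            queries[row[0]] = row[1]
    return queries


def read_qrels(lines):
    """Parse `query_id iteration passage_id relevance` lines (assumed layout)."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) == 4:
            qid, _, pid, rel = parts
            qrels[qid][pid] = int(rel)
    return dict(qrels)


# Tiny in-memory stand-ins for the real files (made-up ids and text).
sample_queries = io.StringIO(
    "101\twhat is information retrieval\n102\tdefine passage ranking\n"
)
sample_qrels = io.StringIO("101 0 7001 1\n101 0 7002 0\n102 0 7003 1\n")

queries = read_queries(sample_queries)
qrels = read_qrels(sample_qrels)
print(queries["101"])
print(qrels["101"])
```

With the real files, the same functions could be called on an open file handle, for example `read_queries(open("data/dev_queries.tsv", encoding="utf-8"))`, provided the files actually match the assumed layouts.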

What you need to produce
You need to produce:

Correct implementations of the methods required by this project's specifications.

An explanation of the retrieval methods used, including the formulas that represent the models you implemented and the code that implements that formula, an explanation of the evaluation settings followed, and a discussion of the findings. Please refer to the marking sheet to understand how each of these requirements is graded.

You are required to produce both of these within this jupyter notebook.

Required methods to implement
In Part 2 of the project, you are required to implement the following retrieval methods as two-stage ranking pipelines (BM25 + one dense retriever). All implementations should be based on your own code (except for BM25, where you can use the Pyserini built-in SimpleSearcher).
ANCE Dense Retriever: Use ANCE to re-rank BM25 top-k documents. See the practical in Week 10 for background information.
Standard DPR Dense Retriever: Use standard DPR to re-rank BM25 top-k documents. See the practical in Week 10 for background information.
Typo-aware DPR Dense Retriever: typo-aware DPR is a DPR model that is fine-tuned with augmented typos in the training samples. Use this model (provided in the project) to re-rank BM25 top-k documents; inference is the same as for the standard DPR Dense Retriever.
TILDEv2: Use TILDEv2 to re-rank BM25 top-k documents. See the practical in Week 10 for background information.
For TILDEv2, unlike in the practical, we provide the pre-computed term weights for the whole collection (for more details, see the Initial packages and functions cell). This makes TILDEv2 re-ranking fast. Use this advantage to trade off effectiveness and efficiency in your ranking pipeline implementation.
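
The two-stage pipelines described above share one shape: take the BM25 top-k candidates, re-score them with a second model, and re-sort. The sketch below illustrates that shape only. It assumes that query and passage embeddings have already been computed and are compared by inner product, and that the TILDEv2 term weights are available as one `token -> weight` mapping per passage; the function names, vectors and weights are all hypothetical, and this is not the practical's reference implementation.

```python
def dense_score(query_vec, passage_vec):
    """Inner product of a query embedding and a passage embedding (assumed scoring)."""
    return sum(q * p for q, p in zip(query_vec, passage_vec))


def tildev2_score(query_tokens, passage_weights):
    """Simplified TILDEv2-style score: add the passage's stored weight for
    every query token that also occurs in the passage (exact match only)."""
    return sum(passage_weights.get(tok, 0.0) for tok in query_tokens)


def rerank(candidates, score_fn, k):
    """Re-score the first k first-stage candidates; the rest keep their order."""
    head, tail = candidates[:k], candidates[k:]
    # sorted() is stable, so ties keep their first-stage (BM25) order.
    return sorted(head, key=score_fn, reverse=True) + tail


# Made-up first-stage output and made-up second-stage data.
bm25_ranking = ["p3", "p1", "p2", "p4"]
passage_vecs = {
    "p1": [0.9, 0.1],
    "p2": [0.2, 0.8],
    "p3": [0.5, 0.5],
    "p4": [1.0, 1.0],
}
term_weights = {
    "p1": {"neural": 1.2, "ranking": 0.4},
    "p2": {"ranking": 2.0},
    "p3": {"neural": 0.3},
    "p4": {},
}

query_vec = [1.0, 0.0]
query_tokens = ["neural", "ranking"]

# Dense re-ranking of the top 3 only: p4 is never re-scored.
dense_run = rerank(
    bm25_ranking, lambda pid: dense_score(query_vec, passage_vecs[pid]), k=3
)
# TILDEv2-style re-ranking of all 4 candidates.
tilde_run = rerank(
    bm25_ranking, lambda pid: tildev2_score(query_tokens, term_weights[pid]), k=4
)
print(dense_run)
print(tilde_run)
```

The cut-off `k` is the knob for the effectiveness and efficiency trade-off: a cheap scorer such as a lookup into pre-computed weights can afford a larger `k` than a scorer that must encode every candidate passage at query time.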

You should have already attempted many of these implementations above as part of the computer prac exercises.

Required evaluation to perform

In Part 2 of the project, you are required to perform the following evaluation. We consider two types of queries: one that contains typos (i.e. typographical mistakes, like writing iformation for information), and another with the typos resolved. An important aspect of the evaluation in the project is to compare the retrieval behaviour of search methods on queries with and without typos (note this is the same as in project Part 1).

For all methods, evaluate their performance on data/dev_typo_queries.tsv (queries with typos) and data/dev_queries.tsv (the same queries, but with the typos corrected), using data/dev.qrels with four evaluation metrics (see below).

Report every method's effectiveness and efficiency (average query latency) on data/dev_queries.tsv (no need for typo queries), together with the corresponding cut-off k for re-ranking, in a table.
Perform statistical significance analysis across the results of the methods and report them in the tables.
Produce a gain-loss plot that compares the most and least effective ones of the four required methods above in terms of on data/dev_typo_queries.tsv.
Comment on trends and differences observed when comparing your findings.

Does the typo-aware DPR model outperform the others on the data/dev_typo_queries.tsv queries?

When evaluating the data/dev_queries.tsv queries, is there any indication that this model loses its effectiveness?

Is this gain/loss statistically significant? (remember to perform a t-test as well for this task).
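
The comparison steps above (per-query scores, a gain-loss view, and a paired t-test) can be sketched in plain Python. The metric here is an assumption: reciprocal rank with a cut-off of 10 is used only as a stand-in, since the four required metrics are not named in this section. All function names and the per-query scores are made up for illustration.

```python
import math


def reciprocal_rank(ranking, relevant, cutoff=10):
    """1 / rank of the first relevant passage within the cut-off, else 0."""
    for rank, pid in enumerate(ranking[:cutoff], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def gain_loss(scores_a, scores_b):
    """Per-query differences (A minus B), largest gain first.
    These pairs are the bar heights of a gain-loss plot."""
    deltas = {qid: scores_a[qid] - scores_b[qid] for qid in scores_a}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test over per-query scores (n - 1 d.o.f.)."""
    diffs = [scores_a[qid] - scores_b[qid] for qid in scores_a]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        # All differences identical: the statistic is 0 or unbounded.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


# Made-up per-query scores for two hypothetical methods.
method_a = {"q1": 1.0, "q2": 0.5, "q3": 0.25, "q4": 1.0}
method_b = {"q1": 0.5, "q2": 0.5, "q3": 0.5, "q4": 0.5}

rr = reciprocal_rank(["p2", "p1", "p3"], {"p1"})
bars = gain_loss(method_a, method_b)
t_stat = paired_t_statistic(method_a, method_b)
print(rr, bars, t_stat)
```

Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom; `scipy.stats.ttest_rel` is one existing routine that performs the whole paired test, if SciPy is available in your environment.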

(Optional) Submit your runs on data/test_queries.tsv, produced with the methods you implemented and tuned on the dev sets, to the leaderboard system. This is not counted in your mark for this assignment, but the top-ranked student on the leaderboard can request a recommendation letter from Professor Guido Zuccon.

Implement several neural information retrieval methods : INFS7410 Information Retrieval and Web Search, University of Queensland - implement several neural information retrieval methods, evaluate them