CE706 Information Retrieval Assignment

Reference no: EM132801237

CE706 Information Retrieval - University of Essex

Scenario: In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 181,000 scholarly articles, including over 80,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community so that it can apply recent advances in information retrieval and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because the rapid acceleration in new coronavirus literature makes it difficult for the medical research community to keep up.

Your task

This task comes in stages. Marks are given for each stage. The stages are as follows:
• Indexing (20%) The first step for you will be to obtain the dataset. Once you have done so, upload a sample of 1000 articles with full text to Elasticsearch (the simplest approach is to use the first 1000 documents). You will work with the metadata.csv file provided by the challenge.
• Sentence Splitting, Tokenization and Normalization - The next step should be to transform the input text into a normal form of your choice. This should include the identification of sentences, bullet points and cells in tables.
• Selecting Keywords - One aim of your system is to identify the words and phrases in the text that are most useful for indexing purposes. Your system should remove words which are not "useful", e.g. very frequent words or stopwords. You should also identify phrases suitable as index terms. Apply tf.idf as part of your selection and weighting step.

• Stemming or Morphological Analysis - Writing word stems to the database rather than words allows various inflected forms of a word to be treated in the same way, e.g. bus and buses refer to exactly the same thing even though they are different words.
• Searching (10%) Once you have indexed the collection, you will want to be able to search it. You can do that on the command line, but it would be much better to have an interactive system. You could start with Kibana for that, but you are free to use other open source tools for your Graphical User Interface (GUI). Note that each article in the collection contains different fields. Make sure that a user can decide which field to search (Hint: one of the fields is the publication date of the article).
• Engineering a Complete System - The final system should allow a user to have control over all the individual components, so that the final result is a complete search engine, not disparate pieces of code.
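The preprocessing and keyword-selection stages above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not a prescribed solution: the function names, the tiny stopword list, the regular expressions, and the weighting tf * log(N / df) are all choices made for this sketch. Stemming and phrase detection are left out; a stemmer would normally be applied to the tokens before they are counted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system needs a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs, allowing internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def content_tokens(text):
    """Tokens that survive stopword removal."""
    return [t for t in tokenize(text) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    token_lists = [content_tokens(d) for d in docs]
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in token_lists:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def top_keywords(term_weights, k=3):
    """The k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    docs = ["The virus spreads in the air.", "The vaccine trial for the virus."]
    for doc, w in zip(docs, tf_idf(docs)):
        print(doc, "->", top_keywords(w, k=2))
```

With this weighting, a term that occurs in every document gets weight zero, which is one simple way to push very frequent words out of the index terms even when they are not on the stopword list.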
You will have noticed that the percentages above only add up to 80%. This is because one of the important aspects of the project is that your work should be well documented and your code well commented. 20% of your mark will come from this. The report should contain:
• Instructions for running your system
• Screenshots illustrating the functionality you have implemented
• Design and design decisions/justifications of your overall architecture
• A description of the document collection you have chosen
• Discussion of your solution focussing on functionality implemented and possible improvements and extensions.
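For the Indexing and Searching stages described earlier, the request bodies can be prototyped without a running cluster. The sketch below is an illustration only. It assumes that metadata.csv has columns named `cord_uid`, `title`, `abstract` and `publish_time` (check these against the file you download), and the index name `cord19` and both function names are hypothetical. The action dictionaries follow the `_index` / `_id` / `_source` shape that the official Python client's `helpers.bulk` accepts, and the query uses the standard `bool`, `match` and `range` clauses of the Elasticsearch Query DSL.

```python
import csv
import io


def bulk_actions(csv_text, index_name="cord19", limit=1000):
    """Yield one bulk-index action per metadata row, stopping after `limit` rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        if i >= limit:
            break
        # Assumed column: cord_uid is used as the document id.
        yield {"_index": index_name, "_id": row["cord_uid"], "_source": row}


def field_query(field, text, date_from=None, date_to=None, date_field="publish_time"):
    """Build a query body: match `text` in the chosen field, optionally filtered by date."""
    body = {"query": {"bool": {"must": [{"match": {field: text}}]}}}
    date_range = {}
    if date_from:
        date_range["gte"] = date_from
    if date_to:
        date_range["lte"] = date_to
    if date_range:
        body["query"]["bool"]["filter"] = [{"range": {date_field: date_range}}]
    return body


if __name__ == "__main__":
    # Two made-up rows standing in for metadata.csv.
    sample = (
        "cord_uid,title,abstract,publish_time\n"
        "abc123,Title one,Abstract one,2020-03-01\n"
        "def456,Title two,Abstract two,2020-04-15\n"
    )
    for action in bulk_actions(sample):
        print(action)
    print(field_query("title", "coronavirus", date_from="2020-01-01"))
```

Keeping the field name as a parameter is what lets a command-line tool or GUI offer the user a choice of which field to search, and the optional date range covers the publication-date hint.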

Attachment:- Information Retrieval.rar

Attachment:- Metadata.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd