Design a text retrieval system

Reference no: EM133876834

Machine Learning Applications

Assessment - Design a Text Retrieval System

Type: Coding and Presentation

Task

Design a text retrieval system to find similar movies/shows based on the descriptions.

Assessment Description

People communicate in many languages, in speech and in writing, so text data is abundant in the real world. Working with natural language is nevertheless challenging. Your team lead has assigned you one such task: recommending movies based on their descriptions.

Data

A dataset of movies/shows with descriptions has been curated by pre-processing the Kaggle IMDb Movies/Shows with Descriptions dataset and is provided to you in MyKBS. You are encouraged to explore the original source.

The pre-processed dataset is provided in two files, train.csv and test.csv, both available in MyKBS. Each file contains the following columns:

title: Title of the movie/show.
description: Description of the movie/show.

You are required to train a text retrieval system using the train.csv file and to test the system using the test.csv file.
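As a starting point, the two-column layout described above could be read as sketched below. This is only an illustration: the helper name `load_rows`, the decision to skip rows with an empty description, and the two sample rows are assumptions, not part of the brief. The standard `csv` module is used here so the sketch runs without extra packages; Pandas is equally permitted for loading under the assessment guidelines.

```python
import csv
import io


def load_rows(handle):
    """Read (title, description) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    rows = []
    for record in reader:
        title = (record.get("title") or "").strip()
        description = (record.get("description") or "").strip()
        if description:  # assumption: rows without text are not worth indexing
            rows.append((title, description))
    return rows


# Tiny in-memory stand-in for train.csv; both rows are invented for illustration.
sample_csv = io.StringIO(
    "title,description\n"
    "Example Show A,A detective hunts a thief in Paris.\n"
    "Example Show B,\n"
)
rows = load_rows(sample_csv)
print(rows)

# With the real file (the UTF-8 encoding is an assumption):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     train_rows = load_rows(f)
```

Passing an open handle rather than a path keeps the function easy to try on small in-memory samples before running it on the full files.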

Problem Statement

As an individual, you are required to download the data sets (the train.csv and test.csv files) from MyKBS. You must build a text retrieval system to find similar movies/shows based on the descriptions. You should approach the problem systematically by addressing the tasks below:

Load the data sets and pre-process them to fit your requirements. You must use at least two pre-processing techniques. (5 marks)

Design a text retrieval system using the TF-IDF algorithm (with an inverted file). (10 marks)

Find the top 3 matching movies/shows in train.csv based on the descriptions provided in test.csv. (5 marks)

You are to record a 5-minute video, accompanied by PowerPoint slides, that explains the approach and the performance of the system using relevant metric(s). The slides must be clear, concise, of the required quality, and referenced in accordance with the Kaplan Harvard Referencing style. (20 marks)
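The first three tasks above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a model answer: the function names (`preprocess`, `build_index`, `search`), the tiny stop-word list, the four sample descriptions, and the choice of raw term frequency times log(N/df) with cosine similarity are all assumptions. Other TF-IDF variants are equally valid. The inverted file here is a dictionary that maps each term to the list of documents containing it, so a query only touches documents that share at least one term with it.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; NLTK ships a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "who", "with"}


def preprocess(text):
    """Two pre-processing steps: lower-case tokenisation, then stop-word removal."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(descriptions):
    """Return (inverted index, idf table, document vector norms)."""
    term_counts = [Counter(preprocess(d)) for d in descriptions]
    n_docs = len(descriptions)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for counts in term_counts for term in counts)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    index = defaultdict(list)  # term -> [(doc_id, tf-idf weight), ...]
    norms = [0.0] * n_docs
    for doc_id, counts in enumerate(term_counts):
        for term, tf in counts.items():
            weight = tf * idf[term]
            if weight > 0.0:  # terms present in every document carry no signal
                index[term].append((doc_id, weight))
                norms[doc_id] += weight * weight
    return index, idf, [math.sqrt(n) for n in norms]


def search(query, index, idf, norms, k=3):
    """Rank documents by cosine similarity, reading only the matching postings."""
    scores = defaultdict(float)
    query_norm_sq = 0.0
    for term, tf in Counter(preprocess(query)).items():
        if term not in idf:
            continue  # term never seen in the training descriptions
        q_weight = tf * idf[term]
        query_norm_sq += q_weight * q_weight
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    query_norm = math.sqrt(query_norm_sq)
    ranked = [(doc_id, dot / (norms[doc_id] * query_norm))
              for doc_id, dot in scores.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Invented descriptions standing in for train.csv rows.
train_descriptions = [
    "A detective hunts a jewel thief in Paris.",
    "Two friends open a bakery in a small town.",
    "A retired detective returns to solve one last murder.",
    "A young chef competes in a baking contest.",
]
index, idf, norms = build_index(train_descriptions)
results = search("A detective investigates a murder in Paris.", index, idf, norms)
for doc_id, score in results:
    print(doc_id, round(score, 3), train_descriptions[doc_id])
```

In this toy run only two training descriptions share a term with the query, so fewer than three results come back. On the real data, `search` would be called once per test.csv description and the returned document ids mapped back to titles.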

Learning outcome 1: Explore programming functions to source, store and prepare data for machine learning applications.
Learning outcome 2: Design algorithmic models for the application of machine learning in information technology.
Learning outcome 3: Create advanced insights of strategic organisational value with the aid of machine learning.

Assessment Guidelines

You are required to follow the guidelines below:

You should write your Text Retrieval System code in the Python 3 programming language.

The use of any Python third-party package(s) is restricted to the following tasks:
Loading the datasets. E.g., Pandas.
Any necessary text pre-processing steps. E.g., Natural Language Toolkit, etc.
Performing necessary calculations during the building of the system. E.g., NumPy.
Calculating the performance of the system. E.g., scikit-learn, Matplotlib, Plotly, etc.
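The brief does not name a specific metric. One common option for a top-3 retrieval task is precision at k, sketched below in plain Python. It assumes you have relevance judgements to compare against, which the provided files are not stated to include, so the `relevant` set and the titles in the example are purely hypothetical (for instance, labels you assign by hand for a few test descriptions).

```python
def precision_at_k(retrieved, relevant, k=3):
    """Fraction of the top-k retrieved items that are judged relevant."""
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical example: titles returned by the system vs. hand-labelled relevant titles.
retrieved_titles = ["Title A", "Title B", "Title C"]
relevant_titles = {"Title A", "Title C"}
score = precision_at_k(retrieved_titles, relevant_titles, k=3)
print(round(score, 3))
```

Averaging this value over all test queries gives a single figure that can be reported on the slides.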

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd