Prepare the data for a gensim word2vec model

Reference no: EM133967976

Information Technology and Marketing in the New Economy

Assignment 1

Part 1. Text representation

The file IA1_1.csv contains a collection of 100 reviews of restaurants and films. For this assignment, you are asked to preprocess these reviews so that each review is represented as a TF-IDF vector. Please follow the steps listed below:

1. Tokenize each review in the collection.

2. Using the tokenized reviews from step 1, lemmatize all the words.

3. Based on the output of step 2, remove all stop-words and punctuation.

4. Based on the output of step 3, convert each review to a TF-IDF vector. The minimum document frequency for each term is 3. Also include 2-grams.

5. Based on the output of step 1, POS-tag each word and apply TF-IDF vectorization with a minimum document frequency of 4 for each term (please do not apply normalization or stop-word removal).
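The two settings in step 4, a minimum document frequency and the inclusion of 2-grams, can be illustrated with a small pure-Python sketch. It uses the textbook weighting tf × ln(N / df), and the sample reviews are hypothetical. Libraries such as scikit-learn (whose `TfidfVectorizer` exposes `min_df` and `ngram_range` parameters) apply a smoothed IDF and L2 normalization by default, so their numbers will differ from the ones produced here.

```python
import math
from collections import Counter

def add_bigrams(tokens):
    """Return the unigrams followed by the 2-grams of a token list."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

def tfidf_vectors(docs, min_df=3):
    """docs: list of token lists. Returns (vocabulary, list of vectors)."""
    terms_per_doc = [add_bigrams(doc) for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for terms in terms_per_doc:
        df.update(set(terms))
    # Keep only terms that occur in at least min_df documents.
    vocab = sorted(term for term, n in df.items() if n >= min_df)
    n_docs = len(docs)
    vectors = []
    for terms in terms_per_doc:
        tf = Counter(terms)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

# Hypothetical, already-cleaned reviews (the kind of output step 3 produces).
docs = [
    ["great", "food", "friendly", "staff"],
    ["great", "food", "slow", "service"],
    ["great", "food", "great", "film"],
    ["boring", "film"],
]
vocab, vectors = tfidf_vectors(docs, min_df=3)
print(vocab)       # terms that survive the min_df filter
print(len(vocab))  # dimension of every review vector
```

Only terms found in at least three reviews survive, so the vector dimension equals the size of the filtered vocabulary rather than the number of distinct terms in the collection.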

Tip: you may consider using a "for loop" for step 1 to step 3, so you could process the whole collection at once.
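As a rough sketch of such a loop, the example below chains steps 1 to 3 for every review. The `LEMMAS` and `STOP_WORDS` tables and the sample reviews are hypothetical stand-ins chosen so the code runs without downloads; an actual solution would more likely rely on a library tokenizer, lemmatizer, and stop-word list (for example those shipped with NLTK or spaCy).

```python
import re
import string

# Hypothetical stand-ins: a real solution would use a library lemmatizer
# and a full stop-word list instead of these tiny tables.
LEMMAS = {"was": "be", "were": "be", "movies": "movie", "dishes": "dish"}
STOP_WORDS = {"the", "a", "an", "and", "be", "but"}
PUNCTUATION = set(string.punctuation)

def tokenize(text):
    """Step 1: split into lowercase word tokens and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def lemmatize(tokens):
    """Step 2: map each token to its lemma (unchanged if unknown)."""
    return [LEMMAS.get(tok, tok) for tok in tokens]

def remove_stops_and_punct(tokens):
    """Step 3: drop stop-words and punctuation tokens."""
    return [t for t in tokens if t not in STOP_WORDS and t not in PUNCTUATION]

# Hypothetical reviews; the real ones would be read from IA1_1.csv.
reviews = [
    "The dishes were great, but the service was slow!",
    "A boring movie.",
]

cleaned = []
for review in reviews:
    tokens = tokenize(review)                        # step 1
    lemmas = lemmatize(tokens)                       # step 2
    cleaned.append(remove_stops_and_punct(lemmas))   # step 3
print(cleaned)
```

Keeping the step 1 tokens in their own variable matters here, because step 5 starts again from the tokenized (not lemmatized or filtered) reviews.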

Please submit these files:

1. A Jupyter Notebook file (.ipynb) that includes your Python code with comments (#) or markdown cells, and the output of each successful run. Use a markdown cell at the end of the notebook to report the number of dimensions of the vectors from step 4 and step 5.

2. A CSV file with your final TF-IDF vectors (step 4). Each review should correspond to one row and each column should correspond to one item in the vectors. (Note: you don't need to submit the intermediate output data in step 1, step 2 and step 3).

3. A CSV file with your POS-tag TF-IDF vectors (step 5). Each review should correspond to one row and each column should correspond to one item in the vectors. (Note: you don't need to submit the intermediate output data in step 1).
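The row-per-review, column-per-term layout requested for the two CSV files could be produced along the following lines. This is only a sketch using the standard `csv` module: the file name, vocabulary, and values are hypothetical, and a pandas `DataFrame` would work equally well.

```python
import csv
import os
import tempfile

def write_vectors(path, vocab, vectors):
    """Write one header row of terms, then one row per review."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(vocab)
        writer.writerows(vectors)

# Hypothetical vocabulary and TF-IDF values for two reviews.
vocab = ["food", "great", "great food"]
vectors = [[0.29, 0.29, 0.29], [0.0, 0.0, 0.0]]

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "tfidf_step4_example.csv")
write_vectors(out_path, vocab, vectors)

# Read the file back to check its shape.
with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(len(rows) - 1, "reviews x", len(rows[0]), "dimensions")
```

The column count printed at the end is the vector dimension that the notebook is asked to report.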

Part 2. Word2Vec

The data in IA1_2.csv has the information about 11914 cars. There are two fields: Maker_Model and description. The description column contains a set of tags (separated by commas), where the Maker_Model is also included.
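Given that layout, one plausible way to prepare the data is to treat each car's description as one "sentence" whose tokens are the comma-separated tags, so that a multi-word name such as 'Toyota Camry' stays a single token. The sketch below assumes this interpretation; the two sample rows are hypothetical and only mimic the described columns.

```python
import csv
import io

# Hypothetical sample in the shape described above; the real file is IA1_2.csv.
sample = io.StringIO(
    'Maker_Model,description\n'
    'Toyota Camry,"Toyota Camry, Midsize, Sedan, regular unleaded"\n'
    'Nissan Van,"Nissan Van, Compact, Cargo Minivan"\n'
)

def to_sentences(rows):
    """Turn each description into a list of tags, one 'sentence' per car."""
    sentences = []
    for row in rows:
        tags = [tag.strip() for tag in row["description"].split(",")]
        sentences.append([tag for tag in tags if tag])  # drop empty tags
    return sentences

sentences = to_sentences(csv.DictReader(sample))
print(sentences[0])
```

A list of token lists like `sentences` is the input shape that gensim's `Word2Vec` accepts. Note that the dimensionality argument is named `size` in gensim 3.x and `vector_size` in gensim 4.x, and that the default `min_count` of 5 discards tokens seen fewer than five times, which may matter for rarely occurring car names.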

1. Prepare the data for a gensim Word2Vec model.

2. Run the model (with size = 50) and display the vector for 'Toyota Camry'.

3. Compute the similarity between 'Porsche 718 Cayman' and 'Nissan Van'.

4. Find the five cars most similar to 'Mercedes-Benz SLK-Class'.

5. Generate a t-SNE graph for a list of 50 unique cars.
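Steps 3 and 4 both rest on cosine similarity between word vectors, which is the measure that gensim's `similarity` and `most_similar` methods on `model.wv` report. The sketch below computes it by hand to show what those calls return; the car names and 3-dimensional vectors are hypothetical, whereas the trained model would supply 50-dimensional ones.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(name, vectors, topn=5):
    """Rank all other entries by cosine similarity to `name`."""
    scores = [(other, cosine(vectors[name], vec))
              for other, vec in vectors.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical 3-dimensional vectors; real ones would have 50 dimensions.
vectors = {
    "car_a": [1.0, 0.0, 0.0],
    "car_b": [0.9, 0.1, 0.0],
    "car_c": [0.0, 1.0, 0.0],
}
print(cosine(vectors["car_a"], vectors["car_b"]))
print(most_similar("car_a", vectors, topn=2))
```

With a trained gensim model, the equivalent lookups would be `model.wv.similarity(a, b)` and `model.wv.most_similar(a, topn=5)`.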





All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd