Building the classification model


Reference no: EM132917304

Final Assignment

Part 1:

Step 1: Read the Tripadvisor hotel reviews dataset

Step 2: Create a diagram of the variable "Score" to see whether the majority of the customer ratings are positive or negative.
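One possible way to draw this diagram is a bar chart of rating counts. The sketch below assumes pandas and matplotlib are available and that the ratings sit in a column named "Score" as the brief states; the function name and the tiny demo frame are made up for illustration.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # draw off-screen so no display is needed
import matplotlib.pyplot as plt


def plot_score_distribution(df, column="Score"):
    """Return (figure, counts) for a bar chart of how often each rating occurs."""
    counts = df[column].value_counts().sort_index()
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel(column)
    ax.set_ylabel("Number of reviews")
    ax.set_title("Distribution of customer ratings")
    fig.tight_layout()
    return fig, counts


# Made-up ratings standing in for the real dataset
demo = pd.DataFrame({"Score": [5, 4, 5, 1, 3, 5, 4, 2]})
fig, counts = plot_score_distribution(demo)
```

With the real data, the demo frame would be replaced by the result of `pd.read_csv(...)` on the hotel reviews file, and `fig.savefig(...)` can write the chart to disk.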

Step 3: Create wordclouds to see the most frequently used words in the reviews and save them.

Step 4: Do sentiment analysis with VADER
• Apply the model to the dataset
• Label reviews with compound > 0 as positive sentiment and compound < 0 as negative sentiment, and remove reviews with a compound score of 0
• Export the results as CSV files
• Now that the reviews are classified as positive or negative, build a wordcloud for each class
• Plot the distribution of reviews by sentiment across the dataset and save the diagram
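The labelling rule in this step can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes the `vaderSentiment` package is installed (NLTK ships an analyzer with the same `polarity_scores` method), and the helper names `label_from_compound`, `add_sentiment` and `vader_compound_scorer` are hypothetical.

```python
import pandas as pd


def label_from_compound(compound):
    """Map a VADER compound score to a label; 0 maps to None so it can be dropped."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return None


def add_sentiment(df, score_fn, text_column="Review"):
    """Add 'compound' and 'sentiment' columns and drop reviews scored exactly 0."""
    out = df.copy()
    out["compound"] = out[text_column].apply(score_fn)
    out["sentiment"] = out["compound"].apply(label_from_compound)
    return out.dropna(subset=["sentiment"]).reset_index(drop=True)


def vader_compound_scorer():
    """Return a function mapping text to its VADER compound score.

    Assumption: the `vaderSentiment` package is installed. It is imported
    lazily so the helpers above stay usable without it.
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]
```

A typical call would be `scored = add_sentiment(reviews, vader_compound_scorer())`, followed by `scored[scored["sentiment"] == "positive"].to_csv("positive.csv", index=False)` and the equivalent for the negative reviews. Passing the scorer in as a function keeps the labelling rule easy to check with a stub.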

Step 5: Building the classification model
Build the sentiment analysis model! This model will take reviews in as input.
It will then come up with a prediction on whether the review is positive or negative.
This is a classification task, so you will train a simple logistic regression model to do it.

Step 6: Split the Dataframe
The new data frame should only have two columns - "Review", and "sentiment" (the target variable).

Training the sentiment analysis model
80% of the data will be used for training, and 20% will be used for testing.
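A minimal sketch of the column selection and the 80/20 split, assuming scikit-learn is installed. The stratified split and the fixed seed are assumptions added here to keep the class balance and make the result repeatable; the brief does not ask for them. The demo frame is made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_reviews(df, test_size=0.2, seed=42):
    """Keep only 'Review' and 'sentiment', then split 80/20 into train and test."""
    data = df[["Review", "sentiment"]]
    return train_test_split(
        data,
        test_size=test_size,
        random_state=seed,            # assumption: fixed seed for repeatability
        stratify=data["sentiment"],   # assumption: preserve the class ratio
    )


# Made-up rows standing in for the labelled reviews
demo = pd.DataFrame({
    "Review": [f"review {i}" for i in range(10)],
    "sentiment": ["positive"] * 6 + ["negative"] * 4,
    "Score": [5, 4, 5, 4, 5, 4, 1, 2, 1, 2],
})
train_df, test_df = split_reviews(demo)
```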

Step 7: Create a bag of words
Use a count vectorizer from the Scikit-learn library.
Convert the text into a bag-of-words model, since the logistic regression algorithm needs numeric features and cannot work on raw text.
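To illustrate what the count vectorizer produces, here is a small sketch on made-up sentences. The vocabulary is learned from the training text only, and the test text is transformed with that same vocabulary, so words unseen in training are ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up examples standing in for the training and test reviews
train_texts = ["great room great view", "dirty room"]
test_texts = ["great staff"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary, count words
X_test = vectorizer.transform(test_texts)        # reuse the learned vocabulary

# Column order of the matrices: words sorted by their column index
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)
print(X_train.toarray())
print(X_test.toarray())
```

Each row is one review and each column counts one vocabulary word, which gives the logistic regression model numeric input.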

Step 8: Logistic Regression
Split target and independent variables
Fit model on data
Make predictions
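These three sub-steps might look like the sketch below. The reviews and labels are made up, and `max_iter=1000` is an assumption chosen to avoid convergence warnings rather than a requirement of the brief.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Independent variable (text) and target (label); made-up examples
train_reviews = [
    "great clean room friendly staff",
    "lovely view great breakfast",
    "friendly staff lovely hotel",
    "dirty room rude staff",
    "noisy dirty terrible hotel",
    "rude staff terrible breakfast",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, train_labels)

# Predict on unseen text using the same vectorizer
new_reviews = ["great lovely friendly", "dirty rude terrible"]
predictions = model.predict(vectorizer.transform(new_reviews))
print(list(predictions))
```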

Step 9: Test the accuracy of your model
Find accuracy, precision, recall
Create the classification report
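The metrics can be computed with scikit-learn as sketched here. The label lists are made up for illustration, and "positive" is treated as the positive class, which is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_score,
    recall_score,
)

# Made-up true labels and model predictions
y_true = ["positive", "positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative", "positive", "positive"]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, pos_label="positive")
recall = recall_score(y_true, y_pred, pos_label="positive")
report = classification_report(y_true, y_pred)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(report)
```

Here 4 of 6 predictions are correct, and of the 4 reviews predicted positive, 3 really are positive, so accuracy is about 0.667 and precision and recall are both 0.75.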

Part 2: Topic Modelling

LDA
Step 1: Import the positive.csv dataset you have created in Part 1
Step 2: Applying LDA on the "Review" column
Step 3: Define number of topics as 5
Step 4: Create topics along with the probability distribution for each word in our vocabulary for each topic.
Step 5: Print the 10 words with highest probabilities for all the five topics
Step 6: Add a column to the original data frame that will store the topic for the reviews.
Step 7: Save the new dataset as: reviews_topic(lda).csv
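The LDA steps above can be sketched with scikit-learn as follows. This is one possible approach, not the required one: the English stop-word filter, the seed, the function name `fit_lda` and the demo reviews are all assumptions.

```python
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def fit_lda(texts, n_topics=5, n_top_words=10, seed=0):
    """Fit LDA; return top words per topic, a topic per document, word probabilities."""
    vectorizer = CountVectorizer(stop_words="english")  # assumption: drop stop words
    counts = vectorizer.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topics = lda.fit_transform(counts)
    # components_ holds unnormalised weights; normalise each row so that it
    # becomes a probability distribution over the vocabulary for that topic
    word_probs = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    top_words = [
        [vocab[i] for i in row.argsort()[::-1][:n_top_words]] for row in word_probs
    ]
    return top_words, doc_topics.argmax(axis=1), word_probs


# Made-up reviews standing in for positive.csv
demo = pd.DataFrame({"Review": [
    "clean room and comfortable bed",
    "friendly staff at the front desk",
    "great location near the beach",
    "delicious breakfast buffet every morning",
    "pool and spa were relaxing",
    "comfortable bed and quiet room",
    "staff were helpful and friendly",
    "beach location was perfect",
]})
top_words, topics, word_probs = fit_lda(demo["Review"])
demo["topic"] = topics  # most likely topic for each review
for k, words in enumerate(top_words):
    print(f"Topic {k}: {', '.join(words)}")
```

With the real file, `pd.read_csv("positive.csv")` would replace the demo frame, and `demo.to_csv("reviews_topic(lda).csv", index=False)` would save the result.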

Non-Negative Matrix Factorization (NMF)
Step 1: Import the positive.csv dataset you have created in Part 1
Step 2: Apply Non-Negative Matrix Factorization (NMF) on the dataset
Step 3: Define number of topics as 5
Step 4: Create topics along with the probability distribution for each word in our vocabulary for each topic.
Step 5: Print the 10 words with highest probabilities for all the five topics
Step 6: Add a column to the original data frame that will store the topic for the reviews.
Step 7: Save the new dataset as: reviews_topic(nmf).csv
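A comparable sketch for NMF is shown below. It assumes TF-IDF features, a common but not mandatory pairing with NMF, plus an English stop-word filter; the function name `fit_nmf` and the demo reviews are hypothetical. NMF yields non-negative word weights rather than true probabilities, so the rows are rescaled to sum to 1 here only to give comparable relative weights.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer


def fit_nmf(texts, n_topics=5, n_top_words=10, seed=0):
    """Fit NMF; return top words per topic, a topic per document, word weights."""
    vectorizer = TfidfVectorizer(stop_words="english")  # assumption: TF-IDF input
    tfidf = vectorizer.fit_transform(texts)
    nmf = NMF(n_components=n_topics, random_state=seed, max_iter=500)
    doc_topics = nmf.fit_transform(tfidf)
    # Rescale each topic row to sum to 1; the floor guards against empty topics
    row_sums = np.maximum(nmf.components_.sum(axis=1, keepdims=True), 1e-12)
    word_weights = nmf.components_ / row_sums
    vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    top_words = [
        [vocab[i] for i in row.argsort()[::-1][:n_top_words]] for row in word_weights
    ]
    return top_words, doc_topics.argmax(axis=1), word_weights


# Made-up reviews standing in for positive.csv
demo = pd.DataFrame({"Review": [
    "clean room and comfortable bed",
    "friendly staff at the front desk",
    "great location near the beach",
    "delicious breakfast buffet every morning",
    "pool and spa were relaxing",
    "comfortable bed and quiet room",
    "staff were helpful and friendly",
    "beach location was perfect",
]})
top_words, topics, word_weights = fit_nmf(demo["Review"])
demo["topic"] = topics  # strongest topic for each review
for k, words in enumerate(top_words):
    print(f"Topic {k}: {', '.join(words)}")
```

With the real file, `pd.read_csv("positive.csv")` would replace the demo frame, and `demo.to_csv("reviews_topic(nmf).csv", index=False)` would save the result.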

Attachment:- Reviews Assignment.rar
