Reference no: EM132365228
Statistics Assignment - Predicting Wine Quality
Descriptions: This dataset is related to red vinho verder wine samples from the north of Portugal. The goal is to model wine quality (v12) based on physicochemical attributes (v1-v11). Each row of the data set corresponds to a wine sample.
Descriptions for the data follow: v1 fixed acidity v2 volatile acidity v3 critic acid v4 residual sugar v5 chlorides v6 free sulfur dioxide v7 total sulfur dioxide v8 density v9 pH v10 sulphates v11 alcohol v12 quality (score from 1-10) Please download the train.csv and test.csv from BBlearn.
A wine sample is identified as good wine if the quality v12>=6. The research goal is, therefore, to predict if a wine sample is a good wine. Please create a new dependent variable that takes the values of 1 if a sample is good wine and 0 otherwise.
(i) Please build a logistic regression model that includes all the attributes from the training set to predict the wine quality (model 1) in the testing set. Show the confusion table. What are precision, recall, and accuracy?
(ii) From model 1, what will happen to the odds of good wine if the concentration of sulphates is increased by 1 unit?
(iii) Please build a logistic regression model that v1, v2, v3, v4, and v5 from the training set to predict the wine quality (model 2) in the testing set. Show the confusion table. What are precision, recall, and accuracy?
(iv) Please compare model 1 and model 2 using the ROC curve on the testing set. Which one performs better?
(v) Suppose that a merchant will lose $10 if he misclassifies a bad wine as a good wine. He will also lose $1 if he misclassifies a good wine as a bad wine. Please compare the total cost of model 1 and model 2.
(vi) We can also use a linear regression model for classification. Please use v12 as the dependent variable and build a linear regression model on all the attributes from the training set. Please show the Mean Square Error (MSE): the average of the squared differences between the predicted and actual wine quality score.
(vii) We can also use a linear regression model for classification. Please use v12 as the dependent variable and build a linear regression model on all the attributes from the training set. To make decisions, a merchant will classify a sample as good wine if the predicted V12 is greater than or equal to 6. Show the confusion table. 2. It's very important to inspect the dataset before conducting analysis.
Please download the final.csv from BBlearn. Use v10 as the dependent variable and build a logistic regression model on all the attributes. You may find that the algorithm does not converge as the warning message indicates. Can you find out why?