Reference no: EM133944278
Question - Supervised machine learning classifiers
Data: The zip file "hw2.q2.data.zip" contains 3 CSV files:
"hw2.q2.train.csv" contains 8,000 rows and 11 columns. The first column ‘y' is the output variable with 4 classes: 0, 1, 2, 3. The remaining 10 columns contain input features: x1, ..., x10.
"hw2.q2.test.csv" contains 2,000 rows and 11 columns. The first column ‘y' is the output variable with 4 classes: 0, 1, 2, 3. The remaining 10 columns contain input features: x1, ..., x10.
"hw2.q1.new.csv" contains 30 rows and 10 columns. The first column ‘ID' is an identifier for 30 unlabeled samples. The remaining 10 columns contain input features: x1, ..., x10. Get expert assignment help online from PhD writers.
Task 1.
Use 4-fold cross-validation with the 8,000 labeled examples from "hw2.q2.train.csv" to identify a classifier that achieves a mean cross-validation accuracy of at least 0.96. You should try several Scikit-Learn classifiers, including: GaussianNB, DecisionTreeClassifier, RandomForestClassifier, ExtraTreesClassifier, KNeighborsClassifier, LogisticRegression, SVC, and MLPClassifier. Try different hyper-parameter values for the better performing classifiers to obtain a good set of hyper-parameter values. Then select the best performing model. Report the following:
Selected model with hyper-parameter values:
Mean cross-validation accuracy: ............................ (rounded to 4 decimal places)
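One possible way to organize Task 1 is sketched below: score each listed classifier with 4-fold cross-validation, then run a grid search on the stronger candidates. The function names, the starting hyper-parameter values, the shuffled stratified folds, and the feature scaling applied to the distance-based and gradient-based models are all assumptions made for illustration. The task does not prescribe them, and nothing here guarantees reaching the 0.96 target.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_cv(seed=0):
    """4-fold stratified splitter; shuffling is an assumption, not a requirement."""
    return StratifiedKFold(n_splits=4, shuffle=True, random_state=seed)


def candidate_models(seed=0):
    """The eight classifiers named in the task, with illustrative starting settings."""
    return {
        "GaussianNB": GaussianNB(),
        "DecisionTree": DecisionTreeClassifier(random_state=seed),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "ExtraTrees": ExtraTreesClassifier(n_estimators=200, random_state=seed),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "LogisticRegression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "SVC": make_pipeline(StandardScaler(), SVC()),
        "MLP": make_pipeline(
            StandardScaler(), MLPClassifier(max_iter=500, random_state=seed)
        ),
    }


def compare_models(X, y, models, cv):
    """Mean CV accuracy per model, rounded to 4 decimals, best first."""
    scores = {
        name: round(cross_val_score(m, X, y, cv=cv, scoring="accuracy").mean(), 4)
        for name, m in models.items()
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def tune(model, param_grid, X, y, cv):
    """Grid-search one model; returns (best parameters, best mean CV accuracy)."""
    search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_, round(search.best_score_, 4)
```

For the pipelines built with make_pipeline, grid keys take the lowercase step name as a prefix, for example {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]}. These grid values are placeholders to be adjusted after the first comparison.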
Task 2.
Train the classifier with the hyper-parameter values determined in Task 1 on all 8,000 training samples and use it to predict the output class 'y' for the 2,000 examples in "hw2.q2.test.csv". Report the following:
Accuracy on 2,000 test examples: ........................ (rounded to 4 decimal places)
Classification report for the 2,000 test examples:
Confusion matrix for the 2,000 test examples:
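The three items requested in Task 2 can be produced in one pass with scikit-learn's metrics module. The sketch below assumes the selected model is passed in unfitted with its chosen hyper-parameters; the function name evaluate_on_test is hypothetical.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate_on_test(model, X_train, y_train, X_test, y_test):
    """Fit on the full training set, then score predictions on the test set."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        # Accuracy rounded to 4 decimal places, as the task requests.
        "accuracy": round(accuracy_score(y_test, y_pred), 4),
        # Per-class precision, recall, and F1 as a printable string.
        "report": classification_report(y_test, y_pred, digits=4),
        # Rows are true classes and columns are predicted classes, ordered 0..3.
        "confusion": confusion_matrix(y_test, y_pred, labels=[0, 1, 2, 3]),
    }
```

Fixing labels=[0, 1, 2, 3] keeps the confusion matrix at 4 x 4 even if the model never predicts one of the classes.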
Task 3.
Use the model trained in Task 2 to predict the output class 'y' for the 30 examples in "hw2.q2.new.csv".
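For Task 3, a minimal sketch is to pass only the feature columns of the new file to the fitted model and pair each prediction with its ID. The function name predict_new and the output column name y_pred are assumptions; the model is expected to be already fitted on the same feature columns.

```python
import pandas as pd


def predict_new(model, new_df, feature_cols, id_col="ID"):
    """Predict classes for unlabeled rows and return them next to their IDs."""
    preds = model.predict(new_df[feature_cols])
    return pd.DataFrame({id_col: new_df[id_col].to_numpy(), "y_pred": preds})


# Usage (assumed file location: current working directory):
# new = pd.read_csv("hw2.q2.new.csv")
# table = predict_new(fitted_model, new, [f"x{i}" for i in range(1, 11)])
```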