Reading and processing the dataset
Replace the following path with the directory where you placed the KDD dataset.
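A minimal loading sketch, assuming `kddcup.names` starts with one line of labels followed by one `name: type.` line per feature, and that the data file is comma-separated with the label in the last column. `DATA_DIR`, `parse_names`, `load_kdd` and the data file name are hypothetical; the demo uses a tiny in-memory sample so it runs without the real files.

```python
import io

import pandas as pd

# Hypothetical placeholder: replace with the directory holding the KDD files.
DATA_DIR = "/path/to/kdd"


def parse_names(text):
    """Parse kddcup.names-style text into (feature_names, feature_types).

    Assumes line 1 lists the labels and every later non-empty line
    looks like "name: type."
    """
    names, types = [], []
    for line in text.strip().splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        name, ftype = line.split(":", 1)
        names.append(name.strip())
        types.append(ftype.strip().rstrip("."))
    return names, types


def load_kdd(source, feature_names):
    """Read comma-separated records; the extra last column holds the label."""
    return pd.read_csv(source, header=None, names=feature_names + ["label"])


# Tiny in-memory sample in the assumed format.
names_text = (
    "back,normal,smurf.\n"
    "duration: continuous.\n"
    "protocol_type: symbolic.\n"
    "root_shell: continuous.\n"
)
names, types = parse_names(names_text)
df = load_kdd(io.StringIO("0,tcp,0,normal.\n0,icmp,0,smurf.\n"), names)
print(df.shape)  # (2, 4)

# With the real files (file names are assumptions):
# with open(f"{DATA_DIR}/kddcup.names") as f:
#     names, types = parse_names(f.read())
# df = load_kdd(f"{DATA_DIR}/kddcup.data_10_percent.gz", names)
```

`pd.read_csv` infers gzip compression from a `.gz` extension, so the compressed file can be read directly.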
Differentiating between nominal, binary, and numeric features
root_shell is marked as a continuous feature in the kddcup.names file, but according to the dataset documentation it is a binary feature.
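One way to encode that split is to keep explicit name sets and override the file's type for root_shell. This is a sketch: the `NOMINAL` and `BINARY` sets and `split_features` are hypothetical choices for illustration, not values read from `kddcup.names`.

```python
# Hypothetical category sets for this sketch (assumptions, not parsed from kddcup.names).
NOMINAL = {"protocol_type", "service", "flag"}
# root_shell is listed as binary here even though kddcup.names marks it continuous.
BINARY = {"land", "logged_in", "root_shell", "is_host_login", "is_guest_login"}


def split_features(feature_names):
    """Partition feature names into nominal, binary and numeric lists, keeping order."""
    nominal = [f for f in feature_names if f in NOMINAL]
    binary = [f for f in feature_names if f in BINARY]
    numeric = [f for f in feature_names if f not in NOMINAL and f not in BINARY]
    return nominal, binary, numeric


features = ["duration", "protocol_type", "service", "flag",
            "src_bytes", "land", "root_shell", "count"]
nominal, binary, numeric = split_features(features)
print(numeric)  # ['duration', 'src_bytes', 'count']
```

Only the `numeric` list is passed to the scaler; nominal columns need an encoding such as one-hot instead.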
Let's proceed with StandardScaler, applied to all the numeric columns.
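A small sketch of scaling only the numeric columns with scikit-learn's `StandardScaler`, on a made-up DataFrame standing in for the KDD data. The column values are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows standing in for the KDD DataFrame.
df = pd.DataFrame({
    "duration": [0.0, 2.0, 4.0, 6.0],
    "src_bytes": [100.0, 200.0, 300.0, 400.0],
    "protocol_type": ["tcp", "udp", "tcp", "icmp"],
})
numeric_cols = ["duration", "src_bytes"]

scaler = StandardScaler()
# fit_transform learns each column's mean and (population) std,
# then returns (x - mean) / std; the nominal column is left untouched.
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
print(df)
```

In the lab, fit the scaler on the training split only and reuse `scaler.transform` on the test split, so test statistics do not leak into training.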
5-class supervised classification
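The five classes are normal traffic plus the four KDD Cup 99 attack categories (DoS, Probe, R2L, U2R). A sketch of the label mapping, assuming raw labels carry a trailing period (e.g. `smurf.`) and using the usual grouping of the 22 training-set attack types; `ATTACK_CATEGORY` and `to_five_class` are hypothetical names, and the test set contains further attack names that would need entries.

```python
# Raw label -> one of five classes (normal, dos, probe, r2l, u2r).
ATTACK_CATEGORY = {
    "normal": "normal",
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
}


def to_five_class(raw_label):
    """Strip the trailing '.' from a raw label and return its five-class category."""
    return ATTACK_CATEGORY[raw_label.rstrip(".")]


print(to_five_class("smurf."))  # dos
```

Applied to a DataFrame this would be `df["label"].map(to_five_class)`; an unknown label raises `KeyError`, which surfaces missing mappings early.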
K-Nearest Neighbors classifier
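A k-NN sketch with scikit-learn's `KNeighborsClassifier`. Synthetic, well-separated 5-class blobs stand in for the scaled KDD features (an assumption so the example runs on its own); `n_neighbors=5` is an illustrative choice, not a tuned value.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled features: 5 classes, 100 samples each.
centers = [[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]]
X, y = make_blobs(n_samples=500, centers=centers, cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Each test point gets the majority label of its 5 nearest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_accuracy = knn.score(X_test, y_test)
print(f"k-NN accuracy: {knn_accuracy:.3f}")
```

Because k-NN is distance-based, it relies on the numeric columns having been standardized first.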
Support Vector Machine classifier
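The same kind of synthetic split with scikit-learn's `SVC`, which handles the 5-class case internally with a one-vs-one scheme. The RBF kernel and `C=1.0` are illustrative assumptions, not the lab's settings.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the scaled features: 5 classes, 100 samples each.
centers = [[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]]
X, y = make_blobs(n_samples=500, centers=centers, cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# RBF-kernel SVM; gamma="scale" derives the kernel width from the feature variance.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
svm_accuracy = svm.score(X_test, y_test)
print(f"SVM accuracy: {svm_accuracy:.3f}")
```

`SVC` training time grows faster than linearly with the number of samples, so on the full KDD training data a subsample or a linear model may be more practical.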
Attempting unsupervised learning
First, let's visualize the dataset (numeric columns only).
Use PCA to reduce the dimensionality so we can visualize the dataset on a 2D plot.
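A PCA sketch with scikit-learn, projecting synthetic 10-dimensional data (a stand-in for the scaled numeric columns) onto its first two principal components. `plot_projection` is a hypothetical helper; it imports matplotlib lazily so the projection step runs even where matplotlib is absent.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled numeric columns: 10 features, 5 groups.
X, y = make_blobs(n_samples=300, n_features=10, centers=5, random_state=0)

# Keep the two directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)  # (300, 2)
print(f"variance kept: {pca.explained_variance_ratio_.sum():.2f}")


def plot_projection(points, labels, title="PCA projection"):
    """Scatter the 2-D points, colored by label (hypothetical helper)."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(points[:, 0], points[:, 1], c=labels, s=8, cmap="tab10")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title(title)
    return fig
```

Calling `plot_projection(X_2d, y)` followed by `matplotlib.pyplot.show()` draws the plot; the printed variance ratio indicates how much of the spread the 2D view actually preserves.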
Apply k-means (k=5, numeric columns only), then project with PCA and plot
Fit a k-means clustering estimator to the training data
Retrieve the labels assigned to each training sample
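A k-means sketch with scikit-learn: fit `KMeans(n_clusters=5)`, read the per-sample assignments from `labels_`, and project samples and centroids with one shared PCA for plotting. The synthetic data (five groups in 10 dimensions) stands in for the scaled numeric columns; comparing clusters to true classes with the adjusted Rand index is an added check, since k-means cluster ids are arbitrary and do not line up with the five class labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the scaled numeric columns: 5 groups in 10 dimensions.
X_train, y_true = make_blobs(
    n_samples=500, centers=10 * np.eye(5, 10), cluster_std=1.0, random_state=0
)

# Fit a k-means estimator with k=5 (10 restarts, keep the best inertia).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
kmeans.fit(X_train)

# labels_ holds the cluster index assigned to each training sample.
labels = kmeans.labels_

# Cluster ids are arbitrary, so compare with a permutation-invariant score.
ari = adjusted_rand_score(y_true, labels)
print(f"adjusted Rand index: {ari:.3f}")

# Project samples and centroids with the same PCA so they share one 2D plot.
pca = PCA(n_components=2).fit(X_train)
points_2d = pca.transform(X_train)
centroids_2d = pca.transform(kmeans.cluster_centers_)
```

On real KDD traffic the clusters are unlikely to be this clean; a low adjusted Rand index there would mean the clusters do not align with the five traffic classes.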