Normalize the training and testing data

Reference no: EM132108691

Purpose

In this assessment, you need to demonstrate your skills in applying regularized logistic regression to perform two-class and multi-class classification for real-world tasks. You also need to demonstrate your skill in recognizing underfitting and overfitting situations.

Instructions

This is a group assessment task. Students will be required to analyse a given real-world scenario and contribute to the classifier design.

The group response to the problem should not exceed 30 pages. Students will be required to consolidate their individual solutions and propose the best solution, one that evidences each group member's contribution along with a rationale for the group's response to solving the problem.

Task A - Binary Classification

For this problem, we will use a subset of here. Note that this dataset has some information missing.

1.1 Data Munging

Cleaning the data is essential when dealing with real-world problems. The training and testing data are stored in the "data/wisconsin_data" folder. You have to perform the following:

- Read the training and testing data. Print the number of features in the dataset.

- For the data label, print the total number of 1's and 0's in the training and testing data. Comment on the class distribution. Is it balanced or unbalanced?

- Print the number of features with missing entries.

- Fill the missing entries. For filling any feature, you can use either mean or median value of the feature values from observed entries.

- Normalize the training and testing data.
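The data munging steps above can be sketched as follows. This is a minimal illustration, not the expected solution: the small arrays are made up and stand in for the real files in "data/wisconsin_data" (whose format is not given here), and the helper names are hypothetical. It assumes missing entries appear as NaN, fills them with the training median, and normalizes with the training mean and standard deviation so that no information from the test data leaks into preprocessing.

```python
import numpy as np

def count_features_with_missing(X):
    """Return the number of columns that contain at least one NaN."""
    return int(np.isnan(X).any(axis=0).sum())

def fit_imputer_and_scaler(X_train):
    """Learn per-feature median, mean and std from the training data only."""
    medians = np.nanmedian(X_train, axis=0)
    filled = np.where(np.isnan(X_train), medians, X_train)
    means = filled.mean(axis=0)
    stds = filled.std(axis=0)
    stds[stds == 0] = 1.0  # constant features: avoid division by zero
    return medians, means, stds

def impute_and_normalize(X, medians, means, stds):
    """Fill NaNs with the training medians, then standardize."""
    filled = np.where(np.isnan(X), medians, X)
    return (filled - means) / stds

# Made-up data standing in for the real training and testing files.
X_train = np.array([[1.0, 10.0, 5.0],
                    [2.0, np.nan, 5.0],
                    [3.0, 30.0, 5.0],
                    [np.nan, 20.0, 5.0]])
y_train = np.array([0, 1, 1, 0])
X_test = np.array([[2.0, np.nan, 5.0]])

print("number of features:", X_train.shape[1])
print("label counts (0s, 1s):", np.bincount(y_train, minlength=2))
print("features with missing entries:", count_features_with_missing(X_train))

medians, means, stds = fit_imputer_and_scaler(X_train)
X_train_n = impute_and_normalize(X_train, medians, means, stds)
X_test_n = impute_and_normalize(X_test, medians, means, stds)
```

Using the mean instead of the median only requires swapping `np.nanmedian` for `np.nanmean`.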

1.2 Logistic Regression

Train logistic regression models with L1 regularization and L2 regularization using alpha = 0.1 and lambda = 0.1. Report accuracy, precision, recall, f1-score and print the confusion matrix.
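One possible way to train and evaluate the two models is sketched below with scikit-learn. Two things here are assumptions rather than part of the task statement: the data is synthetic (from `make_classification`), and alpha and lambda are both treated as a regularization strength that maps to scikit-learn's inverse parameter `C = 1 / strength`. The unit's own materials may define alpha and lambda differently.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split

def train_and_report(penalty, strength, X_tr, y_tr, X_te, y_te):
    """Fit a regularized logistic regression and collect test metrics."""
    # ASSUMPTION: strength (alpha or lambda) corresponds to C = 1 / strength.
    model = LogisticRegression(penalty=penalty, C=1.0 / strength,
                               solver="liblinear")
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_te, pred, average="binary", zero_division=0)
    return {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": confusion_matrix(y_te, pred, labels=[0, 1]),
    }

# Synthetic stand-in for the cleaned, normalized data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

results = {
    "l1": train_and_report("l1", 0.1, X_tr, y_tr, X_te, y_te),  # alpha = 0.1
    "l2": train_and_report("l2", 0.1, X_tr, y_tr, X_te, y_te),  # lambda = 0.1
}
for name, report in results.items():
    print(name, {k: v for k, v in report.items() if k != "confusion"})
    print(report["confusion"])
```

The `liblinear` solver is chosen because it supports both the L1 and the L2 penalty.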

1.3 Choosing the best hyper-parameter

For L1 model, choose the best alpha value from the following set:

{0.1,1,3,10,33,100,333,1000, 3333, 10000, 33333}.

For L2 model, choose the best lambda value from the following set:

{0.001, 0.003, 0.01, 0.03, 0.1,0.3,1,3,10,33}.

To choose the best hyperparameter (alpha/lambda) value, you have to do the following:

- For each value of hyperparameter, perform 100 random splits of training data into training and validation data.

- Find the average validation accuracy over the 100 train/validation pairs for each hyperparameter value. The best hyperparameter is the one that gives the maximum average validation accuracy.

Use the best alpha and lambda values to re-train your final L1 and L2 regularized models. Evaluate the prediction performance on the test data and report the following:

- Precision

- Accuracy

- The top 5 features selected in decreasing order of feature weights.

- Confusion matrix

Finally, discuss if there is any sign of underfitting or overfitting with appropriate reasoning.
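The selection procedure in this section could look roughly like the sketch below. Several choices are assumptions: the data is synthetic, the validation fraction (20%) is not specified in the task, the strength is again mapped to `C = 1 / strength`, and "top 5 features" is read as the five largest weights by absolute value. The helper names are hypothetical. The demo uses 5 splits to keep it fast; pass `n_splits=100` to follow the task as written.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit

L1_ALPHAS = [0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333]

def make_model(penalty, strength):
    # ASSUMPTION: strength (alpha or lambda) corresponds to C = 1 / strength.
    return LogisticRegression(penalty=penalty, C=1.0 / strength,
                              solver="liblinear")

def mean_validation_accuracy(X, y, penalty, strength, n_splits=100, seed=0):
    """Average validation accuracy over random train/validation splits."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2,
                            random_state=seed)
    scores = []
    for tr_idx, va_idx in splitter.split(X):
        model = make_model(penalty, strength).fit(X[tr_idx], y[tr_idx])
        scores.append(model.score(X[va_idx], y[va_idx]))
    return float(np.mean(scores))

def select_strength(X, y, penalty, candidates, n_splits=100):
    """Return the candidate with the highest mean validation accuracy."""
    means = [mean_validation_accuracy(X, y, penalty, s, n_splits)
             for s in candidates]
    return candidates[int(np.argmax(means))], means

def top_features(model, k=5):
    """Indices and weights of the k largest coefficients by magnitude."""
    weights = model.coef_.ravel()
    order = np.argsort(-np.abs(weights))[:k]
    return [(int(i), float(weights[i])) for i in order]

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)
best_alpha, means = select_strength(X, y, "l1", L1_ALPHAS, n_splits=5)
final_l1 = make_model("l1", best_alpha).fit(X, y)
print("best alpha:", best_alpha)
print("top 5 features (index, weight):", top_features(final_l1))
```

The same `select_strength` call with `penalty="l2"` and the lambda set covers the L2 model. Comparing training accuracy with the averaged validation accuracy for the chosen value gives a starting point for the underfitting/overfitting discussion.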

Task B - Multiclass Classification

For this experiment, we will use a small subset of the MNIST dataset of handwritten digits. This dataset has no missing data. You will have to implement a one-versus-rest scheme to perform multi-class classification using a binary classifier based on L1 regularized logistic regression.

2.1 Read and understand the data, create a default One-vs-Rest Classifier

1- Use the data from the file reduced_mnist.csv in the data directory. Begin by reading the data. Print the following information:

- Number of data points

- Total number of features

- Unique labels in the data

2- Split the data into 70% training data and 30% test data. Fit a One-vs-Rest Classifier (which uses a logistic regression classifier with alpha=1) on the training data, and report accuracy, precision and recall on the testing data.
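As a rough illustration of section 2.1, the sketch below uses scikit-learn's bundled `load_digits` data as a stand-in, since the layout of reduced_mnist.csv is not shown here. It assumes alpha maps to `C = 1 / alpha` and reports macro-averaged precision and recall, which is one of several reasonable averaging choices for a multi-class problem.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Stand-in data: 8x8 digit images, NOT the reduced_mnist.csv file itself.
X, y = load_digits(return_X_y=True)
print("number of data points:", X.shape[0])
print("total number of features:", X.shape[1])
print("unique labels:", np.unique(y))

# 70% training data, 30% test data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

alpha = 1.0
# ASSUMPTION: alpha corresponds to scikit-learn's C = 1 / alpha.
base = LogisticRegression(penalty="l1", C=1.0 / alpha, solver="liblinear")
ovr = OneVsRestClassifier(base).fit(X_tr, y_tr)

pred = ovr.predict(X_te)
accuracy = accuracy_score(y_te, pred)
precision = precision_score(y_te, pred, average="macro", zero_division=0)
recall = recall_score(y_te, pred, average="macro", zero_division=0)
print("accuracy:", accuracy)
print("macro precision:", precision)
print("macro recall:", recall)
```

`OneVsRestClassifier` fits one binary L1 regularized model per digit and predicts the class whose model gives the highest score.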

2.2 Choosing the best hyper-parameter

1- As in section 1.3 above, now create 10 random splits of training data into training and validation data. Choose the best value of alpha from the following set: {0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}. To choose the best alpha hyperparameter value, you have to do the following:

- For each value of hyperparameter, perform 10 random splits of training data into training and validation data, as described above.

- For each value of hyperparameter, use its 10 random splits and find the average training and validation accuracy.

- On a graph, plot both the average training accuracy (in red) and average validation accuracy (in blue) w.r.t. each hyperparameter setting. Comment on this graph by identifying regions of overfitting and underfitting.

- Print the best value of alpha hyperparameter.

2- Evaluate the prediction performance on test data and report the following:

- Total number of non-zero features in the final model.

- The confusion matrix

- Precision, recall and accuracy for each class.

Finally, discuss if there is any sign of underfitting or overfitting with appropriate reasoning.
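The sweep over alpha and the final evaluation in section 2.2 might be organized as in the sketch below. It is illustrative only: `load_digits` (truncated to 600 rows) replaces reduced_mnist.csv, alpha maps to `C = 1 / alpha` by assumption, the validation fraction is an assumed 20%, and the demo uses a shortened alpha list with 3 splits to stay fast. A feature is counted as non-zero if any of the per-class models gives it a non-zero weight, which is one reading of the task. The helper names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import ShuffleSplit, train_test_split
from sklearn.multiclass import OneVsRestClassifier

def make_ovr(alpha):
    # ASSUMPTION: alpha corresponds to scikit-learn's C = 1 / alpha.
    return OneVsRestClassifier(
        LogisticRegression(penalty="l1", C=1.0 / alpha, solver="liblinear"))

def sweep_alpha(X, y, alphas, n_splits=10, seed=0):
    """Average training and validation accuracy for each alpha."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2,
                            random_state=seed)
    train_acc, val_acc = [], []
    for alpha in alphas:
        tr_scores, va_scores = [], []
        for tr_idx, va_idx in splitter.split(X):
            model = make_ovr(alpha).fit(X[tr_idx], y[tr_idx])
            tr_scores.append(model.score(X[tr_idx], y[tr_idx]))
            va_scores.append(model.score(X[va_idx], y[va_idx]))
        train_acc.append(np.mean(tr_scores))
        val_acc.append(np.mean(va_scores))
    return np.array(train_acc), np.array(val_acc)

def count_nonzero_features(ovr_model):
    """Features with a non-zero weight in at least one per-class model."""
    coefs = np.vstack([est.coef_ for est in ovr_model.estimators_])
    return int(np.any(coefs != 0, axis=0).sum())

X, y = load_digits(return_X_y=True)
X, y = X[:600], y[:600]  # truncated stand-in data to keep the demo quick
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

alphas = [0.1, 1, 10, 100, 1000]  # shortened list for the demo
train_acc, val_acc = sweep_alpha(X_tr, y_tr, alphas, n_splits=3)
best_alpha = alphas[int(np.argmax(val_acc))]
print("best alpha:", best_alpha)

final = make_ovr(best_alpha).fit(X_tr, y_tr)
pred = final.predict(X_te)
labels = np.arange(10)
cm = confusion_matrix(y_te, pred, labels=labels)
precision, recall, _, _ = precision_recall_fscore_support(
    y_te, pred, labels=labels, zero_division=0)
n_nonzero = count_nonzero_features(final)
print("non-zero features:", n_nonzero)
print(cm)
```

The `train_acc` and `val_acc` arrays can be passed to any plotting library to draw the red and blue curves against alpha. A large gap between the two curves at small alpha would suggest overfitting, while both curves being low at large alpha would suggest underfitting.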

Attachment:- Machine learning.zip


Reviews

len2108691

9/9/2018 11:18:48 PM

Criteria 2:
- Create 10 random splits of training data into training and validation data. For L1 model, choose the best alpha value from the provided set of values.
- Evaluate the prediction performance on test data and report requested results.
- Discuss if there is any sign of underfitting or overfitting with appropriate reasoning.

Marks:
- 7 marks: Successfully completed all three tasks.
- 5 marks: Successfully completed any two of the three tasks.
- 4 marks: Successfully completed any one of the three tasks.
- 0 mark: Failed to complete any given task.

len2108691

9/9/2018 11:18:42 PM

PART 2

Criteria 1:
- Read and report requested properties of the provided data set.
- Split the data into 70% training data and 30% test data. Fit a One-vs-Rest Classifier.

Marks:
- Excellent (3 marks): Successfully completed all three tasks.
- Good (2 marks): Successfully completed any two of the three tasks.
- Fair (1 mark): Successfully completed any one of the three tasks.
- Unsatisfactory (0 mark): Failed to complete any given task.

len2108691

9/9/2018 11:18:33 PM

Criteria 3:
- For L1 model, choose the best alpha value from the provided set of values.
- For L2 model, choose the best lambda value from the provided set of values.
- Evaluate the prediction performance on test data, report results and discuss if there is any sign of underfitting or overfitting with appropriate reasoning.

Marks:
- 5 marks: Successfully completed all three tasks.
- 3 marks: Successfully completed any two of the three tasks.
- 2 marks: Successfully completed only one of the three tasks.
- 0 mark: Failed to complete any given task.

len2108691

9/9/2018 11:18:23 PM

Criteria 2:
- Train logistic regression model with L1 regularization using alpha = 0.1.
- Train logistic regression model with L2 regularization using lambda = 0.1.
- Report accuracy, precision, recall, f1-score and print the confusion matrix.

Marks:
- 5 marks: Successfully completed all three tasks.
- 3 marks: Successfully completed any two of the three tasks.
- 2 marks: Successfully completed any one of the three tasks.
- 0 mark: Failed to complete any given task.

len2108691

9/9/2018 11:17:58 PM

PART 1

Criteria 1:
- Read the training and testing data. Print the number of features in the dataset.
- For the data label, print the total number of 1's and 0's in the training and testing data. Comment on the class distribution. Is it balanced or unbalanced?
- Print the number of features with missing entries.
- Fill the missing entries. For filling any feature, you can use either mean or median value of the feature values from observed entries.
- Normalize the training and testing data.

Marks:
- Excellent (3 marks): Successfully completed all four tasks.
- Good (2 marks): Successfully completed at least 2 tasks and satisfactorily tried other tasks.
- Fair (1 mark): Successfully completed only one task.
- Unsatisfactory (0 mark): Failed to complete any task satisfactorily.

len2108691

9/9/2018 11:17:41 PM

This document supplies detailed information on assessment tasks for this unit.

Key information
- Due: 5th by 11.30pm AEST
- Weighting: 25%
- Word count: Max 30 pages

Learning Outcomes

This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

- ULO 2: Work collaboratively and apply linear and logistic regression, and linear Support Vector Machines for designing accurate classifier.
  Related GLOs: GLO 1: Discipline knowledge and capabilities; GLO 4: Critical thinking; GLO 5: Problem solving
- ULO 5: Implement model selection and compute relevant evaluation measure for a given problem.
  Related GLOs: GLO 1: Discipline knowledge and capabilities; GLO 4: Critical thinking
