What is the objective of the data collection process

Reference no: EM132121974

Machine Learning Project Assessment -

Learning Outcomes -

  • Perform linear regression, classification using logistic regression and linear Support Vector Machines.
  • Perform non-linear classification using Support Vector Machines with kernels, Decision trees and Random forests.
  • Understand the concept of maximum likelihood and Bayesian estimation.
  • Construct a multi-layer neural network using the backpropagation training algorithm.
  • Perform model selection and compute relevant evaluation measures for a given problem.

Purpose - This assessment is an extensive machine learning project. Students will be given a specific data set for analysis and will be required to develop and compare various classification techniques. Each student must demonstrate skills acquired in data representation, classification and evaluation.

Instructions

  • The dataset consists of training and testing data in the "train" and "test" folders. Use X_train.txt (features) and y_train.txt (labels) for training, and X_test.txt (features) and y_test.txt (labels) for testing. Other files that come with the dataset may help in understanding it better.
  • Please read the PDF file "dataset-paper.pdf" to answer Task A.
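As a starting point, the files named above could be loaded along the following lines. This is only a sketch: it assumes the files are whitespace-separated numeric text with one instance per row and integer class labels, which the brief does not state, and `load_split` is a hypothetical helper name.

```python
from pathlib import Path

import numpy as np


def load_split(root, split):
    """Load features and labels for one split ("train" or "test").

    Assumes (not confirmed by the brief) whitespace-separated numeric
    text files with one instance per row and integer class labels.
    """
    folder = Path(root) / split
    # ndmin keeps the array shapes stable even for one-row or one-column files.
    X = np.loadtxt(folder / f"X_{split}.txt", ndmin=2)
    y = np.loadtxt(folder / f"y_{split}.txt", dtype=int, ndmin=1)
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"{split}: {X.shape[0]} instances but {y.shape[0]} labels")
    return X, y
```

With the dataset unpacked under some folder (the path is a placeholder), `X_train, y_train = load_split("path/to/dataset", "train")` would return the training matrix and its labels, and the same call with "test" the test split.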

Task A: Understanding the data

Answer the following questions briefly after reading the paper:

  • What is the objective of the data collection process?
  • What human activity types does this dataset have? How many subjects/people have performed these activities?
  • How many instances are available in the training and test sets? How many features are used to represent each instance? Summarize the type of features extracted in 2-3 sentences.
  • Briefly describe the machine learning model used in this paper for activity recognition and how it is trained. What is the maximum accuracy achieved?

Task B: K-Nearest Neighbor Classification

Build a K-Nearest Neighbor classifier for this data.

  • Let K take values from 1 to 50. For choosing the best K, use 10-fold cross-validation. Choose the best value of K based on model F1-score.
  • Show a plot of cross-validation accuracy with respect to K.
  • Using the best K value, evaluate the model performance on the supplied test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.
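One possible way to run the search over K with scikit-learn is sketched below. The synthetic data from `make_classification` is a stand-in for the real training files, the stratified and shuffled fold layout is an assumption (the brief only says "10-fold"), and "f1_macro" is one reading of the multi-class averaged F1-score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train / y_train (assumption: swap in the real data).
X_train, y_train = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)

# 10 folds; stratification and shuffling are choices made for this sketch.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

k_values = list(range(1, 51))
mean_f1, mean_acc = [], []
for k in k_values:
    # One cross-validation run per K yields both metrics at once.
    scores = cross_validate(
        KNeighborsClassifier(n_neighbors=k),
        X_train,
        y_train,
        cv=cv,
        scoring=("f1_macro", "accuracy"),
    )
    mean_f1.append(scores["test_f1_macro"].mean())
    mean_acc.append(scores["test_accuracy"].mean())

# K is selected on macro F1; mean_acc holds the values for the accuracy plot.
best_k = k_values[int(np.argmax(mean_f1))]
print(f"best K by macro F1: {best_k}")
```

Plotting `mean_acc` against `k_values` gives the cross-validation accuracy curve, while `best_k` is what would be refit on the full training set before scoring the test set.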

Task C: Multiclass Logistic Regression with Elastic Net

Build an elastic-net regularized logistic regression classifier for this data.

  • The elastic-net regularizer takes two parameters: alpha and l1-ratio. Use the following values for alpha: 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2. Use the following values for l1-ratio: 0, 0.15, 0.5, 0.7, 1.
  • Choose the best values of alpha and l1-ratio using 10-fold cross-validation, based on model F1-score.
  • Draw a surface plot of F1-score with respect to alpha and l1-ratio values.
  • Use the best value of alpha and l1-ratio to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.

Task D: Support Vector Machine (RBF Kernel)

Build an SVM classifier (with RBF kernel) for this data.

  • An SVM with an RBF kernel takes two parameters: gamma (the inverse length-scale parameter of the RBF kernel) and C (the cost parameter). Use the following values for gamma: 1e-3, 1e-4. Use the following values for C: 1, 10, 100, 1000.
  • Choose the best values of gamma and C using 10-fold cross-validation, based on model F1-score.
  • Draw a surface plot of F1-score with respect to gamma and C.
  • Use the best value of gamma and C to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.
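The reporting step (confusion matrix, averaged F1-score and accuracy on the test set) might look like the sketch below, shown here for the SVM grid. It assumes scikit-learn's `SVC`, macro averaging for the F1-score, and a synthetic train/test split standing in for the supplied files.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the supplied train/test files (assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Parameter values taken from the task description.
param_grid = {"gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1_macro", cv=10)
# refit=True (the default) retrains the best setting on all of X_train.
search.fit(X_train, y_train)

# Evaluate once on the held-out test set.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
macro_f1 = f1_score(y_test, y_pred, average="macro")
accuracy = accuracy_score(y_test, y_pred)

print(search.best_params_)
print(cm)
print(f"macro F1 = {macro_f1:.3f}, accuracy = {accuracy:.3f}")
```

The same three metric calls apply unchanged to the other classifiers, since they only need the true and predicted test labels.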

Task E: Random Forest

Build a random forest classifier for this data.

  • A random forest uses two parameters: the tree-depth of each decision tree and the number of trees. Use the following values for the tree-depth: 300, 500, 600. Use the following values for the number of trees: 200, 500, 700.
  • Choose the best values of tree-depth and number of trees using 10-fold cross-validation, based on model F1-score.
  • Draw a surface plot of F1-score with respect to tree-depth and number of trees.
  • Use the best value of tree-depth and number of trees to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.
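For the random forest, the grid can also be filled with an explicit loop, as in the sketch below. It assumes scikit-learn's `RandomForestClassifier`, with tree-depth mapped to `max_depth` and the number of trees to `n_estimators`; `rf_f1_grid` is a hypothetical helper. The demonstration call uses deliberately small values and 3 folds on synthetic data so that it finishes quickly, and the assignment values are kept in the two constants.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Values from the task description (slow to run; use with n_splits=10).
TREE_DEPTHS = (300, 500, 600)
N_TREES = (200, 500, 700)


def rf_f1_grid(X, y, depths, n_trees, n_splits=10):
    """Return mean cross-validated macro F1 as grid[i, j] for
    (depths[i], n_trees[j])."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    grid = np.zeros((len(depths), len(n_trees)))
    for i, depth in enumerate(depths):
        for j, trees in enumerate(n_trees):
            model = RandomForestClassifier(
                max_depth=depth, n_estimators=trees, random_state=0
            )
            scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
            grid[i, j] = scores.mean()
    return grid


# Quick demonstration on synthetic data with small, illustrative values.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)
demo_depths, demo_trees = (2, 4, 8), (10, 20, 30)
demo_grid = rf_f1_grid(X_demo, y_demo, demo_depths, demo_trees, n_splits=3)

# Locate the best cell of the grid.
i_best, j_best = np.unravel_index(demo_grid.argmax(), demo_grid.shape)
print(f"best depth={demo_depths[i_best]}, trees={demo_trees[j_best]}")
```

For the actual task the call would be `rf_f1_grid(X_train, y_train, TREE_DEPTHS, N_TREES)`. Fully grown trees are often much shallower than 300 levels, so the three depth values may give nearly identical scores; that is worth checking on the real data rather than assuming.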

Task F: Discussion

Write a brief discussion of which classification method achieved the best performance, and give your thoughts on the reason behind this. Which method performed the worst? Could you do better or worse than the results in the dataset paper? Do you have any suggestions to further improve model performance?


SIT720 Machine Learning Project Assessment - Deakin University Australia.

Reviews

len2121974

9/25/2018 2:47:37 AM

This assessment is an extensive machine learning project. Students will be given a specific data set for analysis and will be required to develop and compare various classification techniques. Each student must demonstrate skills acquired in data representation, classification and evaluation. Submission details - Deakin University has a strict standard on plagiarism as a part of Academic Integrity. Late submission penalty is 5% per each 24 hours from . No marking on any submission after 5 days (24 hours X 5 days from ) Be sure to downsize the photos in your report before your submission in order to have your file uploaded in time. Referencing - You must correctly use the Harvard method in this assessment. See the Deakin referencing guide.

len2121974

9/25/2018 2:47:31 AM

Criteria 1: Understand the data by reading the provided research article and answer four questions asked in the Part 1 of the assignment. Criteria 2: Build a K-Nearest Neighbor classifier for this data: Choose the best K value from given set of values and F1-score. Show a plot of cross-validation accuracy with respect to K. Using the best K value, evaluate the model performance using the supplied test set. Report the results as requested in the assignment.

len2121974

9/25/2018 2:47:25 AM

Criteria 3: For the L1 model, choose the best alpha value from the provided set of values. For the L2 model, choose the best lambda value from the provided set of values. Evaluate the prediction performance on test data, report results and discuss if there is any sign of underfitting or overfitting with appropriate reasoning. Criteria 4: Build an SVM classifier (with RBF kernel) for this data. An SVM with an RBF kernel takes two parameters: gamma (the inverse length-scale parameter of the RBF kernel) and C (the cost parameter). Use the following values for gamma: 1e-3, 1e-4. Use the following values for C: 1, 10, 100, 1000. Choose the best values of gamma and C using 10-fold cross-validation, based on model F1-score. Draw a surface plot of F1-score with respect to gamma and C. Use the best value of gamma and C to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.

len2121974

9/25/2018 2:47:19 AM

Criteria 5: Build a random forest classifier for this data. (6 Marks) A random forest uses two parameters: the tree-depth of each decision tree and the number of trees. Use the following values for the tree-depth: 300, 500, 600. Use the following values for the number of trees: 200, 500, 700. Choose the best values of tree-depth and number of trees using 10-fold cross-validation, based on model F1-score. Draw a surface plot of F1-score with respect to tree-depth and number of trees. Use the best value of tree-depth and number of trees to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy. Criteria 6: Write a brief discussion of which classification method achieved the best performance, and give your thoughts on the reason behind this. Which method performed the worst? Could you do better or worse than the results in the dataset paper? Do you have any suggestions to further improve model performance? Successfully completed all the parts.
