Estimate and report the optimal value for lambda

Reference no: EM132389022

Assignment

The objective of this assignment is to use ridge regression and the lasso in order to train a number of regression models for prediction. You will use a data set from the University of Wisconsin where each record represents follow-up data for one breast cancer case after surgery. 

The data set contains features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.  They describe characteristics of the cell nuclei present in the image.

Information about each patient's outcome is also included: the time to recurrence or, for patients who have not yet experienced a recurrence, the time they were last seen (disease-free time). Here, time to recurrence is the response variable of interest.
 
The variables in the data set are listed below, numbered by column:
 
1) ID number

2) Outcome (R = recur, N = nonrecur)

3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N)
 
4-33) Ten real-valued features are computed for each cell nucleus:
 
a) radius (mean of distances from center to points on the perimeter)

b) texture (standard deviation of gray-scale values)

c) perimeter

d) area

e) smoothness (local variation in radius lengths)

f) compactness (perimeter^2 / area - 1.0)

g) concavity (severity of concave portions of the contour)

h) concave points (number of concave portions of the contour)

i) symmetry

j) fractal dimension ("coastline approximation" - 1)
 
The mean, standard error (SE), and "worst" or largest value (mean of the three largest values) of each of these features were computed for each image, resulting in 30 features. For instance, column 4 contains Mean Radius, column 14 Radius SE and column 24 Worst Radius.
 
34) Tumor size - diameter of the excised tumor in centimeters

35) Lymph node status - number of positive axillary lymph nodes observed at time of surgery
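The column layout above (means in columns 4-13, SEs in 14-23, worst values in 24-33) can be captured in a small lookup. The sketch below is in Python purely for illustration (the assignment itself uses R); the feature labels such as `concave_points` and the helper `column_of` are hypothetical names, not identifiers from the data file, which has no header.

```python
# Hypothetical helper: map a nucleus feature and a summary statistic to its
# 1-based column number in bc_data.csv, following the layout described above.
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
FIRST_COLUMN = {"mean": 4, "se": 14, "worst": 24}  # first column of each block of ten


def column_of(feature: str, statistic: str) -> int:
    """Return the 1-based column holding `statistic` ('mean', 'se', 'worst') of `feature`."""
    return FIRST_COLUMN[statistic] + FEATURES.index(feature)


# Predictors used in task 2: the ten feature means plus columns 34 and 35.
PREDICTOR_COLUMNS = [column_of(f, "mean") for f in FEATURES] + [34, 35]

print(column_of("radius", "se"))     # 14
print(column_of("radius", "worst"))  # 24
print(PREDICTOR_COLUMNS)             # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 34, 35]
```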
 
The dataset has been prepared in a .csv format in the file bc_data.csv.
 
Tasks
 
1. Read the data into R, making sure that missing values are coded properly. The character “?” denotes missing values in the .csv file. Note that the data file has no header row.
 
2. As predictors, use only the mean values of the FNA features (a)-(j) described above (found in columns 4-13), together with the variables in columns 34 and 35. First convert the number of positive axillary nodes (column 35) into a categorical variable with three levels: 0, 1-3, and 4 or more.

Create a subset of the original data set containing only the cases with recurrence. Using this subset, generate appropriate descriptive statistics and plots for the predictors.
 
3. Using the 12 predictors described in task 2 (11 continuous and 1 categorical), train a ridge regression model to predict time to recurrence. Use the default grid of lambda values in the R function glmnet. Plot the coefficients of these predictors against the level of regularization, and comment on the results.

4. Using 5-fold cross-validation, estimate and report the optimal value of lambda (i.e. the value that minimizes the MSE). Plot the MSE against log(lambda). Report the coefficients of the predictors at the optimal lambda value.
 
5. Calculate the MSE on the whole recurrence subset for the model fitted with the optimal lambda value.
 
6. Repeat tasks 3-5 using the lasso. This time, also report which features are selected at the optimal lambda value.
 
7. Comment on how the two methods compare, based on your analysis and the results generated above. Suggest a rigorous method for actually comparing the predictive performance of the two methods on these data. You do not have to apply this comparison method.
 
8. Comment on the overall appropriateness of this “design” (i.e. the choice of response variable and data set) for achieving the objective.
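For task 1, R's `read.csv()` handles both requirements in one call: `read.csv("bc_data.csv", header = FALSE, na.strings = "?")`. To illustrate the same parsing rule outside R, here is a minimal Python sketch; the helper names and the toy input rows are invented for the example and are not taken from bc_data.csv.

```python
import csv
import io
from typing import List, TextIO, Union

Value = Union[float, str, None]


def parse_field(field: str) -> Value:
    """Code '?' as missing (None), convert numeric text to float, keep other text."""
    field = field.strip()
    if field == "?":
        return None
    try:
        return float(field)
    except ValueError:
        return field  # e.g. the outcome code 'R' or 'N'


def read_headerless_csv(handle: TextIO) -> List[List[Value]]:
    """Read every row as data, because the file has no header line."""
    return [[parse_field(f) for f in row] for row in csv.reader(handle) if row]


# Toy rows (invented values, only four columns) standing in for bc_data.csv.
toy = io.StringIO("1,N,10,?\n2,R,5,3.5\n")
rows = read_headerless_csv(toy)
print(rows)  # [[1.0, 'N', 10.0, None], [2.0, 'R', 5.0, 3.5]]
```

With the real file, the call would be `with open("bc_data.csv", newline="") as fh: rows = read_headerless_csv(fh)`.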
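For task 2, one way to build the three-level node variable in R is `cut(x, breaks = c(-Inf, 0, 3, Inf), labels = c("0", "1-3", "4+"))`, whose right-closed intervals give exactly 0, 1-3 and 4 or more for integer counts. Since glmnet needs a numeric predictor matrix, the factor must then be dummy-coded (for example with `model.matrix`), which turns the one categorical predictor into two indicator columns. The Python sketch below shows the same coding, the recurrence subset and a basic summary; the function names, the choice of "0" as reference level and the toy rows are assumptions for illustration only.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def node_category(count: Optional[float]) -> Optional[str]:
    """Three-level coding of positive axillary nodes: '0', '1-3', '4+'."""
    if count is None:  # keep missing values missing
        return None
    if count == 0:
        return "0"
    return "1-3" if count <= 3 else "4+"


def node_dummies(category: Optional[str]) -> Optional[List[int]]:
    """Dummy coding with '0' as the reference level: [is_1_3, is_4plus]."""
    if category is None:
        return None
    return [int(category == "1-3"), int(category == "4+")]


def summarize(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Basic descriptive statistics, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample standard deviation
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


# Toy rows [id, outcome, time, node count]; invented values, not real data.
# (In bc_data.csv the node count is column 35, i.e. index 34.)
rows = [[1, "R", 9.0, 0.0], [2, "N", 40.0, 5.0], [3, "R", 12.0, 2.0], [4, "R", 30.0, None]]
recurrent = [r for r in rows if r[1] == "R"]  # keep only recurrence cases
print([node_category(r[3]) for r in recurrent])  # ['0', '1-3', None]
print(summarize([r[2] for r in recurrent]))
```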
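For task 3, the R fit is `glmnet(x, y, alpha = 0)`, and `plot(fit, xvar = "lambda")` draws the coefficient paths against log(lambda). To make the computation behind those paths concrete, here is a minimal Python sketch of ridge regression on standardized predictors, assuming the Gaussian ridge objective (1/(2n))·||y − β0 − Xβ||² + (λ/2)·||β||², whose solution satisfies (Z'Z/n + λI)β = Z'(y − ȳ)/n. The log-spaced grid of 100 values imitates the shape of glmnet's default, but its upper end (`lam_max`) and all helper names are assumptions, and the numbers will not match glmnet exactly because glmnet applies its own internal scaling and reports coefficients on the original scale.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(X: Matrix) -> Matrix:
    """Center each column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge(Z: Matrix, y: Sequence[float], lam: float) -> Tuple[float, List[float]]:
    """Ridge fit on standardized Z: solve (Z'Z/n + lam*I) beta = Z'(y - ybar)/n."""
    n, p = len(Z), len(Z[0])
    ybar = sum(y) / n
    A = [[sum(Z[i][j] * Z[i][k] for i in range(n)) / n + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Z[i][j] * (y[i] - ybar) for i in range(n)) / n for j in range(p)]
    return ybar, solve(A, b)  # intercept is ybar because Z is centered


def log_grid(lam_max: float, ratio: float = 1e-4, num: int = 100) -> List[float]:
    """Decreasing, log-spaced lambda values from lam_max down to lam_max * ratio."""
    return [lam_max * ratio ** (i / (num - 1)) for i in range(num)]


# Toy data (invented): y depends mostly on the first predictor.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
Z = standardize(X)
for lam in log_grid(100.0, num=5):
    print(f"log(lambda)={math.log(lam):6.2f}  beta={ridge(Z, y, lam)[1]}")
```

Reading the printout from top to bottom shows the behavior the plot in task 3 displays: as lambda decreases, the coefficients grow from near zero toward the unpenalized least-squares values.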
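For task 4, `cv.glmnet(x, y, alpha = 0, nfolds = 5, type.measure = "mse")` performs the cross-validation in R; its `lambda.min` component is the MSE-minimizing lambda, `plot()` on the result draws the CV error against log(lambda), and `coef(cvfit, s = "lambda.min")` returns the coefficients. The mechanics of k-fold selection are sketched below in Python. To keep the block self-contained it uses a hypothetical one-predictor ridge fit (`ridge_1d`) as the model; any fit-and-predict function could be passed in instead. Folds are assigned at random once and reused for every lambda, so the selected lambda can change with the seed.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# fit_predict(x_train, y_train, x_test, lam) -> predictions for x_test
FitPredict = Callable[[List[float], List[float], List[float], float], List[float]]


def ridge_1d(x_tr: List[float], y_tr: List[float], x_te: List[float], lam: float) -> List[float]:
    """Hypothetical one-predictor ridge fit with objective (1/(2n))RSS + (lam/2)b^2."""
    n = len(x_tr)
    xbar, ybar = sum(x_tr) / n, sum(y_tr) / n
    sxx = sum((x - xbar) ** 2 for x in x_tr) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(x_tr, y_tr)) / n
    b = sxy / (sxx + lam)
    return [ybar + b * (x - xbar) for x in x_te]


def cv_select(x: Sequence[float], y: Sequence[float], lambdas: Sequence[float],
              fit_predict: FitPredict, k: int = 5, seed: int = 1) -> Tuple[float, List[float]]:
    """Return (lambda with minimum CV MSE, CV MSE for every lambda) using k random folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint folds of nearly equal size
    cv_mse = []
    for lam in lambdas:
        sq_err = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            pred = fit_predict([x[i] for i in train], [y[i] for i in train],
                               [x[i] for i in test], lam)
            sq_err += sum((y[i] - p) ** 2 for i, p in zip(test, pred))
        cv_mse.append(sq_err / len(y))  # pooled over all held-out points
    best = min(range(len(lambdas)), key=lambda j: cv_mse[j])
    return lambdas[best], cv_mse


# Toy data (invented): a noisy straight line.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + (0.5 if i % 3 == 0 else -0.3) for i, xi in enumerate(x)]
lambdas = [10.0 ** e for e in range(2, -5, -1)]
best, errors = cv_select(x, y, lambdas, ridge_1d)
for lam, err in zip(lambdas, errors):
    print(f"log(lambda)={math.log(lam):6.2f}  CV MSE={err:.4f}")
print("lambda with minimum CV MSE:", best)
```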
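For task 5, the quantity is the ordinary mean squared error over the n cases in the recurrence subset, with fitted values from the model at the selected lambda (written here as λ_opt, with fitted intercept β̂0 and coefficient vector β̂). In R this is roughly `mean((y - predict(cvfit, newx = x, s = "lambda.min"))^2)`, where `cvfit`, `x` and `y` are assumed names for the cv.glmnet result, the predictor matrix and the response. Because these are the same cases used to fit the model and to choose lambda, this in-sample MSE will usually be lower (more optimistic) than the cross-validated MSE from task 4.

```latex
\mathrm{MSE}(\lambda_{\mathrm{opt}}) = \frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i - \hat{\beta}_0(\lambda_{\mathrm{opt}}) - x_i^{\top}\hat{\beta}(\lambda_{\mathrm{opt}})\Bigr)^2
```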
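For task 6, the R calls change only in `alpha = 1` (the lasso), and the features with non-zero entries in `coef(cvfit, s = "lambda.min")` are the selected ones. The Python sketch below shows why the lasso selects features while ridge only shrinks them: cyclic coordinate descent, the approach glmnet is built on, updates each coefficient with a soft-threshold step that sets it exactly to zero when its correlation with the partial residual falls below lambda. It assumes the objective (1/(2n))·||y − β0 − Xβ||² + λ·||β||₁ and predictor columns that are already centered (for example standardized); the function names and toy design are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(Z: Matrix, y: Sequence[float], lam: float,
             max_iter: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Lasso by cyclic coordinate descent; Z must have centered columns."""
    n, p = len(Z), len(Z[0])
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residuals for beta = 0
    beta = [0.0] * p
    col_ss = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) z_j' (partial residual that leaves out predictor j)
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta  # keep residuals in sync with beta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return ybar, beta


def selected_features(names: Sequence[str], beta: Sequence[float]) -> List[str]:
    """Names of predictors whose lasso coefficient is exactly non-zero."""
    return [name for name, b in zip(names, beta) if b != 0.0]


# Toy centered design (invented): two orthogonal predictors, y = 3*z1 + 0.5*z2.
Z = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [3.0 * a + 0.5 * b for a, b in Z]
for lam in (0.1, 1.0, 5.0):
    _, beta = lasso_cd(Z, y, lam)
    print(lam, beta, selected_features(["z1", "z2"], beta))
```

On this toy design, lambda = 1 keeps only `z1`, and lambda = 5 removes both predictors, which is the selection behavior task 6 asks you to report.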
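One rigorous option for task 7 (not the only one) is repeated k-fold cross-validation in which both methods are evaluated on exactly the same outer folds, and each method chooses its own lambda by an inner cross-validation on the outer training part only (nested cross-validation), so the outer test folds never influence the choice of lambda. The Python sketch below shows the outer, paired loop; the methods are passed in as functions, and the toy methods used here (`mean_only` and `least_squares_line`) are hypothetical stand-ins for "ridge with inner CV" and "lasso with inner CV". Because the fold-level differences share training data, they are not independent, so a plain paired t-test on them tends to overstate significance; their mean and spread are better read as a rough indication.

```python
import random
import statistics
from typing import Callable, List, Sequence

# method(x_train, y_train, x_test) -> predictions; any tuning must happen inside,
# using only x_train/y_train (for ridge/lasso: an inner CV to choose lambda).
Method = Callable[[List[float], List[float], List[float]], List[float]]


def paired_repeated_cv(x: Sequence[float], y: Sequence[float],
                       method_a: Method, method_b: Method,
                       k: int = 5, repeats: int = 10, seed: int = 1) -> List[float]:
    """Per-fold test MSE differences (A minus B) on identical outer folds."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split for each repeat
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
            x_te, y_te = [x[i] for i in test], [y[i] for i in test]

            def mse(pred: List[float]) -> float:
                return sum((t - p) ** 2 for t, p in zip(y_te, pred)) / len(y_te)

            diffs.append(mse(method_a(x_tr, y_tr, x_te)) - mse(method_b(x_tr, y_tr, x_te)))
    return diffs


def mean_only(x_tr: List[float], y_tr: List[float], x_te: List[float]) -> List[float]:
    """Hypothetical baseline: predict the training mean for every test case."""
    m = sum(y_tr) / len(y_tr)
    return [m] * len(x_te)


def least_squares_line(x_tr: List[float], y_tr: List[float], x_te: List[float]) -> List[float]:
    """Hypothetical comparator: ordinary least-squares line on one predictor."""
    n = len(x_tr)
    xbar, ybar = sum(x_tr) / n, sum(y_tr) / n
    b = (sum((a - xbar) * (c - ybar) for a, c in zip(x_tr, y_tr))
         / sum((a - xbar) ** 2 for a in x_tr))
    return [ybar + b * (a - xbar) for a in x_te]


# Toy data (invented).
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + (0.5 if i % 3 == 0 else -0.3) for i, xi in enumerate(x)]
d = paired_repeated_cv(x, y, mean_only, least_squares_line)
print(f"mean MSE difference (A - B): {statistics.mean(d):.3f}, sd: {statistics.stdev(d):.3f}")
```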
 
Submit two separate files: your report (in Word or PDF format) and an R file containing your code.

Make sure you annotate your figures and format your tables properly, and organize and document your code properly.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd