How do the coefficient estimates from the usual rule compare to those from the one-standard-error rule?

Reference no: EM131446643

1. This exercise revisits the Hitters data set, which is available in the ISLR library.

(a) By default, the glmnet() function internally scales the predictor variables to have standard deviation 1 before solving the ridge regression or lasso problem; this behavior comes from its default setting standardize=TRUE. Explain why such scaling is appropriate for this application.
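A minimal sketch of why scale matters, written in plain Python rather than R and using made-up toy numbers (not the Hitters data). With one centered predictor and no intercept, minimizing Σ(y_i − βx_i)² + λβ² gives β̂ = Σx_i y_i / (Σx_i² + λ). A least squares fit is unaffected by the units of x, but the ridge fit is not, because the penalty λβ² takes no account of those units. After standardizing x, both versions of the predictor give the same fit. The helper names (center, standardize, ridge_fitted) are hypothetical, and this is not glmnet's implementation.

```python
import math


def center(v):
    """Subtract the mean from every entry of v."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def standardize(v):
    """Center v and scale it to standard deviation 1 (variance computed with 1/n)."""
    c = center(v)
    sd = math.sqrt(sum(ci * ci for ci in c) / len(c))
    return [ci / sd for ci in c]


def ridge_fitted(x, y, lam):
    """Fitted values from ridge with one centered predictor and no intercept.

    Minimizing sum((y_i - b*x_i)**2) + lam * b**2 gives b = sum(x*y) / (sum(x*x) + lam)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)
    return [b * xi for xi in x]


# Toy data (illustrative only): the same predictor recorded in two different units.
x_units = [1.0, 2.0, 3.0, 4.0, 5.0]
x_thousands = [xi / 1000 for xi in x_units]
y = center([2.1, 3.9, 6.2, 7.8, 10.1])
lam = 5.0

# Without standardization, changing the units of x changes the ridge fit ...
raw_gap = max(abs(a - b) for a, b in
              zip(ridge_fitted(center(x_units), y, lam),
                  ridge_fitted(center(x_thousands), y, lam)))
# ... but after standardization both versions of x give the same fit.
std_gap = max(abs(a - b) for a, b in
              zip(ridge_fitted(standardize(x_units), y, lam),
                  ridge_fitted(standardize(x_thousands), y, lam)))
print(raw_gap, std_gap)
```

In Hitters the predictors are recorded on very different scales (counts of hits, years, career totals), which is the situation this toy example imitates.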

(b) Verify that, for a very small value of λ, both the ridge regression and lasso estimates are very close to the least squares estimates. Also verify that, for a very large value of λ, both the ridge regression and lasso estimates approach 0 in all components (except the intercept, which is not penalized by default).
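The limiting behavior in (b) is easiest to see in a simple special case, assumed here purely for illustration: n = p, the design matrix is the identity, and there is no intercept. The least squares estimate is then just y, ridge regression gives β̂_j = y_j / (1 + λ), and the lasso soft-thresholds each y_j at λ/2. The Python sketch below (hypothetical helper names, toy values) evaluates both at a tiny and a huge λ. glmnet writes its objective with different constants (for example, a 1/(2n) factor on the residual sum of squares), so its λ values are not on the same scale as the λ here.

```python
def ridge_special(y, lam):
    """Ridge estimates when X is the identity (n = p, no intercept):
    minimizing sum((y_j - b_j)**2) + lam * sum(b_j**2) gives b_j = y_j / (1 + lam)."""
    return [yj / (1 + lam) for yj in y]


def lasso_special(y, lam):
    """Lasso estimates in the same setting: minimizing
    sum((y_j - b_j)**2) + lam * sum(|b_j|) soft-thresholds each y_j at lam / 2."""
    out = []
    for yj in y:
        if yj > lam / 2:
            out.append(yj - lam / 2)
        elif yj < -lam / 2:
            out.append(yj + lam / 2)
        else:
            out.append(0.0)
    return out


y = [3.0, -1.5, 0.4]  # toy values; here the least squares estimates are y itself
for lam in (1e-6, 1.0, 1e6):
    print(lam, ridge_special(y, lam), lasso_special(y, lam))
```

The ridge estimates only approach 0 as λ grows, while the lasso estimates become exactly 0 once λ/2 ≥ |y_j|.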

(c) An alternative method for selecting the tuning parameter λ is the one-standard-error rule. Under this rule, instead of choosing the λ that minimizes the cross-validated estimate of test MSE, we choose the largest value of λ whose estimated test MSE is within one standard error of that minimum. Provide a rationale for the one-standard-error rule.
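Once cross-validation has produced, for each λ, a mean error and its standard error, the selection step itself is mechanical. The Python sketch below (hypothetical function name, made-up numbers) follows the definition of lambda.1se given in the cv.glmnet documentation: the largest λ whose cross-validated error is within one standard error of the minimum, where that standard error is the one at the minimizing λ.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda CV error means and standard errors.

    lambda_min minimizes the CV error; lambda_1se is the largest lambda whose CV error
    is at most cv_mean[i_min] + cv_se[i_min] (one SE above the minimum)."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambdas[i_min], lambda_1se


# Made-up CV summary for four candidate values of lambda.
print(select_lambda([100, 10, 1, 0.1], [150, 114, 110, 112], [8, 7, 6, 6]))  # (1, 10)
```

Because larger λ means more shrinkage, lambda_1se is never smaller than lambda_min.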

(d) For each of the ridge regression and lasso models, using the grid of λ values defined in the notes, perform 5-fold cross-validation to determine the best value of λ. Report the results from both the usual minimum-MSE rule and the one-standard-error rule for choosing λ. Note that cv.glmnet() returns the value of λ selected by the minimum-MSE rule under the name lambda.min and the value selected by the one-standard-error rule under the name lambda.1se.
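The Python sketch below spells out what 5-fold cross-validation computes for a single value of λ: observations are randomly assigned to 5 folds, each fold is held out once, and the 5 held-out MSEs are summarized by their mean and standard error. The model is a toy one-predictor ridge fit through the origin on made-up data, and all names are hypothetical. cv.glmnet performs the equivalent loop over the whole λ grid, and its internal details (such as how folds are weighted) may differ.

```python
import math
import random


def kfold_indices(n, k=5, seed=1):
    """Randomly assign each of n observations to one of k folds of (nearly) equal size."""
    folds = [i % k for i in range(n)]
    random.Random(seed).shuffle(folds)
    return folds


def cv_error(x, y, fit, predict, k=5, seed=1):
    """Return (mean, standard error) of the k held-out fold MSEs.

    fit(x_train, y_train) returns a fitted model; predict(model, x_test) returns predictions."""
    folds = kfold_indices(len(y), k, seed)
    fold_mse = []
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        test = [i for i in range(len(y)) if folds[i] == f]
        model = fit([x[i] for i in train], [y[i] for i in train])
        preds = predict(model, [x[i] for i in test])
        fold_mse.append(sum((p - y[i]) ** 2 for p, i in zip(preds, test)) / len(test))
    mean = sum(fold_mse) / k
    sd = math.sqrt(sum((m - mean) ** 2 for m in fold_mse) / (k - 1))
    return mean, sd / math.sqrt(k)


def make_ridge(lam):
    """Toy model: ridge regression through the origin with a single predictor."""
    def fit(xs, ys):
        return sum(a * b for a, b in zip(xs, ys)) / (sum(a * a for a in xs) + lam)

    def predict(slope, xs):
        return [slope * a for a in xs]

    return fit, predict


# Made-up data, for illustration only.
x = [float(i) for i in range(20)]
y = [2.0 * xi + (xi % 3) for xi in x]
for lam in (1000.0, 10.0, 0.1):
    print(lam, cv_error(x, y, *make_ridge(lam)))
```

Fixing the seed keeps the fold assignment identical across λ values, so differences in CV error reflect λ rather than a reshuffled split.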

(e) From the previous part, you should have computed four values of the tuning parameter:

$\lambda^{\text{ridge}}_{\min}$, $\lambda^{\text{ridge}}_{1\text{se}}$, $\lambda^{\text{lasso}}_{\min}$, $\lambda^{\text{lasso}}_{1\text{se}}$

These are the results of running 5-fold cross-validation on each of the ridge and lasso models and selecting λ with either the usual rule (min) or the one-standard-error rule (1se). Now, using the predict() function with type="coef", report the coefficient estimates at the appropriate values of λ. That is, report two coefficient vectors from ridge regression, with $\lambda = \lambda^{\text{ridge}}_{\min}$ and $\lambda = \lambda^{\text{ridge}}_{1\text{se}}$, and likewise two for the lasso. How do the coefficient estimates from the usual rule compare to those from the one-standard-error rule? How do the ridge estimates compare to those from the lasso?
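One way to organize the comparison in (e), sketched in Python with hypothetical helper names: store each coefficient vector returned by predict(..., type="coef") as a name-to-value mapping, then compare the number of non-zero slopes and the overall size of the slopes (ℓ1 and ℓ2 norms), leaving out the unpenalized intercept. R prints exact zeros in glmnet's sparse coefficient output as ".", which should be recorded as 0. The label "(Intercept)" matches R's row name, but keeping it when transcribing is an assumption.

```python
import math

INTERCEPT = "(Intercept)"  # R's label for the intercept row; assumed kept when transcribing


def summarize(coefs):
    """Count non-zero slopes and compute their l1 and l2 norms, ignoring the intercept."""
    slopes = [v for name, v in coefs.items() if name != INTERCEPT]
    return {
        "nonzero": sum(1 for v in slopes if v != 0.0),
        "l1": sum(abs(v) for v in slopes),
        "l2": math.sqrt(sum(v * v for v in slopes)),
    }


def dropped(coefs_a, coefs_b):
    """Predictors with a non-zero coefficient in coefs_a but exactly zero in coefs_b."""
    return sorted(name for name, v in coefs_a.items()
                  if name != INTERCEPT and v != 0.0 and coefs_b.get(name, 0.0) == 0.0)
```

For example, with your own transcribed dictionaries lasso_min and lasso_1se (hypothetical names), dropped(lasso_min, lasso_1se) lists the predictors that the one-standard-error lasso removes.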

(f) Suppose that you were coaching a young baseball player who wanted to strike it rich in the major leagues. What handful of attributes would you tell this player to focus on?

2. Predict the number of applications received (Apps) using the other variables in the College data set, which is available in the ISLR library.

(a) Use ?College to access information about the data set and answer the following questions. Note that you may also find the summary() function useful.

i. Not including Apps, how many variables are in the data set? In other words, what is p?

ii. Are there any missing values in the data set? If so, remove them.

iii. What is the sample size (once missing values have been removed, if necessary)? In other words, what is N?

iv. Are there any qualitative variables in the data set? If so, list them.

(b) Split the data set into a training set and a test set.
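A minimal Python sketch of a reproducible random split of row indices into training and test sets. The 50/50 proportion and the seed are assumptions, since the exercise does not specify them, and the function name is hypothetical; in R the same idea is usually written with set.seed() and sample().

```python
import random


def train_test_split(n, test_frac=0.5, seed=1):
    """Randomly split row indices 0..n-1 into disjoint (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


train, test = train_test_split(10)
print(len(train), len(test))  # 5 5
```

Using the same split for (c), (d), and (e) keeps the three test errors directly comparable.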

(c) Fit a linear model using least squares on the training set and report the test error obtained.
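The same test-error calculation is used in (c), (d), and (e), so the three approaches can be compared in (f). The Python sketch below (hypothetical names) computes the test MSE and, as one possible scale-free way to judge how accurately Apps is predicted, a test R² that compares the model's MSE with that of always predicting the training-set mean. Using R² this way is an added suggestion, not part of the exercise.

```python
def holdout_mse(y_true, y_pred):
    """Mean squared prediction error on held-out observations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def holdout_r2(y_true, y_pred, y_train_mean):
    """1 - (model test MSE) / (test MSE of always predicting the training-set mean)."""
    baseline = holdout_mse(y_true, [y_train_mean] * len(y_true))
    return 1 - holdout_mse(y_true, y_pred) / baseline


print(holdout_mse([1, 2, 3], [1, 2, 4]))       # 0.3333333333333333
print(holdout_r2([1, 2, 3], [1, 2, 4], 2.0))   # 0.5
```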

(d) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

(e) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

(f) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these three approaches?
