Create a multiple linear regression using numerical predictors

Reference no: EM131083044

Prostate cancer Data:

Hastie, Tibshirani and Friedman (2001) analyze data taken from Stamey et al. (1989). According to Hastie, Tibshirani and Friedman, the goal is to predict the log of prostate-specific antigen (lpsa) from a number of measurements: log cancer volume (lcavol), log prostate weight (lweight), age, log of benign prostatic hyperplasia amount (lbph), seminal vesicle invasion (svi, a categorical variable), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45). The data set also includes train, a categorical variable.

Guide: Please try to answer the following questions:

1) Start by identifying your response variable and your potential predictors.

2) Compute summary statistics for each of the variables in your data. What can you tell about each variable?

3) Find the best predictor (just one numerical variable) of your response variable, then find the least squares regression line. Check the assumptions (diagnostics). Create a correlation matrix (you may graph it).

4) Find X'X, then use matrix notation and calculations to get the best linear model of your best predictor. Show how you can get all elements of the ANOVA table for your simple linear regression. Find an estimate for σ².

5) Use both the inverse response plot and Box-Cox to find the best power of the predicted variable, and perform the transformation. Then check your model: is there any improvement?

6) Find the best predictor (just one categorical variable) of your response variable, then find the least squares regression line. Check the assumptions (diagnostics).

7) Use the t.test analysis and calculations to get the best linear model of your best predictor.

8) Now create a multiple linear regression using numerical predictors only. Summarize your results and check the assumptions. Show how you can get all elements of the ANOVA table for your multiple linear regression. Find an estimate for σ².

9) Do you have a multicollinearity problem? Do you have a heteroscedasticity problem? What can you do to fix those? Summarize your final model. Show how you would calculate the VIF for one predictor (the best one).

10) Now take the fixed model from part 9, which has fewer predictors (the reduced model). Show that the difference in R² is not significant, using an F test as well as the anova command in R.

11) Use matrix notation and calculations to get the reduced model coefficients. Show that both approaches give the same results. Find the variance-covariance matrix of the betas. Find the hat matrix (H), then calculate the sum of its diagonal.

12) Add the two categorical variables to your reduced model. Is your R² better? Are those variables significant? Explicitly write all the possible models derived from the main model for each category, then interpret the coefficients of each model.

13) State your final model and check its assumptions. Is there any room to enhance it (MMPs, AVPlots, Box-Cox and inverse response plot)? Create the ANOVA table for your model. Find an estimate for σ².

14) Make a table for the AIC, AICc and BIC for your simple linear model, full model and your final reduced model.

15) Write a paragraph explaining your rationale for ending up with such a model.
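The matrix calculations asked for in parts 4 and 11 can be sketched for the simple (one-predictor) case as follows. This is a minimal illustration only: the assignment itself refers to R, while the sketch uses plain Python so that every step is explicit, and the values in `x` and `y` are made-up toy numbers, not the prostate data. It assumes a design matrix X with a column of ones and one predictor column, so X'X is 2x2 and can be inverted in closed form.

```python
# Toy data, made up for illustration (NOT the prostate data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# With X = [1, x], X'X = [[n, sum(x)], [sum(x), sum(x^2)]]
# and X'y = [sum(y), sum(x*y)].
sx = sum(x)
sxx = sum(v * v for v in x)
sy = sum(y)
sxy = sum(a * b for a, b in zip(x, y))

# Closed-form inverse of the 2x2 matrix X'X.
det = n * sxx - sx * sx
inv = [[sxx / det, -sx / det],
       [-sx / det, n / det]]

# Least squares estimate: beta_hat = (X'X)^{-1} X'y.
b0 = inv[0][0] * sy + inv[0][1] * sxy
b1 = inv[1][0] * sy + inv[1][1] * sxy

# ANOVA decomposition: SST = SSR + SSE.
ybar = sy / n
fitted = [b0 + b1 * v for v in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - ybar) ** 2 for fi in fitted)
sst = sum((yi - ybar) ** 2 for yi in y)

p = 2                    # number of estimated coefficients
mse = sse / (n - p)      # estimate of sigma^2 on n - p degrees of freedom
msr = ssr / (p - 1)      # regression mean square
f_stat = msr / mse       # overall F statistic
r2 = ssr / sst           # coefficient of determination

# Variance-covariance matrix of beta_hat: sigma2_hat * (X'X)^{-1}.
var_cov = [[mse * inv[i][j] for j in range(2)] for i in range(2)]

# Diagonal of the hat matrix H = X (X'X)^{-1} X':
# h_ii = x_i' (X'X)^{-1} x_i with x_i = (1, x[i]).
hat_diag = [inv[0][0] + 2 * inv[0][1] * v + inv[1][1] * v * v for v in x]
trace_h = sum(hat_diag)  # equals p, the number of coefficients

print("intercept:", b0, "slope:", b1)
print("SSR:", ssr, "SSE:", sse, "SST:", sst)
print("sigma^2 estimate:", mse, "F:", f_stat, "R^2:", r2)
print("trace(H):", trace_h)
```

The same quantities should agree with what a fitted linear model reports, which is the comparison part 11 asks for. For more than one predictor the 2x2 closed-form inverse no longer applies and a general matrix inverse or linear solver is needed.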

Attachment:- Data.rar




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd