Reference no: EM131083044
Prostate Cancer Data:
Hastie, Tibshirani and Friedman (2001) analyze data taken from Stamey et al. (1989). According to Hastie, Tibshirani and Friedman, the goal is to predict the log-cancer volume (lcavol) from a number of measurements, including log prostate weight (lweight), age, log of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi, a categorical variable), log of capsular penetration (lcp), Gleason score (gleason), percent of Gleason scores 4 or 5 (pgg45), and train (a categorical variable).
Guide: Please try to answer the following questions:
1) Start by identifying your response variable and your potential predictors.
2) Create summary statistics for each of the variables in your data. What can you tell about each variable?
3) Find the best predictor (just one numerical variable) of your response variable, then find the least squares regression line. Check the assumptions (diagnostics). Create a correlation matrix (you may graph it).
4) Find X'X, then use matrix notation and calculations to get the best linear model of your best predictor. Show how you can get all elements of the ANOVA table for your simple linear regression. Find an estimate for σ².
5) Use both the inverse response plot and Box-Cox to find the best power for the predicted (response) variable and perform the transformation. Check your model: is there any improvement?
6) Find the best predictor (just one categorical variable) of your response variable, then find the least squares regression line. Check the assumptions (diagnostics).
7) Use the t.test analysis and calculations to get the best linear model of your best predictor.
8) Now create a multiple linear regression using numerical predictors only. Summarize your results and check the assumptions. Show how you can get all elements of the ANOVA table for your multiple linear regression. Find an estimate for σ².
9) Do you have a multicollinearity problem? Do you have a heteroscedasticity problem? What can you do to fix those? Summarize your final model. Show how you would calculate the VIF for one predictor (the best one).
10) Now take the fixed model from part 9, which has fewer predictors (the reduced model). Show that the difference in R² is not significant using an F test as well as the anova command in R.
11) Use matrix notation and calculations to get the reduced model coefficients. Show that both approaches give the same results. Find the variance-covariance matrix of the betas. Find the hat matrix (H), then calculate the sum of its diagonal.
12) Add the two categorical variables to your reduced model. Is your R² better? Are those variables significant? Explicitly write all the possible models derived from the main model for each category, then interpret the coefficients of each model.
13) State your final model and check its assumptions. Is there any room to enhance it (MMPs, AVPlots, Box-Cox and inverse response plot)? Create the ANOVA table for your model. Find an estimate for σ².
14) Make a table for the AIC, AICc and BIC for your simple linear model, full model and your final reduced model.
15) Write a paragraph explaining your rationale for ending up with such a model.
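The matrix calculations asked for in questions 4 and 11 can be sketched as follows. This is only an illustrative outline, not a solution: the assignment expects R, whereas this sketch uses plain Python, and the x and y values are invented toy numbers rather than the prostate data. It forms X'X and X'y for a simple linear regression, solves for the coefficients, builds the ANOVA quantities, estimates σ², and sums the diagonal of the hat matrix.

```python
# Toy data, invented for illustration only (NOT the prostate data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# X has a column of ones and a column of x, so X'X is 2x2 and X'y is 2x1.
sum_x = sum(x)
sum_xx = sum(xi * xi for xi in x)
xtx = [[n, sum_x], [sum_x, sum_xx]]
xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]

# Invert the 2x2 matrix X'X directly.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
xtx_inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]

# beta_hat = (X'X)^(-1) X'y, giving [intercept, slope].
beta = [xtx_inv[i][0] * xty[0] + xtx_inv[i][1] * xty[1] for i in range(2)]

# ANOVA pieces: SST = SSR + SSE, with 1 and n - 2 degrees of freedom.
y_bar = sum(y) / n
fitted = [beta[0] + beta[1] * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sst - sse
sigma2_hat = sse / (n - 2)          # estimate of sigma^2 (the MSE)
f_stat = (ssr / 1) / sigma2_hat
r_squared = ssr / sst

# Variance-covariance matrix of the coefficients: sigma2_hat * (X'X)^(-1).
var_cov_beta = [[sigma2_hat * v for v in row] for row in xtx_inv]

# Diagonal of H = X (X'X)^(-1) X': h_ii = [1, x_i] (X'X)^(-1) [1, x_i]'.
hat_diag = [xtx_inv[0][0] + 2 * xtx_inv[0][1] * xi + xtx_inv[1][1] * xi * xi
            for xi in x]
trace_h = sum(hat_diag)             # equals the number of coefficients

print("beta:", beta)
print("SSR, SSE, SST:", ssr, sse, sst)
print("sigma2_hat, F, R^2:", sigma2_hat, f_stat, r_squared)
print("trace(H):", trace_h)
```

The sum of the hat matrix diagonal equals the number of estimated coefficients (here 2), which is a useful check on the calculation. In R, the same quantities can be cross-checked against the output of lm and anova.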
Attachment: Data.rar
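For questions 9 and 10, the VIF and the F test on the change in R² reduce to two short formulas, sketched below in plain Python (the assignment itself expects R). The function names and every number passed to them are hypothetical placeholders, not results from the prostate data. The sketch assumes the standard definitions: VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors, and F = [(R²_full − R²_reduced) / q] / [(1 − R²_full) / (n − p − 1)], where p is the number of predictors in the full model and q is the number dropped.

```python
def partial_f(r2_full, r2_reduced, n, p_full, q):
    """F statistic for dropping q predictors from a full model.

    p_full counts the predictors in the full model (intercept excluded),
    so the error degrees of freedom are n - p_full - 1.
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - p_full - 1)
    return numerator / denominator


def vif(r2_j):
    """Variance inflation factor from the R^2 of predictor j on the others."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical inputs chosen only to show the arithmetic.
f_value = partial_f(r2_full=0.70, r2_reduced=0.69, n=97, p_full=7, q=3)
vif_value = vif(0.80)

print("partial F:", f_value)
print("VIF:", vif_value)
```

The F value is compared with an F distribution on q and n − p − 1 degrees of freedom. A small value means the dropped predictors do not add significantly to R², which is the same comparison that anova(reduced, full) reports in R.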