Reference no: EM132349458
Advanced Methods For Analytics Assignment -
You have been given a starting observation number for the Excel spreadsheet Final Exam Data.xlsx. All of the data you will need will be in the 80 rows of observations that begin with your starting observation number. Your submission is to be a PDF, and you are to restrict your answers to the space provided.
Challenge 1 - The data in columns B through J involve hypothetical sales of franchised auto dealerships nationwide. The variables are:
1) PRICE (the sales price, in $10^5, of the dealership);
2) SALES (the dealership's most recent annual sales, in $10^6);
3) AGE (the age, in months, of the dealership);
4) UNITS (the dealership's most recent unit sales);
5) ACREAGE (the footprint, in acres, of the dealership);
6) BLDG (the footprint, in 10^3 ft^2, of the dealership's building(s));
7) COMPS (the number of franchised dealership competitors in the dealership's market);
8) COBRNDS (the number of other franchised dealerships in that market owned by the dealership's owners); and
9) WITHIN (the number of franchised dealerships located within 3 miles of the dealership).
Use the first 60 of your observations for your training sample and the next 20 as your validation sample. With the former use your regression and regression-model-building skills to estimate a "good" (by your standards) model to predict the sales price of a dealership. Evaluate that model using your validation sample. In the space provided, report the steps you took and the conclusions you arrived at, as well as your assessment of your model's performance.
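The 60/20 split and the validation check described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the helper names `split_sample` and `validation_metrics` are hypothetical, the starting index is assumed to be 0-based, the error measures (RMSE and MAE) are one reasonable choice among several, and the rows are placeholders rather than spreadsheet data.

```python
import math

def split_sample(rows, start, n_train=60, n_valid=20):
    """Return (training, validation) slices of the 80 rows beginning at `start`.

    `start` is treated as a 0-based index (an assumption; the spreadsheet's
    own observation numbering may be 1-based).
    """
    block = rows[start:start + n_train + n_valid]
    if len(block) < n_train + n_valid:
        raise ValueError("not enough rows after the starting observation")
    return block[:n_train], block[n_train:]

def validation_metrics(actual, predicted):
    """Root-mean-squared error and mean absolute error on held-out data."""
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# Placeholder rows standing in for the spreadsheet (not real data).
rows = list(range(200))
train, valid = split_sample(rows, start=40)
print(len(train), len(valid))
```

Comparing the validation RMSE with the training RMSE of the same model gives a quick indication of overfitting: a validation error far above the training error suggests the model is too closely tailored to the first 60 rows.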
Challenge 2 - A state is considering an overhaul of its restaurant health-inspection protocol. The data in columns K through O resulted from inspections ("Pass" or "Fail" under the proposed protocol) of a large number of small (seating capacity < 50) restaurants. These columns also include: 1) EXPER (the number of years of experience of that restaurant's general manager); 2) AGE (the number of years that particular restaurant has been in that particular location); 3) CHAIN (whether that restaurant is part of a chain; 1 = Yes); and 4) REGION (the region (A, B or C) of the state in which that location lies).
A) Using your first 60 observations as a training sample, formulate (and summarize) a model that would allow you to predict whether a particular location will pass the inspection. How do those predictors that you use in your model influence the likelihood of a location's passing?
B) Using your fitted prediction model, estimate the likelihood of your 20 held-out restaurants passing the inspection (use this rule ... if the estimated probability of passing is less than or equal to 0.45, forecast that restaurant as a "fail"; if the estimated probability is greater than or equal to 0.55, forecast that restaurant as a "pass"). Summarize how well this fitted model works with a 2 x 2 table.
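The forecasting rule in part B and the 2 x 2 summary can be sketched as below. The rule as stated does not say what to do with probabilities strictly between 0.45 and 0.55, so this sketch leaves those cases unforecast and counts them separately; that handling is an assumption, as are the function names and the illustrative inputs (which are not taken from the data file).

```python
from collections import Counter

def forecast(prob_pass):
    """Apply the stated rule: <= 0.45 is "Fail", >= 0.55 is "Pass"."""
    if prob_pass <= 0.45:
        return "Fail"
    if prob_pass >= 0.55:
        return "Pass"
    return None  # between the cutoffs: no forecast (assumption)

def two_by_two(actual, probs):
    """Count (actual, forecast) pairs and the number of unforecast cases."""
    counts = Counter()
    undecided = 0
    for outcome, p in zip(actual, probs):
        f = forecast(p)
        if f is None:
            undecided += 1
        else:
            counts[(outcome, f)] += 1
    return counts, undecided

# Illustrative inputs only (not from the spreadsheet).
actual = ["Pass", "Fail", "Pass", "Fail"]
probs = [0.90, 0.20, 0.50, 0.60]
table, undecided = two_by_two(actual, probs)

print("actual \\ forecast   Pass  Fail")
for row in ("Pass", "Fail"):
    print(f"{row:<18} {table[(row, 'Pass')]:>5} {table[(row, 'Fail')]:>5}")
print("no forecast:", undecided)
```

Reporting the unforecast count alongside the table keeps the 2 x 2 summary honest when some of the 20 held-out restaurants fall between the two cutoffs.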
Challenge 3 - Use the data in your training sample from the previous challenge to formulate a classification tree (pruned according to the "minimum xerror" rule). Describe the "paths" that your tree retains (e.g., "If Age ≤ 12 and Exper > 4 then PASS"). In a simple 2 x 2 table, report how well your classification tree does in predicting the pass/fail question in your validation sample.
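The "minimum xerror" wording appears to match the complexity table reported by R's rpart package, where each candidate subtree has a complexity parameter (CP), a number of splits, and a cross-validated error (xerror). Under that assumption, the pruning rule amounts to picking the row with the smallest xerror and pruning at its CP. The sketch below shows only that selection step; the table values are invented for illustration and the function name is hypothetical.

```python
def min_xerror_row(cp_table):
    """Return the row with the lowest cross-validated error.

    `min` keeps the first row on ties, which is the smallest tree when the
    table is ordered by increasing number of splits.
    """
    return min(cp_table, key=lambda row: row["xerror"])

# Hypothetical complexity table (values invented for illustration).
cp_table = [
    {"cp": 0.30, "nsplit": 0, "xerror": 1.00},
    {"cp": 0.10, "nsplit": 1, "xerror": 0.80},
    {"cp": 0.02, "nsplit": 3, "xerror": 0.70},
    {"cp": 0.01, "nsplit": 5, "xerror": 0.75},
]

best = min_xerror_row(cp_table)
print("prune at cp =", best["cp"], "keeping", best["nsplit"], "splits")
```

Each leaf of the pruned tree then corresponds to one "path" of the kind quoted above: the conjunction of the split conditions from the root down to that leaf, followed by the leaf's predicted class.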
Challenge 4 - The data in column P are a quarterly time series depicting unique visitors to a health care website over a 20-year period. For this challenge, use the first 72 periods as your training sample and the last 8 periods as your validation sample. If you had used multiplicative Loess decomposition to forecast those last 8 quarters, what would the correlation between your forecasts and the actual values have been? What would it have been had you used an ARIMA(p,d,q)(P,D,Q)_4 model (that is, a seasonal ARIMA with a seasonal period of 4 quarters)?
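The quantity asked for here is the Pearson correlation between the 8 forecasts and the 8 held-out actual values. A minimal sketch of the 72/8 split and the correlation calculation follows. The series is a placeholder, and the forecasts come from a seasonal-naive stand-in (repeating the last four training quarters), which is neither the Loess decomposition nor the ARIMA model the challenge asks for; it is there only so that the correlation step has something to work on.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Placeholder for the 80 quarterly values in column P (not real data).
series = [float(v) for v in range(1, 81)]
train, valid = series[:72], series[72:]

# Stand-in forecasts: repeat the last 4 training quarters (seasonal naive).
forecasts = [train[-4 + (h % 4)] for h in range(len(valid))]

print(round(pearson(forecasts, valid), 4))
```

Correlation measures only how closely the forecasts track the shape of the actual series; forecasts with a constant bias can still correlate highly, so an error measure such as RMSE is a useful companion.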
Challenge 5 - A consultant for the mortgage lending industry has developed a new technique for assessing the risk involved in lending to those with less-than-stellar credit profiles. The technique involves using data-mining techniques to create a Personal Financial Responsibility (PFR) index. To evaluate the predictive power of this index, the consultant selected a random sample of borrowers and evaluated their risk using PFR. The consultant then compared borrowers' PFR to their mortgage payment performance (a rating based on a variety of factors with a range of 0 to 200). Data from the sample were as follows:
| PFR | Payment Performance |
|-----|---------------------|
| 89 | 188 |
| 99 | 140 |
| 90 | 108 |
| 31 | 29 |
| 54 | 82 |
| 60 | 49 |
| 52 | 20 |
| 65 | 64 |
| 32 | 41 |
| 67 | 31 |
A) Conduct an appropriate hypothesis test to evaluate whether these data provide sufficient evidence regarding PFR's usefulness as a predictor of mortgage payment performance.
B) If you conclude PFR is a useful predictor of mortgage payment performance, how effective is it?
C) Construct a 95% confidence interval for the mean Payment Performance given a PFR score of 58.
D) Construct a 95% prediction interval for an individual's Payment Performance given his/her PFR score is 58.
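One way to approach parts A through D is a simple linear regression of Payment Performance on PFR, with a two-sided t-test on the slope, the coefficient of determination as the measure of effectiveness, and the usual normal-theory intervals at PFR = 58. The sketch below follows that approach using the ten observations in the table; treating the slope t-test as the "appropriate" test is an assumption, and the critical value 2.306 (Student's t, 8 degrees of freedom, two-sided 95%) is hardcoded rather than computed.

```python
import math

pfr = [89, 99, 90, 31, 54, 60, 52, 65, 32, 67]
perf = [188, 140, 108, 29, 82, 49, 20, 64, 41, 31]
T_CRIT = 2.306  # t critical value, n - 2 = 8 df, two-sided 95% (hardcoded)

n = len(pfr)
x_bar = sum(pfr) / n
y_bar = sum(perf) / n
sxx = sum((x - x_bar) ** 2 for x in pfr)
syy = sum((y - y_bar) ** 2 for y in perf)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pfr, perf))

# Least-squares fit: perf = intercept + slope * pfr
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard error and the t statistic for H0: slope = 0 (part A)
sse = syy - slope * sxy
s = math.sqrt(sse / (n - 2))
t_stat = slope / (s / math.sqrt(sxx))

# Share of variation in Payment Performance explained by PFR (part B)
r_squared = 1 - sse / syy

def intervals(x0):
    """95% CI for the mean response and 95% PI for one response at x0."""
    fit = intercept + slope * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = T_CRIT * s * math.sqrt(leverage)
    pi_half = T_CRIT * s * math.sqrt(1 + leverage)
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

ci, pi = intervals(58)  # parts C and D
print(f"slope={slope:.3f}  t={t_stat:.3f}  R^2={r_squared:.3f}")
print(f"95% CI for mean at 58: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% PI for individual at 58: ({pi[0]:.1f}, {pi[1]:.1f})")
```

The prediction interval is always wider than the confidence interval because it adds the variability of a single borrower to the uncertainty in the fitted mean. These normal-theory intervals are not constrained to the 0 to 200 range of the rating, so with only ten observations the prediction interval can extend past those bounds.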
Attachment:- Advanced Methods For Analytics Assignment & Data File.rar