Reference no: EM133548641
Read through the Anderson (n.d.) case study thoroughly. Ensure you follow the links to additional analysis, evidence, graphs, and so on. Also, it is important to remember that it is far easier to find fault in others' work than it is to do the work yourself.
There are two sections of topics. From the first section, choose one of the topics. In the second section, discuss all of the topics.
Section 1
What is driving this project? (What drives a data science project?) Explain.
Consider the stages presented in this course for data science projects. Connect each of these stages to the stages used in this case study. Explain each connection you make.
Consider the suggestions that the case study analyst provided.
What evidence in the analysis supports each of the suggestions? Explain.
If you feel that there is insufficient evidence to support one of the suggestions, explain.
Section 2
Regarding the scatter plots used in this case study:
Several scatter plots have many overlapping data points. Is it possible to accurately interpret a plot like this? Explain. Justify your reasoning.
The scatter plots are labeled with a value the author assigned to R². The author later identifies this as a measure of "correlation." Pearson's product-moment correlation coefficient is the statistic most people mean by "correlation." However, there are many formal statistical tests for correlation, so it is always important to specify which type is used. The metric identified as R² is not used for any of the formal statistical tests for correlation. However, in linear regression with one independent and one dependent variable, the coefficient of determination (R²) is the square of Pearson's product-moment correlation coefficient (the result of which is called r, or rho for the population value). Whether using linear regression or Pearson's product-moment correlation, there are several formal statistical assumptions that the data must meet. In particular, both of these analysis methods are extremely susceptible to the influence of outliers.
Based on the plots and this information, what can you determine regarding the results associated with these plots and scores? Explain.
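As a rough aid for thinking about the R² question, the sketch below checks two things on made-up numbers: that in simple linear regression R² equals the square of Pearson's r (so it carries no sign and generally differs from |r|), and that one extreme point can change r substantially. The data, the helper names pearson_r and r_squared, and the outlier value are all illustrative assumptions; none of them come from the Anderson (n.d.) case study.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical data with a clear negative trend (not from the case study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [8.1, 6.9, 6.2, 4.8, 4.1, 3.0, 2.2, 0.9]

r = pearson_r(x, y)    # negative: y falls as x rises
r2 = r_squared(x, y)   # non-negative: the direction of the trend is lost

# Append a single hypothetical outlier and recompute.
x_out = x + [20]
y_out = y + [9.0]
r_out = pearson_r(x_out, y_out)
r2_out = r_squared(x_out, y_out)

print(f"without outlier: r = {r:.3f}, R^2 = {r2:.3f}, r*r = {r * r:.3f}")
print(f"with outlier:    r = {r_out:.3f}, R^2 = {r2_out:.3f}")
```

On this made-up data, the single added point is enough to flip the sign of r and sharply reduce R², which is one reason to inspect a plot for outliers before trusting either score.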