Reference no: EM132392214
Project Assignment
For this assignment, you need to subset your data file. If your student ID number ends in 0, 1, 2, or 3, you should analyze the data subset for which industry = 8 (Health); if it ends in 4, 5, or 6, you should analyze the data subset for which industry = 11 (Manufacturing); and if it ends in 7, 8, or 9, you should analyze the data subset for which industry = 15 (Retail).
Because different students analyze different subsets, your answers will differ from those of other students. Please see this week's instructional video to learn how to subset your data appropriately. Importantly, in this assignment, I am looking for your substantive interpretation of the statistical results (i.e. your interpretation and conclusions matter as much as the statistical analysis!).
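As a rough illustration of the subsetting rule above, the sketch below maps the last digit of a student ID to an industry code and keeps only the matching rows. It assumes the data have been loaded as a list of dictionaries with an industry key. The function names, the example ID, and the sample rows are hypothetical, and the instructional video remains the reference for subsetting in the course software.

```python
# Hypothetical sketch: pick the industry code from a student ID and filter rows.
INDUSTRY_BY_LAST_DIGIT = {
    **dict.fromkeys("0123", 8),   # Health
    **dict.fromkeys("456", 11),   # Manufacturing
    **dict.fromkeys("789", 15),   # Retail
}


def industry_for_student(student_id: str) -> int:
    """Return the industry code assigned to a student ID (by its last digit)."""
    return INDUSTRY_BY_LAST_DIGIT[student_id.strip()[-1]]


def subset_by_industry(rows, code):
    """Keep only the rows whose 'industry' value equals the given code."""
    return [row for row in rows if row["industry"] == code]


# Made-up rows for illustration only; the real file has many more columns.
rows = [
    {"id": 1, "industry": 8},
    {"id": 2, "industry": 11},
    {"id": 3, "industry": 15},
    {"id": 4, "industry": 11},
]
my_rows = subset_by_industry(rows, industry_for_student("20231235"))
```

Here the made-up ID ends in 5, so only the Manufacturing rows (industry = 11) are kept.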
Questions:
i) What ‘level of measurement’ are the ‘data science’ variables in this data set (i.e. Extraction, Modeling, Visualization, Statistics, Programming, and Experimentation)? How might this impact the analyses you perform?
ii) Undertake data screening and cleaning. Ensure you recode any missing values appropriately, and make sure you examine the patterns of missing data in your analysis, including addressing both ‘don't know’ and ‘missing data’ responses. Note: we have a lecture scheduled on missing data analysis on Tuesday 15th, and a video will be uploaded ahead of this class.
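One possible way to keep ‘don't know’ and ‘missing data’ responses distinct while screening is sketched below. The numeric codes 98 and 99 are assumptions made purely for illustration (check the data file's codebook for the real ones), and the helper names recode and missing_pattern are hypothetical.

```python
from collections import Counter

# Assumed codes; check the data file's codebook for the real ones.
DONT_KNOW = 98   # hypothetical code for a "don't know" response
MISSING = 99     # hypothetical code for a skipped item

ITEMS = ["Extraction", "Modeling", "Visualization",
         "Statistics", "Programming", "Experimentation"]


def recode(value):
    """Return (clean_value, reason); clean_value is None when unusable."""
    if value is None or value == MISSING:
        return None, "missing"
    if value == DONT_KNOW:
        return None, "dont_know"
    return value, "valid"


def missing_pattern(rows):
    """Count valid, don't-know, and missing responses for each item."""
    counts = {item: Counter() for item in ITEMS}
    for row in rows:
        for item in ITEMS:
            _, reason = recode(row.get(item))
            counts[item][reason] += 1
    return counts


# Made-up responses for illustration only.
rows = [
    {"Extraction": 4, "Modeling": 98, "Visualization": 3,
     "Statistics": 99, "Programming": 5, "Experimentation": 2},
    {"Extraction": 2, "Modeling": 3, "Visualization": None,
     "Statistics": 4, "Programming": 98, "Experimentation": 1},
]
pattern = missing_pattern(rows)
```

Counting the two reasons separately makes it easier to see whether an item is mostly skipped or mostly answered ‘don't know’, which may call for different handling.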
iii) Create three ‘composite scores’ by averaging the items for each scale. Composite 1 should be the average of the variables Extraction and Modeling, Composite 2 should be the average of the variables Experimentation and Statistics, and Composite 3 should be the average of the variables Programming and Visualization.
Summarize and interpret each composite distribution by presenting a box plot for the variable (i.e. a graph of the 5-point summary: the minimum, maximum, median, and lower and upper quartiles), and create histograms showing the distribution of each of your variables.
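A minimal sketch of the composite and 5-point summary calculations follows, using only the Python standard library. It assumes missing items are stored as None and that a composite is left missing when either of its items is missing; that rule, the function names, and the sample rows are assumptions for illustration. Quartile conventions differ between software packages, so the quartiles may not match the course software exactly.

```python
import statistics


def composite(row, items):
    """Average the listed items; return None if any item is missing."""
    values = [row.get(item) for item in items]
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def five_point_summary(scores):
    """Return (min, lower quartile, median, upper quartile, max)."""
    data = sorted(s for s in scores if s is not None)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


# Made-up item scores for illustration only.
rows = [
    {"Extraction": 1, "Modeling": 1},
    {"Extraction": 2, "Modeling": 2},
    {"Extraction": 3, "Modeling": 3},
    {"Extraction": 4, "Modeling": 4},
    {"Extraction": 5, "Modeling": 5},
    {"Extraction": 3, "Modeling": None},  # incomplete case, composite stays None
]
composite_1 = [composite(r, ["Extraction", "Modeling"]) for r in rows]
summary_1 = five_point_summary(composite_1)
```

The same two functions can be reused for Composite 2 and Composite 3 by changing the item names. The box plots and histograms themselves would be drawn with whatever charting tool the course uses.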
iv) Check that each of your three composites is reliable using Cronbach's alpha and interpret your results. Note: we will discuss the concept of reliability on Tuesday 8th, and a video will be uploaded following this class.
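For reference, Cronbach's alpha for k items is k / (k - 1) multiplied by (1 - sum of item variances / variance of the total score). The sketch below applies that formula to a two-item composite. It assumes complete cases only (no missing values) and uses made-up scores; the function name cronbach_alpha is hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = sum(statistics.variance(scores) for scores in items)
    # Total score per respondent: sum across items in the same position.
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - item_variances / statistics.variance(totals))


# Made-up complete-case scores for illustration only.
extraction = [1, 2, 3, 4, 5]
modeling = [2, 1, 4, 3, 5]
alpha_1 = cronbach_alpha([extraction, modeling])
```

With only two items per composite, alpha depends heavily on how strongly the two items correlate, which is worth bearing in mind when interpreting the result.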
v) This is a question about associations between variables. Please examine the correlation between each of your three composite variables and the reported level of job satisfaction, labelled Satisfaction. Choose the most appropriate correlation coefficient, and interpret it.
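If the variables are treated as ordinal, a rank-based coefficient such as Spearman's rho is one candidate; which coefficient is most appropriate is for you to justify. The sketch below computes Spearman's rho as a Pearson correlation of average ranks, using made-up scores. All function names are hypothetical, and it assumes paired, complete observations.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up composite and Satisfaction scores for illustration only.
composite_1 = [1.5, 2.0, 3.5, 4.0, 4.5]
satisfaction = [2, 1, 4, 3, 5]
rho = spearman(composite_1, satisfaction)
```

The coefficient only describes the strength and direction of a monotonic association; a significance test and the substantive interpretation still need to be added.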
vi) This is a question about differences between subgroups of respondents. Examine whether there is any difference in scores on your composites for people who are considered High Potentials and people who are not considered High Potentials. Create an appropriate graph that illustrates your results.
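One way to describe such a subgroup difference without assuming interval-level data is to compare group medians and compute the Mann-Whitney U statistic, sketched below by direct pair counting. This is an illustrative choice rather than the required method. The boolean High Potential flag, the function names, and the scores are hypothetical, and no p-value is computed here.

```python
import statistics


def mann_whitney_u(group_a, group_b):
    """U for group_a: each pair with a > b counts 1, each tie counts 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )


def compare_groups(scores, is_high_potential):
    """Summarize composite scores for High Potentials versus everyone else."""
    high = [s for s, flag in zip(scores, is_high_potential) if flag]
    other = [s for s, flag in zip(scores, is_high_potential) if not flag]
    u = mann_whitney_u(high, other)
    return {
        "median_high": statistics.median(high),
        "median_other": statistics.median(other),
        "u": u,
        # Share of (high, other) pairs where the High Potential score is
        # larger, with ties counted as half a pair.
        "prob_superiority": u / (len(high) * len(other)),
    }


# Made-up composite scores and flags for illustration only.
scores = [3.0, 4.0, 5.0, 1.0, 2.0, 3.0]
flags = [True, True, True, False, False, False]
result = compare_groups(scores, flags)
```

Side-by-side box plots of each composite by High Potential status would be one graph that pairs naturally with these numbers.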
vii) Your colleagues are considering follow-up qualitative research interviews that they say will give a richer perspective on how data science skills have changed for segments. What ethical considerations should they factor into their thinking about a proposed research design?
Attachment: Data Sci Evolution.rar