Summarize and interpret each composite distribution


Reference no: EM132385487

Project Instructions -

This project is based on the Data Science Evolution data set (please see the tutorial 1 instructions for a description of the data file), which has been augmented with two new variables. The first new variable is 'Satisfaction', a rating of agreement with the statement 'I leave work with a sense of achievement each day' on a 1 to 5 scale, where 1 = 'strongly disagree', 2 = 'disagree', 3 = 'neither disagree nor agree', 4 = 'agree', and 5 = 'strongly agree'; a code of 6 means 'don't know'. The second new variable is 'HiPo', which records whether a respondent has been formally identified as a high potential employee (0 = No, 1 = Yes, 6 = Don't know). Make sure to download the assignment version of the data file, DataSciEvolution_A1.csv.

For this assignment, you need to subset your data file. If your student ID number ends in 0, 1, 2, or 3, analyze the subset where industry = 8 (Health); if it ends in 4, 5, or 6, analyze the subset where industry = 11 (Manufacturing); and if it ends in 7, 8, or 9, analyze the subset where industry = 15 (Retail). Because different students analyze different subsets, your answers will differ from those of other students. Please see this week's instructional video for how to subset your data appropriately. Importantly, in this assignment I am looking for your substantive interpretation of the statistical results (i.e. your interpretation and conclusions matter as much as the statistical analysis).
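The subsetting rule above can be sketched in a few lines of Python. This is only an illustration, not the method shown in the instructional video: the column name `industry`, the helper names, the sample rows, and the student ID are all assumptions made for the example.

```python
import csv
import io

# Mapping taken from the instructions: last digit of the student ID -> industry code.
INDUSTRY_BY_LAST_DIGIT = {
    **{digit: 8 for digit in "0123"},   # Health
    **{digit: 11 for digit in "456"},   # Manufacturing
    **{digit: 15 for digit in "789"},   # Retail
}

def industry_for_student_id(student_id):
    """Return the industry code assigned to a student ID (given as a string)."""
    return INDUSTRY_BY_LAST_DIGIT[student_id.strip()[-1]]

def subset_by_industry(rows, industry_code, column="industry"):
    """Keep only the rows whose industry column matches the code.

    The column name 'industry' is an assumption; check the real header.
    """
    return [row for row in rows if row[column].strip() == str(industry_code)]

# Tiny made-up sample standing in for DataSciEvolution_A1.csv.
sample_csv = io.StringIO(
    "industry,Extraction,Modeling\n"
    "8,3,4\n"
    "11,2,5\n"
    "15,4,4\n"
    "8,5,1\n"
)
rows = list(csv.DictReader(sample_csv))

code = industry_for_student_id("20231234")  # hypothetical ID ending in 4
subset = subset_by_industry(rows, code)
print(code, len(subset))
```

To use the real file, replace `sample_csv` with `open("DataSciEvolution_A1.csv", newline="")`.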

Questions -

i) What 'level of measurement' are the 'data science' variables in this data set (i.e. Extraction, Modeling, Visualization, Statistics, Programming, and Experimentation)? How might this impact the analyses you perform?

ii) Undertake data screening and cleaning. Recode any missing values appropriately, and examine the patterns of missing data in your analysis, addressing both 'don't know' and 'missing data' responses. Note: we have a lecture scheduled on missing data analysis on Tuesday 15th, and a video will be uploaded ahead of this class.
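One possible way to keep 'don't know' and blank responses separate while recoding is sketched below. The code 6 for 'don't know' comes from the variable descriptions for Satisfaction and HiPo; treating empty cells and the tokens `NA`/`NaN` as missing is an assumption about how the file marks missing data, and the helper names and sample rows are made up for the illustration.

```python
from collections import Counter

DONT_KNOW = 6                      # stated in the brief for Satisfaction and HiPo
MISSING_TOKENS = {"", "na", "nan"}  # assumption: how blanks appear in the CSV

def recode(raw):
    """Return (value, status); status is 'valid', 'dont_know' or 'missing'."""
    text = (raw or "").strip()
    if text.lower() in MISSING_TOKENS:
        return None, "missing"
    number = int(float(text))
    if number == DONT_KNOW:
        return None, "dont_know"
    return number, "valid"

def missingness_report(rows, columns):
    """Count valid, don't-know and missing responses for each column."""
    return {
        column: Counter(recode(row.get(column))[1] for row in rows)
        for column in columns
    }

# Made-up rows for illustration only.
rows = [
    {"Satisfaction": "4", "HiPo": "1"},
    {"Satisfaction": "6", "HiPo": ""},
    {"Satisfaction": "", "HiPo": "0"},
    {"Satisfaction": "2", "HiPo": "6"},
]
report = missingness_report(rows, ["Satisfaction", "HiPo"])
for column, counts in report.items():
    print(column, dict(counts))
```

Keeping the two statuses apart makes it possible to report them separately before deciding whether both should be treated as missing in later analyses.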

iii) Create three 'composite scores' by creating an average of the items for each scale. Composite 1 should include the average of the variables Extraction and Modeling, Composite 2 should include the average of the variables Experimentation and Statistics, Composite 3 should include the average of the variables Programming and Visualization.

Summarize and interpret each composite distribution by presenting a box plot for the variable (i.e. a graph of the five-number summary: the minimum, lower quartile, median, upper quartile, and maximum), and create histograms showing the distribution of each of your variables.
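A minimal sketch of the composite and summary steps follows, using only the Python standard library. The item pairs come from question iii); returning no score when either item is missing is an assumption (other rules are possible), and the quartiles use the 'inclusive' interpolation method, which may differ slightly from the convention in other statistics software. The sample respondents are made up.

```python
from collections import Counter
from statistics import mean, quantiles

# Item pairs as listed in question iii).
COMPOSITES = {
    "Composite1": ("Extraction", "Modeling"),
    "Composite2": ("Experimentation", "Statistics"),
    "Composite3": ("Programming", "Visualization"),
}

def composite_score(row, items):
    """Average the items; return None if any item is missing (an assumption)."""
    values = [row[item] for item in items]
    if any(value is None for value in values):
        return None
    return mean(values)

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

def histogram_counts(values):
    """Count how often each distinct score occurs, in ascending score order."""
    return dict(sorted(Counter(values).items()))

# Made-up respondents for illustration only.
rows = [
    {"Extraction": 1, "Modeling": 1},
    {"Extraction": 2, "Modeling": 2},
    {"Extraction": 3, "Modeling": 3},
    {"Extraction": 4, "Modeling": 4},
    {"Extraction": 5, "Modeling": 5},
    {"Extraction": 4, "Modeling": None},  # dropped: one item is missing
]
scores = [composite_score(row, COMPOSITES["Composite1"]) for row in rows]
scores = [score for score in scores if score is not None]

print(five_number_summary(scores))
for score, count in histogram_counts(scores).items():
    print(f"{score:>4}: {'#' * count}")  # crude text histogram
```

The five values returned by `five_number_summary` are the ones a box plot draws, and the counts from `histogram_counts` are the bar heights of a histogram with one bar per distinct score.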

iv) Check that each of your three composites is reliable using Cronbach's alpha and interpret your results. Note: we will discuss the concept of reliability on Tuesday 8th, and a video will be uploaded following this class.
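For reference, Cronbach's alpha for k items is k / (k - 1) multiplied by (1 - sum of the item variances / variance of the total score). The sketch below applies that standard formula with sample variances; the function name and the example scores are made up, and it assumes respondents with missing items have already been excluded.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, all covering the same respondents
    in the same order, with no missing values.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]          # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Made-up scores for a two-item composite (four respondents).
item_a = [1, 2, 3, 4]
item_b = [2, 1, 4, 3]
alpha = cronbach_alpha([item_a, item_b])
print(round(alpha, 3))
```

With only two items per composite, alpha depends entirely on how strongly the two items correlate, which is worth keeping in mind when interpreting the result.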

v) This is a question about associations between variables. Please examine the correlation between each of your three composite variables and the reported level of job satisfaction, labelled Satisfaction. Choose the most appropriate correlation coefficient, and interpret it.
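If the variables are judged to be ordinal, a rank-based coefficient such as Spearman's rho is one candidate; the brief leaves the choice to the analyst, so the sketch below is an illustration of that option rather than the expected answer. It computes average ranks for ties and then a Pearson correlation on the ranks. The function names and the example data are made up.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1   # mean of positions start..end, 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up composite scores and Satisfaction ratings (valid responses only).
composite = [2.0, 3.5, 3.5, 4.0, 5.0]
satisfaction = [1, 3, 2, 4, 5]
print(round(spearman_rho(composite, satisfaction), 3))
```

Respondents coded 'don't know' or missing on Satisfaction would need to be excluded before the pairs are passed in.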

vi) This is a question about differences between subgroups of respondents. Examine whether there is any difference in scores on your composites for people who are considered High Potentials and people who are not considered High Potentials. Create an appropriate graph that illustrates your results.
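One way to start on the subgroup comparison is to split the composite scores by HiPo and describe each group. The sketch below also computes a Mann-Whitney U statistic by direct pair counting, which is one rank-based option; the brief does not prescribe a test, and a statistics package would be needed for a p-value. The column name `Composite1`, the helper names, and the rows are made up for the illustration.

```python
from statistics import mean, median

def split_by_group(rows, group_column, score_column):
    """Collect scores per group, skipping rows with a missing group or score."""
    groups = {}
    for row in rows:
        group, score = row[group_column], row[score_column]
        if group is None or score is None:
            continue
        groups.setdefault(group, []).append(score)
    return groups

def mann_whitney_u(first, second):
    """U for `first`: pairs where first > second count 1, ties count 0.5."""
    u = 0.0
    for a in first:
        for b in second:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Made-up rows; HiPo is None where the response was 'don't know' or missing.
rows = [
    {"HiPo": 1, "Composite1": 4.0},
    {"HiPo": 1, "Composite1": 4.5},
    {"HiPo": 1, "Composite1": 5.0},
    {"HiPo": 0, "Composite1": 2.0},
    {"HiPo": 0, "Composite1": 3.0},
    {"HiPo": 0, "Composite1": 3.5},
    {"HiPo": None, "Composite1": 4.0},
]
groups = split_by_group(rows, "HiPo", "Composite1")
for label, scores in sorted(groups.items()):
    print(label, len(scores), mean(scores), median(scores))

u = mann_whitney_u(groups[1], groups[0])
# Share of (HiPo, non-HiPo) pairs in which the HiPo respondent scores higher.
print(u / (len(groups[1]) * len(groups[0])))
```

The per-group score lists in `groups` are also the input a side-by-side box plot would need.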

vii) Your colleagues are considering follow-up qualitative research interviews that they say will give a richer perspective on how data science skills have changed for segments. What ethical considerations should they factor into their thinking about a proposed research design?

Attachment:- Assignment Files.rar

