Generate a histogram for height of the whole sample

Reference no: EM132275636

1002MSC Introduction to Biomedical Data Analysis Assignment, Griffith University, Australia

Part A: Orientation and Introduction to SPSS

Exercise 1 - Exploring SPSS & Preparation of Data Files

Questions -

1. The video tutorial will talk through aspects of important columns in variable view, note them below:

a. Name

b. Type

c. Label

d. Values

e. Measure

2. Enter Data View and scroll down the columns. You will notice that some data are missing. Name TWO reasons this may occur.

3. How many different variables are there in this dataset?

4. How many people have been surveyed in this dataset?

5. For the following variables, decide whether they are quantitative (numerical) or qualitative (categorical). If they are categorical, list the different categories in terms of their value and label (e.g. smokat2: 0 - Non-smoker, 1 - Smoker).

a. Gender

b. Heart Rate

c. Age

d. Eye colour

Part B: Visualising and Describing Data

Exercise 1 - Displaying Data (Histograms)

Q. Generate a histogram for height of the whole sample. Consider this distribution. Describe at least FOUR things you can discern from the data.
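As a rough illustration of what a histogram summarises, the Python sketch below bins a handful of made-up heights (in cm) into 10 cm intervals and prints a text histogram. The data values, the bin width and the function name `bin_counts` are assumptions for illustration only; they do not come from the assignment dataset, and SPSS chooses its own bin widths.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall into each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical heights in cm (not from the assignment data).
heights = [152, 158, 160, 163, 165, 167, 168, 170, 171, 174, 176, 181, 185, 192]

histogram = bin_counts(heights, 10)
for start, count in histogram.items():
    # One '#' per observation in the bin.
    print(f"{start}-{start + 9} cm | {'#' * count}")
```

Reading the printed bars from top to bottom gives the same information as the vertical bars in a chart: where the data cluster, how far they spread, and whether one tail is longer than the other.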

Q. What do you notice about the axes for these histograms? (Hint: look at the range of the vertical axes)

Q. What do you notice about the shape of each of these histograms? Is there a difference in the spread of the data? Where is most of the data clustered? Comment on whether you can observe a gender effect for height within the histogram.

Exercise 2 - Displaying Data (Bar Charts)

Q. Generate bar charts for the following:

a. Simple bar chart for frequency of height.

b. Simple bar chart for mean height of males and females.

c. Clustered bar chart for mean height of males and females, clustered by hair colour.
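The bar heights in charts (b) and (c) are simply group means. A minimal Python sketch of that calculation follows, assuming a small made-up set of records; the record values and the helper name `group_means` are hypothetical and not part of the assignment data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gender, hair colour, height in cm).
records = [
    ("Female", "Brown", 162), ("Female", "Brown", 168),
    ("Female", "Blonde", 165), ("Male", "Brown", 178),
    ("Male", "Brown", 182), ("Male", "Blonde", 174),
]

def group_means(rows, key):
    """Mean height for each group, where key(row) identifies the group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: mean(heights) for group, heights in groups.items()}

# Chart (b): one bar per gender.
by_gender = group_means(records, key=lambda r: r[0])
# Chart (c): one bar per gender and hair colour combination.
by_gender_hair = group_means(records, key=lambda r: (r[0], r[1]))

print(by_gender)
print(by_gender_hair)
```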

Exercise 3 - Displaying Data (Boxplots)

Q. Determine if there are differences between genders by generating boxplots for height and weight as separated by gender (that is, there will be one figure produced - with two boxplots). Fill in the table (attached) using the output produced.

Exercise 4 - Describing Data (Descriptive Statistics)

Q. Generate (via Descriptives) the options as pictured in the screenshot adjacent for the variables age, heart_rate, alco_drinks_wk, height and weight.

a. Examine the minimum and maximum values - are any absurd values present that might indicate errors?

b. How tall was the tallest student, and how short was the shortest, out of how many total students?

c. Are there any extreme values? Comment on any values that you consider extreme. What should you do with such values?

Q. Using the information from the table above, generate the five number summary for male and female height using one of the built-in features of SPSS (that is, you shouldn't have to estimate/guess, or use multiple features - just use one SPSS functionality!). Note: the 'Statistic' row is for you to label which of the 5-number summary statistics you're writing (e.g. minimum etc.).

Q. Do these numbers correspond to your estimations from Exercise 3?
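If you want to cross-check a five-number summary outside SPSS, the Python sketch below shows one way to compute it. The height values and the function name `five_number_summary` are assumptions for illustration. Note that quartile definitions vary: this sketch uses the "inclusive" method of Python's `statistics.quantiles`, and SPSS may use a different percentile definition, so quartiles can differ slightly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return minimum, Q1, median, Q3 and maximum for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "minimum": min(values),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "maximum": max(values),
    }

# Hypothetical heights in cm (not from the assignment data).
heights_cm = [150, 155, 160, 165, 170, 175, 180, 185, 190]
summary = five_number_summary(heights_cm)
print(summary)
```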

Part C: Data Management and Manipulation

Q. What difference(s) do you note between the output generated here (using split file), and that generated using panelling in Exercise 1 of Part B?

Ensure both histograms have the same scale (horizontal and vertical axes same minimum and maximum, same ticks etc). Adjust the scale if they're different (see instructions below). Comment on similarities and differences between male and female height.

Exercise 1 - Recoding age into a categorical variable

Q. Assign the appropriate value labels in the Variable View, and select the appropriate variable type (ordinal). Next, produce a bar chart to illustrate the proportion (%) of students that fall into each of these age categories, and write a brief statement about what this bar chart tells you.

Exercise 2 - Calculating BMI from height and weight (Compute function)

Q. Why do we need to divide height by 100 in the numeric expression? (see screenshot above if necessary)
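A worked example may help here. BMI is defined as weight in kilograms divided by the square of height in metres. The Python sketch below assumes, as the numeric expression suggests, that height is recorded in centimetres and weight in kilograms; the function name `bmi` and the example values are hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index from weight in kg and height in cm."""
    height_m = height_cm / 100  # convert centimetres to metres
    return weight_kg / height_m ** 2

example = bmi(70, 175)
print(round(example, 1))
```

Without the conversion, the same inputs would give 70 / 175² ≈ 0.0023, which is clearly not a plausible BMI.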

Exercise 3 - Examining female-only data for weight and drinking habits (Select Cases)

Q. What numeric value have females been assigned/encoded as in this dataset?

Q. Analyse the relationship between the number of alcoholic drinks consumed per week [alco_drinks_wk] and smoking status (smokers and non-smokers) [smoke_cat], but only in females, using a boxplot.

Exercise 4 - Assumption testing for male and female height

a. Use the Explore function to assess whether height is normally distributed for males and females.

b. Check the assumption of homogeneity of variance, to ensure that the variances are approximately equal between males and females in relation to height. 

SPSS Analysis Assignment - Part 1

This assignment consists of a series of tasks that generally should be completed sequentially. You will need to use the template document provided to compile your assignment into a single document; hereafter, this document is referred to as your "assignment submission".

SPSS ASSIGNMENT PART 1 - TASKS -

Task 1 - Data Preparation

1. Open the dataset called SPSS_Assign_Pt1.sav in SPSS.

2. Go to the Variable View. You will see that the variable names are given, but no variable labels or value labels for categorical variables have been assigned.

3. Enter the variable and/or value labels as per the table in the Information/Background section. These variable/value label assignments must be used in all further analyses.

4. Check that the Measures (or variable types; namely, Scale, Nominal or Ordinal) are correct. If necessary, change these to the correct setting.

5. To demonstrate that you have completed this task correctly, post a screenshot of your Variable View in your task submission (under the Task 1 heading in the Template document). Your screenshot should show the full list of variables, and all columns of the Variable View. It should be readily readable by the markers (check the resolution and size of the screenshot at 100% zoom).

Task 2 - Describing Your Dataset

1. Replace the [.....] fields below with the correct values/quantities, then post your completed sentences:

There are [.....] variables in the dataset. Of these, [.....] are quantitative and [.....] are qualitative.

There are [.....] countries included in the study.

2. Please bold and/or change the colour of the numbers you added.

Hint: the sum of the quantitative and qualitative variables should match the total number of variables in the dataset.

Task 3 - Generating and Interpreting Boxplots

1. Generate a simple boxplot for life expectancy in 1977 (lifeExp77) in each of the categories of the Hemisphere variable (hemisphere), with any outliers/extreme outliers labelled by their Country ID.

Note: you should generate a single image, not a separate graph/image for each of the categories of the hemisphere variable.

2. Paste this boxplot image in your assignment submission.

3. Identify any outlier(s) and/or extreme outlier(s) present: describe in your post how you know these are outliers and/or extreme outliers, and list the Country ID number(s) (not Case Number) of these outliers or extreme outlier(s) you identified.

4. Briefly describe in no more than one to two sentences some of the key observations you can make about these data, based on this boxplot graph (note any symmetry, skewness, any descriptive statistics you can estimate by eye, perhaps comparing these between the two groups etc).
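SPSS boxplots flag a point as an outlier when it lies more than 1.5 box lengths (interquartile ranges) beyond the edge of the box, and as an extreme outlier when it lies more than 3 box lengths beyond it. The Python sketch below applies that rule to made-up values keyed by a hypothetical Country ID. The data, the function name `classify_outliers` and the quartile method are assumptions; SPSS computes the box edges with its own method, so borderline cases may be classified differently.

```python
from statistics import quantiles

def classify_outliers(data):
    """data maps an ID to a value; returns (outlier IDs, extreme outlier IDs)."""
    q1, _, q3 = quantiles(data.values(), n=4, method="inclusive")
    iqr = q3 - q1  # the length of the box
    outliers, extremes = [], []
    for key, value in data.items():
        # Distance below Q1 or above Q3; zero or negative means inside the box.
        distance = max(q1 - value, value - q3)
        if distance > 3 * iqr:
            extremes.append(key)
        elif distance > 1.5 * iqr:
            outliers.append(key)
    return outliers, extremes

# Hypothetical life expectancies (years) keyed by a made-up Country ID.
life_exp = {1: 60, 2: 62, 3: 64, 4: 66, 5: 68, 6: 70,
            7: 72, 8: 74, 9: 76, 10: 95, 11: 30}

outliers, extremes = classify_outliers(life_exp)
print("Outliers:", outliers)
print("Extreme outliers:", extremes)
```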

Task 4 - Generating the Five-number Summary

In Task 3, you generated a simple boxplot for life expectancy in 1977 in each category of the hemisphere variable.

For this task, you need to use SPSS to generate the five-number summary (i.e. the minimum, maximum, median, first quartile and third quartile) in a single table for life expectancy of countries in the Northern hemisphere.

Post the table generated in your assignment submission.

Task 5 - Generating Histograms (Part I)

Use the Graphs menu (not the Explore option) to generate histograms of lifeExp77 across each of the categories of hemisphere. For this question, a single image should be produced, with the histograms stacked in a single column over two rows (not placed side by side in columns).

Insert this image into your assignment submission.

Task 6 - Generating Histograms (Part II)

Here, you will again produce histograms of lifeExp77 across each of the categories of hemisphere. However, for this task, you must use the Split File option to produce separate histogram images for each of the categories being compared.

Post your images of the Northern hemisphere and Southern hemisphere histograms produced. Comment on the following (referring to each group individually, and then compare/contrast across the two groups):

  • Measure(s) of central tendency given on each graph (what is the type of measure of central tendency provided? And what is its value?) - does this match the measure of central tendency shown in the boxplots from task 3? (It may or may not match - just mention whether they are similar or not).
  • Measure(s) of variability given (what is the type of measure of variability provided? And what is its value?).
  • Sample size given.
  • Observed symmetry (or lack thereof) - does this match what you were expecting from your boxplots in task 3?
  • Observed outliers (or lack thereof) - does this match what you were expecting from your boxplots in task 3?
  • Observed mode(s) - exact value not necessary - range of values is acceptable.

Task 7 - Converting a Quantitative Variable to a Qualitative Variable

For this task, you must first convert the lifeExp77 variable into a new categorical age variable (lifeExp_cat), which converts the lifeExp data into two categories:

  • Life expectancy up to and including 60 years old should be reassigned as a '0'
  • Life expectancy greater than 60 years should be recoded into '1'

Appropriate variable labels and value labels should be assigned to this new variable.

In your post, succinctly describe the method you used to complete this task (dot points are fine). Screenshots (maximum of 2) should be used and included in your submission to show the settings/options that you used to generate this new categorical variable, as well as where/how/what you assigned as value labels.

Also, produce a frequency table that shows the number of countries in each of these life expectancy groups. Include this frequency table in your assignment submission.
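The recoding rule and the frequency count can be expressed in a few lines of Python, which may help you check your SPSS result. This is only a sketch of the logic, not the SPSS procedure itself; the sample values and the function name `recode_life_exp` are hypothetical, and the sketch assumes that a missing value stays missing after recoding.

```python
from collections import Counter

def recode_life_exp(years):
    """0 for life expectancy up to and including 60 years, 1 for above 60."""
    if years is None:
        return None  # assumption: missing input stays missing
    return 0 if years <= 60 else 1

# Hypothetical lifeExp77 values; None marks a missing value.
life_exp_77 = [45.2, 60.0, 60.1, 72.5, None, 58.9]
life_exp_cat = [recode_life_exp(v) for v in life_exp_77]

# Frequency table of the non-missing categories.
frequencies = Counter(c for c in life_exp_cat if c is not None)
print(life_exp_cat)
print(dict(frequencies))
```

Note the boundary case: exactly 60.0 falls into category 0, while 60.1 falls into category 1.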

Task 8 - Generating Bar Charts

Using the lifeExp_cat variable you generated in Task 7, produce a clustered bar chart that demonstrates the percentage of countries in the two life expectancy groups for quality of life; that is, the data should be clustered by Human Development Index (HDI).

Insert this clustered bar chart into your assignment submission, then comment on any trends you observe regarding HDI across the two life expectancy groups.

Bonus mark: receive an extra half a mark if you can correctly add data value labels to show the actual percentage (to one decimal place) represented by each bar of only the "low" Human Development Index bars (Note: maximum mark for overall assignment cannot be exceeded by correctly completing this).

Task 9 - Computing a Ratio from Two Quantitative Variables

Create a variable for the rate of population growth (the population in 2007 as a ratio to the population in 1977), with variable name popgrowth, computed from the pop77 and pop07 variables. This ratio should represent the fold-change in population over time.

Produce a table that provides at least the minimum, maximum, mean, standard deviation and sample size for your new variable popgrowth. Include this table in your task submission.
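To make the fold-change idea concrete, the Python sketch below computes the ratio and the requested descriptive statistics for four made-up countries. The population figures are hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Hypothetical populations for four countries (not from the assignment data).
pop77 = [1_000_000, 2_000_000, 500_000, 4_000_000]
pop07 = [1_500_000, 2_000_000, 2_000_000, 2_000_000]

# Fold-change: population in 2007 divided by population in 1977.
popgrowth = [p07 / p77 for p77, p07 in zip(pop77, pop07)]

summary = {
    "N": len(popgrowth),
    "minimum": min(popgrowth),
    "maximum": max(popgrowth),
    "mean": mean(popgrowth),
    "sd": stdev(popgrowth),  # sample standard deviation (n - 1)
}
print(popgrowth)
print(summary)
```

A ratio of 1.0 means no change, values above 1 indicate growth (4.0 is a four-fold increase), and values below 1 indicate a decline.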

Also answer the following (maximum of 3 words necessary for each):

1. If data is missing for 1977 (but not 2007), what value does SPSS generate/calculate for the ratio?

2. If data is missing for 2007 (but not 1977), what value does SPSS generate/calculate for the ratio?

3. If data is missing for both 1977 and 2007, what value does SPSS generate/calculate for the ratio?

4. If 1977 is recorded as zero (instead of missing) for a country (but a non-zero value is present for 2007), what value does SPSS generate/calculate for the ratio?

5. If 2007 is recorded as zero (instead of missing) for a country (but a non-zero value is present for 1977), what value does SPSS generate/calculate for the ratio?

6. If both 1977 and 2007 are recorded as zero (instead of missing) for a country, what value does SPSS generate/calculate for the ratio?

Reflection (you do not need to include these reflections in your assignment submission): based on your answers above, think about what this means in terms of data entry. Consider the implications of entering unknown data (e.g. where a survey respondent doesn't answer a question) as a zero versus not entering it at all. Also consider what the presence of a zero (versus a missing value) would do to statistics like the mean.
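The effect on the mean can be seen with a small Python sketch. The survey responses are made up, and `None` stands in for an unanswered question; this illustrates the arithmetic only and says nothing about how SPSS itself handles the ratio cases above.

```python
from statistics import mean

# Hypothetical survey responses; None marks an unanswered question.
responses = [4, 6, None, 5, None]

# Leaving unanswered questions out of the calculation.
answered = [v for v in responses if v is not None]
mean_excluding_missing = mean(answered)

# Entering unanswered questions as zero instead.
zero_filled = [0 if v is None else v for v in responses]
mean_zero_filled = mean(zero_filled)

print(mean_excluding_missing, mean_zero_filled)
```

The two zeros are treated as genuine observations, so the mean drops from 5 to 3 even though no respondent actually reported a value below 4.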

Before proceeding to the next task, appropriately label this new Population Growth Ratio variable.

Task 10 - Assumption Testing

Use the Explore function to test the assumptions of Homogeneity of Variance, and normality (graphically and numerically) for the life expectancy in 2007 between the Northern and Southern hemisphere.

Post the following to your assignment submission:

  • Tests of Normality table
  • Test of Homogeneity of Variance Table
  • Histograms (with normal curve overlaid)
  • Trended Q-Q plots

If necessary, convert p-values less than 0.001 into scientific notation (do this in SPSS; do NOT do it manually).

Finally, comment on each table/graph as it relates to the assumptions being tested (e.g. describe whether the image suggests the data are normally distributed, and why), and then come to a conclusion about whether the assumption of homogeneity of variance and the assumption of normality (a conclusion for each) have been met or violated overall (when considering all the evidence from this output). Note: each of these assumptions is independent of the other. One can be violated when the other is not, and vice versa.
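For intuition about what a homogeneity of variance test measures, the Python sketch below computes the mean-centred Levene statistic by hand for two made-up groups. The data and the function name `levene_statistic` are hypothetical. The sketch stops at the test statistic: the p-value comes from an F distribution with (k - 1, N - k) degrees of freedom, which is not computed here, so rely on the SPSS output for the actual decision.

```python
from statistics import mean

def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group's mean."""
    # Step 1: replace each value with its absolute deviation from the group mean.
    z = []
    for group in groups:
        centre = mean(group)
        z.append([abs(x - centre) for x in group])

    n_total = sum(len(g) for g in z)
    k = len(z)
    z_bar_groups = [mean(g) for g in z]
    z_bar = mean([v for g in z for v in g])

    # Step 2: one-way ANOVA F ratio on the deviations.
    between = sum(len(g) * (zb - z_bar) ** 2 for g, zb in zip(z, z_bar_groups))
    within = sum((v - zb) ** 2 for g, zb in zip(z, z_bar_groups) for v in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical life expectancies: same mean, very different spread.
north = [70, 72, 74, 76, 78]
south = [60, 67, 74, 81, 88]

w = levene_statistic(north, south)
print(round(w, 3))
```

A larger statistic indicates a bigger difference in spread between the groups; two groups with identical spread give a statistic of zero regardless of how far apart their means are.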

Task 11 - Submit Your Dataset and Output Files

1. Save your assignment file as a PDF document with the file name format as follows: LastNameFirstName_sNumber_Part1.pdf e.g. FozzardNikki_s1234567_Part1.pdf

2. Save any/all changes you've made to your dataset (.sav file).

3. Save your output file(s). If you have more than one output file, ensure each is labelled with the tasks it covers (e.g. "Output Task 1", "Output Task 1-5").

For this task 11, you should end up with at least 3 files to submit: your assignment document (in PDF format), your dataset file, and your output file(s).

Note: these data and output files may not be assigned marks based on their content - you are submitting them in case we need to verify any of your work. There will however be marks assigned for completing this task as instructed.

Attachment:- Assignment Files.rar



Reviews

len2275636

4/5/2019 1:35:13 AM

Please read the student's comment. Start with the Module 2 SPSS workbook; it contains the majority of the assignment. This assignment must be completed individually. In discussing the assignment with your peers (which should be kept to a minimum), your enquiries should be general in nature (about SPSS and not about the assignment questions) and should never reveal the solution to the assignment tasks. No output is to be shared between students either. Failure to comply with these guidelines is a breach of the University’s Academic Integrity policies and will result in further academic penalty according to the Student Academic Misconduct Policy.

len2275636

4/5/2019 1:35:04 AM

Marking Criteria & Feedback - This assignment assesses your ability to use SPSS to correctly generate descriptive and inferential analysis output, as well as your interpretation of this output. The marking criteria used will be based on your ability to: generate the correct output, based on the instructions provided; interpret and communicate your understanding of the output; relate your findings back to the research question or background of the data given; and use statistical terminology correctly.

len2275636

4/5/2019 1:34:54 AM

A small number of marks will also be given to the organisation and presentation of your results, so you should therefore endeavour to make your submissions clear, concise, grammatically correct, and easy to navigate for any reader of your assignment. Your mark will be available via My Marks, and an announcement will be made when this is available. These results will be available prior to the provision of the Part 2 assignment.

len2275636

4/5/2019 1:34:47 AM

Some specific expectations: p-values less than 0.001 should be changed to scientific notation. All variables should have correctly defined variable labels and value labels (with units, if necessary). Axes on graphs should be labelled correctly, with variable labels (not variable names) and units. Images should be resized so that they are clear to a reader of your assignment. Tables should be inserted as images, and not simply copy-pasted (unless explicitly stated otherwise). The instructions that follow have been designed to be very specific about what is required to complete the task. Ensure you read the tasks very carefully.

