STAT6001 Data Wrangling and Visualisation Assignment

Reference no: EM132634293

STAT6001 Data Wrangling and Visualisation - University of Newcastle

Section A - Space Race (all launches since 1957)

For Section A of this assignment you will use Excel and/or PowerBI to prepare the dataset "A1A space race.csv" and to create visualisations that help answer questions about the data. The dataset was sourced from Kaggle, where it was compiled by web scraping, and contains data on all space missions since 1957.

Question 1

a) Create a variable for Country based on the launch location. Document any decisions you make regarding the country of any launches conducted at sea or on islands.

Show a table of the total number of launches by country. Which two countries have the highest number of launches?
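One possible way to derive Country is to take the text after the last comma in the launch location. The sketch below uses Python only to illustrate the logic (the assignment itself calls for Excel and/or PowerBI). The location strings, the function name `country_from_location` and the `SEA_OR_ISLAND_OVERRIDES` mapping are hypothetical and are not taken from the dataset.

```python
# Hypothetical overrides for launches at sea or on islands. Recording each
# decision in one place makes it easy to document in the answer.
SEA_OR_ISLAND_OVERRIDES = {
    "Pacific Ocean": "At sea",  # assumption: sea launches get their own category
}


def country_from_location(location: str) -> str:
    """Return the text after the last comma, applying any documented override."""
    last_part = location.split(",")[-1].strip()
    return SEA_OR_ISLAND_OVERRIDES.get(last_part, last_part)


# Made-up location strings for illustration only.
examples = [
    "Launch Pad 1, Example Space Center, Florida, USA",
    "Example Platform, Pacific Ocean",
]
countries = [country_from_location(loc) for loc in examples]
print(countries)
```

In Excel the same idea can be applied with a text-splitting formula, followed by a manual review of the distinct values that result.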

b) Create a line graph showing the number of launches per year since 1957. According to the graph, what year was the peak?

c) Filter the data to launches in the USA only. Is there any seasonal trend in the timing of launches throughout the year?

d) Create a graph that shows the status of rockets and a graph that shows the status of missions. What proportion of rockets are active and what proportion of missions have been successful?

e) Create a table that shows the number of and total cost of rocket launches by country. Which are the two countries that have spent the most on rocket launches? What issues are there with this comparison?

Question 2

a) Dichotomise mission status into "Successful" and "Not successful". Create a stacked bar chart with heights set to 100% that shows the mission success rates of Russia and the USA.
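The dichotomising step and the percentages behind a 100% stacked bar can be sketched as follows. This is an illustration in Python, not the required Excel/PowerBI workflow. It assumes that a fully successful mission is labelled exactly "Success" and that every other label counts as not successful; the rows are made up.

```python
from collections import Counter


def dichotomise(status: str) -> str:
    """Collapse a mission status into two categories."""
    # Assumption: only the exact label "Success" counts as successful.
    return "Successful" if status.strip() == "Success" else "Not successful"


# Made-up rows: (country, mission status).
rows = [
    ("USA", "Success"),
    ("USA", "Failure"),
    ("Russia", "Success"),
    ("Russia", "Partial Failure"),
    ("Russia", "Success"),
    ("USA", "Success"),
]

counts = Counter((country, dichotomise(status)) for country, status in rows)
totals = Counter(country for country, _ in rows)

# Percentage of each category within a country. These are the segment
# heights of a bar stacked to 100%.
shares = {
    (country, category): 100 * n / totals[country]
    for (country, category), n in counts.items()
}
print(shares)
```

Whether a partial failure should count as "Not successful" is a judgment call and is worth stating in the answer.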

b) Compare the annual number of launches over time for Russia and the USA. What periods of high activity and/or trend(s) do you see in terms of mission launches for the two nations?

Hint: the 'space race' is generally considered to span 1955-1975, and the Cold War between the US and the Soviet Union lasted from 1947 to 1991.

c) Collapse "Russia" and "Kazakhstan" into a single category called "USSR/Russia". How would this affect the results of previous parts of Questions 1 and 2?

Section B - Earthquakes 1965-2016

Import the dataset "A1B earthquakes.csv" into SAS to answer the following questions. The dataset contains the date, time, location, size, and source of significant earthquakes (magnitude 5.5 or higher) recorded by seismograph networks between 1965 and 2016. The data were recorded by the National Earthquake Information Center (NEIC) and made available online by the United States Geological Survey (USGS).

Description of variables
• Latitude - number of degrees north or south of the equator (negative values for southern hemisphere, positive values for northern hemisphere), -90 to +90
• Longitude - number of degrees east or west of the prime meridian (negative values indicate west, positive values indicate east), -180 to +180
• Type - type of seismic event
• Depth - in kilometres, vertical distance below mean sea level
• Depth seismic stations - number of seismic stations that supplied data for the depth measurement
• Magnitude - best available estimate of the size of the seismic event at its source, measured on a (base 10) logarithmic scale
• Magnitude type - type of algorithm used to calculate the magnitude
• Magnitude seismic stations - number of seismic stations that supplied data for the magnitude measurement
• Azimuthal gap - in degrees (0-360), gap between seismic stations. Larger values indicate higher uncertainty in depth and location measurements
• Horizontal distance - in kilometres, indicates uncertainty in the horizontal location measurement
• Status - indicates whether the event has been reviewed for validity by a human or automatically processed by the system.

Questions
For each question part, your answer should include only the necessary SAS output (tables, graphs), together with brief sections of SAS code.

Question 1
a) Explore the variables in the dataset and complete the table below.

For each variable in the table, list the type (e.g., continuous, discrete, ordinal, categorical) and the number of rows missing an entry for that variable. If the variable is categorical or ordinal, list the number of levels; if the variable is continuous or discrete, list the minimum and maximum values.

Variable                   | Variable type | N levels (if categorical) | Min, Max (if numeric) | N missing
Latitude                   |               |                           |                       |
Longitude                  |               |                           |                       |
Type                       |               |                           |                       |
Depth                      |               |                           |                       |
Depth seismic stations     |               |                           |                       |
Magnitude                  |               |                           |                       |
Magnitude_type             |               |                           |                       |
Magnitude seismic stations |               |                           |                       |
Azimuthal_gap              |               |                           |                       |
Horizontal_distance        |               |                           |                       |
Status                     |               |                           |                       |

b) Are there any range errors for the numeric variables? Explain why/why not.
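The logic behind parts a) and b) can be sketched as below: count missing entries, report either the number of levels or the minimum and maximum, and compare the extremes with the valid ranges given in the variable descriptions. The answer itself must be produced in SAS; this Python sketch only illustrates the checks. The function names and sample values are hypothetical, and blank strings are assumed to mean missing.

```python
# Valid ranges stated in the description of variables.
VALID_RANGES = {
    "Latitude": (-90, 90),
    "Longitude": (-180, 180),
    "Azimuthal gap": (0, 360),
}


def summarise_column(values, numeric):
    """Count missing entries, then report min/max (numeric) or number of levels."""
    present = [v for v in values if v not in ("", None)]  # assumption: blank = missing
    summary = {"n_missing": len(values) - len(present)}
    if numeric:
        numbers = [float(v) for v in present]
        summary["min"], summary["max"] = min(numbers), max(numbers)
    else:
        summary["n_levels"] = len(set(present))
    return summary


def has_range_error(name, summary):
    """True if the observed minimum or maximum falls outside the valid range."""
    low, high = VALID_RANGES[name]
    return summary["min"] < low or summary["max"] > high


# Made-up values, not taken from the dataset.
latitude = ["-12.5", "", "45.0", "95.0"]
status = ["Reviewed", "Automatic", "Reviewed", ""]

latitude_summary = summarise_column(latitude, numeric=True)
status_summary = summarise_column(status, numeric=False)
print(latitude_summary, has_range_error("Latitude", latitude_summary))
print(status_summary)
```

Variables without a stated range (for example depth or magnitude) need a reasoned judgment instead of a fixed bound.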

c) Use an appropriate graph and summary statistics to describe the distribution of magnitude.

d) Create a formatted numeric variable that categorises magnitude according to the following classes:

Show your SAS code and a frequency table of magnitude class.

e) Examine the distribution of depth using a histogram.

The depth of earthquakes can be categorised into three zones: shallow earthquakes are 0-70 km deep, intermediate earthquakes are 70-300 km deep, and deep earthquakes are 300-700 km deep.

Create a formatted numeric variable that categorises depth for earthquakes only (not other seismic events that are recorded in the dataset). Show your SAS code and a frequency table of depth zone.

What proportion of earthquakes occur in the Deep zone?
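The depth-zone classification can be sketched as follows, again in Python purely as an illustration of the logic that the SAS format would encode. Two assumptions are made here: a depth exactly on a boundary (70 or 300 km) is assigned to the deeper zone, because the brief does not say which zone owns the boundary values, and earthquakes are identified by the Type label "Earthquake". The records and the "Out of range" label are made up.

```python
def depth_zone(depth_km: float) -> str:
    """Classify depth in km as Shallow (0-70), Intermediate (70-300) or Deep (300-700)."""
    if depth_km < 0 or depth_km > 700:
        return "Out of range"  # hypothetical label for values outside 0-700 km
    if depth_km < 70:
        return "Shallow"
    if depth_km < 300:  # assumption: exactly 70 km counts as Intermediate
        return "Intermediate"
    return "Deep"  # assumption: exactly 300 km counts as Deep


# Made-up records: (event type, depth in km).
events = [
    ("Earthquake", 10.0),
    ("Earthquake", 150.0),
    ("Explosion", 0.0),
    ("Earthquake", 620.0),
    ("Earthquake", 33.0),
]

# Keep earthquakes only, as the question requires.
zones = [depth_zone(depth) for kind, depth in events if kind == "Earthquake"]
deep_share = zones.count("Deep") / len(zones)
print(zones, deep_share)
```

Whichever boundary rule is chosen, it should be stated alongside the frequency table so the proportions can be reproduced.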

f) Examine the relationship between depth zone and magnitude class for earthquakes using a contingency table.

Does magnitude differ by depth zone? Use appropriate summary table(s) and graph(s) to support your conclusion.

Question 2 - Own question

Propose your own question that can be answered by this dataset and investigate the answer using tables and/or charts. Write a summary of your findings (approximately 100-200 words).

For example, you might like to investigate one of these topics:
• Create maps in PowerBI showing the location of earthquakes. Show the depth zones and then the magnitude classes.
• Has the annual number of earthquakes changed over time? What about the average magnitude?
• What are the characteristics of the events that were not earthquakes?
• Investigate the bump in the tail of the distribution of depth.
• Compare one of the error variables (e.g., azimuthal gap, horizontal distance) by whether or not the measurements were verified by a human.

1. Describes a question or topic of interest as it relates to variables in the dataset.
2. Statements are supported by relevant tables or charts as evidence from the data.
3. Refers to specific quantities (counts, percentages, statistics) as part of written answer.
4. Communicates clearly regarding filtered/grouped data or categories when summarising data or making comparisons.
5. Writes with clarity and organisation using report-style language.
