STAT6001 Data Wrangling and Visualisation Assignment

Reference no: EM132634293

STAT6001 Data Wrangling and Visualisation - University of Newcastle

Section A - Space Race (all launches since 1957)

For Section A of this assignment you will use Excel and/or PowerBI to prepare the dataset "A1A space race.csv" and to create visualisations that help answer questions about the data. The dataset was sourced from Kaggle, where it had been compiled by web scraping, and contains data on all space missions since 1957.

Question 1

a) Create a variable for Country based on the launch location. Document any decisions you make regarding the country of any launches conducted at sea or on islands.

Show a table of the total number of launches by country. Which two countries have the highest number of launches?
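
The logic of deriving a country from the launch location can be sketched in Python before it is reproduced in Excel or PowerBI. This is only a minimal sketch, assuming the location is a comma-separated string whose last part names the country or region. The sample strings and the `OVERRIDES` table are hypothetical illustrations of how a decision about sea launches could be documented; they are not values taken from the dataset.

```python
from collections import Counter

# Hypothetical documented decisions for launches at sea or on islands.
# The key and value below are illustrative assumptions only.
OVERRIDES = {
    "Pacific Ocean": "International waters",
}


def country_from_location(location: str) -> str:
    """Return the last comma-separated part of a location string,
    applying any documented override."""
    last_part = location.split(",")[-1].strip()
    return OVERRIDES.get(last_part, last_part)


# Made-up sample rows for illustration.
sample_locations = [
    "Pad A, Example Space Center, Florida, USA",
    "Site 1, Example Cosmodrome, Kazakhstan",
    "Pad B, Example Base, California, USA",
    "Launch Platform, Pacific Ocean",
]

# Table of the number of launches by country, largest first.
launches_by_country = Counter(
    country_from_location(loc) for loc in sample_locations
)
for country, n_launches in launches_by_country.most_common():
    print(country, n_launches)
```

Keeping the overrides in one table makes each decision visible and easy to report alongside the results.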

b) Create a line graph showing the number of launches per year since 1957. According to the graph, what year was the peak?

c) Filter the data to launches in the USA only. Is there any seasonal trend in the timing of launches throughout the year?

d) Create a graph that shows the status of rockets and a graph that shows the status of missions. What proportion of rockets are active and what proportion of missions have been successful?

e) Create a table that shows the number of and total cost of rocket launches by country. Which are the two countries that have spent the most on rocket launches? What issues are there with this comparison?

Question 2

a) Dichotomise mission status into "Successful" and "Not successful". Create a stacked bar chart with heights set to 100% that shows the mission success rates of Russia and the USA.
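
The dichotomisation can be sketched as follows. This is an illustration under stated assumptions: it assumes the status is stored as text and that only the exact value "Success" counts as successful, so any kind of failure falls into "Not successful". The status labels and sample rows are made up for the example and should be checked against the actual dataset.

```python
from collections import defaultdict


def dichotomise_status(status: str) -> str:
    """Map a mission status to 'Successful' or 'Not successful'.

    Assumption: only the value 'Success' (ignoring case and surrounding
    spaces) is treated as successful.
    """
    if status.strip().lower() == "success":
        return "Successful"
    return "Not successful"


# Made-up (country, status) pairs for illustration.
sample_missions = [
    ("USA", "Success"),
    ("USA", "Failure"),
    ("USA", "Success"),
    ("Russia", "Success"),
    ("Russia", "Partial Failure"),
]

# Count missions in each category per country.
counts = defaultdict(lambda: {"Successful": 0, "Not successful": 0})
for country, status in sample_missions:
    counts[country][dichotomise_status(status)] += 1

# Proportion successful per country: the share a 100% stacked bar displays.
success_rate = {
    country: c["Successful"] / (c["Successful"] + c["Not successful"])
    for country, c in counts.items()
}
for country, rate in success_rate.items():
    print(f"{country}: {rate:.1%} successful")
```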

b) Compare the annual number of launches over time for Russia and the USA. What periods of high activity and/or trend(s) do you see in terms of mission launches for the two nations?

Hint: the time period of the 'space race' is generally considered to be 1955-1975, and the Cold War between the US and the Soviet Union spanned from 1947 to 1991.

c) Collapse "Russia" and "Kazakhstan" into a single category called "USSR/Russia". How would this affect the results of previous parts of Questions 1 and 2?

Section B - Earthquakes 1965-2016

Import the dataset 'A1B earthquakes.csv' into SAS to answer the following questions. The dataset contains the date, time, location, size, and source of significant earthquakes (magnitude 5.5 or higher) recorded by seismograph networks between 1965 and 2016. The data were recorded by the National Earthquake Information Center (NEIC) and made available online by the United States Geological Survey (USGS).

Description of variables
• Latitude - number of degrees north or south of the equator (negative values for southern hemisphere, positive values for northern hemisphere), -90 to +90
• Longitude - number of degrees east or west of the prime meridian (negative values indicate west, positive values indicate east), -180 to +180
• Type - type of seismic event
• Depth - in kilometres, vertical distance below mean sea level
• Depth seismic stations - number of seismic stations that supplied data for the depth measurement
• Magnitude - best available estimate of the size of the seismic event at its source, measured on a (base 10) logarithmic scale
• Magnitude type - type of algorithm used to calculate the magnitude
• Magnitude seismic stations - number of seismic stations that supplied data for the magnitude measurement
• Azimuthal gap - in degrees (0-360), gap between seismic stations. Larger values indicate higher uncertainty in depth and location measurements
• Horizontal distance - in kilometres, indicates uncertainty in the horizontal location measurement
• Status - indicates whether the event has been reviewed for validity by a human or automatically processed by the system.

Questions
For each question part your answer should only include necessary SAS output (tables, graphs). You should include brief sections of SAS code.

Question 1
a) Explore the variables in the dataset and complete the table below.

For each variable in the table, list the type (e.g., continuous, discrete, ordinal, categorical) and the number of rows missing an entry for that variable. If the variable is categorical or ordinal list the number of levels; if the variable is continuous or discrete list the minimum and maximum values.

Variable	Variable type	N levels (if categorical)	Min, Max (if numeric)	N missing
Latitude
Longitude
Type
Depth
Depth seismic stations
Magnitude
Magnitude_type
Magnitude seismic stations
Azimuthal_gap
Horizontal_distance
Status
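
The quantities requested in the table (missing count, number of levels, minimum and maximum) can be sketched in Python to show what each column of the table means. The assignment itself requires SAS output; this is only an illustration of the logic. It assumes missing entries are read in as empty strings, and the helper name `summarise_column` and the sample rows are hypothetical.

```python
def summarise_column(rows, name):
    """Summarise one variable from a list of row dictionaries.

    Returns the number of missing entries, plus min and max when every
    present value is numeric, or the number of distinct levels otherwise.
    Assumption: a missing entry is stored as an empty string or None.
    """
    values = [row.get(name, "") for row in rows]
    present = [v for v in values if v not in ("", None)]
    summary = {"n_missing": len(values) - len(present)}
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # At least one value is not numeric: treat the variable as categorical.
        summary["n_levels"] = len(set(present))
    else:
        if numbers:
            summary["min"] = min(numbers)
            summary["max"] = max(numbers)
    return summary


# Made-up rows for illustration.
sample_rows = [
    {"Type": "Earthquake", "Depth": "10.5"},
    {"Type": "Explosion", "Depth": ""},
    {"Type": "Earthquake", "Depth": "600"},
]

print(summarise_column(sample_rows, "Type"))
print(summarise_column(sample_rows, "Depth"))
```

The sketch cannot decide whether a numeric variable is continuous or discrete, or whether a text variable is ordinal; that classification still needs judgement based on the variable descriptions.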

b) Are there any range errors for the numeric variables? Explain why/why not.
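
A range check compares each value with the limits given in the description of variables. The sketch below uses the three ranges stated there (latitude, longitude and azimuthal gap); the dictionary keys, the function name `range_errors` and the sample rows are assumptions made for illustration.

```python
# Valid ranges taken from the description of variables.
VALID_RANGES = {
    "Latitude": (-90.0, 90.0),
    "Longitude": (-180.0, 180.0),
    "Azimuthal gap": (0.0, 360.0),
}


def range_errors(rows):
    """Return (row index, variable, value) for every out-of-range value.

    Missing values (None) are skipped because they are not range errors.
    """
    errors = []
    for index, row in enumerate(rows):
        for name, (low, high) in VALID_RANGES.items():
            value = row.get(name)
            if value is None:
                continue
            if not low <= value <= high:
                errors.append((index, name, value))
    return errors


# Made-up rows for illustration; the second row has an impossible latitude.
sample_rows = [
    {"Latitude": -33.0, "Longitude": 151.7, "Azimuthal gap": 40.0},
    {"Latitude": 95.2, "Longitude": -120.0, "Azimuthal gap": None},
]

print(range_errors(sample_rows))
```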

c) Use an appropriate graph and summary statistics to describe the distribution of magnitude.

d) Create a formatted numeric variable that categorises magnitude according to the following classes:

Show your SAS code and a frequency table of magnitude class.

e) Examine the distribution of depth using a histogram.

The depth of earthquakes can be categorised into three zones. Shallow earthquakes are between 0 and 70km deep; intermediate earthquakes, 70-300 km deep; and deep earthquakes, 300-700 km deep.

Create a formatted numeric variable that categorises depth for earthquakes only (not other seismic events that are recorded in the dataset). Show your SAS code and a frequency table of depth zone.
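
The zoning rule can be sketched in Python to make the boundary decisions explicit; in SAS the same grouping would typically be applied through a user-defined format. The zone definitions above do not say which zone the boundary values 70 km and 300 km belong to, so this sketch assumes they fall in the deeper zone. It also assumes earthquakes are identified by the Type value "Earthquake". The sample events are made up.

```python
from collections import Counter


def depth_zone(depth_km: float) -> str:
    """Classify a depth in kilometres into a depth zone.

    Assumption: boundary values belong to the deeper zone, i.e.
    Shallow is [0, 70), Intermediate is [70, 300), Deep is [300, 700].
    Depths outside 0-700 km are labelled 'Unclassified'.
    """
    if 0 <= depth_km < 70:
        return "Shallow"
    if 70 <= depth_km < 300:
        return "Intermediate"
    if 300 <= depth_km <= 700:
        return "Deep"
    return "Unclassified"


# Made-up (type, depth) pairs for illustration.
sample_events = [
    ("Earthquake", 10.0),
    ("Earthquake", 70.0),
    ("Earthquake", 350.0),
    ("Earthquake", 650.0),
    ("Nuclear Explosion", 0.0),
]

# Keep earthquakes only, then tabulate the depth zones.
zone_counts = Counter(
    depth_zone(depth) for kind, depth in sample_events if kind == "Earthquake"
)
n_earthquakes = sum(zone_counts.values())
deep_proportion = zone_counts["Deep"] / n_earthquakes

print(dict(zone_counts))
print(f"Proportion in Deep zone: {deep_proportion:.1%}")
```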

What proportion of earthquakes occur in the Deep zone?

f) Examine the relationship between depth zone and magnitude class for Earthquakes using a contingency table.

Does magnitude differ by depth zone? Use appropriate summary table(s) and graph(s) to support your conclusion.

Question 2 - Own question

Propose your own question that can be answered by this dataset and investigate the answer using tables and/or charts. Write a summary of your findings (approximately 100-200 words).

For example, you might like to investigate one of these topics:
• Create maps in PowerBI showing the location of earthquakes. Show the depth zones and then the magnitude classes.
• Has the annual number of earthquakes changed over time? What about the average magnitude?
• What are the characteristics of the events that were not earthquakes?
• Investigate the bump in the tail of the distribution of depth.
• Compare one of the error variables (e.g., azimuthal gap, horizontal distance) by whether or not the measurements were verified by a human.

1. Describes a question or topic of interest as it relates to variables in the dataset.
2. Statements are supported by relevant tables or charts as evidence from the data.
3. Refers to specific quantities (counts, percentages, statistics) as part of written answer.
4. Communicates clearly regarding filtered/grouped data or categories when summarising data or making comparisons.
5. Writes with clarity and organisation using report-style language.

