Find the average price of a listing by room type

Reference no: EM132393635

BUS5DWR Data Wrangling and R Assignment - La Trobe University, Australia

The purpose of this assignment is to develop and assess your skills in R programming including summarising, wrangling and plotting data. Using the tidyverse package is recommended but not compulsory. Please read through the entire assignment and understand the submission format and marking rubrics before starting.

Part 1 -

The spreadsheet titled 'sports_and_recreation.xlsx' details the location and the types of sport played, as well as the condition, age and other details of sports and recreational facilities in Victoria. You will see that it is far from being ready for analysis and needs to be 'wrangled'. Additionally, a few errors have been deliberately introduced; these will need to be found through initial analysis and corrected.

1. Explain why the data in its current form is not considered to be in 'tidy' format.

2. Write R code to read in the data, manipulate it and output the result to a single csv file having the following header row. Each row should provide the details for a single sport within a sports and recreation facility (ignore cases where the sport is unspecified).

facility_ID,facility_name,street_no,street_name,street_type,suburb_town,postcode,LGA,latitude,longitude,sports_played,number_field_courts,field_surface_type,facility_age,facility_condition,facility_upgrade_age

Your code will have the following sections (not necessarily in the order given and the process may be iterative as you find more things to do). Please include comments in the code to separate each segment and explain your steps.

a) Read the data into a dataframe (or tibble).

b) Observe the layout of the data and describe any issues you encounter in terms of missing or duplicated data. Make the necessary modification of rows for consistency.

c) Write a function that takes in a facility ID and outputs a dataframe with one or more rows, where each row is specific to the details of a single sport within a facility.

d) Apply the function to each facility ID and then combine the rows to form a single dataframe.

e) Split the (latitude, longitude) attribute into two separate columns.
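One way step e) might be approached is sketched here in base R. The example strings and the "(lat, long)" layout are assumptions made for illustration; the actual column in the spreadsheet may be formatted differently.

```r
# Hypothetical coordinate strings; the real layout is an assumption.
coords <- c("(-37.8136, 144.9631)", "(-36.7570, 144.2794)")

# Drop brackets and spaces, then split each string on the comma.
clean <- gsub("[() ]", "", coords)
parts <- strsplit(clean, ",", fixed = TRUE)

# Take the first and second piece of each split as numeric columns.
latitude  <- as.numeric(sapply(parts, `[`, 1))
longitude <- as.numeric(sapply(parts, `[`, 2))

result <- data.frame(latitude, longitude)
print(result)
```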

f) Split the address attribute into its components as listed above (note that not all addresses are complete so there may be more than one case to consider).

i. Note that the street number may have two words such as "Lot 1A" but a street number is assumed to have at least one digit character.

ii. Note that the street type may have two words such as "Street North".

iii. Assume the suburb/town attribute is the part of the address in capitals before the postcode.
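The rules in f) can be sketched as a token-based parser in base R. This is only an illustration under assumptions: `parse_address` is a hypothetical helper, the sample addresses are invented, and the street number is assumed to end within the first two words. It does not attempt to separate street name from street type, which would probably need a list of known street types.

```r
# Hypothetical helper: splits one address string into parts.
parse_address <- function(addr) {
  tok <- strsplit(trimws(addr), "\\s+")[[1]]
  n <- length(tok)

  # Postcode: a final token made of exactly four digits.
  postcode <- NA_character_
  if (n > 0 && grepl("^[0-9]{4}$", tok[n])) {
    postcode <- tok[n]
    tok <- tok[-n]
  }

  # Suburb/town: the trailing run of all-capital tokens.
  is_caps <- grepl("^[A-Z'-]+$", tok)
  k <- length(tok)
  while (k > 0 && is_caps[k]) k <- k - 1
  suburb_town <- if (k < length(tok)) {
    paste(tok[(k + 1):length(tok)], collapse = " ")
  } else {
    NA_character_
  }
  tok <- tok[seq_len(k)]

  # Street number: leading tokens up to the first one containing a digit
  # (assumed to occur within the first two tokens, e.g. "Lot 1A").
  digit_pos <- which(grepl("[0-9]", tok))
  street_no <- NA_character_
  if (length(digit_pos) > 0 && digit_pos[1] <= 2) {
    street_no <- paste(tok[seq_len(digit_pos[1])], collapse = " ")
    tok <- tok[-seq_len(digit_pos[1])]
  }

  # Whatever is left is the street name plus street type, left unsplit here.
  street <- if (length(tok) > 0) paste(tok, collapse = " ") else NA_character_

  data.frame(street_no, street, suburb_town, postcode,
             stringsAsFactors = FALSE)
}

full    <- parse_address("Lot 1A Smith Street North RESERVOIR 3073")
partial <- parse_address("MILDURA 3500")
print(rbind(full, partial))
```

Incomplete addresses simply yield NA for the missing parts, which is one way of handling the "more than one case" mentioned in f).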

g) Include the above header row for the dataframe.

h) Do a summary of the dataframe to look for unusual values, then correct them until satisfied with the result.

i) Sort the dataframe by facility_ID, then sports_played.

j) Write the result to a csv file.

3. How many facilities offer softball with at least 5 fields?

Part 2 -

The online hospitality company Airbnb has made publicly available a number of datasets. This part of the assignment makes use of the listings.csv dataset. It consists of a number of parameters related to properties available for lodging in the Melbourne metropolitan area.

Write R code to answer the following.

1. Find the average price of a listing by room type.
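A grouped mean of this kind could be sketched in base R as follows. The toy data frame stands in for listings.csv, and the column names `room_type` and `price` (with price already numeric) are assumptions about that file.

```r
# Toy stand-in for listings.csv; column names are assumed.
listings <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Entire home/apt",
                "Shared room", "Private room"),
  price = c(200, 80, 100, 30, 60),
  stringsAsFactors = FALSE
)

# Mean price within each room type.
avg_price <- aggregate(price ~ room_type, data = listings, FUN = mean)
print(avg_price)
```

With the tidyverse, the same result would typically come from a group-by followed by a summarise.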

2. How many listings contain the following combinations of words (upper or lower case or mixed, either order) in the name column?

a. Spacious, cosy

b. Vibrant, bright
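A case-insensitive, order-independent match can be sketched by testing each word separately and combining the results. `has_both` is a hypothetical helper and the names are invented. Note that this matches substrings, so "cosy" would also match inside a longer word; whether that is acceptable is a judgment call.

```r
# Hypothetical helper: TRUE where both words occur, in any order and any case.
has_both <- function(x, w1, w2) {
  x <- tolower(x)
  grepl(tolower(w1), x, fixed = TRUE) & grepl(tolower(w2), x, fixed = TRUE)
}

# Invented listing names for illustration.
listing_names <- c("Spacious and COSY flat", "Cosy, spacious studio",
                   "Spacious loft", "Bright room",
                   "Vibrant, BRIGHT apartment", NA)

n_spacious_cosy  <- sum(has_both(listing_names, "spacious", "cosy"))
n_vibrant_bright <- sum(has_both(listing_names, "vibrant", "bright"))
print(c(n_spacious_cosy, n_vibrant_bright))
```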

3. How many listings had a last review date within the 14 days up to and including 6 January 2018?
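The date window might be sketched as below. The reading that "14 days" means 14 calendar dates ending on 6 January 2018 (so starting 24 December 2017) is an assumption, and the dates are invented.

```r
end_date   <- as.Date("2018-01-06")
start_date <- end_date - 13   # 14 calendar dates, both ends included

# Invented last-review dates; NA marks a listing with no reviews.
last_review <- as.Date(c("2018-01-06", "2017-12-24", "2017-12-23",
                         "2018-01-07", NA))

in_window <- !is.na(last_review) &
  last_review >= start_date & last_review <= end_date
n_recent <- sum(in_window)
print(n_recent)
```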

4. How many host names (not host ids) are mentioned between 10 and 20 times (inclusive) in the listings?
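Counting names by frequency could be sketched with a frequency table. The host names are invented, and `host_name` as a column name is an assumption about listings.csv.

```r
# Invented host names; missing names are left out of the table by default.
host_name <- c(rep("Anna", 12), rep("Ben", 3), rep("Chris", 20),
               rep("Dana", 21), NA)

counts <- table(host_name)

# Number of distinct names appearing between 10 and 20 times inclusive.
n_hosts <- sum(counts >= 10 & counts <= 20)
print(n_hosts)
```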

5. List the five top neighbourhoods sorted in decreasing order by average number of reviews. In a third column show the average value of availability_365 for that neighbourhood.
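A possible base R sketch aggregates both averages at once and then sorts. The toy data and the column names `neighbourhood` and `number_of_reviews` are assumptions; `availability_365` is named in the question.

```r
# Toy stand-in for listings.csv; values are invented.
listings <- data.frame(
  neighbourhood     = c("A", "A", "B", "C", "D", "E", "F"),
  number_of_reviews = c(10, 20, 50, 5, 30, 1, 40),
  availability_365  = c(100, 200, 300, 50, 10, 365, 0),
  stringsAsFactors  = FALSE
)

# Mean of both variables per neighbourhood
# (rows with a missing value in either column are dropped).
by_nbhd <- aggregate(cbind(number_of_reviews, availability_365) ~ neighbourhood,
                     data = listings, FUN = mean)

# Sort by average reviews, largest first, and keep the top five.
top5 <- head(by_nbhd[order(-by_nbhd$number_of_reviews), ], 5)
print(top5)
```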

6. Write a function that takes a listing id as input and calculates the (direct) distance in kilometres between the location of the listing and the Melbourne CBD, which has latitude/longitude coordinates (-37.8136°, 144.9631°). Look up and use the haversine formula to perform this, using an earth radius of 6370 km (you may use an R package). Hence determine the number of listings that are between 5 and 10 km from the Melbourne CBD.
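The haversine formula can be written directly in base R, as in this sketch. The function names, the toy listings, and the column names `id`, `latitude` and `longitude` are assumptions; the radius and CBD coordinates come from the question.

```r
# Great-circle distance in km between two points given in degrees.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6370) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

cbd_lat <- -37.8136
cbd_lon <- 144.9631

# Hypothetical wrapper: distance from one listing (looked up by id) to the CBD.
distance_to_cbd <- function(listing_id, listings) {
  row <- listings[listings$id == listing_id, ]
  haversine_km(row$latitude, row$longitude, cbd_lat, cbd_lon)
}

# Invented listings: at the CBD, roughly 7 km south, roughly 21 km south.
listings <- data.frame(
  id        = c(1, 2, 3),
  latitude  = c(-37.8136, -37.8764, -38.0000),
  longitude = c(144.9631, 144.9631, 144.9631)
)

dist_km <- sapply(listings$id, distance_to_cbd, listings = listings)
n_between <- sum(dist_km >= 5 & dist_km <= 10)
print(round(dist_km, 2))
print(n_between)
```

Because `haversine_km` is vectorised, on the full dataset it could also be applied to the whole latitude and longitude columns at once rather than id by id.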

7. Suppose somebody wants to choose a listing based on the following criteria. Write one or more functions that take a listing id as input and calculate a score that is the sum of the points below:

a. Points for the location: 50 × (10 minus the haversine distance in kilometres to the nearest table tennis facility), but not less than zero (this uses the sports_played and location attributes in the cleaned dataset from Part 1).

b. Points for price: (300 minus price) but not less than zero.

c. Points for the room type: 100 for Entire home/apt, 50 for Private room, 0 for Shared room.

d. Points for availability: (availability_365) divided by 5.

e. Points for popularity based on the number of reviews: 100 if the number of reviews is in the top quartile, 0 if it is below the median, 50 otherwise.

Which id has the highest score according to the above system?
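The five scoring rules could be combined along the following lines. This is a sketch under assumptions: `score_listing` is a hypothetical function, both data frames are invented, the column names other than `availability_365` and `sports_played` are guesses, and "top quartile" is read as at or above the 75th percentile as computed by R's default `quantile` method.

```r
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6370) {
  to_rad <- pi / 180
  a <- sin((lat2 - lat1) * to_rad / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) *
    sin((lon2 - lon1) * to_rad / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

score_listing <- function(listing_id, listings, facilities) {
  row <- listings[listings$id == listing_id, ]

  # a. Location: distance to the nearest table tennis facility.
  tt <- facilities[tolower(facilities$sports_played) == "table tennis", ]
  nearest_km <- min(haversine_km(row$latitude, row$longitude,
                                 tt$latitude, tt$longitude))
  location_pts <- max(0, 50 * (10 - nearest_km))

  # b. Price and c. room type.
  price_pts <- max(0, 300 - row$price)
  room_pts <- switch(as.character(row$room_type),
                     "Entire home/apt" = 100, "Private room" = 50, 0)

  # d. Availability.
  avail_pts <- row$availability_365 / 5

  # e. Popularity relative to the median and third quartile of all listings.
  q <- quantile(listings$number_of_reviews, c(0.5, 0.75), na.rm = TRUE)
  pop_pts <- if (row$number_of_reviews >= q[[2]]) 100 else
    if (row$number_of_reviews < q[[1]]) 0 else 50

  location_pts + price_pts + room_pts + avail_pts + pop_pts
}

# Invented data for illustration.
facilities <- data.frame(
  sports_played = c("Table Tennis", "Tennis"),
  latitude = c(-37.8136, -37.9000), longitude = c(144.9631, 145.0000),
  stringsAsFactors = FALSE
)
listings <- data.frame(
  id = c(1, 2, 3),
  latitude = c(-37.8136, -38.0000, -37.8136),
  longitude = c(144.9631, 144.9631, 144.9631),
  price = c(100, 350, 250),
  room_type = c("Entire home/apt", "Shared room", "Private room"),
  availability_365 = c(365, 0, 100),
  number_of_reviews = c(40, 0, 10),
  stringsAsFactors = FALSE
)

scores <- sapply(listings$id, score_listing,
                 listings = listings, facilities = facilities)
best_id <- listings$id[which.max(scores)]
print(scores)
print(best_id)
```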

Part 3 -

Write a short report describing the two (processed) datasets from parts 1 and 2 through tables and plots with R including the following:

  • A summary of the variables using the summarytools package. Include at least two observations you feel are worth noting.
  • A histogram showing the distribution of the distance of listings from the Melbourne CBD (using the result from Part 2). Use a bin width of 1km.
  • A single map combining a subset of the data from each of the two datasets (e.g. it could be in a single LGA). You will need to use an R package that can map geospatial data.
  • Choose one sport and do a visualisation of your choice including one or more of the following variables from the dataset of Part 1.

NumberFieldCourts FieldSurfaceType FacilityAge FacilityCondition FacilityUpgradeAge

Point out any interesting patterns (e.g. trends) you see from your plots.

Attachment:- Data Wrangling Assignment Files.rar
