CS 5834 Intro to Urban Computing - Assignment Problem

Reference no: EM132378054

CS 5834 : Intro to Urban Computing

NYC Taxi Data Analysis and Modeling

In this homework, you will process taxi data collected from New York City, use regression models to predict the trip fare amount, and use different classification models to predict whether the tip rate was below 20% or at least 20%.

Problem 1. Download and process data.

1. The NYC taxi data can be found

In this data, the yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. You are free to choose a yellow taxi trip sheet (in CSV format) from any month of 2017 for your homework.

2. Randomly sample 10,000 trip records to solve Problems 2 and 3.

3. Create a dataset with the following attributes:

a. VendorID

b. Day/night. Convert 'tpep_pickup_datetime' to 1 (day) or 0 (night).

c. Passenger_count

d. Trip_distance

e. PULocationID

f. DOLocationID

g. Payment_type

h. Payment_type_cat1, Payment_type_cat2, ...

Note: Convert 'payment_type' to dummy variables.

i. Fare_amount

j. Tip_amount

k. Tip_rate_20.

First, calculate 'tip_rate' as 'tip_rate' = 'Tip_amount' / 'Fare_amount'. Second, if 'tip_rate' < 0.2, set 'Tip_rate_20' = 0; otherwise, set it to 1.

4. Save the dataset as a CSV file. The first line of the CSV file should contain the attribute names listed in step 3.

5. Plot the distributions of Fare_amount and Tip_amount.
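The steps above can be sketched with pandas as follows. This is only an illustration under stated assumptions, not a reference solution: the lower-case input column names are assumed to match the 2017 yellow taxi CSV header, "day" is assumed to mean a pickup hour from 6:00 up to 18:00 (the assignment does not define it), trips with a non-positive fare are dropped so that tip_rate is well defined, and build_dataset and the four demo rows are hypothetical.

```python
import pandas as pd


def build_dataset(trips: pd.DataFrame, n_samples: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Sample trips and derive attributes a-k (hypothetical helper)."""
    # Assumption: drop non-positive fares so Tip_amount / Fare_amount is defined.
    trips = trips[trips["fare_amount"] > 0]
    sample = trips.sample(n=min(n_samples, len(trips)), random_state=seed)

    pickup_hour = pd.to_datetime(sample["tpep_pickup_datetime"]).dt.hour
    out = pd.DataFrame({
        "VendorID": sample["VendorID"],
        # Assumption: day = pickup hour in [6, 18); everything else is night.
        "Day_night": ((pickup_hour >= 6) & (pickup_hour < 18)).astype(int),
        "Passenger_count": sample["passenger_count"],
        "Trip_distance": sample["trip_distance"],
        "PULocationID": sample["PULocationID"],
        "DOLocationID": sample["DOLocationID"],
        "Payment_type": sample["payment_type"],
    })
    # One dummy column per payment type: Payment_type_cat1, Payment_type_cat2, ...
    dummies = pd.get_dummies(
        sample["payment_type"], prefix="Payment_type_cat", prefix_sep=""
    ).astype(int)
    out = out.join(dummies)
    out["Fare_amount"] = sample["fare_amount"]
    out["Tip_amount"] = sample["tip_amount"]
    tip_rate = out["Tip_amount"] / out["Fare_amount"]
    out["Tip_rate_20"] = (tip_rate >= 0.2).astype(int)
    return out


# Made-up rows standing in for a real trip sheet.
demo = pd.DataFrame({
    "VendorID": [1, 2, 1, 2],
    "tpep_pickup_datetime": [
        "2017-03-01 08:15:00", "2017-03-01 23:40:00",
        "2017-03-02 12:00:00", "2017-03-02 02:30:00",
    ],
    "passenger_count": [1, 2, 1, 3],
    "trip_distance": [1.2, 5.4, 0.8, 10.1],
    "PULocationID": [161, 237, 48, 132],
    "DOLocationID": [236, 141, 68, 230],
    "payment_type": [1, 2, 1, 1],
    "fare_amount": [7.0, 19.5, 5.5, 32.0],
    "tip_amount": [1.5, 0.0, 1.0, 8.0],
})
dataset = build_dataset(demo, n_samples=4)

# With no path, to_csv returns the CSV text; pass a filename to write to disk.
csv_text = dataset.to_csv(index=False)
```

With index=False the first line of the CSV is the list of attribute names. For step 5, a histogram such as dataset["Fare_amount"].plot.hist(bins=50) is one option if matplotlib is installed.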

Problem 2. Trip fare amount prediction

1. Build a linear regression model to predict the trip fare amount. You are free to use packages like sklearn or write your own code.

a. Here is a link to the linear regression module of sklearn package.

b. Use attributes b, c, d, e, f, h as input features and attribute i as the output.

c. Evaluate your model with 5-fold cross-validation and report the mean-squared-error (MSE) averaged over the folds, along with its standard deviation. You can use this link to calculate MSE.

2. Similarly, build a KNN regression model to predict the trip fare amount. The model should be evaluated with 5-fold cross-validation. In each fold, 80% of the data should be used for training and 20% for testing.

You must choose the optimal value of K between 1 and 10 based on one half of the testing data, then calculate the MSE on the other half of the testing data with the best K. Finally, report the averaged MSE and standard deviation.

3. Compare the results of the two models.
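One possible way to organize the two evaluations with sklearn is sketched below. It runs on synthetic data rather than the taxi dataset, and several details are assumptions: the folds are shuffled with a fixed seed, the test part of each fold is split into halves by index order, and linear_cv_mse and knn_cv_mse are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor


def linear_cv_mse(X, y, seed=0):
    """Mean and standard deviation of the MSE over 5 folds."""
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    # sklearn reports negated MSE as a score, so flip the sign back.
    mse = -cross_val_score(
        LinearRegression(), X, y, cv=folds, scoring="neg_mean_squared_error"
    )
    return mse.mean(), mse.std()


def knn_cv_mse(X, y, seed=0):
    """Per fold: pick K on one half of the test part, score on the other half."""
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    fold_mse, best_ks = [], []
    for train_idx, test_idx in folds.split(X):
        # Assumption: split the 20% test part into halves by index order.
        half = len(test_idx) // 2
        val_idx, eval_idx = test_idx[:half], test_idx[half:]

        val_mse = {}
        for k in range(1, 11):
            model = KNeighborsRegressor(n_neighbors=k).fit(X[train_idx], y[train_idx])
            val_mse[k] = mean_squared_error(y[val_idx], model.predict(X[val_idx]))
        best_k = min(val_mse, key=val_mse.get)

        model = KNeighborsRegressor(n_neighbors=best_k).fit(X[train_idx], y[train_idx])
        fold_mse.append(mean_squared_error(y[eval_idx], model.predict(X[eval_idx])))
        best_ks.append(best_k)
    return np.mean(fold_mse), np.std(fold_mse), best_ks


# Synthetic stand-in for the feature matrix (attributes b, c, d, e, f, h) and Fare_amount.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.5 + 3.0 * X[:, 0] + rng.normal(0, 1, size=200)

lin_mean, lin_std = linear_cv_mse(X, y)
knn_mean, knn_std, best_ks = knn_cv_mse(X, y)
print(f"Linear regression MSE: {lin_mean:.3f} +/- {lin_std:.3f}")
print(f"KNN regression MSE:    {knn_mean:.3f} +/- {knn_std:.3f} (best K per fold: {best_ks})")
```

Because K is selected on data that is not used for the final score, the reported KNN MSE is not biased by the choice of K, which keeps the comparison with linear regression fair.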

Problem 3. Tip rate classification.

Sample 1,000 trip records from your data, and solve the following problems.

1. Use a KNN model to predict the Tip_rate_20.

a. Set K in KNN to 5.

b. Use attributes b, c, d, h as input features.

c. Use attribute k as class labels.

d. Use Euclidean distance.

e. Run 5-fold cross-validation to evaluate your model.

f. Report precision, recall and F-score of the classification.

g. Please follow this link to KNN in sklearn:

2. Use a Decision Tree to predict the Tip_rate_20.

a. Build a decision tree with attributes b, c, d, g.

b. Use attribute k as class labels.

c. Use 5-fold cross-validation to evaluate your model.

d. Report precision, recall and F-score of the classification.

e. Here is the link to the Decision Tree in sklearn package.
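Both classifiers can be evaluated with the same small routine, sketched below on synthetic data. The assumptions are that folds are shuffled with a fixed seed, that the out-of-fold predictions are pooled before precision, recall and F-score are computed (averaging per-fold metrics is an equally valid reading of the task), and that class 1 is treated as the positive class. cv_precision_recall_f1 is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def cv_precision_recall_f1(model, X, y, seed=0):
    """Precision, recall and F1 from pooled 5-fold out-of-fold predictions."""
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    predicted = cross_val_predict(model, X, y, cv=folds)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y, predicted, average="binary", zero_division=0
    )
    return precision, recall, f1


# Synthetic stand-in for the 1,000-record sample: four features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# K = 5 with Euclidean distance, as specified in items a and d of the KNN task.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
tree = DecisionTreeClassifier(random_state=0)

knn_scores = cv_precision_recall_f1(knn, X, y)
tree_scores = cv_precision_recall_f1(tree, X, y)
print("KNN  precision/recall/F1: %.3f %.3f %.3f" % knn_scores)
print("Tree precision/recall/F1: %.3f %.3f %.3f" % tree_scores)
```

On the real data, X for the KNN model would hold attributes b, c, d, h and X for the tree would hold attributes b, c, d, g, with Tip_rate_20 as y in both cases.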

Problem 4. Subway Services

Suppose you are the CTO for WMATA and are looking to improve your services. If you are not familiar with WMATA, they run the metro system in the greater Washington DC area.

Every traveler buys a metro card and then uses it on automated fare collection systems while both entering and exiting stations. Many hotels and online travel websites also sell the metro card (apart from sales at stations). A major problem you need to solve is to differentiate between tourist trips and normal commuters in your system.

1. Given your knowledge of ML, can you pose this as one of the tasks we have seen before in class? Make sure you clearly describe how you will create your dataset and justify why your setup makes sense.

2. Will your answer change in any way if WMATA collected the fare directly at the entry point only (so no card swipe at exit)?

3. Finally, assuming you have built this ML model to differentiate these commuters, how can you use your knowledge for improving the user experience?

There is no one 'right' answer for the questions above; we are looking to see if you can design well and reason about your choices/responses. Please try to keep your answer brief (3-4 lines) for each question.

Attachment:- NYC Taxi Data Analysis and Modeling.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd