What are possible reasons for the classification tree's failure?

Reference no: EM132286065

The attached dataset, FlightDelays.csv, contains information on all commercial flights departing the Washington, DC area and arriving at New York during January 2004. For each flight, there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The variable that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.


In R, your job is to:

Step 1:

Preprocess the Data:

Transform the day-of-week variable (DAY_WEEK) into a categorical variable.

Bin the scheduled departure time into eight bins (in R use function cut()).
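
These two preprocessing steps might look like the sketch below, run on a small made-up data frame rather than the real file. The column name CRS_DEP_TIME for the scheduled departure time (in hhmm form) is an assumption, since the text names only DAY_WEEK, and all values are invented.

```r
# Toy stand-in for FlightDelays.csv; every value here is invented.
flights <- data.frame(
  DAY_WEEK     = c(1, 2, 3, 4, 5, 6, 7, 1),
  CRS_DEP_TIME = c(600, 730, 955, 1245, 1455, 1710, 1930, 2120)  # assumed column name, hhmm
)

# Treat day of week as a category rather than a number.
flights$DAY_WEEK <- factor(flights$DAY_WEEK)

# Bin scheduled departure time into eight equal-width intervals.
flights$DEP_BIN <- cut(flights$CRS_DEP_TIME, breaks = 8)

str(flights)
```

Passing a single number to `breaks` asks cut() for that many equal-width intervals; a vector of explicit cut points could be used instead if particular time windows matter.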

Use these and all other columns as predictors (excluding DAY_OF_MONTH).

Partition the data into training and validation sets.
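
One common way to partition is a random split of row indices. The sketch below uses a 60/40 split and a fixed seed; both are assumptions, as the assignment does not state a ratio, and the data frame is an invented placeholder.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Invented placeholder for the preprocessed flights data.
flights <- data.frame(
  id      = 1:100,
  delayed = rep(c("ontime", "delayed"), times = c(80, 20))
)

# Draw 60% of the row indices for training; the rest form the validation set.
train_idx <- sample(nrow(flights), size = round(0.6 * nrow(flights)))
train_df  <- flights[train_idx, ]
valid_df  <- flights[-train_idx, ]

c(train = nrow(train_df), valid = nrow(valid_df))
```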

Step 2:

Once you've preprocessed the data, complete the following:

Fit a classification tree to the flight delay variable using all the relevant predictors. Do not include DEP_TIME (actual departure time) in the model because it is unknown at the time of prediction (unless we are generating our predictions of delays after the plane takes off, which is unlikely).

Use a pruned tree with a maximum of 8 levels, setting cp = 0.001. Express the resulting tree as a set of rules.
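
A minimal sketch of such a fit, assuming the rpart package and synthetic training data. The outcome column `delayed`, the bin column `DEP_BIN`, and the data-generating rule are all invented for illustration; only the settings maxdepth = 8 and cp = 0.001 come from the assignment.

```r
library(rpart)  # recommended package shipped with standard R distributions

set.seed(1)
n <- 400
train_df <- data.frame(
  Weather  = rbinom(n, 1, 0.1),
  DAY_WEEK = factor(sample(1:7, n, replace = TRUE)),
  DEP_BIN  = factor(sample(1:8, n, replace = TRUE))
)

# Invented outcome: bad weather always delays, the two latest bins sometimes do.
p_delay <- ifelse(train_df$Weather == 1, 1,
                  ifelse(train_df$DEP_BIN %in% c("7", "8"), 0.5, 0.1))
train_df$delayed <- factor(ifelse(runif(n) < p_delay, "delayed", "ontime"))

# Classification tree limited to 8 levels with complexity parameter 0.001.
fit <- rpart(delayed ~ ., data = train_df, method = "class",
             control = rpart.control(maxdepth = 8, cp = 0.001))

# Each root-to-leaf path is one rule.
leaves <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])
rules  <- path.rpart(fit, nodes = leaves, print.it = FALSE)
print(rules[1])
```

Each element of `rules` lists the split conditions leading to one terminal node, so reading a path together with that leaf's predicted class gives an if-then rule.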

Tell me: If you needed to fly between DCA and EWR on a Monday at 7:00 AM, would you be able to use this tree? What other information would you need? Is it available in practice? What information is redundant? Then:

Fit the same tree as you did initially, this time excluding the Weather predictor. Display both the pruned and unpruned tree. You will find that the pruned tree contains a single terminal node. Then tell me:

  • How is the pruned tree used for classification? (What is the rule for classifying?)
  • To what is this rule equivalent?
  • Examine the unpruned tree. What are the top three predictors according to this tree?
  • Why, technically, does the pruned tree result in a single node?
  • What is the disadvantage of using the top levels of the unpruned tree as opposed to the pruned tree?
  • Compare this general result to that from logistic regression in the example in Chapter 10. What are possible reasons for the classification tree's failure to find a good predictive model?
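
As background for the questions on the single-node tree: a tree with one terminal node assigns every record the most frequent class in the training data. A minimal sketch of that rule on invented labels (the 80/20 class mix is an assumption, not taken from the dataset):

```r
# Invented training labels: most flights are on time.
train_labels <- factor(c(rep("ontime", 80), rep("delayed", 20)))

# The root node's class is simply the most frequent training label.
majority <- names(which.max(table(train_labels)))

# Every new flight gets the same prediction, whatever its predictors are.
predict_single_node <- function(n_new) rep(majority, n_new)

predict_single_node(3)
mean(train_labels != majority)  # training error of this rule
```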

Attachment:- FlightDelays.rar

Verified Expert

Simple linear regression analysis was used to predict the dependent variable from one independent variable. First, the Pearson correlation coefficient was computed to determine whether a significant relationship exists between the two variables. If a significant relationship exists, we then compute the regression equation, which is used to predict the dependent variable from the known value of the independent variable.






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd