What are possible reasons for a classification tree's failure?

Reference no: EM132286065

The attached dataset, FlightDelays.csv, contains information on all commercial flights departing the Washington, DC area and arriving in New York during January 2004. For each flight, there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The variable we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.
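The delay definition is a simple threshold on arrival lateness. As a minimal sketch, assuming scheduled and actual arrival times are stored as HHMM integers on the same day (the function names and time format are hypothetical, and the file may already carry a ready-made status column), the label could be derived like this:

```python
# Hypothetical helper: derive the binary delay label from arrival times.
# Assumption: times are HHMM integers (e.g. 1455 = 14:55) on the same day;
# overnight wrap-around is not handled.
DELAY_THRESHOLD_MIN = 15  # "at least 15 minutes later than scheduled"

def hhmm_to_minutes(hhmm: int) -> int:
    """Convert a clock time such as 1455 to minutes after midnight."""
    return (hhmm // 100) * 60 + hhmm % 100

def is_delayed(scheduled_hhmm: int, actual_hhmm: int) -> bool:
    """True when the actual arrival is 15 or more minutes after the scheduled one."""
    lateness = hhmm_to_minutes(actual_hhmm) - hhmm_to_minutes(scheduled_hhmm)
    return lateness >= DELAY_THRESHOLD_MIN

print(is_delayed(1455, 1510))  # exactly 15 minutes late -> True
print(is_delayed(1455, 1509))  # 14 minutes late -> False
```

Converting to minutes first avoids the HHMM pitfall where, for example, 1259 and 1300 differ by 41 as raw numbers but by only one minute in time.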


In R, your job is to:

Step 1:

Preprocess the Data:

Transform the day-of-week variable (DAY_WEEK) into a categorical variable.

Bin the scheduled departure time into eight bins (in R, use the function cut()).

Use these and all other columns as predictors (excluding DAY_OF_MONTH).

Partition the data into training and validation sets.
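The assignment asks for these steps in R; the sketch below restates them in plain Python so the logic is visible. It assumes rows are dicts, uses SCHED_DEP as a hypothetical placeholder for the scheduled-departure column, bins with equal-width intervals (similar in spirit to R's `cut(x, breaks = 8)`), and assumes a 60/40 training/validation split with a fixed seed; none of these specifics come from the assignment.

```python
import random

N_BINS = 8  # the assignment asks for eight departure-time bins

def equal_width_bins(values, n_bins=N_BINS):
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals.

    Similar in spirit to R's cut(x, breaks = 8). R's cut() uses right-closed
    intervals by default, so a value lying exactly on a boundary may fall in
    the neighbouring bin compared with this left-closed sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def preprocess(rows):
    """Return new records: categorical DAY_WEEK, binned departure, dropped columns."""
    bins = equal_width_bins([r["SCHED_DEP"] for r in rows])  # SCHED_DEP is hypothetical
    out = []
    for row, b in zip(rows, bins):
        # DAY_OF_MONTH is excluded (Step 1); DEP_TIME is unknown before take-off (Step 2).
        rec = {k: v for k, v in row.items()
               if k not in ("DAY_OF_MONTH", "DEP_TIME", "SCHED_DEP")}
        rec["DAY_WEEK"] = str(row["DAY_WEEK"])  # category label, not a number
        rec["DEP_BIN"] = f"bin{b + 1}"
        out.append(rec)
    return out

def partition(rows, train_frac=0.6, seed=1):
    """Shuffle reproducibly and split into (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Invented toy records, only to exercise the functions.
toy = [{"DAY_WEEK": d % 7 + 1, "DAY_OF_MONTH": d + 1, "DEP_TIME": 605 + 100 * d,
        "SCHED_DEP": 600 + 100 * d, "DELAYED": d % 2} for d in range(10)]
clean = preprocess(toy)
train, valid = partition(clean)
print(len(train), len(valid))  # 6 4
```

Storing DAY_WEEK as a string label (in R, a factor) stops a tree from treating "7" as larger than "1" and lets it group arbitrary days together in a single split.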

Step 2:

Once you've preprocessed the data, complete the following:

Fit a classification tree to the flight delay variable using all the relevant predictors. Do not include DEP_TIME (actual departure time) in the model because it is unknown at the time of prediction (unless we are generating our predictions of delays after the plane takes off, which is unlikely).

Use a pruned tree with a maximum of 8 levels, setting cp = 0.001. Express the resulting tree as a set of rules.
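Expressing a tree as rules means writing one IF-THEN statement per path from the root to a leaf, joining the split conditions along the path with AND. In R, the rpart.plot package's `rpart.rules()` can print such rules for an rpart model (assuming a recent version of the package is installed). The Python sketch below shows the same idea on an invented toy tree stored in a hypothetical nested-dict format; its variables and splits are placeholders, not results from FlightDelays.csv.

```python
# Invented toy tree in a hypothetical nested-dict format: internal nodes hold a
# "test" string plus "yes"/"no" children; leaves hold a predicted class.
toy_tree = {
    "test": "x1 <= 0.5",
    "yes": {"leaf": "class_1"},
    "no": {
        "test": "x2 <= 3",
        "yes": {"leaf": "class_0"},
        "no": {"leaf": "class_1"},
    },
}

def tree_to_rules(node, conditions=()):
    """Return one (conditions, predicted_class) pair per root-to-leaf path."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    test = node["test"]
    return (tree_to_rules(node["yes"], conditions + (test,))
            + tree_to_rules(node["no"], conditions + (f"NOT ({test})",)))

for conds, label in tree_to_rules(toy_tree):
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

A tree with k leaves yields exactly k rules, and every record satisfies exactly one of them, so the rule set classifies records the same way the tree does.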

Tell me: If you needed to fly between DCA and EWR on a Monday at 7:00 AM, would you be able to use this tree? What other information would you need? Is it available in practice? What information is redundant? Then:

Fit the same tree as you did initially, this time excluding the Weather predictor. Display both the pruned and unpruned tree. You will find that the pruned tree contains a single terminal node. Then tell me:

  • How is the pruned tree used for classification? (What is the rule for classifying?)
  • To what is this rule equivalent?
  • Examine the unpruned tree. What are the top three predictors according to this tree?
  • Why, technically, does the pruned tree result in a single node?
  • What is the disadvantage of using the top levels of the unpruned tree as opposed to the pruned tree?
  • Compare this general result to that from logistic regression in the example in Chapter 10. What are possible reasons for the classification tree's failure to find a good predictive model?
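A tree pruned down to a single terminal node has no splits, so every record lands in that one leaf and receives the leaf's class. For a classification tree with the default equal misclassification costs, that class is the majority class of the training data. The sketch below illustrates this majority-class ("naive") rule and its validation accuracy on invented toy labels; the function names are hypothetical.

```python
from collections import Counter

def naive_rule(train_labels):
    """Majority class of the training labels; a one-leaf tree predicts this."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predictions, actual):
    """Share of records whose prediction matches the actual class."""
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)

# Invented toy labels: 1 = delayed, 0 = on time.
train_y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
valid_y = [0, 1, 0, 0, 0]
majority = naive_rule(train_y)
preds = [majority] * len(valid_y)  # the same class for every flight
print(majority, accuracy(preds, valid_y))  # 0 0.8
```

The accuracy of such a model is simply the share of the majority class in the validation set. Any tree or logistic regression model has to beat that baseline to be useful.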

Copyright © 2019-2020 ExpertsMind IT Educational Pvt Ltd. All rights reserved.