Display the column names from the loan data set

Reference no: EM131016595

Objectives of this project

Use Random Forests, Neural Networks and Support Vector Machines to predict loan status (default or not).

Understand the difference between in-sample fitting and out-of-sample predictive performance.

Use two cross-validation methods to assess analytic model performance.

1) Load the Loan.csv data set into R. It lists the outcomes of 850 loans. The variables include loan status, credit grade (from excellent to poor), loan amount, loan age (in months), the borrower's interest rate and the debt-to-income ratio. Code loan status as a binary outcome (0 for current loans, 1 for late or default loans). Display the column names from the loan data set. Fit the loan data set using the random forest function. Copy the trained random forest model and the confusion matrix from R and paste them below.
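A minimal R sketch of step 1 follows, assuming the `randomForest` package (the assignment only says "random forest function"). Loan.csv is not reproduced here, so the block generates a synthetic stand-in; the column names and the status labels `Current`, `Late` and `Default` are hypothetical and should be replaced by whatever the real file contains.

```r
library(randomForest)  # assumed package

# With the real file: loans <- read.csv("Loan.csv")
# Synthetic stand-in below; all column names and labels are hypothetical.
set.seed(42)
n <- 850
loans <- data.frame(
  Status        = sample(c("Current", "Late", "Default"), n,
                         replace = TRUE, prob = c(0.8, 0.1, 0.1)),
  Credit.Grade  = factor(sample(c("Excellent", "Good", "Fair", "Poor"), n,
                                replace = TRUE)),
  Amount        = round(runif(n, 1000, 25000)),
  Age           = sample(1:60, n, replace = TRUE),
  Borrower.Rate = runif(n, 0.05, 0.30),
  DTI           = runif(n, 0, 0.6)
)

# 0 = current, 1 = late or default. Stored as a factor so that
# randomForest() performs classification rather than regression.
loans$Status <- factor(ifelse(loans$Status == "Current", 0, 1))

# Display the column names.
names(loans)

# Fit a random forest on all 850 loans.
rf_all <- randomForest(Status ~ ., data = loans)
print(rf_all)      # printed summary includes a confusion matrix
rf_all$confusion   # the same matrix as an object
```

Note that the confusion matrix printed by `randomForest` is based on out-of-bag predictions, so it is not a purely in-sample fit; `table(predict(rf_all, newdata = loans), loans$Status)` would give the in-sample comparison.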

2) Randomly select 750 of the 850 loans as your training sample. Use the remaining 100 loans as your test set. Train a 2nd random forest model using the training set. Apply the 2nd model to the test set to predict loan status. Compare your predictions to the true loan statuses (using the table function). Display the confusion matrix below. Based on this confusion matrix, what is the overall misclassification rate? [10 points]
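The split and the misclassification rate in step 2 can be sketched as below. This is an illustration under assumptions: the `randomForest` package is assumed, and the data frame is a synthetic stand-in with hypothetical column names (`rate`, `dti`, `amount`, `status`) rather than the real Loan.csv.

```r
library(randomForest)  # assumed package

# Synthetic stand-in for Loan.csv; column names are hypothetical.
set.seed(1)
n <- 850
loans <- data.frame(rate = runif(n, 0.05, 0.30), dti = runif(n, 0, 0.6),
                    amount = runif(n, 1000, 25000))
loans$status <- factor(rbinom(n, 1, plogis(-4 + 10 * loans$rate + 3 * loans$dti)))

# Randomly pick 750 row indices for training; the other 100 form the test set.
train_idx <- sample(n, 750)
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# Train on the training set only, then predict the held-out loans.
rf2  <- randomForest(status ~ ., data = train)
pred <- predict(rf2, newdata = test)

# Confusion matrix: rows are predictions, columns are true statuses.
cm <- table(predicted = pred, actual = test$status)
cm

# Overall misclassification rate = off-diagonal count / total count.
misclass_rate <- 1 - sum(diag(cm)) / sum(cm)
misclass_rate
```

Because the test set has exactly 100 loans, the misclassification rate is simply the number of off-diagonal entries divided by 100.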

3) Fit the loan data set using an artificial neural network (ANN). Use six neurons in the hidden layer of the ANN. Set maxit to 1000. Use the table function to compare in-sample predictions to the true loan statuses. Display the confusion matrix below.

4) Use the training sample (750 randomly selected loans) to build a 2nd artificial neural network. Use six neurons in the hidden layer of the ANN. Set maxit to 1000. Use the table function to compare out-of-sample predictions to the true loan statuses (use the remaining 100 loans as your test set). Display the confusion matrix below.
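One way to build the six-neuron network is with the `nnet` package, whose `nnet()` function takes `size` (hidden units) and `maxit` arguments; the assignment does not name the package, so this choice is an assumption. The data below are a synthetic stand-in with hypothetical column names, and the standardisation step is an addition that the assignment does not ask for.

```r
library(nnet)  # assumed package; a recommended package bundled with most R installations

# Synthetic stand-in for Loan.csv; column names are hypothetical.
set.seed(1)
n <- 850
loans <- data.frame(rate = runif(n, 0.05, 0.30), dti = runif(n, 0, 0.6),
                    amount = runif(n, 1000, 25000))
loans$status <- factor(rbinom(n, 1, plogis(-4 + 10 * loans$rate + 3 * loans$dti)))

train_idx <- sample(n, 750)
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# Standardise predictors using training-set statistics only
# (added step; neural networks tend to train poorly on unscaled inputs).
for (v in c("rate", "dti", "amount")) {
  m <- mean(train[[v]])
  s <- sd(train[[v]])
  train[[v]] <- (train[[v]] - m) / s
  test[[v]]  <- (test[[v]]  - m) / s
}

# Six hidden neurons, at most 1000 training iterations.
ann2 <- nnet(status ~ ., data = train, size = 6, maxit = 1000, trace = FALSE)

# type = "class" returns predicted labels as character strings.
pred <- predict(ann2, newdata = test, type = "class")

# Fix the factor levels so the table is always 2 x 2.
cm <- table(predicted = factor(pred, levels = levels(test$status)),
            actual = test$status)
cm
```

For the in-sample comparison in step 3, the same calls apply with the full data set passed as both `data` and `newdata`.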

5) Use the training sample (750 randomly selected loans) to build a support vector machine (SVM) model. Use the table function to compare the SVM's out-of-sample predictions to the true loan statuses (use the remaining 100 loans as your test set). Display the confusion matrix below.
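A sketch of step 5, assuming the `svm()` function from the `e1071` package (the assignment does not say which SVM implementation to use). As in the other sketches, the data are synthetic and the column names hypothetical.

```r
library(e1071)  # assumed package providing svm()

# Synthetic stand-in for Loan.csv; column names are hypothetical.
set.seed(1)
n <- 850
loans <- data.frame(rate = runif(n, 0.05, 0.30), dti = runif(n, 0, 0.6),
                    amount = runif(n, 1000, 25000))
loans$status <- factor(rbinom(n, 1, plogis(-4 + 10 * loans$rate + 3 * loans$dti)))

train_idx <- sample(n, 750)
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# With a factor response, svm() defaults to C-classification with a
# radial kernel and scales the predictors internally.
svm_fit <- svm(status ~ ., data = train)

# Out-of-sample predictions on the 100 held-out loans.
pred <- predict(svm_fit, newdata = test)

cm <- table(predicted = pred, actual = test$status)
cm
```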

6) Randomly shuffle the loan data set. Run 10-fold cross-validation to evaluate the out-of-sample performance of Random Forest, ANN and SVM. Based on your cross-validation results, which model has the best out-of-sample performance? Please briefly explain why.
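The 10-fold procedure in step 6 can be sketched as follows: shuffle the rows, assign each row to one of ten folds of 85 loans, and for each fold train on the other nine and count errors on the held-out fold. The packages (`randomForest`, `nnet`, `e1071`), the synthetic data and the column names are all assumptions made for illustration.

```r
library(randomForest)  # assumed packages
library(nnet)
library(e1071)

# Synthetic stand-in for Loan.csv; column names are hypothetical.
set.seed(1)
n <- 850
loans <- data.frame(rate = runif(n, 0.05, 0.30), dti = runif(n, 0, 0.6),
                    amount = runif(n, 1000, 25000))
loans$status <- factor(rbinom(n, 1, plogis(-4 + 10 * loans$rate + 3 * loans$dti)))

# Standardised on the full data for brevity (added step, mainly for nnet);
# stricter practice would use training-fold statistics only.
num <- c("rate", "dti", "amount")
loans[num] <- lapply(loans[num], function(x) (x - mean(x)) / sd(x))

loans <- loans[sample(n), ]         # random shuffle
fold  <- rep(1:10, length.out = n)  # fold label per row, 85 loans per fold

# One fitting function per model, each taking a training data frame.
fits <- list(
  rf  = function(d) randomForest(status ~ ., data = d),
  ann = function(d) nnet(status ~ ., data = d, size = 6, maxit = 1000,
                         trace = FALSE),
  svm = function(d) svm(status ~ ., data = d)
)

# Return predicted class labels as character strings for any of the models.
predict_class <- function(model, d) {
  if (inherits(model, "nnet")) predict(model, d, type = "class")
  else as.character(predict(model, d))
}

# Cross-validated misclassification rate for each model.
errors <- sapply(fits, function(fit) {
  wrong <- 0
  for (k in 1:10) {
    held_out <- loans[fold == k, ]
    model    <- fit(loans[fold != k, ])
    wrong    <- wrong + sum(predict_class(model, held_out) !=
                            as.character(held_out$status))
  }
  wrong / n
})
errors
```

Every loan is held out exactly once, so each entry of `errors` is an out-of-sample misclassification rate over all 850 loans; the model with the smallest value has the best cross-validated performance on that data.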

7) Run leave-one-out cross-validation to evaluate the performance of the random forest algorithm in predicting loan status. Why does it take much longer to run leave-one-out cross-validation than to run ten-fold cross-validation? Based on the result of your leave-one-out cross-validation, how many loans are misclassified by the random forest model?
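Leave-one-out cross-validation fits one model per observation, so on 850 loans it trains 850 forests (each on 849 loans) where ten-fold cross-validation trains only 10. A minimal sketch, assuming the `randomForest` package: the helper `loocv_misclassified` is a hypothetical name, and the demo runs on only 100 synthetic rows with `ntree = 100` so that it finishes quickly.

```r
library(randomForest)  # assumed package

# Count misclassified rows under leave-one-out cross-validation.
# `data` must contain a factor column named `status` (hypothetical name);
# extra arguments are passed on to randomForest().
loocv_misclassified <- function(data, ...) {
  wrong <- 0
  for (i in seq_len(nrow(data))) {
    # Train on every row except row i, then predict row i alone.
    fit  <- randomForest(status ~ ., data = data[-i, ], ...)
    pred <- predict(fit, newdata = data[i, , drop = FALSE])
    wrong <- wrong + (as.character(pred) != as.character(data$status[i]))
  }
  wrong
}

# Small synthetic demo (100 rows, not the real 850 loans).
set.seed(1)
n <- 100
demo <- data.frame(rate = runif(n, 0.05, 0.30), dti = runif(n, 0, 0.6),
                   amount = runif(n, 1000, 25000))
demo$status <- factor(rbinom(n, 1, plogis(-4 + 10 * demo$rate + 3 * demo$dti)))

n_wrong <- loocv_misclassified(demo, ntree = 100)
n_wrong  # number of misclassified rows out of 100
```

Applied to the full loan data, the returned count is the number of loans misclassified out of 850.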

Attachment: Loan.csv






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd