Research credit scoring variables in loan-delinq-train data

Reference no: EM131073624

Assignment - Written Practical Report

Assignment learning objectives-

1. Demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and the resulting organisational change, and how these apply to the implementation of business intelligence in organisational systems and business processes

2. Identify and solve complex organisational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real world problems

3. Comprehend and address complex ethical dilemmas that arise from evidence based decision making and business performance management

4. Demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management with correct and appropriate acknowledgment of main ideas presented and discussed.

Task 1 is concerned with developing and evaluating a model of key factors for predicting whether customers are likely to forfeit on a loan resulting in a loan delinquency for ACME Bank.

Task 2 is concerned with the key governance issues of security, privacy and ethics in the day to day use of a data warehouse which increasingly will incorporate unstructured big data.

Task 3 is concerned with performance management, and provides you with the opportunity to design and build a visual and interactive crime events dashboard to meet the decision making requirements of the City of San Francisco Police Department's Crime Analysis unit with drill down capability using the Tableau Desktop software.

Task 1-

The goal of Task 1 is to predict the likelihood of a customer forfeiting on a loan for ACME Bank, in other words the likelihood of a loan delinquency. Hence the question we are trying to answer is: is a customer likely to forfeit on their loan and become a loan delinquency for ACME Bank?

In Task 1 of this Assignment you are required to follow the six-step CRISP-DM process and use the data mining tool RapidMiner to analyse and report on the credit scoring training data set loan-delinq-train.csv and the credit scoring test data set loan-delinq-test.csv provided for this assignment. You should refer to the data dictionary for loan-delinq-train.csv (see Table below). In Tasks 1 and 2 of Assignment 4 you are required to consider all of the business understanding, data understanding, data preparation, modelling, evaluation and deployment phases of the CRISP-DM process.


Data dictionary for loan-delinq-train data set variables

| Variable Name | Description | Type |
|---|---|---|
| SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | Y/N |
| RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit, excluding real estate and installment debt such as car loans, divided by the sum of credit limits | percentage |
| age | Age of borrower in years | integer |
| NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years | integer |
| DebtRatio | Monthly debt payments, alimony and living costs divided by monthly gross income | percentage |
| MonthlyIncome | Monthly income | real |
| NumberOfOpenCreditLinesAndLoans | Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) | integer |
| NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due | integer |
| NumberRealEstateLoansOrLines | Number of mortgage and real estate loans, including home equity lines of credit | integer |
| NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years | integer |
| NumberOfDependents | Number of dependents in family excluding themselves (spouse, children etc.) | integer |
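The two "percentage" variables in the dictionary are ratios rather than raw amounts. The minimal sketch below shows the arithmetic as the dictionary describes it; the parameter names for the raw components are hypothetical, since the data set only stores the final ratio. It also shows why values above 1 (over 100%) are arithmetically possible and why zero income makes DebtRatio undefined, both of which are worth checking in the exploratory analysis.

```python
def revolving_utilization(card_balances: float, personal_line_balances: float,
                          total_credit_limits: float) -> float:
    """Unsecured revolving balances divided by the sum of credit limits.

    Hypothetical inputs: real-estate and installment debt (e.g. car loans)
    are assumed to have been excluded before this is called.
    """
    if total_credit_limits <= 0:
        raise ValueError("credit limits must be positive")
    return (card_balances + personal_line_balances) / total_credit_limits


def debt_ratio(monthly_debt_payments: float, monthly_alimony: float,
               monthly_living_costs: float, monthly_gross_income: float) -> float:
    """Monthly debt payments, alimony and living costs divided by monthly gross income."""
    if monthly_gross_income <= 0:
        raise ValueError("monthly gross income must be positive")
    return (monthly_debt_payments + monthly_alimony + monthly_living_costs) / monthly_gross_income


# A borrower who has drawn more than their limits has utilisation above 1 (100%).
print(revolving_utilization(6000.0, 1500.0, 5000.0))  # 1.5
print(debt_ratio(800.0, 0.0, 1200.0, 4000.0))          # 0.5
```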

a) Research credit scoring variables in the loan-delinq-train data set to determine key factors influencing the likelihood that a customer will forfeit on a loan and become a loan delinquency. This will provide you with a business understanding of the dataset you will be analysing in Assignment Task 1. Identify which variable/s can be omitted from your credit scoring and loan forfeit data mining model and why. Comment on your findings in relation to the key factors likely to indicate that a customer will forfeit on a loan and become a loan delinquency for ACME Bank (about 1000 words).

b) Conduct an exploratory analysis of the loan-delinq-train.csv data set. Are there any missing values or variables with unusual patterns? How consistent are the characteristics of the training data set loan-delinq-train.csv with those of the test data set loan-delinq-test.csv? Are there any interesting relationships between the potential predictor variables and your target variable SeriousDlqin2yrs? Is a customer likely to forfeit on a loan and become a loan delinquency? (Hint: identify the variables that will allow you to reduce the data set into a smaller subgroup and a more parsimonious model.) Comment on which key variables in the data set loan-delinq-train.csv might influence differences in the likelihood of loan delinquency occurring for a customer of ACME Bank (About 250 words).
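The first exploratory checks (missing values per column, the overall delinquency rate, and the rate within each value of a candidate predictor) can be sketched with Python's standard csv module. This is an illustration alongside RapidMiner, not a replacement for it. It assumes missing values appear as empty fields or the token "NA" and that the target is coded "1" for delinquent; the dictionary lists the type as Y/N, so check the actual coding first. The sample rows are made up.

```python
import csv
import io
from collections import Counter

# Assumption: missing values appear as empty fields or the token "NA".
MISSING_TOKENS = {"", "NA"}


def is_missing(value):
    """True if a CSV cell is absent, empty or a missing-value token."""
    return value is None or value.strip() in MISSING_TOKENS


def missing_counts(rows):
    """Count missing values per column across a list of csv.DictReader rows."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if column is not None and is_missing(value):
                counts[column] += 1
    return counts


def delinquency_rate(rows, target="SeriousDlqin2yrs", positive="1"):
    """Share of rows whose target equals the positive label (assumed to be "1")."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if (row[target] or "").strip() == positive)
    return hits / len(rows)


def delinquency_rate_by(rows, column, target="SeriousDlqin2yrs", positive="1"):
    """Delinquency rate within each distinct value of `column` (a simple cross-tab)."""
    groups = {}
    for row in rows:
        groups.setdefault((row[column] or "").strip(), []).append(row)
    return {value: delinquency_rate(group, target, positive) for value, group in groups.items()}


# Made-up rows for illustration only; not taken from loan-delinq-train.csv.
SAMPLE = """SeriousDlqin2yrs,age,MonthlyIncome,NumberOfTimes90DaysLate
1,45,9120,0
0,40,NA,0
0,38,3042,1
1,30,,2
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(missing_counts(rows))  # Counter({'MonthlyIncome': 2})
print(delinquency_rate(rows))  # 0.5
print(delinquency_rate_by(rows, "NumberOfTimes90DaysLate"))  # {'0': 0.5, '1': 0.0, '2': 1.0}
```

Running the same functions over both the training and test files gives a quick first view of whether the two data sets have similar missing-value patterns and class balance.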

c) Run a decision tree analysis using RapidMiner. Consider what variables you will want to include in this analysis and report on the results. (Hint: Identify what is your target variable and what are your predictor variables?) Comment on the results of your final decision tree model (About 250 words).
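A decision tree learner chooses each split by scoring candidate cut points with a criterion. RapidMiner's Decision Tree operator lets you pick among several, such as information gain and gain ratio. The sketch below illustrates the idea using plain information gain on one numeric predictor. It is not RapidMiner's implementation, and the toy data are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting numeric `values` at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try each distinct value (except the largest) as a cut point; keep the best."""
    candidates = sorted(set(values))[:-1]  # cutting at the maximum leaves one side empty
    if not candidates:
        raise ValueError("predictor has a single distinct value; no split possible")
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Toy data for illustration only: times 90+ days late vs. delinquency flag.
times_late = [0, 0, 0, 1, 2, 3]
delinquent = [0, 0, 0, 1, 1, 1]
cut = best_threshold(times_late, delinquent)
print(cut, information_gain(times_late, delinquent, cut))  # 0 1.0
```

A full tree repeats this search over every predictor at every node. Pruning and minimum-leaf-size settings then keep the tree from memorising the training data.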

d) Run a logistic regression analysis using RapidMiner. Again, consider what variables you will want to include in this analysis and report on the results (note that for the logistic regression analysis you will need to use the Weka extension and the W-Logistic operator). (Hint: Identify what is your target variable and what are your predictor variables?) Comment on the results of your final logistic regression model (About 250 words).
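When interpreting a fitted logistic regression, it helps to see how coefficients become a probability and an odds ratio. The sketch below scores a record as p = 1 / (1 + exp(-(b0 + sum of b_i * x_i))). The coefficients are hypothetical, not fitted to the assignment data, and the sketch assumes they refer to the delinquent class. Check which class label your tool treats as the reference before reading the signs of its coefficients.

```python
import math


def predict_probability(intercept, coefficients, record):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = intercept + sum(beta * record[name] for name, beta in coefficients.items())
    # Numerically stable form of the sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def odds_ratio(coefficient):
    """Multiplicative change in the odds of the positive class per one-unit increase."""
    return math.exp(coefficient)


# Hypothetical coefficients for illustration only; not fitted to the assignment data.
INTERCEPT = -1.0
COEFFICIENTS = {"NumberOfTimes90DaysLate": 0.5, "age": -0.03}

low_risk = predict_probability(INTERCEPT, COEFFICIENTS, {"NumberOfTimes90DaysLate": 0, "age": 40})
high_risk = predict_probability(INTERCEPT, COEFFICIENTS, {"NumberOfTimes90DaysLate": 3, "age": 40})
print(round(low_risk, 3), round(high_risk, 3))
print(round(odds_ratio(COEFFICIENTS["NumberOfTimes90DaysLate"]), 3))  # 1.649
```

Read this way, an odds ratio of about 1.65 would mean each additional 90-day late payment multiplies the odds of delinquency by roughly 1.65, holding the other predictors fixed.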

e) Based on the results of the decision tree and logistic regression analyses, what are the key variables and rules for predicting whether a customer of ACME Bank is likely to forfeit on their loan and become a loan delinquency, i.e. whether the outcome will be true (1) or false (0)? (Hint: in RapidMiner you will need to validate the two models you generated previously with decision trees and logistic regression on the loan-delinq-train.csv data using a number of validation processes.) Compare your two predictive models for the likelihood of a customer forfeiting on a loan using a confusion matrix (true/false positives and negatives) and an ROC chart (Note: these outputs can easily be obtained from the relevant performance operator in RapidMiner). Comment on the results of your final model (About 250 words).
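The confusion matrix and ROC chart that RapidMiner's performance operator reports can be reproduced by hand, which makes their meaning explicit. The sketch below computes the confusion matrix at a 0.5 cut-off (an assumed threshold), the usual summary rates, the ROC points, and the area under the ROC curve (AUC). AUC is computed as the probability that a randomly chosen delinquent customer receives a higher score than a randomly chosen non-delinquent one. The labels and scores are made up.

```python
def confusion_matrix(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 = delinquent."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def summary(tp, fp, fn, tn):
    """Accuracy, precision, recall (true positive rate) and false positive rate."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def roc_points(actual, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    pos = sum(actual)
    neg = len(actual) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both classes")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp, fp, _, _ = confusion_matrix(actual, predicted)
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(actual, scores):
    """AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for illustration only.
actual = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
predicted = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
print((tp, fp, fn, tn))  # (2, 1, 1, 2)
print(round(roc_auc(actual, scores), 3))  # 0.889
```

For a bank, a false negative (a delinquent customer predicted as safe) usually costs more than a false positive, so recall on the delinquent class can matter more than overall accuracy when comparing the two models.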

Overall for Task 1 you need to report on the output of each analysis in the sub-task activities and briefly comment on the important aspects of each analysis and their relevance to bank customer behaviours and propensity to forfeit on a loan and become a loan delinquency (Note: you will find the North textbook an invaluable reference for completing the data mining process activities) (about 2000 words overall for Task 1; note that we have indicated for each sub-task roughly how many words should be provided in a written explanation).

Note: the important statistical outputs from your data mining model analyses in RapidMiner should be included as appendices in your Assignment report to support the conclusions reached for each analysis in Task 1; they are not included in the word count.

Task 2-

a) Reflecting on the logical data warehouse you designed in Assignment Task 2 you should now consider how you will ensure the governance of this data warehouse which will include unstructured big data. Your discussion should focus on the controls that you would put in place to ensure that there is an appropriate level of security and privacy for the information captured, stored and retrieved for decision making when using the proposed data warehouse (about 1000 words).
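One concrete privacy control for such a warehouse is pseudonymisation: direct customer identifiers are replaced with keyed hashes before records reach analysts, so tables can still be joined but individuals are not directly exposed. The sketch below uses Python's standard hmac module. The field names (customer_id, name, email) are hypothetical because the Task 2 warehouse design is not reproduced here. The hard-coded key is for demonstration only and would in practice come from a key-management service with restricted access.

```python
import hashlib
import hmac


def pseudonymise(customer_id: str, secret_key: bytes) -> str:
    """Replace a customer identifier with a keyed SHA-256 digest (HMAC).

    The same ID and key always give the same token, so records can still be
    joined across tables, but the token cannot practically be reversed or
    recomputed without the key.
    """
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict, secret_key: bytes,
                direct_identifiers=("name", "email")) -> dict:
    """Copy a record, dropping direct identifiers and pseudonymising customer_id."""
    masked = {k: v for k, v in record.items() if k not in direct_identifiers}
    masked["customer_id"] = pseudonymise(record["customer_id"], secret_key)
    return masked


# Demo key only; a real deployment would load it from a key-management service.
DEMO_KEY = b"demo-key-not-for-production"
customer = {"customer_id": "C-1001", "name": "Jane Citizen",
            "email": "jane@example.com", "MonthlyIncome": 5400}
masked = mask_record(customer, DEMO_KEY)
print(sorted(masked))  # ['MonthlyIncome', 'customer_id']
```

Pseudonymisation complements rather than replaces access controls, encryption and audit logging, since the remaining attributes may still allow re-identification when combined.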

b) Discuss some of the key ethical concerns for the day-to-day use of a data warehouse, given that some decision making might increasingly become automated machine-to-machine decision making in response to events in a workflow. Identify ways in which governments and legislation are keeping pace with this phenomenon of big data and data-driven decision making, and the lessening of human intervention in this process (about 500 words).

Task 3-

San Francisco Police Department Crime Events Dashboard

The San Francisco Police Department is responsible for enforcing law and order in the City of San Francisco. San Francisco is the 13th most populous city in the United States, with a population of over 850,000 within the main city boundary and a population density of over 7,000 people per sq. km. The Department would like a Crime Events dashboard built for the City of San Francisco to give it a better understanding of the patterns occurring in relation to different crimes across the 10 Police Department districts in the city. In particular, it would like to see whether there are any distinct patterns in relation to (1) types of crimes and (2) the frequency of each type of crime across each of the 10 Police Department districts from 2003 through to 2015 (note that the year 2015 is not complete). This Crime Events dashboard will allow the San Francisco Police Department to manage and coordinate its efforts in catching the perpetrators of these crimes and to be more proactive in preventing these crimes from occurring in the first place. The San Francisco Police Department hopes that, by identifying crime hotspots and trends for particular types of crimes across the 10 Police Department districts, it can be proactive and strategic in its efforts, actually reduce the occurrence of crime and make the city a safer place for its residents.

The San Francisco Police Department Crime Analytics Unit want the flexibility to visualize the frequency that each type of crime is occurring over time across each of the 10 Police Department districts in the City of San Francisco. They want to be able to get a quick overview of the crime data in relation to the category, location and frequency with which each crime is occurring over time and then be able to zoom and filter on particular aspects and then get further details as required. The data has been extracted from the City of San Francisco Police Department crime events data sources for the purposes of this Assignment.

For Task 3 you need to complete the following.

(a) A visual dashboard (Crimes Event Dashboard) for the following data set (sfpd-crimedata-2003-2015.csv) that satisfies the requirements of the City of San Francisco Police Department's Crime Analysis Unit to be proactive and strategic in their efforts, actually reduce the occurrence of crime and make the City a safer place for its residents. This dashboard consists of four specified crime analysis reports, to be viewed at the City of San Francisco Police Department district level both visually and in terms of the numeric data concerning crime events:

1. Top 10 most frequently committed crimes by year and by Police Department district

2. Top 10 least frequently committed crimes by year and by Police Department district

3. The most improved Police Department District in terms of crime statistics (frequency of committed crimes) over the last thirteen years

4. A summary of the crime statistics for a given crime and Police Department district in the City of San Francisco for a given year
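The four reports above are group-and-count operations. In Tableau they would be built with filters and Top N filters rather than code, but the logic can be sketched directly. The sketch assumes each event has already been reduced to a hypothetical (year, district, category) tuple, for example by parsing the year from a date column in sfpd-crimedata-2003-2015.csv. The actual column names are not given here. It also assumes "most improved" means the largest percentage fall in total recorded crimes. It compares 2003 with 2014 rather than 2015 because the brief notes that 2015 is incomplete. The events are made up.

```python
from collections import Counter

# Each event is a hypothetical (year, district, category) tuple.


def top_categories(events, year, district, n=10, least=False):
    """Top-n most (or least) frequent crime categories for one year and district.

    Categories with zero events in that year/district do not appear at all.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(cat for y, d, cat in events if y == year and d == district)
    if least:
        ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    else:
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


def most_improved_district(events, start_year, end_year):
    """District with the largest percentage fall in total events between two years."""
    totals = Counter((y, d) for y, d, _ in events if y in (start_year, end_year))
    changes = {}
    for district in {d for _, d in totals}:
        before, after = totals[(start_year, district)], totals[(end_year, district)]
        if before:  # skip districts with no baseline events
            changes[district] = (after - before) / before
    return min(changes, key=changes.get)


def crime_summary(events, category, district, year):
    """Number of events for one crime category, district and year (report 4)."""
    return sum(1 for y, d, c in events if (y, d, c) == (year, district, category))


# Made-up events for illustration only.
events = [
    (2003, "MISSION", "LARCENY/THEFT"), (2003, "MISSION", "LARCENY/THEFT"),
    (2003, "MISSION", "ASSAULT"), (2003, "CENTRAL", "ASSAULT"),
    (2003, "CENTRAL", "VANDALISM"), (2014, "MISSION", "LARCENY/THEFT"),
    (2014, "CENTRAL", "ASSAULT"), (2014, "CENTRAL", "ASSAULT"),
    (2014, "CENTRAL", "VANDALISM"),
]
print(top_categories(events, 2003, "MISSION"))              # [('LARCENY/THEFT', 2), ('ASSAULT', 1)]
print(top_categories(events, 2003, "MISSION", least=True))  # [('ASSAULT', 1), ('LARCENY/THEFT', 2)]
print(most_improved_district(events, 2003, 2014))            # MISSION
print(crime_summary(events, "ASSAULT", "CENTRAL", 2014))     # 2
```

Raw counts ignore differences in district size and population, so a per-capita or percentage view may give a fairer picture of "most improved".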

(b) Note: for the challenge part of Assignment Task 3, it is possible to create a geomap representation of this crime data that can be imported into and incorporated in your City of San Francisco Police Department Crime Analysis Dashboard. This will require you to look at OpenStreetMap data (https://www.openstreetmap.org/), capture a visual layered map of the City of San Francisco, and determine a way to import this data into Tableau in a format and from a vendor that Tableau recognizes, using another map visualization tool such as Mapbox (https://www.mapbox.com).

You should briefly discuss the key findings for each of these reports in your Crimes Event Dashboard.

(c) Provide and discuss your rationale (drawing on the relevant literature) for the graphic design and functionality provided in your dashboard for the City of San Francisco Police Department Crime Analysis unit, in terms of how it meets their requirements for the four specified crime analysis reports (About 1000 words). You will need to submit your Tableau workbook in .twbx format, which contains your dashboard, as a separate document to your main report for this assignment.
