Bankruptcy prediction model with machine learning

Reference no: EM133558475

Data Analytics Project: Understanding the business problem and proposing analytical solutions, undertaken individually 

 PURPOSE

This assignment aims to provide you with an opportunity to practise your skills in data wrangling, data analysis, visualisation, and basic modelling while working with real-world data.

The learning outcomes for this assessment are:

LO 1 Critically analyse the role of business analytics in supporting decision making in a modern organisation, with a focus on working with different data formats and data wrangling techniques;

LO 2 Investigate and assess different analytics solutions in open source environments to develop effective visualisations;

LO 3 Evaluate analytics models to uncover hidden patterns in business data and understand relationships between variables;

LO 4 Deconstruct and exemplify data communication strategies through reproducible reporting and collaborative practices with version control; and,

LO 5 Exemplify creative and innovative problem-solving of complex professional challenges through the application of data analytics in the business domain.

TASKS 

Task 1: Logistic Regression
Datasets are published in VUCollaborate. You can pick any dataset; confirm your choice with your local lecturer, then answer the following questions. The dataset for Task 1: Employee Attrition (please look up the assigned dataset for your group in VUCollaborate).
The key to success in any organisation is attracting and retaining top talent. As an HR analyst, one of your tasks is to determine which factors keep employees at the company and which prompt others to leave. Your manager will want to know which employees are at risk of leaving the company and what factors she or he can change to prevent the loss of good employees.

Attribute Information: 

Number of Attributes: 35 

Attributes: 

  • Age, Attrition, BusinessTravel, DailyRate, Department, DistanceFromHome, Education
  • EducationField, EmployeeCount, EmployeeNumber, EnvironmentSatisfaction, Gender, HourlyRate, JobInvolvement
  • JobLevel, JobRole, JobSatisfaction, MaritalStatus, MonthlyIncome, MonthlyRate, NumCompaniesWorked
  • Over18, OverTime, PercentSalaryHike, PerformanceRating, RelationshipSatisfaction, StandardHours, StockOptionLevel
  • TotalWorkingYears, TrainingTimesLastYear, WorkLifeBalance, YearsAtCompany, YearsInCurrentRole, YearsSinceLastPromotion, YearsWithCurrManager

Other specifics: 

  • Education: 1 'Below College', 2 'College', 3 'Bachelor', 4 'Master', 5 'Doctor'
  • EnvironmentSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
  • JobInvolvement: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
  • JobSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
  • PerformanceRating: 1 'Low', 2 'Good', 3 'Excellent', 4 'Outstanding'
  • RelationshipSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
  • WorkLifeBalance: 1 'Bad', 2 'Good', 3 'Better', 4 'Best'

For the dataset, complete the following tasks:

Missing Values: 

1. Are there any anomalies (unusual data or missing values) in the given dataset? Support your answer with appropriate argument.

2. List two possible strategies for handling cases with missing values in the data (if applicable) and provide appropriate reasoning.
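
A minimal sketch of how both questions could be approached with pandas. The tiny DataFrame is made up for illustration (only the column names Age, MonthlyIncome and Department come from the attribute list), and the two strategies shown, dropping incomplete rows and imputing with the median or the most frequent value, are common choices rather than the required answer.

```python
import numpy as np
import pandas as pd

# Made-up sample; only the column names come from the attribute list.
df = pd.DataFrame({
    "Age": [34, 41, np.nan, 29, 52],
    "MonthlyIncome": [5200.0, np.nan, 4100.0, 3900.0, 9800.0],
    "Department": ["Sales", "Research & Development", None, "Sales", "Sales"],
})

# Question 1: count the missing values in each column.
missing_per_column = df.isna().sum()

# Strategy A: drop every row that has at least one missing value.
dropped = df.dropna()

# Strategy B: impute numeric columns with the median and
# categorical columns with the most frequent value (the mode).
imputed = df.copy()
for col in imputed.columns:
    if pd.api.types.is_numeric_dtype(imputed[col]):
        imputed[col] = imputed[col].fillna(imputed[col].median())
    else:
        imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])

print(missing_per_column)
print(len(dropped), "complete rows remain after dropping")
```

Dropping rows is simple but discards data, which matters when many rows are incomplete. Imputation keeps every row at the cost of introducing estimated values.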

Decision trees or other visualisation techniques:
Create a logistic regression model using the provided dataset. 
Which are the factors that matter when retaining employees? Explain. 
Which are the condition(s) that will likely lead to employee attrition? Explain.
Based on the conducted analysis, what other value added observations can you make? 
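
One way the logistic regression step might look with scikit-learn. The data here is synthetic and the effects built into it (overtime raising and income lowering the chance of leaving) are invented for the sketch, so the coefficients say nothing about the real Employee Attrition file. Only the column names are taken from the attribute list.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real file; the effects below are invented.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "OverTime": rng.choice(["Yes", "No"], size=n),
    "MonthlyIncome": rng.uniform(1000, 20000, size=n),
    "YearsAtCompany": rng.integers(0, 30, size=n),
})
logit = 1.5 * (df["OverTime"] == "Yes") - 0.0002 * df["MonthlyIncome"]
prob = 1 / (1 + np.exp(-logit))
df["Attrition"] = np.where(rng.random(n) < prob, "Yes", "No")

# Encode the Yes/No columns as 1/0 so the model can use them.
X = df[["OverTime", "MonthlyIncome", "YearsAtCompany"]].copy()
X["OverTime"] = (X["OverTime"] == "Yes").astype(int)
y = (df["Attrition"] == "Yes").astype(int)

# Scaling first makes the coefficient sizes comparable across columns.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

coefs = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=X.columns
)
print(coefs.sort_values())
```

A positive coefficient points to a factor associated with higher attrition and a negative one to a factor associated with retention, which gives a starting point for the two "Explain" questions.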

Task 2: Linear Regression
The Dataset for task 2: Future 500 (Please download the assigned dataset for your group in VU Collaborate as directed by your teacher) 
The Future-500 dataset contains sample data from approx. 400 companies over a specific period from 2000 till 2014. It contains information regarding the name of the company, the type of industry it belongs to, the year of inception, the number of employees and its growth. It also contains information about the revenue, expenses and the profit the company makes.

The aim of the task is to predict profit.

Dataset Description: 

Column Name | Type | Description
ID | WholeNumber (Integer) | Unique ID in the table
Name | Text | Name of the company
Industry | Text | Type of industry the company belongs to
Inception | Integer | The commencement year of the company
Employees | WholeNumber (Integer) | Number of employees in the company
State | Text | State in which the company is located
City | Text | City in which the company is located
Revenue | Decimal | Revenue each company makes
Expenses | Decimal | Expenses of each company
Profit | Decimal | Profit per company
Growth | Percentage | The percentage of growth shown by each company since its inception

For the dataset, complete the following tasks: 
 
Missing Values: 
Are there any anomalies (unusual data or missing values) in the given dataset? Support your answer with appropriate argument. 
List two possible strategies for handling cases with missing values in the data (if applicable) and provide appropriate reasoning.
 
Linear Regression or apply other visualisation technique: 
1. Based on the dataset, list dependent (outcome variable) and independent variable(s) (input variables = factors that affect the outcome). Give reasons for your choice. 
2. Create a regression model based on the previous step. 
3. Which variables would be the best candidates for the independent variable? 
4. Comment on the correlation between the dependent and independent variable(s) used in the model. Support your answer with appropriate argument and evidence. 
5. Based on the regression model created, make 1 or more predictions for increasing profit. 
6. Based on these results, what other value added observations can you make? 
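
The regression steps above could be sketched as follows with scikit-learn. The numbers are synthetic, and the assumptions that expenses run at roughly 60% of revenue and that Profit equals Revenue minus Expenses are made up for the example. Check the real Future 500 file before relying on either.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in using the column names from the dataset description.
rng = np.random.default_rng(42)
n = 400
revenue = rng.uniform(1e6, 2e7, size=n)
employees = rng.integers(10, 5000, size=n)
expenses = 0.6 * revenue + rng.normal(0, 5e5, size=n)  # invented relationship
df = pd.DataFrame(
    {"Revenue": revenue, "Expenses": expenses, "Employees": employees}
)
df["Profit"] = df["Revenue"] - df["Expenses"]  # assumption, see lead-in

# Step 4: correlation of each candidate input with the outcome.
correlations = df.corr()["Profit"].drop("Profit")
print(correlations)

# Steps 1-2: Profit is the dependent variable; Revenue and Employees are inputs.
X = df[["Revenue", "Employees"]]
y = df["Profit"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out rows
print(f"R^2 on test data: {r2:.3f}")

# Step 5: predict profit for a hypothetical company.
new_company = pd.DataFrame({"Revenue": [1.2e7], "Employees": [250]})
predicted_profit = model.predict(new_company)[0]
print(f"Predicted profit: {predicted_profit:,.0f}")
```

If Profit in the real file is exactly Revenue minus Expenses, using both as inputs makes the model trivial, which is itself worth noting as an observation for question 6.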
 
Task 3: Descriptive Analysis and Visualisation
The Dataset for task 3: Groceries (Please download the assigned dataset for your group in VU Collaborate as directed by your teacher) 
The dataset describes grocery purchases. The store wants to analyse purchases of these items to inform point-of-sale displays, to guide sales personnel in promoting cross sales, and to guide the piloting of an eventual time-of-purchase electronic recommender system to boost cross sales.
For the dataset, complete the following tasks: 
Data cleaning: 
Clean the data in Excel.
Assign attributes and data types.
Import the data into Python.

Visualising data: 
Discuss what visualisation is and why it is important in the business world (200 words).
Show a bar chart with four attributes and explain it.
Show a scatter plot with two attributes and explain it.
Show a box plot with two attributes and explain it.
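
A rough sketch of the three chart types with matplotlib. The brief does not list the columns of the Groceries file, so `item`, `category`, `quantity` and `unit_price` are hypothetical names and the rows are made up. Swap in the real columns after importing the cleaned data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to open plot windows
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical columns and made-up rows standing in for the Groceries data.
df = pd.DataFrame({
    "item": ["milk", "bread", "eggs", "butter", "milk", "bread", "eggs", "milk"],
    "category": ["dairy", "bakery", "dairy", "dairy",
                 "dairy", "bakery", "dairy", "dairy"],
    "quantity": [2, 1, 12, 1, 1, 2, 6, 3],
    "unit_price": [1.5, 2.2, 0.3, 3.1, 1.5, 2.2, 0.3, 1.5],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart: total quantity sold for each of the four items.
totals = df.groupby("item")["quantity"].sum()
axes[0].bar(totals.index, totals.values)
axes[0].set_title("Total quantity per item")

# Scatter plot: two numeric attributes against each other.
axes[1].scatter(df["unit_price"], df["quantity"])
axes[1].set_xlabel("unit_price")
axes[1].set_ylabel("quantity")
axes[1].set_title("Quantity vs unit price")

# Box plot: spread of one numeric attribute within each category.
names = sorted(df["category"].unique())
groups = [df.loc[df["category"] == name, "quantity"].values for name in names]
axes[2].boxplot(groups)
axes[2].set_xticks(range(1, len(names) + 1))
axes[2].set_xticklabels(names)
axes[2].set_title("Quantity by category")

fig.tight_layout()
fig.savefig("groceries_plots.png")
```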

Business analytics and visualization Workshop

Bankruptcy Prediction Model with Machine Learning

Introduction
Bankruptcy is a concept from financial accounting. If you came to data science from a commerce background, you will already be familiar with it. A business or legal person that fails to pay its debts to creditors and becomes insolvent is said to be bankrupt.

Using machine learning algorithms, we can train a model to predict whether a company or legal person will become bankrupt in the future. In the section below, I will take you through a machine learning tutorial on how to train a bankruptcy prediction model for a company using the Python programming language.

Download the "Bank.csv" dataset and follow these instructions.
Now let's import the dataset and the necessary Python libraries to start with the task of training a bankruptcy prediction model using Python:
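
The original code for this step is not included here, so the following is only a sketch of how the import might look with pandas. The helper `load_bankruptcy_data` is a hypothetical name, and the two feature columns in the stand-in CSV are placeholders rather than the file's confirmed headers.

```python
import io

import pandas as pd


def load_bankruptcy_data(source):
    """Read the CSV and strip stray spaces from the column names."""
    data = pd.read_csv(source)
    data.columns = data.columns.str.strip()
    return data


# With the workshop file in the working directory, the call would be:
#     data = load_bankruptcy_data("Bank.csv")
# A two-row in-memory stand-in keeps this sketch runnable without the file.
sample_csv = io.StringIO(
    "Bankrupt?, Debt ratio %, Net worth/Assets\n"
    "1,0.21,0.79\n"
    "0,0.17,0.83\n"
)
data = load_bankruptcy_data(sample_csv)
print(data.shape)
print(data.head())
```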

The dataset contains 96 columns. Let's have a look at the correlations before training the model:
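
A possible way to inspect the correlations, shown on a small synthetic frame because the real file is not available here. `feature_a` and `feature_b` are hypothetical columns; with the real data, `data.corr()` would produce a 96 by 96 matrix, and ranking the features by their correlation with the target is easier to read than the full table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: feature_a drives the label, feature_b is pure noise.
rng = np.random.default_rng(1)
n = 500
feature_a = rng.normal(size=n)
feature_b = rng.normal(size=n)
bankrupt = (feature_a + rng.normal(scale=0.5, size=n) > 1.5).astype(int)
data = pd.DataFrame(
    {"Bankrupt?": bankrupt, "feature_a": feature_a, "feature_b": feature_b}
)

# Full pairwise correlation matrix of the numeric columns.
corr = data.corr()

# Rank the features by the strength of their correlation with the target.
target_corr = (
    corr["Bankrupt?"]
    .drop("Bankrupt?")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```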

As the "Bankrupt?" column is the target label, I will drop it from the training data:
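
Separating the label from the features might look like this. The four-row frame and the `feature_a` / `feature_b` columns are placeholders for the real data.

```python
import pandas as pd

# Placeholder frame; the real data has 95 feature columns plus the label.
data = pd.DataFrame({
    "Bankrupt?": [0, 1, 0, 0],
    "feature_a": [0.1, 0.9, 0.2, 0.3],
    "feature_b": [1.2, 0.4, 1.1, 1.0],
})

X = data.drop(columns=["Bankrupt?"])  # features only
y = data["Bankrupt?"]                 # target label
print(X.shape, y.shape)
```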

Now let's split the dataset and use the logistic regression model to train the bankruptcy prediction model:
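
A sketch of the split-and-train step with scikit-learn, again on synthetic data. The 80/20 split, the `random_state` values and `max_iter=1000` are illustrative choices, not settings taken from the workshop material.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic features and label standing in for the bankruptcy data.
rng = np.random.default_rng(7)
n = 600
X = pd.DataFrame(
    {"feature_a": rng.normal(size=n), "feature_b": rng.normal(size=n)}
)
y = (
    X["feature_a"] - X["feature_b"] + rng.normal(scale=0.5, size=n) > 0
).astype(int)

# Hold out 20% of the rows; stratify keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(len(X_train), "training rows,", len(X_test), "test rows")
```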

Now let's have a look at the accuracy score on the training set:
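
Computing the accuracy could be sketched as below. The data is synthetic, so the printed figures will not match the result quoted for the real dataset. The sketch reports the score on the held-out rows next to the training score, since the two can differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data: the label depends on the first feature plus noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(600, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=600) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)

# score() returns mean accuracy; accuracy_score gives the same metric
# from explicit predictions.
train_accuracy = model.score(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```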

The model performs well on the training data, with an accuracy of about 95%. Training accuracy alone does not show how well the model generalises, so it is worth checking the score on the held-out part of the split as well. This is one way to use machine learning in finance, and you can do a lot more with this dataset to explore further use cases.

Discuss this result with your group members (3-4 members in a group) and present the case to your lecturer.

Attachment:- Business analytics and visualization.rar

Course: BCO7000 Business analytics and visualization Workshop, Victoria University