Bankruptcy prediction model with machine learning

Reference no: EM133558475

Data Analytics Project: Understanding the business problem and proposing analytical solutions, undertaken individually

PURPOSE

This assignment aims to provide you with an opportunity to practice your skills in data wrangling, data analysis, data modelling, visualisation and basic modelling while working with real-world data.

The learning outcomes for this assessment are:

LO 1 Critically analyse the role of business analytics in supporting decision making in a modern organisation, with a focus on working with different data formats and data wrangling techniques;

LO 2 Investigate and assess different analytics solutions in open-source environments to develop effective visualisations;

LO 3 Evaluate analytics models to uncover hidden patterns in business data and understand relationships between variables;

LO 4 Deconstruct and exemplify data communication strategies through reproducible reporting and collaborative practices with version control; and,

LO 5 Exemplify creative and innovative problem-solving of complex professional challenges through the application of data analytics in the business domain

TASKS

Task 1: Logistic Regression
Data sets are published in VU Collaborate. You can pick any data set; confirm your choice with your local lecturer and answer the following questions.
The Dataset for task 1: Employee Attrition (Please look up the assigned dataset for your group in VU Collaborate)
The key to success in any organisation is attracting and retaining top talent. You are an HR analyst, and one of your tasks is to determine which factors keep employees at the company and which prompt others to leave. Your manager will want to know which employees are at risk of leaving the company and which factors she or he can change to prevent the loss of good employees.

Attribute Information:

Number of Attributes: 35

Attributes:

Age, Attrition, BusinessTravel, DailyRate, Department, DistanceFromHome, Education, EducationField, EmployeeCount, EmployeeNumber, EnvironmentSatisfaction, Gender, HourlyRate, JobInvolvement, JobLevel, JobRole, JobSatisfaction, MaritalStatus, MonthlyIncome, MonthlyRate, NumCompaniesWorked, Over18, OverTime, PercentSalaryHike, PerformanceRating, RelationshipSatisfaction, StandardHours, StockOptionLevel, TotalWorkingYears, TrainingTimesLastYear, WorkLifeBalance, YearsAtCompany, YearsInCurrentRole, YearsSinceLastPromotion, YearsWithCurrManager

Other specifics:

Education: 1 'Below College', 2 'College', 3 'Bachelor', 4 'Master', 5 'Doctor'
EnvironmentSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
JobInvolvement: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
JobSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
PerformanceRating: 1 'Low', 2 'Good', 3 'Excellent', 4 'Outstanding'
RelationshipSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
WorkLifeBalance: 1 'Bad', 2 'Good', 3 'Better', 4 'Best'
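Because these attributes are stored as numeric codes, it can help to decode them into labelled, ordered categories before plotting or reporting. The sketch below assumes pandas and that the column names match the attribute list above; the `CODE_LABELS` dictionary and `decode_codes` function are hypothetical helpers, not part of the dataset. Codes outside the documented range become missing values, which also makes them easy to spot as anomalies.

```python
import pandas as pd

# Label mappings taken from the "Other specifics" list (hypothetical helper names).
SATISFACTION = {1: "Low", 2: "Medium", 3: "High", 4: "Very High"}
CODE_LABELS = {
    "Education": {1: "Below College", 2: "College", 3: "Bachelor", 4: "Master", 5: "Doctor"},
    "EnvironmentSatisfaction": SATISFACTION,
    "JobInvolvement": SATISFACTION,
    "JobSatisfaction": SATISFACTION,
    "PerformanceRating": {1: "Low", 2: "Good", 3: "Excellent", 4: "Outstanding"},
    "RelationshipSatisfaction": SATISFACTION,
    "WorkLifeBalance": {1: "Bad", 2: "Good", 3: "Better", 4: "Best"},
}


def decode_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with coded columns replaced by ordered categorical labels."""
    out = df.copy()
    for column, labels in CODE_LABELS.items():
        if column in out.columns:
            ordered = list(labels.values())  # dict order follows codes 1..n
            # Unknown codes map to NaN, flagging them as possible anomalies.
            out[column] = pd.Categorical(out[column].map(labels), categories=ordered, ordered=True)
    return out
```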

For the dataset, complete the following tasks:

Missing Values:

1. Are there any anomalies (unusual data or missing values) in the given dataset? Support your answer with appropriate argument.

2. List two possible strategies for handling missing values in the data (if applicable) and provide appropriate reasoning.
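A minimal pandas sketch of these checks, assuming the attrition data has already been loaded into a DataFrame. It counts missing values per column, flags columns that hold a single value (worth checking for attributes such as EmployeeCount, Over18 or StandardHours, which may be constant in some versions of this dataset), and shows two common strategies: dropping incomplete rows, or imputing the median for numeric columns and the most frequent value for text columns. The helper names are illustrative.

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Missing-value counts, keeping only columns with at least one gap."""
    counts = df.isna().sum()
    return counts[counts > 0]


def constant_columns(df: pd.DataFrame) -> list:
    """Columns with a single distinct value carry no information for modelling."""
    return [c for c in df.columns if df[c].nunique(dropna=False) <= 1]


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    # Strategy 1: listwise deletion; reasonable when only a few rows are affected.
    return df.dropna()


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    # Strategy 2: median for numeric columns, most frequent value for the rest.
    out = df.copy()
    for c in out.columns:
        if out[c].isna().any():
            if pd.api.types.is_numeric_dtype(out[c]):
                out[c] = out[c].fillna(out[c].median())
            else:
                out[c] = out[c].fillna(out[c].mode().iloc[0])
    return out
```

Deletion keeps the data honest but shrinks the sample; imputation keeps every row but can blur relationships if many values are filled.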

Decision trees or apply other visualisation technique:
1. Create a logistic regression model using the provided dataset.
2. Which factors matter most for retaining employees? Explain.
3. Which condition(s) are likely to lead to employee attrition? Explain.
4. Based on your analysis, what other value-added observations can you make?
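The scikit-learn sketch below shows one way to approach these questions: a shallow decision tree whose printed rules describe conditions associated with `Attrition == "Yes"`, and a logistic regression on standardised inputs whose coefficients indicate which factors push attrition up or down. It assumes `Attrition` holds "Yes"/"No" strings; the feature list is your choice, and the helper names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text


def _design_matrix(df: pd.DataFrame, features: list) -> pd.DataFrame:
    # One-hot encode text columns (e.g. OverTime -> OverTime_Yes); numeric columns pass through.
    return pd.get_dummies(df[features], drop_first=True).astype(float)


def attrition_rules(df: pd.DataFrame, features: list, max_depth: int = 3):
    """Fit a shallow decision tree on Attrition and return (tree, text rules)."""
    X = _design_matrix(df, features)
    y = (df["Attrition"] == "Yes").astype(int)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    return tree, export_text(tree, feature_names=list(X.columns))


def attrition_logit_coefficients(df: pd.DataFrame, features: list) -> pd.Series:
    """Logistic regression coefficients on standardised inputs, sorted ascending."""
    X = _design_matrix(df, features)
    y = (df["Attrition"] == "Yes").astype(int)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
    coefs = model.named_steps["logisticregression"].coef_[0]
    return pd.Series(coefs, index=X.columns).sort_values()
```

For example, `print(attrition_rules(df, ["OverTime", "MonthlyIncome", "JobSatisfaction"])[1])` lists the tree's splits. Positive coefficients point to factors associated with leaving and negative ones to factors associated with staying; they describe association in this data, not causation.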

Task 2: Linear Regression
The Dataset for task 2: Future 500 (Please download the assigned dataset for your group in VU Collaborate as directed by your teacher)
The Future-500 dataset contains sample data for approximately 400 companies over the period 2000 to 2014. For each company it records the name, the industry it belongs to, the year of inception, the number of employees and its growth, as well as its revenue, expenses and profit.

The aim of the task is to predict profit.

Dataset Description:

Column Name	Type	Description
ID	WholeNumber (Integer)	Unique ID in the table
Name	Text	Name of the company
Industry	Text	Industry the company belongs to
Inception	Integer	Year in which the company commenced operations
Employees	WholeNumber (Integer)	Number of employees in the company
State	Text	State in which the company is located
City	Text	City in which the company is located
Revenue	Decimal	Revenue of the company
Expenses	Decimal	Expenses of the company
Profit	Decimal	Profit of the company
Growth	Percentage	Percentage growth of the company since its inception

For the dataset, complete the following tasks:

Missing Values:
1. Are there any anomalies (unusual data or missing values) in the given dataset? Support your answer with appropriate argument.
2. List two possible strategies for handling missing values in the data (if applicable) and provide appropriate reasoning.
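For the Future 500 data, one useful anomaly check relies on the accounting identity Profit = Revenue - Expenses: if the dataset follows it, rows that break it are suspect, and missing Profit values can be derived from it. The pandas sketch below assumes that identity, assumes numeric columns may arrive as text with currency symbols, thousands separators, unit words or a % sign, and uses hypothetical helper names. The second strategy fills a numeric column with the median of the company's own industry rather than the overall median.

```python
import pandas as pd


def to_number(series: pd.Series) -> pd.Series:
    """Strip currency symbols, separators, unit words and '%' signs, then convert to float."""
    cleaned = series.astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")  # unparseable or empty -> NaN


def profit_mismatches(df: pd.DataFrame, tol: float = 1.0) -> pd.DataFrame:
    """Rows whose Profit differs from Revenue - Expenses by more than tol."""
    gap = (df["Revenue"] - df["Expenses"] - df["Profit"]).abs()
    return df[gap > tol]  # rows with a missing value give NaN and are not flagged


def fill_profit(df: pd.DataFrame) -> pd.DataFrame:
    """Strategy 1: derive missing Profit from Revenue - Expenses where both are known."""
    out = df.copy()
    out["Profit"] = out["Profit"].fillna(out["Revenue"] - out["Expenses"])
    return out


def fill_by_industry_median(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Strategy 2: fill a numeric column with the median of the same industry."""
    out = df.copy()
    medians = out.groupby("Industry")[column].transform("median")
    out[column] = out[column].fillna(medians)
    return out
```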

Linear Regression or apply other visualisation technique:
1. Based on the dataset, list the dependent variable (outcome) and the independent variable(s) (inputs, i.e. factors that affect the outcome). Give reasons for your choice.
2. Create a regression model based on the previous step.
3. Which variables would be the best candidates for the independent variable(s)?
4. Comment on the correlation between the dependent and independent variable(s) used in the model. Support your answer with appropriate argument and evidence.
5. Based on the regression model you created, make one or more predictions about how profit could be increased.
6. Based on these results, what other value-added observations can you make?
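A minimal scikit-learn sketch of steps 2 to 4, assuming the numeric columns are already clean. The choice of inputs is yours; note that if Profit is exactly Revenue minus Expenses in your data, using both of those as inputs gives a near-perfect fit that adds no insight, so consider inputs such as Employees, Growth or Inception alongside at most one of them. The function names are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def correlations_with_profit(df: pd.DataFrame, features: list) -> pd.Series:
    """Pearson correlation of each candidate input with Profit."""
    return df[features + ["Profit"]].corr()["Profit"].drop("Profit")


def fit_profit_model(df: pd.DataFrame, features: list):
    """Ordinary least squares fit of Profit on the chosen features.

    Returns the fitted model and its R^2 on the rows used for fitting.
    """
    data = df.dropna(subset=features + ["Profit"])
    X, y = data[features], data["Profit"]
    model = LinearRegression().fit(X, y)
    return model, model.score(X, y)
```

For a prediction (step 5), pass a DataFrame with the same feature columns, e.g. `model.predict(pd.DataFrame({"Revenue": [1.2e7], "Growth": [15]}))`; varying one input while holding the others fixed shows how the model expects profit to respond.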

Task 3: Descriptive analysis and visualization
The Dataset for task 3: Groceries (Please download the assigned dataset for your group in VU Collaborate as directed by your teacher)
The dataset describes grocery purchases. The store wants to analyse purchases of these items to inform point-of-sale displays, to guide sales personnel in promoting cross-sales, and to guide the piloting of an eventual time-of-purchase electronic recommender system to boost cross-sales.
For the dataset, complete the following tasks:
Data cleaning:
Clean the data in Excel.
Assign attributes and data types.
Import the data into Python.
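A sketch of the import step in pandas. The column names (`Member_number`, `Date`, `itemDescription`) and the day-first date format are assumptions about a transaction-per-row export; rename them to match your cleaned file. The two sample rows are made up purely to show the call.

```python
import io

import pandas as pd


def load_groceries(source) -> pd.DataFrame:
    """Read a cleaned groceries CSV (path or file-like) and assign explicit data types."""
    df = pd.read_csv(
        source,
        dtype={"Member_number": "int64", "itemDescription": "category"},
    )
    # Dates are assumed to be day-first, e.g. 21-07-2015.
    df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)
    return df


# Hypothetical sample rows, only to demonstrate the call.
SAMPLE_CSV = """Member_number,Date,itemDescription
1001,21-07-2015,whole milk
1002,05-01-2015,yogurt
"""

sample = load_groceries(io.StringIO(SAMPLE_CSV))
print(sample.dtypes)
```

With a real export, call `load_groceries("your_file.csv")` instead of passing the in-memory sample.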

Visualising data:
Discuss what visualisation is and why it is important in the business world (200 words).
Show a bar chart with four attributes and explain it.
Show a scatter plot with two attributes and explain it.
Show a box plot with two attributes and explain it.
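One way to produce the three charts with pandas and Matplotlib in a single figure. The column names in the example (per-item counts, `basket_size`, `spend`) and their values are hypothetical; substitute the attributes from your cleaned groceries data. The Agg backend is selected so the script runs without a display; call `fig.savefig("charts.png")` to write the image.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df, bar_cols, scatter_xy, box_cols):
    """Bar chart of column totals, scatter plot of two columns, box plot of two columns."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    df[bar_cols].sum().plot.bar(ax=axes[0], title="Total per attribute")
    df.plot.scatter(x=scatter_xy[0], y=scatter_xy[1], ax=axes[1],
                    title=f"{scatter_xy[1]} vs {scatter_xy[0]}")
    df[box_cols].plot.box(ax=axes[2], title="Distribution")
    fig.tight_layout()
    return fig


# Hypothetical per-transaction data, only to demonstrate the call.
demo = pd.DataFrame({
    "whole_milk": [1, 0, 2, 1], "yogurt": [0, 1, 1, 0],
    "rolls_buns": [1, 1, 0, 0], "soda": [0, 2, 1, 1],
    "basket_size": [5, 8, 3, 6], "spend": [12.5, 20.0, 7.8, 15.2],
})
fig = plot_overview(demo, ["whole_milk", "yogurt", "rolls_buns", "soda"],
                    ("basket_size", "spend"), ["basket_size", "spend"])
```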

Business analytics and visualization Workshop

Bankruptcy Prediction Model with Machine Learning

Introduction
Bankruptcy is a concept from financial accounting. If you came to data science from a commerce background, you should already know what bankruptcy is. When a business or legal person fails to pay its debts to creditors and becomes insolvent, this situation is called bankruptcy.

Using machine learning algorithms, we can train a model to predict whether a company or legal person will go bankrupt in the future. In the section below, I will take you through a machine learning tutorial on training a bankruptcy prediction model for companies using the Python programming language.

Download the "Bank.csv" dataset and follow these instructions.
Now let's import the necessary Python libraries and the dataset to start training a bankruptcy prediction model:
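A minimal pandas sketch of this step, assuming "Bank.csv" sits in the working directory and that the "Bankrupt?" target column named later in the tutorial is coded 0/1 (an assumption). `load_bank_data` and `describe_dataset` are illustrative helper names. Checking the share of bankrupt companies at this point gives useful context for the accuracy figure reported later.

```python
import pandas as pd

TARGET = "Bankrupt?"  # target label named in the tutorial; assumed to be coded 0/1


def load_bank_data(path: str = "Bank.csv") -> pd.DataFrame:
    """Read the bankruptcy dataset from a CSV file."""
    return pd.read_csv(path)


def describe_dataset(data: pd.DataFrame) -> dict:
    """Quick checks before modelling: size, missing values and share of bankrupt rows."""
    return {
        "rows": data.shape[0],
        "columns": data.shape[1],
        "missing_values": int(data.isna().sum().sum()),
        "bankrupt_share": float(data[TARGET].mean()),
    }
```

Typical use: `data = load_bank_data()` followed by `print(describe_dataset(data))`.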

The dataset contains 96 columns. Let's look at the correlations between them before training the model:
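A sketch of one way to inspect the correlations with pandas, assuming the target column is named "Bankrupt?" and the other columns are numeric. Rather than reading a full 96-by-96 matrix, it ranks columns by the absolute value of their Pearson correlation with the target; `target_correlations` is an illustrative name.

```python
import pandas as pd


def target_correlations(data: pd.DataFrame, target: str = "Bankrupt?", top: int = 10) -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest |r| first."""
    corr = data.corr(numeric_only=True)[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order).head(top)
```

A strong correlation only says that a ratio moves with bankruptcy in this sample; it does not by itself show that the ratio causes it.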

Because the "Bankrupt?" column is the target label, I will drop it from the training features:
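Separating the features from the target can be sketched in pandas as follows, assuming the column name "Bankrupt?"; `split_features_target` is an illustrative name.

```python
import pandas as pd


def split_features_target(data: pd.DataFrame, target: str = "Bankrupt?"):
    """Return (X, y): every column except the target, and the target column itself."""
    X = data.drop(columns=[target])
    y = data[target]
    return X, y
```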

Now let's split the dataset and train a logistic regression model for bankruptcy prediction:
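A minimal scikit-learn sketch of this step. The stratified 80/20 split, the fixed random seed and the StandardScaler in front of LogisticRegression are choices made here, not details confirmed by the original tutorial: stratifying keeps the share of bankrupt companies similar in both parts, and scaling helps the solver converge when columns have very different ranges. `train_bankruptcy_model` is an illustrative name.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_bankruptcy_model(X, y, test_size=0.2, seed=42):
    """Stratified train/test split, then a scaled logistic regression fitted on the training part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, (X_train, X_test, y_train, y_test)
```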

Now let's have a look at the accuracy score on the training set:
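Accuracy is the share of predictions that match the true labels. The small sketch below computes it with scikit-learn's `accuracy_score` on both the training set and a held-out set; the four training points and three test points are made up to show that a perfect training score can sit alongside a lower score on unseen data. `accuracy_on` is an illustrative name.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def accuracy_on(model, X, y) -> float:
    """Share of rows in X whose predicted label matches y."""
    return accuracy_score(y, model.predict(X))


# Tiny hypothetical example: one feature, two well-separated groups.
X_train, y_train = [[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1]
X_test, y_test = [[0.5], [10.5], [6.0]], [0, 1, 0]
demo_model = LogisticRegression().fit(X_train, y_train)
train_acc = accuracy_on(demo_model, X_train, y_train)  # every training point is classified correctly
test_acc = accuracy_on(demo_model, X_test, y_test)  # the point at 6.0 falls on the wrong side
```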

So the model performs well on the training data, with an accuracy of about 95%. Training accuracy alone, however, does not show how well the model will predict bankruptcy for companies it has not seen: it should also be checked on the test set and compared with the accuracy of always predicting "not bankrupt", because bankrupt companies are typically a small minority in data like this. This is how we can use machine learning in finance, and there is a lot more you can do with this dataset to explore other use cases.
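One way to put an accuracy figure in context is to compare it with a majority-class baseline and to report recall on the bankrupt class. The scikit-learn sketch below uses `DummyClassifier(strategy="most_frequent")` as the baseline and assumes bankrupt companies are labelled 1; `baseline_vs_model` is an illustrative name.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score


def baseline_vs_model(model, X_train, y_train, X_test, y_test) -> dict:
    """Compare a fitted model with a majority-class baseline on held-out data."""
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "baseline_accuracy": accuracy_score(y_test, baseline.predict(X_test)),
        "model_accuracy": accuracy_score(y_test, y_pred),
        # Share of truly bankrupt companies (label 1) that the model catches.
        "bankrupt_recall": recall_score(y_test, y_pred, pos_label=1),
    }
```

If 95% of companies are solvent, a model that never predicts bankruptcy already scores 95% accuracy while catching none of the bankrupt companies, so a model's accuracy is only informative relative to `baseline_accuracy` and alongside `bankrupt_recall`.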

Discuss this result with your group members (3-4 members in a group) and present the case to your lecturer.

Attachment:- Business analytics and visualization.rar
