Reference no: EM133558475
Data Analytics Project: Understanding the business problem and proposing analytical solutions, undertaken individually
PURPOSE
This assignment aims to provide you with an opportunity to practise your skills in data wrangling, data analysis, visualisation and basic modelling while working with real-world data.
The learning outcomes for this assessment are:
LO 1 Critically analyse the role of business analytics in supporting decision making in a modern organisation, with a focus on working with different data formats and data wrangling techniques;
LO 2 Investigate and assess different analytics solutions in open-source environments to develop effective visualisations;
LO 3 Evaluate analytics models to uncover hidden patterns in business data and understand relationships between variables;
LO 4 Deconstruct and exemplify data communication strategies through reproducible reporting and collaborative practices with version control; and,
LO 5 Exemplify creative and innovative problem-solving of complex professional challenges through the application of data analytics in the business domain.
TASKS
Task 1: Logistic Regression
Data sets are published in VU Collaborate. You may pick any data set; confirm your choice with your local lecturer, then answer the following questions. The dataset for Task 1: Employee Attrition (please look up the assigned dataset for your group in VU Collaborate).
The key to success in any organisation is attracting and retaining top talent. You are an HR analyst, and one of your tasks is to determine which factors keep employees at the company and which prompt others to leave. Your manager wants to know which employees are at risk of leaving the company and what factors she or he can change to prevent the loss of good employees.
Attribute Information:
Number of Attributes: 35
Attributes:
- Age
- Attrition
- BusinessTravel
- DailyRate
- Department
- DistanceFromHome
- Education
- EducationField
- EmployeeCount
- EmployeeNumber
- EnvironmentSatisfaction
- Gender
- HourlyRate
- JobInvolvement
- JobLevel
- JobRole
- JobSatisfaction
- MaritalStatus
- MonthlyIncome
- MonthlyRate
- NumCompaniesWorked
- Over18
- OverTime
- PercentSalaryHike
- PerformanceRating
- RelationshipSatisfaction
- StandardHours
- StockOptionLevel
- TotalWorkingYears
- TrainingTimesLastYear
- WorkLifeBalance
- YearsAtCompany
- YearsInCurrentRole
- YearsSinceLastPromotion
- YearsWithCurrManager
Other specifics:
- Education: 1 'Below College', 2 'College', 3 'Bachelor', 4 'Master', 5 'Doctor'
- EnvironmentSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
- JobInvolvement: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
- JobSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
- PerformanceRating: 1 'Low', 2 'Good', 3 'Excellent', 4 'Outstanding'
- RelationshipSatisfaction: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'
- WorkLifeBalance: 1 'Bad', 2 'Good', 3 'Better', 4 'Best'
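The coded attributes above are ordinal: the integers stand for ranked labels. The following is a minimal sketch, assuming pandas is available, of how the codes could be turned into ordered labels for tables and charts. The `sample` frame and the helper name `to_ordered_labels` are made up for illustration; only the code-to-label mappings come from the notes above.

```python
import pandas as pd

# Code-to-label mappings copied from the attribute notes above.
EDUCATION_LABELS = {1: "Below College", 2: "College", 3: "Bachelor", 4: "Master", 5: "Doctor"}
SATISFACTION_LABELS = {1: "Low", 2: "Medium", 3: "High", 4: "Very High"}

# Tiny made-up sample; the real values would come from the assigned file.
sample = pd.DataFrame({"Education": [3, 1, 5, 2], "JobSatisfaction": [4, 2, 1, 3]})


def to_ordered_labels(codes: pd.Series, labels: dict) -> pd.Series:
    """Replace integer codes with their labels while keeping the ranking."""
    ordered = [labels[key] for key in sorted(labels)]
    dtype = pd.CategoricalDtype(categories=ordered, ordered=True)
    return codes.map(labels).astype(dtype)


sample["EducationLabel"] = to_ordered_labels(sample["Education"], EDUCATION_LABELS)
sample["JobSatisfactionLabel"] = to_ordered_labels(sample["JobSatisfaction"], SATISFACTION_LABELS)
print(sample)
```

Keeping the category order means sorting and comparisons follow the ranking (for example 'Low' before 'Medium') rather than the alphabet.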
For the dataset, complete the following tasks:
Missing Values:
1. Are there any anomalies (unusual data or missing values) in the given dataset? Support your answer with appropriate argument.
2. List two possible strategies to handle cases with missing values in the data (if applicable) and provide appropriate reasoning.
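As a starting point for the two questions above, here is a hedged sketch, assuming pandas and NumPy. It counts missing values, flags columns that hold a single value, and contrasts two common strategies: deleting incomplete rows and imputing. The five rows are invented for illustration and are not taken from the assigned dataset.

```python
import numpy as np
import pandas as pd

# Invented rows that reuse a few attribute names from the list above.
df = pd.DataFrame({
    "Age": [41, 49, np.nan, 33, 27],
    "MonthlyIncome": [5993, 5130, 2090, np.nan, 3468],
    "OverTime": ["Yes", "No", "No", None, "No"],
    "StandardHours": [80, 80, 80, 80, 80],
})

# Anomaly checks: missing values per column and columns with no variation.
missing_per_column = df.isna().sum()
constant_columns = [col for col in df.columns if df[col].nunique(dropna=True) <= 1]

# Strategy 1: listwise deletion (drop every row with a missing value).
dropped = df.dropna()

# Strategy 2: imputation (median for numeric columns, most frequent value otherwise).
imputed = df.copy()
for col in imputed.columns:
    if imputed[col].isna().any():
        if pd.api.types.is_numeric_dtype(imputed[col]):
            imputed[col] = imputed[col].fillna(imputed[col].median())
        else:
            imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])

print(missing_per_column)
print("Constant columns:", constant_columns)
print("Rows kept after deletion:", len(dropped), "of", len(df))
```

Deletion is simple but discards information; imputation keeps every row but adds values that were never observed. Your written reasoning should weigh that trade-off for the actual data.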
Decision trees or apply other visualisation technique:
1. Create a logistic regression model using the provided dataset.
2. Which are the factors that matter when retaining employees? Explain.
3. Which are the condition(s) that will likely lead to employee attrition? Explain.
4. Based on the conducted analysis, what other value-added observations can you make?
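The modelling step can be sketched as follows, assuming scikit-learn is installed. The data here is synthetic and generated inside the script, so the coefficients say nothing about the real Employee Attrition file; only the column names are borrowed from the attribute list. In real data, text categories would need to be encoded as numbers before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the assigned dataset).
rng = np.random.default_rng(0)
n = 500
staff = pd.DataFrame({
    "OverTime": rng.integers(0, 2, n),            # 1 = works overtime
    "MonthlyIncome": rng.normal(6500, 2000, n),
    "YearsAtCompany": rng.integers(0, 20, n),
})
log_odds = (
    -1.0
    + 1.5 * staff["OverTime"]
    - 0.0004 * (staff["MonthlyIncome"] - 6500)
    - 0.1 * staff["YearsAtCompany"]
)
leave_probability = 1 / (1 + np.exp(-log_odds))
staff["Attrition"] = (rng.random(n) < leave_probability).astype(int)

X = staff.drop(columns="Attrition")
y = staff["Attrition"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling first puts the coefficients on a comparable footing.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

coefficients = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=X.columns
).sort_values()
test_accuracy = model.score(X_test, y_test)

print(coefficients)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

A positive coefficient raises the predicted odds of attrition and a negative one lowers them, which is the kind of evidence the retention and attrition questions ask you to explain.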
Task 2: Linear Regression
The Dataset for task 2: Future 500 (Please download the assigned dataset for your group in VU Collaborate as directed by your teacher)
The Future-500 dataset contains sample data from approximately 400 companies over the period 2000 to 2014. It contains information regarding the name of the company, the type of industry it belongs to, the year of inception, the number of employees and its growth. It also contains information about the revenue, expenses and profit of each company.
The aim of the task is to predict profit.
Dataset Description:
Column Name | Type | Description
ID | Whole number (integer) | Unique ID in the table
Name | Text | Name of the company
Industry | Text | Type of industry of the company
Inception | Integer | The commencement year of the company
Employees | Whole number (integer) | Number of employees in the company
State | Text | State to which the company belongs
City | Text | City to which the company belongs
Revenue | Decimal | Revenue of each company
Expenses | Decimal | Expenses of each company
Profit | Decimal | Profit of each company
Growth | Percentage | The percentage of growth shown by each company since its inception
For the dataset, complete the following tasks:
Missing Values:
1. Are there any anomalies (unusual data or missing values) in the given dataset? Support your answer with appropriate argument.
2. List two possible strategies to handle cases with missing values in the data (if applicable) and provide appropriate reasoning.
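Because this dataset has related financial columns, two further strategies may be worth considering. The sketch below assumes pandas and NumPy, and also assumes that Profit equals Revenue minus Expenses, which should be checked against the real file before relying on it. It flags rows that break that identity, fills gaps from the other two columns, and fills missing Employees with the industry median. The five companies are invented.

```python
import numpy as np
import pandas as pd

# Invented rows using column names from the dataset description.
companies = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Industry": ["Retail", "Retail", "Health", "Health", "Health"],
    "Employees": [25, np.nan, 40, 60, np.nan],
    "Revenue": [10.0, 8.0, np.nan, 12.0, 9.0],
    "Expenses": [6.0, 5.0, 4.0, np.nan, 7.0],
    "Profit": [4.0, np.nan, 3.0, 5.0, 3.0],
})

# Anomaly check: complete rows where Revenue - Expenses does not match Profit.
money = companies[["Revenue", "Expenses", "Profit"]]
complete = money.notna().all(axis=1)
mismatch = complete & ~np.isclose(money["Revenue"] - money["Expenses"], money["Profit"])

# Strategy A: derive a missing figure from the other two (assumed identity).
filled = companies.copy()
filled["Profit"] = filled["Profit"].fillna(filled["Revenue"] - filled["Expenses"])
filled["Revenue"] = filled["Revenue"].fillna(filled["Profit"] + filled["Expenses"])
filled["Expenses"] = filled["Expenses"].fillna(filled["Revenue"] - filled["Profit"])

# Strategy B: fill missing Employees with the median of the same industry.
industry_median = filled.groupby("Industry")["Employees"].transform("median")
filled["Employees"] = filled["Employees"].fillna(industry_median)

print("Inconsistent rows:", list(companies.loc[mismatch, "Name"]))
print(filled)
```

Deriving a value from an exact relationship recovers it without guessing, whereas a group median is only an estimate; the two should be justified differently in your answer.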
Linear Regression or apply other visualisation technique:
1. Based on the dataset, list dependent (outcome variable) and independent variable(s) (input variables = factors that affect the outcome). Give reasons for your choice.
2. Create a regression model based on the previous step.
3. Which variables would be the best candidates for the independent variable?
4. Comment on the correlation between the dependent and independent variable(s) used in the model. Support your answer with appropriate argument and evidence.
5. Based on the regression model created, make one or more predictions for increasing profit.
6. Based on these results, what other value added observations can you make?
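The regression steps above can be sketched as follows, assuming scikit-learn. The numbers are synthetic (Profit is generated from Revenue and Growth plus noise), so the output illustrates the workflow only and is not a finding about the Future-500 data. The choice of Revenue and Growth as inputs is likewise an assumption for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed units: millions for money, percent for growth).
rng = np.random.default_rng(7)
n = 200
firms = pd.DataFrame({
    "Revenue": rng.uniform(5, 15, n),
    "Growth": rng.uniform(0, 30, n),
})
firms["Profit"] = 0.3 * firms["Revenue"] + 0.05 * firms["Growth"] + rng.normal(0, 0.5, n)

# Correlation of each candidate input with the outcome variable.
correlations = firms.corr()["Profit"].drop("Profit")

X = firms[["Revenue", "Growth"]]   # independent variables
y = firms["Profit"]                # dependent variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
r_squared = model.score(X_test, y_test)   # R^2 on held-out rows

# Prediction for one hypothetical company.
scenario = pd.DataFrame({"Revenue": [12.0], "Growth": [20.0]})
predicted_profit = float(model.predict(scenario)[0])

print(correlations)
print(dict(zip(X.columns, model.coef_)), model.intercept_)
print(f"R^2 = {r_squared:.2f}, predicted profit = {predicted_profit:.2f}")
```

If Profit in the real file is exactly Revenue minus Expenses, a model using both as inputs will fit almost perfectly and reveal little, which is worth addressing when you justify your choice of independent variables.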
Task 3: Descriptive analysis and visualization
The Dataset for task 3: Groceries (Please download the assigned dataset for your group in VU Collaborate as directed by your teacher)
The dataset describes grocery purchases. The store wants to analyse purchases of these items for the purposes of point-of-sale display, guidance to sales personnel in promoting cross-sales, and guidance for piloting an eventual time-of-purchase electronic recommender system to boost cross-sales.
For the dataset, complete the following tasks:
Data cleaning:
Clean the data in Excel.
Assign attributes and data types.
Import the data into Python.
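A minimal sketch of the import step, assuming pandas and assuming the cleaned Excel sheet has been saved as CSV. The layout of the real Groceries file is not described here, so the column names and the three rows below are hypothetical; an in-memory string stands in for the file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical cleaned export; replace io.StringIO(...) with the CSV path.
csv_text = """TransactionID,Item,Category,Quantity,UnitPrice,PurchaseDate
1,whole milk,dairy,2,1.50,2014-01-05
1,rolls/buns,bakery,6,0.40,2014-01-05
2,yogurt,dairy,4,0.90,2014-01-06
"""

groceries = pd.read_csv(
    io.StringIO(csv_text),
    dtype={
        "TransactionID": "int64",
        "Item": "string",
        "Category": "category",   # small fixed set of values
        "Quantity": "int64",
        "UnitPrice": "float64",
    },
    parse_dates=["PurchaseDate"],
)

print(groceries.dtypes)
print(groceries.head())
```

Declaring the types at import time makes wrong entries (for example text in a numeric column) fail early instead of silently becoming text.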
Visualising data:
Discuss what visualisation is and why it is important in the business world (200 words).
Show a bar chart with four attributes and explain.
Show a scatter plot with two attributes and explain.
Show a box plot with two attributes and explain.
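The three charts could be produced along these lines, assuming matplotlib, pandas and NumPy. The spend figures are randomly generated and the four column names are hypothetical. "Four attributes" is read here as four numeric columns compared side by side, which is one possible interpretation.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up spend per transaction in four product groups.
rng = np.random.default_rng(42)
n = 60
dairy = rng.gamma(2.0, 3.0, n)
spend = pd.DataFrame({
    "dairy": dairy,
    "bakery": np.clip(0.5 * dairy + rng.normal(0, 1, n), 0, None),
    "produce": rng.gamma(2.0, 2.0, n),
    "beverages": rng.gamma(1.5, 2.0, n),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart: total spend for each of the four attributes.
totals = spend.sum()
axes[0].bar(totals.index, totals.values)
axes[0].set_title("Total spend by product group")

# Scatter plot: relationship between two attributes.
axes[1].scatter(spend["dairy"], spend["bakery"])
axes[1].set_xlabel("dairy")
axes[1].set_ylabel("bakery")
axes[1].set_title("Dairy vs bakery spend")

# Box plot: distribution of two attributes side by side.
axes[2].boxplot([spend["dairy"].to_numpy(), spend["bakery"].to_numpy()])
axes[2].set_xticks([1, 2])
axes[2].set_xticklabels(["dairy", "bakery"])
axes[2].set_title("Spread of spend")

fig.tight_layout()
```

In an interactive session, call `plt.show()` after the last line to display the figure. Each chart still needs a written explanation of what it shows for the store's cross-selling goals.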
Business analytics and visualization Workshop
Bankruptcy Prediction Model with Machine Learning
Introduction
Bankruptcy is a concept from financial accounting. If you came to data science from a commerce background, you will already be familiar with it. When a business or legal person fails to pay its debts to creditors and becomes insolvent, the situation is called bankruptcy.
Using machine learning algorithms, we can train a model to predict whether a company or legal person will become bankrupt in the future. In the section below, I will take you through a machine learning tutorial on how to train a model for bankruptcy prediction using the Python programming language.
Download "Bank.csv" data set and follow these instructions.
Now let's import the dataset and the necessary Python libraries to start with the task of training a bankruptcy prediction model using Python:
The dataset contains 96 columns. Let's have a look at the correlation before training the model:
As the "Bankrupt?" column is the target label, I will drop it from the training data:
Now let's split the dataset and use the logistic regression model to train the bankruptcy prediction model:
Now let's have a look at the accuracy score on the training set:
So the model is performing well on the training data by giving an accuracy of about 95%. This is how we can use machine learning in finance. You can do a lot more on this dataset to explore more use cases of machine learning in finance.
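The workshop steps above can be sketched as follows, assuming scikit-learn. This is not the tutorial's original code, and it does not read "Bank.csv": a small synthetic table with five made-up ratio columns stands in for the 96-column file, so the accuracy printed here will differ from the figure quoted above. Only the target column name "Bankrupt?" is taken from the text.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Bank.csv (assumption); swap in pd.read_csv("Bank.csv").
rng = np.random.default_rng(1)
n = 400
data = pd.DataFrame(
    rng.normal(size=(n, 5)), columns=[f"ratio_{i}" for i in range(1, 6)]
)
score = 1.5 * data["ratio_1"] - 1.0 * data["ratio_2"] - 3.0
data["Bankrupt?"] = (rng.random(n) < 1 / (1 + np.exp(-score))).astype(int)

# Correlation of every feature with the target, before training.
correlation_with_target = data.corr()["Bankrupt?"].drop("Bankrupt?").sort_values()

# Drop the target label from the training features.
X = data.drop(columns=["Bankrupt?"])
y = data["Bankrupt?"]

# Split the data and train a logistic regression model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Accuracy on the training set, plus held-out accuracy for comparison.
train_accuracy = logreg.score(X_train, y_train)
test_accuracy = logreg.score(X_test, y_test)

print(correlation_with_target)
print(f"Training accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

If bankruptcies are rare in the data, a model that always predicts "not bankrupt" can also score a high accuracy, so it is worth comparing the result with the share of non-bankrupt companies when you discuss it.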
Discuss this result with your group members (3-4 members in a group) and present the case to your lecturer.
Attachment: Business analytics and visualization.rar