Reference no: EM132265911
Data Science Assignments -
Assignment 1: Understanding Customer Churn at BondTelco
Congratulations, you are a newly employed data scientist at BondTelco. BondTelco is a retail provider of contract mobile phone services. Although quite a large company, BondTelco have not yet dabbled in Data Science, indeed, you are their first employee in this role.
Currently, management at BondTelco are concerned about the high rate of churn among their customers. To try and address this concern, the sales staff have been encouraged to ring up customers whose contracts are coming due, and offer them incentives to stay with the company. Unfortunately for management, this is an expensive process, as it involves offering incentives to all customers, whether they are likely to leave or not. What is really required is a way to predict whether a given customer is likely to leave or stay with the company. In this way, only customers who are likely to leave will be offered the incentives, thereby reducing costs.
The first question you have been asked to address is: 'Is there a way to determine in advance which customers are likely to leave when their contracts are up?'
The IT team provide you with access to the company database, which contains a table (BRUCEDBA.BondTelco_Customers) containing data on 20,000 previous customers, including whether they left or stayed with the company at the end of their contract period. The data in this table are:
COLLEGE : Is the customer college educated?
INCOME : Annual income
OVERAGE : Average overcharges per month
LEFTOVER : Average % leftover minutes per month
HOUSE : Value of dwelling (from census tract)
HANDSET_PRICE : Cost of phone
OVER_15MINS_CALLS_PER_MONTH : Average number of long (>15 mins) calls per month
AVERAGE_CALL_DURATION : Average call duration
REPORTED_SATISFACTION : Reported level of satisfaction
REPORTED_USAGE_LEVEL : Self-reported usage level
CONSIDERING_CHANGE_OF_PLAN : Was customer considering changing his/her plan?
LEAVE : Whether customer left or stayed
Your goal is to create a decision tree which can predict class membership of the LEAVE variable. The sales team will then use your tree's rules to determine whether a customer is likely to stay with or leave the company. Clearly, the better the tree prediction, the more retention costs can be reduced.
Assignment 2: Predicting the probability a Customer churns at BondTelco
Your previous work on Decision Trees impressed the management at BondTelco. Now that the management are talking an interest in Data Science, they have heard that there is a method of directly estimating the probability that a specific customer will churn.
The decision tree you created was certainly useful, but management would like to go a step further. Instead of just offering all customers who are likely to churn an incentive to stay, they would like to tier the levels of incentive. What is needed is an estimate of the probability that a given customer will leave.
Although they definitely don't want to interfere with your work, management have also heard about ideas like splitting the data into training and testing sets, and cross validation. They don't really know much about different techniques for doing this, and suggest that you should look into it, and see if it is a good approach for this particular problem.
You have access to the same data as Assignment 1.
Your goal is to create the best model you can to predict the probability a customer will LEAVE. You will need to be able to assess the models ability to predict on unseen data. The sales team will then use this model, and thus your probability estimate, to decide the kind of incentive the customer should be offered. Clearly, the better the prediction, the more the spending on retention costs can be optimized.
Assignment 3: Will a given customer churn at BondTelco?
Although your Decision Trees and Logistic Regression models have been making news around the company, as a solid Data Scientist, you know that to come up with a good model, you need to try a number of statistical learning techniques.
Your next job is to use knn classification to determine the probability that a given customer will churn.
You have access to the same data as Assignment 1.
Your goal is to create the best model you can to predict the probability a customer will LEAVE. You will need to be able to assess the models ability to predict on unseen data.
Attachment:- Assignment Files.rar