Reference no: EM132376972
Questions -
Question 1 - There is substantial value in understanding what influences churn rates. In particular, customer stickiness (the nature of your customers to continue to use your products or services, to "stick" with you) is a relevant aspect to consider. Which of the following visualizations provides evidence of "customer stickiness"?
a. plot(factor(Churn) ~ tenure, data=churndata, col=c(8,2), ylab="Churn Rate", xlab="Tenure (months)")
b. plot(factor(Churn) ~ factor(gender), data=churndata, col=c(8,2), ylab="Churn Rate", xlab="Gender")
c. plot(factor(Churn) ~ factor(SeniorCitizen), data=churndata, col=c(8,2), ylab="Churn Rate", xlab="Senior Citizen")
d. plot(factor(Churn) ~ MonthlyCharges, data=churndata, col=c(8,2), ylab="Churn Rate", xlab="Monthly charges")
Question 2 - We would like to know if long term customers (with higher tenure) are receiving discounts in their monthly fees or not. Which of the following provides the best argument?
a. The company seems to price only on service features and does not seem to price discriminate on customer characteristics (as those coefficients are not statistically significant).
res_tenure <- glm(MonthlyCharges~.-customerID-Churn-TotalCharges, data=churndata)
b. After accounting for a quadratic trend, and adding a new variable to the model, we see that the company also seems to price discriminate based on tenure (although with decreasing impact).
churndata$tenure.sq <- churndata$tenure^2
res_tenure.sq <-glm(MonthlyCharges~.-customerID-Churn-TotalCharges, data=churndata)
summary(res_tenure.sq)$coef[,c(1,4)]
churndata<-churndata[,(names(churndata)!="tenure.sq")]
c. Long term customers are more loyal to the company and therefore they are willing to pay more to stay with the company. This is exemplified by the positive correlation between the variables:
cor(churndata$tenure,churndata$MonthlyCharges)
plot(MonthlyCharges~tenure, data=churndata, xlab='Tenure (months)',
ylab='Monthly charges (dollars)', main='Churn')
d. Long term customers have contracts which are older and hence more expensive as the technology industry reduces costs over time.
res_tenure.simple <- glm(MonthlyCharges~tenure, data=churndata)
summary(res_tenure.simple)
This is confirmed by the positive coefficient which is statistically significant.
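The quadratic-trend model in option b can also be fitted without adding and then dropping a helper column, by wrapping the squared term in I() inside the formula. The sketch below is only an illustration: it uses a small synthetic data frame (toy, with made-up coefficients) rather than churndata, so the printed numbers say nothing about the real data.

```r
# Synthetic stand-in for churndata; the column names mirror the original,
# but the values and coefficients are made up for illustration.
set.seed(1)
toy <- data.frame(tenure = sample(1:72, 200, replace = TRUE))
toy$MonthlyCharges <- 40 + 0.9 * toy$tenure - 0.008 * toy$tenure^2 +
  rnorm(200, sd = 5)

# I() protects ^ from being read as a formula operator, so no helper
# column such as tenure.sq has to be created and removed afterwards.
fit.sq <- glm(MonthlyCharges ~ tenure + I(tenure^2), data = toy)

# Estimates (column 1) and p-values (column 4), as in the original call.
summary(fit.sq)$coefficients[, c(1, 4)]
```

With the full data set, the same idea would presumably be written by adding + I(tenure^2) to the formula used in option a.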
Question 3 - Based on the logistic regression model with all variables (except customerID) discussed in class,
result.logistic<-glm(Churn~.-customerID, data=churndata,family="binomial")
among customers 101 to 110, which customer has the highest probability of churn, and how high is it?
predict(result.logistic,newdata=churndata[101:110,])
which.max(predict(result.logistic,newdata=churndata[101:110,],type="response"))
max(predict(result.logistic,newdata=churndata[101:110,],type="response"))
a. It is customer number 110 and it is above 20%
b. It is customer number 106 and it is near 45%
c. It is customer number 110 and it is below 1%
d. It is customer number 106 and it is below 45%
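The three predict() calls above differ in scale: without type="response", predict() on a glm returns the linear predictor (log-odds), not a probability. The sketch below shows the difference on synthetic data; toy, x and y are hypothetical names, not part of churndata.

```r
# Synthetic data; toy and its columns are illustrative, not from churndata.
set.seed(2)
toy <- data.frame(x = rnorm(300))
toy$y <- rbinom(300, size = 1, prob = plogis(-1 + 1.5 * toy$x))

fit <- glm(y ~ x, data = toy, family = "binomial")

# Default type = "link": values on the log-odds scale (can be negative).
log.odds <- predict(fit, newdata = toy[101:110, ])

# type = "response": predicted probabilities between 0 and 1.
prob <- predict(fit, newdata = toy[101:110, ], type = "response")

# The inverse logit maps one scale onto the other.
all.equal(plogis(log.odds), prob)

# which.max() returns the position within the 10-row subset (1 to 10);
# its name is the row label in the full data frame.
names(which.max(prob))
```

Reading the name rather than the value of which.max() avoids confusing "the 6th row of the subset" with "row 6 of the data".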
Question 4 - Run the logistic regression model with all variables (except customerID) discussed in class
result.logistic<-glm(Churn~.-customerID,data=churndata,family="binomial")
Next, we will use the model to classify using different thresholds on the predicted probability (not necessarily .5). We will use a function in FPR_TPR.R that computes the true positive rate and false positive rate. The code below plots several choices.
plot( c( 0, 1 ), c(0, 1), type="n", xlim=c(0,1), ylim=c(0,1), bty="n", xlab = "False positive rate",
ylab="True positive rate")
lines(c(0,1),c(0,1), lty=2)
for ( val in seq(from = 0, to = 1, by = 0.05) ){
  # classify as churn when the fitted probability reaches the threshold
  values <- FPR_TPR( (result.logistic$fitted.values >= val), result.logistic$y )
  points( values$FPR , values$TPR )
}
Which of the following is not true?
a. Good performance should be above the diagonal.
b. If we choose the predicted probability threshold properly we can achieve FPR=.1 and TPR=.9 or better.
c. The points (0,0) and (1,1) are not interesting as they correspond to "always predict negative (no churn)" and "always predict positive (churn)" independently of the customer.
d. Although 100% accuracy seems impossible, using this predictive model we can achieve a false positive rate that is three times smaller than the true positive rate.
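The file FPR_TPR.R is not reproduced here. As a rough guide to what such a helper computes, the following is a hypothetical stand-in written from the standard definitions of the two rates; the course version may differ in argument handling and return type. The only thing taken from the code above is that the result exposes FPR and TPR fields.

```r
# Hypothetical stand-in for the course's FPR_TPR(); the real file may differ.
# prediction: logical vector (TRUE = predicted churn)
# actual:     0/1 or logical vector (1/TRUE = churned)
FPR_TPR <- function(prediction, actual) {
  actual <- as.logical(actual)
  TP <- sum(prediction & actual)    # churners flagged as churners
  FP <- sum(prediction & !actual)   # non-churners flagged as churners
  FN <- sum(!prediction & actual)   # churners missed
  TN <- sum(!prediction & !actual)  # non-churners correctly left alone
  list(FPR = FP / (FP + TN), TPR = TP / (TP + FN))
}

# Tiny made-up example: 4 churners followed by 4 non-churners.
actual <- c(1, 1, 1, 1, 0, 0, 0, 0)
fitted <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1)
FPR_TPR(fitted >= 0.5, actual)  # FPR = 0.25, TPR = 0.75
```

In this sketch a threshold of 0 flags every customer and gives the point (1, 1), while a threshold above every fitted value flags no one and gives (0, 0).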
Question 5 - You are modeling the churn problem in your company. Currently, making an offer to prevent churn costs $5. If the offer is accepted, the customer stays, but you incur an additional $45 in costs. A customer who stays with the company has an expected value of $1000. Which of the following cost-benefit matrices models the setting you are concerned with?
a.
|          | Churn | No Churn |
| Offer    | 45    | 1000     |
| No Offer | 0     | 0        |
b.
|          | Churn | No Churn |
| Offer    | -5    | 950      |
| No Offer | 0     | 1000     |
c.
|          | Churn | No Churn |
| Offer    | 50    | 1000     |
| No Offer | 0     | 1000     |
d.
|          | Churn | No Churn |
| Offer    | 50    | 950      |
| No Offer | 0     | 1000     |
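Once a cost-benefit matrix has been chosen, it is typically combined with churn probabilities to compare decisions by expected value. The sketch below shows that calculation only; the matrix entries and the probabilities are placeholders invented for the example and are deliberately not taken from any of the options above.

```r
# Illustrative cost-benefit matrix; these entries are placeholders and are
# deliberately not taken from any of the options above.
cb <- matrix(c(-10,  90,
                 0, 100),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("Offer", "No Offer"), c("Churn", "No Churn")))

# Assumed churn probabilities under each decision (made up for the example).
p.churn <- c("Offer" = 0.2, "No Offer" = 0.6)

# Expected value per decision:
# P(churn) * value if churn + P(no churn) * value if no churn.
expected <- p.churn * cb[, "Churn"] + (1 - p.churn) * cb[, "No Churn"]
expected  # Offer: 70, No Offer: 40

# Decision with the larger expected value under these assumptions.
names(which.max(expected))
```

In practice the churn probabilities would come from a fitted model such as result.logistic rather than being fixed by hand.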
Question 6 - In the Churn problem discussed and modeled in class, our decision is whether or not to make an offer to a customer while our prediction is whether a customer will churn or not. There are offers which are made available to all customers (e.g. via TV advertisement) and other offers which are exclusive (e.g. via phone calls). There are fundamental differences between these strategies. Which of the following is not true?
a. The "exclusive offers" allow the company to target the most profitable customers, while the "all-in offer" allows customers to self-select. The latter can attract a large number of "lemons" (i.e. bad clients) if not properly designed.
b. Both strategies cannot be offered simultaneously as they cannibalize each other.
c. The "exclusive offer" requires personal information and its deployment has a more limited reach compared to the "all-in offer". Thus the latter can be useful for broadening the customer base.
d. Even though offers could in principle cannibalize each other, data analytics tools can be used to set prices and discounts to help reduce the impact of cannibalization.