PREDICTING PROBABILITY OF CHURN

LOGISTIC REGRESSION

LOGISTIC REGRESSION MODEL FIT

Creating the Training and Test Sets

The data, with Churn as the response variable, is split into training and test sets in an 80:20 ratio.
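The 80:20 split can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis: the `train_test_split` helper, the fixed seed, and the toy `customers` data are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: (customer_id, churn) pairs, not the real churn dataset.
customers = [(i, int(i % 7 == 0)) for i in range(100)]
train, test = train_test_split(customers)
print(len(train), len(test))
```

With imbalanced churn labels, a stratified split that preserves the churn rate in both sets may be preferable to a purely random one.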

We first build a logistic regression model with all variables as predictors and Churn as the response variable.

Because this full model includes every variable, some predictors may be statistically insignificant: they add complexity without improving the model and may even degrade it. A model with fewer, well-chosen predictors often predicts better than one that uses every variable, so we proceed with variable selection.

We then build the main logistic regression model with a subset of the variables as predictors and Churn as the response variable.

After trying various combinations of predictors, guided by their statistical significance (p-values) and by variable selection, we arrived at the final set of predictors.

This is our final logistic regression model.

Its predictors include international plan, day minutes, customer service calls, voice mail plan, evening charge, international calls, international charge, night minutes, and voice mail messages, among others. They were chosen for their stronger impact on churn, and their p-values are below 0.05, so they are statistically significant.
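For reference, a logistic regression with predictors x_1, ..., x_k models the churn probability in the standard form below. The symbols (beta_0 for the intercept, beta_j for the coefficient of predictor x_j) are generic notation introduced here, not values taken from the fitted model.

```latex
P(\text{Churn} = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right)}
```

Each coefficient beta_j is the change in the log-odds of churn for a one-unit increase in x_j, with the other predictors held fixed.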


MODEL VALIDATION

To guard against overfitting, we used cross-validation.
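The slides do not say which cross-validation scheme was used. As an illustration only, k-fold cross-validation partitions the rows into k folds and uses each fold once for validation while training on the rest. The sketch below generates such splits in plain Python; the `k_fold_indices` helper and the choice of k = 5 are assumptions.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid

# Hypothetical example with 10 rows and 5 folds.
splits = list(k_fold_indices(10, k=5))
for train_idx, valid_idx in splits:
    print(sorted(valid_idx), len(train_idx))
```

The model would be refit on each training part and scored on the matching validation part; a large gap between training and validation scores suggests overfitting.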


THRESHOLD SELECTION & PREDICTION

A function was written to compute accuracy at different threshold probabilities. The thresholds tested were [>=0.1], [>=0.15], [>=0.16], [>=0.2], [>=0.3], and [>=0.4]. Accuracy and specificity increase as the threshold rises, but sensitivity decreases at the same time.

We chose a threshold probability of [>=0.16] after weighing accuracy, specificity, and sensitivity together. Accuracy is the main criterion, but specificity and sensitivity cannot be ignored. For churn prediction, sensitivity deserves particular attention because it measures how often class 1, the customers who are likely to churn, is classified correctly.

The final threshold probability for the model is 0.16, with:
ACCURACY 77.51% (approx. 78%)
SPECIFICITY 78.77%
SENSITIVITY 70.10%
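The threshold comparison can be sketched as below. This is not the original function: `classification_metrics` is a hypothetical helper, and the labels and probabilities are made-up toy values, so the printed numbers do not reproduce the figures reported for the model.

```python
def classification_metrics(y_true, y_prob, threshold):
    """Return (accuracy, sensitivity, specificity) when prob >= threshold predicts churn (1)."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    return accuracy, sensitivity, specificity

# Made-up churn labels and predicted probabilities for illustration.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_prob = [0.62, 0.05, 0.18, 0.14, 0.09, 0.35, 0.41, 0.12]

results = {}
for t in (0.1, 0.15, 0.16, 0.2, 0.3, 0.4):
    results[t] = classification_metrics(y_true, y_prob, t)
    acc, sens, spec = results[t]
    print(f"threshold {t}: accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

On this toy data the same trade-off appears: raising the threshold raises specificity while sensitivity falls.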

Making Predictions

We make predictions with the final model on the test churn data.


GOODNESS OF MODEL

Check for Multicollinearity

The Variance Inflation Factor (VIF) is used for this check.

VIF values between 1 and 15 are considered acceptable here, so the model passes this check.
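For reference, the VIF of predictor j is defined from the R-squared obtained by regressing that predictor on all the other predictors. The notation R_j^2 is introduced here for illustration.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A VIF of 1 means predictor j is uncorrelated with the others, and the value grows without bound as R_j^2 approaches 1.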

AUC & ROC Curve

The Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The Area Under the Curve (AUC) summarizes the curve as a single number.

The AUC is 81.3%. Since this is greater than 80%, the model is considered good.
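One way to compute the AUC without drawing the curve uses its rank interpretation: the AUC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The sketch below is a minimal plain-Python version; the `roc_auc` helper and the toy data are assumptions and do not reproduce the 81.3% reported above.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up churn labels and predicted probabilities for illustration.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_score = [0.62, 0.05, 0.18, 0.14, 0.09, 0.35, 0.41, 0.12]
auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of rows, which is fine for a sketch; a sort-based rank computation would scale better on large data.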

Key Drivers of the Model
The variables that contribute most to the model are shown.

