Skip to content

This project focuses on predicting customer churn in the telecom industry using machine learning techniques. The model is trained to identify factors that influence customer retention and accurately predict whether a customer is likely to stay or leave.

Notifications You must be signed in to change notification settings

Tanwar-12/No-Churn-Telecom

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

DATA SCIENCE PROJECT:NO CHURN TELECOM

imageimage

BUSINESS CASE: PREDICT WHETHER A CUSTOMER WILL CHURN (LEAVE THE SERVICE) OR NOT.

TASK: CLASSIFICATION

INTRODUCTION OF PROJECT:

  • No-Churn Telecom is an established Telecom operator in Europe with more than a decade in Business. Due to new players in the market, telecom industry has become very competitive and retaining customers becoming a challenge.

  • In spite of No-Churn initiatives of reducing tariffs and promoting more offers, the churn rate (percentage of customers migrating to competitors) is well above 10%.

  • No-Churn wants to explore possibility of Machine Learning to help with following use cases to retain competitive edge in the industry.

PROJECT GOAL:

1. Understanding the variables that are influencing the customers to migrate.

2. Creating Churn risk scores that can be indicative to drive retention campaigns.

3. Introduce new predicting variable “CHURN-FLAG” with values YES(1) or NO(0) so that email campaigns with lucrative offers be targeted to Churn YES customers.

PROJECT IS DEVICE INTO CERTAIN STEPS:

1.Fetching data from data-base.

2.Domain Analysis.

3.EDA: [Univariate, Bivariate & analysis condition]

4.Data preprocessing/Feature Engineering.

5.Features Selection.

6.Model Creation.

7.Model Evaluation.

8.Model Comparison.

9.Conclusion

DOMAIN ANALYSIS

  • State - 2-letter code of the US state of customer residence.

  • Account Length - Number of months the customer has been with the current telco provider.

  • Area Code - 3 digit area code.

  • Phone - Phone number of customer.

  • International Plan - The customer has international plan or not.

  • VMail Plan - The customer has voice mail plan or not.

  • VMail Message - Number of voice-mail messages.

  • Day Mins - Total minutes of day calls.

  • Day Calls - Total number of day calls.

  • Day Charge - Total charge of day calls.

  • Eve Mins - Total minutes of evening calls.

  • Eve Calls - Total number of evening calls.

  • Eve Charge - Total charge of evening calls.

  • Night Mins - Total minutes of night calls.

  • Night Calls - Total number of night calls.

  • Night Charge - Total charge of night calls.

  • International Mins - Total minutes of international calls.

  • International calls - Total number of international calls.

  • International Charge - Total charge of international calls.

  • CustServ Calls - Number of calls to customer service.

  • Churn - Customer churn or not. (target variable)

    EXPLORATORY DATA ANALYSIS

UNIVARIATE ANALYSIS

USING HISTOGRAM

image

USING COUNTPLOT

image

image

image

BIVARIATE ANALYSIS

image

PAIRWISE SCATTER PLOTS image

BOXPLOTS image

image image

DATA PREPROCESSING

  • HANDLING NULL VALUES
  • HANDLING CATEGORICAL DATA

OUTLIERS HANDLING

image

SCALING THE DATA

Using MinMaxScaler

FEATURES SELECTION

CHECKING THE CORRELATION

image

MODEL COMAPRISION REPORT:

  • Logistic Regression :- 71.21%

  • Cross validation on logistic regression :- 86.26%

  • Logistic Regression with best hyperparameter :- 86.47%

  • Support Vector Machine :- 85.49%

  • Cross validation on SVM :- 89.06%

  • K-Nearest Neighbor :- 87.66%

  • Cross validation on KNN :- 87.35%

  • K-Nearest Neighbor with best hyperparameter :- 88.52%

  • Decision Tree Classifier :- 96.53%

  • Cross validation on Decision Tree Classifier :- 86.00%

  • Decision Tree with best hyperparameter :- 85.49%

  • Random Forest Classifier :- 97.51%

  • Cross validation on Random Forest Classifier :- 92.18%

  • Random Forest with best hyperparameter :- 97.83%

  • Gradient Boosting :- 91.23%

  • Cross validation on Gradient Boosting :- 92.31%

  • Gradient Boosting with best hyperparameter :- 98.05%

  • XGBoost :- 97.94%

  • Cross validation on XGBoost :- 94.63%

  • XGBoost with best hyperparameter :- 92.42%

  • Artificial Neural Network :- 13.52%

  • Cross validation on ANN :- 92.31%

  • ANN with best hyperparameter :- 97.94%

The best accuracy is given by the Gradient Boosting with best hyperparameter i.e. 98.05%

CONCLUSION

In the "NO-churn telecom" dataset comprising 4617 entries, the Gradient Boosting model, fine-tuned with the best hyperparameters, demonstrated an exceptional accuracy of 98.05%.

This signifies the effectiveness of Gradient Boosting in accurately predicting churn within the telecommunications dataset. The high accuracy underscores the model's robust performance and suitability for deployment in predicting customer churn.

Careful consideration should be given to potential applications in real-world scenarios, and further analysis may be conducted to ensure the model's reliability and generalizability.

About

This project focuses on predicting customer churn in the telecom industry using machine learning techniques. The model is trained to identify factors that influence customer retention and accurately predict whether a customer is likely to stay or leave.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published