PyCaret-ClassificationModels-LoanApprovalPrediction-AnalyticsVidhyaHackathon

Step 1: The data files used for this repository are stored within the repository (train_loan.csv, test_loan.csv, and sample_submission_loan.csv).

Step 2: The code to build the machine learning models is in the "PyCaret- Loan Approval Decision.ipynb" notebook.

PyCaret was set up with the following parameters:

numeric_imputation = mean (The mean value is used to impute missing values in numeric variables.)

normalize = True (Normalization is a technique often applied as part of data preparation for machine learning. Its goal is to rescale the values of numeric columns in the dataset without distorting differences in the ranges of values or losing information.)

transformation = True (Transformation is a more radical technique than normalization. It changes the shape of the distribution so that the transformed data can be represented by a normal or approximately normal distribution.)

feature_interaction = True (In machine learning experiments, two features combined through an arithmetic operation often explain more of the variance in the data than the same two features do separately.)

feature_selection = True (Feature importance is used to select the features in the dataset that contribute the most to predicting the target variable. Working with selected features instead of all features reduces the risk of overfitting, can improve accuracy, and decreases training time. In PyCaret, this is enabled with the feature_selection parameter.)

remove_multicollinearity = True (Multicollinearity, also called collinearity, is a phenomenon in which one feature variable in the dataset is highly linearly correlated with another feature variable in the same dataset. Multicollinearity increases the variance of the coefficients, making them unstable and noisy for linear models. One way to deal with it is to drop one of the two features that are highly correlated with each other. In PyCaret, this is enabled with the remove_multicollinearity parameter within setup.)

multicollinearity_threshold (The correlation threshold above which correlated features are dropped. It only takes effect when remove_multicollinearity is set to True.)

ignore_low_variance = True (Sometimes a dataset has a categorical feature with multiple levels whose distribution is skewed, so that one level dominates the others. Such a feature provides little variation in information and is eliminated when this parameter is set to True.)
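The parameters above could be passed to PyCaret's classification `setup` roughly as sketched below. This is a minimal sketch, not the notebook's actual code: the target column name `Loan_Status`, the helper name `run_setup`, and the threshold value `0.9` are assumptions, since the README does not state them. The argument names follow the PyCaret 2.x `setup` signature (where normalization is switched on with `normalize`) and may differ in other releases.

```python
# Keyword arguments mirroring the parameters described above (PyCaret 2.x names).
SETUP_KWARGS = {
    "numeric_imputation": "mean",        # impute missing numeric values with the mean
    "normalize": True,                   # rescale numeric columns
    "transformation": True,              # make distributions closer to normal
    "feature_interaction": True,         # combine feature pairs arithmetically
    "feature_selection": True,           # keep the most predictive features
    "remove_multicollinearity": True,    # drop one of each highly correlated pair
    # Placeholder value: the README does not state the threshold that was used.
    "multicollinearity_threshold": 0.9,
    "ignore_low_variance": True,         # drop categorical features dominated by one level
}


def run_setup(train_df, target="Loan_Status"):
    """Initialise a PyCaret classification experiment with the settings above.

    `run_setup` is a hypothetical helper and `Loan_Status` is an assumed
    target column name; adjust both to match the actual training data.
    """
    # Imported lazily so the settings can be inspected without PyCaret installed.
    from pycaret.classification import setup

    return setup(data=train_df, target=target, **SETUP_KWARGS)


# Example usage (requires pandas and PyCaret 2.x):
#   import pandas as pd
#   experiment = run_setup(pd.read_csv("train_loan.csv"))
```

Keeping the settings in one dictionary makes it easy to see which preprocessing steps are enabled and to toggle them individually when comparing experiments.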

Step 3: The final submission file submitted to the hackathon (accuracy of 88%) is stored in this repository as "loan_approval_pycaret.csv".

Step 4: The final model is saved as a .pkl file that can be used for deployment. Because the .pkl file is too large to upload, it was compressed and stored within the repository as "Final CatBoost Classifier Model 28Jul2020.rar".

A PKL file is a file created by pickle, a Python module that enables objects to be serialized to files on disk and deserialized back into the program at runtime. It contains a byte stream that represents the objects.

The process of serialization is called "pickling," and deserialization is called "unpickling." An object is pickled so that it can be stored on disk or transferred over a network, and is then unpickled and loaded back into program memory at runtime.
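The pickling and unpickling round trip described above can be illustrated with the standard-library `pickle` module. In this minimal sketch, a plain dictionary stands in for the trained model, and the file name `demo_model.pkl` is a placeholder, not the file shipped in this repository.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model: any picklable Python object behaves the same way.
model = {"name": "demo-classifier", "weights": [0.25, -1.5, 3.0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "demo_model.pkl"

    # Pickling: serialize the object to a byte stream on disk.
    with pkl_path.open("wb") as f:
        pickle.dump(model, f)

    # Unpickling: read the byte stream back into an equivalent object.
    with pkl_path.open("rb") as f:
        restored = pickle.load(f)

# The restored object is an equal but separate copy of the original.
assert restored == model
assert restored is not model
```

Unpickling can execute arbitrary code, so .pkl files should only be loaded from trusted sources.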

Predict Loan Eligibility for Dream Housing Finance company

Dream Housing Finance company deals in all kinds of home loans. It has a presence across all urban, semi-urban, and rural areas. A customer first applies for a home loan, after which the company validates the customer's eligibility for the loan.

The company wants to automate the loan eligibility process in real time, based on the customer details provided in the online application form. These details include Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History, and others. To automate this process, the company has provided a dataset for identifying the customer segments that are eligible for a loan, so that it can specifically target these customers.

krishcy25/PyCaret-ClassificationModels-LoanApprovalPrediction-AnalyticsVidhyaHackathon
