Credit Risk Classification Challenge

Background

Credit risk classification is difficult because the classes are inherently imbalanced: healthy loans far outnumber risky loans. This challenge uses several techniques to train and evaluate models on imbalanced classes. The dataset comprises historical lending activity from a peer-to-peer lending services company, and the objective is to build a model that identifies borrowers' creditworthiness.

What You’re Creating

You will use scikit-learn and the imbalanced-learn library to train a logistic regression model on two versions of the dataset: the original dataset and a version resampled with the RandomOverSampler class from imbalanced-learn.

For both datasets, you will:

Count the target classes
Train a logistic regression classifier
Calculate the balanced accuracy score
Generate a confusion matrix
Produce a classification report
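
To illustrate why a balanced accuracy score is listed alongside the other metrics, here is a minimal sketch on made-up labels (95 healthy loans and 5 high-risk loans, not taken from lending_data.csv). A degenerate predictor that labels every loan as healthy still reaches high plain accuracy, while balanced accuracy, which averages the recall of each class, exposes the problem.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Toy labels, invented for illustration: 95 healthy loans (0), 5 high-risk loans (1).
y_true = [0] * 95 + [1] * 5

# A degenerate "model" that predicts every loan as healthy.
y_pred = [0] * 100

plain = accuracy_score(y_true, y_pred)
balanced = balanced_accuracy_score(y_true, y_pred)

print(f"accuracy: {plain:.2f}")              # 0.95
print(f"balanced accuracy: {balanced:.2f}")  # 0.50
```

The balanced score is the mean of the per-class recalls: 1.0 for label 0 and 0.0 for label 1.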

Additionally, you will document a credit risk analysis report based on a provided template.

Files

To get started, download the following:

Module 12 Challenge files

Instructions

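The instructions are divided into the following sections:

- Split the Data into Training and Testing Sets
- Create a Logistic Regression Model with the Original Data
- Predict a Logistic Regression Model with Resampled Training Data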

Split the Data into Training and Testing Sets

Open the starter code notebook.
Read lending_data.csv from the Resources folder into a Pandas DataFrame.
Create the labels set (y) from the “loan_status” column and the features set (X) from the remaining columns.
- Note: A value of 0 in the “loan_status” column indicates a healthy loan, while 1 indicates a high-risk loan.
Check the balance of the labels using the value_counts method.
Split the data into training and testing datasets using train_test_split.
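
The steps in this section could be sketched as follows. This is not the starter notebook's code. It uses a small synthetic DataFrame as a stand-in for lending_data.csv so that it runs on its own, and the feature column names (feature_a, feature_b), the path in the comment, random_state=1, and the stratify argument are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# In the notebook, the real data would be loaded along these lines:
#   df = pd.read_csv("Resources/lending_data.csv")
# The frame below is a synthetic stand-in so the sketch runs on its own.
rng = np.random.default_rng(1)
n_rows = 1000
loan_status = (rng.random(n_rows) < 0.05).astype(int)  # roughly 5% high-risk
df = pd.DataFrame({
    "feature_a": rng.normal(loc=2.0 * loan_status, scale=1.0),  # hypothetical column
    "feature_b": rng.normal(size=n_rows),                       # hypothetical column
    "loan_status": loan_status,
})

# Labels come from "loan_status"; features are every remaining column.
y = df["loan_status"]
X = df.drop(columns="loan_status")

# Check how imbalanced the labels are (0 = healthy, 1 = high-risk).
print(y.value_counts())

# Default split is 75% train / 25% test. stratify=y (an assumption here)
# keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)
print(X_train.shape, X_test.shape)
```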

Create a Logistic Regression Model with the Original Data

Fit a logistic regression model using the training data (X_train and y_train).
Predict the labels for the testing data using X_test and the trained model.
Evaluate the model’s performance:
- Calculate the accuracy score.
- Generate a confusion matrix.
- Print the classification report.
Answer: How well does the logistic regression model predict both the 0 (healthy loan) and 1 (high-risk loan) labels?
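
A minimal sketch of the fit, predict, and evaluate steps above, under stated assumptions: the data is generated with scikit-learn's make_classification as an imbalanced stand-in for the lending data, and the random_state and max_iter values are arbitrary choices rather than requirements of the challenge.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the lending data (about 95% label 0).
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

# Fit on the original (not resampled) training data.
model = LogisticRegression(random_state=1, max_iter=1000)
model.fit(X_train, y_train)

# Predict labels for the held-out test set.
predictions = model.predict(X_test)

# Evaluate: accuracy, confusion matrix, and per-class report.
acc = accuracy_score(y_test, predictions)
cm = confusion_matrix(y_test, predictions)  # rows = actual, columns = predicted
print(f"accuracy: {acc:.3f}")
print(cm)
print(classification_report(
    y_test, predictions,
    target_names=["healthy loan (0)", "high-risk loan (1)"],
    zero_division=0,
))
```

When answering the question above, the precision and recall rows for label 1 in the classification report are usually more informative than the overall accuracy, since the majority class dominates the latter.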

Predict a Logistic Regression Model with Resampled Training Data

To potentially improve model performance, you will resample the training data using RandomOverSampler:

Resample the training data with RandomOverSampler so that both labels have an equal number of data points.
Fit the LogisticRegression classifier on the resampled data and make predictions.
Evaluate the model’s performance:
- Calculate the accuracy score.
- Generate a confusion matrix.
- Print the classification report.
Answer: How well does the logistic regression model, trained with oversampled data, predict both the 0 (healthy loan) and 1 (high-risk loan) labels?

blleshi/Credit_Risk_Classification
