Resume-Classifier-IC

📁📄 Welcome to My Resume Classifier Repository

This repository presents a resume classifier: an ML model that categorizes resumes from an input directory of PDF files. I used "Blurr" to train the model, and the script.py file can be run directly from a command-line terminal to get the output.

Data Collection

Data was collected from the link: https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset

In total, the file Resume.csv contains 3447 records.

Model Selection

I used Fastai and Blurr with the "distilroberta-base" model, as it is easier to use and time-efficient.

Model Training

I fine-tuned a distilroberta-base model from HuggingFace Transformers using Fastai and Blurr. The model training notebook can be viewed in the "notebooks" folder of this repository. The model achieved an accuracy of approximately 96%.

Because the dataset in this project is small, it was split into only two sets: a training set and a validation set. The validation set also served as the test set for this model.
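A two-way split like the one described above can be sketched as follows. This is only an illustration, not the notebook's code: the 80/20 ratio, the seed, and the helper name `split_train_valid` are assumptions.

```python
import random

def split_train_valid(rows, valid_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, valid) lists.

    valid_fraction and seed are illustrative assumptions, not values
    taken from the project notebook.
    """
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(rows)
    n_valid = int(len(rows) * valid_fraction)
    return rows[n_valid:], rows[:n_valid]

# Row indices stand in for the 3447 records of Resume.csv.
train, valid = split_train_valid(range(3447))
print(len(train), len(valid))
```

With no separate held-out test set, the scores reported on the validation set may be somewhat optimistic.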

Model's Performance

The F1 scores of this model are:

F1 Score (Micro) = 0.5473145780051151
F1 Score (Macro) = 0.37185516078339154
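
To make the difference between the two averages concrete, here is a small pure-Python sketch. Micro F1 pools true positives, false positives, and false negatives over all classes, while macro F1 averages the per-class F1 scores, so classes the model rarely gets right pull the macro value down. The helper `f1_scores` and the toy labels are hypothetical and are not taken from the project notebook.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[t] += 1  # the true class gets a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Toy labels (hypothetical): the model predicts "HR" for everything.
y_true = ["HR", "HR", "HR", "IT", "CHEF"]
y_pred = ["HR", "HR", "HR", "HR", "HR"]
micro, macro = f1_scores(y_true, y_pred)
print(micro, macro)
```

In this toy case the macro score is well below the micro score because two of the three classes are never predicted correctly.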

ROC curve and AUC score: see roc_curve.png in this repository.

Model Compression and ONNX Inference

The trained model is over 300 MB in size. I compressed it using ONNX quantization, bringing it under 80 MB.
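
The README does not say which quantization mode was used. One common option is dynamic int8 quantization with ONNX Runtime, sketched below under that assumption. The function names and file paths are hypothetical. The size estimate simply assumes that every 32-bit weight is stored in 8 bits, which ignores tensors that stay in float and file overhead.

```python
def quantize_to_int8(fp32_path, int8_path):
    """Write an int8 dynamically quantized copy of an ONNX model.

    Assumes the onnxruntime package is installed; it is imported lazily
    so the size estimate below works without it.
    """
    from onnxruntime.quantization import QuantType, quantize_dynamic
    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)

def estimated_size_mb(fp32_size_mb, bits=8):
    """Rough model size after storing 32-bit weights in `bits` bits each."""
    return fp32_size_mb * bits / 32

# A 300 MB float32 model shrinks to roughly a quarter of its size.
print(estimated_size_mb(300))
```

A call such as `quantize_to_int8("model.onnx", "model-quantized.onnx")` (hypothetical paths) would produce the smaller file. The rough fourfold reduction is consistent with going from 300+ MB to under 80 MB.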

Model Deployment and Running the Script from CMD terminal:

For model deployment, use the script.py file. This Python script can be run from the command line as follows:

python script.py "path/to/dir"

The script takes a directory containing the resumes to be categorized as input. Using the trained model, the script categorizes each resume.
• For each resume, it moves the resume to the respective category folder.
• The script also writes a CSV file named categorized_resumes.csv containing two columns: filename and category.
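
The file-handling steps above can be sketched as follows. This is not the contents of script.py: the function name `categorize_directory` and the `classify` callable are hypothetical stand-ins for the model inference, and creating the category folders and the CSV inside the input directory is an assumption.

```python
import csv
import shutil
from pathlib import Path

def categorize_directory(input_dir, classify):
    """Move each PDF into a category subfolder and log it to a CSV.

    `classify` is a hypothetical callable mapping a PDF path to a
    category name; in the real script this role is played by the model.
    """
    input_dir = Path(input_dir)
    rows = []
    # sorted() materializes the file list before any file is moved.
    for pdf in sorted(input_dir.glob("*.pdf")):
        category = classify(pdf)
        target_dir = input_dir / category  # assumed output location
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(target_dir / pdf.name))
        rows.append((pdf.name, category))

    with open(input_dir / "categorized_resumes.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "category"])
        writer.writerows(rows)
    return rows
```

For example, `categorize_directory("path/to/dir", lambda pdf: "HR")` would move every PDF into an HR folder and list it in the CSV under that category.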

NasrinRipa/Resume-Classifier-IC
