Predicting-H1N1-and-Seasonal-Flu-Uptake

-> Rank 46 -- Driven Data Username : Karthi_DataScience

Built machine learning models to predict how likely an individual is to receive the H1N1 and seasonal flu vaccines. The work covers data analysis and preprocessing, feature engineering, hyperparameter tuning with Optuna, and advanced classification algorithms (XGBoost and CatBoost). The final model ranked 46 out of 6500+ competitors on Driven Data.

The main file of this project is `Predicting H1N1 and Seasonal Flu Uptake - EDA and ML implementation (XGBoost and CatBoost).ipynb`. It contains the XGBoost and CatBoost model implementations. The best scores I got from XGBoost and CatBoost are 86.25 and 86.22, respectively.

- `EDA.ipynb` contains the exploratory data analysis.
- `EDA using labels.ipynb` contains the exploratory data analysis with the labels.
- `Missing Value Analysis and Imputation.ipynb` covers the analysis of missing values and the imputation methods.

I tried Sequential Feature Selection, but it did not give good scores, so I stopped concentrating on feature selection and focused on the models instead. Feel free to explore `SFS - XGB ( FORWARD FEATURE SELECTION ).ipynb` for the feature selection experiments.

XGBoost performed well on the H1N1 vaccine target, and CatBoost performed well on the seasonal flu vaccine target, so I combined them in a Stacking Classifier. The guidelines for deploying the Stacking Classifier are given below.

The rest of this README describes the Stacking Classifier used for scoring, which reached a ROC-AUC score of 86.37. It combines Logistic Regression as the base model with two main models, XGBoost and CatBoost.
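A ROC-AUC figure like the one above can be checked locally before submitting. The sketch below is a minimal pure-Python version. It assumes the competition score is the mean of the ROC AUC over the two labels and that a reported value such as 86.37 is that mean multiplied by 100. The function names, column names, and toy predictions are illustrative only; in practice `sklearn.metrics.roc_auc_score` is the usual choice.

```python
from itertools import product


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a positive is scored above a negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count a win for every (positive, negative) pair ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def mean_roc_auc(labels, preds):
    """Average the per-label ROC AUC (assumed to be the competition metric)."""
    return sum(roc_auc(labels[name], preds[name]) for name in labels) / len(labels)


# Toy data; the column names are assumed to match training_set_labels.csv.
labels = {
    "h1n1_vaccine": [0, 0, 1, 1],
    "seasonal_vaccine": [0, 1, 0, 1],
}
preds = {
    "h1n1_vaccine": [0.1, 0.4, 0.35, 0.8],
    "seasonal_vaccine": [0.2, 0.9, 0.3, 0.7],
}
score = mean_roc_auc(labels, preds)
print(f"mean ROC AUC: {score:.4f}")
```

The pairwise count is quadratic in the number of rows, so it is only meant for small sanity checks, not for the full validation set.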

A Stacking Classifier is an ensemble learning technique that combines the predictions of several models. A meta-model is trained on those predictions and learns how to weight them, so the ensemble can draw on the strengths of each individual model.

- Base Model: Logistic Regression
- Main Models: XGBoost and CatBoost
- Score: 86.37
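The setup listed above can be sketched with scikit-learn's `StackingClassifier`. This is only an illustration under stated assumptions, not the notebook's code. It assumes Logistic Regression is the model that combines the main models' predictions (scikit-learn's `final_estimator`), it uses synthetic data, and it uses two scikit-learn classifiers as stand-ins for XGBoost and CatBoost so that it runs without extra packages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary target standing in for one vaccine label.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-ins (assumption): swap in XGBClassifier and CatBoostClassifier
# with their tuned parameters to mirror the setup described above.
main_models = [
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

stack = StackingClassifier(
    estimators=main_models,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # feed class probabilities to the combiner
    cv=5,  # out-of-fold predictions are used to train the combiner
)
stack.fit(X_train, y_train)

valid_proba = stack.predict_proba(X_valid)[:, 1]
valid_auc = roc_auc_score(y_valid, valid_proba)
print(f"validation ROC AUC: {valid_auc:.4f}")
```

Since there are two targets (H1N1 and seasonal flu), one way to apply this is to fit a separate stack for each label and submit the positive-class probability from each.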

Feel free to modify and experiment with the model to achieve even better results. I tried one basic feature engineering combination, and at first it gave me good results. After increasing the number of Optuna trials, I achieved the same results. Use the same tuned parameters for the Stacking Classifier's main models.

Train a model until it reaches its saturation point. After that, go back to the EDA notebook, look for new feature engineering ideas, and start training again.

