
🫁 Lung Cancer Detection 🔍


Problem Statement

Given a CT scan image, we have to classify whether the image shows adenocarcinoma cancer or not.


Solution Explanation

Click the image below to watch the video explaining the solution.

YouTube Video


Approach to the problem

Steps

  1. Understand the problem and gain information about the cancer.
  2. Collect the data and upload the zip file to Google Drive.
  3. Create a virtual environment.
  4. Perform experiments in a Jupyter notebook using the pretrained VGG16 model (a sketch of this model appears after this list).
  5. Create the project structure and package the project.
  6. Convert the Jupyter notebook code into modular code with exception handling and logging.
  7. Develop the training pipeline components and the pipeline itself.
  8. Integrate MLflow to track experiments and record parameters, results and performance metrics.
  9. Train the model using the training pipeline and track experiments with MLflow, using DagsHub as the remote repository.
  10. Store the trained model in the local artifacts repository.
  11. Develop the prediction pipeline, which classifies whether a lung has adenocarcinoma cancer or not from a chest/lung CT scan image.
  12. Develop a Streamlit application that takes a CT scan image from the user, uses the trained model to predict the output, and renders the result back on the UI.
  13. Dockerize the application to deploy it on the cloud.
  14. Deploy the lung cancer detection application on the AWS cloud.
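
To make step 4 concrete, here is a minimal sketch of extending a pretrained VGG16 backbone with a custom classification head in Keras. The input size, head layout and optimizer settings are illustrative assumptions, not the exact values from this project's params.yaml.

```python
# Minimal sketch: frozen VGG16 backbone with a custom classification head.
# Input size, head layout and optimizer settings are illustrative, not the
# exact values from this project's params.yaml.
import tensorflow as tf

IMAGE_SIZE = (224, 224, 3)   # VGG16's default input size (assumed here)
NUM_CLASSES = 2              # adenocarcinoma vs. normal

base_model = tf.keras.applications.VGG16(
    include_top=False,        # drop the original ImageNet classifier
    weights="imagenet",
    input_shape=IMAGE_SIZE,
)
base_model.trainable = False  # freeze the convolutional layers for transfer learning

# Custom fully connected head appended to the frozen backbone
x = tf.keras.layers.Flatten()(base_model.output)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs=base_model.input, outputs=outputs)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```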

Workflow for building the training pipeline components

  1. Update config.yaml
  2. Update secrets.yaml (Optional)
  3. Update params.yaml
  4. Update the entity
  5. Update the configuration manager in src config (a sketch of the entity and configuration manager appears after this list)
  6. Update the components
  7. Update the stages
  8. Update the Training pipeline
  9. Update the dvc.yaml
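
To make steps 4 and 5 concrete, here is a minimal sketch of the entity/configuration-manager pattern. The class names, fields and yaml keys are illustrative assumptions, not the exact ones in src/lung_cancer_classifier.

```python
# Minimal sketch of the entity + configuration manager pattern.
# Class names, fields and yaml keys are illustrative assumptions,
# not the project's exact ones.
from dataclasses import dataclass
from pathlib import Path

import yaml


@dataclass(frozen=True)
class DataIngestionConfig:          # entity: a frozen data class per component
    root_dir: Path
    source_url: str
    local_data_file: Path
    unzip_dir: Path


class ConfigurationManager:
    """Reads config.yaml and hands each component its typed config."""

    def __init__(self, config_path: Path = Path("config/config.yaml")):
        with open(config_path) as f:
            self.config = yaml.safe_load(f)

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        c = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(c["root_dir"]),
            source_url=c["source_url"],
            local_data_file=Path(c["local_data_file"]),
            unzip_dir=Path(c["unzip_dir"]),
        )
```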

Note : When using MLflow, set the MLflow tracking variables before running the code, as shown below.
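
For example, with DagsHub the tracking variables can be set like this (a sketch: the tracking URI is assumed to be this project's DagsHub repository with the .mlflow suffix, and the username/token are placeholders):

```python
# Sketch: point MLflow at the DagsHub remote before running the pipeline.
# The URI is assumed from this project's DagsHub repository; the username
# and token are placeholders for your own credentials.
import os

os.environ["MLFLOW_TRACKING_URI"] = (
    "https://dagshub.com/DarshanRokkad/Chest_Cancer_Classification.mlflow"
)
os.environ["MLFLOW_TRACKING_USERNAME"] = "<dagshub-username>"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<dagshub-token>"
```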
Note : When doing pipeline versioning, each stage must contain its own driver code.


Project UI

Case 1 : Adenocarcinoma cancer image

Case 2 : Normal image
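
For reference, here is a minimal sketch of how such a Streamlit page can be wired up. The widget labels, model path, preprocessing and class-index mapping are assumptions for illustration, not the exact contents of app.py.

```python
# Minimal sketch of the Streamlit UI; labels, model path, preprocessing
# and the class-index mapping are assumptions, not the exact app.py.
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

st.title("Lung Cancer Detection")

uploaded = st.file_uploader("Upload a chest CT scan image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB").resize((224, 224))
    st.image(image, caption="Uploaded CT scan")

    # Preprocess and run the trained model (path assumed)
    batch = np.expand_dims(np.array(image) / 255.0, axis=0)
    model = tf.keras.models.load_model("artifacts/training/model.h5")
    prediction = int(np.argmax(model.predict(batch), axis=1)[0])

    # Class-index mapping assumed; verify against the training data generator
    st.write("Adenocarcinoma cancer" if prediction == 0 else "Normal")
```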


Integration of DVC

I have used DVC for versioning the training pipeline; with the pipeline defined in dvc.yaml, dvc repro re-runs only the stages whose dependencies have changed.

The image below is the pipeline DAG, which represents the dependencies between the components.


Integration of MLflow and DagsHub

https://dagshub.com/DarshanRokkad/Chest_Cancer_Classification

I used MLflow to manage the deep learning life cycle by logging the evaluation metrics and plots.

I used DagsHub as a remote repository with MLflow to store the logs and artifacts.
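
A sketch of the kind of evaluation logging this enables (the metric names and values below are illustrative; the tracking variables must already point at the DagsHub remote, as noted above):

```python
# Sketch: log evaluation parameters, metrics and an artifact with MLflow.
# Metric names and values are illustrative, not recorded results.
import mlflow

with mlflow.start_run():
    mlflow.log_params({"epochs": 10, "batch_size": 16})   # illustrative values
    mlflow.log_metrics({"loss": 0.42, "accuracy": 0.91})  # illustrative values
    mlflow.log_artifact("scores.json")                    # evaluation scores file
```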

Below is the pipeline of the project.


Deployment of the Streamlit application on AWS cloud

I have used AWS ECR and AWS EC2 to deploy the application.


Project Structure

│  
├── .dvc                                               <-- used for data and pipeline versioning
│  
├── .github/workflow                                   <-- contains the yml code to create the CI/CD pipeline for GitHub Actions
│          
├── artifacts (remote)                                 <-- contains the dataset and trained models (in the remote repository)
│          
├── config                                             <-- contains the yaml file where we specify the configuration of the project
│          
├── images                                             <-- contains images used in readme file
│          
├── logs (remote)                                      <-- contains the logs created while running the pipelines and components
│          
├── notebook                                           <-- contains the jupyter notebook where the experiments and research work are done
│  
├── secrets (remote)                                   <-- contains a yaml file with the api tokens, secret keys, passwords and more
│
├── src
│    │
│    └── lung_cancer_classifier (package)
│          │
│          ├── components
│          │     │
│          │     ├── __init__.py
│          │     │
│          │     ├── data_ingestion.py                 <-- this module downloads the zipped dataset from google drive and extracts it on the local machine
│          │     │
│          │     ├── prepare_base_model.py             <-- this module pulls the vgg-16 base model, adds custom layers at the end and saves the custom model
│          │     │
│          │     ├── model_trainer.py                  <-- this module takes the custom model, trains it on the training data and validates it on the validation data
│          │     │
│          │     └── model_evaluation.py               <-- this module tests the trained model on the testing data and logs the evaluation metrics and artifacts to dagshub using mlflow
│          │
│          ├── config                                  <-- this folder contains the module with the configuration manager, which manages the configuration of each training pipeline component
│          │
│          ├── constants                               <-- module containing the paths of the yaml files
│          │
│          ├── entity                                  <-- has a python file containing the data class for each component of the training pipeline
│          │
│          ├── pipeline
│          │     │
│          │     ├── __init__.py
│          │     │
│          │     ├── training_pipeline.py              <-- module used to train the model in different stages
│          │     │
│          │     └── prediction_pipeline.py            <-- module that takes the image from the user through the web application and returns the prediction
│          │
│          ├── training_stages                         <-- folder used to create stages by using the configuration manager and components 
│          │     │
│          │     ├── __init__.py
│          │     │
│          │     ├── stage_01_data_ingestion.py        <-- module used to create a data ingestion configuration object and then ingest the data onto the local machine
│          │     │
│          │     ├── stage_02_prepare_base_model.py    <-- module used to create the custom model using vgg-16 as the base model, modifying/adding a few fully connected layers at the end
│          │     │
│          │     ├── stage_03_model_trainer.py         <-- module used to train the custom model using the training and validation data
│          │     │
│          │     └── stage_04_model_evaluation.py      <-- module used to evaluate the trained model using test data
│          │
│          ├── utils                                   <-- module which contains commonly used functions
│          │
│          └── __init__.py                             <-- this python file contains the logger
│          
├── .dvcignore                                         <-- similar to .gitignore 
│          
├── .gitignore                                         <-- used to ignore unwanted files and folders
│          
├── LICENSE                                            <-- copyright license for the github repository 
│          
├── README.md                                          <-- used to display the information about the project
│          
├── app.py                                             <-- contains the web page written in streamlit
│          
├── dvc.lock                                           <-- this file is the output of pipeline versioning
│          
├── dvc.yaml                                           <-- this yaml file contains the code to reproduce the training pipeline
│          
├── params.yaml                                        <-- this yaml file contains the parameters and values used during model training
│          
├── requirements.txt                                   <-- text file which contains the dependencies/packages used in the project
│          
├── scores.json                                        <-- contains the scores recorded after model evaluation
│          
├── setup.py                                           <-- python script used to build the project as a python package
│          
└── template.py                                        <-- program used to create the project structure