This is the code of the paper A. Schmidt, J. Silva-Rodríguez, R. Molina and V. Naranjo, "Efficient Cancer Classification by Coupling Semi Supervised and Multiple Instance Learning," in IEEE Access, vol. 10, pp. 9763-9773, 2022, doi: 10.1109/ACCESS.2022.3143345.
We try our best to make the code reusable and the experiments reproducible by providing detailed instructions, a description of the dependencies, configurations, and run commands.
To run this code on your Linux machine, you need to:
- Install miniconda (or anaconda): https://docs.anaconda.com/anaconda/install/linux/
- Set up a conda environment and activate it:
conda env create --file environment.yaml
conda activate tensorlfow_2_3
- Download the dataset; see the README of the respective dataset_dependent folder:
- ./dataset_dependent/camelyon16/README.md
- ./dataset_dependent/sicapv2/README.md
- Edit the configuration:
- ./config.yaml for general settings
- ./dataset_dependent/sicapv2/config.yaml for dataset-dependent settings
- Run the program:
python ./src/main.py
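Before the first run, it can help to confirm that the files named in the steps above are where the commands expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_files` and the list `REQUIRED_FILES` are hypothetical, and it assumes that all paths are relative to the base folder of the repository.

```python
from pathlib import Path

# Files referred to in the setup steps, relative to the base folder.
# Assumption: environment.yaml and config.yaml live in the base folder.
REQUIRED_FILES = [
    "environment.yaml",
    "config.yaml",
    "dataset_dependent/sicapv2/config.yaml",
    "src/main.py",
]


def missing_files(base_dir, required=REQUIRED_FILES):
    """Return the entries of `required` that do not exist under `base_dir`."""
    base = Path(base_dir)
    return [rel for rel in required if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files found; run: python ./src/main.py")
```

Running the script from the base folder prints either the missing files or a reminder of the run command.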
In the paper, we report experimental results for patch-level Gleason grading (on SICAPv2) and WSI-level breast cancer classification (on Camelyon16).
To run the experiments, please follow these instructions:
- Follow the steps above to install the dependencies and download the SICAPv2 dataset
- Configure the path to the SICAPv2 dataset on your PC:
- open ./dataset_dependent/sicapv2/config.yaml
- change the line dir: path/to/dataset/ so that it points to your local SICAPv2 folder
- Run the experiments
- Navigate into the base folder (cancer_classification)
- The subfolders efficient_labeling and complete_annotation of ./dataset_dependent/sicapv2/experiments/ contain the configurations of the experiments in Figure 2 of the paper.
- To train the model with efficient labeling (EL) and P=5, use e.g.:
python src/main.py -dc ./dataset_dependent/sicapv2/experiments/efficient_labeling/P_5/config.yaml
- To test the model, use the test configuration test_config.yaml, e.g.:
python src/main.py -dc ./dataset_dependent/sicapv2/experiments/efficient_labeling/P_5/test_config.yaml
- To view the output, see the description of logging below.
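Since every experiment subfolder carries its own config.yaml and test_config.yaml, the run commands above can be generated instead of typed one by one. The sketch below is an illustration under assumptions: the helper `experiment_commands` is hypothetical, and it only relies on the `-dc` option and the folder layout shown above. It prints the commands rather than executing them, so nothing is trained by accident.

```python
from pathlib import Path


def experiment_commands(experiments_dir, config_name="config.yaml"):
    """Build one `python src/main.py -dc <config>` command per config file found."""
    root = Path(experiments_dir)
    if not root.is_dir():
        return []
    # rglob matches the full file name, so "config.yaml" does not
    # pick up "test_config.yaml".
    configs = sorted(root.rglob(config_name))
    return [["python", "src/main.py", "-dc", cfg.as_posix()] for cfg in configs]


if __name__ == "__main__":
    # Run from the base folder (cancer_classification).
    root = "./dataset_dependent/sicapv2/experiments"
    for cmd in experiment_commands(root):
        print(" ".join(cmd))
```

Passing `config_name="test_config.yaml"` lists the corresponding test commands. To execute a command, each list can be handed to `subprocess.run(cmd, check=True)`.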
- Follow the steps above to install the dependencies and download the Camelyon16 dataset
- Preprocessing is necessary; see dataset_dependent/camelyon16/dataset_scripts/README.md
- Configure the path to the preprocessed Camelyon16 dataset on your PC:
- open ./dataset_dependent/camelyon16/config.yaml
- change both paths, dir: /path/to/cam16 and data_split_dir: /path/to/cam16, to the path of the preprocessed dataset
- Run the experiments with efficient labeling:
- Navigate into the base folder (cancer_classification)
- To train the model, use the configurations in the subfolders of ./dataset_dependent/camelyon16/experiments/, e.g.:
python src/main.py -dc ./dataset_dependent/camelyon16/experiments/efficient_labeling/P_5/config.yaml
- To test the model, use the test configuration test_config.yaml, e.g.:
python src/main.py -dc ./dataset_dependent/camelyon16/experiments/efficient_labeling/P_5/test_config.yaml
- To view the output, see the description of logging below.
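The path changes in the dataset config.yaml files described above can also be scripted, for example when the dataset location differs between machines. This is a minimal sketch using only the standard library: the helper `set_config_value` is hypothetical, it assumes each key sits on its own line as a plain `key: value` pair, and the nesting shown in the usage example is an assumption, not the actual layout of the file.

```python
import re


def set_config_value(text, key, value):
    """Replace the value on every `key: ...` line of `text`, keeping indentation."""
    # [ \t]* (not \s*) keeps the match on one line; anchoring at the line
    # start means "dir" does not match inside "data_split_dir".
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}:[ \t]*.*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{key}: {value}", text)
    if count == 0:
        raise KeyError(f"no line starting with '{key}:' found")
    return new_text
```

A possible use, with a hypothetical config fragment and target path:

```python
text = "data:\n  dir: /path/to/cam16\n  data_split_dir: /path/to/cam16\n"
for key in ("dir", "data_split_dir"):
    text = set_config_value(text, key, "/data/cam16_preprocessed")
print(text)
```

A line-based replacement leaves comments and key order in the file untouched, which a load-and-dump round trip through a YAML library might not.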
- Logging is done with mlflow (https://mlflow.org/docs/latest/tracking.html). It is already installed if you followed the installation guidelines above.
- To see the experiment results, navigate into the base folder (cancer_classification). If you ran experiments, an mlruns folder should be present.
- Run mlflow ui
- Open localhost:5000 in your browser to see the results (training progress, metrics, etc.)
- By default, the models of the experiments are stored in the experiment subfolders, e.g. ./dataset_dependent/sicapv2/experiments/efficient_labeling/P_5/models/
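To get an overview of which experiments have already produced stored models, the experiment subfolders can be scanned for models directories. The sketch below is not part of the repository: the helper `find_model_dirs` is hypothetical, and it assumes the default storage location described above.

```python
from pathlib import Path


def find_model_dirs(experiments_dir):
    """Return all directories named `models` below `experiments_dir`, sorted."""
    root = Path(experiments_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("models") if p.is_dir())


if __name__ == "__main__":
    # Run from the base folder (cancer_classification).
    for model_dir in find_model_dirs("./dataset_dependent/sicapv2/experiments"):
        print(model_dir.as_posix())
```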