iSBatch (Batch Scheduler Interface)

iSBatch is a Python package that generates resource requests to be used when submitting an application to an HPC cluster. Currently, node-hours are the only supported resource.

iSBatch requires past execution times for an application (at least 10 runs are recommended) as well as basic information about the HPC cluster, and generates a sequence of requests to be used when submitting to the cluster (if the first request underestimates the real execution time, the second request should be used, and so on).


Theory

iSBatch computes the optimal request time for a stochastic application based on historical information about previous runs. The following theorem is used to compute the sequence of requests:

Figure: the theorem giving the optimal sequence of requests.

Different alpha, beta, gamma values can be chosen to model large-scale systems. We use:

  • alpha=1, beta=0, gamma=0 to model the Cloud cost model (pay for what you reserve)
  • alpha=1, beta>=1, gamma=0 for the HPC cost model (pay for what you reserve, through the wait time in the queue, plus for what you use, in node-hours).

The HPC model is chosen by default.
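For reference, the cost model used in the papers cited below charges a reservation of length W, for a job that actually runs for t time units, a cost of alpha·W + beta·min(W, t) + gamma. The snippet below only illustrates that formula under this assumption; it is not part of the iSBatch API:

def reservation_cost(W, t, alpha=1.0, beta=1.0, gamma=0.0):
    # alpha is paid per reserved time unit, beta per used time unit
    # (the job occupies min(W, t) of a reservation of length W),
    # and gamma is a constant start-up cost per submission.
    return alpha * W + beta * min(W, t) + gamma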

Usage

To use this code for generating walltime requests, include:

  • import iSBatch

Simple examples of how to use the modules in the library are shown in the examples folder. Running the get_sequence.py script on an input log (like the ones provided in the examples/log_examples folder) will return the request times to use for submission on an HPC system described by the default parameters (with or without checkpointing at the end of the reservation).

user:~/iSBatch$ cd examples
user:~/iSBatch/examples$ python get_sequence.py log_examples/CT_eye_segmentation.log
Request sequence: [(80802.0,), (99019.0,)]
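The returned sequence is meant to be walked in order: submit the job with the first value as its walltime request and, if it runs out of time, resubmit with the next value, and so on. A minimal simulation of that loop is sketched below; the sequence values are taken from the output above and actual_runtime is a made-up execution time used only for the simulation:

sequence = [(80802.0,), (99019.0,)]      # the sequence printed above
actual_runtime = 90000.0                 # hypothetical real execution time of the next run

for request in sequence:
    walltime = request[0]                # each entry is a tuple; the first field is the requested time
    print("Submitting with walltime", walltime)
    if actual_runtime <= walltime:       # the job fits inside this reservation
        break                            # done; otherwise resubmit with the next request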

To create your own scripts, use the following steps:

1. Prepare the data

Create a list with historical resource usage information (this list can contain values in any time unit you want the requests to be in; note that iSBatch does not work with fractions of a time unit).

For our example, history will be a list of walltimes for past runs. Create a ResourceEstimator object with the historical data (and optionally the interpolation method to be used or the checkpointing strategy).
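As a concrete (made-up) example, such a history could be a plain Python list of walltimes, here in seconds:

history = [78000, 80200, 79500, 83000, 81100, 82500, 98000, 80750, 79900, 99000]  # hypothetical walltimes in seconds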

# Default parameters
wf = iSBatch.ResourceEstimator(history)

# Custom parameters: choose the interpolation model and the checkpointing strategy
params = iSBatch.ResourceParameters()
params.interpolation_model = iSBatch.DistInterpolation
params.CR_strategy = iSBatch.CRStrategy.AdaptiveCheckpoint
wf = iSBatch.ResourceEstimator(history, params=params)

If you wish to print the CDF of this data, the discrete data (i.e. unique walltimes) and the associated CDF value for each entry can be extracted using the _get_cdf() function:

optimal_data, optimal_cdf = wf._get_cdf()

Figure: example discrete CDF and data (without using interpolation); the vertical blue lines represent the recommended request times.
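If you want to plot the returned values yourself, a minimal sketch using matplotlib (not an iSBatch dependency; this is just one way to do it) is:

import matplotlib.pyplot as plt

plt.step(optimal_data, optimal_cdf, where='post')  # discrete CDF of the historical walltimes
plt.xlabel('Walltime')
plt.ylabel('CDF')
plt.show()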

2. Compute the sequence of requests

The compute_request_sequence function returns the recommended sequence of requests given the historical data. Optionally, the function takes the cost model for the cluster (if none is provided, the default HPC model is chosen). For more information about cost models, please inspect the documentation here.

sequence = wf.compute_request_sequence()

For large historical datasets, computing the distribution using the discrete data will give good results. Otherwise, interpolation is needed.

Figure: example discrete vs. interpolated CDF and the corresponding request sequences.

A larger discussion about interpreting the CDF figures can be found in the documentation here
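To reproduce a comparison like the one in the figure above, the two constructors from step 1 can be used side by side. The sketch below assumes that leaving interpolation_model unset makes the estimator use the discrete data directly:

# Sequence computed from the discrete (empirical) data
seq_discrete = iSBatch.ResourceEstimator(history).compute_request_sequence()

# Sequence computed after fitting an interpolated distribution to the same history
params = iSBatch.ResourceParameters()
params.interpolation_model = iSBatch.DistInterpolation
seq_interpolated = iSBatch.ResourceEstimator(history, params=params).compute_request_sequence()

print(seq_discrete)
print(seq_interpolated)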

3. [Optional] Compute the cost of a sequence of requests for new data

The cost of a given sequence on new data can be computed using the compute_sequence_cost function. The cost represents the average response time of each submission, which includes all the failed reservations together with the successful one. For example, for two submissions, one of 10 and another of 15 hours, the cost of the sequence [8, 11, 16] is the average between 8 + 10 (the first submission fails when requesting 8 hours and succeeds the second time) and 8 + 11 + 15 (the second submission fails twice).

cost = wf.compute_sequence_cost(sequence, new_data)
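To make the arithmetic above concrete, here is a small standalone sketch (not the library implementation) of the average response time without checkpointing, where each failed reservation is paid in full and the successful one contributes the actual execution time:

def average_response_time(sequence, runs):
    total = 0.0
    for t in runs:
        for request in sequence:
            if request >= t:
                total += t        # the job finishes inside this reservation
                break
            total += request      # failed reservation: the whole request is wasted
    return total / len(runs)

print(average_response_time([8, 11, 16], [10, 15]))  # (8 + 10 + 8 + 11 + 15) / 2 = 26.0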

Papers

If you use the resources available here in your work, please cite one of our papers:

For details about how to compute the optimal sequence of requests, please consult our paper:
Reservation and Checkpointing Strategies for Stochastic Jobs
Ana Gainaru, Brice Goglin, Valentin Honoré, Guillaume Pallez, Padma Raghavan, Yves Robert, Hongyang Sun. [IPDPS 2020] (Paper: INRIA technical report)

For details about why interpolation is needed when the historical information is limited, read our paper:
Making Speculative Scheduling Robust to Incomplete Data
Ana Gainaru, Guillaume Pallez. [SCALA@SC 2019] (Paper: INRIA technical report)

For details on how to adapt the sequence of requests when backfilling is being used:
Speculative Scheduling Techniques for Stochastic HPC Applications
Ana Gainaru, Guillaume Pallez, Hongyang Sun, Padma Raghavan. [ICPP 2019] (Paper: INRIA technical report)
