
[EPIC] create deploy/local_batch_job.py #806

Open
JeroenVerstraelen opened this issue Jun 17, 2024 · 1 comment
JeroenVerstraelen commented Jun 17, 2024

  • Investigate batch_job.py: can it run locally?
  • Remove local_batch_job.py and use batch_job.py instead.

local_batch_job.py is a script that takes a process graph as input and writes its output to local disk. It can be started as a local Spark driver via spark-submit, and it runs inside a Docker container that provides GDAL support, among other dependencies.
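As a sketch of what such an input could look like, here is a minimal "1 + 2" openEO process graph written to local disk. Whether local_batch_job.py expects the bare node mapping or the `{"process_graph": ...}` envelope used below is an assumption to verify against the script itself:

```python
import json
from pathlib import Path

# Minimal "1 + 2" process graph: a single "add" node marked as the result.
# The {"process_graph": ...} envelope is assumed, not confirmed for this script.
process_graph = {
    "process_graph": {
        "add12": {
            "process_id": "add",
            "arguments": {"x": 1, "y": 2},
            "result": True,
        }
    }
}

path = Path("process_graph.json")
path.write_text(json.dumps(process_graph, indent=2))
```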


JeroenVerstraelen commented Aug 20, 2024

  • Test local_batch_job.py with a simple 1+2 process graph and verify that the output is written correctly to the local job directory.

  • Ensure all integrations are in-memory or disabled (i.e. no errors).

  • Test local_batch_job.py with load_stac and verify that the output is written correctly.

  • Test a complex process graph on a local run.

    • This has to work inside the Docker container for local deployments.
    • Add a test that runs a process graph using local_batch_job.py and checks that the output is valid.
  • Can it start via spark-submit (batch_job)?

    • as a local driver with local executors (threads) saving files to local disk
    • as a local driver with remote executors (cluster) saving files to local disk
  • Investigate logging for executors/driver on a local run; it should be clear to the user.
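The two spark-submit launch modes listed above could be sketched as follows. The script arguments, the job-directory parameter, and the cluster master URL are placeholders for illustration, not confirmed options of local_batch_job.py:

```python
# Build (but do not execute) the spark-submit command lines for the two modes:
# a local driver with local thread executors, and a local driver submitting to
# a remote cluster. All paths and URLs below are hypothetical.

def spark_submit_cmd(master, process_graph, job_dir):
    """Return a spark-submit argv as a list; nothing is launched here."""
    return [
        "spark-submit",
        "--master", master,
        "local_batch_job.py",
        process_graph,
        job_dir,
    ]

# Local driver with local executors (threads):
local_cmd = spark_submit_cmd("local[*]", "process_graph.json", "/tmp/job")

# Local driver with remote executors (the cluster master URL is made up):
cluster_cmd = spark_submit_cmd(
    "spark://cluster-host:7077", "process_graph.json", "/tmp/job"
)
```

Running the local variant would be a matter of passing `local_cmd` to `subprocess.run`, provided spark-submit is on the PATH inside the container.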

OPTIONAL

  • Add a helper script in geopyspark that allows the user to submit a python-client datacube as a batch job.

    • Users will want to use the openeo-python-client API to construct a process graph.
    • local_batch_job.py is a script that takes a process graph plus some extra parameters and writes its output to disk.
    • We should make it easy to write a process graph to a file so it can be passed in as a parameter.
  • Check UDPs loaded from a URL (locally).

  • Does random_forest work when running locally?
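A helper for the "write a process graph to a file" item might look like the sketch below. It accepts either a plain dict or an object exposing `to_json()`, which the openeo-python-client DataCube is assumed to provide; the function name and signature are hypothetical, not an existing helper in this repo:

```python
import json
from pathlib import Path

def write_process_graph(cube, path):
    """Write a process graph to disk so it can be passed to local_batch_job.py.

    `cube` may be a plain dict, or any object with a to_json() method
    (assumed to match the openeo-python-client DataCube API).
    """
    if hasattr(cube, "to_json"):
        text = cube.to_json()
    else:
        text = json.dumps(cube, indent=2)
    Path(path).write_text(text)

# Usage with a plain dict standing in for a python-client datacube:
write_process_graph(
    {"add": {"process_id": "add", "arguments": {"x": 1, "y": 2}, "result": True}},
    "pg.json",
)
```

Keeping the helper agnostic to dict vs. DataCube means the same entry point works for hand-written graphs and client-built ones.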
