Clone this repository.
git clone git@github.com:OpenGeoscience/geoingest.git
cd geoingest
Copy the config file.
cp aws-credentials.mk.example aws-credentials.mk
Edit the values in aws-credentials.mk. That file is git-ignored, so your credentials will not be committed or pushed.
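The exact variables come from aws-credentials.mk.example; the sketch below is only illustrative, and the bucket and key-pair names are hypothetical placeholders, not the repository's actual settings.

# Illustrative sketch only -- copy the real variable names from aws-credentials.mk.example.
export AWS_ACCESS_KEY_ID := AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY := wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_DEFAULT_REGION := us-east-1
# Hypothetical extras: a bucket for the assembly/JSON specs and an EC2 key pair for the EMR cluster.
S3_BUCKET := my-bucket
EC2_KEY_NAME := my-keypair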
If the assembly is already available on S3, you can skip this build step. Otherwise, fetch GeoTrellis as a submodule and build the spark-etl assembly.
git submodule init
git submodule update
cd geotrellis
./sbt "project spark-etl" assembly
Once the build succeeds, the assembly will be at:
geoingest/geotrellis/spark-etl/target/scala-2.11/geotrellis-spark-etl-assembly-1.1.0-SNAPSHOT.jar
If the assembly is not yet on S3, upload it with:
aws s3 cp geoingest/geotrellis/spark-etl/target/scala-2.11/geotrellis-spark-etl-assembly-1.1.0-SNAPSHOT.jar s3://my-bucket/
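To double-check that the jar landed in the bucket (using the same placeholder bucket name as above):

aws s3 ls s3://my-bucket/ | grep geotrellis-spark-etl-assembly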
A GeoTrellis ingest requires three JSON files that describe the ingest job, and this repository ships a small utility for generating them. Set it up with:
mkvirtualenv geoingest
pip install -r requirements.txt
pip install -e .
ingest s3://locationOfS3BucketWithTiffs layerName s3://locationOfCatalog
This creates the JSON specifications for the ingest job in the current directory.
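The conventional GeoTrellis ETL names for these specs are input.json, output.json, and backend-profiles.json; assuming the utility follows that convention (the actual file names may differ), a quick sanity check looks like:

ls *.json
# backend-profiles.json  input.json  output.json   (names assumed from GeoTrellis ETL conventions)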
Next, push those JSON files to S3 so that GeoTrellis can read them.
make copy-json-specs
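The real target lives in the repository's Makefile; a minimal sketch of what such a target typically does, assuming the hypothetical S3_BUCKET variable from earlier, looks like this:

# Hypothetical sketch of the Makefile target; see the repository's Makefile for the real definition.
copy-json-specs:
	aws s3 cp input.json s3://$(S3_BUCKET)/json/input.json
	aws s3 cp output.json s3://$(S3_BUCKET)/json/output.json
	aws s3 cp backend-profiles.json s3://$(S3_BUCKET)/json/backend-profiles.json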
At this point, launching the cluster is straightforward.
make create-cluster
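A target like this typically wraps aws emr create-cluster. The sketch below is hypothetical: the release label, instance type, instance count, and key pair are placeholders, and the repository's actual settings come from the Makefile and aws-credentials.mk.

# Hypothetical sketch only -- not the repository's actual cluster configuration.
aws emr create-cluster \
    --name geoingest \
    --release-label emr-5.0.0 \
    --applications Name=Spark \
    --use-default-roles \
    --ec2-attributes KeyName=my-keypair \
    --instance-type m3.xlarge \
    --instance-count 3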
Creating the cluster takes roughly 10 minutes. Once it is up, we can ingest our layers.
To submit the ingest job, run:
make submit-remote-ingest
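Under the hood, a target like this usually adds a Spark step to the running EMR cluster, pointing spark-submit at the assembly and the JSON specs on S3. The sketch below is hypothetical: the cluster id, bucket paths, and the choice of SinglebandIngest vs. MultibandIngest entry point are placeholders; see the Makefile for the actual step definition.

# Hypothetical sketch only -- the Makefile target defines the real step.
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps \
    Type=Spark,Name=Ingest,ActionOnFailure=CONTINUE,Args=[--class,geotrellis.spark.etl.SinglebandIngest,s3://my-bucket/geotrellis-spark-etl-assembly-1.1.0-SNAPSHOT.jar,--input,s3://my-bucket/json/input.json,--output,s3://my-bucket/json/output.json,--backend-profiles,s3://my-bucket/json/backend-profiles.json]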