Mads Hansen edited this page Jan 22, 2023 · 13 revisions

Corb is a tool for transforming large numbers of documents in your MarkLogic database. Because Corb depends on a couple of jars and some configuration parameters that ml-gradle most likely already knows about, CorbTask was created to make Corb easy to run. Below is a basic approach for setting up your Gradle file to use CorbTask.

Add a "corb" configuration and a couple dependencies for it:

configurations { corb }
dependencies {
  corb "com.marklogic:marklogic-corb:2.4.0"
  // optional
  //corb 'org.jasypt:jasypt:1.9.2' // would be necessary to leverage JasyptDecrypter
}

You may not need a separate configuration - for example, if you're using the Gradle java plugin, and you already have XCC on your compile/runtime classpath, you could skip the "corb" configuration and just add a runtime 'com.marklogic:marklogic-corb:2.4.0' dependency.

Next, we need a "uris" module and a "transform" module for Corb to invoke. In my experience, it's common to name these files with the same prefix + "-uris.xqy" and "-transform.xqy". For this reason, CorbTask has a "modulePrefix" property that you can use. If you set this, CorbTask will assume you're following this convention. If you're not following that convention, you can instead set the "urisModule" and "transformModule" properties.

We also need to tell Corb where these modules are. One convention is to use "/ext/(app name)/corb/" as a path, but you're free to put them anywhere in your modules database that you'd like.

So with this in mind, let's look at a basic configuration of CorbTask:

task runCorb(type: com.marklogic.gradle.task.CorbTask) {
  moduleRoot = "/ext/sample/corb/"
  modulePrefix = "test"
  threadCount = 16
}

CorbTask will determine the XCC URL based on the host/XDBC port/username/password properties set in the mlAppConfig property that ml-gradle maintains. CorbTask also sets the modules database for you based on the name of the modules database as defined in mlAppConfig. It then tells Corb to run /ext/sample/corb/test-uris.xqy as the URIs module and /ext/sample/corb/test-transform.xqy as the transform module. CorbTask also sets a default of 8 for the thread count; in the example above, we override it to 16.
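If your modules don't share a common prefix, the same task might instead be written with the explicit module properties. This is only a sketch: "urisModule" and "transformModule" are the property names mentioned earlier on this page, while the task name and both file names are hypothetical. It also assumes the module names are resolved against moduleRoot in the same way the modulePrefix-derived names are.

```groovy
// Sketch only: the task name and module file names are hypothetical.
task runCustomCorb(type: com.marklogic.gradle.task.CorbTask) {
  moduleRoot = "/ext/sample/corb/"
  // Explicit module names, used in place of modulePrefix
  // (assumed to be resolved relative to moduleRoot)
  urisModule = "select-docs.xqy"
  transformModule = "update-docs.xqy"
}
```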

In a typical development cycle, you'll probably make tweaks to your URIs/transform modules. In that case, you'll want to load your modules before running CorbTask. One way to do that is via mlLoadModules:

task runCorb(type: com.marklogic.gradle.task.CorbTask, dependsOn: ["mlLoadModules"]) {

Additionally, if your modules do not have any dependencies on other modules, you can invoke them in "ADHOC" mode to have Corb execute them from the filesystem without having to load them in the modules database.
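As a rough sketch of what that could look like: Corb treats a module path with "|ADHOC" appended as a file to read locally rather than from the modules database. The task name and file paths below are hypothetical, and the property names are the ones described earlier on this page, so check them against the ml-gradle version you are using.

```groovy
// Sketch only: the task name and file paths are hypothetical.
// No dependsOn: ["mlLoadModules"] here, since the modules are
// read from the filesystem instead of the modules database.
task runAdhocCorb(type: com.marklogic.gradle.task.CorbTask) {
  urisModule = "src/main/corb/test-uris.xqy|ADHOC"
  transformModule = "src/main/corb/test-transform.xqy|ADHOC"
  threadCount = 16
}
```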

If you find yourself writing a lot of Corb tasks in your Gradle file, you may end up repeating the same classpath, moduleRoot, and threadCount. In that case, you may want to make a helper class as shown below:

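class MyCorbTask extends com.marklogic.gradle.task.CorbTask {
  @TaskAction
  @Override
  public void exec() {
    // Apply the shared defaults, then delegate to CorbTask.
    // Calling exec() here instead of super.exec() would recurse forever.
    moduleRoot = "/ext/sample/corb/"
    threadCount = 16
    super.exec()
  }
}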

And then your task definition becomes even simpler:

task runCorb(type: MyCorbTask, dependsOn: ["mlLoadModules"]) {
  modulePrefix = "test"
}

Finally, you may want a Corb task that lets you specify the modules to run on the command line. This is useful for ad hoc Corb jobs that you only need to run once. In my experience, it's more common to have Corb jobs that run periodically, for example as part of setting up an application, and thus you want them to be first-class tasks in your Gradle file. But you can easily use CorbTask to add such an ad hoc capability to your project:

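ext {
  // Fall back to the placeholder only when -PcorbPrefix is not supplied;
  // an unconditional assignment here would overwrite the command-line value
  corbPrefix = project.findProperty("corbPrefix") ?: "CHANGEME"
}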

task runCorb(type: MyCorbTask, dependsOn: ["mlLoadModules"]) {
  modulePrefix = corbPrefix
}

And you can then run that task as:

gradle runCorb -PcorbPrefix=test

The CorbTask provides several ways to set/override Corb options.

  1. Task member variables: All of the Corb options have corresponding task member variables, with a lowerCamelCase naming convention. For instance, the Corb option OPTIONS-FILE can be set on the CorbTask as optionsFile="/var/tmp/myOptions.properties".

  2. Project properties: All of the Corb options can be set as project properties, named with a "corb" prefix followed by the option name in UpperCamelCase. For instance, the Corb option OPTIONS-FILE can be set with a project property corbOptionsFile or on the command line with -PcorbOptionsFile=/var/tmp/myOptions.properties.

  3. Java System properties: All of the Corb options can be set as Java System properties, using the Corb option name unchanged. Those properties are set when Corb is executed, for example -DOPTIONS-FILE=/var/tmp/myOptions.properties.
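To make the three naming conventions easier to compare, here is the same OPTIONS-FILE setting expressed each way. Treat it as a sketch: the file path is the one used in the list above, the command lines assume the flags are passed to Gradle when running the task, and this page does not say which source wins if more than one is set.

```groovy
// 1. Task member variable (lowerCamelCase)
task runCorb(type: com.marklogic.gradle.task.CorbTask) {
  optionsFile = "/var/tmp/myOptions.properties"
}

// 2. Project property ("corb" prefix + UpperCamelCase), e.g. in gradle.properties:
//      corbOptionsFile=/var/tmp/myOptions.properties
//    or on the command line:
//      gradle runCorb -PcorbOptionsFile=/var/tmp/myOptions.properties

// 3. Java System property (Corb option name as-is), assuming it is
//    supplied on the Gradle command line:
//      gradle runCorb -DOPTIONS-FILE=/var/tmp/myOptions.properties
```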
