Bring your own tool

Laia Codó edited this page Jun 11, 2021 · 11 revisions

Virtual Research Environments integrate tools and pipelines to support a research community. We offer the possibility of integrating your application into the analytical platform. Please read through this documentation and contact us with any questions or suggestions you may have.

Why?

The open Virtual Research Environment offers a number of benefits to developers willing to integrate their tools into the platform:

  • An open-access, publicly available platform.
  • A full web-based workbench with user support utilities, an OIDC authentication service, and data handlers for local and remote datasets or repositories.
  • Visibility for your tool and organization, with ownership recognition, tailored web forms, help pages and customized viewers.
  • The possibility of adding extra value to your tool by complementing it with other related tools already in the platform.
  • Complete control of your tool through the administration panel, for monitoring, logging and management.

Requirements

The application or pipeline to be integrated should:

  • Be free and open-source software
  • Run in non-interactive mode on a Linux-based operating system:
    • Ubuntu 18.04
    • Ubuntu 20.04
    • CentOS 8 Stream
    • Contact us about other operating systems

How does it work?

Integrating your application as a new VRE tool involves a few steps. Once they are completed, the VRE is able to control the whole tool execution cycle:

  1. Prepare the run web form where the user will specify the arguments and input files for each run.
  2. Validate the input files and argument requirements
  3. Stage in input files to the run working directory (if required)
  4. Execute the tool in the cloud in a scalable manner
  5. Monitor and interactively report tool progress during the execution
  6. Stage out output files from the run working directory (if required)
  7. Register the resulting output files in the VRE so that they can be displayed there

How to?

Tool developers should follow the basic steps listed below, which provide the VRE with sufficient information to accomplish the whole tool execution cycle. They are explained in detail at Bring your tool → Integration of tools.

Step 1 - Define your tool's input and output files. The information is used to build job execution files for testing the RUNNER.

Step 2 - Prepare a new VRE RUNNER wrapping your application.

Step 3 - Define the metadata of your tool in a tool-specification file.

Step 4 - Submit the RUNNER code and the tool-specification file to VRE administrators, who will install and register the new tool.

Step 5 - Test and debug the new tool from the VRE user interface.

Step 6 - (optional) Prepare a web page to display a summary report on each execution.

Step 7 - Provide documentation for the new tool.

Step 1 -- Define input files, arguments and output files of the new tool. Use them to build job execution files for testing the RUNNER

VRE job execution files are 2 JSON files. In production, the VRE server generates them for each execution that a user starts from the web interface. Your tool's RUNNER consumes this data.

  • Run configuration file (i.e. config.json): contains the list of input files selected by the user for a particular run, the values of the arguments, and the list of expected output files.

  • Input files metadata (i.e. in_metadata.json): contains the metadata of the input files listed in config.json, including their absolute file paths.
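As an illustration of how the two files relate to each other, the following minimal sketch writes a pair of test files from Python. The key names (`input_files`, `arguments`, `output_files`, `_id`, `file_path`) and all values are assumptions made for this example only; the authoritative structure is the one defined by the VRE JSON schemas.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple


def write_job_files(workdir: Path, input_path: Path) -> Tuple[Path, Path]:
    """Write a hypothetical config.json / in_metadata.json pair for a test run."""
    file_id = "input_001"  # hypothetical identifier linking both files

    config = {
        # Input files selected by the user for this run
        "input_files": [{"name": "my_input", "value": file_id}],
        # Argument values, including the run working directory
        "arguments": [{"name": "execution", "value": str(workdir)}],
        # Output files the tool is expected to produce
        "output_files": [{"name": "my_output", "file": {"path": "result.txt"}}],
    }
    in_metadata = [
        # Metadata of each input listed in config.json, with its absolute path
        {"_id": file_id, "file_path": str(input_path.resolve())}
    ]

    config_path = workdir / "config.json"
    metadata_path = workdir / "in_metadata.json"
    config_path.write_text(json.dumps(config, indent=2))
    metadata_path.write_text(json.dumps(in_metadata, indent=2))
    return config_path, metadata_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        sample = workdir / "sample.txt"
        sample.write_text("example input\n")
        for path in write_job_files(workdir, sample):
            print(path.name, "->", path.read_text())
```

In this sketch the input file is referenced by identifier in config.json, and its absolute path is resolved through in_metadata.json.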

Additionally, it is handy to have a shell script (i.e. test.sh) containing the RUNNER command line, to which the 2 previous files are passed as arguments.
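A Python counterpart of such a launcher could look like the sketch below. The option names `--config`, `--in_metadata`, `--out_metadata` and `--log_file`, as well as every path, are assumptions for illustration; check the RUNNER template for the options it actually accepts.

```python
import subprocess
from pathlib import Path
from typing import List


def build_runner_command(runner: Path, test_dir: Path, workdir: Path) -> List[str]:
    """Assemble the RUNNER command line for a local test execution."""
    return [
        # Absolute path, so the executable is not looked up in PATH
        str(runner.resolve()),
        "--config", str(test_dir / "config.json"),
        "--in_metadata", str(test_dir / "in_metadata.json"),
        "--out_metadata", str(workdir / "out_metadata.json"),
        "--log_file", str(workdir / "VRE_RUNNER.log"),
    ]


if __name__ == "__main__":
    # Hypothetical locations; adapt them to your local checkout
    command = build_runner_command(
        Path("VRE_RUNNER"), Path("tests/basic"), Path("/tmp/vre_test_run")
    )
    print(" ".join(command))
    # Uncomment to launch the RUNNER once the paths exist:
    # subprocess.run(command, check=True)
```

Building the command as a list avoids shell quoting problems when paths contain spaces.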

Defining the input files and arguments that your tool will consume is essential for building this test data. There are 2 ways of creating it:

1.a - Manual approach:

Manually generate the 2 files following the corresponding JSON schemas, taking the examples below as reference.

  • Examples:

      Test data of the RUNNER template : https://github.com/inab/vre_template_tool/tree/master/tests/basic
    
      Test data of the dpfrep RUNNER (example of a R-based tool):  https://github.com/inab/vre_dpfrep_executor/tree/master/tests/basic
    
  • Schemas:

      euCanSHare tool schemas: https://github.com/euCanSHare/vre/tree/master/tool_schemas/tool_specification
    

1.b - VRE web interface approach:

Use the web forms that (1) allow you to edit and validate a JSON document describing the input files and arguments, and (2) ask for data about your local development environment. The result is a downloadable config.json and in_metadata.json, together with the corresponding shell script, ready to be executed locally (in step 2).

  • Where? https://vre.eucanshare.bsc.es/ -> Admin -> My Tools -> Development -> (+) Add new tool

  • Requirements: VRE user account with "tool developer" rights

Step 2 -- Prepare a new VRE RUNNER wrapping your application

VRE RUNNERs are pieces of code that work as adaptors between the VRE server and each of the integrated applications or pipelines. The RUNNER (1) consumes the VRE job execution files generated when a user submits a new job at the web interface, (2) runs the wrapped application, and (3) generates the list of output files, which the VRE server will eventually register and display in the user's workspace.

To prepare the RUNNER, the easiest option is to take the RUNNER template repository as a reference and adapt some of its methods. The template includes a couple of Python classes that parse the VRE job execution files into Python dictionaries. These are passed to the run method of the VRE_Tool class, which you can customize to call the application, module or pipeline to be integrated.

2.a - Fork or clone the repository of the RUNNER template in your local development environment.

How to: https://github.com/inab/vre_template_tool

2.b - (optional) Run the hello_word example. The RUNNER template is initially configured to "wrap" an application called hello.py. It demonstrates the overall flow of a VRE RUNNER.

How to: https://github.com/inab/vre_template_tool#run-the-wrapper

2.c - Pass the job execution files generated in step 1 as parameters of the VRE_RUNNER. These should contain the input files and arguments for a test execution of your tool. You can copy them into the test/ folder of the repository to replace the basic hello_word example.

Make sure that the absolute paths of the working directory and the input files defined in these JSON files are accessible.
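A quick pre-flight check can catch inaccessible paths before the RUNNER is launched. The sketch below assumes, for illustration only, that the working directory is stored in an argument named `execution` and that each metadata entry carries a `file_path` key; adapt the keys to your own job execution files.

```python
import json
import os
from pathlib import Path
from typing import List


def find_inaccessible_paths(config_file: Path, in_metadata_file: Path) -> List[str]:
    """Return the paths referenced by the job execution files that cannot be used."""
    config = json.loads(config_file.read_text())
    in_metadata = json.loads(in_metadata_file.read_text())
    problems = []

    # Working directory: must exist and be writable (assumed key: "execution")
    for argument in config.get("arguments", []):
        if argument.get("name") == "execution":
            workdir = argument.get("value", "")
            if not (os.path.isdir(workdir) and os.access(workdir, os.W_OK)):
                problems.append(workdir)

    # Input files: must be absolute and readable (assumed key: "file_path")
    for entry in in_metadata:
        file_path = entry.get("file_path", "")
        if not (os.path.isabs(file_path) and os.access(file_path, os.R_OK)):
            problems.append(file_path)

    return problems
```

Calling `find_inaccessible_paths` with the two test files returns an empty list when every path is usable, and the offending paths otherwise.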

2.d - Implement the run method of VRE_Tool so that it executes the application, module or pipeline to be integrated. The input file locations and argument values defined in the job execution files are passed to the run method as parameters.
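The general shape of such a run method can be sketched as follows. This is a standalone stand-in, not the template's actual class: the class name `MyTool`, the method signature, the dictionary keys and the wrapped one-line "application" are all hypothetical, so refer to the RUNNER template for the real interface.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict


class MyTool:
    """Stand-in for the template's VRE_Tool class (the real signature may differ)."""

    def __init__(self, arguments: Dict[str, str]):
        self.arguments = arguments

    def run(self, input_files: Dict[str, str], output_files: Dict[str, str]) -> Dict[str, str]:
        """Execute the wrapped application and return the output files it produced."""
        workdir = Path(self.arguments["execution"])
        output_path = workdir / output_files["my_output"]

        # Hypothetical wrapped application: a one-line Python script that
        # upper-cases the input file. Replace it with your own command line.
        command = [
            sys.executable, "-c",
            "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
            input_files["my_input"], str(output_path),
        ]
        # check=True raises an error if the application exits with a failure code
        subprocess.run(command, check=True, cwd=workdir)

        if not output_path.is_file():
            raise RuntimeError(f"Expected output not found: {output_path}")
        return {"my_output": str(output_path)}
```

Failing loudly when the application returns a non-zero exit code, or when the expected output is missing, makes problems visible during local testing rather than after installation.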

2.e - The RUNNER is ready when the wrapped application executes properly and the output files are generated in the working directory, at the location specified in output_files[].file.path, defined either in config.json or in out_metadata.json.
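One way to confirm this after a test run is to compare the expected locations against the contents of the working directory. The following is a minimal sketch that assumes the JSON file has a top-level `output_files` list whose entries follow the `file.path` layout mentioned above; the function name is hypothetical.

```python
import json
from pathlib import Path
from typing import List


def missing_outputs(metadata_file: Path, workdir: Path) -> List[str]:
    """List the expected output files that are absent from the working directory."""
    metadata = json.loads(metadata_file.read_text())
    missing = []
    for output in metadata.get("output_files", []):
        # Location taken from output_files[].file.path; entries without it are skipped
        relative_path = output.get("file", {}).get("path")
        if relative_path and not (workdir / relative_path).is_file():
            missing.append(relative_path)
    return missing
```

An empty result means that every declared output was found where the VRE server would look for it.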

Step 3 -- Define the metadata of your tool into a tool-specification file

Notes:

VRE_RUNNER should have executable permissions

Make sure that all output files are generated at the root of the working directory ( defined at the argument with key execution. Read from `test/config.jso