pre-proc

Copyright (c) 2021, PRIMAVERA. Please see LICENSE for further details.

The PRIMAVERA pre_proc system fixes metadata and data issues in netCDF files (particularly CMIP6 data). Pre_proc works as follows:

  1. fixes (Python classes) are written to correct small individual errors in a file's metadata or data. Abstract base classes have been developed that allow new fixes to be written rapidly, often in only a line or two of new code (see the sketch after this list). Fixes can use external tools such as ncatted, or they can be written in pure Python. Unit tests have been written for all of the existing fixes to provide assurance of their quality.
  2. a list of datasets (data requests) that exist is added to pre_proc's database
  3. rules are written that determine which fixes are added to groups of datasets
  4. the run_pre_proc.sh script is run to fix the data. It takes a single argument, the path to a directory of netCDF files, identifies which fixes (if any) need to be applied to each file, and applies them in alphabetical order of class name.
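
As a rough illustration of the pattern in step 1, a concrete fix can often be written in a line or two by subclassing a shared base class. The class names and values below are hypothetical and are not the repository's actual class hierarchy; they simply sketch the idea of a base class that drives ncatted and a small concrete subclass.

```python
# Hypothetical sketch only: the real abstract classes live in the pre_proc
# package and differ in detail.
import subprocess


class AttributeEditFix:
    """Hypothetical base class that overwrites one global attribute via ncatted."""

    attribute_name = None   # set by subclasses
    new_value = None        # set by subclasses

    def apply(self, filepath):
        # ncatted -a <attr>,global,o,c,<value> overwrites the global attribute
        # in place; -h stops the command being appended to the file's history.
        subprocess.run(
            ['ncatted', '-h', '-a',
             f'{self.attribute_name},global,o,c,{self.new_value}', filepath],
            check=True,
        )


class ParentSourceIdFix(AttributeEditFix):
    """A concrete fix often needs only a line or two of new code."""
    attribute_name = 'parent_source_id'
    new_value = 'HadGEM3-GC31-LL'
```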

To set up the system:

  1. mkdir db
  2. touch db/pre-proc_db.sqlite3
  3. export DJANGO_SETTINGS_MODULE=pre_proc_site.settings
  4. export PYTHONPATH=.
  5. export DATABASE_DIR=/home/users/jseddon/primavera/pre-proc/db
  6. python manage.py migrate
  7. (If using the DMT) in the DMT code: ./scripts/make_esgf_json.py -l debug <filename.json> to get the data requests that files have been received for.
  8. ./bin/make_db_from_json.py -l debug <filename.json> to add the data requests. example_data_requests.json shows an example of this file.
  9. ./bin/add_file_fixes_to_db.py -l debug to add the file fixes to the database.
  10. Run all of the fix_request scripts, e.g. ./bin/fix_requests/fix_request_0001.py -l debug to set up the file fixes required for each data request.

If additional data requests are added then all of the fix_request scripts will need to be run again.

It should now be possible to run the main processing script:

./bin/run_pre_proc.sh <data_dir>
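
For orientation, the ordering rule described above (fixes applied to each file in alphabetical order of fix class name) amounts to something like the following sketch. This is illustrative only and is not the actual implementation of run_pre_proc.sh or the pre_proc package.

```python
# Illustrative only: a sketch of the ordering rule, not run_pre_proc.sh itself.
# In the real system, the set of fixes for each file is looked up from the
# pre_proc database rather than passed in as a list.
from pathlib import Path


def apply_fixes(data_dir, fixes):
    """Apply each fix to every netCDF file, alphabetically by fix class name."""
    for nc_file in sorted(Path(data_dir).glob('*.nc')):
        for fix in sorted(fixes, key=lambda f: type(f).__name__):
            fix.apply(str(nc_file))
```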

A Rose suite has been developed to provide optional control and monitoring of pre_proc; the suite's id is u-av973.

To add new data requests to the Rose suite:

  1. In the DMT code: ./scripts/make_esgf_json.py -l debug <filename.json> to get the data requests that files have been received for.
  2. ./bin/make_db_from_json.py -l debug <filename.json> to add any new data requests to the pre-proc database.
  3. In the DMT code, edit and then run: ./scripts/make_rose_task_names.py -l debug <~/temp/rose_task_names.json>
  4. Make the Rose suite aware of these new tasks: rose suite-run --reload --no-gcontrol
  5. In the Rose suite (but under Python 3): bin/add_new_tasks.py -l debug <rose_task_names_new.json>

pre-proc uses the Django framework for database access. The database is an SQLite database that maps fixes to data requests. To run the tests: python manage.py test.
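
As a hypothetical sketch of the kind of schema this implies (the actual models in the pre_proc package may use different names and fields), the mapping between fixes and data requests can be pictured as a many-to-many relationship between two Django models:

```python
# Hypothetical models for illustration only; see pre_proc's own models for
# the real schema.
from django.db import models


class FileFix(models.Model):
    """One registered fix, identified by its Python class name."""
    name = models.CharField(max_length=200, unique=True)


class DataRequest(models.Model):
    """One dataset, e.g. a single CMIP6 variable from one model and experiment."""
    source_id = models.CharField(max_length=200)      # climate model name
    experiment_id = models.CharField(max_length=200)
    table_id = models.CharField(max_length=200)
    cmor_name = models.CharField(max_length=200)
    # The fix_request scripts attach fixes to matching data requests,
    # e.g. data_request.fixes.add(fix)
    fixes = models.ManyToManyField(FileFix, blank=True)
```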
