Craig Jones edited this page Feb 21, 2015 · 1 revision

CF Conventions

The NetCDF CF conventions identify 8 different feature types, each one a way of describing data. IMOS NetCDF data can be mapped to these feature types, as previously identified in NetCDF Subset Service - IMOS Data Types.

NODC Templates

The NOAA National Oceanographic Data Center (NODC) has produced sample templates following these conventions for each type of dataset (refer to NODC Feature Types and Templates). These templates provide a good example of the format required for output NetCDF files, in particular the CF metadata needed to describe the type of data contained. Not all of them are applicable to IMOS data. The following table lists the NODC CDL template types that are relevant for generating multiple datasets of IMOS data in a single NetCDF file:

| CDL Type | Examples | Notes |
| --- | --- | --- |
| profile incomplete | argo profiles | orthogonal template for one profile only |
| time series incomplete | temperature logger | orthogonal template for one time series only |
| trajectory incomplete | soop trv, argo trajectory | |
| time series profile incomplete | ADCP | |
| trajectory profile incomplete | gliders | |

IMOS NetCDF Conventions

IMOS NetCDF files are required to follow the IMOS NetCDF conventions, which are based on the NetCDF CF 1.6 conventions with additional IMOS-specific metadata.

The NetCDF conventions used to identify each feature type are optional and have not always been used.

Following these conventions enables NetCDF clients/servers to provide additional functionality, such as subsetting, so adopting them is highly recommended.

IMOS templates for each feature type still need to be determined; they will probably be based on the NODC templates, tailored to IMOS-specific needs.

The IMOS toolbox v2.4 already provides a first attempt at this for profile and time series data. It is based on the NODC orthogonal templates, since it only has to generate one NetCDF file per dataset.

File Generation Requirements

Looking at the NODC templates, IMOS NetCDF Conventions and existing IMOS data, it is expected that the following information will be required to generate rich NetCDF files for each of the different NetCDF feature types.

Global attributes

Each NetCDF file generated requires metadata for the data it contains. This metadata is provided in the form of NetCDF global attributes, and the required and recommended attributes are (or will be) prescribed by CF and IMOS conventions. Global attributes include information describing the format of the file and the conventions used in it, how the file was generated, and the data in the file and how it was collected.

From a file generation perspective, the generator will need to source the following information for each global attribute to be included in a particular file:

| Field | Description |
| --- | --- |
| Name | Name of the global attribute to include |
| Value | Value of the global attribute to include |
| Type | Type of the global attribute, e.g. String, Integer, Double |
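Because every attribute arrives as a (Name, Value, Type) triple, the generator has to turn the stored value into the declared type before writing it. A minimal sketch, assuming values are stored as text and that the type names are exactly those in the table above; `AttributeSpec` and `typed_value` are hypothetical names:

```python
from dataclasses import dataclass

# Hypothetical mapping from the "Type" field to a Python converter.
_CONVERTERS = {"String": str, "Integer": int, "Double": float}


@dataclass(frozen=True)
class AttributeSpec:
    """One global attribute record: name, value stored as text, declared type."""

    name: str
    value: str
    type: str

    def typed_value(self):
        """Convert the stored text to the declared type, failing loudly on unknown types."""
        try:
            convert = _CONVERTERS[self.type]
        except KeyError:
            raise ValueError(
                f"unsupported type {self.type!r} for attribute {self.name!r}"
            ) from None
        return convert(self.value)


spec = AttributeSpec("geospatial_lat_min", "-34.5", "Double")
print(spec.typed_value())  # -34.5
```

In a real generator the declared type would also decide the NetCDF attribute type written to the file, not just the in-memory value.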

It is expected that these attributes will include:

  • common attributes included in all IMOS generated files,
  • collection-specific attributes, such as the IMOS facility to which the collection belongs,
  • generator invocation-specific attributes, such as creation date, generator version, subsetting parameters and the spatial and temporal extent of the subset,
  • template-specific attributes, such as the cdm_data_type and CF featureType,
  • feature type instance attributes (where an individual file is generated per feature type instance; see the discussion of a single file per subset versus a single file per feature type instance below).
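Since these sources can define the same attribute name, the generator needs a rule for which source wins. One possible rule, sketched below with `collections.ChainMap`, is that the more specific source overrides the more general one; both the precedence order and the example values are assumptions, not settled design:

```python
from collections import ChainMap


def merge_global_attributes(common, collection, template, instance, invocation):
    """Merge attribute sources; more specific sources override more general ones.

    Assumed precedence (highest first): invocation, instance, template,
    collection, common.
    """
    # ChainMap searches its maps left to right, so the highest-precedence map goes first.
    return dict(ChainMap(invocation, instance, template, collection, common))


attrs = merge_global_attributes(
    common={"institution": "IMOS", "title": "IMOS data"},
    collection={"title": "ANMN time series"},  # illustrative values only
    template={"featureType": "timeSeries"},
    instance={"site_code": "SITE01"},  # hypothetical site code
    invocation={"date_created": "2015-02-21T00:00:00Z"},
)
print(attrs["title"])  # ANMN time series
```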

Coordinate Variables

A variable with the same name as a dimension is called a coordinate variable. It typically defines a physical coordinate corresponding to that dimension (e.g. latitude, longitude, time, depth).

The coordinate variables used, their shape and the shape of variables recording other measurements form the basis of the classification of datasets into different feature types. For example, a trajectory dataset is defined as "a series of data points along a path through space with monotonically increasing times". This translates into a specific set of requirements in terms of the coordinate variables used and the way in which measurements relate to them. In this example, latitude, longitude, depth (if applicable) and each measured property for a particular trajectory are dependent on time.
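For a single trajectory, these requirements could be checked before an instance is written: latitude, longitude and every measurement have one value per time step, and time increases along the trajectory. A sketch under those assumptions; it requires strictly increasing times so that time can also serve as a coordinate variable, and `check_trajectory` is a hypothetical helper:

```python
def check_trajectory(times, latitudes, longitudes, measurements):
    """Check one trajectory instance before it is written.

    times must be strictly increasing; latitude, longitude and every measured
    variable must have one value per time step (i.e. depend on time only).
    """
    n = len(times)
    series = {"latitude": latitudes, "longitude": longitudes, **measurements}
    for name, values in series.items():
        if len(values) != n:
            raise ValueError(f"{name} has {len(values)} values, expected {n} (one per time)")
    for earlier, later in zip(times, times[1:]):
        if later <= earlier:
            raise ValueError(f"time is not strictly increasing at {later!r}")
    return True


check_trajectory(
    [0.0, 60.0, 120.0], [-33.9, -33.8, -33.7], [151.2, 151.3, 151.4],
    {"TEMP": [18.1, 18.3, 18.2]},
)
```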

Each IMOS collection will be associated with a particular CF feature type and one or more feature type templates that can be used to generate NetCDF output. This will determine the coordinate variables used and how variables recording observations relate to them.

Note that vertical dimensions may be optional and can be recorded in different ways (e.g. depth, altitude, level). It is expected that the handling of vertical dimensions may also need to be configured on a collection by collection basis.

Variables

Each collection records various measurements related to spatial and temporal properties as described above. The properties measured, their types and values are specific to each collection. This information is currently recorded in collection-specific tables used to support download via WFS. The variables measured, their types and values can be sourced from these tables, although it may be desirable to limit the NetCDF files generated for specific feature type instances to the variables that apply to that instance. This information would need to be provided on a collection by collection basis.

Variable attributes

Variable attributes record metadata about each variable included in the NetCDF file in the same way that global attributes provide metadata about the contents of the file. For example, variable attributes normally include the standard name for the variable measured and the units of measure. Variable attributes are required by end users and software to understand what the variable measures and how it can be used. Required and recommended attributes are prescribed by CF and IMOS conventions.

Similar to global attributes, the information required for NetCDF file generation for a particular variable in a file is:

| Field | Description |
| --- | --- |
| Name | Name of the attribute to include |
| Value | Value of the attribute to include |
| Type | Type of the attribute, e.g. String, Integer, Double |

TBC: This information may be generic across data collections for a particular variable, or specific to the collection or feature type instance included (e.g. nominal depth?).

NetCDF File name

It is expected that the NetCDF file name would be generated from a combination of collection metadata, feature instance metadata (if generating individual files per feature instance) and subset parameters.
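One way to combine these sources is sketched below. The naming pattern (collection, optional feature instance id, then the requested time range) is purely hypothetical and is not the IMOS file naming convention; it only shows how the three sources could be joined into a portable name:

```python
import re
from datetime import datetime


def build_filename(collection, start, end, feature_id=None):
    """Build an output file name from collection metadata, an optional feature
    instance id and the subset's time range (hypothetical pattern).

    start and end are assumed to be UTC datetimes.
    """
    parts = [collection]
    if feature_id is not None:  # only when generating one file per feature instance
        parts.append(str(feature_id))
    parts.append(f"{start:%Y%m%dT%H%M%SZ}-{end:%Y%m%dT%H%M%SZ}")
    # Replace any character that is awkward in file names with '-'.
    return re.sub(r"[^A-Za-z0-9_.-]", "-", "_".join(parts)) + ".nc"


name = build_filename("anmn_ts", datetime(2014, 1, 1), datetime(2014, 2, 1), feature_id=10)
print(name)  # anmn_ts_10_20140101T000000Z-20140201T000000Z.nc
```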

Filtering

The IMOS NetCDF subset service needs to support the same filtering that is available for the WFS download if we are to provide a subsetted NetCDF output option for non-gridded data in the portal.
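Assuming the portal's filters can be reduced to per-column range conditions (an assumption; WFS filtering is richer than this), the generator could translate them into a parameterised WHERE clause like the sketch below. `build_where` is a hypothetical helper, and the `?` placeholder is the qmark style used by Python's `sqlite3`; other database drivers use different placeholder styles:

```python
def build_where(filters):
    """Translate range filters into a parameterised SQL WHERE clause.

    filters maps a column name to a (low, high) tuple; either bound may be None.
    Column names must come from trusted collection configuration, never from
    the request, because identifiers cannot be bound as parameters.
    """
    clauses, params = [], []
    for column, (low, high) in sorted(filters.items()):
        if low is not None:
            clauses.append(f"{column} >= ?")
            params.append(low)
        if high is not None:
            clauses.append(f"{column} <= ?")
            params.append(high)
    return (" and ".join(clauses) or "1 = 1"), params


where, params = build_where({"time": ("2014-01-01", None), "latitude": (-40, -30)})
print(where)   # latitude >= ? and latitude <= ? and time >= ?
print(params)  # [-40, -30, '2014-01-01']
```

Keeping values as bound parameters means user-supplied filter values never end up inside the SQL text itself.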

Multiple/Single Files

It's possible to create a single NetCDF file containing all the data to be returned, or to create one file per feature type instance (i.e. multiple files).

When creating a single NetCDF file containing multiple feature type instances, a feature type instance coordinate variable is added and the feature type instance dimension is added to all variables. Feature type instance metadata is then added to the file as metadata variables that vary along the feature type instance dimension. This is in contrast to files containing a single feature type instance, where this metadata can be added as global attributes.
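The difference between the two layouts can be sketched as a small decision function. This is a simplified model, assuming instance metadata arrives as one dict per instance and using a hypothetical dimension name `INSTANCE`; a real file would also carry the instance identifier variable (which CF marks with a `cf_role` attribute) and the measurements themselves:

```python
def place_instance_metadata(instances, instance_dim="INSTANCE"):
    """Decide where feature type instance metadata goes in the output file.

    instances: non-empty list of dicts, one per feature type instance, all with
    the same keys. Returns (global_attributes, variables), where variables maps a
    name to (dimensions, values).
    """
    if not instances:
        raise ValueError("no feature type instances to write")
    if len(instances) == 1:
        # Single instance: its metadata can simply be global attributes.
        return dict(instances[0]), {}
    # Several instances: one metadata variable per key along the instance dimension.
    variables = {
        key: ((instance_dim,), [instance[key] for instance in instances])
        for key in instances[0]
    }
    return {}, variables


single = place_instance_metadata([{"site_code": "SITE01"}])
multiple = place_instance_metadata([{"site_code": "SITE01"}, {"site_code": "SITE02"}])
```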

There is no reason why both ways of generating NetCDF files can't be supported, but in the first instance we should focus on the format required for MARVL (Need to check what this is/if that can be our preferred format).

Proposed Design

Inputs

Generator configuration/metadata

Collection Configuration

Global attributes

Variables

Variable Attributes

Data

Subset request

Processing

Generator

Output

NetCDF file or Zipped NetCDF Files

Inputs

Generator configuration/metadata

This includes metadata about the generator itself, or information required to configure it:

  • Generator version
  • Database connection details (may be sourced from GeoServer via feature type info if GeoServer-based)
  • Metadata schema - schema where global attribute, variable and variable attribute information may be sourced

Global Attributes

select name, value, type 
  from netcdf.global_attribute
 where collection = 'anmn_ts' 
   and feature_id = 10

Options for MARVL Development

Implementation Notes

Ordering

Ordering of data in NetCDF files is important. We've removed ordering for some collections in the CSV download (e.g. anfog_dm) because it conflicted with efficient use of indexes when filtering data, but ordering will be required when generating NetCDF files.

Handling Different Feature Types

Feature type specific processing

Each feature type requires different handling of the spatial variables and other measured variables. Example information requirements (using possible SQL) for generating files for each feature type are shown below.

Profile
Profile information
select time, time_qc, latitude, latitude_qc, longitude, longitude_qc
  from argo.profile
 where id = 10
Measurements
select variable_z_variable, variable_z_variable_qc, variable_1, variable_1_qc, variable_2, variable_2_qc, variable_3, variable_3_qc ...
  from argo.measurement
 where profile_id = 10
 order by variable_z_variable

Note: z_variable needs to be configured for each collection (db or otherwise)
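With the z variable configured per collection, the profile measurement query can be assembled from configuration rather than written by hand for each collection. A sketch, assuming hypothetical configuration fields and column names and a `<name>_qc` column for every variable, as in the query above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileQueryConfig:
    """Hypothetical per-collection settings for profile queries."""

    schema: str
    z_variable: str   # e.g. "pressure" or "depth", configured per collection
    variables: tuple  # measured variables; each is assumed to have a *_qc column


def profile_measurement_sql(cfg):
    """Build the profile measurement query; the profile id is bound as a parameter.

    Identifiers come from trusted configuration, not from user input.
    """
    columns = []
    for name in (cfg.z_variable, *cfg.variables):
        columns += [name, f"{name}_qc"]
    return (
        f"select {', '.join(columns)}\n"
        f"  from {cfg.schema}.measurement\n"
        f" where profile_id = ?\n"
        f" order by {cfg.z_variable}"
    )


cfg = ProfileQueryConfig("argo", "pressure", ("temperature", "salinity"))
print(profile_measurement_sql(cfg))
```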

Time Series
Timeseries information
select latitude, latitude_qc, longitude, longitude_qc, depth, depth_qc
  from anmn_ts.timeseries
 where id = 10

Note: depth here should be a nominal depth. The actual depth, which varies in time, cannot be used as a dimension.

Measurements
select time, time_qc, variable_1, variable_1_qc, variable_2, variable_2_qc, variable_3, variable_3_qc
  from anmn_ts.measurement
 where timeseries_id = 10
 order by time
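Because ordering is required for NetCDF output, the `order by` clause matters here. The sketch below runs a simplified version of the time series measurement query against an in-memory SQLite database standing in for the real one, with made-up rows; the schema, the column subset and the `fetch_timeseries` helper are assumptions. The id is bound as a parameter instead of being pasted into the SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named anmn_ts so "anmn_ts.measurement" resolves.
conn.execute("attach database ':memory:' as anmn_ts")
conn.execute(
    "create table anmn_ts.measurement"
    " (timeseries_id integer, time text, time_qc integer, psal real, psal_qc integer)"
)
# Made-up rows, deliberately inserted out of time order.
conn.executemany(
    "insert into anmn_ts.measurement values (?, ?, ?, ?, ?)",
    [
        (10, "2014-01-01T01:00:00Z", 1, 35.2, 1),
        (10, "2014-01-01T00:00:00Z", 1, 35.1, 1),
        (11, "2014-01-01T00:30:00Z", 1, 35.0, 1),
    ],
)


def fetch_timeseries(conn, timeseries_id):
    """Return one time series' measurements in time order."""
    return conn.execute(
        "select time, time_qc, psal, psal_qc"
        "  from anmn_ts.measurement"
        " where timeseries_id = ?"
        " order by time",  # ISO 8601 strings in one format sort chronologically
        (timeseries_id,),
    ).fetchall()


rows = fetch_timeseries(conn, 10)
print([row[0] for row in rows])  # ['2014-01-01T00:00:00Z', '2014-01-01T01:00:00Z']
```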
Trajectory
Trajectory Measurements
select latitude, latitude_qc, longitude, longitude_qc, time, time_qc, variable_z_variable, variable_z_variable_qc, variable_1, variable_1_qc, variable_2, variable_2_qc, variable_3, variable_3_qc ...
  from soop_trv.measurement
 where trajectory_id = 10
 order by time

Note: z_variable needs to be configured for each collection (db or otherwise)

Trajectory Profile
Trajectory information
select id, latitude, latitude_qc, longitude, longitude_qc, time, time_qc
  from anfog_dm.measurement_netcdf
 where trajectory_id = 10
 order by time
Profile Measurements
select variable_z_variable, variable_z_variable_qc, variable_1, variable_1_qc, variable_2, variable_2_qc, variable_3, variable_3_qc ...
  from anfog_dm.measurement_netcdf
 where trajectory_id = 10
   and profile_id = 20
 order by variable_z_variable
Timeseries Profile
Timeseries information
select id, time, time_qc
  from adcp.timeseries
 where timeseries_id = 10
Profile measurements
select variable_z_variable, variable_z_variable_qc, variable_1, variable_1_qc, variable_2, variable_2_qc, variable_3, variable_3_qc ...
  from adcp.measurement
 where timeseries_id = 10
   and profile_id = 20
 order by variable_z_variable

Global Attributes

For example, using a potential SQL statement to extract data-related attributes from the database:

select name, value, type 
  from netcdf.global_attribute
 where collection = 'anmn_ts' 
   and feature_id = 10

would return:

| name | value | type |
| --- | --- | --- |
| Institution | IMOS | String |

for the timeseries with id 10 in the collection anmn_ts. feature_id here is used to refer to the unique identifier for a particular feature type instance. This would be the unique identifier for a particular profile, trajectory or timeseries as applicable.
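The query above only returns attributes stored against a specific feature instance. If collection-level attributes were stored in the same view with a null `feature_id` (an assumption about a view that does not exist yet), both levels could be fetched in one query, with instance-level values overriding collection-level ones. A sketch using an in-memory SQLite stand-in and made-up rows; `global_attributes` is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In-memory stand-in for the hypothetical netcdf.global_attribute view.
conn.execute("attach database ':memory:' as netcdf")
conn.execute(
    "create table netcdf.global_attribute"
    " (collection text, feature_id integer, name text, value text, type text)"
)
conn.executemany(
    "insert into netcdf.global_attribute values (?, ?, ?, ?, ?)",
    [
        ("anmn_ts", None, "Institution", "IMOS", "String"),
        ("anmn_ts", None, "title", "collection title", "String"),
        ("anmn_ts", 10, "title", "title for feature 10", "String"),
        ("anmn_ts", 11, "title", "title for feature 11", "String"),
    ],
)


def global_attributes(conn, collection, feature_id):
    """Return {name: (value, type)}; instance rows override collection rows."""
    rows = conn.execute(
        "select name, value, type"
        "  from netcdf.global_attribute"
        " where collection = ?"
        "   and (feature_id = ? or feature_id is null)"
        " order by feature_id is not null",  # collection-level (null) rows first
        (collection, feature_id),
    ).fetchall()
    # Later rows win, so instance-level values replace collection-level ones.
    return {name: (value, type_) for name, value, type_ in rows}


attrs = global_attributes(conn, "anmn_ts", 10)
print(attrs["title"])  # ('title for feature 10', 'String')
```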

Subsetting information may also need to be added as global attributes by the generator, to record the subset the user requested via the portal.
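Besides echoing the request parameters, the generator could record the extent of the data actually returned. A sketch using the ACDD-style names `geospatial_lat_min`, `time_coverage_start` and so on (`time_coverage_start` and `time_coverage_end` already appear in the anmn_ts mapping, but whether IMOS wants exactly this set is an assumption); `subset_extent_attributes` is hypothetical and the sample values are made up:

```python
from datetime import datetime, timezone


def subset_extent_attributes(times, latitudes, longitudes):
    """Compute extent global attributes for the data actually returned.

    times must be timezone-aware datetimes. Longitude min/max ignores the
    antimeridian, which would need special handling for subsets crossing it.
    """
    if not times:
        raise ValueError("empty subset: no extent to report")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "geospatial_lat_min": min(latitudes),
        "geospatial_lat_max": max(latitudes),
        "geospatial_lon_min": min(longitudes),
        "geospatial_lon_max": max(longitudes),
        "time_coverage_start": min(times).astimezone(timezone.utc).strftime(fmt),
        "time_coverage_end": max(times).astimezone(timezone.utc).strftime(fmt),
    }


extent = subset_extent_attributes(
    [datetime(2014, 1, 2, tzinfo=timezone.utc), datetime(2014, 1, 1, tzinfo=timezone.utc)],
    [-33.9, -34.1],
    [151.2, 151.3],
)
print(extent["time_coverage_start"])  # 2014-01-01T00:00:00Z
```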

Note: 'netcdf.global_attribute' is an example of a potential view or table that could be used to source the required metadata.

Variables

We need to be able to source the variables to be included in the NetCDF file.

For example:

select name, type
  from netcdf.variables
 where collection = 'anmn_ts' 
   and feature_id = 10

Note: 'netcdf.variables' is a logical view of required metadata for NetCDF generation purposes and doesn't necessarily relate directly to existing tables in the database.

Variable attributes

Similar to global attributes above except we need to be able to source attributes for a particular feature instance variable:

select name, value, type 
  from netcdf.variable_attribute
 where collection = 'anmn_ts' 
   and feature_id = 10
   and variable_name = 'TEMP'

Note: 'netcdf.variable_attribute' is a logical view of required metadata for NetCDF generation purposes and doesn't necessarily relate directly to existing tables in the database.

The generator would write these as variable attributes if they are the same for a variable across all datasets involved. Otherwise (e.g. a different unit for the same variable), it would create measured parameter metadata variables.
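The rule above can be sketched as splitting each variable's attributes into those shared by every dataset and those that differ. This is a simplified model assuming per-instance attributes arrive as dicts; `consolidate_variable_attributes` is hypothetical:

```python
def consolidate_variable_attributes(per_instance):
    """Split one variable's attributes into shared and varying ones.

    per_instance: list of dicts (one per feature type instance) mapping
    attribute name -> value. Attributes with the same value in every instance
    stay variable attributes; any others (e.g. differing units) are returned
    with one value per instance, to be written as metadata variables.
    A missing attribute counts as a difference (its value is None).
    """
    names = set().union(*per_instance)
    shared, varying = {}, {}
    for name in sorted(names):
        values = [attrs.get(name) for attrs in per_instance]
        if None not in values and all(value == values[0] for value in values):
            shared[name] = values[0]
        else:
            varying[name] = values
    return shared, varying


shared, varying = consolidate_variable_attributes([
    {"standard_name": "sea_water_temperature", "units": "degrees_Celsius"},
    {"standard_name": "sea_water_temperature", "units": "K"},
])
print(shared)   # {'standard_name': 'sea_water_temperature'}
print(varying)  # {'units': ['degrees_Celsius', 'K']}
```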

Configuration

Each downloadable collection will need some configuration e.g.

  • Feature type (this could be sourced from global_attributes)
  • Schema
  • Collection metadata
  • Sql to get logical views above
  • Non-standard variable names/missing coordinate variables (could be sourced from variables source e.g. add mapping/missing columns)
  • Non-default table names (perhaps rename them or use views to make them consistent instead)

This configuration could be sourced from the database or from config files. Required metadata will need to be sourced from the database or provided by configuration.
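A per-collection configuration record might look like the sketch below. The field names, defaults and the validation against CF feature types are assumptions for illustration; whether such records live in the database or in config files is still open:

```python
from dataclasses import dataclass, field

# featureType values defined for discrete sampling geometries in CF 1.6.
CF_FEATURE_TYPES = {
    "point", "timeSeries", "trajectory", "profile",
    "timeSeriesProfile", "trajectoryProfile",
}


@dataclass(frozen=True)
class CollectionConfig:
    """Hypothetical per-collection configuration for the NetCDF generator."""

    name: str
    feature_type: str
    schema: str
    measurement_table: str = "measurement"  # override for non-default table names
    column_mapping: dict = field(default_factory=dict)  # source column -> NetCDF name

    def __post_init__(self):
        if self.feature_type not in CF_FEATURE_TYPES:
            raise ValueError(f"{self.name}: unknown featureType {self.feature_type!r}")

    def netcdf_name(self, column):
        """Map a source column to its NetCDF name (unchanged if not mapped)."""
        return self.column_mapping.get(column, column)


anmn_ts = CollectionConfig(
    "anmn_ts", "timeSeries", "anmn_ts", column_mapping={"ts_id": "TIMESERIES"}
)
print(anmn_ts.netcdf_name("ts_id"), anmn_ts.netcdf_name("TEMP"))  # TIMESERIES TEMP
```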

Contiguous ragged / Indexed ragged formats

Contiguous ragged and indexed ragged arrays are space-saving representations used in some of our source NetCDF files. While these representations are supported by CF and are used in source files, their use in output formats would complicate processing, so they will not be considered here.
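Without ragged arrays, variable-length profiles have to be padded to a common length, which is what CF calls the incomplete multidimensional array representation (and appears to be the layout behind the NODC "incomplete" templates). A minimal sketch of that padding step; the fill value is an arbitrary placeholder that would have to match the variable's `_FillValue` attribute, and `pad_profiles` is hypothetical:

```python
def pad_profiles(profiles, fill_value=-999999.0):
    """Pack variable-length profiles into a rectangular (profile, obs) layout.

    Each row holds one profile, padded with fill_value up to the length of
    the longest profile, so no ragged-array indexing is needed.
    """
    width = max((len(profile) for profile in profiles), default=0)
    return [list(profile) + [fill_value] * (width - len(profile)) for profile in profiles]


padded = pad_profiles([[20.1, 19.8, 19.5], [20.4]])
print(padded)  # [[20.1, 19.8, 19.5], [20.4, -999999.0, -999999.0]]
```

The cost is wasted space when profile lengths differ widely, which is the trade-off accepted by not using ragged arrays.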

CSV Metadata

Example - anmn_ts

Tables

| timeseries | mapping |
| ---------- | ------- |
| id | timeSeries variable |
| file_id | |
| site_code | global attribute |
| platform_code | global attribute |
| deployment_code | global attribute |
| LATITUDE | coordinate variable |
| LATITUDE_quality_control | coordinate variable qc |
| LONGITUDE | coordinate variable |
| LONGITUDE_quality_control | coordinate variable qc |
| geom | n/a |
| instrument_nominal_depth | coordinate variable ? |
| site_nominal_depth | global attribute ? |
| site_depth_at_deployment | global attribute ? |
| instrument | global attribute / instrument variable ? |
| instrument_serial_number | global attribute / instrument variable ? |
| time_coverage_start | global attribute |
| time_coverage_end | global attribute |
| time_deployment_start | global attribute |
| time_deployment_end | global attribute |
| comment | global attribute |
| history | global attribute |
| toolbox_version | n/a? |
| depth_b | DEPTH applicable |
| sea_water_temperature_b | TEMP applicable |
| sea_water_electrical_conductivity_b | CNDC applicable |
| sea_water_salinity_b | PSAL applicable |
| sea_water_pressure_b | PRES applicable |
| sea_water_pressure_due_to_sea_water_b | PRES_REL applicable |

| measurement | mapping |
| ----------- | ------- |
| ts_id | timeSeries variable |
| index | index of measurement in original file - regenerate this |
| TIME | coordinate variable |
| TIME_quality_control | coordinate variable qc |
| DEPTH | variable |
| DEPTH_quality_control | variable qc |
| TEMP | variable |
| TEMP_quality_control | variable qc |
| CNDC | variable |
| CNDC_quality_control | variable qc |
| PSAL | variable |
| PSAL_quality_control | variable qc |
| PRES | variable |
| PRES_quality_control | variable qc |
| PRES_REL | variable |
| PRES_REL_quality_control | variable qc |

Notes:

  • current tables mirror the WMS layer/WFS download
  • some variables not applicable for particular time series
  • mapping is to a file containing a single time series - refer to other possibilities below
  • doesn't include many required and desirable global/variable attributes specified by CF/IMOS conventions
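The mapping column could drive generation directly: each column of a timeseries row is routed to global attributes or coordinate variables, columns marked n/a are dropped, and unmapped columns are reported so gaps in the mapping become visible. A sketch using a small excerpt of the anmn_ts mapping with a made-up row; `route_columns` and the sample values are hypothetical:

```python
# Excerpt of the anmn_ts timeseries mapping.
TIMESERIES_MAPPING = {
    "site_code": "global attribute",
    "platform_code": "global attribute",
    "LATITUDE": "coordinate variable",
    "LATITUDE_quality_control": "coordinate variable qc",
    "LONGITUDE": "coordinate variable",
    "LONGITUDE_quality_control": "coordinate variable qc",
    "geom": "n/a",
}


def route_columns(row, mapping):
    """Route one timeseries row's columns according to the mapping.

    Returns (routed, unmapped): routed groups values by mapping target,
    columns mapped to targets not handled here (such as "n/a") are dropped,
    and columns missing from the mapping are listed in unmapped.
    """
    routed = {"global attribute": {}, "coordinate variable": {}, "coordinate variable qc": {}}
    unmapped = []
    for column, value in row.items():
        target = mapping.get(column)
        if target is None:
            unmapped.append(column)
        elif target in routed:
            routed[target][column] = value
    return routed, unmapped


routed, unmapped = route_columns(
    {"site_code": "SITE01", "LATITUDE": -33.9, "geom": None, "extra_column": 1},
    TIMESERIES_MAPPING,
)
print(routed["global attribute"], unmapped)  # {'site_code': 'SITE01'} ['extra_column']
```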