Skip to content
srldl edited this page Jan 5, 2022 · 80 revisions

Standards for data input

  1. Use "." as decimal separator for numbers.
  2. Use Svalbard placenames (check correct spelling) whenever it applies.
  3. Write in English unless Norwegian is specifically demanded.
  4. Use the following template for titles if possible: <species abbr 4 letters>, f.ex. "KS_COEI_MILKYS_2019_Kongsfjorden"
  5. For additional parameters, please check parameter list and reuse parameter names if possible.
  6. Use underscore ("_") to connect words in title and for additional parameters. Use small letters throughout (except placenames and title).
  7. Only use additional parameters if you think it is experimental data unlikely to be copied by others. If so, please check the parameter list and reuse parameter names whereever possible.
  8. Please create metadata for the dataset. You can do this here: https://data.npolar.no/dataset/ Under links, please include the link for the fieldwork, address similar to https://data.npolar.no/ecotox/fieldwork//edit f.ex. https://data.npolar.no/ecotox/fieldwork/3a7ce1ca-cae2-4e0e-a2ec-423e9017a752/edit
  9. Please include reference. Although this will have to be added later, it is informative to connect a dataset with publications.
  10. Units. The database does not have own columns for units - hence use the following units:

volume - liter latitude,longitude - decimal degrees weight - kg girth - cm? length - cm? tarsus - cm? bill - cm? bill height - cm? scull - cm? wing - cm? egg length - cm, mm? egg width - blubber thickness back - tusk volume - tusk length - tusk girth - caudal length -

How to fill in the ecotox database

  1. Find your list of fields variables. Go to https://data.npolar.no/ecotox and choose "template" in the upper left corner. Name the template:

    <your initials>_<species>_<project name or description>

    Use underscore to separate words f.ex. KS_GLGU_SEATRACK_2019_Kongsfjorden

    Create a new template based on the variables needed. Check off the parameters in the list of predefined variables first. Save the result and check that select alternatives for your selected fields are included. See image below.

    Go through previously created additional fields

    Also, for parameters placename, latitude and longitude it is possible to use autocomplete from Svalbard placenames if selected.

    At the bottom of the page is the additional variables others have chosen. Please go through the list and use the same variable names if you need them. If a variable name occur often in excel sheets, the technical staff should be contacted to include it in the list of predefined variables.

    Control select alternatives

    Finally add your own additional variables as needed.

  2. Register the dataset at https://data.npolar.no/dataset if not already described. Describe the variables here if not self explanatory.

  3. Mostly cruise-based fieldwork: Fill in expedition information (parent of dataset) in https://data.npolar.no/expedition to combine your data with other researcher's data taken at the same time and place.

How to get data back API version 1

Via the API, https://api.npolar.no (R language)

#First we fetch the files from the API:
#Receive as JSON file
> library(jsonlite)
> fieldwork_json = fromJSON("https://api-test.data.npolar.no/ecotox/fieldwork/?q=&filter-matrix=egg&format=json&limit=all")
> lab_json = fromJSON("https://api-test.data.npolar.no/lab/ecotox/?q=&filter-matrix=egg&fields=id,database_sample_id,analyte,analyte_value,fat_percentage,unit&format=json&limit=all")

#Go down some levels in the list structure
#Result is one long column only with all entries applied.
#Now we have two lists converted to data frames
> fieldwork_df = fieldwork_json$feed$entries
> lab_df = lab_json$feed$entries

#To view the table headers
> names(lab_df)
To view the whole table (note the big V in View!)
> View(fieldwork_df)

#If the result will contain a lot of rows, it is important to reduce the 
#number of columns as much as possible.
#Try to remove columns one by one:
> fieldwork_df$schema <- NULL
#or pick the columns to keep:
> keeps_field <- c("people_responsible","NPI_sample_id","reference", "placename","latitude","longitude","species","event_date","id","age","species_identification","comment","project_group","egg_width","length","weight","bill_height","station_name")
> fieldwork_write= fieldwork_df[ , keeps_field, drop = FALSE]

#Only three columns can be used to pivot the analytes. Usually use at least
#"analyte_value", "analyte","database_sample_id", but more can be added
#like "fat_percentage", "unit" etc.
> keeps_lab <- c("analyte_value", "analyte","database_sample_id","unit","id")
> lab_write = lab_df[ , keeps_lab, drop = FALSE]
#Pivot the analytes from the y-axis to the x-axis
> library(tidyr)
> lab_pivot = reshape(lab_write, direction = "wide", idvar = "database_sample_id", timevar = "analyte")

#Rename fieldwork_write column id to database_sample_id so R can merge besed on this column
> library(tidyverse)
> fieldwork_col_conv = fieldwork_write %>% rename (database_sample_id = id)

#To avoid spending too long time in merging, I found it wise to sort the two
#database_sample_id columns
> lab_merge = lab_pivot[order(lab_pivot$database_sample_id),]
> fieldwork_merge = fieldwork_col_conv[order(fieldwork_col_conv$database_sample_id),]

#Try to merge the two datasets unless the data frames are too big.
#If so it must either be split in two or do a manually merge.
> merged = merge(fieldwork_merge, lab_merge, all=TRUE) 

#Finally, create the excel sheet - file here is named "file.xslx"
#Note that with larger datasets your PC may run out of memory.
> library(xlsx)
> write.xlsx(merged, 'file.xlsx', sheetName = "Sheet1", col.names = TRUE, row.names = TRUE, append = FALSE)

#If you run out of memory, try to save as a csv file instead.
#Afterwards you can import it into Excel
> write.csv(merged,"file.txt", row.names = FALSE)

#The file can be found in your working directory, if uncertain where use
> getwd()

Via maps (mapview)

Looking for a map presentation or want to look at the original Excel files? Try mapview: https://data.npolar.no/mapview/

How to get data back API version 2

Via R lang and Excel merge

# Download two databases - fieldwork with info about fieldtrips and registered measurements,
# lab with ecotox lab results. They are too large to be merged with R -it has to be done afterwards
# in Excel.

# Fetch libraries, if you don't have them they can be imstalled with command "install.packages('jsonlite')" etc.
library(jsonlite)
library(openxlsx)
library(dplyr)

fieldwork_json = fromJSON("https://v2-api.npolar.no/biology/fielddata/?page=..&includeData=true")
lab_json = fromJSON("https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/?page=..&includeData=true")

# Traverse JSON hierarchy
fieldwork_df = fieldwork_json$items$data
lab_df = lab_json$items$data

# Control by using View and see columns by using names
View(fieldwork_df)
head(fieldwork_df, 5)
names(fieldwork_df)
field_df_flat = flatten(fieldwork_df, recursive = TRUE)
lab_df_flat = flatten(lab_df, recursive = TRUE)

# The schemas both use Darwin core, hence some columns from lab database needs to be renamed before merge.
# I suggest renaming rather than delete for control. When you have confirmed that they are indeed duplicates
# to fieldwork_df.eventID, scientificName and rightsholder they can be removed.
lab_df_flat <- lab_df_flat %>% rename(eventIDLab = eventID)
lab_df_flat <- lab_df_flat %>% rename(scientificNameLab = scientificName)
lab_df_flat <- lab_df_flat %>% rename(rightsholderLab = rightsholder)
lab_df_flat <- lab_df_flat %>% rename(fieldNumberLab = fieldNumber)
lab_df_flat <- lab_df_flat %>% rename(dynamicProperties.matrixLab = dynamicProperties.matrix)
lab_df_flat <- lab_df_flat %>% rename(dynamicProperties.responsibleLab = dynamicProperties.responsible)

# Save as a excel file.
write.xlsx(field_df_flat, file = "Field.xlsx")
write.xlsx(lab_df_flat, file = "Lab.xlsx")

# Save as a csv file.
write.csv(field_df_flat, "field.txt", row.names = FALSE)
write.csv(lab_df_flat, "lab.txt", row.names = FALSE)

# The file can be found in your working directory, if uncertain where use
getwd()

# To merge the two import both csv files into Excel and merge the two sheets by linking fieldwork.eventID with lab.eventIDLab.
# Aka normally several lab.eventIDLab can be linked with one fieldwork.eventID.
# Since the files are so big, it can probably only be done by downloading a smaller subset.

# How to check you have received all rows? Try using the curl, see https://curl.se/ for download.
# curl -s 'https://v2-api.npolar.no/biology/fielddata/?type=feed&page=..' | wc -l
# curl -s 'https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/?type=feed&page=..' | wc -l 

Via the API v2, https://v2-api.npolar.no

This data is behind a login - this means that in order to get the data you need to use the same login as for NPDC aka https://data.npolar.no. You also need access to download ecotox data. Ask if you likely do not have these rights.

For ecotox there are two databases:

The field data: https://v2-api.npolar.no/biology/fielddata/

The lab ecotox results: https://v2-api.npolar.no/biology/fielddata/_all_/ecotox

1) Download a json with curl

Download the program Curl to get your data and store it on a file (called ecotoxdata in the example) in your current directory:

curl GET "https://v2-api.npolar.no/biology/fielddata/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=.."  -u siri.uldal@npolar.no  > fieldworkdata

curl GET "https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=.."  -u your.email.address@npolar.no  > ecotoxdata

Remember to substitute your.email.address@npolar.no with your npolar.no email address.

2) Import into R and create an excel file

How to get the search parameters

In order to get the data you will need the https link from the fieldwork database and the ecotox lab database along with the search you are looking for. Below are some example on how to construct these links.

Ecotox fieldwork database, search for glaucous gulls and matrix plasma:

https://v2-api.npolar.no/biology/fielddata/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=..&type=feed

https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&page=..

Get all data for project MOSJ (only the field database has this the parameter projectName) :

https://v2-api.npolar.no/biology/fielddata/_search?and=dynamicProperties.projectName:MOSJ&verbose=true&page=..

&page=.. means all data, not just the first page.

&verbose=true means include all metadata as well.

&type=feed means download as nd-json.

Get all search options (very technical):

https://v2-api.npolar.no/colony/seabird/_all_/observation/_doc

Seabird observations database is an open database (no login required) so in contrast to ecotox/fieldwork you can experiment with how to create valid searches in the browser url request. F.ex. search for colony Zukovskijfjella, counting from boat:

https://v2-api.npolar.no/colony/seabird/_all_/observation/_search?and=dynamicProperties.colonyName:Zukovskijfjella&and=dynamicProperties.platform:boat

Seabird observations database, search for counting from boat and only for the species black guillemot:

https://v2-api.npolar.no/colony/seabird/_all_/observation/_search?and=dynamicProperties.platform:boat&and=scientificName:Cepphus+grylle

The JSON schemas for fieldwork,ecotox and seabird observation databases (technical) lists all available varibles:

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/biology/fielddata/fielddata.v1.0.5.json

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/biology/fielddata/ecotox.v1.0.5.json

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/colony/seabird/colony.v0.1.0.json

https://gitlab.npolar.no/eds/other/v2-jsonschema/-/blob/development/colony/seabird/colony_observation.v0.1.0.json

Alternatively, find the parameters directly from the databases, at http://hanna.npolar.no:5601

Please note that all our biology databases adhere to the Darwin Core standard.

Importing a CSV file into Excel

Finally, when you have the CSV extracted from R, conversion to Excel is simple: See https://github.com/npolar/npdc-sighting/wiki/Import-from-CSV-to-Excel