
Post hackathon summary

Kristin Vanderbilt edited this page Jun 20, 2019 · 4 revisions

John's notes

The Environmental Data Initiative (EDI) hosted a hackathon at the University of New Mexico, Albuquerque, NM, on June 11-13, 2019, to collaboratively design and develop software that supports data visualization. Ten participants, including LTER information managers, graduate students, and post-docs, enthusiastically joined this effort.

The hackathon began with personal introductions and scoping of the project. The anticipated user of the new software was identified as a researcher who has found a dataset of possible interest and needs additional information to assess its “fitness for use”. Metadata supplied with the dataset may answer some questions, but basic statistics about each variable, along with plots of the data, reveal more: missing data, the number of values in each categorical variable, and relationships between variables, among other things. The group decided that the software would support two kinds of input: datasets with metadata from data repositories in the DataONE network, and datasets on the researcher’s personal computer that may not have metadata. Given the expertise of the coders present, the software was designed as an R Shiny app from which a user can download a static report with statistics and plots of each variable in the dataset, interactively explore graphs of the data, or both.

Participants worked together to design workflows and define the list of supporting functions needed for the project. Two groups then formed to write code, one to work on the Shiny app and one to work on the static report. The Shiny group forked the ggplotgui library and added plotting functionality. Data and metadata ingestion used the NCEAS metajam package. The static report group wrote functions to produce statistical summaries and histograms of each variable in a dataset, with output tailored to whether the variable is numeric or categorical. A second function detects the presence of spatial data (latitude and longitude) in a dataset and uses a heat map to plot the frequency of data collection at sampling locations. A third, novel function detects the presence of time variables and provides clock visualizations of how frequently data occur at different time steps.
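The idea of tailoring a summary to the variable type can be sketched in a few lines. The example below is illustrative only: it is written in Python rather than R, it is not the code produced at the hackathon, and the function name `summarize_variable` and the set of missing-value codes are assumptions made for this sketch.

```python
import statistics
from collections import Counter

# Assumed missing-value codes; real datasets often declare their own in metadata.
MISSING_CODES = {"", "NA", "NaN", None}


def _to_float(value):
    """Return value as a float, or None if it cannot be parsed as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize_variable(values):
    """Summarize one column, choosing statistics by variable type.

    A column is treated as numeric only if every non-missing value parses
    as a number; otherwise it is treated as categorical.
    """
    present = [v for v in values if v not in MISSING_CODES]
    n_missing = len(values) - len(present)
    numbers = [_to_float(v) for v in present]

    if present and all(x is not None for x in numbers):
        return {
            "type": "numeric",
            "n": len(numbers),
            "missing": n_missing,
            "min": min(numbers),
            "max": max(numbers),
            "mean": statistics.mean(numbers),
            "median": statistics.median(numbers),
        }

    # Categorical: report the number of distinct levels and their frequencies.
    counts = Counter(str(v) for v in present)
    return {
        "type": "categorical",
        "n": len(present),
        "missing": n_missing,
        "levels": len(counts),
        "counts": dict(counts),
    }


if __name__ == "__main__":
    print(summarize_variable(["1.5", "2.5", "NA", "4"]))
    print(summarize_variable(["oak", "pine", "oak", ""]))
```

Treating a column as numeric only when every non-missing value parses as a number is a deliberately conservative choice; a column that mixes numbers with text codes falls through to the categorical summary, which makes such codes visible in the frequency counts.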

This visualization app is undergoing additional development and testing before being made publicly available as a beta release next month. The finalized product will be an R package that can be installed on a user's local computer and possibly run from a server and integrated into a research group's or site's web page.
