Fostering Open Science at WSL with the EnviDat Environmental Data Portal

EnviDat is the institutional research data portal of the Swiss Federal Institute for Forest, Snow and Landscape WSL. The portal is designed to provide efficient, unified and managed access to WSL’s comprehensive reservoir of monitoring and research data, in accordance with the WSL data policy. Through EnviDat, WSL fosters open science by making curated, quality-controlled, publication-ready research data accessible. Data producers can document author contributions for a particular data set through the EnviDat-DataCRediT taxonomy. The publication of research data sets can be complemented with additional digital resources, such as supplementary documentation, processing software or detailed descriptions of code (e.g. as Jupyter Notebooks). The EnviDat Team is working towards generic solutions for enhancing open science, in line with WSL’s commitment to accessible research data.


INTRODUCTION
The Swiss Federal Institute for Forest, Snow and Landscape WSL, a research institute of the Swiss national network of federal institutes of technology and research institutions (ETH Domain), is developing the institutional environmental data portal EnviDat (Figure 1). EnviDat [...] Forest Inventory (NFI), (5) guidance to researchers regarding options for publishing curated data, (6) provision of versioning capabilities for datasets, and (7) contacts and exchange with the international RDM community. Furthermore, it is planned to add (8) solutions for integrated management, validation, visualization and publication of streaming sensor data (such as meteorological measurements) and (9) solutions for visualization and management of GIS data.
WSL's goal to foster open science is supported by several of these services. In EnviDat, WSL scientists can register metadata about their data and provide access to the data itself. For the latter, they can either upload the data to the EnviDat data repository or link to it from an operational information management system or database. In September 2018, EnviDat reached the milestone of more than one hundred published environmental datasets for Switzerland and beyond (Figure 2). As mentioned above, EnviDat is designed to provide unified and managed access to research data.
However, publishing data with appropriate metadata is not always enough to foster open science effectively. There remains the problem of sharing and reproducing the computations needed to process and visualize the data, comparable, for example, to the methodologies presented in research papers.
EnviDat already allows and encourages WSL scientists to complement data publication with additional useful resources in digital form, such as supplementary documentation or processing software. In this context, Jupyter Notebooks have emerged as a solution for documenting code in a wide range of programming languages often used by environmental researchers, such as Python, R, Octave, Scilab, Matlab, C, Java or Scala.

SHOWCASE AND RESULTS
To showcase the integration and usefulness of Jupyter Notebooks in a data portal, we present a geospatial analysis example for road density calculation. In ecology, the presence of roads is often used as an indicator for the degree of human disturbance in a region. The effect of roads on movement patterns and habitat selection for different species and on different scales has been analysed in numerous studies (Fahrig & Rytwinski, 2009). While roads may act as barriers to the movement of certain animal species (habitat fragmentation) and directly increase mortality through traffic, they can also facilitate movement or provide food for other species. In our example, we examine road densities in the neighbourhood of sample locations using Python. The data used (point sample locations and road geometries) can be retrieved from EnviDat, while the Python script is made available as a Jupyter notebook on EnviDat (Figure 3).
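As an illustration of the kind of calculation described above, the following is a minimal sketch and not the notebook published on EnviDat. It assumes that road density means the total road length inside a circular neighbourhood around a sample location, divided by the area of that neighbourhood, and that all coordinates are in a projected system with metre units. The function names `clipped_length` and `road_density` are hypothetical, and the sketch uses only the Python standard library, whereas a real analysis would more likely rely on a geospatial library.

```python
import math


def clipped_length(p0, p1, center, radius):
    """Return the length of the segment p0-p1 that lies inside a circle."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - center[0], p0[1] - center[1]
    # Points on the segment are p0 + t * (dx, dy) with t in [0, 1].
    # Solve |p0 + t*d - center|^2 = radius^2, i.e. a*t^2 + b*t + c = 0.
    a = dx * dx + dy * dy
    if a == 0.0:
        return 0.0  # degenerate segment of zero length
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # the supporting line misses or only touches the circle
    root = math.sqrt(disc)
    t_in = max((-b - root) / (2.0 * a), 0.0)   # entry point, clamped to the segment
    t_out = min((-b + root) / (2.0 * a), 1.0)  # exit point, clamped to the segment
    if t_out <= t_in:
        return 0.0  # the segment lies entirely outside the circle
    return (t_out - t_in) * math.sqrt(a)


def road_density(sample, roads, radius):
    """Road length per unit area within `radius` of `sample`.

    `roads` is a list of polylines, each a list of (x, y) tuples.
    With metre coordinates the result is in m per m^2.
    """
    total = 0.0
    for road in roads:
        for p0, p1 in zip(road, road[1:]):
            total += clipped_length(p0, p1, sample, radius)
    return total / (math.pi * radius ** 2)


# Hypothetical data: one road crossing the sample location, one far away.
roads = [[(-1000.0, 0.0), (1000.0, 0.0)], [(0.0, 500.0), (500.0, 500.0)]]
density = road_density((0.0, 0.0), roads, radius=100.0)
```

In this made-up example only the first road crosses the 100 m neighbourhood and contributes 200 m of length. Multiplying the result by 1000 converts it from m per m² to the km per km² unit often reported in ecological studies.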
Jupyter notebooks use an open, JSON-based document format that mixes code with descriptive narrative text and rich output. Consequently, similar to the above Python notebook example, WSL scientists and EnviDat data providers are now encouraged to share detailed descriptions of their code with the community. In the coming years, EnviDat aims for a deeper integration of user-uploaded Jupyter Notebooks. Small-scale, proof-of-concept work has started on opening Jupyter notebooks hosted in EnviDat in an executable environment based on BinderHub. However, the development of a generic solution that would allow WSL researchers to access their code from anywhere on the existing EnviDat infrastructure is still underway.
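To make the JSON-based notebook format more concrete, the sketch below assembles a minimal notebook with one narrative cell and one code cell using only the Python standard library. It assumes version 4 of the notebook format. The helper name `build_notebook` and the cell contents are hypothetical and do not reproduce the notebook shown in Figure 3.

```python
import json


def build_notebook(narrative, code):
    """Return a minimal notebook (format version 4) as a plain dictionary."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # Narrative text is stored as a Markdown cell.
            {"cell_type": "markdown", "metadata": {}, "source": narrative},
            # Code cells also carry their outputs and an execution counter.
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": code,
            },
        ],
    }


notebook = build_notebook(
    "# Road density around sample locations",
    "print('road density example')",
)
# The serialized text is what would be saved in a file with the .ipynb extension.
serialized = json.dumps(notebook, indent=1)
```

Because the whole document is plain JSON, a data portal can store, index and render a notebook without executing it, while an executable environment can later run the code cells.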
EnviDat is designed to provide unified and managed access to WSL's environmental monitoring and research data. WSL has a long tradition in data collection. WSL research datasets cover themes ranging from forest ecosystems, snow and ice, landscape and biodiversity to natural hazards, and include long-term monitoring datasets spanning more than a century. Such datasets are particularly valuable for studying the terrestrial environment and for obtaining an integrated view of the Earth System.