GFÖ Pre-Conference Data Management Workshop (GFBio)
Introduction: Ecology is an inherently cross-disciplinary science that has become highly collaborative and data-intensive over the last decade. New data is created across all ecologically relevant disciplines at an increasing rate, driven by the development of new techniques, e.g. remote sensing with extensive sensor and satellite networks, or high-throughput techniques for screening and gene sequencing. This opens many new ways for researchers to gain insights into the functioning of our ecosystems at a finer temporal granularity than ever before. While much of this data is very uniform in structure, ecology also produces highly heterogeneous data. Such data is gathered manually by small projects whose focus and design are tailored to answering very specific ecological research questions. Both types of data are valuable in the long term and thus need to be preserved for future research. However, they have different requirements in terms of maintenance and preservation. Many data-related tools and services have therefore emerged over the last years to assist researchers throughout the whole life cycle of data, from acquisition through refinement and description to the publication of the data itself.
Figure 1: The data life cycle is a conceptual tool which helps to understand
the different steps that data follow from data collection up to the publication
and the creation of knowledge. Source: http://www.gfbio.org/
Furthermore, the DFG recently funded the project "The German Federation for Biological Data" (GFBio). The project plans to enter its second phase and works on tying together and complementing the existing infrastructure, tools and services of national key players (e.g. Max Planck Institutes, Museum of Natural Sciences) to provide access to environment-related data and services under a consistent framework. The overall goal is to provide a sustainable, service-oriented national data infrastructure that facilitates the management and sharing of data and stimulates data-intensive science answering environmentally relevant questions over large temporal and spatial scales.
Goals: The workshop will introduce participants to the life cycle of data and give an overview of available tools and services that assist in each step along the cycle. The workshop includes practical and theoretical parts. We will get in touch with tools and services that have been adopted by the GFBio project as well as with existing international tools and services of interest for data management. The list below is exemplary and might change slightly as the workshop is finalized:
- Diversity Workbench Mobile (Data gathering)
- RightField, OpenRefine (Data gathering, Refinement)
- EML, DWC, Morpho, EML for R, DataUp (Metadata)
- GFBio (Data preservation)
- GFBio, KNB (Data discovery)
- ROpenScience (Data access and Analysis)
- GFBio (Data Integration)
- Kepler, Pegasus (Workflows)
- GFBio (Data publishing)
Organisation: The workshop takes place on the weekend before the GFOE 2015, from the 29th to the 30th of August. Each course day starts at 9am and ends at 4pm. (An exact time schedule with topics will be sent to the participants after registration, once we know the room in which the course takes place.) The workshop is limited to 12 participants and costs 30 Euros per person.
Programming skills are not needed for the course, but some skills in R will be helpful for understanding the R tools we present. What you should bring is some interest in the topic and, if you like, your own laptop. However, we will work in a computer pool, so Windows workstations will be available there for you!
The workshop is given by the two GFBio members:
- Claas-Thido Pfaff (University of Leipzig)
- Juliane Steckel (University of Göttingen)