Data management & open data

In honor of Halloween and Allehelgensaften, I decided to regale the good readers with spooky, haunting tales related to data management, documentation, and data storage.

We have been discussing these a lot lately in various conversations, including at the very nice session on Data Management at Researcher Days. Data documentation, data storage, and data protocols always have the potential to become haunting horror stories. Fortunately, these have been largely constructive conversations. So, now it is time to let our imaginations roam to the darker and spookier corners appropriate for this time of year.

Data storage and backup is the area with the most terrifying tales. The most haunting one occurred during my PhD. Along with 40 other PhD Fellows, I was visiting a large research facility near New York. One of the PhD students, who was months away from submitting her dissertation, left her laptop bag and laptop unsecured in her hotel room, where it was stolen. That alone would have been bad enough, but the backup copy (in those days, an external hard drive) was in the same bag as the laptop and was stolen too, making it useless. Terrifying? Absolutely. This set her back several months, but I am happy to report that she did eventually complete her PhD. Computer theft has continued since then. More recently, another PhD student had their bag with laptop stolen on the train. Fortunately, they had their work backed up on an Institute server and lost only 2 weeks of work. Haunting? Definitely.

Data documentation, and the lack thereof, also has strong potential to cause nightmares. To use a recent example, a student on a collaborative project received data on soil properties, soil chemistry and environment, and greenhouse gas fluxes that another group had collected several years earlier. But problems! The datasheet had no units listed. And worse, this old data was collected by a student no longer at the university! Spooky! It took several months to figure out the units and quality control status of these various datasets, an issue that could have been avoided with proper data documentation and the creation of metadata.
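To make the idea of metadata a little less ghostly, here is a minimal sketch in Python of a metadata record kept next to a datasheet, plus a check that flags any column without a unit. Everything in it is an assumption for illustration: the column names, the units, the field names, and the helper find_undocumented_columns are hypothetical and do not come from the project described above.

```python
def find_undocumented_columns(columns, metadata):
    """Return the column names that have no unit recorded in the metadata."""
    documented = metadata.get("columns", {})
    return [name for name in columns if not documented.get(name, {}).get("unit")]


# Hypothetical metadata record; every name and unit here is illustrative only.
metadata = {
    "collected_by": "name and contact of the person who took the measurements",
    "qc_status": "raw",  # for example: raw, checked, or cleaned
    "columns": {
        "soil_ph": {"unit": "pH", "description": "soil pH"},
        "n2o_flux": {"unit": "ug N m-2 h-1", "description": "N2O flux"},
    },
}

# Hypothetical column headers as they might appear in the datasheet.
datasheet_columns = ["soil_ph", "n2o_flux", "soil_temperature"]

# soil_temperature has no entry in the metadata, so it is reported.
print(find_undocumented_columns(datasheet_columns, metadata))
```

Running a check like this before a dataset is handed over means the missing unit is caught while the person who collected the data is still around to ask.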

Spooky and missing data can cause a real headache during analysis. Both can be avoided with clear chains of command, proper documentation of how to make sure that a system is running, and frequent checks of the collected data itself. As we automate measurements such as greenhouse gas fluxes (robots! Fun!), we also end up with more complicated systems that require babysitting and have several places where goblins and gremlins can wreak havoc, especially when people are on vacation and not paying attention. Data management plans can name a main responsible person and a substitute to check that the gremlins stay out of the system, document the instructions for running the systems, and establish a logging system to document failures. Still, the data itself needs to be checked regularly to confirm that everything looks okay, so that any problems can be fixed with minimal downtime.
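As one way to keep an eye on the gremlins, the sketch below shows a simple gap check in Python for an automated measurement system: given the timestamps of the recorded measurements and the interval at which they are supposed to arrive, it lists the stretches where data went missing. The function name find_gaps, the hourly interval, and the example timestamps are all hypothetical; a real system would read the timestamps from its own data files or log.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval):
    """Return (earlier, later) pairs where consecutive measurements are
    further apart than the expected interval."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > expected_interval:
            gaps.append((earlier, later))
    return gaps


# Hypothetical hourly measurements with the 02:00 and 03:00 readings missing.
readings = [
    datetime(2023, 10, 31, 0, 0),
    datetime(2023, 10, 31, 1, 0),
    datetime(2023, 10, 31, 4, 0),
    datetime(2023, 10, 31, 5, 0),
]

for start, end in find_gaps(readings, timedelta(hours=1)):
    print(f"No data between {start} and {end}")
```

A check like this could be run daily by the responsible person or their substitute, so that a stalled system is noticed after hours rather than after a vacation.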

Data storage and proper data documentation are both important elements of data management plans. Developing a data management plan is now an important part of the PhD plan. Beyond PhDs, it is important to think about and implement data management protocols at the project, group, section, and even departmental levels. While this can be viewed as an annoying piece of paperwork to complete, it is also an opportunity to think about how these terrifying situations can be avoided. The creation and implementation of good data handling practices can save us all from our nightmares of losing precious data and the numerous headaches associated with that. The creation of metadata that includes key information can serve as a bridge for the researchers who come after us and ensure the impact of this data well into the future, as new uses for datasets are identified. So, make sure to back up your data immediately after reading this and Happy Halloween! Boo!