Maintaining Data on Maine-eDNA
By Caty DuDevoir
Data is the core of any scientific project. Proper data management practices and documentation are imperative to allow for the standardization and comparison of results. This is especially important on large projects, like the NSF EPSCoR Track-1 Maine-eDNA grant, which rely on the collaboration of dozens of researchers. Maine-eDNA recently completed its data management system, an important step not only for the grant but also for the environmental DNA (eDNA) field as a whole, because it provides a tool that other eDNA researchers across the globe can learn from, adopt, or iterate upon.
Melissa Kimble, a Maine-eDNA graduate student, led the development of Maine-eDNA’s data management system. She is advised by Kate Beard-Tisdale, Maine-eDNA Co-PI and University of Maine (UMaine) Professor of Computing & Information Science. Kimble explained, “How you arrange data is important. A database standardizes data structures and makes them linkable from one point to another. For example, I can relate data that was collected in the field to bioinformatic outputs.” These are the benefits of having a cohesive, structured format in a database. She added, “eDNA is still in the experimental stage of things, so established protocols have been in flux. Within the last few years, we are coming to a nice consensus point where publications are targeting how to make data management more streamlined and what standard protocols to use.” Standards establish what data should be collected and stored. Where available, Kimble and collaborators incorporated standards set by the Genomic Standards Consortium to describe field collection, wet lab processing, and bioinformatics across the Maine-eDNA grant. Where there were no established standards, input from Maine-eDNA collaborators became critical.
When Maine-eDNA started, it was understood that the research team would have to build its own data management system. “As something becomes more common and widespread, you have massive groups, like Maine-eDNA, that need solutions that connect everyone all the time,” Kimble said. Kimble and her team started with discussions involving people across the entire grant to understand how best to document the data. Then, they created working groups, and from there, Kimble developed the code.
When developing the database, the team acknowledged the need for data integrity and efficiency. Data integrity ensures that “the state of the data stays the same from one transaction to the next.” Efficiency refers to how quickly users can get the data from the database. Balancing these factors is important for database development.
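The two ideas Kimble describes, linking field records to bioinformatic outputs and keeping data in a consistent state from one transaction to the next, can be illustrated with a minimal sketch. This is not the medna-metadata schema or code: the table names, column names, and helper functions below are hypothetical, and Python's built-in sqlite3 module is used only because it needs no setup.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE field_sample ("
        " sample_id TEXT PRIMARY KEY,"
        " site TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE bioinfo_output ("
        " output_id INTEGER PRIMARY KEY,"
        " sample_id TEXT NOT NULL REFERENCES field_sample(sample_id),"
        " taxon TEXT NOT NULL)"
    )
    return conn


def record_results(conn, rows):
    """Insert every (sample_id, taxon) row, or none of them if any row is invalid."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.executemany(
                "INSERT INTO bioinfo_output (sample_id, taxon) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


def linked_rows(conn):
    """Relate each bioinformatic output back to the site where its sample was collected."""
    return conn.execute(
        "SELECT f.site, b.taxon FROM field_sample AS f"
        " JOIN bioinfo_output AS b ON b.sample_id = f.sample_id"
        " ORDER BY b.output_id"
    ).fetchall()


# Placeholder values, not real Maine-eDNA data.
conn = build_demo_db()
with conn:
    conn.execute("INSERT INTO field_sample VALUES ('S1', 'Site A')")
record_results(conn, [("S1", "Taxon A")])
# This batch references a sample that was never collected, so the whole batch is rejected.
record_results(conn, [("S1", "Taxon B"), ("S-missing", "Taxon C")])
```

In this sketch the foreign key is what makes field data and bioinformatic outputs linkable, and the all-or-nothing insert is one simple way to keep the database in the same state for every user. The extra constraint checks also cost a little time on every write, which is one face of the integrity and efficiency balance described above.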
The resulting medna-metadata software is open access, “meaning anyone anywhere can submit to and gather data from it.” Kimble stated, “As things grow, it becomes more difficult to maintain data and ensure everyone has access, and accessibility becomes an issue. Everyone wants to access the data in the same state.” Having a data management system that is freely accessible through the web also spares researchers from budgeting for one in their own grants. “Establishing the software to store the information makes it more accessible if it is free,” Kimble said.
The team recently published a paper entitled “medna-metadata: an open-source data management system for tracking environmental DNA samples and metadata” in the journal Bioinformatics, a leading publication in the field. Along with the publication of the paper, Maine-eDNA launched a site with documentation and a site demoing the data management system. Kimble explained, “The paper, in short, describes the system for metadata collection in an environmental DNA study. It essentially shows standardized fields and how to format them so that you can have reproducibility of your methods, shareability, and documentation. It describes those systems, how everything is related, and what fields are available.”
Kimble looks forward to seeing how the data management system will continue to be refined and help serve eDNA as a discipline more broadly. With hundreds of researchers across the country utilizing eDNA in their research, having a centralized database allows for more efficiency, clarity, and general organization.