Developing Living and Breathing Documentation for the Research Community
By Daniel Timmermann
Continuing in its mission to support the University of Maine System research community and collaborators, UMaine’s Advanced Research Computing, Data Security, and Information Management (ARCSIM) unit works to create comprehensive and accessible training materials for anyone seeking to work with high performance computing and data analysis resources.
Recently, Laura Jackson, an integrative data scientist with ARCSIM, developed training materials and housed them on GitHub for access by UMS researchers and collaborators. Using GitHub as the repository for instructional materials allows ARCSIM to provide a centralized resource that is, in essence, a living and breathing document. When a researcher needs help with their workflow, such as installing a conda environment at the Texas Advanced Computing Center (TACC), supporting documentation can be created and added. Once created, the same materials are accessible to anyone who needs help with the same problem in the future.
Because the GitHub repository is public, the information is available to anyone who can find it. This allows ARCSIM to serve the specific needs of the UMS research community while also providing an open resource to researchers around the globe.
New users can find accessing a national supercomputer center overwhelming and intimidating. Someone may look at the center’s existing documentation and not know exactly where to start or what is relevant to their specific project. The resources created by ARCSIM help them understand what they need to know. These resources enhance the existing documentation and, while applicable to all, are tailored to the specific needs of the research community.
The project began when Jackson was setting up new users at TACC. “It started from a Word document which is hard to share with people because if they have to go back and reference it, maybe it is not the most up to date version,” explained Jackson. Computational tools, software, and processes tend to change quickly, and users often need content that is not in the version of the documentation they have. It was essential that any documentation created by ARCSIM be capable of accommodating this rate of change.
The training content serves as an initial point of support, covering topics like getting set up, specific analysis software, and file and folder permissions that keep someone from accidentally making sensitive or unpublished data public. As development progresses, ARCSIM seeks to create an increasingly comprehensive resource that remains accessible. This documentation also includes information on upcoming training opportunities applicable to UMS researchers.
ARCSIM is currently focusing on TACC and the Ohio Supercomputer Center (OSC), as well as CyVerse, an NSF-funded Open Science Workspace for data analysis, but other systems may be added as the needs of the user community change. This flexible scope lets the documentation keep pace with changes in software and processes, which will only become more critical as more new users incorporate high performance computing resources into their research workflows. Ultimately, the documentation is built for members of the UMS research community, so that they have supporting content they can trust to be up to date and geared toward their needs.