King lab utilizes Texas Advanced Computing Center’s resources for data analysis
Benjamin King, an Assistant Professor of Bioinformatics in the Department of Molecular and Biomedical Sciences at the University of Maine, has used UMaine’s Advanced Research Computing (ARC) to access resources provided by the Texas Advanced Computing Center (TACC).
Kayla Barton, one of King’s graduate students, was the first in King’s lab to use Stampede2 last year and get analysis pipelines up and running. “When we were starting out, we were helped by one of ARC’s team members, Kevin Wentworth,” King says. “ARC didn’t just give us account information. ARC has staff that researchers can talk with to get the training needed to get things up and running. Having expertise within ARC is equally important to having access to these resources.”
King’s lab has been using Stampede2, which is the flagship supercomputer of the Extreme Science and Engineering Discovery Environment (XSEDE), a single virtual system that scientists can use to interactively share computing resources, data, and expertise.
“My graduate student, Steven Allers, has been re-analyzing published data sets in order to build models for communities of bacterial species across space and time,” King says. This research is an important part of the Maine-eDNA program, an NSF EPSCoR Track-1 grant.
According to King, the almost two-year-old program is rapidly collecting large sets of samples that will be used to create an invaluable resource for studies of complex biological communities. His lab has been re-analyzing data that other researchers have collected from aquatic samples similar to those Maine-eDNA aims to capture. By analyzing these metagenomics data sets, King and his team will develop models that can serve as an efficient training set and a point of comparison for the program.
“Stampede2 has really worked well for us because of TACC’s support of Docker containers. Without containers, it’s difficult to install and use the analysis software on a Linux cluster,” King explains. “Installing analysis software, like QIIME2, is not like installing Word on your laptop with one installation file. Instead, it’s like a house of cards with all of these dependencies that you need to be aware of, not to mention the architecture of the Linux cluster.”
King’s lab also uses Stampede2 to study patterns of gene expression by analyzing high-throughput RNA sequence data sets. His lab’s ongoing studies seek to understand how non-coding RNAs, including microRNAs and long non-coding RNAs, regulate the function of the innate immune system. A major focus is the role neutrophils play in the hyperinflammatory response to influenza A virus infection, studied using a zebrafish model developed at the University of Maine.
“Current versions of many of the commonly used high-throughput sequencing analysis tools are already installed on Stampede2,” King says. “If a tool is not already installed, ARC and TACC are there to help.”
“Run times have been very short. The TACC help desk has been very responsive and helpful, and the extensive Stampede2 user guide is a great resource,” King says. “Overall, our experience with ARC and TACC has been great.”