Cloud computing project improves access to bioinformatics research tools, supports diversity and capacity building

Ben King is building a cloud-based genomics training module for biomedical researchers.

The National Institutes of Health (NIH) asked King, an assistant professor of bioinformatics at the University of Maine, to lead the development of a training module on cloud computing for genomics.

King codirects the Bioinformatics Core for the Maine IDeA Network of Biomedical Research Excellence (INBRE), which is a collaborative network of 14 educational and research institutions in Maine that is led by Mount Desert Island Biological Laboratory (MDIBL). Since 2001, the NIH-funded Maine INBRE program has been strengthening Maine’s capacity to conduct competitive biomedical research with a focus on comparative functional genomics.

The goal of the project is to develop training materials that enable biomedical researchers to use high-performance computing systems on the Google Cloud platform for bioinformatics research. Bioinformatics applies computational methods to extract knowledge from biological data. King developed the project proposal with his MDIBL colleagues Joel Graber and Jim Coffman, and the National Institute of General Medical Sciences (NIGMS) funded it in August.

Using cloud computing technologies, researchers can more easily design analysis environments without maintaining their own computing infrastructure. This substantially expands the capacity of smaller laboratories to participate in research.

When King hosts researchers in Maine for courses or week-long workshops, they use on-premises analysis servers preconfigured with all the software and data needed to learn how to apply bioinformatics tools in their research. But once the course is over, those researchers do not always have access to the same analysis environment in their own labs.

Cloud computing solves this problem.

“They can build that same environment at any point later on, and then continue on from what they were doing at a workshop,” King says.

Beyond letting researchers continue their work in their own laboratories after they leave, the project facilitates collaboration on a new level: researchers at different institutions can create identical analysis environments.

“If you have some collaborative work with colleagues in a different state, you can share your code and all of a sudden they can build the exact same environment that you have been using,” he says.

King’s project was recently featured in the opening talk at a two-day virtual NIH workshop. Susan Gregurick, associate director for data science and director of the Office of Data Science Strategy at the NIH, introduced King’s project as a program that supports diversity and capacity building in data science. Later in the workshop, King demonstrated a prototype of the training module to more than 130 workshop participants.

King’s pilot project used a team of consultants from Google and computing resources provided by the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) program and NIGMS. It will serve as a model for states in which success rates for NIH grants have historically been low, and it could potentially be extended to all NIH extramural researchers.

At smaller research institutions, it can be hard to build what King calls analysis environments: powerful computer servers where all analysis software is installed and datasets are available. Some of these datasets are so vast that it is hard to download and replicate them on on-premises high-performance computing equipment.

“The goals are to build a community of practice around using Cloud computing for bioinformatics research and research training. There are different groups that have started, some of them are based in universities, others through different institutes at the NIH,” King says. “And part of that is a demonstration project that builds a training module to analyze data from what we call RNA sequencing. It’s something where we can walk users through the analysis workflow, but as they’re doing that, they’re learning about Cloud computing in that process.”

RNA sequencing allows researchers to understand the biology of a cell by measuring differences in the expression of all genes under different conditions. For example, King’s research lab uses RNA sequencing to understand how the innate immune system responds to influenza A virus infection, with the aim of developing ways to reduce the tissue damage that can occur with severe infection.

Demand for collaborative research and shared analysis environments is increasing, and cloud computing provides the technological capability needed to make large-scale collaborative research happen.

King has been working with the University of Maine’s Advanced Research Computing and Security Information Management group to enroll the University of Maine System in the NIH STRIDES program, through which NIH-funded researchers can obtain discounted rates on the Google Cloud, Amazon Web Services, and Microsoft Azure platforms.

King hopes that his project will improve access to research tools and infrastructure, and ultimately increase collaboration among researchers.