Open Science Data Cloud

The Open Science Data Cloud provides the scientific community with resources for storing, sharing, and analyzing terabyte- and petabyte-scale scientific datasets. The OSDC is a data science ecosystem in which researchers can house and share their own scientific data, access complementary public datasets, build and share customized virtual machines with whatever tools are necessary to analyze their data, and perform the analysis to answer their research questions. It is a one-stop shop for making scientific research faster and easier.

CDIS is collaborating with the Open Commons Consortium, a 501(c)(3) not-for-profit corporation, on this project, which provides cloud-based infrastructure supporting data scientists.

Why is there a need?

With datasets growing larger and larger, researchers are finding that the bottleneck to discovery is no longer a lack of data but an inability to manage, analyze, and share their large datasets. Individual researchers can no longer download and analyze the important datasets in their scientific fields on their own computers. Cross-disciplinary analysis is even more difficult. The goal of the Open Science Data Cloud is to remove the bottleneck to discovery by providing researchers with access to a variety of key datasets across scientific disciplines and the computing infrastructure to allow scientists to easily manage and share their data and analysis.

How is this transformational?

By housing the entire scientific research pipeline, from the raw data and the entire computing environment to the analysis tools and results, the OSDC makes scientific research open, transparent, and reproducible. OSDC is also built on open source technology as a model for the scientific community to extend and grow.

Open science is the foundation of the OSDC. OSDC researchers can essentially “publish” the entirety of their research process through the OSDC alongside publishing their scientific results. Using the OSDC, the next researcher can then build directly upon previous work, which speeds scientific discovery and progress. Currently, researchers read scientific papers in their fields and may or may not be able to access the data analyzed and tools used to reconstruct, validate, and build upon published results. In the future, a researcher will read a scientific paper, sit down at her computer, and log onto the exact virtual machine needed to reproduce the entire scientific research pipeline that produced the results presented in the paper.

What does a layperson need to know?

Just as astronomers rely on telescopes and biologists use microscopes, researchers analyzing big datasets can rely on the OSDC as a “datascope,” an essential tool for making scientific discoveries possible.