CTDS is awarded a major federal contract to host a new COVID-19 medical imaging resource center

“A new center hosted at the University of Chicago — co-led by the largest medical imaging professional organizations in the country — will help tackle the ongoing COVID-19 pandemic by curating a massive database of medical images to help better understand and treat the disease.

Led by Maryellen Giger, PhD, of UChicago, along with leaders from the American College of Radiology (ACR), Radiological Society of North America (RSNA), and American Association of Physicists in Medicine (AAPM), the Medical Imaging and Data Resource Center (MIDRC) will create an open source database with medical images from thousands of COVID-19 patients. Funding is from the National Institute of Biomedical Imaging and Bioengineering at the National Institutes of Health (NIH).

The MIDRC is responding to an unmet need of the medical imaging community as doctors and scientists seek to better understand SARS-CoV-2, the virus that causes coronavirus disease 2019, or COVID-19, and its effects on the human body. By collecting and integrating images and their data via a dynamic, secure networked system, the MIDRC will provide a large-scale, open, common framework to enable technological advancements, guide researchers’ validation and use of AI (artificial intelligence), and translate clinical systems for the best patient management decisions.”

Full article from UChicago Medicine

E-seminar: From Combination Puzzles to the Natural Sciences with Dr. Forest Agostinelli

The Center for Translational Data Science is proud to announce that we will be hosting Dr. Forest Agostinelli on July 9th at 11am CST for a seminar on his work building AI agents that learn to solve puzzles and how this relates to the natural sciences. Dr. Agostinelli is an Assistant Professor at the AI Institute in the Department of Computer Science and Engineering at the University of South Carolina. He received his B.S. from the Ohio State University, his M.S. from the University of Michigan, and his Ph.D. from the University of California, Irvine under Professor Pierre Baldi. His group conducts research in the fields of deep learning, reinforcement learning, search, explainability, bioinformatics, neuroscience, and chemistry. In this talk, Dr. Agostinelli will present DeepCubeA, a deep reinforcement learning and search algorithm that can solve the Rubik’s cube and six other puzzles without domain-specific knowledge. Next, he will discuss how solving combination puzzles opens up new possibilities for solving problems in the natural sciences. Finally, he will show how problems encountered in the natural sciences motivate future research directions in areas such as explainable artificial intelligence and education. A demonstration of his group's work can be seen at http://deepcube.igb.uci.edu/.

Pride Month Presentation

On June 30th, the Center for Translational Data Science will be hosting UChicago’s Assistant Provost and Executive Director of the Center for Identity + Inclusion, Ravi Randhava, to speak at a special Pride Month Lunch and Learn. This presentation will focus on educating CTDS staff on LGBTQ history and campus/community resources. All of CTDS is welcome to attend this special remote session. Zoom information is available on the CTDS group calendar.

E-seminar: Childhood Cancer Data Lab: Researcher Audiences & Lessons Learned (So Far)

The Center for Translational Data Science is proud to announce that we will be hosting Dr. Jaclyn Taroni on June 11th at 11am CST for a seminar on different researcher audiences the Childhood Cancer Data Lab is intended to serve. Dr. Jaclyn Taroni has recently taken over the Director role at the Childhood Cancer Data Lab, a program of Alex's Lemonade Stand Foundation, after serving as the Principal Data Scientist for nearly 3 years. The Data Lab serves multiple pediatric cancer research audiences. In this talk, Dr. Taroni will cover the lab's current understanding of the needs of different audiences and share experiences building software products, creating training workshop content, and using large heterogeneous datasets to inform analyses of rare disease data.

Bronx Center for Science and Mathematics - Careers discussion

On April 23rd the Bronx Center for Science and Mathematics hosted Dr. Kyle Hernandez from the Center for Translational Data Science for a session on careers in bioinformatics. This presentation focused on defining the field of bioinformatics and its real-world application. He also discussed his career path and his work managing a team of bioinformaticians, research programmers, and clinical data specialists here at CTDS.

Summer 2021 Internship Applications are now open!

We are accepting applications for our 2021 Summer Internships! Interns will contribute toward biomedical research through analytical solutions and will develop technical skills across data engineering, data science, bioinformatics, and software engineering. Interns will have opportunities to learn from staff mentors with experience building petabyte-scale research infrastructure.

COV-IRT 1-Year Symposium & Panel Discussion

On March 31, 2021 the COVID-19 International Research Team hosted their virtual annual symposium. Keynote speakers included Dr. Duncan MacCannell from the CDC, Dr. Sharon Peacock from the University of Cambridge, David Jaffray and Andrew Futreal from MD Anderson Cancer Center, and Don Milton from the University of Maryland School of Public Health. Dr. Kyle Hernandez from the Center for Translational Data Science moderated a session with speakers from NASA Ames Research Center, Signature Science, Real Networks, and Baylor College of Medicine on modeling, masks, and microbiomes.

University of Georgia Institute of Bioinformatics Invited Seminar

On February 5th the University of Georgia Institute of Bioinformatics hosted Dr. Kyle Hernandez from the Center for Translational Data Science for a seminar on large-scale bioinformatics in the Genomic Data Commons (GDC). His seminar defined the term “data commons” and discussed its application, focusing on the genomic data harmonization systems in the GDC. He also discussed the GDC Pipeline Automation System (GPAS), including its core database, data import, and workflow.

University of Arkansas - Data Science Career Panel

The University of Arkansas hosted Dr. Kyle Hernandez, Gina Kuffel, and Zhenyu Zhang from the Center for Translational Data Science on January 25th for a panel on the application of data science in a non-tenure track setting. Data scientists collect, process, and analyze data to examine and predict meaningful trends in a variety of fields. Creating insights from structured and unstructured information is a critical form of statistical application. Data scientists are often found in positions related to medicine, marketing, finance, and business.

Webinar: GDC Bioinformatics Pipelines

Date: Monday, September 30, 2019

Time: 2:00 PM - 3:00 PM (EDT)

Location: Web Conference (See WebEx information below)

Speakers:

Zhenyu Zhang, Ph.D., GDC Bioinformatics Manager, University of Chicago

Colin Reid, GDC User Services, University of Chicago

The GDC bioinformatics pipelines support the alignment of DNA and RNA sequence data against a common reference genome build, and the generation of derived data. GDC pipelines are implemented using data processing software and algorithms selected in consultation with the expert genomics community. This webinar will provide an overview of GDC bioinformatics pipelines and demonstrate how generated data is made available through GDC analysis tools.


How Data Commons Can Support Open Science

April 23, 2019

By Robert L. Grossman

In the discussion about open science, we refer to the need for having data commons. What are data commons and why might a community develop one? I offer a brief introduction and describe how data commons can support open science.

Data commons are used by projects and communities to create open resources to accelerate the rate of discovery and increase the impact of the data they host. Notice what data commons aren’t: they are not a place for an individual researcher working on an isolated project to dump data, ignoring FAIR principles, merely to satisfy data management and data sharing requirements.

More formally, data commons are software platforms that co-locate: 1) data, 2) cloud-based computing infrastructure, and 3) commonly used software applications, tools and services to create a resource for managing, analyzing and sharing data with a community.

The key ways that data commons support open science include:

  1. Data commons make data available so that they are open and can be easily accessed and analyzed.

  2. Unlike a data lake, data commons curate data using one or more common data models and harmonize them by processing them with a common set of pipelines so that different datasets can be more easily integrated and analyzed together. In this sense, data commons reduce the cost and effort required for the meaningful analysis of research data.

  3. Data commons save time for researchers by integrating and supporting commonly used software tools, applications and services. Data commons use different strategies for this: the commons themselves can include workspaces that support data analysis; other cloud-based resources, such as the NCI Cloud Resources that support the GDC, can be used for the analysis; or the analysis can be done via third-party applications, such as Jupyter notebooks, that access data through APIs exposed by the data commons.

  4. Data commons also save money and resources for a research community, since each research group in the community doesn’t have to create its own computing environment and host the same data. Since operating data commons can be expensive, a model that is becoming popular is to not charge for accessing data in a commons, and either to provide cloud-based credits or allotments to those interested in analyzing data in the commons or to pass the charges for data analysis on to the users.
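To make the second point concrete, here is a minimal sketch of what curating data against a common data model can look like. Everything in it is hypothetical and for illustration only: the `CommonRecord` fields, the two source layouts ("site A" and "site B"), and the mapping functions are assumptions, not the data model or pipelines of any actual data commons.

```python
from dataclasses import dataclass


# Hypothetical common data model: every record ends up with the
# same field names, types, and units, whatever its source.
@dataclass(frozen=True)
class CommonRecord:
    case_id: str
    age_years: float
    primary_site: str


def from_site_a(row: dict) -> CommonRecord:
    # Hypothetical source A stores age in days and pads/upper-cases site names.
    return CommonRecord(
        case_id=f"A-{row['patient']}",
        age_years=round(row["age_days"] / 365.25, 1),
        primary_site=row["site"].strip().lower(),
    )


def from_site_b(row: dict) -> CommonRecord:
    # Hypothetical source B stores age in years (possibly as text) under other names.
    return CommonRecord(
        case_id=f"B-{row['id']}",
        age_years=float(row["age"]),
        primary_site=row["organ"].strip().lower(),
    )


def harmonize(site_a_rows, site_b_rows):
    # Map each source through its own adapter into the common model.
    return [from_site_a(r) for r in site_a_rows] + [from_site_b(r) for r in site_b_rows]


def count_by_site(records):
    # Once harmonized, records from different sources can be analyzed together.
    counts = {}
    for rec in records:
        counts[rec.primary_site] = counts.get(rec.primary_site, 0) + 1
    return counts
```

After harmonization, a single function such as `count_by_site` works across both sources without knowing where a record came from, which is the integration benefit described above.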

A good example of how data commons can support open science is the Genomic Data Commons (GDC) that was launched in 2016 by the National Cancer Institute (NCI). The GDC has over 2.7 PB of harmonized genomic and associated clinical data and is used by over 100,000 researchers each year. In an average month, 1–2 PB or more of data are downloaded or accessed from it.

The GDC supports an open data ecosystem that includes large-scale cloud-based workspaces, as well as Jupyter notebooks, RStudio notebooks, and more specialized applications that access GDC data via the GDC API. The GDC saves the research community time and effort, since research groups have access to harmonized data that have been curated with respect to a common data model and processed with a common set of bioinformatics pipelines. By using a centralized cloud-based infrastructure, the GDC also reduces the total cost for cancer researchers to work with large genomic datasets, since each research group does not need to set up and operate its own large-scale computing infrastructure.
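As an illustration of how a notebook or third-party application might reach GDC data through the API, the sketch below lists a few projects. It assumes the public base URL `https://api.gdc.cancer.gov`, a `projects` endpoint that accepts `size` and `fields` query parameters, and a JSON response whose results sit under `data.hits`; these details should be checked against the current GDC API documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public base URL of the GDC API.
GDC_API = "https://api.gdc.cancer.gov"


def projects_url(size: int = 5, fields=("project_id", "name", "primary_site")) -> str:
    # Build the request URL; `size` and `fields` are assumed query parameters.
    query = urlencode({"size": size, "fields": ",".join(fields)})
    return f"{GDC_API}/projects?{query}"


def extract_hits(payload: dict) -> list:
    # Assumed response shape: {"data": {"hits": [...]}}; return [] if absent.
    return payload.get("data", {}).get("hits", [])


def fetch_projects(size: int = 5) -> list:
    # Performs a network request; call only when online.
    with urlopen(projects_url(size), timeout=30) as resp:
        return extract_hits(json.load(resp))
```

Calling `fetch_projects(3)` from a notebook would, under these assumptions, return a short list of project records. Keeping URL construction and response parsing separate from the network call lets both be checked offline.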

Based upon this success, a number of other communities are building their own data commons or considering it.

For more information about data commons and data ecosystems that can be built around them, see:

  • Robert L. Grossman, Data Lakes, Clouds and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data, Trends in Genetics 35 (2019) pp. 223–234, doi.org/10.1016/j.tig.2018.12.006. Also see: arXiv:1809.01699

  • Robert L. Grossman, Progress Towards Cancer Data Ecosystems, The Cancer Journal: The Journal of Principles and Practice of Oncology, May/June 2018, Volume 24 Number 3, pages 122–126. doi: 10.1097/PPO.0000000000000318

About: Robert L. Grossman is the Frederick H. Rawson Distinguished Service Professor in Medicine and Computer Science and the Jim and Karen Frank Director of the Center for Translational Data Science (CTDS) at the University of Chicago. He is also the Director of the not-for-profit Open Commons Consortium (OCC), which manages and operates cloud computing and data commons infrastructure to support scientific, medical, healthcare and environmental research.

Originally published at http://sagebionetworks.org.