ARDM June 6-7 in Chicago

Registration is now open for the inaugural ARDM workshop hosted by the Center for Translational Data Science. Accelerating Research Using Data Meshes and Data Fabrics (ARDM) will focus on the interoperability and integration of data platforms into data meshes, data fabrics, and other types of data ecosystems. 

In recent years, data platforms, including data commons, data repositories, and databases, have seen tremendous growth. These platforms are tailored for biomedical data, environmental data, social determinants of health data, and other data relevant to improving health outcomes. This workshop will be an opportunity to reduce silos while maximizing data use and the potential for innovative discoveries.

This workshop will cover:

  • Developing a data mesh: five pillars

  • Technical requirements and standards for adding a data commons or data repository to a data ecosystem

  • Policy and governance for data ecosystems

  • Standard agreements for data ecosystems

  • Use cases and success stories

If you are interested in attending, please visit our events page for more information and to register.

Biomedical Research Hub selected as key partner for international genomic data standards initiative

The Global Alliance for Genomics and Health (GA4GH) has named the Biomedical Research Hub one of its ten newest Driver Projects, all of which are genomic data initiatives with clinical connections. The collaborations will allow genomic data standards to make new inroads into medicine and biomedical research, including applying machine learning to data from diverse regions around the globe.

Read the full press release here: https://biologicalsciences.uchicago.edu/news/ga4gh-driver-projects

SC23: Exploring the Data Frontier

This month, Dr. Robert Grossman was interviewed by Super Compute’s Communications team and shared insight into his impact on high-performance computing.

Dr. Grossman also discusses his current work in health and wellness, including the Genomic Data Commons and Gen3, as well as future challenges of HPC.

“The HPC challenge is to build the data platforms that can manage, explore, analyze, and share biomedical data at the scale needed and with the governance, security, and compliance required so we can tease out interesting small effects.”

The Center for Translational Data Science will be an exhibitor at SC23 in Denver from November 12-17. Super Compute is the International Conference for High Performance Computing, Networking, Storage, and Analysis. Stop by booth 525 to say hello and learn about our latest research projects!

To read the full interview with Dr. Grossman click here.

CTDS Gives Demo to UChicago Undergrad Students

CTDS Welcomes UChicago Students for Demo

On Wednesday, April 12th, several staff members of CTDS (Aarti Venkat, Ph.D., Kyle Hernandez, Ph.D., Sara Volk de Garcia, Ph.D., Fay Booker, Ph.D., and Hillary Carroll) gave a presentation to UChicago undergraduates exploring opportunities in computational biology, where they can harness the power of computer science and analytics to answer key questions in the life sciences.

The demo was designed to inform students of possible career paths and covered topics including what Translational Data Science is, CTDS projects and roles, as well as challenges in the field.

Gen3 Community Event - How to Set Up a Gen3 Data Commons Using Helm Charts

We will take you through the current best practices for setting up and configuring your own Gen3 Data Commons in multiple clouds by using Helm Charts. Helm is a tool that streamlines installing and managing applications on Kubernetes, a system for automating deployment, scaling, and management of containerized applications. The use of Helm will greatly simplify standing up, configuring, and maintaining your own Gen3 Data Commons. This is the first of a series of community events through 2023.

Gen3 Community Forum 2022

The Gen3 platform consists of open-source software services that support the emergence of healthy data ecosystems by enabling the interoperation and creation of cloud-based data resources, including data commons and analysis workspaces. Gen3 aims to accelerate and democratize the process of scientific discovery by making it easy to manage, analyze, harmonize, and share large and complex datasets in the cloud.

With Gen3 use spreading globally, there is a demand to coalesce shared knowledge and activities into a community. A new Gen3 Community will meet for the first time at a virtual forum co-hosted by the Center for Translational Data Science at the University of Chicago and the Australian BioCommons. The forum will run for three days, three hours each day: 4 pm to 7 pm CDT on October 10 to 12, which is 8 am to 11 am AEDT on October 11 to 13. It will include presentations from various Gen3 operators and developers, as well as breakout sessions to craft ideas for new features. The inaugural Gen3 Community Forum will:

  • Share knowledge about Gen3, its architecture, and the Gen3 roadmaps and priorities.

  • Strengthen the connection between the core team and those developing, operating and using Gen3 platforms.

  • Design a set of ongoing community engagement activities.

  • Discuss and agree on key shared development priorities between the Gen3 core team and the community.

Further details of the program and free registration are now available.

University of Kentucky's Commonwealth Computational Summit 2022

On April 13, 2022, Dr. Robert Grossman, Director of the Center for Translational Data Science, gave an academic keynote talk at the Commonwealth Computational Summit. This was the 5th annual summit hosted by the University of Kentucky’s Center for Computational Science.

Talk Title: The Data Gap in Machine Learning and AI: Why Much of Machine Learning and AI is Still Data Limited, and Some of the Options Available.

Abstract: Although large amounts of online text, images and audio have provided enough data that deep learning models can be developed that significantly improve language translation, image recognition, speech recognition and related applications, developing and deploying machine learning and AI models that provide value and limit bias is still quite difficult in many application areas due to the lack of suitable data. This is especially the case in biology, medicine and health care. We discuss some of the reasons that many important AI problems are still data-limited and some of the approaches that have been taken to address this challenge. We use case studies from machine learning models in COVID-19 and cancer to illustrate some of the challenges and some of the options available.

CTDS Google Summer of Code Applications Now Live

We are excited to announce that the applications for the 2022 Google Summer of Code program are now open! The Center for Translational Data Science is among 203 open-source organizations accepted as Mentor Organizations to the program. Google Summer of Code is a global program focused on bringing new contributors into open-source software development. During this 12+ week program, accepted GSoC contributors spend a few weeks becoming familiar with the community norms and codebase while determining expected milestones with their mentor for the summer, then spend 12+ weeks coding on their projects. Contributors may register and submit project proposals on the GSoC site from now until Tuesday, April 19th at 18:00 UTC. Don’t miss out on this exciting opportunity!

CTDS Accepted to Google Summer of Code

We are excited to announce that the Center for Translational Data Science has been accepted as a Mentor Organization for the 2022 Google Summer of Code program. Google Summer of Code is a global program focused on bringing new contributors into open-source software development. This year a total of 203 open-source organizations were accepted to the program and will be mentoring GSoC Contributors. Applications for GSoC Contributors open on April 4th. Don’t miss this exciting opportunity to contribute to one of our projects.

President Biden announces Reignition of the Cancer Moonshot

As Vice President, in 2016, Joe Biden launched the Cancer Moonshot with the mission to accelerate the rate of progress against cancer. The Center for Translational Data Science (CTDS) has maintained involvement in two projects that received support as part of the Cancer Moonshot and have strategic importance for CTDS: the Genomic Data Commons and the Blood Profiling Atlas in Cancer (BloodPAC). Today, President Biden is reigniting the Cancer Moonshot with renewed White House leadership of this effort. President Biden announced new goals for the Cancer Moonshot: to reduce the death rate from cancer by at least 50 percent over the next 25 years and improve the experience of people and their families living with and surviving cancer—and, by doing these and more, to end cancer as we know it today.

View article on whitehouse.gov

Summer 2022 Internship Applications Opening Soon

Internship applications for our summer 2022 program will be opening soon. Interns will contribute toward biomedical research through analytical solutions and will develop technical skills across data engineering, data science, bioinformatics, and software engineering. Interns will have opportunities to learn from staff mentors with experience building petabyte-scale research infrastructure. Please check back soon for more details!

E-seminar: Developing high performance secure multi-party protocols for healthcare data analytics

On December 10, 2021, Dr. Xiao Dong gave a seminar on developing high performance secure multi-party protocols for healthcare data analytics. Xiao Dong recently joined the Center for Translational Data Science (CTDS) as a data scientist. Prior to joining CTDS, he was a senior research specialist at the Center for Clinical and Translational Science at the University of Illinois at Chicago. Xiao Dong received his PhD in Informatics from Indiana University at Bloomington in 2010. The seminar covered Xiao’s main research at UIC between 2019 and 2021.

Performing privacy-preserving data analytics across different healthcare institutions is both important and challenging. In this talk, we presented how state-of-the-art cryptography techniques, such as secure multi-party computation and homomorphic encryption, can accomplish cross-institution data analytics using highly efficient and provably secure protocols. Use cases for these protocols include clinical trial cohort studies, comorbidity index calculation, and high utilizer identification when patients’ records are spread across multiple institutions. This line of research has been presented several times at the American Medical Informatics Association’s Annual Symposium (2019) and Informatics Summit (2020, 2021).

CTDS at SuperComputing Conference 2021

Since 1994, the Center for Translational Data Science has hosted a booth at the annual SuperComputing Conference, though we have been present at the conference since 1992. The SC Conference Series is an annual international conference for high performance computing, networking, storage, and analysis. This year, the conference was both virtual and in-person in St. Louis, Missouri. Watch below as our Director, Bob Grossman, describes building data commons and data ecosystems using open source Gen3 software at SC21. This application helps make data readily available to the scientific community to improve patient outcomes and accelerate scientific discovery.

E-seminar: A Pragmatic Approach to Modeling Human Disease with Clinical Data

The Center for Translational Data Science will be hosting Dr. Theresa L. Walunas (she/her/hers) on Friday, September 17th at 12 pm to give a talk titled “A Pragmatic Approach to Modeling Human Disease with Clinical Data.” Dr. Walunas is an Assistant Professor in the Division of General Internal Medicine and Geriatrics in the Department of Medicine at Northwestern University Feinberg School of Medicine.

Human diseases can be challenging to represent in animal models. But what is the alternative? "Real World" clinical data from electronic health records presents an opportunity to develop representations of complex human conditions, provided that the research community understands the strengths and challenges inherent to the data. Developing a pragmatic approach to using electronic health record data is the first step to unlocking its research potential. Data science and informatics professionals who understand this data and how to connect it to primary mechanistic and social science data will be essential to the future of research in this space.