Health Care

United States

Databricks Accelerates Genomic Drug Discovery and Analysis for Leading Pharmaceutical Company

Biogen, a biopharmaceutical company, addressed petabyte-scale genomic data processing bottlenecks by migrating to Databricks' cloud-based data intelligence platform on AWS.

Value Results Summary

Analyzed 2 million genomic variants in 15 minutes, compared to 700,000 variants in 2 weeks previously

Discovered 2 new drug targets for neurodegenerative diseases including Alzheimer's and Parkinson's

Enabled petabyte-scale analysis supporting research on 500,000 UK Biobank participants

Improved data processing efficiency and research productivity through cloud-based infrastructure and reduced operational constraints

Biogen, a leading biopharmaceutical company, develops therapies for neurological diseases by analyzing human genetic evidence to identify drug targets and understand disease biology. The company's research programs required processing petabytes of genomic data from the UK Biobank's 500,000 volunteer participants. However, Biogen's on-premises data infrastructure faced critical limitations: insufficient storage capacity, inadequate network bandwidth, and poor scalability. In 2018, these constraints led to a one-week outage of its high-performance compute cluster. To address these challenges, Biogen partnered with Databricks and DNAnexus to migrate its data infrastructure to the Amazon Web Services (AWS) cloud and deploy Databricks for Genomics, a specialized runtime designed for large-scale genomic workflows.

The cloud migration and Databricks implementation transformed Biogen's data processing capabilities. By leveraging Delta Lake and open-source technologies, Biogen optimized its analytical pipelines to handle massive datasets far faster than before. A workflow that previously required 2 weeks to process 700,000 variants now annotates 2 million variants in approximately 15 minutes. The platform enabled Biogen to heavily partition genomic data by genetic location, integrate security controls through the Spark Hive Metastore, and maintain data quality and consistency across thousands of columns. Researchers could now focus on science rather than cloud optimization, querying databases for complex genetic analysis without operational delays.
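To put the reported figures in perspective, the short Python sketch below converts them into variants per minute and an approximate speedup. It assumes the "2 weeks" figure means 14 days of continuous wall-clock time, which the case study does not state, and it treats the two workloads as directly comparable, so the result is a rough order of magnitude and not a benchmark.

```python
# Rough throughput comparison using only the figures quoted in the case study.
# Assumption (not stated in the source): "2 weeks" is 14 days of continuous
# wall-clock time, and both workloads are comparable per variant.

MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def variants_per_minute(variants: int, minutes: float) -> float:
    """Average number of variants processed per minute."""
    return variants / minutes


before = variants_per_minute(700_000, 2 * MINUTES_PER_WEEK)  # on-premises workflow
after = variants_per_minute(2_000_000, 15)                   # Databricks workflow
speedup = after / before

print(f"before:  {before:.1f} variants/min")
print(f"after:   {after:,.0f} variants/min")
print(f"speedup: about {speedup:,.0f}x")
```

Under those assumptions the throughput rises from roughly 35 to roughly 133,000 variants per minute, a gain on the order of 3,800x.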

These technical improvements directly accelerated therapeutic discovery. Biogen identified genes containing protein-truncating variants that impact human longevity and neurological status, leading to the discovery of 2 new drug targets for neurodegenerative diseases including Alzheimer's and Parkinson's. The company developed machine learning models to understand how genomic variants affect drug efficacy and functionality across its portfolio. By combining UK Biobank data with optimized cloud infrastructure, Biogen gained a unique advantage in understanding complex disease biology and designing targeted therapies, establishing a scalable foundation for future genomic research and drug development.
