Databricks Accelerates Genomic Drug Discovery and Analysis for Leading Pharmaceutical Company
Biogen, a biopharmaceutical company, addressed petabyte-scale genomic data processing bottlenecks by migrating to Databricks' cloud-based data intelligence platform on AWS.
Value Results Summary
Biogen, a leading biopharmaceutical company, develops therapies for neurological diseases by analyzing human genetic evidence to identify drug targets and understand disease biology. Its research programs required processing petabytes of genomic data from the UK Biobank's 500,000 volunteer participants. However, Biogen's on-premises data infrastructure had critical limitations: insufficient storage capacity, inadequate network bandwidth, and poor scalability. In 2018, these constraints led to a one-week outage of its high-performance compute cluster. To address these challenges, Biogen partnered with Databricks and DNAnexus to migrate its data infrastructure to the Amazon Web Services (AWS) cloud and deploy Databricks for Genomics, a specialized runtime designed for large-scale genomic workflows.
The cloud migration and Databricks implementation transformed Biogen's data processing capabilities. Using Delta Lake and open-source technologies, Biogen optimized its analytical pipelines to process massive datasets far faster: a workflow that previously required 2 weeks to process 700,000 variants now annotates 2 million variants in approximately 15 minutes. The platform enabled Biogen to partition genomic data heavily by genetic location, integrate security controls through the Spark Hive Metastore, and maintain data quality and consistency across thousands of columns. Researchers could then focus on science rather than cloud optimization, querying databases for complex genetic analysis without operational delays.
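The idea of partitioning variants by genetic location can be illustrated with a minimal sketch. This is plain Python written for illustration only; it is not Biogen's pipeline and does not use the Delta Lake or Databricks APIs. The bin width, the record fields (`chrom`, `pos`), and the function names are all hypothetical. The point is that when each variant is filed under a (chromosome, position bin) key, a region query only has to read the partitions that overlap the region.

```python
from collections import defaultdict

# Hypothetical bin width in base pairs; a real pipeline would tune this.
BIN_SIZE = 1_000_000


def partition_key(chrom, pos, bin_size=BIN_SIZE):
    """Return the (chromosome, bin index) key for a 1-based position."""
    return (chrom, (pos - 1) // bin_size)


def partition_variants(variants, bin_size=BIN_SIZE):
    """Group variant records (dicts with 'chrom' and 'pos') by partition key."""
    parts = defaultdict(list)
    for variant in variants:
        key = partition_key(variant["chrom"], variant["pos"], bin_size)
        parts[key].append(variant)
    return dict(parts)


def query_region(parts, chrom, start, end, bin_size=BIN_SIZE):
    """Return variants on `chrom` with start <= pos <= end.

    Only the partitions whose bins overlap the region are scanned.
    """
    first_bin = (start - 1) // bin_size
    last_bin = (end - 1) // bin_size
    hits = []
    for bin_index in range(first_bin, last_bin + 1):
        for variant in parts.get((chrom, bin_index), []):
            if start <= variant["pos"] <= end:
                hits.append(variant)
    return hits


# Illustrative records, not real data.
variants = [
    {"chrom": "1", "pos": 500},
    {"chrom": "1", "pos": 1_000_000},
    {"chrom": "1", "pos": 1_000_001},
    {"chrom": "2", "pos": 42},
]
parts = partition_variants(variants)
region_hits = query_region(parts, "1", 900_000, 1_500_000)
```

In a table format that supports partition columns, the same key would typically be stored as columns so that the query engine can skip files outside the requested region; the sketch above only mimics that pruning in memory.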
These technical improvements directly accelerated therapeutic discovery. Biogen identified genes containing protein-truncating variants that affect human longevity and neurological status, leading to the discovery of two new drug targets for neurodegenerative diseases, including Alzheimer's and Parkinson's. The company developed machine learning models to understand how genomic variants affect drug efficacy and functionality across its portfolio. By combining UK Biobank data with optimized cloud infrastructure, Biogen gained a unique advantage in understanding complex disease biology and designing targeted therapies, establishing a scalable foundation for future genomic research and drug development.