• Case Studies
  • January 19, 2023

Hadoop to Databricks Migration for a Global Life Sciences & Healthcare Leader

A leading life sciences and healthcare brand migrates its legacy Hadoop platform to Databricks and reduces OPEX by 50% using a Hybrid Delivery Model

Client Overview

The client is a leading life sciences and healthcare company that operates one of the largest clinical laboratory networks in the world. The company offers world-class diagnostics to improve patient care and accelerate drug development.

Challenges

The client’s legacy big data platform, running on a Hadoop environment, was highly complex and was impacting operations that needed real-time insights. The platform also carried high maintenance costs and offered limited flexibility. These roadblocks were hindering the client’s ambition of building a solid data-driven enterprise.

Project Goals

In consultation with our experts at MSRcosmos, the client decided to migrate to Databricks. The goal was to migrate the existing Spark, ZDP, Flume, and Java code to Databricks without changing any business rules.

The client also wanted a workflow orchestration mechanism to run Databricks jobs across the following processes:

  • Administration
  • Data Migration
  • Data Processing
  • Security and Governance
  • SQL and BI Layer
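As a rough illustration of what orchestrating jobs across these process areas can involve, the sketch below orders the five areas so that each runs only after the ones it depends on. The dependency edges and the `run_stage` callback are hypothetical; the case study does not describe how the stages relate or how the Databricks jobs were actually triggered.

```python
from graphlib import TopologicalSorter

# Process areas named in the case study. The dependency edges are
# hypothetical: each key maps to the set of stages that must finish first.
DEPENDENCIES = {
    "Administration": set(),
    "Security and Governance": {"Administration"},
    "Data Migration": {"Security and Governance"},
    "Data Processing": {"Data Migration"},
    "SQL and BI Layer": {"Data Processing"},
}


def execution_order(dependencies):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, run_stage):
    """Call run_stage once per stage, after all of its prerequisites."""
    completed = []
    for stage in execution_order(dependencies):
        # Placeholder hook: a real setup would trigger a job here.
        run_stage(stage)
        completed.append(stage)
    return completed


if __name__ == "__main__":
    run_pipeline(DEPENDENCIES, lambda stage: print(f"running: {stage}"))
```

Modeling the stages as a dependency graph rather than a fixed list means a new stage can be added by declaring only its prerequisites.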

The MSRcosmos Solution Approach

Our experts proposed a Hadoop migration solution that combines Databricks, the AWS cloud, and Delta Engine to ingest, store, process, enrich, and serve data and insights from different sources. A data lakehouse architecture was deployed, comprising a data warehouse for structured data and a data lake for semi-structured and unstructured data.
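The storage split described above can be sketched as a simple routing rule: structured data goes to the warehouse tier, everything else to the data lake. This is a minimal illustration only; the tier labels and the dataset names in the usage example are hypothetical and do not come from the client’s platform.

```python
from enum import Enum


class DataKind(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


def storage_tier(kind: DataKind) -> str:
    """Pick a lakehouse tier, following the split described in the text."""
    if kind is DataKind.STRUCTURED:
        return "data_warehouse"
    # Semi-structured and unstructured data both land in the data lake.
    return "data_lake"


def plan_storage(datasets: dict[str, DataKind]) -> dict[str, list[str]]:
    """Group dataset names by the tier they would be stored in."""
    plan = {"data_warehouse": [], "data_lake": []}
    for name, kind in sorted(datasets.items()):
        plan[storage_tier(kind)].append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical dataset names, for illustration only.
    example = {
        "lab_results_table": DataKind.STRUCTURED,
        "device_events_json": DataKind.SEMI_STRUCTURED,
        "scanned_reports_pdf": DataKind.UNSTRUCTURED,
    }
    print(plan_storage(example))
```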

As the client required, a dedicated channel into the AWS cloud was established using AWS Direct Connect and AWS VPC to migrate healthcare-sensitive data from the on-premises system. AWS DataSync automated the large data transfers and handled them efficiently. SAP HANA and Tableau were used as the semantic layer.

Results

In keeping with MSRcosmos’ delivery commitments and quality standards, the migration was completed in just six months with minimal downtime. By modernizing its data and AI ecosystem, the client was able to harness the capabilities of Databricks. As a result, the client reduced OPEX by 50% using a Hybrid Delivery Model.