ETL for Pharma

Organizations challenged with overburdened enterprise data warehouses (EDWs) need solutions that can offload the heavy lifting of ETL processing from the data warehouse to an alternative environment that is capable of managing today’s data sets. The first question is always, “How can this be done in a simple, cost-effective manner that doesn’t require specialized skill sets?”

Let’s start with Hadoop. As previously mentioned, many pharma organizations deploy Hadoop to offload their data warehouse processing functions. After all, Hadoop is a cost-effective, highly scalable platform that can store large volumes of structured, semi-structured, and unstructured data. Hadoop can also accelerate the ETL process while significantly reducing costs compared with running ETL jobs in a traditional data warehouse. However, while the benefits of Hadoop are appealing, the complexity of the platform continues to hinder adoption at many organizations. Our goal has been to find a better solution.

The new solution combines the Hadoop distribution from Cloudera with a framework and tool set for ETL offload from Syncsort.

The ETL offload solution simplifies data processing by providing an architecture that helps users optimize an existing data warehouse. So, how does the technology behind it actually work?

The ETL offload solution provides the Hadoop environment through Cloudera Enterprise software. The Cloudera Distribution of Hadoop (CDH) delivers the core elements of Hadoop, such as scalable storage and distributed computing. Together with the software from Syncsort, it allows users to reduce Hadoop deployment time to weeks, develop Hadoop ETL jobs in a matter of hours, and become fully productive in days. Additionally, CDH provides security, high availability, and integration with a large set of ecosystem tools.

Syncsort DMX-h software is a key component in the solution, or reference architecture (RA). Designed from the ground up to run efficiently in Hadoop, Syncsort DMX-h removes barriers to mainstream Hadoop adoption by delivering an end-to-end approach for shifting heavy ETL workloads into Hadoop, and it provides the connectivity required to build an enterprise data hub. For even tighter integration and accessibility, DMX-h has monitoring capabilities integrated directly into Cloudera Manager.

With Syncsort DMX-h, organizations no longer need MapReduce skills or mountains of hand-written code to take advantage of Hadoop. This is made possible through intelligent execution, which allows users to design data transformations graphically and focus on business rules rather than on underlying platforms or execution frameworks. Furthermore, users no longer have to make application changes to deploy the same data flows on or off Hadoop, on premises or in the cloud. This future-proofing provides a consistent user experience across the process of collecting, blending, transforming, and distributing data.
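The idea of keeping business rules separate from the execution framework can be illustrated with a minimal Python sketch. This is not DMX-h code and does not use any Syncsort or Cloudera API; the record fields (id, dose_mg), the rule functions, and both runners are hypothetical names chosen for illustration. The sketch only shows that a flow defined once can be handed to different runners without changing the rules themselves.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Step = Callable[[Record], Optional[Record]]


def drop_missing_dose(rec: Record) -> Optional[Record]:
    """Hypothetical business rule: discard records with no recorded dose."""
    return rec if rec.get("dose_mg") is not None else None


def dose_to_grams(rec: Record) -> Record:
    """Hypothetical business rule: add a derived column in grams."""
    return {**rec, "dose_g": rec["dose_mg"] / 1000}


# The data flow is defined once, independently of where it runs.
FLOW: List[Step] = [drop_missing_dose, dose_to_grams]


def apply_flow(flow: List[Step], records: List[Record]) -> List[Record]:
    """Apply every step to every record; a step returning None drops the record."""
    out = []
    for rec in records:
        for step in flow:
            rec = step(rec)
            if rec is None:
                break
        else:
            out.append(rec)
    return out


def run_local(flow: List[Step], records: List[Record]) -> List[Record]:
    """Runner 1: plain sequential execution in a single process."""
    return apply_flow(flow, records)


def run_partitioned(flow: List[Step], records: List[Record],
                    partitions: int = 2) -> List[Record]:
    """Runner 2: split the input into partitions and process them in parallel.

    A thread pool stands in for a cluster here; this is an assumption made
    to keep the sketch self-contained.
    """
    chunks = [records[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(lambda chunk: apply_flow(flow, chunk), chunks))
    return [rec for chunk in results for rec in chunk]
```

Both runners accept the same FLOW, so moving from one to the other changes where the work happens but not the rules. The partitioned runner may return records in a different order than the sequential one.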