August 1, 2018

Deploy Highly Available AD Spanning Cloudera EDH Clusters on OCI

By: Zachary Smith | Principal Solution Architect


Hello, my name is Zachary Smith and I am a Solutions Architect working on Big Data for Oracle Cloud Infrastructure (OCI).

We are proud to announce that Terraform automation for Availability Domain (AD) spanning is now available for Cloudera Enterprise Data Hub (EDH) deployments on OCI. This deployment architecture adds enhanced security and fault tolerance while maintaining performance.

Cloudera Enterprise Data Hub: Availability Domain Spanning

Availability Domain Spanning (HA) is ideal for customers who want to maintain the performance of Cloudera EDH on OCI while leveraging cloud constructs to enhance fault tolerance and availability. This is achieved by deploying Cloudera EDH cluster hosts across all three Availability Domains in a Region and distributing the Zookeeper, NameNode, and HDFS services across nodes in each AD.
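To make the placement idea concrete, here is a minimal sketch of spreading cluster hosts evenly across three ADs. The AD labels, host names, and the round-robin rule are assumptions for illustration only; the actual placement logic lives in the Terraform templates and may differ.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical AD labels; real AD names are tenancy- and region-specific.
ADS = ["AD-1", "AD-2", "AD-3"]


def spread_across_ads(hosts, ads=ADS):
    """Assign hosts to availability domains in round-robin order."""
    placement = defaultdict(list)
    for host, ad in zip(hosts, cycle(ads)):
        placement[ad].append(host)
    return dict(placement)


# Hypothetical host names for a five-worker cluster.
workers = [f"worker-{i}" for i in range(1, 6)]
placement = spread_across_ads(workers)

for ad, members in placement.items():
    print(ad, members)
```

With five workers and three ADs, no AD holds more than two workers, so losing a single AD removes at most two of the five from the cluster.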

Cloudera Cluster Hosts on Private Subnet

With our continued focus on enabling enterprise customers to deploy secure environments in the cloud, this architecture now deploys the Master and Worker cluster hosts on a private subnet that is not directly accessible from the Internet. To achieve this, the Bastion host in the deployment is set up as a NAT gateway, which hosts on the private subnet use to route Internet-destined traffic to the Internet Gateway. This provides enhanced security without sacrificing cluster performance.
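The routing path described above can be modeled in a few lines. This is a simplified sketch, not the deployed configuration: the VCN CIDR, host names, and route table layout are all hypothetical, and the real route rules are defined by the Terraform templates.

```python
import ipaddress

# Hypothetical VCN range; the real CIDR comes from the Terraform templates.
VCN_CIDR = ipaddress.ip_network("10.0.0.0/16")

# Each subnet has one default route. Names are illustrative only.
ROUTE_TABLES = {
    "private": {"0.0.0.0/0": "bastion-nat"},
    "public": {"0.0.0.0/0": "internet-gateway"},
}

# Which subnet each host sits on.
SUBNET_OF = {"worker-1": "private", "bastion-nat": "public"}


def trace(source, destination):
    """Return the hops a packet takes from source toward destination."""
    if ipaddress.ip_address(destination) in VCN_CIDR:
        # Traffic inside the VCN never touches the NAT or the gateway.
        return [source, "vcn-local"]
    path = [source]
    hop = source
    while hop in SUBNET_OF:
        hop = ROUTE_TABLES[SUBNET_OF[hop]]["0.0.0.0/0"]
        path.append(hop)
    return path


# 203.0.113.10 is an address reserved for documentation.
print(trace("worker-1", "203.0.113.10"))
print(trace("worker-1", "10.0.2.15"))
```

The point of the model is that a worker has no direct route to the Internet Gateway; its only way out is through the Bastion.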

Performance Testing Cloudera EDH on OCI


To test the performance of Cloudera EDH on OCI, Terasort was chosen as the benchmark. It is a standard benchmark for Hadoop because it exercises every element involved in a Hadoop deployment: Compute, Memory, Storage, and Network. The following graph compares a 10TB Terasort run across two cluster types on each deployment architecture. The first cluster type is Virtual Machine using 6x 1.5TB Block Volumes for HDFS; the second is Bare Metal using local NVMe for HDFS. The cluster topology is the same for both architectures, consisting of 5 Worker nodes, 1 Cloudera Manager node, 2 Master nodes for cluster services, and 1 Bastion.
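For a sense of scale, the figures above work out as follows. This is back-of-the-envelope arithmetic under stated assumptions: decimal terabytes, the fixed 100-byte row that TeraGen writes, and the 6x 1.5TB Block Volumes counted per Worker node (the text does not say this explicitly).

```python
TB = 10**12              # assumption: decimal terabytes
TERAGEN_ROW_BYTES = 100  # TeraGen generates fixed 100-byte rows

sort_bytes = 10 * TB
teragen_rows = sort_bytes // TERAGEN_ROW_BYTES

workers = 5
volumes_per_worker = 6   # assumption: 6 volumes attached to each Worker
volume_tb = 1.5

# Raw HDFS capacity on the Virtual Machine cluster, before replication.
raw_hdfs_tb = workers * volumes_per_worker * volume_tb

# Share of the input each Worker handles if data is evenly spread.
tb_per_worker = sort_bytes / TB / workers

print(f"TeraGen rows for 10TB: {teragen_rows:,}")
print(f"Raw HDFS capacity (VM cluster): {raw_hdfs_tb} TB")
print(f"Input per Worker: {tb_per_worker} TB")
```

Under these assumptions, each Worker is responsible for roughly 2TB of the input.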

Not only are the results extremely fast for sorting 10TB with 5 Workers, but the sort times are also extremely close when comparing the Single AD and AD Spanning architectures. These tests were run multiple times in a row, and the results were almost identical regardless of the time of day the job ran. This is a great example of Oracle’s industry-leading SLA for Cloud.

We have more improvements in this space, along with a white paper that details a Reference Architecture for Cloudera Enterprise Data Hub on Oracle Cloud Infrastructure and the use of these Terraform templates.

Have questions or want to learn more? Join us at the Cloudera Now Virtual Event Booth on 8/2 from 9am-1pm PDT. Register Now.

Let Us Know What You Think

We hope you will be as excited as we are about the improvements we’re making to the Cloudera plus Oracle solution. Let us know what you think!

Zachary Smith

Senior Member of Technical Staff




Principal Solution Architect

I have a background in Big Data Platform Architecture, and have been working on automation and support of Big Data frameworks on OCI.
