November 7, 2020

Improving Innovation Performance in the Energy Sector with the New NVIDIA A100 Bare Metal Instance in Oracle Cloud Infrastructure

By: Pedro Marques Dias Valentim | Energy Industry Advisor


The adoption of High-Performance Computing (HPC) across the energy industry helps tackle many of the challenges the industry faces: where to invest capital-constrained resources to maximize return on investment and reduce market and business risks; how to explore and appraise, develop and produce energy more effectively and efficiently; how to operate and maintain a highly utilized and resilient fleet of assets; and how to develop new products effectively for highly specialized industries such as lubricants, additives, and chemicals.

Whichever challenge you are trying to tackle in the energy industry, HPC can accelerate your digital transformation efforts by giving you easy access to powerful cloud compute and a rich development ecosystem.

Reinventing the Energy Industry with Cloud HPC

Energy companies should use Oracle Cloud High Performance Computing capabilities to:

- enable analysis of larger quantities of data from industrial IoT, facilitating insights not previously possible;

- reverse engineer data-driven models of complex asset systems to analyze their behavior;

- automate components of the end-to-end R&D process;

- increase capacity to meet spikes in activity and reduce waiting times, paying only for the resources actually consumed;

- connect and use applications from an extensive list of Oracle partners whose solutions are dedicated to solving many of the industry's challenges.

Every energy company wants to conceive more ambitious projects, ask "bigger" questions, drive faster time to value, and focus its engineers and R&D scientists on strategic priorities rather than on managing infrastructure or fighting for capital budget for new hardware. To do that, it needs a cloud platform designed for enterprise workloads, one that continuously innovates and offers simple infrastructure to accelerate what's possible.

Oracle Cloud Infrastructure has made available, ahead of other cloud vendors, a system featuring NVIDIA’s Ampere architecture, the next generation of data center computing hardware. The A100 shape has been tuned to give the best possible performance and adds to a growing list of performance systems that provide on-premises performance in the cloud.

Why is it relevant to the energy sector?

The A100 GPU includes many core architecture enhancements that deliver significant speed-ups for AI, HPC, and data analytics workloads compared with NVIDIA's previous-generation V100 GPUs, not to mention GPUs from other manufacturers.

NVIDIA's tests comparing the V100 and the A100 show clearly how much the new generation of GPUs improves AI performance.

Training the BERT-Large workload (a Natural Language Processing technique) to an accurate model takes around 3.7 days on eight V100 GPUs, compared with 14.8 hours on eight A100 GPUs [1]. Reducing time to solution in Artificial Intelligence accelerates the development of innovative scientific research and industry products.
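The two training times are quoted in different units, so converting days to hours makes the size of the improvement explicit. A quick check of the arithmetic, using only the figures quoted above (the helper name training_speedup is illustrative):

```python
# Time to train BERT-Large to an accurate model, as quoted in the text.
V100_DAYS = 3.7      # eight V100 GPUs
A100_HOURS = 14.8    # eight A100 GPUs

def training_speedup(baseline_days: float, new_hours: float) -> float:
    """Return how many times faster the new system reaches the same result."""
    baseline_hours = baseline_days * 24.0  # convert days to hours
    return baseline_hours / new_hours

speedup = training_speedup(V100_DAYS, A100_HOURS)
print(f"{V100_DAYS * 24:.1f} h vs {A100_HOURS} h -> {speedup:.1f}x faster")
# prints: 88.8 h vs 14.8 h -> 6.0x faster
```

In other words, the same training run finishes roughly six times sooner on the A100 system.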

The A100 also accelerates many of the HPC workloads relevant to energy industry use cases [2].

Energy companies now have, only a few clicks away, the ability to burst their growing demand for HPC capacity onto new and faster infrastructure, to test this new architecture and its business impact on current R&D projects, or to adopt it directly without first investing in new on-premises hardware.

The result is a much faster time to value, not only because of the A100's performance but also because it avoids the administrative effort of bringing a new physical component into on-premises infrastructure.

A New and Improved Offering

Continuing Oracle's line of bare metal offerings (dedicated, non-shared compute instances), Oracle Cloud Infrastructure provides access to the infrastructure without virtualization. Every aspect of the system has been upgraded to deliver the A100's 312 TeraFLOPS (TFLOPS) of peak performance.

Artificial Intelligence Workload Performance

Upgrading workloads to run on the A100 can drive significant compute cost savings and dramatically reduce model training turn-around time.

Specifications are one thing; empirical performance is another. To benchmark the V100 against the A100, the tests followed NVIDIA's Deep Learning Examples library. The workloads tested were BERT-Large for language modeling, Jasper for speech recognition, MaskRCNN for image segmentation, and GNMT for translation. All tests ran in the NVIDIA-prepared containers for PyTorch and TensorFlow and were configured for 32-bit tensors.
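These benchmarks are compared by throughput, that is, the number of sequences, images, or tokens processed per second. As a rough illustration of how such a figure is obtained, the sketch below times a fixed number of steps after a short warm-up. measure_throughput and dummy_step are hypothetical helpers for illustration only; they are not part of NVIDIA's Deep Learning Examples, whose own scripts handle timing and logging.

```python
import time
from typing import Callable, Sequence

def measure_throughput(step: Callable[[Sequence[int]], None],
                       batch: Sequence[int],
                       warmup_steps: int = 5,
                       timed_steps: int = 50) -> float:
    """Return items processed per second by a training or inference step.

    Warm-up iterations are excluded so that one-time costs (compilation,
    memory allocation, cache population) do not distort the figure.
    """
    for _ in range(warmup_steps):
        step(batch)
    start = time.perf_counter()
    for _ in range(timed_steps):
        step(batch)
    elapsed = time.perf_counter() - start
    return (timed_steps * len(batch)) / elapsed

# Hypothetical stand-in for a model step: squares and sums a batch of 256 "sequences".
def dummy_step(batch: Sequence[int]) -> None:
    sum(x * x for x in batch)

print(f"{measure_throughput(dummy_step, list(range(256))):.0f} sequences per second")
```

On a GPU, kernels run asynchronously, so a real measurement would synchronize the device (for example with torch.cuda.synchronize() in PyTorch) before reading the clock.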

Task | BM.GPU3.8 (8 × V100) | BM.GPU4.8 (8 × A100) | Speed-up factor
BERT-Large | 341 sequences/s | 1,773 sequences/s | 5.2
Jasper | 85 sequences/s | 278 sequences/s | 3.3
MaskRCNN | 70 images/s | 136 images/s | 1.9
GNMT | 96,792 tokens/s | 170,382 tokens/s | 1.76
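Each speed-up factor is simply the BM.GPU4.8 throughput divided by the BM.GPU3.8 throughput for the same task. A minimal sketch that recomputes the factors from the measured figures in the benchmark table (the dictionary layout and the speedup helper are illustrative):

```python
# Throughput figures from the benchmark table: (BM.GPU3.8, BM.GPU4.8).
results = {
    "BERT-Large": (341, 1773),       # sequences per second
    "Jasper": (85, 278),             # sequences per second
    "MaskRCNN": (70, 136),           # images per second
    "GNMT": (96_792, 170_382),       # tokens per second
}

def speedup(baseline: float, new: float) -> float:
    """Speed-up factor: new throughput divided by baseline throughput."""
    return new / baseline

for task, (v100, a100) in results.items():
    print(f"{task:<11} {speedup(v100, a100):.2f}x")
```

To two decimal places this gives 5.20, 3.27, 1.94, and 1.76 for BERT-Large, Jasper, MaskRCNN, and GNMT respectively.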

As Ian Buck, Vice President of NVIDIA's Tesla Data Center Business, states: “Our growing collaboration with Oracle is fueling incredible innovations across a wide range of industries and uses. By integrating NVIDIA’s new A100 Tensor Core GPUs into its cloud service offerings, Oracle is giving innovators everywhere access to breakthrough computing performance to accelerate their most critical work in AI, machine learning, data analytics and high-performance computing.”

Oracle’s Bare Metal GPU Offerings: Specification Comparison

The BM.GPU4.8 offers a significant uplift in price-performance over the BM.GPU3.8. For ten cents more per GPU-hour, in addition to the new Ampere architecture and third-generation NVLink, the new shape gains 192 GB of GPU memory, 1,280 GB of CPU memory, 25.6 TB of NVMe storage, 1,550 Gbps of total networking bandwidth, and the ability to enable remote direct memory access (RDMA) for multi-system communication. RDMA allows low-latency connections between nodes and access to GPU memory without involving the CPU.

 

Specification | BM.GPU3.8 | BM.GPU4.8
GPUs | 8 × V100 Tensor Core GPUs, 16 GB each | 8 × A100 Tensor Core GPUs, 40 GB each
CPUs | 2 × 26-core Intel at 2.0 GHz | 2 × 32-core AMD at 2.9 GHz
Memory | 768 GB DDR4 | 2048 GB DDR4
Networking | 2 × 25 Gbps | 8 × 200 Gbps
Storage | Up to 1 PB of block storage | 4 × 6.4 TB NVMe SSD, plus up to 1 PB of block storage
Price per GPU-hour | $2.95 | $3.05
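The gains listed for the BM.GPU4.8 follow directly from the specification table, and combining the list prices with the BERT-Large throughput from the benchmark table gives a rough cost per unit of work. This is a back-of-the-envelope sketch that assumes the benchmark throughput is for a full eight-GPU node and ignores storage, networking, and data-transfer charges; the Shape fields, gains, and cost_per_million are illustrative names, not an Oracle API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Shape:
    name: str
    gpus: int
    gpu_mem_gb: int            # memory per GPU
    cpu_mem_gb: int
    nvme_tb: float             # local NVMe (block storage excluded)
    network_gbps: int          # total bandwidth
    price_per_gpu_hour: float  # US dollars

# Values taken from the specification table.
GPU3 = Shape("BM.GPU3.8", 8, 16, 768, 0.0, 2 * 25, 2.95)
GPU4 = Shape("BM.GPU4.8", 8, 40, 2048, 4 * 6.4, 8 * 200, 3.05)

def gains(old: Shape, new: Shape) -> dict:
    """Difference between two shapes for each headline specification."""
    return {
        "GPU memory (GB)": new.gpus * new.gpu_mem_gb - old.gpus * old.gpu_mem_gb,
        "CPU memory (GB)": new.cpu_mem_gb - old.cpu_mem_gb,
        "NVMe storage (TB)": round(new.nvme_tb - old.nvme_tb, 1),
        "Networking (Gbps)": new.network_gbps - old.network_gbps,
        "Price per GPU-hour ($)": round(new.price_per_gpu_hour - old.price_per_gpu_hour, 2),
    }

def cost_per_million(shape: Shape, items_per_second: float) -> float:
    """Dollars per million items, assuming the throughput is for the whole node."""
    node_price_per_hour = shape.gpus * shape.price_per_gpu_hour
    items_per_hour = items_per_second * 3600
    return node_price_per_hour / items_per_hour * 1_000_000

for label, delta in gains(GPU3, GPU4).items():
    print(f"{label}: +{delta}")
for shape, tput in ((GPU3, 341), (GPU4, 1773)):
    print(f"{shape.name}: ${cost_per_million(shape, tput):.2f} per million BERT-Large sequences")
```

Under these assumptions, the small price increase is far outweighed by the throughput gain: roughly $19.22 versus $3.82 per million BERT-Large training sequences.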

Try for Yourself

This new computational shape is already generally available! To learn more about how to apply this performance to your development life cycles, visit our GPUs on the Oracle Cloud Infrastructure page.

Resources

- Learn more about GPUs on Oracle Cloud Infrastructure

- Learn more about Oracle’s partnership with NVIDIA

References:

1. NVIDIA A100 Tensor Core GPU Architecture (whitepaper): “Unified AI Acceleration for BERT-Large Training and Inference.”

2. NVIDIA A100 Tensor Core GPU Architecture (whitepaper): “A100 GPU HPC application speedups compared to NVIDIA Tesla V100.”

Energy Industry Advisor

I inspire change in the energy sector (Utilities and Oil & Gas), using Oracle technology as an enabler to drive innovation and business transformation.

I support Oracle EMEA Sales teams and Oracle energy customers in developing enterprise-grade solutions that tackle many of the energy industry's challenges, and in showing how those solutions can deliver business value quickly by unlocking the power of Oracle's complete range of services across SaaS, PaaS, IaaS, and Industry Solutions.

I currently focus on two main strategic areas:
• How Oracle Cloud can help organizations accelerate primary energy generation, optimize operations, develop new products, and prepare better for the future with High-Performance Computing (HPC) capabilities.
• How energy companies can transform industrial operations at the Edge with IoT and AI, taking advantage of technology breakthroughs such as “Cloud to Edge” and 5G.
#High-Performance-Computing #HPC #Edge-Computing #IoT #AI #Energy #Oil&Gas #Utilities
