The Cloud: accelerating opportunities for HPC

Written by Cédric Gageat, on 11 June 2018

When considering the Cloud in an HPC (High-Performance Computing) context, it is tempting to compare it directly with an on-premises computing cluster and quickly arrive at objections such as a higher cost per core in the Cloud, or the unavailability of certain technologies used in HPC. Doing so misses the value that the Cloud brings to institutions like CERN or companies like Schlumberger and Société Générale.

The mistake is to reduce the Cloud to a simple hosting offer and thereby ignore the new use cases made possible by its agility. The Cloud brings more than hardware; it brings a managed resource-allocation service. A paradigm shift is therefore required.

When resources are billed over a long time scale, their use must be optimized to obtain the best capacity at the best cost. This implies choosing systems with homogeneous resources, sizing them according to anticipated needs, and ensuring that different workloads can be scheduled over time to saturate the available resources. By contrast, in a Cloud approach, resources, perceived as unlimited, can be allocated on demand according to the required load. The on-demand virtual cluster, or Cluster as a Service, gives each user their own computing infrastructure, sized and adapted to solve their problem as efficiently as possible. Users get their results faster because their jobs run in parallel instead of queuing for shared resources, and resources cost less because they are better utilized.
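The cost argument above can be illustrated with a little arithmetic. The following is a minimal sketch, not a pricing model: the function names, the prices per core-hour, and the utilization rate are all hypothetical and do not reflect any provider's actual rates.

```python
def fixed_cluster_cost(cores, hours, price_per_core_hour):
    """Cost of a cluster billed for all its cores, busy or idle."""
    return cores * hours * price_per_core_hour


def on_demand_cost(core_hours_used, price_per_core_hour):
    """Cost when only the core-hours actually consumed are billed."""
    return core_hours_used * price_per_core_hour


def break_even_utilization(fixed_price, on_demand_price):
    """Utilization above which the fixed cluster becomes the cheaper option."""
    return fixed_price / on_demand_price


# Hypothetical figures, for illustration only.
CORES, HOURS = 1000, 720                # one month on a 1000-core cluster
FIXED_PRICE, CLOUD_PRICE = 0.03, 0.05   # price per core-hour
UTILIZATION = 0.40                      # share of core-hours doing useful work

used = CORES * HOURS * UTILIZATION
print("fixed cluster:", round(fixed_cluster_cost(CORES, HOURS, FIXED_PRICE), 2))
print("on demand:    ", round(on_demand_cost(used, CLOUD_PRICE), 2))
print("break-even utilization:",
      round(break_even_utilization(FIXED_PRICE, CLOUD_PRICE), 2))
```

Under these assumed numbers, the core costs more per hour in the Cloud, yet the total bill is lower because idle cores are not paid for. The fixed cluster only wins once its utilization exceeds the ratio of the two prices.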

In the particular case of computing infrastructures dedicated to executing uncoupled, fault-tolerant tasks, a further financial optimization is available. To facilitate maintenance and load balancing in their data centers, cloud providers offer discounted machines on the condition that they can reclaim them at any time. Using this type of machine significantly reduces the cost of computation. In return, an interrupted task simply has to be restarted on another machine.
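The restart-on-interruption idea can be sketched as a small simulation. This is an illustrative model under simplifying assumptions, not a description of any provider's behavior: each attempt is interrupted independently with a fixed probability, an interrupted attempt loses all its work and is billed in full, and the names, prices, and probability below are hypothetical.

```python
import random


def run_on_preemptible(task_ids, preempt_probability, rng):
    """Restart each task until one attempt completes; return attempts per task."""
    if not 0.0 <= preempt_probability < 1.0:
        raise ValueError("preempt_probability must be in [0, 1)")
    attempts = {}
    for task_id in task_ids:
        count = 1
        # Each attempt is reclaimed with the given probability (assumption).
        while rng.random() < preempt_probability:
            count += 1  # restart the task on another machine
        attempts[task_id] = count
    return attempts


def expected_cost(task_hours, price_per_hour, preempt_probability):
    """Expected cost of one task, assuming interrupted attempts are billed in full.

    The number of attempts is geometric, so its mean is 1 / (1 - p).
    """
    return task_hours * price_per_hour / (1.0 - preempt_probability)


# Hypothetical figures, for illustration only.
rng = random.Random(42)
attempts = run_on_preemptible(range(1000), preempt_probability=0.2, rng=rng)
print("average attempts per task:", sum(attempts.values()) / len(attempts))
print("expected cost, discounted machine:", expected_cost(1.0, 0.3, 0.2))
print("expected cost, regular machine:   ", expected_cost(1.0, 1.0, 0.0))
```

In this model the discounted machine remains cheaper as long as its price divided by (1 - p) stays below the regular price, which is why the approach suits short, independent tasks that lose little when restarted.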

Recently, as part of a project around autonomous vehicles, ANEO has envisioned a fully managed and automated architecture of on-demand clusters. While awaiting the publication of a series of articles on the subject, we will be present on June 19 and 20 at the Teratec exhibition to discuss it.