AWS debuted several new computing options, some based on its own new custom silicon designs, as well as a staggering array of data organization, analysis, and connection tools and services. The sheer number and complexity of the new features and services make it difficult to keep track of all the choices now available to customers. Rather than being the outcome of unchecked development, however, the abundance of capabilities is by design.
New AWS CEO Adam Selipsky was keen to point out during his keynote and other appearances that the organization is customer “obsessed.” As a result, most of its product decisions and strategies are based on customer requests. It turns out that when you have lots of different types of customers with different types of workloads and requirements, you end up with a complex array of choices. Realistically, that kind of approach will reach a logical limit at some point, but in the meantime, it means that the extensive range of AWS products and services likely represents a mirror image of the totality (and complexity) of today’s enterprise computing landscape. In fact, there’s a wealth of insight into enterprise computing trends waiting to be gleaned from an analysis of which services are being used, to what degree, and how that usage has shifted over time, but that’s a topic for another time.
In the world of computing options, the company acknowledged that it now has over 600 different EC2 (Elastic Compute Cloud) computing instances, each of which consists of a different combination of CPU and other acceleration silicon, memory, network connections, and more. While that’s a hard number to fully appreciate, it once again indicates how diverse today’s computing demands have become.
From cloud-native, AI- or ML-based containerized applications that need the latest dedicated AI accelerators or GPUs to legacy “lifted and shifted” enterprise applications that only use older x86 CPUs, cloud computing services like AWS now need to be able to handle all of the above.
As with many instances, Hpc7g is targeted at a specific set of workloads: in this case, High Performance Computing (HPC), such as weather forecasting, genomics processing, fluid dynamics, and more. More specifically, it’s designed for bigger ML models that often end up running across thousands of cores. What’s interesting about this is that it demonstrates both how far Arm-based CPUs have advanced in terms of the types of workloads they’re being used for and the degree of refinement that AWS is bringing to its various EC2 instances.
Separately, in several other sessions, AWS highlighted the momentum towards Graviton usage for many other types of workloads as well, particularly for cloud-native containerized applications from AWS customers like DirecTV and Stripe. One intriguing insight that came out of these sessions is that, because of the nature of the tools being used to develop these types of applications, the challenges of porting code from x86 to Arm native instructions (which were once believed to be a huge stopping point for Arm-based server adoption) have largely gone away. Instead, all that’s required is the simple switch of a few options before the code is completed and deployed on the instance. That makes further growth in Arm-based cloud computing significantly more likely, particularly for newer applications.
Of course, some of these organizations are working toward building completely instruction-set-agnostic applications in the future, which would seemingly make instruction set choice irrelevant. However, even in that situation, compute instances that offer better price/performance or performance/watt ratios, which Arm-based CPUs often do, remain the more attractive option.
The new architecture is designed to scale across thousands of cores, which is what these enormous new models, such as GPT-3, require. In addition, Inferentia2 includes support for a mathematical technique known as stochastic rounding, which AWS describes as “a way of rounding probabilistically that enables high performance and higher accuracy as compared to legacy rounding modes.” To take best advantage of the distributed computing, the Inf2 instance also supports a next-generation version of the company’s NeuronLink ring network architecture, which supposedly offers 4x the performance and 1/10 the latency of existing Inf1 instances. The bottom-line translation is that it can offer 45% higher performance per watt for inferencing than any other option, including GPU-powered ones. Given that, according to AWS, power consumption for inferencing is often 9 times higher than what’s needed for model training, that’s a big deal.
The third new custom-silicon-driven instance is called C7gn, and it features a new AWS Nitro networking card equipped with fifth-gen Nitro chips. Designed specifically for workloads that demand extremely high throughput, such as firewalls, virtual networking, and real-time data encryption/decryption, C7gn is purported to have 2x the network bandwidth and 50% higher packet processing per second than the previous instances. Importantly, the new Nitro cards are able to achieve those levels with a 40% improvement in performance per watt versus their predecessors.
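For readers curious about the stochastic rounding mentioned above in connection with Inferentia2, here is a minimal Python sketch of the general idea. It is not AWS’s implementation: it rounds to whole numbers rather than to the low-precision floating-point formats an accelerator would likely use, and the function names `stochastic_round` and `mean_after_rounding` are made up for illustration.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to one of its two neighboring integers at random.

    The chance of rounding up equals the fractional part of x, so the
    result matches x on average. (Illustrative helper, not an AWS API.)
    """
    lower = math.floor(x)
    fraction = x - lower  # always in the range [0, 1)
    return lower + (1 if rng.random() < fraction else 0)


def mean_after_rounding(x: float, samples: int, rng: random.Random) -> float:
    """Average of many independent stochastic roundings of the same value."""
    return sum(stochastic_round(x, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    value = 0.3
    print("round-to-nearest:", round(value))
    print("mean of stochastic roundings:", mean_after_rounding(value, 100_000, rng))
```

Conventional round-to-nearest turns every 0.3 into 0, so a long series of small contributions can vanish entirely. The probabilistic version is wrong on any single value but close to right on average, which is presumably the intuition behind the accuracy claim AWS makes for the technique.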
Bob O’Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.