Graviton2 on Fire — Benchmarking Amazon’s Arm Processor on EMR
As a software engineer, I have spent my career making the Internet a slightly better place for millions of users, touching their daily lives and leaving a positive impression in one way or another. Writing protocol test suites for the Internet backbone, building services for millions of subscribers, processing web content for a better search experience, and lately expanding the selection of products people can buy or sell online, the journey took me through different phases of the digital revolution. Cloud computing, web search technology, Hadoop, and NoSQL are some of the advances that pushed the envelope of digital technology over the last two decades. As the industry made steady progress on multiple fronts, fast and powerful silicon chips enabled technological breakthroughs in the field of Artificial Intelligence (AI). Thanks to rapid innovation in solid-state physics, modern machines can read and speak, play superior chess, and arguably drive a car better than we do. With machines taking on more and more of our cognitive load, we started giving them even more complex tasks to accomplish. Silicon has become a major driving force of this virtuous cycle of AI in recent times. The NVIDIA A100 Tensor Core GPU, Google Cloud TPU, Amazon Inferentia, and Graviton2 are testimony to the technology disruption the digital world is going through. In this article, I attempt to compare Amazon's Graviton2-powered EC2 instances with the existing compute-class instance offerings from AWS.
What is Graviton2
Graviton is a custom, server-grade AWS chip powered by an Arm processor. At re:Invent 2018, Amazon took a step towards making computation faster and cheaper for AWS customers by offering Graviton-powered Elastic Compute Cloud (EC2) instances in addition to its variety of Intel- and AMD-based EC2 instances. Launching its own silicon for cloud-native workloads gave Amazon the ability to rapidly innovate, build, and iterate on behalf of customers. Graviton2, the next-generation chip with a 64-core Arm Neoverse processor, was launched at re:Invent 2019 and outperformed the older-generation Graviton by an impressive margin. Faster than the closest x86-family processors, Graviton2 is claimed to provide up to a 40% price-performance benefit to customers.
Benchmarking Graviton2
Ever since its launch, Graviton2 has been benchmarked against different types of cloud-native workloads, such as HTTPS load balancing with Nginx, Memcached, and x264 video encoding. AWS published performance benchmarks in its announcement blog. Similar benchmarking was performed by AWS partners like Treasure Data, independent reviewers like AnandTech, and OSS communities like KeyDB. The consistent performance boost across the board, along with the better price point, was my motivation to do the same for our workload. We classify billions of products sold online using ML, search, and similarity measures, so our data processing pipeline is always CPU bound. Spark on EMR is my distributed data processing engine of choice, and thankfully AWS EMR launched Graviton2 support in Q4 2020. For this experiment, we chose a fairly simple CPU-bound workload (Java regex-based matching) over 2 TB of data on an EMR cluster with 10 EC2 instances, using AWS S3 as the I/O storage for the data.
I built the code with JDK 8 on the Amazon Linux 2 platform, on both x86 (Intel Xeon) and aarch64 (Graviton2) desktops. In all my runs, everything was kept identical except the EC2 instance type, for obvious reasons. I also chose the closest compute-class EC2 instances for my runs: C4 (current-generation Intel x86), C5 (newer-generation Intel x86), M5a (newer-generation AMD x86), and C6g (new-generation Graviton2 aarch64). More details below, followed by a sketch of the kind of job I ran:
AWS EMR Version: 5.31
Apache Spark Version: 2.4
Build OS: Amazon Linux 2
Java Version: OpenJDK 8 (Amazon Corretto)
EMR Cluster Topology: 1 master & 10 core instances
Input Data Size: 2 TB
Output Data Size: 350 MB
EC2 variants used: c4.8xlarge, c5.9xlarge, m5a.8xlarge & c6g.8xlarge
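To make the workload concrete, here is a minimal sketch of the kind of CPU-bound Spark job I ran; the S3 paths, the regex pattern, and the class name below are placeholders of my own, not our production pipeline:

```java
// Hypothetical sketch of a CPU-bound regex-matching Spark job (Spark 2.4, Java 8).
// Paths and pattern are placeholders; the real pipeline matches product attributes.
import java.util.regex.Pattern;

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RegexMatchJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("graviton2-regex-benchmark")
                .getOrCreate();

        // Any non-trivial java.util.regex pattern keeps the cores busy;
        // this one is purely illustrative.
        final Pattern pattern = Pattern.compile("(?i)\\b(stainless|steel|cotton)\\b.*\\d{2,}");

        // Read the raw text input from S3 (~2 TB in my runs).
        Dataset<Row> input = spark.read().text("s3://my-bucket/products/");

        // Keep only the lines whose single "value" column matches the pattern.
        Dataset<Row> matched = input.filter(
                (FilterFunction<Row>) row -> pattern.matcher(row.getString(0)).find());

        // Write the matches back to S3 (~350 MB out in my runs).
        matched.write().mode("overwrite").text("s3://my-bucket/matched/");

        spark.stop();
    }
}
```

Because the job is plain JVM bytecode, the same JAR runs unmodified on both x86 and aarch64 clusters; only the EMR instance type changed between runs.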
We got a little over 4% speedup on the Graviton2 (C6g) EC2 compared to the newer-generation compute-class (C5) EC2, even with fewer vCPUs (32 vs. 36). One possible reason is that each vCPU on Graviton2 is a physical core, whereas the Intel Xeon-powered C5 runs its 36 vCPUs on 18 physical cores with SMT. The biggest benefit of the C6g-powered EC2 comes from its price point: it is 29% cheaper than C5, giving us a 32% price-performance gain. The other variants, M5a and C4, performed significantly slower than C5 or C6g, making them a poor choice for this CPU-bound workload.
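As a rough back-of-the-envelope check (assuming on-demand hourly pricing and my measured runtimes), the relative cost per job is the runtime ratio times the price ratio, roughly 0.96 × 0.71 ≈ 0.68, which is where the ~32% price-performance gain comes from.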
Another interesting observation is the Area Under the Curve (AUC) of the cluster CPU utilization metric. The Graviton2 cluster reported the lowest AUC, with average CPU utilization at 75%. That essentially indicates the lowest tax on the CPU under the same workload, again possibly due to the lower overhead of parallel worker threads running on dedicated cores with their associated L2 cache. Lower utilization means lower power consumption; together with the lowest TDP in the class (estimated 80–110 W), Graviton2 turned out to be the most energy-efficient option.
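For reference, here is a minimal sketch of how such an AUC can be approximated, assuming the CPU utilization samples are exported from the cluster metrics as parallel arrays of timestamps and percentages; the trapezoidal rule is my own choice of approximation, not an AWS-provided calculation:

```java
// Minimal sketch: trapezoidal-rule AUC over CPU utilization samples.
// Timestamps (seconds) and utilization (percent) are assumed to be
// parallel arrays exported from the cluster's CPU metrics.
final class CpuUtilizationAuc {
    static double auc(long[] timestampSeconds, double[] utilizationPercent) {
        double area = 0.0;
        for (int i = 1; i < timestampSeconds.length; i++) {
            double dt = timestampSeconds[i] - timestampSeconds[i - 1];               // seconds between samples
            area += 0.5 * (utilizationPercent[i] + utilizationPercent[i - 1]) * dt;  // percent-seconds
        }
        return area;
    }
}
```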
One other observation worth mentioning here is the network bandwidth between EC2 and S3. All the EC2 flavors under test have ample network I/O bandwidth, so the network never became a bottleneck in my experiments.
Conclusion
With its significant price-performance benefit over the latest x86 processor class, Graviton2 turned out to be the clear winner among the wide array of compute flavors in AWS EC2. It looks ready to take on a range of server workloads, including CPU-hungry AI applications. At the same time, it appears to do the same job with a lower carbon footprint. However, the transformation is not possible without greater support from FOSS communities, software vendors, AWS services, and the AWS SDKs. While the Graviton software ecosystem is steadily gaining momentum, now looks like the right time to ride the wave. Lastly, the home-grown Graviton chip gave Amazon much-needed momentum to innovate, build, and operate on behalf of its customers. With Graviton on fire, it is still day one at Amazon.
Disclaimer: Since AWS EMR does not support the C5a compute instance (compute-class EC2 powered by the AMD EPYC 7002 series processor), we chose the closest offering, M5a (general-purpose EC2 powered by the AMD EPYC 7000 series processor). Performance may vary due to the clock speed difference between processor variants.