In a talk, now available online, NVIDIA Chief Scientist Bill Dally discusses a significant shift in how computer performance is delivered in a post-Moore's law era. According to Dally, each new processor requires ingenuity and effort to invent and validate fresh ingredients, which is a departure from the past when engineers relied on the physics of smaller, faster chips.
Dally, who leads a team of over 300 researchers at NVIDIA, highlights the 1,000x improvement in single-GPU performance on AI inference achieved over the past decade. This advancement, dubbed "Huang's Law" after NVIDIA's founder and CEO Jensen Huang, was a response to the rapid growth of large language models used in generative AI. Dally explains that the hardware industry had to keep up with this demand.
In his talk, Dally breaks down the factors behind the 1,000x gain. The largest contribution, a 16x increase, came from finding simpler ways to represent the numbers used in computer calculations. The latest NVIDIA Hopper architecture uses a dynamic mix of 8- and 16-bit floating point and integer math, specifically designed for generative AI models. Dally also discusses the performance gains and energy savings achieved through advanced instructions that tell the GPU how to organize its work.
Additionally, the NVIDIA Ampere architecture introduced structural sparsity, a technique that simplifies AI model weights without compromising accuracy. This innovation contributed a 2x performance increase. Dally also mentions that NVLink interconnects and NVIDIA networking compound the 1,000x gains in single-GPU performance when GPUs are combined into larger systems.
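To make the idea of structural sparsity more concrete, here is a minimal sketch in plain Python. It assumes a 2:4 pattern (two of every four consecutive weights are zeroed), which is how NVIDIA has publicly described the Ampere feature; the article itself does not specify the pattern. The function name `prune_2_of_4` and the sample weights are hypothetical, and a real workflow would typically fine-tune the model after pruning to preserve accuracy.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every group of four weights.

    Illustrative only: assumes a 2:4 structured-sparsity pattern, which is
    not stated in the article.
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")

    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical weights, chosen only to show the pattern.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse = prune_2_of_4(weights)
print(sparse)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because exactly half of the weights in each group are zero and their positions follow a fixed pattern, hardware can skip them in a regular way, which is consistent with the 2x figure cited above.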
Dally notes that migrating GPUs to smaller semiconductor nodes accounted for only a 2.5x share of the total gains over the decade. This is a significant departure from the past, when Moore's law predicted a doubling of performance every two years through shrinking chips.
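The factors quoted in this summary can be checked with simple arithmetic. The sketch below assumes the contributions combine multiplicatively, which the article implies but does not state outright, and the variable names are illustrative. It shows that the three stated factors multiply to 80x, leaving roughly 12.5x to be explained by the contributors for which the summary gives no figure, such as the advanced instructions mentioned earlier.

```python
# Factors stated in the article, treated as multiplicative contributions.
# Assumption: the gains compose by multiplication.
stated_factors = {
    "number_representation": 16.0,
    "structural_sparsity": 2.0,
    "process_node_shrink": 2.5,
}

TOTAL_GAIN = 1000.0  # the decade-long single-GPU inference gain cited by Dally


def product(values):
    """Multiply an iterable of numbers together."""
    result = 1.0
    for value in values:
        result *= value
    return result


stated_product = product(stated_factors.values())
residual_factor = TOTAL_GAIN / stated_product

print(f"Stated factors multiply to {stated_product}x")
print(f"Remaining factor needed to reach {TOTAL_GAIN}x: {residual_factor}x")
```

Seen this way, process shrinks (2.5x) are a small slice of the total, which is the point Dally makes about the post-Moore's law era.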
Even so, Dally remains optimistic about the future. He believes that Huang's Law will continue to drive advancements, with opportunities to further simplify number representation, increase sparsity in AI models, and design better memory and communication circuits. Dally concludes that it is an exciting time to be a computer engineer, as each new chip and system generation presents new opportunities for innovation.

