The new equation for ultimate AI energy efficiency.
Part V of our series, “Real Perspectives on Artificial Intelligence,” features Rick Calle, AI business development lead for M12, Microsoft’s venture fund.
How energy-intensive is AI infrastructure today? And what does that mean for the future of the discipline?
Today’s AI algorithms, software and hardware combined are 10X to 100X more energy-intensive than they should be. In light of Microsoft’s recent announcement of its carbon negative commitment, my challenge to the industry is clear: let’s improve AI hardware and software so that we don’t overheat our planet.
The computing industry is always optimizing for speed and innovation, but not necessarily considering the lifetime energy cost of that speed. I saw an inflection point around 2012, when the progression of AI hardware and algorithmic capabilities began to deviate from Moore’s law. Prior to that, most AI solutions ran on one, maybe two processors, with workloads tracking Moore’s law: a steady progression from the Perceptron in 1958 to systems like bidirectional LSTM neural networks for speech recognition in the mid-2000s.
Training AI models with multiple GPUs changed everything. After Alex Krizhevsky and team designed the AlexNet model with two GPUs in 2012, the computing power and electrical energy involved in training AI models took off at an entirely different pace: over 100X compounding every two years. Theirs was certainly not the first Convolutional Neural Network (CNN), but their “SuperVision” entry swept the field, winning the 2012 ImageNet competition by a huge margin. The next year nearly all competitors used CNNs and trained with multiple processors!
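To get a feel for how far that pace departs from Moore’s law, here is a rough back-of-the-envelope sketch. It assumes Moore’s law can be approximated as a 2X gain every two years and takes the 100X-every-two-years figure above as a lower bound. The six-year window and the helper name `growth_factor` are illustrative choices, not figures from the interview.

```python
def growth_factor(factor_per_period: float, period_years: float, years: float) -> float:
    """Total multiplicative growth after `years`, compounding
    `factor_per_period` once every `period_years`."""
    return factor_per_period ** (years / period_years)


# Illustrative window: six years, i.e. three two-year periods.
YEARS = 6.0

# Assumption: Moore's law approximated as 2X every two years.
moore_growth = growth_factor(2.0, 2.0, YEARS)

# From the text: roughly 100X every two years for AI training compute.
ai_growth = growth_factor(100.0, 2.0, YEARS)

print(f"Moore's-law-style growth over {YEARS:.0f} years: {moore_growth:,.0f}X")
print(f"AI training compute growth over {YEARS:.0f} years: {ai_growth:,.0f}X")
print(f"Gap between the two curves: {ai_growth / moore_growth:,.0f}X")
```

Under these assumptions the two curves differ by several orders of magnitude within a few years, which is why training energy can no longer be treated as an afterthought.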
Fast forward to 2019, and quickly developing innovative neural networks for Natural Language Processing, such as self-attention encoder-decoder models that employ Neural Architecture Search (NAS) methods, may require hundreds or thousands of distributed GPUs. According to a recent University of Massachusetts Amherst study, the CO2 emitted by power plants to supply the computation involved in creating one new state-of-the-art AI model was equivalent to the lifetime CO2 emissions of five automobiles. If that’s what it takes to train only one new AI model, you can see that it is just not compatible with prioritizing sustainability.
I believe we can incentivize the AI industry to make a change in the overall lifetime energy budget for AI workloads, and identify startups that are already committed to this cause.
Where do you see the biggest opportunities for the highest impact energy savings?
My colleagues and I think it’s joint optimization of three things: energy-efficient AI hardware, co-designed efficient AI algorithms and AI-aware computer networks.
The challenge is that the energy consumption of AI models is likely the last thing an AI algorithm developer is thinking about (unless they’re focused on mobile phones). Early optimizations usually focus on performance. AI engineers often think “what’s my peak accuracy” and “how fast can I train the model,” both of which need faster computing and more energy.
I support a new success metric to help incentivize the AI industry and startups to reduce energy and CO2 emissions at data center scale. We need to shift the focus to higher throughput and lower lifetime total cost of ownership of a system for given computing workloads. I stress “system” because often hardware marketing metrics forget to mention the energy cost of extra processors, memory, and networks required for an AI training system.
Success Metric = Workload Throughput ÷ [ ($ Cost of System) + ($ Cost of Lifetime Energy of System) ]
Throughput measures how fast we can compute the required AI algorithms. In the phraseology of the late Harvard Business School Professor Clayton Christensen, workload throughput is the “job” that matters at the end of the day. It is not peak Floating Point Operations Per Second (FLOPS), which are magical, mystical marketing numbers only loosely related to getting the computational “job” done.
The denominator of this ratio is the computing hardware cost plus the lifetime energy cost of operating that hardware, including cooling and any extra network and processors required.
With this new ratio, AI designers have far more degrees of freedom to optimize software, hardware and algorithms. For example, the power consumption of an AI chip itself — whether it is 50 watts or 450 watts — doesn’t matter as much. The lifetime energy consumption of many chips to deliver a certain workload throughput is what matters most. If we can maximize this success ratio, then by definition energy and CO2 emissions are reduced as well.
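As a minimal sketch of how the success metric might be computed, the example below compares two hypothetical systems that deliver the same workload throughput: one built from many 50-watt chips and one built from fewer 450-watt chips. Every number here (chip counts, hardware cost, the 1.5 overhead multiplier for cooling, memory and networking, the four-year lifetime and the electricity price) is an illustrative assumption, not data from the interview, and the class and field names are invented for this example.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class AISystem:
    """A whole training system, not a single chip (all fields hypothetical)."""
    name: str
    throughput: float                 # workload units per second, e.g. samples/s
    hardware_cost_usd: float          # processors, memory and network combined
    num_chips: int
    chip_power_w: float               # average power per chip, in watts
    overhead_factor: float            # assumed multiplier for cooling, memory, network
    lifetime_years: float
    energy_price_usd_per_kwh: float

    def system_power_kw(self) -> float:
        # Total wall power: all chips, scaled by the assumed overhead multiplier.
        return self.num_chips * self.chip_power_w / 1000.0 * self.overhead_factor

    def lifetime_energy_cost_usd(self) -> float:
        hours = HOURS_PER_YEAR * self.lifetime_years
        return self.system_power_kw() * hours * self.energy_price_usd_per_kwh

    def success_metric(self) -> float:
        # Workload throughput / (system cost + lifetime energy cost)
        total_cost = self.hardware_cost_usd + self.lifetime_energy_cost_usd()
        return self.throughput / total_cost


# Two made-up systems with identical throughput and hardware cost.
many_small = AISystem("400 x 50 W chips", 10_000.0, 400_000.0, 400, 50.0, 1.5, 4.0, 0.10)
few_large = AISystem("32 x 450 W chips", 10_000.0, 400_000.0, 32, 450.0, 1.5, 4.0, 0.10)

for system in (many_small, few_large):
    print(
        f"{system.name}: lifetime energy ${system.lifetime_energy_cost_usd():,.0f}, "
        f"metric {system.success_metric():.5f} units/s per dollar"
    )
```

With these assumed inputs, the system built from 450-watt chips scores higher because it needs far fewer chips to reach the same throughput, so its lifetime energy bill is lower. Different assumptions could reverse the ranking, which is the point of evaluating the whole system and not the per-chip wattage.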
Why change the “performance” mindset that has been the status quo for so long?
AI has an existential problem. As its models grow larger and more computationally complex, and as more accuracy is demanded to reach human performance levels, the energy required to train those models increases exponentially. At some point, if things continue as they have, researchers won’t be able to get enough computers or energy to create the new AI algorithms we want.
I’m really worried about that potentially stalling AI innovation. Not many research labs can string together 4,000 leading-edge processors and run them for weeks. They just don’t have the resources to deploy exascale computers. So at some point — without change — we have the potential to reach a ceiling of innovation. I’d hate to see another AI winter.
If our AI industry innovates around the success metric, then we will benefit from AI that is more compatible with sustainability, yet meets performance goals with hardware that uses less lifetime energy, more efficient AI algorithms and lower-energy infrastructure.