This week, Microsoft and Nvidia announced that they trained what they claim is one of the largest and most capable AI language models to date: Megatron-Turing Natural Language Generation (MT-NLG). MT-NLG contains 530 billion parameters — the parts of the model learned from historical data — and achieves leading accuracy in a broad set of tasks, including reading comprehension and natural language inference.
But building it didn’t come cheap. Training took place across 560 Nvidia DGX A100 servers, each containing 8 Nvidia A100 80GB GPUs. Experts peg the cost in the millions of dollars.
Like other large AI systems, MT-NLG raises questions about the accessibility of cutting-edge research approaches in machine learning. AI training costs dropped 100-fold between 2017 and 2019, but the totals still exceed the compute budgets of most startups, governments, nonprofits, and colleges. The inequity favors corporations and world superpowers with extraordinary access to resources at the expense of smaller players, cementing incumbent advantages.
For example, in early October, researchers at Alibaba detailed M6-10T, a language model containing 10 trillion parameters (roughly 57 times the size of OpenAI’s GPT-3) trained across 512 Nvidia V100 GPUs for 10 days. The cheapest V100 plan available through Google Cloud Platform costs $2.28 per hour, which would equate to over $300,000.
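A back-of-the-envelope check of that figure, assuming every GPU-hour is billed at the quoted on-demand rate (and ignoring storage, networking, CPU hosts, and any discounts):

```python
# Rough cost of the reported M6-10T run: 512 V100 GPUs for 10 days
# at $2.28 per GPU-hour (the cheapest GCP on-demand V100 rate cited
# in the article; real bills also include storage and networking).
gpus = 512
days = 10
rate_per_gpu_hour = 2.28

gpu_hours = gpus * days * 24
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ${cost:,.0f}")
# prints: 122,880 GPU-hours -> $280,166
```

List GPU pricing alone lands near $280,000; the $300,000-plus figure presumably folds in the ancillary costs above.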
Google subsidiary DeepMind is estimated to have spent $35 million training a system to learn the Chinese board game Go. And when the company’s researchers designed a model to play StarCraft II, they purposefully didn’t try multiple ways of architecting a key component because the training cost would have been too high. Similarly, OpenAI didn’t fix a mistake when it implemented GPT-3 because the cost of training made retraining the model infeasible.
Paths forward
It’s important to keep in mind that training costs can be inflated by factors other than an algorithm’s technical aspects. As Yoav Shoham, Stanford University professor emeritus and cofounder of AI startup AI21 Labs, recently told Synced, personal and organizational considerations often contribute to a model’s final price tag.
“[A] researcher might be impatient to wait three weeks to do a thorough analysis and their organization may not be able or wish to pay for it,” he said. “So for the same task, one could spend $100,000 or $1 million.”
Still, the increasing cost of training — and storing — algorithms like Huawei’s PanGu-Alpha, Naver’s HyperCLOVA, and the Beijing Academy of Artificial Intelligence’s Wu Dao 2.0 is giving rise to a cottage industry of startups aiming to “optimize” models without degrading accuracy. This week, former Intel exec Naveen Rao launched a new company, MosaicML, to offer tools, services, and training methods that improve AI system accuracy while lowering costs and saving time. MosaicML — which has raised $37 million in venture capital — competes with Codeplay Software, OctoML, Neural Magic, Deci, CoCoPie, and NeuReality in a market that’s expected to grow exponentially in the coming years.
In a sliver of good news, the cost of basic machine learning operations has been falling over the past few years. A 2020 OpenAI survey found that since 2012, the amount of compute needed to train a model to the same performance on classifying images in a popular benchmark — ImageNet — has been decreasing by a factor of two every 16 months.
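To see what that halving rate implies, here is a minimal sketch of the trend (the smooth 16-month halving is an assumed idealization of OpenAI’s measurement):

```python
# Algorithmic-efficiency trend: the compute needed to reach a fixed
# ImageNet accuracy halves roughly every 16 months (per the 2020
# OpenAI survey cited above; treated here as a smooth exponential).
def relative_compute(months_elapsed, halving_period_months=16):
    """Fraction of the original compute needed after `months_elapsed`."""
    return 0.5 ** (months_elapsed / halving_period_months)

# 2012 to 2019 is roughly 84 months:
improvement = 1 / relative_compute(84)
print(f"~{improvement:.0f}x less compute for the same accuracy")
# prints: ~38x less compute for the same accuracy
```

The idealized curve gives roughly a 38-fold reduction over seven years; the actual measured improvement can differ somewhat from the smooth trend.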
Approaches like network pruning prior to training could lead to further gains. Research has shown that parameters pruned after training, a process that decreases the model size, could have been pruned before training without any effect on the network’s ability to learn. Called the “lottery ticket hypothesis,” the idea is that the initial values parameters in a model receive are crucial for determining whether they’re important. Parameters kept after pruning receive “lucky” initial values; the network can train successfully with only those parameters present.
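As a concrete illustration, here is a minimal sketch of magnitude-based pruning — the criterion commonly used in lottery-ticket experiments — on a toy weight matrix (the layer shape and 90% sparsity level are arbitrary choices for illustration, not values from the research above):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256))  # toy dense layer

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    # value at sorted position k of the absolute weights
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    mask = np.abs(w) >= threshold
    return w * mask, mask

pruned, mask = magnitude_prune(weights, 0.9)
print(f"kept {mask.mean():.0%} of parameters")
# prints: kept 10% of parameters
```

The mask marks the surviving “lottery tickets”; under the hypothesis, retraining from the original initial values with only these parameters active can match the dense network’s accuracy.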
Network pruning is far from a solved science, however. New ways of pruning that work before or in early training will have to be developed, as most current methods apply only retroactively. And when parameters are pruned, the resulting structures aren’t always a fit for the training hardware (e.g., GPUs), meaning that pruning 90% of parameters won’t necessarily reduce the cost of training a model by 90%.
Whether through pruning, novel AI accelerator hardware, or techniques like meta-learning and neural architecture search, the need for alternatives to unattainably large models is quickly becoming clear. A University of Massachusetts Amherst study showed that using 2019-era approaches, training an image recognition model with a 5% error rate would cost $100 billion and produce as much carbon emissions as New York City does in a month. As Spectrum’s editorial team wrote in a recent piece: “we must either adapt how we do deep learning or face a future of much slower progress.”
Microsoft and Nvidia team up to train one of the world's largest language models
Microsoft and Nvidia claim to have trained one of the world’s largest natural language models, containing 530 billion parameters.
AI technology could reshape the U.S. government, but should it?
Federal spending on AI in the U.S. rose by 50% between 2018 and 2020, making it the fastest rate of growth for any emerging technology.
AI lab DeepMind becomes profitable and bolsters relationship with Google
While this could be great news for DeepMind, which has always hemorrhaged money, the AI lab’s financial reports are also notably vague.
Facebook quietly acquires synthetic data startup AI.Reverie
Facebook has quietly acquired AI.Reverie, a startup that developed a platform and tools for synthetic data generation.
DeepMind is developing one algorithm to rule them all
Deep learning powers some of the most iconic AI apps, but deep learning models need retraining to be applied in new domains.
Google delivers collection of smart device 'essentials' for the enterprise
Google has announced Intelligent Product Essentials, an array of components for managing and leveraging data from smart devices.
Facebook introduces dataset and benchmarks to make AI more 'egocentric'
Facebook’s latest long-term research project, Ego4D, focuses on developing AI with an ‘egocentric,’ first-person perspective.
Americans Need a Bill of Rights for an AI-Powered World
The White House Office of Science and Technology Policy is developing principles to guard against powerful technologies—with input from the public. (via WIRED)
China has won AI battle with U.S., Pentagon's ex-software chief says
China has won the artificial intelligence battle with the United States and is heading towards global dominance because of its technological advances, the Pentagon’s former software chief told the Financial Times. (via Reuters)
These neural networks know what they’re doing
MIT researchers have demonstrated that a special class of deep learning neural networks is able to learn the true cause-and-effect structure of a navigation task during training. (via MIT News)
Duke Professor Wins $1 Million Artificial Intelligence Prize, A ‘New Nobel’
Cynthia Rudin becomes the second recipient of the AAAI Squirrel AI Award for pioneering socially responsible AI. (via Duke Pratt School of Engineering)