For about a year and a half, perhaps the most powerful artificial intelligence language model in the world was GPT-3, created by OpenAI, the research lab co-founded by Elon Musk. The Megatron-Turing Natural Language Generation model (MT-NLG) is now the largest and most powerful language generation model in the world. Its 530 billion parameters are three times as many as GPT-3's.
The number of parameters indicates the scale of the model shaped during AI training, and thus the quality of the text it produces. GPT-3's 175 billion parameters were already enormous compared to its predecessor, GPT-2, which handled only one and a half billion. The larger parameter count also showed: GPT-3 had capabilities that no one expected, such as the ability to write program code or to complete images and fill in their missing parts.
Megatron-Turing roughly triples this. For the training, Nvidia provided 560 servers, each containing eight GPUs with 80 GB of memory. A dataset called The Pile was used for training, which includes, among other things, the full Wikipedia, the PubMed database of medical articles, and source code from the GitHub code-hosting service. The 825 GB text collection was filtered for quality and supplemented with data from Common Crawl, a non-profit organization that collects billions of web pages in a format ready for data mining.
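To get a sense of why this much hardware is needed, here is a rough back-of-the-envelope sketch. The 2-bytes-per-parameter figure is an assumption (16-bit floating-point weights), not a detail from the article; even under that generous assumption, the weights alone exceed a single 80 GB card many times over, which is why the model has to be split across thousands of GPUs.

```python
# Illustrative estimate only: how much memory do 530 billion parameters need,
# assuming 16-bit (2-byte) weights? (The byte count is an assumption.)

PARAMS = 530e9          # Megatron-Turing NLG parameter count
BYTES_PER_PARAM = 2     # assumed fp16 storage
GPU_MEMORY_GB = 80      # per-GPU memory mentioned in the article
GPUS = 560 * 8          # 560 servers x 8 GPUs each = 4,480 GPUs

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: ~{weights_gb:,.0f} GB")                        # ~1,060 GB
print(f"Single GPU capacity: {GPU_MEMORY_GB} GB")                     # far too small
print(f"Minimum GPUs just to hold the weights: {weights_gb / GPU_MEMORY_GB:.0f}")
print(f"Total cluster memory: {GPUS * GPU_MEMORY_GB:,} GB")           # 358,400 GB
```

The remaining headroom in the cluster is what training actually consumes: gradients, optimizer state, and activations typically require several times more memory than the weights themselves.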
The end result of the $85 million training is a language model capable of completing sentences, comprehending text, reasoning, drawing linguistic inferences, and interpreting the meaning of words. As with GPT-3, unexpected abilities are likely to surface only once Megatron-Turing comes into wide use. That, however, is still some way off, as it has not yet been announced when corporate developers will be able to try it out.