ChatGPT has been trending for a while now, with some calling it the 'Google killer'. The hero of the hour is GPT-3, one of the most famous large language models (LLMs), developed by OpenAI. GPT-3 has 175 billion parameters, making it one of the largest language models ever created. It can generate human-like text and perform a wide variety of tasks, including translation, summarization, and even writing code.
While OpenAI hit that sweet spot with GPT-3, DeepMind, Google, Meta, and other players have also developed their own language models, some with 10 times more parameters than GPT-3.
Here is a list of the best alternatives to GPT-3 that you can try for building your own natural language processing applications, such as chatbots.
Read: These 8 possible use cases of ChatGPT will blow your mind!
Developed by a group of more than 1,000 AI researchers, Bloom is an open-source multilingual language model that is considered the best alternative to GPT-3. It has 176 billion parameters, one billion more than GPT-3, and required 384 graphics cards for training, each with over 80 gigabytes of memory.
The language model, developed through the BigScience Workshop by Hugging Face, has been trained on 46 natural languages and 13 programming languages, and is also available in several smaller-parameter versions.
Developed by Google, GLaM is a mixture-of-experts (MoE) model, meaning it consists of several sub-models that specialize in different kinds of input. It is one of the largest models available, with 1.2 trillion parameters across 64 experts per MoE layer. During inference, the model activates only 97 billion parameters per token prediction.
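The idea behind MoE routing can be sketched in a few lines: a gating network scores all experts, but each token is processed by only its top few. The code below is a simplified illustration of this routing pattern, not GLaM's actual implementation; the dimensions and gating scheme are made up for demonstration.

```python
import numpy as np

def moe_layer(x, experts, gate_weights, top_k=2):
    """Route each token to its top-k experts (simplified MoE-style gating).

    x: (tokens, dim) token activations
    experts: list of (dim, dim) weight matrices, one per expert
    gate_weights: (dim, n_experts) gating projection
    """
    logits = x @ gate_weights                      # (tokens, n_experts)
    top = np.argsort(-logits, axis=1)[:, :top_k]   # indices of best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # softmax over only the selected experts' scores
        scores = logits[t, top[t]]
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        for p, e in zip(probs, top[t]):
            out[t] += p * (x[t] @ experts[e])      # only top_k experts run
    return out

rng = np.random.default_rng(0)
dim, n_experts = 8, 64
x = rng.normal(size=(4, dim))
experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
gate = rng.normal(size=(dim, n_experts))
y = moe_layer(x, experts, gate)
print(y.shape)  # (4, 8): same output shape, but each token used only 2 of 64 experts
```

This is why a 1.2-trillion-parameter model can activate only a fraction of its weights per token: the untouched experts simply never run.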
DeepMind developed Gopher with 280 billion parameters, and it specializes in answering scientific and humanities questions much better than other language models. DeepMind claims the model can beat language models 25 times its size and compete with GPT-3 on logical reasoning problems. Smaller versions, down to 44 million parameters, are also available to make research easier.
Read: Google GLaM vs DeepMind Gopher
NVIDIA and Microsoft teamed up to create one of the largest language models, with 530 billion parameters. The model was trained on the NVIDIA DGX SuperPOD-based Selene supercomputer and is one of the most powerful English-language models. Megatron-Turing Natural Language Generation (MT-NLG) is a 105-layer, transformer-based LLM that outperforms state-of-the-art models in zero-, one-, and few-shot settings.
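A quick back-of-envelope check shows how 105 layers adds up to 530 billion parameters. This sketch assumes the commonly used dense-transformer approximation of roughly 12 x layers x hidden² parameters and MT-NLG's published hidden size of 20,480; it ignores embeddings and biases.

```python
# Rough parameter count for a dense transformer: ~12 * layers * hidden^2
# (4*h^2 for the attention projections + 8*h^2 for the MLP block,
# ignoring embeddings and biases). Hidden size 20480 is taken from
# MT-NLG's published configuration.
layers, hidden = 105, 20480
params = 12 * layers * hidden**2
print(f"{params / 1e9:.0f}B")  # ≈ 528B, close to the quoted 530 billion
```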
Another model developed by DeepMind, and touted as a GPT-3 killer, Chinchilla is a compute-optimal model built on 70 billion parameters but trained on four times more data. The model outperformed Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on several downstream evaluation tasks, and it requires far less computing power for fine-tuning and inference. The researchers found that instead of increasing the number of parameters, scaling the number of training tokens, that is, the amount of text data, is the key to better-performing language models.
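The Chinchilla result is often summarized as a rule of thumb: train on roughly 20 tokens per parameter. The sketch below applies that heuristic; the 20-tokens-per-parameter figure is a common simplification of the paper's scaling laws, not an exact prescription.

```python
# Chinchilla's compute-optimal rule of thumb (simplified):
# scale training tokens with parameters, roughly 20 tokens per parameter.
def compute_optimal_tokens(n_params, tokens_per_param=20):
    return n_params * tokens_per_param

for name, n in [("GPT-3", 175e9), ("Gopher", 280e9), ("Chinchilla", 70e9)]:
    print(f"{name}: {compute_optimal_tokens(n) / 1e12:.1f}T tokens")
# Chinchilla: 70B params * 20 = 1.4T tokens, matching its actual training run
```

Under this heuristic, much larger models like Gopher would have needed several trillion training tokens to be compute-optimal, which is why a smaller model trained on more data could beat them.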
Another language model developed by Google, PaLM, trained on 540 billion parameters, is a dense decoder-only transformer model trained with the Pathways system. It was the first language model to use the Pathways system to train at large scale, across 6,144 chips, the largest TPU-based training configuration at the time. The model outperformed other models on 28 of 29 English NLP tasks.
Read: David vs. Goliath: Does Chinchilla fare well against Google AI’s PaLM?
Google took its neural network-based NLP pre-training technique and developed BERT (Bidirectional Encoder Representations from Transformers). The model comes in two versions: BERT Base uses 12 layers of transformer blocks and 110 million trainable parameters, while BERT Large uses 24 layers and 340 million trainable parameters.
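"Bidirectional" here means every token attends to context on both its left and right, unlike GPT-style models, which only look leftward. The sketch below contrasts the two attention masks in their simplest form; it is an illustration of the concept, not BERT's actual code.

```python
import numpy as np

def causal_mask(n):
    """GPT-style: token i may only attend to positions <= i (its left context)."""
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n):
    """BERT-style: every token attends to the full sequence, left and right."""
    return np.ones((n, n), dtype=bool)

n = 4
print(causal_mask(n).astype(int))         # lower-triangular: no peeking ahead
print(bidirectional_mask(n).astype(int))  # all ones: full two-way context
```

Bidirectional attention is what lets BERT build rich representations for understanding tasks, at the cost of not being directly usable for left-to-right text generation.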
Developed by Google with 137 billion parameters, LaMDA was a revolution in the world of natural language processing. It was built by fine-tuning a group of Transformer-based neural language models. For pre-training, the team created a dataset of 1.5 trillion words, 40 times more than was used for previously developed models. LaMDA has already been used for zero-shot learning, program synthesis, and the BIG-bench workshop.
Open Pretrained Transformer (OPT), built by Meta, is a language model with 175 billion parameters. It is trained on openly available datasets, allowing for greater community involvement. The release includes the pre-trained models along with the code to train them. The model is currently licensed for non-commercial, research use only, and it can be deployed for inference using just 16 NVIDIA V100 GPUs, significantly fewer than comparable models require.
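A rough memory calculation shows why 16 V100s can be enough for inference. This is an illustrative back-of-envelope estimate, assuming fp16 weights and 32 GB V100 cards, and counting only the model weights (not activations or the KV cache).

```python
# Back-of-envelope: can OPT-175B's weights fit on 16 V100s?
params = 175e9
bytes_per_param = 2                 # fp16: 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
cluster_gb = 16 * 32                # 16 V100s at 32 GB each
print(weights_gb, cluster_gb)       # 350.0 512
assert weights_gb < cluster_gb      # the weights fit, with headroom to spare
```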
Amazon has also unveiled its own large language model, with 20 billion parameters. The Alexa Teacher Model (AlexaTM 20B) is a sequence-to-sequence (seq2seq) language model with state-of-the-art few-shot learning capabilities. What makes it different from the others is that it has both an encoder and a decoder, which improves machine translation performance. With 1/8 the number of parameters, Amazon's language model outperformed GPT-3 on the SQuADv2 and SuperGLUE benchmarks.