
Transformers cybertronian language translator

This meant you couldn't just speed up training by throwing more GPUs at them, which meant, in turn, you couldn't train them on all that much data.

This is where Transformers changed everything. They were developed in 2017 by researchers at Google and the University of Toronto, initially designed to do translation. But unlike recurrent neural networks, Transformers could be very efficiently parallelized. And that meant, with the right hardware, you could train some really big models. GPT-3, the especially impressive text-generation model that writes almost as well as a human, was trained on some 45 TB of text data, including almost all of the public web.

While the diagram from the original paper is a little scary, the innovations behind Transformers boil down to three main concepts: positional encodings, attention, and self-attention.

[Transformer diagram from the original paper]

So if you remember anything about Transformers, let it be this: combine a model that scales well with a huge dataset and the results will likely blow you away.

Let's start with the first one, positional encodings. Let's say we're trying to translate text from English to French. Remember that RNNs, the old way of doing translation, understood word order by processing words sequentially.
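To make that sequential bottleneck concrete, here's a minimal illustrative sketch (not code from any real translation system; the function and variable names are made up for the example) of an RNN's forward pass. Each hidden state depends on the previous one, so the loop has to walk through the sentence one word at a time and can't be spread across many GPUs.

import numpy as np

# Illustrative sketch only: a bare-bones RNN step. Each hidden state
# depends on the previous one, so the loop below must run strictly in
# order and can't be parallelized across the words of the sentence.
def rnn_forward(word_embeddings, W_hidden, W_input):
    hidden = np.zeros(W_hidden.shape[0])
    for x in word_embeddings:                      # one word at a time, in order
        hidden = np.tanh(W_hidden @ hidden + W_input @ x)
    return hidden                                  # a summary of the whole sentence

# Toy numbers: a 5-word "sentence" with 8-dim embeddings and 16 hidden units.
rng = np.random.default_rng(0)
sentence = rng.normal(size=(5, 8))
W_hidden = rng.normal(size=(16, 16)) * 0.1
W_input = rng.normal(size=(16, 8)) * 0.1
print(rnn_forward(sentence, W_hidden, W_input).shape)   # (16,)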

But this is also what made them hard to parallelize. Transformers get around this barrier via an innovation called positional encodings.
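Here's a minimal sketch, using NumPy, of the sinusoidal positional encodings described in the original "Attention Is All You Need" paper: each position in the sentence gets a unique pattern of sines and cosines that is simply added to the word embedding, so the model can recover word order even though it processes all the words at once.

import numpy as np

# Sinusoidal positional encodings from the original paper: position pos,
# dimension i gets sin(pos / 10000^(2i/d_model)) for even i and the
# matching cosine for odd i.
def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                         # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                           # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])                # even dims: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])                # odd dims: cosine
    return encoding

# The encodings are simply added to the word embeddings, so every word
# carries a fingerprint of its position into the (parallel) model.
word_embeddings = np.random.normal(size=(10, 512))             # 10 words, d_model = 512
model_inputs = word_embeddings + positional_encoding(10, 512)
print(model_inputs.shape)                                      # (10, 512)

Because these encodings are fixed functions of position rather than learned parameters, the model doesn't have to be shown every possible sentence length during training to know where each word sits.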
