This is absolutely the must-read post of the week. The intro:
The long reign of word vectors as NLP’s core representation technique has seen an exciting new line of challengers emerge: ELMo, ULMFiT, and the OpenAI transformer. These works made headlines by demonstrating that pretrained language models can be used to achieve state-of-the-art results on a wide range of NLP tasks. Such methods herald a watershed moment: they may have the same wide-ranging impact on NLP as pretrained ImageNet models had on computer vision.
I don’t follow NLP super-closely, but apparently these breakthrough results have been piling up over the course of 2018. I also hadn’t realized just how influential ImageNet has been:
Transfer learning via pretraining on ImageNet is in fact so effective in computer vision that not using it is now considered foolhardy.
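The core mechanic behind that claim is simple: keep the pretrained network frozen as a feature extractor and train only a small task-specific head on top. Here is a minimal toy sketch of that idea in pure Python — the "backbone," the data, and all names are invented for illustration, not any real model:

```python
# Toy sketch of transfer learning: a "pretrained" feature extractor is kept
# frozen, and only a small linear head is trained on the target task.
# The backbone, data, and task below are hypothetical illustrations.

def pretrained_features(x):
    """Stand-in for a frozen ImageNet-style backbone: maps a raw input
    to a fixed feature vector. In practice this would be a deep CNN."""
    return [x[0] + x[1], x[0] - x[1], x[0] * x[1]]

def train_head(data, epochs=100, lr=0.1):
    """Train only the linear head (weights + bias) on top of the frozen
    features -- the essence of 'pretrain, then fine-tune the top'."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)  # backbone output: never updated
            pred = sum(wi * fi for wi, fi in zip(w, f)) + b
            err = pred - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]  # head update only
            b -= lr * err
    return w, b

# Tiny toy task: label is 1 when both inputs are positive, else 0.
data = [([1, 1], 1), ([1, -1], 0), ([-1, 1], 0), ([-1, -1], 0)]
w, b = train_head(data)

def score(x):
    return sum(wi * fi for wi, fi in zip(w, pretrained_features(x))) + b

print(score([1, 1]) > 0.5)
```

The point of the sketch is the asymmetry: `pretrained_features` never changes during training, so only a handful of head parameters need to be fit on the (possibly small) target dataset.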
If this transition is real, it’s significant: advances of this import come along rarely. From the conclusion:
In light of the impressive empirical results of ELMo, ULMFiT, and OpenAI it only seems to be a question of time until pretrained word embeddings will be dethroned and replaced by pretrained language models in the toolbox of every NLP practitioner.
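The difference the conclusion is pointing at can be made concrete: a pretrained word embedding assigns one fixed vector per word regardless of context, while a pretrained language model produces a representation that depends on the surrounding words. A toy contrast, with invented vectors and a trivial stand-in for the language model (real models like ELMo compute this with deep bidirectional networks):

```python
# Hedged toy contrast between the two representation styles.
# The vectors and the "model" are invented purely for illustration.

# 1) Pretrained word embeddings: one fixed vector per word, context-free.
static_embeddings = {
    "bank": [0.5, 0.5],   # must blend its financial and river senses
    "river": [0.0, 1.0],
    "money": [1.0, 0.0],
}

def static_vector(word, context):
    """Word-vector lookup: the context argument is ignored entirely."""
    return static_embeddings[word]

# 2) Pretrained language model (stand-in): the representation of the
# same word shifts toward the vectors of its surrounding words.
def contextual_vector(word, context):
    vec = list(static_embeddings[word])
    for other in context:
        if other != word and other in static_embeddings:
            ov = static_embeddings[other]
            vec = [0.5 * v + 0.5 * o for v, o in zip(vec, ov)]
    return vec

print(static_vector("bank", ["river", "bank"]))      # identical in both contexts
print(static_vector("bank", ["money", "bank"]))
print(contextual_vector("bank", ["river", "bank"]))  # differs by context
print(contextual_vector("bank", ["money", "bank"]))
```

"Bank" gets the same static vector in "river bank" and "money bank," but a different contextual vector in each — which is why features from a pretrained language model can carry disambiguation that fixed embeddings cannot.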