An Introduction to Transfer Learning and HuggingFace
“NLP never stops”
On 16th March 2020, NLP Zurich organized its first web meetup, hosting Thomas Wolf from HuggingFace, with the attendance of Data Science Milan’s members.
“An Introduction to Transfer Learning and HuggingFace”, by Thomas Wolf, Chief Science Officer, HuggingFace
Transfer learning is a technique that consists of training a machine learning model on one task and reusing the knowledge gained on a different but related task.
It is a popular approach for training deep learning models in computer vision and natural language processing, where pre-trained models are reused to save the computational time that would otherwise be needed to develop neural networks for these problems from scratch.
In the traditional supervised learning approach, a machine learning model is trained on a labeled data set for a given task and domain and is expected to perform well on unseen data from the same task and domain.
Given data for some other task or domain, labeled data for that task or domain are again required to train a new model that can be expected to perform well on this new data.
The traditional approach breaks down when there is not enough labeled data for the task or domain we care about to train a reliable model: an existing model cannot simply be reused for a new task, because the domains differ.
The idea behind transfer learning is to store the knowledge gained while solving a source task in a source domain and to apply it to another, similar problem of interest. As Thomas explained, this is the same concept as learning by experience: we learn something and then use that knowledge to solve a similar task.
So why should we use transfer learning in NLP?
● Many NLP tasks, such as question answering, share common knowledge about language (underlying semantics…)
● The opportunity to reuse the huge quantity of unlabeled text available on the web
● Lack of annotated data
Transfer learning is behind state-of-the-art results on many NLP tasks; an example of its application comes from Named Entity Recognition, a task that started around 15 years ago.
There are several kinds of transfer learning in everyday NLP. Sequential transfer learning is the technique that has brought the biggest improvements so far. The general practice is to pretrain representations on a large unlabeled text corpus with a method of choice and then to adapt these representations to a supervised target task using labeled data, as in the sketch below.
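As a rough illustration of this two-step recipe, here is a minimal sketch using the HuggingFace Transformers library with PyTorch; the checkpoint name and the toy sentences and labels are placeholders for illustration, not material from the talk.

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Step 1: load representations pretrained on a large unlabeled corpus.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Step 2: adapt the representations to a supervised target task with labeled data.
# Two toy labeled examples stand in for a real target-task dataset.
texts = ["great talk on transfer learning", "the stream kept crashing"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy on the new task head
loss.backward()
optimizer.step()
```

In practice the adaptation step loops over a full labeled dataset for several epochs; the single update above only shows the shape of the workflow.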
Many currently successful pretraining approaches are based on language modeling. Benefits of language modeling are that it reduces the need for annotated data and that many languages have enough text available to learn consistent models. In addition, language modeling is versatile and enables learning both sentence and word representations with a variety of objective functions.
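As one concrete instance of such an objective, here is a small masked language modeling sketch with a publicly available BERT checkpoint; the example sentence is invented for illustration and is not from the talk.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# The pretraining objective: predict a masked word from its context.
text = "Transfer learning reduces the need for [MASK] data."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and the model's most likely filler for it.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_index].argmax().item()
print(tokenizer.decode([predicted_id]))
```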
Empirically, language modeling works better than other pretraining tasks such as translation or autoencoding.
For adapting a pretrained model to a target task, there are several possible workflows. The easiest is to remove the pre-training task head if it is not useful for the target task. Another is to keep the pre-trained model internals unchanged and add linear layers on top of the pre-trained model, or to use the model output as input to a separate model, which is often beneficial when the target task requires interactions that are not available in the pre-trained embeddings (a sketch of this workflow follows below). A last option is to modify the pre-trained model's internal architecture in order to adapt it to a structurally different target task, such as one with several input sequences, while initializing as much as possible of the target task model from the pre-trained one.
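A minimal sketch of the "keep the internals unchanged, add a head on top" workflow might look like the following; the class name and checkpoint are illustrative choices, not part of the talk.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ClassifierOnTop(nn.Module):
    """Freezes the pre-trained encoder and trains only a linear head on top."""

    def __init__(self, encoder_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for param in self.encoder.parameters():
            param.requires_grad = False  # keep the pre-trained internals unchanged
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, **inputs):
        # Use the hidden state of the first ([CLS]) token as a sentence representation.
        hidden = self.encoder(**inputs).last_hidden_state[:, 0]
        return self.head(hidden)

# Usage: only the linear head receives gradients during training.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ClassifierOnTop()
batch = tokenizer(["an example sentence"], return_tensors="pt")
logits = model(**batch)
```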
The goal of Hugging Face is to democratize NLP. They started as a conversational AI company, and the best way to achieve this objective was to catalyse and democratize research-level work in NLP as a whole: breaking down barriers between frameworks, sharing knowledge, and developing and open-sourcing tools for transfer learning in NLP. Examples of these open-source tools are the Transformers library and the Tokenizers library, sketched briefly below. Have a look at the video.
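To give a flavor of the two libraries, here is a tiny sketch; the question, context, and the corpus.txt file name are placeholders invented for illustration.

```python
# Transformers library: a ready-made pipeline built on pre-trained models.
from transformers import pipeline

qa = pipeline("question-answering")
answer = qa(
    question="What does Hugging Face open-source?",
    context="Hugging Face open-sources tools for transfer learning in NLP, "
            "such as the Transformers and Tokenizers libraries.",
)
print(answer["answer"])

# Tokenizers library: train a fast byte-level BPE tokenizer on raw text files.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=["corpus.txt"], vocab_size=30_000)  # corpus.txt is a placeholder
print(tokenizer.encode("NLP never stops").tokens)
```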
References:
The State of Transfer Learning in NLP
HuggingFace Tutorials
Written by Claudio G. Giancaterino