The Embeddings that came in from the Cold

Data Science Milan
3 min read · Dec 7, 2020

“from word2vec to prod2vec”

On 24th November 2020, Data Science Milan organized a web meetup hosting Jacopo Tagliabue and Christine Yu, who presented a scalable solution for training “product embeddings” in e-commerce shopping: a new way for neural networks to understand and process products.

“The Embeddings that came in from the Cold”, by Jacopo Tagliabue, Lead AI Scientist at Coveo, and Christine Yu, ML Developer at Coveo

Jacopo started by introducing the concept of embeddings. Word embedding is a method of extracting features from text: words are represented in a lower-dimensional space as numeric vectors that can be used as input to a machine learning model. The goal of word embeddings is to preserve syntactic and semantic information, unlike methods such as Bag of Words (BoW) and TF-IDF, which rely on word counts in a sentence without retaining any syntactic or semantic information, and which have the inconvenience of sparse matrices in which most elements are zero. The word embedding approach uses a dense distributed representation for each word: individual words are represented as real-valued vectors in a predefined vector space. The idea behind this method is the “distributional hypothesis”: the meaning of a word is determined by the context in which it appears, so words that appear in similar contexts have similar meanings and similar representations in the semantic space.
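To make this concrete (a minimal sketch, not from the talk; the toy corpus and parameters are arbitrary), embeddings of this kind can be trained with gensim’s Word2Vec:

```python
# Minimal sketch of training word embeddings with gensim (>= 4.0).
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# skip-gram (sg=1), 50-dimensional dense vectors.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

vec = model.wv["cat"]                # dense real-valued vector for "cat"
print(model.wv.most_similar("cat"))  # nearby words in the embedding space
```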

Speaking about product embeddings for an e-commerce catalog, the step is to move from word2vec to prod2vec: products are conceptualized the way words are, with each browsing session treated as a “sentence” whose items are products. The e-commerce catalog can then be represented in a three-dimensional space in which similar products are close while unrelated products are far from each other; each point represents a product, and the color reflects the category (sport, for instance) of that product.
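As a hedged sketch of the same recipe (not the speakers’ code; the product IDs and sessions below are invented), prod2vec can be obtained by feeding browsing sessions to Word2Vec and then projecting the vectors to 3D for the kind of visualization described above:

```python
# prod2vec as word2vec over browsing sessions (illustrative sketch).
from gensim.models import Word2Vec
from sklearn.decomposition import PCA

# Each session is the ordered list of product IDs a user interacted with.
sessions = [
    ["sku_101", "sku_102", "sku_103"],  # e.g. a "sneakers" browsing session
    ["sku_102", "sku_101", "sku_104"],
    ["sku_900", "sku_901"],             # e.g. an unrelated browsing session
]

prod2vec = Word2Vec(sessions, vector_size=48, window=5, min_count=1, sg=1)

# Reduce the product vectors to 3D to plot the catalog as a point cloud.
ids = list(prod2vec.wv.index_to_key)
points_3d = PCA(n_components=3).fit_transform(prod2vec.wv[ids])
```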

In the e-commerce space, very popular products, such as sneakers, end up with good embeddings: they are typically surrounded by semantically similar items and can be clustered together. Less popular products, instead, have worse embeddings, and new products have no embeddings at all, because no online users have browsed them yet.

The challenge is “cold start embeddings”: generating good embeddings for rare and new products as well. It can be overcome by exploiting the meta-data contained in the product catalog. The strategy is to learn, on popular products, the relationship between the meta-data in the catalog and the target position in the embedding space, that is, between the product embedding and attributes such as the category (e.g. sport).

If a neural network is able to predict the category of a product from its embedding, it means the embedding encodes something meaningful about that product.
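One simple way to run this check is a probing classifier (a sketch under the assumption of a linear probe, not necessarily what was used in the talk; the random arrays stand in for real embeddings and labels):

```python
# Probe: can a simple classifier recover the category from the embedding?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 48))    # stand-in for product embeddings
y = rng.integers(0, 3, size=200)  # stand-in for category labels

probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, y, cv=5)
print(scores.mean())  # high accuracy => embeddings encode category information
```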

Popular products are used to train a multi-input encoder (with BERT handling the textual meta-data); the learned mapping is then applied to rare/new products with similar meta-data, generating simulated vectors for them. A benefit of this approach is that it doesn’t require replacing the existing infrastructure.
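A heavily simplified sketch of the mapping step (the real system uses a multi-input encoder with BERT over textual meta-data; here a small Keras regressor over one-hot category features stands in for it, and all names and data are illustrative):

```python
# Sketch: learn meta-data -> embedding on popular products, then apply the
# mapping to cold-start products. Shapes and data are illustrative only.
import numpy as np
import tensorflow as tf

n_categories, emb_dim = 10, 48

# Training data: meta-data and prod2vec vectors of *popular* products.
meta_popular = np.eye(n_categories)[np.random.randint(0, n_categories, 500)]
emb_popular = np.random.normal(size=(500, emb_dim))  # stand-in for real vectors

mapper = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_categories,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(emb_dim),  # regress the target embedding
])
mapper.compile(optimizer="adam", loss="mse")
mapper.fit(meta_popular, emb_popular, epochs=5, verbose=0)

# Cold start: a new product gets a simulated vector from its meta-data alone.
new_meta = np.eye(n_categories)[[3]]
simulated_vector = mapper.predict(new_meta)
```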

Recording & Slides:

video

slides

References:

https://www.aclweb.org/anthology/2020.ecnlp-1.2/
https://arxiv.org/abs/2007.14906
https://dl.acm.org/doi/10.1145/3383313.3411477

Interesting related links:

http://blog.coveo.com/solving-the-cold-start-problem/

Written by Claudio G. Giancaterino

