Application of Embeddings

Embeddings capture important properties of the entities they represent. Usually, similar entities are represented by vectors that lie close to each other in the embedding space.

Embeddings are an excellent tool for similar-entity search: entities that behave in a similar way usually end up close together in the embedding space, so finding similar entities amounts to computing the cosine distance between their embeddings. For visual inspection, we can additionally apply a dimensionality reduction technique such as UMAP.
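As a minimal sketch of both steps, the snippet below computes pairwise cosine similarities over a toy matrix of brand embeddings and projects them to 2-D with UMAP. The brand names and the random `embeddings` matrix are made-up stand-ins for vectors an algorithm such as Cleora would produce.

```python
import numpy as np
import umap  # pip install umap-learn

# Made-up stand-ins for brand embeddings (one row per brand); real vectors
# would come from an embedding algorithm such as Cleora.
brands = ["cinemood", "optoma", "xgimi", "rossignol", "nordica"]
embeddings = np.random.default_rng(0).normal(size=(len(brands), 128))

# Cosine similarity of all pairs: L2-normalize the rows, then take dot
# products. Cosine distance is simply 1 - similarity.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T

# Nearest neighbor of "optoma" (index 1 of the argsort skips the brand itself).
i = brands.index("optoma")
nearest = np.argsort(-similarity[i])[1]
print(f"closest to optoma: {brands[nearest]}")

# 2-D projection for visual inspection (n_neighbors lowered for this tiny toy set).
coords = umap.UMAP(n_components=2, n_neighbors=3, random_state=42).fit_transform(embeddings)
```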

In the following example, we use the dataset from the REES46 Marketing Platform (https://rees46.com/), freely available at https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store. The same example can be found in the Tutorial within the Cleora platform. We embed the product brands.
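As a hedged sketch of the data preparation, the snippet below loads one month of the Kaggle dataset and extracts the (user, brand) interaction pairs from which brand embeddings can be trained. The file name and column names follow the Kaggle listing; verify them against your downloaded copy.

```python
import pandas as pd

# One month of the REES46 e-commerce dataset (file name per the Kaggle listing).
events = pd.read_csv("2019-Oct.csv", usecols=["user_id", "brand", "event_type"])

# Keep purchase events with a known brand; each (user, brand) pair becomes one
# interaction edge that a graph-embedding algorithm such as Cleora can consume.
purchases = events[(events["event_type"] == "purchase") & events["brand"].notna()]
pairs = purchases[["user_id", "brand"]].drop_duplicates()
pairs.to_csv("user_brand_pairs.tsv", sep="\t", index=False, header=False)
```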

Consider these three companies, all producing projectors: cinemood, optoma, and xgimi.

Their Cleora embedding vectors are located very close to each other, which means that with Cleora we can easily discover that these brands operate similar businesses.

[Figure: 2-D projection of the brand embeddings; the points labeled cinemood, optoma, and xgimi lie close together.]

Other types of companies are located elsewhere in the embedding space. For example, Rossignol and Nordica, which produce winter sports gear, fall into an entirely different cluster.

[Figure: 2-D projection of the brand embeddings; Rossignol and Nordica form a cluster far away from the projector brands cinemood, optoma, and xgimi.]

Embeddings as Input to ML Models

Embeddings are used in the most widespread approaches to modeling recommendation, propensity, churn, and other predictive tasks based on user behavior.

Usually, two types of embeddings are needed: user embeddings and item/category/brand/... embeddings. Item embeddings are computed directly by algorithms such as Cleora, while user embeddings are constructed from the embeddings of the items a user has interacted with. Depending on the available resources, this can be done in several ways, from simple average pooling of item embeddings to training a dedicated model on top of them.
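The simplest of these approaches is average pooling, sketched below; the item ids and their embedding vectors are hypothetical placeholders for the output of an algorithm such as Cleora.

```python
import numpy as np

# Hypothetical item embeddings keyed by item id (in practice, Cleora output).
item_embeddings = {
    "item_a": np.array([0.1, 0.9, 0.0]),
    "item_b": np.array([0.2, 0.8, 0.1]),
    "item_c": np.array([0.9, 0.1, 0.3]),
}

def user_embedding(interacted_items: list[str]) -> np.ndarray:
    """Build a user embedding by average-pooling the embeddings of the
    items the user has interacted with."""
    vectors = [item_embeddings[item] for item in interacted_items]
    return np.mean(vectors, axis=0)

# A user who interacted with item_a and item_b lands between those two items.
print(user_embedding(["item_a", "item_b"]))
```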

Fixed user and item embeddings offer many performance advantages. For example, instead of storing whole embeddings, it is possible to store similarity scores between user representations and item embeddings, and likewise between pairs of item embeddings. Such scores can be kept for only a limited number of the most similar pairs. This makes storing very long embedding vectors unnecessary, and the similarity scores are computed only once per embedding version, which saves time and resources (see the sketch below).
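The sketch below illustrates this idea under the same assumptions as before: similarity scores are computed once per embedding version, and only the top-k pairs per item are kept, so the long embedding vectors themselves never need to be stored.

```python
import numpy as np

def topk_similarities(embeddings: np.ndarray, k: int = 10):
    """Precompute, once per embedding version, the k most similar items
    per item; store only (neighbor index, score) pairs instead of the
    full embedding vectors."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # exclude each item's self-similarity
    top_idx = np.argsort(-sims, axis=1)[:, :k]
    top_scores = np.take_along_axis(sims, top_idx, axis=1)
    return top_idx, top_scores

# 1,000 items with 1,024-dim embeddings reduced to 10 scores per item.
emb = np.random.default_rng(0).normal(size=(1000, 1024))
idx, scores = topk_similarities(emb, k=10)
```

The brute-force pairwise computation above is quadratic in the number of items; for large catalogs, an approximate nearest-neighbor index such as FAISS would typically replace it.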

Interesting approaches are presented, for example, in the following papers: