Your Own Initial Embeddings

Cleora generally handles embedding initialization on its own. However, your relational data may have additional properties that you want to use to enhance Cleora embeddings. For example, your products could have images or photos, represented by embedding vectors obtained from methods such as SimCLR or CLIP. You can use these embeddings to initialize Cleora. This way, the Cleora embeddings will express not only behavioral relations but also knowledge about image similarities.

Note 1: Initial embeddings can be given only to entities in Column 2 in the Input File.

Note 2: You can use any embedding dimension. Cleora will adjust.

Note 3: You can give initial embeddings to only SOME entities. Cleora uses the provided embeddings where they are available and initializes all other entities with its standard initialization method.
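
As a minimal sketch of Note 3, the snippet below builds an ID array and an embedding matrix for only those entities that have an external embedding. The product names, the `image_embeddings` dictionary, and the `all_entity_ids` list are hypothetical placeholders, not part of Cleora. Only NumPy is used.

```python
import numpy as np

# Hypothetical data: image embeddings exist for only some products.
image_embeddings = {
    "product_1": np.array([0.1, 0.2, 0.3, 0.4]),
    "product_3": np.array([0.5, 0.6, 0.7, 0.8]),
}

# Hypothetical list of all entity ids that appear in Column 2 of the Input File.
all_entity_ids = ["product_1", "product_2", "product_3", "product_4"]

# Keep only the entities that actually have an embedding, in one fixed order,
# so that the i-th id always belongs to the i-th matrix row.
covered_ids = [eid for eid in all_entity_ids if eid in image_embeddings]
your_entity_ids_arr = np.array(covered_ids)
your_embedding_matrix = np.stack([image_embeddings[eid] for eid in covered_ids])

print(your_entity_ids_arr.shape, your_embedding_matrix.shape)
```

Entities left out here (product_2 and product_4) would simply receive Cleora's standard initialization.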

Option 1: .tsv file

Prepare a .tsv file in which the first column contains the entity identifiers (corresponding to Column 2 in the Input File) and the remaining columns contain the embedding values.

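>>> import numpy as np
>>> your_embedding_matrix.shape
(171002, 129)
# there are 171002 entity ids
# the embedding size is 128; the first column is the entity id

# note: with its default fmt, np.savetxt writes every column (ids included) as floats in scientific notation
>>> np.savetxt("your_initial_embeddings.tsv", your_embedding_matrix, delimiter="\t")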
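
If your entity identifiers are strings, or you want numeric identifiers written exactly as they appear in the Input File, one possible approach is to write the rows yourself. This is a sketch under the assumption that the file needs nothing beyond the layout described above (identifier first, then tab-separated values). The identifiers and values are hypothetical.

```python
import numpy as np

# Hypothetical inputs: string ids and a small embedding matrix (one row per id).
your_entity_ids_arr = np.array(["product_1", "product_3"])
your_embedding_matrix = np.array([[0.1, 0.2, 0.3],
                                  [0.4, 0.5, 0.6]])

out_path = "your_initial_embeddings.tsv"

with open(out_path, "w", encoding="utf-8") as f:
    for entity_id, row in zip(your_entity_ids_arr, your_embedding_matrix):
        # Write the id verbatim, then the embedding values separated by tabs.
        values = "\t".join(repr(float(v)) for v in row)
        f.write(f"{entity_id}\t{values}\n")
```

Writing the identifier as text keeps it identical to the string used in the Input File, which a purely numeric matrix passed to np.savetxt cannot guarantee.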

Option 2: .npz file

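Prepare a .npz file that stores two arrays: the embedding matrix under the key embeddings and the entity identifiers (corresponding to Column 2 in the Input File) under the key entity_ids.

>>> import numpy as np
>>> your_entity_ids_arr.shape
(171002,)
# there are 171002 entity ids
>>> your_embedding_matrix.shape
(171002, 128)
# the embedding size is 128
>>> np.savez("your_initial_embeddings.npz", embeddings=your_embedding_matrix, entity_ids=your_entity_ids_arr)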

IMPORTANT: your_entity_ids_arr and your_embedding_matrix must have the same length and the same ordering: the first entity ID in your_entity_ids_arr corresponds to the first row of your_embedding_matrix, the second ID to the second row, and so on.
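
The ordering requirement can be checked before and after saving. The following is a minimal sketch with hypothetical identifiers and values. It uses only np.savez and np.load with the embeddings and entity_ids keys shown above.

```python
import numpy as np

# Hypothetical inputs: one entity id per embedding row, in the same order.
your_entity_ids_arr = np.array(["product_1", "product_3"])
your_embedding_matrix = np.array([[0.1, 0.2, 0.3],
                                  [0.4, 0.5, 0.6]], dtype=np.float32)

# Sanity check: the number of ids must equal the number of matrix rows.
if your_entity_ids_arr.shape[0] != your_embedding_matrix.shape[0]:
    raise ValueError("entity ids and embedding rows are not aligned")

np.savez("your_initial_embeddings.npz",
         embeddings=your_embedding_matrix,
         entity_ids=your_entity_ids_arr)

# Reload the file to confirm that both arrays round-trip unchanged.
with np.load("your_initial_embeddings.npz") as loaded:
    ids_back = loaded["entity_ids"]
    emb_back = loaded["embeddings"]

print(ids_back[0], emb_back[0])
```

A length check cannot detect rows that were shuffled relative to the IDs, so it is safest to build both arrays from the same ordered list of entities.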

Examples of Bad Initial Embeddings

Here are examples of some bad practices regarding initial embeddings: