Usage

The Cleora interface consists of only a few options, and the default settings usually work well.

File Source

The method of uploading the input file. Currently Cleora supports the following upload methods:

Embedding initialization

Do you have your own initial embeddings? The default answer is NO, which means that Cleora initializes the embeddings itself. However, if you already have meaningful entity embeddings, you may want to try them as Cleora's initial embeddings. These can be, for example, text embeddings computed from item descriptions or image embeddings computed from item images. For more information, see Your Own Initial Embeddings.

If you have your own initial embeddings and select YES, there are three upload options: from local disk, from Google Drive, and from previously loaded files.

Dimensions

The dimensionality of the embeddings you will receive. Usually, a value such as 512 or 1024 is a good choice. Larger embeddings can store more information, but they often need more training data to be useful.

Note: This option disappears if you load your own initial embeddings. In that case, the embedding dimensionality is the same as that of the uploaded initial embeddings.
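
Because the dimensionality is taken from the uploaded embeddings, it can help to confirm before uploading that every vector has the same length. The snippet below is a minimal sketch of such a check. The function name infer_dimension, the entity IDs, and the in-memory dictionary layout are all hypothetical and say nothing about the file format Cleora expects.

```python
import numpy as np

def infer_dimension(initial_embeddings):
    """Return the vector length shared by all embeddings, or raise if they disagree."""
    dims = {len(vector) for vector in initial_embeddings.values()}
    if len(dims) != 1:
        raise ValueError(f"expected exactly one dimensionality, found {sorted(dims)}")
    return dims.pop()

# Hypothetical text embeddings keyed by entity ID (placeholder values).
own_embeddings = {
    "item_1": np.zeros(384),
    "item_2": np.ones(384),
}

dimension = infer_dimension(own_embeddings)
print(dimension)
```

If the check raises an error, the vectors were probably produced by different models or truncated during export, and should be regenerated before upload.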

Max Number of Iterations

The number of iterations of the Cleora algorithm. For typical sizes of enterprise relational data (10,000 to 10,000,000 users), the best value is usually between 2 and 4. However, we have also encountered cases where the best results were obtained with as few as 1 or as many as 15 iterations. The intuition is that too few iterations can produce uninformative embeddings, whereas too many can result in overfitting.

We recommend starting with smaller values, as overfitting will result in meaningless embeddings.
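
To build some intuition for the effect of the iteration count, the sketch below runs a simplified propagation loop on a toy graph. It assumes, based on a simplified reading of the published description of Cleora and not on its actual implementation, that each iteration averages every node's embedding over its neighbours and then L2-normalizes the rows. The function names, the toy graph, and the similarity measure are illustrative choices. The sketch does not model overfitting as such; it only shows how repeated averaging makes the embeddings of a small graph grow more alike, which is one way extra iterations can leave vectors less informative.

```python
import numpy as np

def propagate(adjacency, dim, iterations, seed=0):
    """Simplified Cleora-style loop: neighbour averaging followed by row-wise L2 normalization."""
    rng = np.random.default_rng(seed)
    # Row-normalize the adjacency matrix into a random-walk transition matrix.
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    # Random initial embeddings, scaled to unit length per node.
    emb = rng.uniform(-1.0, 1.0, size=(adjacency.shape[0], dim))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    for _ in range(iterations):
        emb = transition @ emb
        emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    return emb

def mean_pairwise_cosine(emb):
    """Average cosine similarity over all pairs of distinct unit-length rows."""
    sims = emb @ emb.T
    n = emb.shape[0]
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for a, b in edges:
    adjacency[a, b] = adjacency[b, a] = 1.0

for iterations in (1, 3, 15, 200):
    emb = propagate(adjacency, dim=8, iterations=iterations)
    print(iterations, round(mean_pairwise_cosine(emb), 4))
```

On this toy graph the mean pairwise similarity approaches 1 for very large iteration counts, meaning the nodes become almost indistinguishable. On real data, the best value should still be chosen by evaluating the embeddings on a downstream task.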