
AWS released an interesting feature as part of the SageMaker service called Object2Vec that lets you build an embedding for search out of pretty much anything: documents, users, products, recommendations, time series data, DNA, etc. The official documentation doesn't explain the underlying machine learning in much detail, and I'd like to understand it thoroughly enough that I could implement Object2Vec directly in TensorFlow or PyTorch.

Searching for "Object2Vec" only brings me back to the same AWS documentation because it seems AWS invented the term. On the other hand, when I search "Word2Vec" I get a wide variety of explanations from different sites. Are there other names for Object2Vec that I could use in my search as I try to learn more about this model architecture?

I'd also be happy with an answer that thoroughly explains Object2Vec so that I can skip my search entirely, though I probably have too many questions to put into a single post:

  1. Are the two encoders and the comparator updated jointly during training? Is a bad mistake by the comparator backpropagated through the encoders? (My current guess is in the first sketch after this list.)

  2. Are the outputs of the two encoders fed into a third, shared embedding layer? I would need this universal embedding if I wanted to implement my own approximate cosine similarity search (see the second sketch after this list).

  3. How closely related is Object2Vec to Word2Vec? Doc2Vec?

  4. What are some common flaws of Object2Vec that I should be aware of?
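
To make my questions concrete, here is my current mental model of the architecture as a minimal PyTorch sketch. The class name, dimensions, and the choice of mean-pooled embedding encoders are my own assumptions rather than anything stated in the AWS docs (which mention the encoders can also be CNNs or BiLSTMs):

    import torch
    import torch.nn as nn

    class TwoTowerSketch(nn.Module):
        """My guess at the Object2Vec layout: two encoders feeding a comparator."""
        def __init__(self, vocab_size_a, vocab_size_b, embed_dim=128):
            super().__init__()
            # Encoder for the "left" object, e.g. a document
            self.encoder_a = nn.EmbeddingBag(vocab_size_a, embed_dim, mode="mean")
            # Encoder for the "right" object, e.g. a user or a label
            self.encoder_b = nn.EmbeddingBag(vocab_size_b, embed_dim, mode="mean")
            # Comparator: takes both encodings and predicts a relationship score
            self.comparator = nn.Sequential(
                nn.Linear(2 * embed_dim, 64),
                nn.ReLU(),
                nn.Linear(64, 1),
            )

        def forward(self, tokens_a, tokens_b):
            enc_a = self.encoder_a(tokens_a)
            enc_b = self.encoder_b(tokens_b)
            return self.comparator(torch.cat([enc_a, enc_b], dim=-1)).squeeze(-1)

    model = TwoTowerSketch(vocab_size_a=10_000, vocab_size_b=5_000)
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters())

    # One training step on a fake batch of labeled pairs.
    tokens_a = torch.randint(0, 10_000, (32, 20))  # 32 documents, 20 token ids each
    tokens_b = torch.randint(0, 5_000, (32, 5))    # 32 users, 5 attribute ids each
    labels = torch.randint(0, 2, (32,)).float()    # 1 = related pair, 0 = unrelated

    loss = loss_fn(model(tokens_a, tokens_b), labels)
    loss.backward()   # this is question 1: does the comparator's error flow
    optimizer.step()  # all the way back into both encoders?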
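
And this is what I would like to be able to do afterwards, which only makes sense if both encoders land in the same embedding space (my question 2). Again, just a sketch building on the model above, not something taken from the AWS docs:

    import torch.nn.functional as F

    # After training, pull vectors out of each encoder separately and compare them
    # directly, e.g. as the scoring step of an approximate nearest-neighbour search.
    with torch.no_grad():
        doc_vectors = model.encoder_a(tokens_a)        # shape (32, 128)
        query_vector = model.encoder_b(tokens_b[:1])   # shape (1, 128)
        scores = F.cosine_similarity(query_vector, doc_vectors)  # broadcasts to (32,)
        best_match = scores.argmax().item()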

Ryan Zotti