Find similar items in a complex dataset

Question

I'm new to Machine Learning, but I want to learn more about this interesting topic through a practical example. I would appreciate any theoretical and practical help with it:

I have a database of "recipes" (~100,000). Each recipe is represented by an object of the following format:

{
  "id": "XXX-000-123",
  "origin": "asia",
  "spicyness": 2,
  "mainIngredient": "rice",
  "ingredients": 
  [
    {
      "name": "chicken",
      "amount": 300,
      "unit": "g"
    },
    {
      "name": "garlic",
      "amount": 10,
      "unit": "g"
    },
    {
      "name": "coconut-milk",
      "amount": 0.3,
      "unit": "l"
    }
  ]
}

The number of ingredients can range from 1 to 20.

Now, I want to create an algorithm which takes an object like this as an input and returns a list of similar recipes from my database.
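Because the number of ingredients varies from recipe to recipe, one simple way to compare two such objects is to flatten each into a set of terms and measure how much the sets overlap. The following is a minimal Python sketch under stated assumptions: it uses Jaccard similarity, the `field:value` term prefixes are a hypothetical encoding, and amounts and units are deliberately ignored.

```python
def recipe_terms(recipe: dict) -> set[str]:
    """Flatten a recipe object into a set of terms (hypothetical encoding).

    Categorical fields become "field:value" terms; each ingredient becomes
    an "ing:<name>" term. Amounts and units are ignored in this sketch.
    """
    terms = {
        f"origin:{recipe['origin']}",
        f"main:{recipe['mainIngredient']}",
        f"spicyness:{recipe['spicyness']}",
    }
    terms.update(f"ing:{item['name']}" for item in recipe["ingredients"])
    return terms


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For the example recipe above, `recipe_terms` yields six terms. A variant that swaps garlic for ginger shares five of seven distinct terms, so its Jaccard similarity is 5/7. Weighting by amounts, or treating `spicyness` as a number rather than a category, are natural refinements this sketch leaves out.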

My questions now are:

Is this scenario suitable for an ML approach? Could you point me to any existing algorithms to have a closer look at? Any suggestions for a practical implementation of the given scenario?

score 1 · Answer 1 · answered Sep 13 '19 at 09:28

Is this scenario suitable for an ML approach?

Your problem seems to be closer to Information Retrieval (IR) than to machine learning.

Could you point me to any existing algorithms to have a closer look at? Any suggestions for a practical implementation of the given scenario?

Look at how search engines work: the data structure you are looking for is called an inverted index. There are many standard libraries for Information Retrieval, ranging from lower-level ones like Python's Whoosh to full-blown search engine frameworks like Elasticsearch or Solr.
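To illustrate the idea, here is a minimal in-memory inverted index in Python. It is only a sketch: the class name `InvertedIndex` and the Jaccard ranking are illustrative choices, not how Whoosh, Elasticsearch or Solr work internally (they use scoring functions such as BM25, plus persistence and much more). Each term maps to the set of recipe ids containing it, so a query only touches recipes that share at least one term with it.

```python
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document (recipe) ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.doc_terms: dict[str, set[str]] = {}

    def add(self, doc_id: str, terms: set[str]) -> None:
        # Remember each document's terms and add it to every term's postings list.
        self.doc_terms[doc_id] = set(terms)
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, query_terms: set[str], k: int = 10) -> list[tuple[str, float]]:
        # Walk only the postings lists of the query terms, counting how many
        # query terms each candidate document matches.
        overlap: Counter[str] = Counter()
        for term in query_terms:
            for doc_id in self.postings.get(term, ()):
                overlap[doc_id] += 1
        # Rank candidates by Jaccard similarity between query and document terms.
        scored = [
            (doc_id, n / len(query_terms | self.doc_terms[doc_id]))
            for doc_id, n in overlap.items()
        ]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]
```

With ~100,000 recipes, the benefit is that a query containing rare ingredients only scans a few short postings lists instead of comparing against every recipe in the database.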

Note that most Information Retrieval systems are based on the Bag of Words model, so they mostly cover exact matches and do not take semantics into account (for example, they wouldn't know that 'yolks' and 'eggs' are similar ingredients). If you'd like to model semantics, you'd need some sort of semantic search, which is a much harder problem than standard IR (if you want to know more, read some surveys of neural networks in IR).
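The exact-match limitation can be seen directly. A crude workaround, well short of real semantic search, is query expansion with a hand-curated synonym table. The minimal Python sketch below assumes a hypothetical `SYNONYMS` dictionary written by hand; a semantic approach would instead learn such similarities from data.

```python
# Hypothetical, hand-written synonym table (illustrative only).
SYNONYMS: dict[str, set[str]] = {
    "eggs": {"yolks", "egg-whites"},
    "yolks": {"eggs"},
}


def expand(ingredients: set[str]) -> set[str]:
    """Add every listed synonym of each ingredient to the query set."""
    expanded = set(ingredients)
    for name in ingredients:
        expanded |= SYNONYMS.get(name, set())
    return expanded


def overlap(query: set[str], recipe: set[str]) -> int:
    """Exact-match Bag of Words overlap: number of shared ingredient names."""
    return len(query & recipe)
```

A query for {"yolks", "sugar"} shares only "sugar" with a recipe listing {"eggs", "sugar"}, but after `expand` it shares both terms. The table has to be maintained by hand, which is exactly the cost that semantic approaches try to avoid.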
