You are referring to matrix factorization, where the $n \times m$ matrix of ratings $R$ is factorized into an $n \times k$ matrix of user parameters $P$ and a $k \times m$ matrix of item parameters $Q$, so that
$$
R \approx PQ
$$
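For concreteness, here is a minimal sketch of the shapes involved (the sizes `n`, `m`, `k` below are arbitrary):

```python
import numpy as np

# Hypothetical sizes: n users, m items, k latent factors with k << min(n, m).
n, m, k = 100, 50, 10

P = np.random.rand(n, k)   # user factors: one k-dimensional row per user
Q = np.random.rand(k, m)   # item factors: one k-dimensional column per item

R_hat = P @ Q              # reconstructed n x m matrix of predicted ratings
assert R_hat.shape == (n, m)

# A single predicted rating is the dot product of a user row and an item column:
r_00 = P[0] @ Q[:, 0]
```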
The state-of-the-art way of estimating the parameters is the alternating least squares (ALS) algorithm: start by setting $Q$ to some random values, solve for $P$, fix $P$ at those estimates and solve for $Q$, then repeat the two steps until convergence. Alternatively, you can just use stochastic gradient descent, or variants of it. Aberger reviews those approaches and gives some benchmarks in *Recommender: An Analysis of Collaborative Filtering Techniques*.
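Here is a compact NumPy sketch of that ALS loop, written to use only the observed entries (which anticipates the point about missing values below); the mask handling, the regularization weight `lam`, and all names are my own choices, not taken from the Aberger paper:

```python
import numpy as np

def als(R, mask, k, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares on the observed entries of R.

    R    : n x m rating matrix (unobserved entries may hold anything)
    mask : n x m boolean array, True where a rating is observed
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    P = rng.standard_normal((n, k))
    Q = rng.standard_normal((k, m))
    reg = lam * np.eye(k)

    for _ in range(n_iters):
        # Fix Q; each row of P is then an independent ridge regression.
        for u in range(n):
            obs = mask[u]                     # items rated by user u
            Qo = Q[:, obs]                    # k x |obs|
            P[u] = np.linalg.solve(Qo @ Qo.T + reg, Qo @ R[u, obs])
        # Fix P; each column of Q is likewise a ridge regression.
        for i in range(m):
            obs = mask[:, i]                  # users who rated item i
            Po = P[obs]                       # |obs| x k
            Q[:, i] = np.linalg.solve(Po.T @ Po + reg, Po.T @ R[obs, i])
    return P, Q
```

The appeal of ALS is visible in the code: with one factor matrix fixed, the problem decomposes into small, independent regularized least-squares solves, each with a closed-form solution.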
Missing values are handled by iterating over the observed ratings only. As discussed by Koren et al. (2009) in *Matrix factorization techniques for recommender systems*:
> Earlier systems relied on imputation to fill in missing ratings and make the rating matrix dense. However, imputation can be very expensive as it significantly increases the amount of data. In addition, inaccurate imputation might distort the data considerably. Hence, more recent works suggested modeling directly the observed ratings only, while avoiding overfitting through a regularized model. To learn the factor vectors ($p_u$ and $q_i$), the system minimizes the regularized squared error on the set of known ratings:
> $$ \min_{p^*,\, q^*} \;\sum_{(u,i) \in \kappa} \;(r_{ui} - p_u^T q_i)^2 + \lambda(\|p_u\|^2 + \|q_i\|^2) $$
> Here, $\kappa$ is the set of the $(u,i)$ pairs for which $r_{ui}$ is known (the training set).
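In code, "iterating over the observed ratings only" means looping over the $(u,i,r_{ui})$ triples in $\kappa$ and never touching the unobserved cells. Here is a minimal stochastic gradient descent sketch of the objective above (the learning rate, initialization scale, and names are arbitrary choices of mine):

```python
import numpy as np

def sgd_mf(ratings, n, m, k, lam=0.1, lr=0.01, n_epochs=20, seed=0):
    """SGD on the regularized squared error over the known ratings.

    ratings : list of observed (u, i, r_ui) triples -- the set kappa
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n, k))
    Q = 0.1 * rng.standard_normal((k, m))

    for _ in range(n_epochs):
        rng.shuffle(ratings)                  # visit known ratings in random order
        for u, i, r in ratings:
            e = r - P[u] @ Q[:, i]            # error on one observed rating
            pu = P[u].copy()                  # update both factors from old values
            P[u]    += lr * (e * Q[:, i] - lam * pu)
            Q[:, i] += lr * (e * pu      - lam * Q[:, i])
    return P, Q
```

Unobserved $(u,i)$ pairs never enter the loop, so no imputation is needed, and the $\lambda$ terms keep the factors from overfitting the ratings that happen to be observed.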