
How would you learn a function with an emphasis on feature interactions?

I have the standard assignment problem:

$$ \max_{x_{ij}} \sum_{(i, j)} w_{ij} x_{ij}, $$

where $w_{ij}$ is the weight of the edge between $i \in A$ and $j \in B$, and $x_{ij} \in \{0, 1\}$ subject to $\sum_i x_{ij} = 1 \ \forall j$ and $\sum_j x_{ij} = 1 \ \forall i$. The numbers of persons and objects are equal, $|A| = |B|$.
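The optimization step itself is straightforward once a weight matrix is available; a minimal sketch in Python (the matrix `W_hat` below is a placeholder for the estimates discussed next):

```python
# Minimal sketch: solve the assignment given an estimated weight matrix.
# W_hat is a random placeholder here; in practice it comes from the fitted model.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
W_hat = rng.normal(size=(5, 5))                           # estimated weights, |A| x |B|

rows, cols = linear_sum_assignment(W_hat, maximize=True)  # optimal matching
V_star = W_hat[rows, cols].sum()                          # optimal total value V*
```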

The weights are random variables and I'm estimating $w_{ij}$ from the historical data.

The data for any particular $(i, j)$ is not available. Instead, for every element $i \in A$ and $j \in B$, I have vectors of independent variables $\textbf{x}_i$ and $\textbf{x}_j$ that describe these elements. The historical weights are a function of these vectors as well: $w(k, l) = f(\textbf{x}_k, \textbf{x}_l)$ for elements $k \in A_H, l \in B_H$.

Now, I can estimate $\hat{f}$ and get $\hat{w}$ for the current pairs $(i, j)$.
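As a sketch of that estimation step (the synthetic data, the concatenated-feature layout, and the choice of a GBRT are only illustrative assumptions):

```python
# Sketch: estimate f-hat from historical pairs and score a current pair (i, j).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_hist, d = 500, 3
X_k = rng.normal(size=(n_hist, d))                   # x_k for historical k in A_H
X_l = rng.normal(size=(n_hist, d))                   # x_l for historical l in B_H
w = X_k[:, 0] + X_l[:, 0] + X_k[:, 1] * X_l[:, 1]    # toy w(k, l) with an interaction

model = GradientBoostingRegressor().fit(np.hstack([X_k, X_l]), w)

# Score a current pair (i, j) by concatenating its feature vectors.
x_i, x_j = rng.normal(size=d), rng.normal(size=d)
w_hat_ij = model.predict(np.hstack([x_i, x_j])[None, :])[0]
```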

Suppose the underlying process is just a sum, $w(i, j) = f_A(\textbf{x}_i) + f_B(\textbf{x}_j)$. Then $\hat{w}(i, j) = c_i + c_j$, and $\sum_{(i, j)} \hat{w}_{ij} x_{ij}$ is a constant that does not depend on which feasible $x_{ij}$ is chosen.
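Spelling this out with the constraints: since every $i$ and every $j$ is matched exactly once,

$$ \sum_{(i, j)} \hat{w}_{ij} x_{ij} = \sum_{(i, j)} (c_i + c_j) x_{ij} = \sum_i c_i \sum_j x_{ij} + \sum_j c_j \sum_i x_{ij} = \sum_i c_i + \sum_j c_j, $$

which is the same for every feasible assignment.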

The optimal assignment makes sense if the process involves interactions between $i$ and $j$ and the estimator captures them. Something like $w(i, j) = f_A(\textbf{x}_i) + f_I(\textbf{x}_i, \textbf{x}_j) + f_B(\textbf{x}_j)$.

In fact, the assignment problem needs only $f_I(\textbf{x}_i, \textbf{x}_j)$, but it is unclear how to reduce the estimation error of this particular part. I looked into tree-based methods:

  • Adding interactions manually helps GBM packages pick the interactions that work, but this approach either requires manual input for every model or makes the number of features grow exponentially.
  • Non-greedy split finding algorithms find some interactions, but I'm not familiar with their limitations.

The best strategy seems to be this (GBRT-based):

  1. Learn $f_A(\textbf{x}_i) + f_B(\textbf{x}_j)$ with original features by greedy algorithms and boosting.
  2. Add standard interactions between features and learn $f_A(\textbf{x}_i) + f_I(\textbf{x}_i, \textbf{x}_j) + f_B(\textbf{x}_j)$.
  3. Take the difference between the prediction errors in (1) and (2) to measure the variance related to interactions and the contribution of $f_I$ to the weights (a sketch of steps 1-3 follows this list).
  4. Solve the assignment problem. Get the optimal total value $V^*$.
  5. Compare $V^*$ from (4) with the value under a random assignment, $E[V_R]$, and with the errors from (3). This gives the approximate benefit of the optimal assignment.
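A minimal sketch of steps 1-3 on synthetic data; using pairwise products as the "standard interactions", using GBRT models, and treating the raw-features model as a proxy for the additive part are all assumptions of the sketch:

```python
# Sketch of steps 1-3: compare a model on raw features with a model that also
# gets explicit pairwise-product features, and read off the error difference.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 1000, 3
X_k, X_l = rng.normal(size=(n, d)), rng.normal(size=(n, d))
w = X_k[:, 0] + X_l[:, 0] + X_k[:, 1] * X_l[:, 1]     # toy w(k, l) with an interaction

def add_interactions(A, B):
    # step 2: original features plus all pairwise products between x_i and x_j
    cross = (A[:, :, None] * B[:, None, :]).reshape(len(A), -1)
    return np.hstack([A, B, cross])

X1 = np.hstack([X_k, X_l])            # step 1: original features only
X2 = add_interactions(X_k, X_l)       # step 2: plus interaction features
tr, te = train_test_split(np.arange(n), random_state=0)

m1 = GradientBoostingRegressor().fit(X1[tr], w[tr])
m2 = GradientBoostingRegressor().fit(X2[tr], w[tr])
err1 = mean_squared_error(w[te], m1.predict(X1[te]))
err2 = mean_squared_error(w[te], m2.predict(X2[te]))
print("error reduction attributable to interactions:", err1 - err2)   # step 3
```

One caveat: a tree ensemble on the raw concatenated features can already pick up some interactions through its splits, so the difference in (3) measures what the explicit interaction features add rather than the full contribution of $f_I$.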

I still don't have the explicit $\hat{f}_I$ this way, but it seems possible to calculate the values it returns.
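For what it's worth, one way to obtain the values $\hat{f}_I$ would return, without fitting it explicitly, is to double-center the predicted weight matrix. This treats the row and column means as stand-ins for $\hat{f}_A$ and $\hat{f}_B$ on the current sets, and the residual is the only part of $\hat{w}$ the assignment responds to, since adding row or column constants shifts every feasible assignment's value by the same amount:

```python
# Sketch: extract the implied interaction component of an estimated weight
# matrix by double-centering. Row/column means are only proxies for f_A and f_B.
import numpy as np

def interaction_values(W_hat):
    row = W_hat.mean(axis=1, keepdims=True)    # proxy for f_A(x_i)
    col = W_hat.mean(axis=0, keepdims=True)    # proxy for f_B(x_j)
    return W_hat - row - col + W_hat.mean()    # implied f_I(x_i, x_j) values
```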

Do you know better ways of learning weights for assignment problems?

  • Related questions: (1) https://stats.stackexchange.com/questions/186063/modeling-worker-performance-parameters-for-optimum-allocation-of-tasks-to-worker (2) https://stats.stackexchange.com/questions/328348/office-desk-rota-assignment-problem (3) https://datascience.stackexchange.com/questions/9328/machine-learning-worker-performance-features-for-optimum-allocation-of-tasks-to – Anton Tarasenko May 07 '19 at 17:45
  • More related questions: (1) What are best practices in identifying interaction effects? https://stats.stackexchange.com/questions/4901/what-are-best-practices-in-identifying-interaction-effects (2) Including the interaction but not the main effects in a model https://stats.stackexchange.com/questions/11009/including-the-interaction-but-not-the-main-effects-in-a-model – Anton Tarasenko May 15 '19 at 17:36
