How would you learn a function with an emphasis on feature interactions?
I have the standard assignment problem:
$$ \max_{x_{ij}} \sum_{(i, j)} w_{ij} x_{ij}, $$
where $w_{ij}$ is the weight of the edge between $i \in A$ and $j \in B$, and $x_{ij} \in \{0, 1\}$ s.t. $\sum_i x_{ij} = 1 \ \forall j $ and $\sum_j x_{ij} = 1 \ \forall i$. The numbers of persons and objects are equal, $|A| = |B|$.
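For reference, once a weight matrix $\hat{w}$ is available, the combinatorial part is easy; here is a minimal sketch with SciPy's Hungarian-algorithm solver and a made-up matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Made-up weight matrix: rows are elements of A, columns are elements of B.
w_hat = np.array([[4.0, 1.0, 3.0],
                  [2.0, 0.0, 5.0],
                  [3.0, 2.0, 2.0]])

# maximize=True because the objective is a maximization.
rows, cols = linear_sum_assignment(w_hat, maximize=True)
print(list(zip(rows, cols)), w_hat[rows, cols].sum())
```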
The weights are random variables and I'm estimating $w_{ij}$ from the historical data.
The data for any particular $(i, j)$ are not available. Instead, for every element $i \in A$ and $j \in B$, I have vectors of independent variables $\textbf{x}_i$ and $\textbf{x}_j$ that describe these elements. The historical weights are a function of these vectors as well: $w(k, l) = f(\textbf{x}_k, \textbf{x}_l)$ for elements $k \in A_H$, $l \in B_H$, where $A_H$ and $B_H$ are the historical sets.
Now, I can estimate $\hat{f}$ and get $\hat{w}$ for the current pairs $(i, j)$.
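As a concrete baseline, assuming the historical pairs are stored as concatenated feature vectors $[\textbf{x}_k, \textbf{x}_l]$ with the observed weight as the target, $\hat{f}$ can be fit with an off-the-shelf GBRT; the data below is simulated:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_hist, d = 2000, 3

# Simulated history: x_k describes the A-side element, x_l the B-side element.
XA_h = rng.normal(size=(n_hist, d))
XB_h = rng.normal(size=(n_hist, d))
# Toy data-generating process with an interaction term.
w_h = XA_h[:, 0] + XB_h[:, 1] + XA_h[:, 1] * XB_h[:, 0] + rng.normal(scale=0.1, size=n_hist)

# hat{f} learned on the concatenated pair features [x_k, x_l].
f_hat = GradientBoostingRegressor().fit(np.hstack([XA_h, XB_h]), w_h)

# Predicted weight matrix hat{w} for the current, fully crossed A x B pairs.
n = 5
XA_now, XB_now = rng.normal(size=(n, d)), rng.normal(size=(n, d))
pairs = np.hstack([np.repeat(XA_now, n, axis=0), np.tile(XB_now, (n, 1))])
w_hat = f_hat.predict(pairs).reshape(n, n)   # w_hat[i, j] estimates w(i, j)
```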
Suppose the underlying process is just a sum: $w(i, j) = f_A(\textbf{x}_i) + f_B(\textbf{x}_j)$. Then $\hat{w}(i, j) = c_i + c_j$, and because every row and every column of a feasible assignment sums to one, $\sum_{(i, j)} \hat{w}_{ij} x_{ij} = \sum_i c_i + \sum_j c_j$ is a constant that does not depend on the constrained $x_{ij}$.
The optimal assignment makes sense if the process involves interactions between $i$ and $j$ and the estimator captures them. Something like $w(i, j) = f_A(\textbf{x}_i) + f_I(\textbf{x}_i, \textbf{x}_j) + f_B(\textbf{x}_j)$.
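A quick numerical check of this point with made-up weight matrices: under purely additive weights every feasible assignment has the same total, and only the interaction part gives the solver anything to exploit.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n = 4
c_A, c_B = rng.normal(size=n), rng.normal(size=n)

w_additive = c_A[:, None] + c_B[None, :]           # w_ij = c_i + c_j
w_interact = w_additive + rng.normal(size=(n, n))  # plus an f_I-like term

for name, w in [("additive", w_additive), ("with interaction", w_interact)]:
    totals = [w[np.arange(n), perm].sum() for perm in permutations(range(n))]
    print(name, "min:", round(min(totals), 3), "max:", round(max(totals), 3))
# The additive case prints identical min and max: the assignment does not matter.
```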
For that matter, the assignment problem needs only $f_I(\textbf{x}_i, \textbf{x}_j)$. But it is unclear how to reduce the error rates of this particular part. I looked into tree-based methods:
- Adding interactions manually helps GBM packages pick the interactions that work, but this approach either requires manual input for every model or the number of features grows exponentially.
- Non-greedy split finding algorithms find some interactions, but I'm not familiar with their limitations.
The best strategy seems to be this (GBRT-based; a sketch of the whole pipeline follows the list):
1. Learn $f_A(\textbf{x}_i) + f_B(\textbf{x}_j)$ from the original features with greedy split finding and boosting.
2. Add standard interaction features and learn $f_A(\textbf{x}_i) + f_I(\textbf{x}_i, \textbf{x}_j) + f_B(\textbf{x}_j)$.
3. Take the difference between the error rates in (1) and (2) to measure the variance related to interactions and the contribution of $f_I$ to the weights.
4. Solve the assignment problem and get the optimal total value $V^*$.
5. Compare $V^*$ from (4) with the expected value under random assignment, $E[V_R]$, and with the errors in (3). This gives an approximate measure of the benefit of the optimal assignment.
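A minimal end-to-end sketch of steps (1)-(5), assuming "standard interactions" means all pairwise products between the $A$-side and $B$-side features; everything below is simulated and the names are placeholders:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_hist, d, n = 3000, 3, 8

def interactions(XA, XB):
    # "Standard interactions": all pairwise products between A-side and B-side features.
    return (XA[:, :, None] * XB[:, None, :]).reshape(len(XA), -1)

# Simulated history with a genuine interaction term.
XA_h, XB_h = rng.normal(size=(n_hist, d)), rng.normal(size=(n_hist, d))
w_h = XA_h[:, 0] + XB_h[:, 1] + XA_h[:, 1] * XB_h[:, 0] + rng.normal(scale=0.1, size=n_hist)

X1 = np.hstack([XA_h, XB_h])                    # (1) original features only
X2 = np.hstack([X1, interactions(XA_h, XB_h)])  # (2) plus interaction features

err1 = -cross_val_score(GradientBoostingRegressor(), X1, w_h,
                        scoring="neg_mean_squared_error", cv=5).mean()
err2 = -cross_val_score(GradientBoostingRegressor(), X2, w_h,
                        scoring="neg_mean_squared_error", cv=5).mean()
print("(3) error reduction attributable to interactions:", err1 - err2)

# (4) Predict weights for the current A x B pairs and solve the assignment.
model = GradientBoostingRegressor().fit(X2, w_h)
XA_now, XB_now = rng.normal(size=(n, d)), rng.normal(size=(n, d))
A_rep, B_rep = np.repeat(XA_now, n, axis=0), np.tile(XB_now, (n, 1))
pairs = np.hstack([A_rep, B_rep, interactions(A_rep, B_rep)])
w_hat = model.predict(pairs).reshape(n, n)
rows, cols = linear_sum_assignment(w_hat, maximize=True)
V_star = w_hat[rows, cols].sum()

# (5) Expected value of a uniformly random assignment is n * mean(w_hat).
EV_random = n * w_hat.mean()
print("(4)-(5) V* =", round(V_star, 3), "vs E[V_R] =", round(EV_random, 3))
```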
I still don't have the explicit $\hat{f}_I$ this way, but it seems possible to calculate the values it returns.
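For instance, one possible proxy (an assumption on my part, since the two fitted models are not constrained to share the same $\hat{f}_A$ and $\hat{f}_B$) is the difference between the predictions of the interaction model from (2) and the additive model from (1):

$$ \hat{f}_I(\textbf{x}_i, \textbf{x}_j) \approx \hat{w}^{(2)}(i, j) - \hat{w}^{(1)}(i, j). $$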
Do you know better ways of learning weights for assignment problems?