
I am estimating parameters for a conditional random field using a structured support vector machine.

The data consist of a flat graph of $M$ city blocks, where $y_i$ is the assignment of the $i$th block to a particular neighborhood. The parameters of interest are the weights $\mathbf{w}$, which determine an affinity or disaffinity $\epsilon_{i,j}$ between neighboring blocks that affects whether they are assigned to the same neighborhood or to different ones.

The scoring function for the model is as follows:

\begin{align} &\operatorname{E}(\mathbf{y}, \mathbf{s}, \mathbf{w}) = \sum_{<i j>}^{\mathcal{N}}\epsilon_{i,j}(y_i, y_j, \mathbf{s}_{i,j}, \mathbf{w}) + c\sum_i^M\operatorname{I}(y_i \neq y_i^*) \end{align}

where \begin{equation} \epsilon_{i,j}(y_i, y_j, \mathbf{s}_{i,j}, \mathbf{w}) = \begin{cases} 0 &y_i = y_j \\ \phi(\mathbf{s}_{i,j}, \mathbf{w}) &y_i \neq y_j \\ \end{cases} \end{equation}

and

\begin{align} \phi(\mathbf{s}_{i,j}, \mathbf{w}) = & w_0 + w_1\text{Rail}_{i,j} + w_2\text{River}_{i,j} \end{align}

and $\operatorname{I}$ is the indicator function, $y_i^*$ is the observed value of $y_i$, and $c$ is a regularizer. $\text{Rail}_{i,j}$ is a dummy variable that indicates whether blocks $i$ and $j$ are separated by railroad tracks; $\text{River}_{i,j}$ is the analogous dummy for a river.
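To make the scoring function concrete, here is a minimal Python sketch of $\operatorname{E}(\mathbf{y}, \mathbf{s}, \mathbf{w})$. It assumes (this is not stated above) that the neighbor pairs $\langle i j \rangle$ are stored as an integer array of edges, with 0/1 `rail` and `river` arrays aligned to it, and that $\mathbf{w} = (w_0, w_1, w_2)$. The names `phi` and `energy` are hypothetical.

```python
import numpy as np


def phi(rail, river, w):
    """Cost of separating two neighboring blocks: w0 + w1*Rail + w2*River."""
    return w[0] + w[1] * rail + w[2] * river


def energy(y, y_obs, edges, rail, river, w, c):
    """Score E(y, s, w) for a candidate labeling y of the blocks (a sketch).

    edges       : integer array of shape (n_edges, 2), one row per neighbor pair <i j>
    rail, river : 0/1 arrays aligned with the rows of `edges` (hypothetical layout)
    y, y_obs    : candidate and observed neighborhood labels, one entry per block
    c           : weight on the count of blocks whose label differs from y_obs
    """
    y, y_obs, edges = np.asarray(y), np.asarray(y_obs), np.asarray(edges)
    i, j = edges[:, 0], edges[:, 1]
    # epsilon_ij is phi(s_ij, w) when the two blocks get different labels, else 0
    cut = y[i] != y[j]
    pairwise = np.sum(phi(np.asarray(rail), np.asarray(river), w) * cut)
    # c * sum_i I(y_i != y_i^*)
    mismatch = c * np.sum(y != y_obs)
    return float(pairwise + mismatch)


# Toy three-block chain 0 - 1 - 2; only the 0-1 pair is separated by a rail line.
edges = [(0, 1), (1, 2)]
score = energy(y=[0, 1, 1], y_obs=[0, 0, 1], edges=edges,
               rail=[1, 0], river=[0, 0], w=(0.5, -1.0, -2.0), c=1.0)
```

In the toy example only the rail pair is cut, contributing $w_0 + w_1 = -0.5$, and one block disagrees with its observed label, contributing $c = 1$, so the score is $0.5$.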

My academic discipline expects to see confidence intervals around parameter estimates, and I have been trying to figure out how to meet that expectation.

Given the dependence of the data and the difficulty of generating observations from the fitted model, bootstrapping does not seem like it will work.

Subsampling seems promising, but in my data some of the important independent variables are rare: most pairs of neighboring blocks are not separated by the river or a rail line. Many subsamples may therefore contain no variation in these variables, which leaves the associated parameter estimates ill-defined. In another question, stratified sampling was suggested, but I have not seen it combined with subsampling.
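A quick calculation shows how severe the rare-variable problem can be. As a simplification, the hypothetical helper below treats a subsample as $b$ neighbor pairs drawn uniformly without replacement from all pairs (real subsamples of a spatial graph would more likely be contiguous regions), and returns the hypergeometric probability that none of the rail-separated pairs is drawn.

```python
from math import comb


def prob_no_rail_edge(n_edges, n_rail, b):
    """P(a subsample of b neighbor pairs, drawn uniformly without replacement
    from n_edges pairs, contains none of the n_rail rail-separated pairs).

    Hypothetical helper; ignores the spatial structure of real subsamples.
    """
    n_other = n_edges - n_rail
    if b > n_other:
        # Too few non-rail pairs to fill the subsample: a rail pair is guaranteed.
        return 0.0
    # Ways to choose all b pairs from the non-rail pairs / ways to choose any b pairs
    return comb(n_other, b) / comb(n_edges, b)


# Illustrative, made-up sizes: 1,000 neighbor pairs, 10 of which cross a rail line.
p = prob_no_rail_edge(n_edges=1000, n_rail=10, b=50)
```

With these made-up sizes, a subsample of 50 pairs misses every rail pair roughly 60% of the time, so $w_1$ would be unidentified in most subsamples.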

How might I provide confidence intervals or credible intervals for the parameter estimates of a conditional random field model fitted with a structured support vector machine?

fgregg
