2

I'm implementing a scale-space with o octaves and s scale levels in each octave. Each octave is half the size the previous. I have keypoints found in each octave, I need to approximate the real location of the keypoint to their location in the original image (before the reduction). I need it to be more accurate then just doubling the coordinates. I know how to approximate sub-pixel in each octave, but I need "between" the octaves.

Thanks matlabit

matlabit
  • 837
  • 1
  • 7
  • 16

1 Answers1

5

First, you need to ensure that the pixel has minimal value in its 8-neighborhood. SIFT, SURF and other keypoint detectors filter these pixels in step called "non-maximal suppression". This is basically a necessary condition for second-order approximation we will use to determine sub-pixel location.

If you imagine the scale space slice $I$ as a 3D surface, the pixel lays in a valley sorrounded by pixels with higer value. The shape can be well approximated by a parabola surface and the corrected pixel position is at its minimum.

The pixel value $I(p)$ at location $p=(x,y)^{\textbf{T}}$ can be approximated by second order Taylor expansion:

$$I(p+\delta)\approx I(p) + \delta^{\textbf{T}}g+\frac{1}{2}\delta^{\textbf{T}}H\delta$$

where $g$ and $H$ are gradient (2-vector of first order derivatives) and Hessian ($2\times 2$ matrix of second-order derivatives) at $(x,y)$.

The $\delta$ is a sub-pixel correction factor we want to compute.

Since the parabola has minimum at point where its derivatives are equal to zero, we obtain derivative of right-hand side of the above Taylor expansion (with respect to $\delta$) and place it equal to zero:

$$g+H\delta=0$$

From this we can compute the correction vector:

$$\delta=-H^{-1}g$$

Finally, the desired sub-pixel location is given by $p + \delta$.

The gradient and Hessian are obtained using finite differencing:

$$g_{1}=\left( I(x+1,y)-I(x-1,y) \right)\cdot 1/2$$ $$g_{2}=\left( I(x,y+1)-I(x,y-1) \right)\cdot 1/2$$ $$H_{1,1}=I(x+1,y)+I(x-1,y)-2I(x,y)$$ $$H_{1,2}=H_{2,1}=\\ \left(I(x+1,y+1)+I(x-1,y-1)+I(x+1,y-1)+I(x-1,y+1)\right)\cdot 1/4$$

So the algorithm is:

  1. Check if the pixel is a local minimum (sorrounded by pixels with higher value).
  2. Compute $g$ and $H$ using finite differencing formulas above.
  3. Compute $\delta$.

You can repeat steps 2. and 3. for extra accuracy, but just one step is fine for our needs.

The value at sub-pixel location $I(p+\delta)$ can be obtained by well known bilinear or bicubic interpolation.

We did the interpolation with respect to $x,y$, but you can perform this interpolation with respect to scale as well. This will lead to 3-vector gradient and $3\times 3$ Hessian, but the Taylor expansion formula holds.

You can take a look on OpenSURF implementation in C++ or C# which has this implemented.

The above derivation is actually a Newton's Method used in multivariate optimization.

Libor
  • 4,135
  • 21
  • 37