6

Is it possible to classify the points inside a square? That is, if $a \le x \le b$ and $c \le y \le d$ then the label is $+1$, otherwise $0$.

Is that possible using SVMs, for example?

Andre Silva
user45882
  • Do you have to use an SVM? It's possible that this could be easier with another approach. –  May 21 '14 at 01:54
  • No, I don't need to use SVMs, but the data in the process tend to a square as time tends to infinity. Thanks. – user45882 May 21 '14 at 06:31
  • Please describe the kinds of data you are expecting: comments to the answers suggest to me that people might have different understandings of what data you have and what you are really trying to accomplish, so it really is important to edit this question to make it clearer. – whuber May 22 '14 at 16:30
  • whuber: you are right. I need to make the problem more precise. I'll probably just post a new question with a more precise problem statement. The answers given here are still quite good and one can still learn from them. Thanks all! – user45882 May 23 '14 at 19:11
  • Essentially the same problem is addressed in this answer. http://stats.stackexchange.com/questions/164048/can-a-random-forest-be-used-for-feature-selection-in-multiple-linear-regression/164068#164068 – Sycorax Jan 14 '17 at 19:10

2 Answers

6

Your specification of the square as having sides parallel to the axes makes the problem relatively straightforward, as long as you don't require the result to be a single SVM but can settle for something SVM-like.

One very simple estimate of the boundary of the square is to take, over the points with $+1$ labels, the min and max of $x$ and the min and max of $y$. (The resulting box will be too small, of course, but with lots of points, not by much.)

[Figure: the inner box obtained from the min and max of the "$+1$"-labelled points]
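As a concrete illustration, here is a minimal Python sketch of that first step, assuming `X` is an $(n, 2)$ array of points and `y` a vector of $+1$/$0$ labels; the names are illustrative, not part of the answer.

```python
import numpy as np

def inner_box(X, y):
    """Estimate the square from the "+1"-labelled points alone."""
    pos = X[y == 1]
    # The min and max of x and of y over the "+1" points give a box
    # guaranteed to lie inside the true square (slightly too small).
    return pos[:, 0].min(), pos[:, 0].max(), pos[:, 1].min(), pos[:, 1].max()
```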

If you then exclude the points with "$0$" labels above the top and below the bottom of that box, you can use an SVM, or something similar to one, on the left and the right halves of the $x$'s (it's only a 1-D problem for each side! Easy).

So on the right side, for example, you just need the smallest "$0$"-labelled $x$ above the largest "$1$"-labelled $x$.

[Figure: the 1-D boundary problem on the right-hand side]

(For your next approximation of the square, you could split the difference between the biggest "$1$"-labelled $x$ and the smallest "$0$"-labelled $x$ above it:

[Figure: the refined right boundary, midway between the largest "$1$"-labelled $x$ and the smallest "$0$"-labelled $x$ above it]

You can then repeat the procedure for the left, top and bottom boundaries; a short sketch of this refinement follows. This gives a much better estimate of the boundary than the initial one.)
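Here is a minimal Python sketch of the refinement for the right side, reusing the `X`, `y` conventions above; `y_lo` and `y_hi` are the bottom and top of the inner box, and all the names are illustrative.

```python
import numpy as np

def refine_right(X, y, y_lo, y_hi):
    """Refined right boundary: split the difference between the
    nearest "1"- and "0"-labelled x values, SVM-style."""
    # Keep only points between the top and bottom of the inner box,
    # i.e. discard the "0" points above and below it.
    band = (X[:, 1] >= y_lo) & (X[:, 1] <= y_hi)
    xs, labels = X[band, 0], y[band]
    biggest_1 = xs[labels == 1].max()              # largest "1"-labelled x
    zeros_right = xs[(labels == 0) & (xs > biggest_1)]
    if zeros_right.size == 0:
        return biggest_1                           # no "0" point to the right
    # The refined boundary sits midway between the two nearest points.
    return 0.5 * (biggest_1 + zeros_right.min())
```

The left, top and bottom boundaries follow by symmetry (swap or reflect the coordinates).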

Of course this won't be exactly square (indeed, up to this point the algorithm is really for a rectangle with sides parallel to the axes), but the four sets of green and pink lines (not all shown) give you boundaries within which you want to fit the square.

So from there it's a matter of expanding the blue rectangle in its "narrow" direction and shrinking it in its "wide" direction until it's square. Note that you don't shrink or expand the sides evenly: if you want to be SVM-ish, you'd do it so as to even up the remaining "wiggle room" on each side (the distance between blue and whichever of green or pink is closer). That is, you'd move the side with the biggest gap between green and pink first, until its gap matches the next-smallest gap between blue and either green or pink; then move both of those sides simultaneously until you hit the next-smallest gap, and so on.

(With a bit of thought, most of this step can be done very simply; a simplified sketch follows.)
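As a rough illustration, here is a much-simplified version of this step: it keeps the centre of the blue rectangle fixed and moves each pair of opposite sides symmetrically toward the mean of the width and height, rather than greedily evening up the wiggle room as described above, and it assumes the green/pink limits are wide enough to absorb the moves. The function name and argument conventions are mine.

```python
def square_up(x_lo, x_hi, y_lo, y_hi):
    """Turn the blue rectangle into a square about its own centre
    (NOT the greedy wiggle-room balancing described above)."""
    s = 0.5 * ((x_hi - x_lo) + (y_hi - y_lo))          # target side: mean of width and height
    cx, cy = 0.5 * (x_lo + x_hi), 0.5 * (y_lo + y_hi)  # centre of the blue rectangle
    return cx - s / 2, cx + s / 2, cy - s / 2, cy + s / 2
```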

So this does some initial processing ($\mathcal{O}(n)$) to find the inner and outer boxes and the blue rectangle (essentially four trivial "SVM"s), followed by a simple set of expansion/shrinkage calculations to find an actual square.

If there really is a square that perfectly separates the "$1$" and "$0$" cases, that should work quite well and give a nicely SVM-like solution. (If there's not perfect separation, you may need to actually adapt this further in order to minimize misclassification.)

Glen_b
  • Thanks, but it seems that your method works only when we know the values of the boundaries. In my case I don't know them; all I know is that the incoming data will fill a square as time tends to infinity, and I don't know how to decide when a point is not in the square, since any given point may end up converging to the square in infinite time. Thanks. – user45882 May 21 '14 at 06:50
  • There should be some points that you know are outside the square, else why would you think to use SVM? – Aaron May 22 '14 at 00:06
  • Where do you think my method uses the values of the boundaries? Of course you'll never know perfectly whether an unlabelled point is in the true square or not, except as $n$ goes to infinity. That's true of nearly any classification problem. – Glen_b May 22 '14 at 01:15
2

A plain SVM is a linear classifier, which means it can only learn to decide which side of a single straight line a point falls on. To make a square you obviously need four straight lines, so the short answer is no: a linear SVM cannot learn a square.

However, you can apply a kernel function to your data to map it into a higher-dimensional space. If you choose the right kernel, there can be a hyperplane in that higher-dimensional space that corresponds roughly to a square in the original, lower-dimensional space.

Think of it like this: your points lie on a piece of paper, and the SVM is a pair of scissors that gets to make one straight cut. You want to capture just the points in the square with that one cut. If the page stays flat, you can't do it. But if you pinch the paper in the middle of the square so that part is raised, you can cut below the pinch and, with that single cut, select exactly those points.
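To make the kernel picture concrete, here is a small sketch using scikit-learn's RBF-kernel SVC on simulated data; the square $[0.3, 0.7]^2$, the sample size, and the hyperparameter values are arbitrary choices for the demo, not from the answer.

```python
import numpy as np
from sklearn.svm import SVC

# Simulate points in the unit square, labelled 1 inside an (assumed)
# inner square [0.3, 0.7]^2 and 0 outside it.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 2))
y = ((X[:, 0] >= 0.3) & (X[:, 0] <= 0.7) &
     (X[:, 1] >= 0.3) & (X[:, 1] <= 0.7)).astype(int)

# The RBF kernel lets a single "straight cut" in the lifted space trace
# out a closed, roughly square region in the original 2-D space.
clf = SVC(kernel="rbf", gamma=10, C=10).fit(X, y)
print("training accuracy:", clf.score(X, y))  # near 1 with enough data
```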

Aaron