I've got an online classification problem where I predict a class label {+1, -1}
for an object and then show it to a user to get the real label. My task is to minimize the number of -1
objects shown to the user.
Obviously, the algorithm will not converge if it learns only from +1 -> -1
misclassifications, so a certain amount of -1
objects needs to be shown to users anyway. The problem is to minimize the number of -1
objects shown to the user by selecting only the ones most valuable for learning.
The above problem is addressed in the field of 'active learning', where the learner gets to decide whether or not to request a label for an object from the user. Searching for a solution that does not assume i.i.d. data, I found this work, where the learner decides whether to query the label of an object based on its margin. This is intuitive: a smaller margin means we are more likely to make an error, which makes the object more valuable for learning. The probability of querying an object depends on its margin and a parameter b
, whose optimal value is given for the linearly separable case, which applies to my problem.
The algorithm is simple and goes as follows:
initialize the weight vector w(0) with zeros
for every input vector x(t):
    calculate the margin p(t) = w(t-1) * x(t)
    predict a class with Hebb's rule: y(t) = sgn(p(t))
    toss a coin with P(heads) = b / (b + |p(t)|)
    if heads:
        query the true label y'(t) of x(t)
        if y'(t) != y(t):
            update the weights: w(t) = w(t-1) + y'(t) * x(t)
        else:
            w(t) = w(t-1)
    else:
        w(t) = w(t-1)
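To make the rule concrete, here is a minimal Python sketch of the algorithm above. The function name, the `stream` of (object, true label) pairs, and the choice of `b` are my own assumptions for illustration; in the real setting the true label would come from the user only when queried:

```python
import numpy as np

def selective_sampling_perceptron(stream, b, rng=None):
    """Margin-based selective sampling perceptron (sketch).

    stream: iterable of (x, true_label) pairs, labels in {+1, -1};
            true_label stands in for the user and is only looked at
            when the coin toss says to query.
    b:      exploration parameter; larger b queries more often.
    """
    rng = rng or np.random.default_rng(0)
    w = None
    queries = mistakes = 0
    for x, true_y in stream:
        if w is None:
            w = np.zeros_like(x, dtype=float)
        p = w @ x                      # margin p(t) = w(t-1) * x(t)
        y = 1 if p >= 0 else -1       # predicted class, sgn with ties -> +1
        # toss a coin with P(heads) = b / (b + |p(t)|)
        if rng.random() < b / (b + abs(p)):
            queries += 1               # label queried from the "user"
            if true_y != y:            # misclassified: perceptron update
                mistakes += 1
                w = w + true_y * x
    return w, queries, mistakes
```

Note that when w is still zero the margin is zero, so the first few objects are queried with probability 1; as w grows, confident (large-margin) predictions are queried less and less often.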
Basically, I want to use this rule for querying the user only if the predicted class is -1
; if it is +1
, I query the label anyway. My question is: would my algorithm converge in that setting? The perceptron algorithm itself and the modified version from the above article are symmetric with respect to class labels. In my setting there will surely be more errors on +1
objects than on -1
's. Does that hurt convergence? If it does, what modification to the algorithm can bring it back to 'symmetry' between positive and negative weight updates?
The modified algorithm, which queries +1
predictions in all cases, is as follows:
initialize the weight vector w(0) with zeros
for every input vector x(t):
    calculate the margin p(t) = w(t-1) * x(t)
    predict a class with Hebb's rule: y(t) = sgn(p(t))
    toss a coin with P(heads) = b / (b + |p(t)|)
    if heads or y(t) == +1:
        query the true label y'(t) of x(t)
        if y'(t) != y(t):
            update the weights: w(t) = w(t-1) + y'(t) * x(t)
        else:
            w(t) = w(t-1)
    else:
        w(t) = w(t-1)
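The same sketch with the asymmetric querying rule: a +1 prediction is always queried (the object is shown to the user regardless), and the margin-based coin toss gates only the -1 predictions. As before, the function name, the `stream`, and `b` are illustrative assumptions, not part of the referenced article:

```python
import numpy as np

def asymmetric_selective_perceptron(stream, b, rng=None):
    """Selective sampling perceptron that always queries +1 predictions
    and uses the margin-based coin toss only for -1 predictions (sketch)."""
    rng = rng or np.random.default_rng(0)
    w = None
    queries = 0
    for x, true_y in stream:
        if w is None:
            w = np.zeros_like(x, dtype=float)
        p = w @ x                      # margin
        y = 1 if p >= 0 else -1       # predicted class
        heads = rng.random() < b / (b + abs(p))
        if heads or y == 1:            # a -1 object is shown only on heads
            queries += 1
            if true_y != y:            # perceptron update on a mistake
                w = w + true_y * x
    return w, queries
```

Here `queries` counts all objects shown to the user; the cost the question cares about is the subset of those whose true label turns out to be -1.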
Is this algorithm going to converge? If not, what modifications can be made?
My first idea was to decrease the learning rate for +1's, but still, I don't know how to prove convergence. Do you have any ideas on how to design an algorithm so that it will surely find a linear separator in a finite number of iterations?
Thanks.
Update. If you don't know how to answer the question but have any thoughts on where to look for a solution, or any pointers to publications/books/etc., please let me know!