First, a general remark. Some datasets contain discriminative features, others much less so. It may be that all your $210$ features have very little predictive power for the classification task you are investigating.
My advice for the next step is as follows. Build a training set in which $50\%$ of the cases are drawn at random from your $0$ category and $50\%$ at random from your $1$ category. This way, you end up with a training set that is balanced over the two categories, e.g. $2810/2 = 1405$ cases of category $1$ and $1405$ cases of category $0$. The remaining cases are your test set, for later evaluation. You now have a training set with priors $P(0)=\frac{1}{2}$ and $P(1)=\frac{1}{2}$. Note that the cases should be picked from each of the two classes using a random generator.
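The sampling step could be sketched as follows. This is only an illustration under assumptions: the function name `balanced_split`, the seed, and the toy class sizes ($3000$ and $20000$) are made up here; only the $1405$ cases per class come from the example above.

```python
import random

def balanced_split(labels, n_per_class, seed=0):
    """Pick n_per_class random cases from each class for training.

    Returns (train_idx, test_idx); every case not picked goes to the test set.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train = []
    for idx in by_class.values():
        # sampling without replacement; raises ValueError if a class is too small
        train.extend(rng.sample(idx, n_per_class))

    picked = set(train)
    test = [i for i in range(len(labels)) if i not in picked]
    return sorted(train), test

# Hypothetical class sizes, for illustration only.
labels = [1] * 3000 + [0] * 20000
train_idx, test_idx = balanced_split(labels, n_per_class=1405)
```

The test set keeps the skewed class proportions of whatever is left over, which is what you want for the later evaluation.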
Try random forests and maybe C4.5 (decision trees). You can also try logistic regression and linear discriminant analysis (the latter only if your features are continuous numbers). These two classifiers are more forgiving in a large feature space with many redundant inputs than, say, a deep neural network. The decision tree algorithms perform inherent feature selection, which is why they are worth trying.
Now first look at the accuracy/error rate of your classifiers on this balanced training set. If the results give you confidence, you can use the approach here to map the posterior probabilities of your classifiers to the skewed situation you have in your real domain. Then apply the classifier to your test set, using the appropriate prior probabilities there.
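One common way to do such a mapping is a Bayes-rule prior correction: rescale each class posterior by the ratio of the real prior to the training prior and renormalise. The sketch below assumes this is the kind of adjustment meant; it may differ in detail from the approach referred to above. The function name `adjust_posterior` and the real-domain prior of $0.1$ are hypothetical.

```python
def adjust_posterior(p1_balanced, prior1_real, prior1_train=0.5):
    """Map P(1|x) from a classifier trained with prior1_train to prior1_real.

    Two-class case only; assumes 0 < prior1_train < 1.
    """
    # reweight each class by (real prior) / (training prior)
    w1 = p1_balanced * prior1_real / prior1_train
    w0 = (1.0 - p1_balanced) * (1.0 - prior1_real) / (1.0 - prior1_train)
    return w1 / (w1 + w0)  # renormalise so the two posteriors sum to 1

# Hypothetical real-domain prior P(1) = 0.1:
# a balanced-set posterior of 0.9 for class 1 shrinks to 0.5.
p_real = adjust_posterior(0.9, prior1_real=0.1)
```

With a rare class $1$, even a confident-looking posterior from the balanced model can end up near or below $\frac{1}{2}$ after the correction, so decisions on the test set should be based on the adjusted values.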
Classifiers tend to train better from balanced training sets, because the variances of their parameters become smaller.
It would be nice if you reported any progress in this forum, e.g. in a comment to your question.