I'm trying to predict a numeric value from several variables, and I want to try different methods and compare them (having a variety of methods is more important than finding the absolute best one). My problem is that I'm a beginner in predictive analytics with only basic knowledge of statistics, so I'm afraid of choosing methods that make no sense here and wasting time I don't have.
My data consists of about 1000 observations with 10 variables. There is only a small amount of collinearity among them. My model looks like $y = a + bX + e$, with $X$ being the vector of my 10 variables, $b$ the coefficients of those variables, $e$ the error term and $a$ a constant. I want to fit each method on my training data, predict $y$, and compare the methods on my test data.
The methods I want to use are:
- OLS Regression: because it is the most basic thing to do
- PLS Regression: to compare it with the OLS Regression, hoping that it will give better results. I'm not sure if I should use it, because I have many observations for a rather small number of variables
- LASSO Regression: another regression which promises to be better. Again, I'm not sure if I should use it, because of the number of variables. If there is an obvious reason not to use it, please let me know!
- Decision Tree: Decision Trees are also a widely used method, so I think I should include one in the comparison
- Random Forest: Random Forest as a logical next step after the Decision Tree
- Support Vector Machine: A machine learning method which promises good results
- Neural Networks: An interesting-sounding method, and I have a basic understanding of how it works.
- Genetic Algorithms: GA with the objective of minimizing the squared difference between the predicted and the real value. Is there any advantage over the normal OLS Regression? Or should I choose another objective for better results?
- Naive Bayes classifier: another machine-learning method which may give some good results
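
To make the comparison concrete, here is a minimal sketch of the setup I have in mind. It is pure Python and runs on synthetic data generated from the model above, not on my real data. The coefficients, the noise level, the 80/20 split and the use of test RMSE as the score are all assumptions, and the names `make_data`, `MeanBaseline`, `OLS`, `rmse` and `compare` are only illustrative. Only a mean baseline and OLS are included; the other methods would be added as further objects with `fit` and `predict`.

```python
import math
import random


def make_data(n=1000, p=10, seed=0):
    """Synthetic stand-in for the real data, drawn from y = a + bX + e."""
    rng = random.Random(seed)
    a = 2.0                                      # assumed constant
    b = [rng.uniform(-1, 1) for _ in range(p)]   # assumed coefficients
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [a + sum(bj * xj for bj, xj in zip(b, row)) + rng.gauss(0, 0.5)
         for row in X]
    return X, y


class MeanBaseline:
    """Always predicts the training mean; a useful method should beat it."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class OLS:
    """Ordinary least squares via the normal equations (A^T A) w = A^T y."""

    def fit(self, X, y):
        A = [[1.0] + list(row) for row in X]     # prepend intercept column
        k = len(A[0])
        # Augmented matrix [A^T A | A^T y]
        M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
             + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
        # Gauss-Jordan elimination with partial pivoting
        for c in range(k):
            piv = max(range(c, k), key=lambda r: abs(M[r][c]))
            M[c], M[piv] = M[piv], M[c]
            for r in range(k):
                if r != c:
                    f = M[r][c] / M[c][c]
                    M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
        self.coef_ = [M[i][k] / M[i][i] for i in range(k)]
        return self

    def predict(self, X):
        w = self.coef_
        return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)) for row in X]


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


def compare(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training data and score it on the test data."""
    return {name: rmse(y_test, m.fit(X_train, y_train).predict(X_test))
            for name, m in models.items()}


X, y = make_data()
split = int(0.8 * len(X))  # rows are i.i.d. here, so a plain cut is a random split
results = compare({"mean baseline": MeanBaseline(), "OLS": OLS()},
                  X[:split], y[:split], X[split:], y[split:])
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: test RMSE = {score:.3f}")
```

Every method is fitted on the same training rows and scored on the same held-out rows, so the numbers are directly comparable. As far as I can tell, scikit-learn estimators such as `LinearRegression` or `RandomForestRegressor` follow the same `fit`/`predict` convention, so they could probably be dropped into the dictionary instead of hand-written classes.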
Again, my aim is to have a variety of methods to compare; having the best prediction comes second. Nevertheless, I don't want to use methods that are obviously unsuitable for my task, and I don't have the time to gain deep knowledge of methods I won't use later. Of course, I will research all the methods I end up comparing, but for now it would be great if you could help me out!
Are these good (or at least suitable) choices, or are there some methods I shouldn't use because they make no sense here? Are there any other methods I have overlooked that you would suggest using?