I am trying to use a Gaussian process to predict some outcomes. I'm using the squared exponential covariance function with a zero mean function. My inputs are 4-dimensional vectors, and I am doing all the calculations in Maple.
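For concreteness, the covariance function is the standard squared exponential (written here with a single signal variance $\sigma_f^2$ and length-scale $\ell$; an ARD version with one length-scale per input dimension would look analogous):

$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell^2}\right)$$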
My problem is that when I compute the log marginal likelihood to optimize the hyper-parameters, I can only use 7 data points. Since everything is a symbolic expression, the log likelihood becomes too large to compute with any more points, even with 64 GB of RAM. And when I optimize the log likelihood using only those 7 points, I get terrible hyper-parameter values that do not reflect the full dataset, which leads to useless predictions.
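The expression I am trying to evaluate is the usual log marginal likelihood for a zero-mean GP with $n$ training points, targets $\mathbf{y}$, and covariance matrix $K$ (noise variance included on the diagonal):

$$\log p(\mathbf{y} \mid X) = -\tfrac{1}{2}\,\mathbf{y}^{\top} K^{-1}\mathbf{y} \;-\; \tfrac{1}{2}\log\lvert K\rvert \;-\; \tfrac{n}{2}\log 2\pi$$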
Is there a way to optimize the hyper-parameters using more data points that does not require inverting, and taking the log determinant of, 8x8 or larger symbolic matrices? Maybe I am missing something obvious.
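For reference, this is roughly how I build the symbolic expression in Maple (the names X, yv, sf, l and sn are just placeholders for my actual data and hyper-parameters):

    with(LinearAlgebra):

    # squared exponential kernel between two 4-dimensional points,
    # with symbolic signal variance sf^2 and length-scale l
    k := (x, y) -> sf^2 * exp(-add((x[i] - y[i])^2, i = 1 .. 4) / (2*l^2)):

    # X is a list of n input vectors, yv the column Vector of targets,
    # sn^2 the noise variance on the diagonal
    n := nops(X):
    K := Matrix(n, n, (i, j) -> k(X[i], X[j])) + sn^2 * IdentityMatrix(n):

    # symbolic log marginal likelihood -- this is the expression that
    # becomes unmanageable once n goes past 7
    logL := -(1/2) * (Transpose(yv) . MatrixInverse(K) . yv)
            - (1/2) * ln(Determinant(K)) - (n/2) * ln(2*Pi):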