
I've been hunting around for a way to solve this problem. I need to write a program that computes a linear regression for 100 3-dimensional points. I also have the matching outcome for each point, so it's a training set rather than a testing set. I'm also instructed to include the bias term, but I'm not sure what that means.

I can find good documentation for 2-dimensional points, but not for 3 dimensions. Since there are so many equations for XY data points, surely there's a similar set of equations for XYZ points. I'm not looking to be handed the equation outright; I'm more interested in understanding the function, how it works, and how it's derived. Thanks to anyone who read this and can help me with the problem.

user2828965
  • This is the website where I found some more information. It seems to only deal with 2-dimensional data though. http://easycalculation.com/statistics/learn-regression.php –  Mar 01 '14 at 20:49
  • Welcome to CV. Can you clarify your question? Are you trying to predict Z as a function of X & Y, or are you trying to predict an outcome in another dataset from X, Y & Z here? If so, is there any way to establish a correspondence b/t a given triplet in this dataset (Xi, Yi, Zi) & a particular outcome in the other dataset (Oi)? You mention a "bias term"; that terminology is more common in machine learning than statistics. Are you trying to train something like a neural network, or do you want regular (OLS) regression in statistics? – gung - Reinstate Monica Mar 02 '14 at 00:10
  • The Mathematica function `FindFit[...]` could achieve what you desire. – Joseph O'Rourke Mar 02 '14 at 00:25
  • I believe that the problem is asking to fit a line to 3-dimensional points. There are 100 points with 3 data values each, as well as accompanying result values. As an example: X value (0.442, 0.798, 0.708) has matching Y value (-6.228) and I need to fit a line to 100 points like the X value, while also using the Y data. I was directed here as a better location to ask my question. I was not informed that it was a research website. I apologize for any transgressions. – user2828965 Mar 02 '14 at 04:49

3 Answers


QPSO would solve this; it's essentially a parameter-optimization problem in 3 dimensions. I've used QPSO for as many as 15 parameters.

There are several 3rd party implementations in Python, for example a black box version here:

https://pypi.org/project/qpso/

But it is not too hard to write something like QPSO yourself with a simple for loop in the language of your choice. Roughly: make a small random change (up to ±5%) to each of your 3 parameters. If the sum of squared errors is smaller than the previous best, save the new parameter list as "best" and take a new set of random steps from there (a random walk); otherwise, return to the previous "best" parameter list and try a new random change for each parameter. Iterating this way, you soon approach a solution. In 3D it should converge very quickly for a linear relationship.
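To make the loop concrete, here is a minimal sketch in plain Python of the greedy random-search idea described above (not full QPSO — just the perturb-and-keep-if-better loop; the ±5% step size and the helper names are my own choices). The question's bias term can be handled by appending a constant 1 to each input point so it gets its own weight.

```python
import random

def sse(params, X, y):
    """Sum of squared errors for the linear model y ~ w1*x1 + w2*x2 + w3*x3."""
    return sum((sum(w * xi for w, xi in zip(params, x)) - yi) ** 2
               for x, yi in zip(X, y))

def random_walk_fit(X, y, iters=20000, step=0.05):
    """Greedy random search: keep a perturbation only if it lowers the error."""
    best = [random.uniform(-1, 1) for _ in range(len(X[0]))]
    best_err = sse(best, X, y)
    for _ in range(iters):
        # Perturb each parameter by up to +/-5%, plus a small absolute
        # nudge so parameters near zero can still move.
        trial = [p * (1 + random.uniform(-step, step)) + random.uniform(-step, step)
                 for p in best]
        err = sse(trial, X, y)
        if err < best_err:  # improvement: keep it and walk on from here
            best, best_err = trial, err
    return best
```

For least squares on linear data this is far slower than solving the normal equations directly, but it generalizes unchanged to any error function or model shape, which is the appeal of this family of methods.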

QPSO would also let you optimize different equations, potentially with more parameters, should you decide the relationship is not best described linearly. For higher-dimensional problems there are additional methods to speed up the search: simulated annealing, elitist breeding, keeping a tally of the best synapses (neuroplasticity), stochastic dual coordinate ascent (SDCA), etc.

litepresence
0

I actually discussed this yesterday.

The matrix equations Dave31415 posted are essentially your solution, but depending on how much data you have, you may need some linear algebra tricks to keep the problem tractable, since the matrix you would need to invert may be ill-conditioned.

David Marx
  • Yes, inverting the matrix in the normal equations isn't the best solution and isn't the way most packages actually solve linear regression. But it's still true that any way of doing it is more or less dimensionally-agnostic when expressed in matrix form. It is usually a good idea to consider ridge-regression or lasso regression rather than standard regression. – Dave31415 Mar 02 '14 at 21:44