I am enrolled in a machine learning course where we have a lab to implement linear regression. I am attempting to do it in R to get a better understanding of both the material and of R (I don't intend to submit this as a lab, since the course doesn't use R), but I am coming up against a wall.
My understanding of the process is as follows:
Generate a model based on the hypothesis $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots$
Take the error rate of your model using the squared error cost function, then iterate: create a new hypothesis and get its error rate. Continue through $n$ iterations based on the formula $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$.
Take all the error rates you have recorded (the cost history) and use gradient descent to automatically find the optimal values for your hypothesis.
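
Writing out the update rule that (I think) the loop below is meant to implement, based on the cost function above, in vectorised form:

$$\theta := \theta - \alpha\,\nabla J(\theta), \qquad \nabla J(\theta) = \frac{1}{m}X^T\left(X\theta - y\right)$$

which element-wise is $\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$.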
I am using the code from R-Bloggers, where gradient descent is implemented as below based on the vectors x and y:
# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))

# gradient descent
for (i in 1:num_iters) {
  error <- (X %*% theta - y)             # residual h_theta(x) - y for every observation
  delta <- (t(X) %*% error) / length(y)  # gradient of the cost with respect to theta
  theta <- theta - alpha * delta         # step downhill by the learning rate alpha
  cost_history[i] <- cost(X, y, theta)   # record the cost at this iteration
  theta_history[[i]] <- theta            # record the coefficients at this iteration
}
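
To have something end-to-end to poke at, here is the fuller script I am running. Everything outside the loop (the simulated x and y, alpha, num_iters, the cost() function and the history containers) is my own guess at what the R-Bloggers post defines, so treat those names and values as my assumptions rather than the original code:

# simulated data: y is roughly linear in x (my own toy example, not from the post)
set.seed(42)
x <- runif(100, -5, 5)
y <- 2 + 3 * x + rnorm(100)

# squared error cost function, matching the J(theta) formula above
cost <- function(X, y, theta) {
  sum((X %*% theta - y)^2) / (2 * length(y))
}

# gradient descent settings (values assumed)
alpha <- 0.01      # learning rate
num_iters <- 1000  # number of iterations

# containers to record the cost and coefficients at every iteration
cost_history <- double(num_iters)
theta_history <- list()

# start both coefficients at zero
theta <- matrix(c(0, 0), nrow = 2)

# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))

# gradient descent
for (i in 1:num_iters) {
  error <- (X %*% theta - y)
  delta <- (t(X) %*% error) / length(y)
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}

# sanity check: compare the final theta against R's built-in least squares fit
print(theta)
print(coef(lm(y ~ x)))

The print() comparison at the end is just my sanity check that the loop converges to the same coefficients as lm().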
I was wondering if people could help me tease out the logic:

Why is the number 1 bound onto the matrix X? Is this so that X has two columns, so that it can be used in X %*% theta - y?
What is the formula for delta actually calculating, and why is the transpose of X being used?
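
What leads me to that guess is checking the dimensions on a small made-up example (again my own toy values, not from the post):

x <- c(1, 2, 3)
y <- c(2, 4, 6)
theta <- matrix(c(0, 0), nrow = 2)

X <- cbind(1, matrix(x))          # 3 x 2: a column of 1's next to the x values
dim(X %*% theta)                  # 3 x 1, so (X %*% theta - y) is one residual per observation
dim(t(X) %*% (X %*% theta - y))   # 2 x 1, one number per coefficient in theta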
Conceptually I think I understand the overall process, but I need to relate it back to the R code, as I want to grasp the concept before proceeding to multiple linear regression.