If I take a very basic example where my feature matrix X is
$$ \begin{matrix} 1 & 100 & 0.25\\ 1 & 110 & 0.5\\ 1 & 120 & 0.75\\ 1 & 130 & 1\\ 1 & 140 & 1.25\\ \end{matrix} $$
and the expected output vector Y is
$$ \begin{matrix} 201.75\\ 222.5\\ 243.25\\ 264\\ 284.75\\ \end{matrix} $$
Then $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ clearly solves to $[\theta_0, \theta_1, \theta_2] = [1, 2, 3]$. But if I apply feature scaling, I am essentially changing the values of my feature matrix X, so I will get very different values for $[\theta_0, \theta_1, \theta_2]$, and plugging those back into $y = \tilde{\theta}_0 + \tilde{\theta}_1 \tilde{x}_1 + \tilde{\theta}_2 \tilde{x}_2$ does not yield the output vector Y.
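To make the comparison concrete, here is a minimal sketch of what I am describing (using scikit-learn's `LinearRegression` and `StandardScaler` purely for illustration; they are not part of my original setup):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Feature matrix without the leading column of 1s
# (LinearRegression fits the intercept theta_0 itself)
X = np.array([[100, 0.25],
              [110, 0.50],
              [120, 0.75],
              [130, 1.00],
              [140, 1.25]])
y = np.array([201.75, 222.5, 243.25, 264.0, 284.75])

# Fit on the raw features
raw = LinearRegression().fit(X, y)
print("raw:   ", raw.intercept_, raw.coef_)

# Fit on standardized (zero-mean, unit-variance) features
X_scaled = StandardScaler().fit_transform(X)
scaled = LinearRegression().fit(X_scaled, y)
print("scaled:", scaled.intercept_, scaled.coef_)

# The two fits report very different [theta_0, theta_1, theta_2],
# which is exactly the difference I am asking about.
```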
Now, I know that feature scaling works and that I must be thinking about this the wrong way, so I need someone to point out what is wrong with my very basic understanding above.
And if you are inclined to give the standard "Google it" response, please note that I have already gone through the links below without finding the answer:
How and why do normalization and feature scaling work?
Is it necessary to scale the target value in addition to scaling features for regression analysis?
https://www.internalpointers.com/post/optimize-gradient-descent-algorithm
http://www.johnwittenauer.net/machine-learning-exercises-in-python-part-2/