I have $(x, y)$ data pairs and I am fitting $y = ax + b$ with a standard OLS model. My questions are:
If I take a particular $(x, y)$ tuple, say the pair $(x_5, y_5)$, and add it to the data set, potentially a million times, what effect does that have on the slope and intercept? What is a good way to argue theoretically about the effect? I understand that adding $(x_5, y_5)$ multiple times is the same as putting that many times more weight on $(x_5, y_5)$. Does this mean that OLS will try harder to go through that point, and hence produce different estimates of $a$ and $b$ than before? What if the $x_5$ of the point we add is really close to $\bar x$: does that change things? Please note that I can test all of this easily in R, but I am looking for a theoretical explanation.
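(For concreteness, this is roughly the check I have in mind; the data are just simulated for illustration, and the number of copies `k` is arbitrary. It also shows the duplication-equals-weighting equivalence I am assuming, via the `weights` argument of `lm`.)

```r
# Simulated data purely for illustration
set.seed(1)
x <- 1:20
y <- 2 * x + 3 + rnorm(20)
dat <- data.frame(x, y)

k <- 1000                                   # number of copies of the 5th point (arbitrary)
dup <- rbind(dat, dat[rep(5, k - 1), ])     # data set with (x_5, y_5) repeated k times
coef(lm(y ~ x, data = dup))

w <- rep(1, nrow(dat)); w[5] <- k           # same idea expressed as a weighted fit
coef(lm(y ~ x, data = dat, weights = w))    # coefficients match the duplicated fit
```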
The second question comes from this post and @gung's answer (from Clustering in data), which is really informative. How can one prove that resampling a particular value of $x$, say $x_5$, leads to a more accurate approximation of the vertical position $f(x_5)$? What are the implications of this for the design of experiments? In other words, should we artificially include more observations around the values of $x$ that we are most interested in?
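For reference, the expression I have in mind for the variance of the fitted value at a point $x_0$ under simple OLS is

$$\operatorname{Var}\bigl(\hat f(x_0)\bigr) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right),$$

so presumably the question is how adding observations changes $n$, $\bar x$, and $\sum_i (x_i - \bar x)^2$.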
If I bin my data into, say, 10 bins, draw a set number of samples from each bin, and then fit y ~ x, how should the slope and intercept be expected to change?
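(This is roughly the resampling scheme I mean, again on simulated data; the bin count and per-bin sample size are arbitrary choices for illustration.)

```r
# Simulated data just to illustrate the binning scheme
set.seed(2)
x <- runif(500)
y <- 2 * x + 3 + rnorm(500)
dat <- data.frame(x, y)

dat$bin <- cut(dat$x, breaks = 10)   # 10 equal-width bins on x
per_bin <- 20                        # fixed number of draws per bin
resampled <- do.call(rbind, lapply(split(dat, dat$bin), function(b)
  b[sample(nrow(b), per_bin, replace = TRUE), ]))  # assumes no bin is empty

coef(lm(y ~ x, data = dat))          # fit on the original data
coef(lm(y ~ x, data = resampled))    # fit on the bin-balanced resample
```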