I am a programmer with little mathematical background who recently started studying statistics/ML. I quickly stumbled upon the term "random variable", and it was hard for me to understand why statistics calls it random. To me, a truly random variable must be impossible to predict; in other words, it must have a uniform distribution. If a variable is not uniformly distributed, we can predict it to some extent. In fact, in programming a random number generator is considered bad if the distribution of its generated values is not quite uniform. So I was surprised to see that in statistics we can build a good model and predict "random variables". This looks like a terminology clash to me.
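
For illustration, here is a minimal Python sketch of a deliberately non-uniform generator. The function name `biased_coin`, the probability 0.8, and the seed are assumptions chosen for this example, not part of any particular library.

```python
import random

def biased_coin(rng, p_heads=0.8):
    """Return 1 with probability p_heads, else 0 (hypothetical helper)."""
    return 1 if rng.random() < p_heads else 0

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [biased_coin(rng) for _ in range(10_000)]

# Strategy: always guess 1. Its accuracy is the fraction of draws equal to 1.
always_heads_accuracy = sum(draws) / len(draws)
print(f"accuracy of always guessing 1: {always_heads_accuracy:.3f}")
```

Always guessing 1 is right roughly 80% of the time here, yet no single draw can be predicted with certainty.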

I just want to check whether my understanding of random variables is correct. So my questions are:

  1. Is it true that in statistics any variable whose values cannot be precisely predicted is called a random variable, regardless of its distribution?
  2. If we have a good (but not perfect) model for predicting a random variable $Y$, is it still random? Or, to put it more formally: if we have some dependence $Y = f(X) + \epsilon$ and a model $\hat{y} = \hat{f}(x)$ that has a small but non-zero error on both the training and test samples, is $Y$ still a random variable? Here $\hat{y}$ is a predicted (but not observed) value of $Y$; $\hat{f}$ is an approximation of $f$; $x$ is any value of the independent variable $X$; and $\epsilon$ is an irreducible error term that depends on some unknown events (basically, the deviation of $Y$ around $f(X)$).
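
The setup in question 2 can be simulated in a few lines. This is only a sketch under assumed choices: a linear $f$, Gaussian noise $\epsilon$ with standard deviation 0.5, and a hand-written least-squares fit standing in for $\hat{f}$. All function names are hypothetical.

```python
import math
import random

def f(x):
    """Assumed 'true' dependence; the real f is unknown in practice."""
    return 2.0 * x + 1.0

def sample(rng, n, noise_sd=0.5):
    """Draw n pairs (x, y) with y = f(x) + eps, eps ~ Normal(0, noise_sd)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for a line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the predictions y_hat = slope * x + intercept."""
    sq = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))

rng = random.Random(42)  # fixed seed so the run is repeatable
x_train, y_train = sample(rng, 2000)
x_test, y_test = sample(rng, 2000)

slope, intercept = fit_line(x_train, y_train)  # this plays the role of f_hat
train_rmse = rmse(x_train, y_train, slope, intercept)
test_rmse = rmse(x_test, y_test, slope, intercept)
print(f"f_hat(x) = {slope:.3f} * x + {intercept:.3f}")
print(f"train RMSE = {train_rmse:.3f}, test RMSE = {test_rmse:.3f}")
```

In this sketch the fitted line recovers the assumed $f$ closely, but the training and test errors both stay near the noise standard deviation instead of shrinking to zero.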

Thank you.

Anton Jebrak
  • Your first question is answered at https://stats.stackexchange.com/questions/50/what-is-meant-by-a-random-variable/54894#54894. I don't think there is any "terminology clash," but rather perhaps you (and many others writing in the CS literature) might be paying insufficient attention to the terms "uniform," "independent," and "identically distributed." The second question is puzzling: could you explain what the connection is between "$\hat y$" and "$Y$"? The definition of $\hat y$ is not a "model" in any standard sense of the word: are you sure you intended to insert "Var" at the right? – whuber Nov 10 '18 at 21:47
  • "[One of the miseries of life is that everybody names things a little bit wrong, and so it makes everything a little harder to understand.](https://www.youtube.com/watch?v=EKWGGDXe5MA&feature=youtu.be&t=296)" -- Richard Feynman – littleO Nov 10 '18 at 22:23
  • @whuber Thank you, I have edited the second question. – Anton Jebrak Nov 10 '18 at 22:28
  • "For me a truly random variable must be impossible to predict, in other words it must have uniform distribution" -- the conventional usage differs from your sense that random implies uniformity. You're free to use terms how you like, but if you want to understand what people write (and to make yourself understood) it's best to understand what the conventional definitions are. [As one possible thing to consider, you might notice that by your definition a random value on the positive integers is impossible (since a uniform distribution on the positive integers is impossible). Uniformity isn't general enough.] – Glen_b Nov 10 '18 at 23:09

0 Answers