The error $\varepsilon_1$ is random AND unobservable. Before you see your data, it follows a mean-zero normal distribution (a continuous random variable). After you see your data, it follows $p(\varepsilon_1\mid y_1) = \delta_{y_1 - \beta_0 - \beta_1 x_1}(\cdot)$, a point mass at $y_1 - \beta_0 - \beta_1 x_1$ (a degenerate distribution, no longer a continuous one). The latter is not "nice," and you can't really do anything with it, because you don't have the dataset $\varepsilon_1, \ldots, \varepsilon_n$ in addition to your $Y$ and $X$ data. So when you say "where the error is $\varepsilon_1 = 2$," that isn't the supposition that is going to elucidate anything for you.
Now suppose you actually did observe $\varepsilon_1$ (which is impossible), and say its value was $2$, as in your question. It is no longer random. Many books switch from upper case to lower case to convey this, but for this particular Greek letter that's hard to do. Yes, $P(\varepsilon_1=2) = 0$, but that is a question asked before you observe $\varepsilon_1$. There is no probability after you see an outcome, so don't use any $P(\cdot)$ there.
Here's a better way to think about it. For the setup you posted in your question, picture two columns of data in an Excel spreadsheet or CSV file. The first column is $y_1,\ldots,y_n$, and the second column is $x_1, \ldots, x_n$. I am writing these in lowercase because these data are not random anymore: you are looking at specific, fixed values.
Your model above is equivalent to assuming that the $Y_i$s (before you see your data for them) are independent of one another, and that their probability distributions differ only in their means. You're assuming that, before you observe your $Y$ data, if you have the information $x_1, \ldots, x_n$, then you have a (generally different) normal distribution for each of $Y_1, \ldots, Y_n$. In other words, all rows are mutually independent, and for each row of data $i$,
$$
Y_i\mid X_i=x_i \sim \text{Normal}(\beta_0 + \beta_1 x_i, \sigma^2).
$$
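If it helps to see this data-generating story in code, here is a minimal simulation sketch in Python (the parameter values, sample size, and the uniform distribution for the $x$ column are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up "true" parameters, purely for illustration
beta0, beta1, sigma = 1.0, 2.0, 0.5
n = 100

x = rng.uniform(0, 10, size=n)        # the x column, treated as fixed once drawn
eps = rng.normal(0.0, sigma, size=n)  # the errors: used to generate y, never observed
y = beta0 + beta1 * x + eps           # equivalently, y_i ~ Normal(beta0 + beta1*x_i, sigma^2)

# your "spreadsheet" is only these two columns; eps is not one of them
data = np.column_stack([y, x])
```

Notice that `eps` exists inside the simulation but is not one of the two columns you would actually record, which is exactly the sense in which the errors are unobservable.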
Because they are independent,
$$
p(y_1,\ldots,y_n;\beta_0,\beta_1,\sigma^2) = p(y_1;\beta_0,\beta_1,\sigma^2)\cdots p(y_n;\beta_0,\beta_1,\sigma^2), \tag{1}
$$
i.e., the joint density of your first column factors into a product. Each $p(y_i)$ is a normal density with the same variance, but with a potentially different mean that depends linearly on the corresponding $x_i$.
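In code, the log of the factorized density $(1)$ is just a sum of normal log-densities, one per row. A sketch, continuing the hypothetical simulation above and using `scipy.stats.norm`:

```python
from scipy.stats import norm

def log_likelihood(theta, y, x):
    """Log of the joint density (1), viewed as a function of theta = (beta0, beta1, sigma)."""
    beta0, beta1, sigma = theta
    # independence turns the product in (1) into a sum of log-densities
    return norm.logpdf(y, loc=beta0 + beta1 * x, scale=sigma).sum()
```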
But then you get your data for the $Y$s: $y_1,\ldots,y_n$. So you can evaluate your joint density/likelihood $(1)$ and move $\theta = (\beta_0,\beta_1,\sigma^2)$ around until you find good parameter values (think maximum likelihood, restricted maximum likelihood, or least squares).
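To make "move $\theta$ around" concrete, here is one way to maximize that log-likelihood numerically, sketched with `scipy.optimize.minimize` (the starting values are arbitrary, and I optimize over $\log\sigma$ so the scale stays positive):

```python
import numpy as np
from scipy.optimize import minimize

# minimize the negative log-likelihood over (beta0, beta1, log(sigma))
def neg_log_lik(params):
    beta0, beta1, log_sigma = params
    return -log_likelihood((beta0, beta1, np.exp(log_sigma)), y, x)

result = minimize(neg_log_lik, x0=np.array([0.0, 0.0, 0.0]), method="Nelder-Mead")
beta0_hat, beta1_hat = result.x[:2]
sigma_hat = np.exp(result.x[2])
```

The $\beta$ estimates you get this way should agree with the ordinary least-squares fit (e.g. `np.polyfit(x, y, 1)`) up to the optimizer's tolerance; the maximum-likelihood estimate of $\sigma^2$ divides the residual sum of squares by $n$ rather than $n-2$, which is why it differs slightly from the usual regression output.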