
In the thread *Is there any statistical test that is parametric and non-parametric?*, @JohnRos gives an answer saying that

Parametric is used in (at least) two meanings:

  • A - To declare you are assuming the family of the noise distribution up to its parameters.
  • B - To declare you are assuming the specific functional relationship between the explanatory variables and the outcome.
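For concreteness, one illustration of the two meanings (the specific model is mine, not @JohnRos's): in a regression setting, meaning **A** would be assuming, say, $\epsilon \sim N(0, \sigma^2)$ with $\sigma^2$ unknown, while meaning **B** would be assuming, say, $E(Y \mid X) = \beta_0 + \beta_1 X$ with $(\beta_0, \beta_1)$ unknown.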

@whuber counters that

The two meanings in the first paragraph frequently have a unified treatment in the literature: that is, there appears to be no fundamental or important distinction between them.

**Question: I am failing to see exactly how, and I wonder if anyone could provide an explanation.**

For example, I find the definition used in the tag information for the *nonparametric* tag (created by @whuber) similar to A:

Most statistical procedures derive their justification from a probability model of the observations to which they are applied. Such a model posits that the data appear to be related in a specific way to draws from some probability distribution that is an unknown member of some family of distributions. The family of distributions for a parametric procedure can be described in a natural way by a finite set of real numbers, the "parameters." Examples include the family of Binomial distributions (which can be parameterized by the chance of a "success") and the family of Normal distributions (usually parameterized by an expectation $\mu$ and variance $\sigma^2$). When such a description is not possible, the procedure is termed "nonparametric." Wikipedia provides a list of some non-parametric procedures.
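In symbols (a standard formalization, not a verbatim quote from the tag wiki): a model is parametric in this sense when it is a family
$$ \mathcal{F} = \{ F_\theta : \theta \in \Theta \}, \qquad \Theta \subseteq \mathbb{R}^k \ \text{for some finite } k, $$
for example $\Theta = [0, 1]$ for the Binomial chance of success, or $\Theta = \mathbb{R} \times (0, \infty)$ for $(\mu, \sigma^2)$ of the Normal family; when no such finite-dimensional indexing is possible, the procedure is nonparametric.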

but I cannot reconcile it easily with the description of the notion in James et al., *An Introduction to Statistical Learning*, p. 21, which is similar to B:

Parametric methods involve a two-step model-based approach.

  1. First, we make an assumption about the functional form, or shape, of $f$. For example, one very simple assumption is that $f$ is linear in $X$: $$ f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p. \tag{2.4} $$ This is a linear model, which will be discussed extensively in Chapter 3. Once we have assumed that $f$ is linear, the problem of estimating $f$ is greatly simplified. Instead of having to estimate an entirely arbitrary $p$-dimensional function $f(X)$, one only needs to estimate the $p+1$ coefficients $\beta_0,\beta_1,\dots,\beta_p$.
  2. After a model has been selected, we need a procedure that uses the training data to fit or train the model. In the case of the linear model (2.4), we need to estimate the parameters $\beta_0,\beta_1,\dots,\beta_p$. That is, we want to find values of these parameters such that $$ Y \approx \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p. $$ The most common approach to fitting the model (2.4) is referred to as (ordinary) least squares, which we discuss in Chapter 3. However, least squares is one of many possible ways to fit the linear model. In Chapter 6, we discuss other approaches for estimating the parameters in (2.4).

The model-based approach just described is referred to as parametric; it reduces the problem of estimating $f$ down to one of estimating a set of parameters.
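As a minimal sketch of this two-step approach (simulated data and `numpy` least squares; ISL's own labs use R, so this is only an illustration of the idea):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated training data: p = 2 predictors, n = 100 observations.
n, p = 100, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 2.0, -0.5])       # beta_0, beta_1, beta_2
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.3, size=n)

# Step 1: assume f is linear in X, so an arbitrary 2-dimensional f
# collapses to p + 1 = 3 unknown coefficients.
X_design = np.column_stack([np.ones(n), X])  # prepend an intercept column

# Step 2: fit the assumed form to the training data by (ordinary)
# least squares, as in equation (2.4).
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```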

Again, my question can be found above in bold print.

Richard Hardy
  • One evident resolution is that the ISL framework is not probabilistic. *A fortiori* any definition or concept it could possibly supply of a "parametric" model can refer *only* to functional relationships. The classical statistical definition explicitly involves a family of probability distributions *in addition* to a functional relationship, and as such offers a richer, more flexible framework for understanding such concepts. – whuber Sep 05 '19 at 15:44
  • In response to Isabella Ghement's answer, I wish to emphasize that you are quoting only the last of a series of comments I wrote in that thread and thereby have not revealed the essential background and context needed to understand it. The other comments include references and explanations that (I hope) relieve the ambiguity in this short quotation. – whuber Sep 05 '19 at 16:02
  • @whuber, thank you for contributing to this thread! I have read your other comments in the quoted thread, too, but I still feel I do not grasp the essence well enough. By the way, the comment I am quoting is your *first* reaction to @JohnRos's statement (and a very direct one), so it is not your *last* comment in that respect. Also, if you see that your comments address my question directly enough, could you compose an answer out of them and post it here? I would appreciate it. – Richard Hardy Sep 05 '19 at 16:32
  • Also, I do see that the ISL framework is not probabilistic, or at least it does not refer to the distribution of $Y$ in the quoted excerpt. Does that mean we could identify a *probabilistic* or *classical statistical* (as you say) definition of parametric vs. nonparametric that could be contrasted to a *nonprobabilistic* or *machine learning* (?) definition of these concepts? – Richard Hardy Sep 05 '19 at 16:42
  • That's right: I think that's how you can reconcile the perspectives. – whuber Sep 05 '19 at 17:15
  • One man's parameter is another man's hyperparameter. – usεr11852 Sep 05 '19 at 18:54

1 Answer


The paragraph by @JohnRos seems to refer to a regression context. To simplify things, let's say that we have a single predictor $X$ in our regression model and that the model can be formulated like this:

$$Y = f(X) + \epsilon$$

where $\epsilon$ is a normally distributed error term with $E(\epsilon) = 0$ and $Var(\epsilon) = \lambda(X)^2$, and where both $f()$ and $\lambda()$ are unknown functions.

If we are willing to assume that both $f()$ and $\lambda()$ have parametric forms, then the model itself can be referred to as parametric. For example, $f(X) = \beta_0 + \beta_1 X$ and $\lambda(X) = \sigma$.

But if we take either $f()$ or $\lambda()$ to be an unknown, smooth, possibly nonlinear function of $X$, whose underlying shape will be determined from the data, then our model includes a nonparametric component and it would be incorrect to refer to it as a parametric model.

I believe this simple example invalidates @whuber's first statement. A model such as the one above is determined by specifying the functional form of both $f()$ and $\lambda()$. Only when both of these components are specified as parametric can we refer to the entire model as parametric.
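To make the contrast concrete, here is a minimal sketch on simulated data, with a straight-line fit as the parametric estimate of $f()$ and a hand-rolled Gaussian-kernel (Nadaraya-Watson) smoother standing in for a nonparametric one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data from a nonlinear truth: Y = sin(2X) + eps, eps ~ N(0, 0.2^2).
n = 200
x = rng.uniform(-2, 2, size=n)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=n)

# Parametric estimate of f: assume f(X) = beta_0 + beta_1 X, so only
# two numbers are estimated (by least squares).
beta_hat = np.polyfit(x, y, deg=1)

# Nonparametric estimate of f: a Nadaraya-Watson kernel smoother whose
# shape is pinned down by the data, not by a finite parameter vector.
def kernel_smooth(x0, x, y, bandwidth=0.3):
    """Gaussian-kernel weighted average of y in a neighborhood of x0."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

grid = np.linspace(-2, 2, 9)
f_parametric = np.polyval(beta_hat, grid)            # a straight line
f_nonparametric = np.array([kernel_smooth(x0, x, y) for x0 in grid])
# The straight line misses the sine wave; the smoother tracks it.
```

The parametric fit is summarized by two numbers, while the smoother's shape is pinned down only by the data and the choice of bandwidth.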

Isabella Ghement
  • I don't understand how this example renders my comment invalid. I completely agree it is nonparametric. This suggests to me there is some ambiguity in what I wrote. That's no surprise, because the OP is quoting a *comment,* where limited space requires a terse style. To understand what it was intended to mean, please refer to the other (earlier) comments I posted in the same thread. – whuber Sep 05 '19 at 16:00
  • Isabella: *...then the model itself can be referred to as parametric*. Hmm... Your example looks nonparametric to me in the sense that it is distribution-free (cf. **A**). The error distribution is not completely specified even if $\lambda$ is known. Variance is just one parameter of a distribution, and there can be many different distributions with the same variance. – Richard Hardy Sep 05 '19 at 16:40
  • @whuber: The part of your comment about the unified treatment of A and B makes it sound like those two issues are confounded in the literature. The example I provided suggests that one can impose parametric/nonparametric assumptions on either A or B. To me at least there is a fundamental distinction between imposing such assumptions for A and imposing them for B. – Isabella Ghement Sep 05 '19 at 17:27
  • @RichardHardy: I modified the example to include normal errors, so that should address your concern. – Isabella Ghement Sep 05 '19 at 17:52
  • The distinction is a worthwhile one, Isabella. However, my references (such as Kiefer, Lehmann, or Kendall, Stuart & Ord) indicate that if *either* of the functional or distributional assumptions is non-parametric, the problem is considered non-parametric. I wouldn't characterize this as a "confounding" so much as a *unification.* That, I recall, was the intended meaning of my comment quoted in the question. – whuber Sep 05 '19 at 18:01
  • @whuber: Excellent clarifications which line up with my answer. I initially attached a different meaning to the word *unification*. – Isabella Ghement Sep 05 '19 at 18:39
  • Thanks, Isabella, for your perspective. It looks like it applies to case **A** but not **B**; in **B**, we do not assume a family of distributions indexed by some parameters for the error term. My question is how @whuber thinks **A** and **B** are essentially the same; I do not see that connection. On the other hand, his later statement that *if either of the functional or distributional assumptions is non-parametric, the problem is considered non-parametric* covers both **A** and **B**. We would need to edit the Wiki I am quoting to incorporate that, though. – Richard Hardy Sep 07 '19 at 08:04
  • @Richard Both cases are subsumed by the general model $Y=f(X,\theta)$ where $Y$ is the response, $X$ contains the variables, and $\theta$ parameterizes the model. In a parametric model $\theta$ lies in a finite-dimensional manifold. It is usual to split $\theta$ into two parts, one of which describes the functional relationship between $Y$ and $X$ and the other of which determines the probability law, but that splitting is not fundamental. For instance, a split would not be possible in the model $Y\sim\operatorname{Normal}(\mu+X\beta,\mu^2).$ – whuber Sep 07 '19 at 13:34
  • @whuber, that is very helpful! If you converted the last comment to an answer, perhaps adding some more context, I would gladly upvote it. – Richard Hardy Sep 07 '19 at 14:48
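To spell out the unification described in @whuber's penultimate comment above (a restatement in his notation, not additional material from him): write the model as
$$ Y = f(X, \theta), \qquad \theta \in \Theta, $$
with the problem called parametric when $\Theta$ is finite-dimensional. In the example
$$ Y \sim \operatorname{Normal}(\mu + X\beta,\ \mu^2), \qquad \theta = (\mu, \beta) \in \mathbb{R}^2, $$
the single component $\mu$ enters both the functional relationship, $E(Y \mid X) = \mu + X\beta$, and the probability law, $\operatorname{Var}(Y \mid X) = \mu^2$, so $\theta$ cannot be split into a purely "functional" part and a purely "distributional" part; yet the model is unambiguously parametric, because $\Theta$ is finite-dimensional.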