
Being interested in complex systems and trying to get a beginner's understanding of the field, today I ran across this interesting article in Quanta Magazine on chaos theory and equation-free modeling. I realize that non-parametric statistics implies a lack of certainty in data distributions' parameters (please correct me if I'm wrong). I'm not sure about chaos theory, but it seems to me that, at least, equation-free modeling is a term closely related to non-parametric statistics.

Question: what are the relations, if any, between the emphasized topics and non-parametric statistics (I am not interested in details, but rather in the sources and nature of the relations; references are welcome)?

kjetil b halvorsen
Aleksandr Blekh
  • A lot of statisticians will argue about exactly what the phrase *non-parametric statistics* means (I know the definition that applies to my field...), but "lack of certainty in data distributions' parameters" can be easily misinterpreted as parametric statistical inference ("I have uncertainty about $\mu$ and $\sigma$ that define my normal distribution"). – Cliff AB Oct 15 '15 at 20:02
  • @CliffAB: Thank you for pointing that out. My wording indeed could be more precise. As you can guess, the implied meaning is not the uncertainty about, but the lack of any distributional assumptions (if I can use these terms; please correct me, as I am not a statistician). – Aleksandr Blekh Oct 15 '15 at 20:07
  • Non-parametric approaches are not equation free. Think of fitting a curve where you have the functional form of the curve, such as $\sin(x)$, vs. where you don't have it, but specify that the curve must have minimum length. In the latter case you still have a variational *equation* despite not having the functional form of the curve. – Aksakal Oct 15 '15 at 20:24
  • Also, there's no chaos theory per se, in my opinion. – Aksakal Oct 15 '15 at 20:26
  • I'm going to get flak for this, I can see it, but...the definition of non-parametric estimator that I prefer is "an estimator whose solution cannot be characterized by a finite number of parameters". For example, the simplest NP estimator is the empirical distribution function, which places probability mass $1/n$ at each observed time point. All possible monotonic step functions cannot be described by a finite set of parameters, unlike, say, all possible normal distributions. – Cliff AB Oct 15 '15 at 20:27
  • ...but while I consider that a valid definition, it doesn't quite reveal the motivation of non-parametric estimators: to be "infinitely flexible". However, it doesn't take much exposure to NP estimation to see that this is the *goal* of NP estimators, but it is not necessarily achieved. Sorry to go off on a side tangent from your original question... – Cliff AB Oct 15 '15 at 20:30
  • @Aksakal: Thank you for the clarification. If I understood you correctly, non-parametric approaches always (?) imply the existence of some _equations_ representing forms other than a core _function_, such as a set of _constraints_ (as in your example). – Aleksandr Blekh Oct 15 '15 at 20:31
  • @CliffAB: Thank you for expanding on the topic. I think it is a bit clearer to me now, but I still need some time to digest the info properly (unfortunately, lack of my formal math/stats background is showing; hopefully, eventually, I will overcome this problem). – Aleksandr Blekh Oct 15 '15 at 20:35
  • @AleksandrBlekh, every time I write "always" Glen_B shows up with a counterexample :) So, I'll say only that "non-parametric" doesn't imply there is no equation. As in my example, there could be a variational equation or some kind of min/max condition to be satisfied. So, parametric approaches define a solution on a narrowly defined class of functions such as $\sin(k x)$, while non-parametric approaches tend to be less specific, like "shortest line with continuous second derivative". – Aksakal Oct 15 '15 at 20:35
  • @Aksakal maybe we should all team up and write a paper defining exactly what non-parametric actually means. – Cliff AB Oct 15 '15 at 20:45
  • @CliffAB, aren't there already papers on the subj? – Aksakal Oct 15 '15 at 20:49
  • @Aksakal: perhaps? I've seen the definitions in books but I can't say I've ever read it in an article. And I've seen plenty of disagreement in different subfields about what it actually means. – Cliff AB Oct 15 '15 at 20:53
  • Note that the tag `stochastic-processes` does not describe [chaotic dynamics](https://en.wikipedia.org/wiki/Chaos_theory), which are deterministic. If you examine the evolution in time from two nearby points in state space, the difference between the two trajectories will be random for a stochastic system, but the trajectories will diverge exponentially in time for a chaotic system. – EdM Oct 15 '15 at 21:11
  • @EdM: Thank you for your comment. IMHO, the exponential divergence might be naively explained by the presence of constantly increasing entropy along the time dimension. Looking at the Wikipedia article you've referenced, I'm curious whether chaotic dynamic systems can contain an entropy component at all and, if so, how it fits with such systems being deterministic. – Aleksandr Blekh Oct 15 '15 at 22:04
  • @AleksandrBlekh, for me a good characterization of *nonparametric* is *lack of distributional assumptions* (as you wrote above). Suppose we have variables $y$ and $x$. In a *parametric* case, we assume the shape of the conditional distribution of $y|x$ to be known but the specific parameters unknown (e.g. Normal with unknown mean, unknown variance); we then go after estimating the unknown parameters. In a *nonparametric* case we do not assume to know the shape. – Richard Hardy Oct 16 '15 at 06:46
  • @RichardHardy: I appreciate your explanation, which is very clear. Returning to the core of my question, would you agree with Aksakal's comment above that _non-parametric_ approaches do not imply being _equation-free_? – Aleksandr Blekh Oct 16 '15 at 07:27
  • @AleksandrBlekh, I do not have enough experience to say anything authoritative on that question. I have not dealt with it much at all. – Richard Hardy Oct 16 '15 at 07:45
  • @RichardHardy: I see. Thank you for your feedback. – Aleksandr Blekh Oct 16 '15 at 07:49
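
Cliff AB's example in the comments above, the empirical distribution function that places probability mass $1/n$ at each observation, can be written down in a few lines. This is only an illustrative sketch: the function name `ecdf` and the sample values are made up for the example and do not come from the thread.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF of `sample` as a callable step function.

    Each observation carries probability mass 1/n, so F(x) is the
    fraction of observations that are <= x. (Hypothetical helper.)
    """
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect_right(xs, x) / n

    return F

# Made-up data, purely for illustration
F = ecdf([2.0, 5.0, 3.0, 9.0])
print(F(1.0), F(2.0), F(4.0), F(9.0))
```

The estimator is described by the whole sorted sample rather than by a fixed, small set of parameters such as $\mu$ and $\sigma$, which is the point of the "cannot be characterized by a finite number of parameters" definition.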

1 Answer


In the limit, this becomes a question of whether you consider time-series analysis to be non-parametric statistics.

The approach of the PNAS paper by Ye et al., cited in the Quanta Magazine article, might be considered a generalization of standard time-series analysis. As stated in the Supporting Information Appendix to the article: "reconstructions of a dynamic system can be made using successive lags of a single time series...if enough lags are taken, this form of reconstruction...preserves essential mathematical properties of the original system." This Appendix has (for me, at least) the clearest explanation of the approach.
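
To make "successive lags of a single time series" concrete, here is a minimal sketch of building lagged vectors from one series. The function name `delay_embed`, its parameters `dim` and `lag`, and the toy numbers are assumptions for illustration, not anything taken from the paper.

```python
def delay_embed(series, dim, lag=1):
    """Build lagged vectors (x[t], x[t-lag], ..., x[t-(dim-1)*lag]).

    `dim` is the number of lags kept per vector; `lag` is the spacing
    between them. Both names are hypothetical, chosen for this sketch.
    """
    start = (dim - 1) * lag  # first index with a full history available
    return [
        tuple(series[t - k * lag] for k in range(dim))
        for t in range(start, len(series))
    ]

# Toy series, purely for illustration
series = [0.1, 0.4, 0.9, 0.3, 0.8, 0.6]
vectors = delay_embed(series, dim=3)
for v in vectors:
    print(v)
```

Each tuple is one point in the reconstructed state space; a standard autoregressive model fits one linear map from such points to the next value of the series.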

What this approach adds is a weighting procedure that can deal better with the problems posed by underlying non-linear dynamics. (I will leave aside the question of whether there really is a chaos theory, as raised in comments above.) A "tuning parameter," $\theta$, is added to the model, which gives nearby points "stronger weighting, allowing the model to be adaptive to local influences and therefore, nonlinear."

For $\theta=0$, "the model reduces to an autoregressive model"; hence the first sentence of this answer. Forecast skill is then assessed as a function of $\theta$. In the data analyzed in that paper, "forecast skill peaks when $\theta$ is ~ 2, which is evidence for nonlinearity in the aggregate time series." Information from multiple time series is then used to try "to identify informative environmental variables and elucidate potential mechanisms."
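
For readers who want to see how such a tuning parameter can make a lagged linear model adaptive, here is a minimal sketch. It is not the code of Ye et al.: the exponential kernel `exp(-theta * d / mean(d))`, the function name `smap_forecast`, and the logistic-map demo data are all assumptions chosen for illustration. The only properties taken from the description above are that nearby points get stronger weighting as $\theta$ grows and that $\theta=0$ gives a single global linear (autoregressive) fit.

```python
import numpy as np

def smap_forecast(X, y, x_query, theta):
    """Locally weighted linear forecast at x_query (illustrative sketch).

    X: (n, E) library of lagged vectors; y: (n,) values one step ahead.
    The exponential kernel below is an assumed choice of weighting.
    """
    d = np.linalg.norm(X - x_query, axis=1)      # distance to each library point
    w = np.exp(-theta * d / d.mean())            # theta = 0 gives equal weights
    A = np.hstack([np.ones((len(X), 1)), X])     # intercept column plus lags
    sw = np.sqrt(w)[:, None]                     # sqrt weights for weighted least squares
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return float(np.concatenate(([1.0], x_query)) @ coef)

# Demo data (assumption): the logistic map, a simple deterministic chaotic system
x = np.empty(300)
x[0] = 0.2
for t in range(299):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])

X_lib, y_lib = x[:199, None], x[1:200]           # one-lag library
X_test, y_test = x[200:299, None], x[201:300]    # held-out points

def mean_abs_error(theta):
    preds = [smap_forecast(X_lib, y_lib, q, theta) for q in X_test]
    return float(np.mean(np.abs(np.array(preds) - y_test)))

err_linear = mean_abs_error(0.0)   # global linear (autoregressive) fit
err_local = mean_abs_error(4.0)    # locally weighted fit
print(err_linear, err_local)
```

In this toy setting the locally weighted error should come out smaller than the global linear one, which is the kind of comparison that a forecast-skill-versus-$\theta$ curve summarizes.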

I suppose that one could find an analogy between this approach and non-parametric tests that make no assumptions about underlying distributions, in that the approach of Ye et al. makes no assumptions about the form of the equations of the underlying dynamic model. But I think that drawing the analogy would be a disservice both to "non-parametric statistics," whatever that means, and to the work presented by Ye et al.

EdM
  • Thank you so much for taking time to answer my question. I appreciate your detailed answer (+1). It will take some time for me to process the information, considering that I'm an aspiring complex systems enthusiast. – Aleksandr Blekh Oct 15 '15 at 22:08
  • To EdM's point in the quote he cites, distributed lags of a univariate time series amount to a Koyck model in econometrics and are nothing new. Where I think EDM models do diverge significantly from traditional time series analysis is in leveraging a non-reductionistic, non-mechanistic approach to model building. This is a definite rejection of the "Occam's Razor" assumptions underlying nearly all traditional analysis. – Mike Hunter Oct 16 '15 at 11:58
  • +1. But I had a brief look at the PNAS paper (as well as at the popular article that @AleksandrBlekh linked to), and somehow it looks like a fairly simple exercise in time series analysis, but filled with buzzwords, complemented with appropriate strawmen and unbelievably oversold and over-hyped. That's to put it *politely*. I don't understand what it does in PNAS, but then it's not the first time I have this feeling about PNAS. Not surprising that people get confused about how all these fancy terms are related. – amoeba Oct 16 '15 at 22:56