3

From the article https://en.wikipedia.org/wiki/Occam%27s_razor:

Another contentious aspect of the razor is that a theory can become more complex in terms of its structure (or syntax), while its ontology (or semantics) becomes simpler, or vice versa. Quine, in a discussion on definition, referred to these two different perspectives as "economy of practical expression" and "economy in grammar and vocabulary", respectively.

Within statistics, are there some examples that can help understand where the above occurs or where this may be of importance?

A first example, which it would be good to have critiqued and improved, is Bayesian techniques. Specifying uncertainty in the initial model and running simulations (if conjugate priors are not available) not only adds computational cost but also adds checking of whether the simulation had meaningful coverage. However, this extra syntactic complexity is often worth it for the easier interpretation and the extensible semantic results that Bayesian statistics provides.
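
For concreteness, here is a minimal sketch of the trade-off I have in mind (the Beta-Binomial model, the data, and the Metropolis settings are invented purely for illustration): the conjugate route gives the posterior in one line, while the simulation route needs a sampler plus a check that its interval has meaningful coverage, yet both end in the same directly interpretable posterior statement.

```python
# Sketch only: contrasts a conjugate Beta-Binomial update (syntactically simple)
# with a Metropolis sampler for the same posterior (more code, more checking),
# which is the route needed when no conjugate prior is available.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n = 27, 100                      # observed successes out of n trials (made up)
a0, b0 = 1.0, 1.0                   # Beta(1, 1) prior

# Route 1: conjugate prior -- posterior available in closed form.
posterior = stats.beta(a0 + k, b0 + n - k)
print("conjugate 95% interval:", posterior.ppf([0.025, 0.975]))

# Route 2: Metropolis sampling -- extra machinery plus convergence/coverage checks.
def log_post(theta):
    if not 0 < theta < 1:
        return -np.inf
    return stats.binom.logpmf(k, n, theta) + stats.beta.logpdf(theta, a0, b0)

theta, draws = 0.5, []
for _ in range(20000):
    prop = theta + rng.normal(scale=0.05)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    draws.append(theta)
draws = np.array(draws[5000:])      # discard burn-in

# The extra "checking" mentioned above: compare the simulated interval with the
# exact one to confirm the sampler has meaningful coverage.
print("sampled   95% interval:", np.percentile(draws, [2.5, 97.5]))
```

With a conjugate prior the two intervals should agree closely; when no conjugate form exists, only the second, syntactically heavier route remains.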

Single Malt

1 Answer

2

This is an extremely philosophical (and logical) question, and it's unclear if it can be answered in the context of Cross Validated.

The first part of the question seems straightforward, but actually is not. How is simplicity or complexity defined? In any natural science, that is actually never an issue -- but there the correct term would be "accuracy", and there is a kind of truth against which one can measure (in the broadest sense) the deviation between theory and that truth.

There is a notion of measuring the complexity of a theory, similar to the "complexity" or "informational content" of an entity, and it applies to real-world systems as well. This is the cross-link to entropy, or Shannon entropy, but that is usually not viewed in the sense of statistics and statistical theory. Categorizing complexity in relation to statistics leads more towards "Kolmogorov complexity", which opens up an entire new field of descriptive vocabulary. The pathway to follow is most likely reading Ray Solomonoff (as a starting point); it is highly interesting, but not so easy to follow. The key idea of what "complexity" means there is that a theory is described within a countable set of descriptions, and a theory counts as simpler when that set is smaller. It is interesting to note that this leads to one of the foundations of modern statistical learning methodology, so one way to describe 'artificial intelligence' follows this path.
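
To make that distinction slightly more concrete, here is a rough sketch (the two strings and the use of zlib as a stand-in for the uncomputable Kolmogorov complexity are just illustrative choices): symbol-frequency entropy does not distinguish a structured sequence from a random one, while description length does.

```python
# Two crude proxies for the "complexity" of a sequence: its empirical Shannon
# entropy (frequencies only) and its compressed length (a rough, imperfect
# stand-in for Kolmogorov complexity).
import random
import zlib
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Empirical entropy in bits per symbol."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def compressed_length(s: str) -> int:
    """Length in bytes after zlib compression."""
    return len(zlib.compress(s.encode()))

random.seed(0)
regular = "ab" * 500                                        # highly structured sequence
messy = "".join(random.choice("ab") for _ in range(1000))   # same alphabet, no structure

for name, s in [("regular", regular), ("random", messy)]:
    print(name, round(shannon_entropy(s), 3), compressed_length(s))

# Both strings have roughly the same symbol frequencies, hence similar Shannon
# entropy, yet very different compressed lengths: the structured one admits a
# short description, the random one does not.
```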

But as a scientist I have to think of a dialogue in "Life of Galileo" by B. Brecht. IIRC there is an argument about why the motion of objects is supposed to be simple in the sense of intuitive, and not "crazy" or "random". The answer is one that modern science usually despises, as it is somewhat anthropocentric (cf. the anthropic principle). The math we use and its properties thus reflect an artificial point of view; a different kind of math could be far simpler.

But ultimately this is not the end of the discussion; the following is: the incompleteness theorem (https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems). However simplicity (or complexity) is defined, it has to be defined within some kind of formal system, and ultimately the limitations of that system cannot be overcome. Any proof that a theory is the simplest must happen within the framework, but no such proof exists for the framework itself -- it is inherently incomplete. Interestingly, following this line of thinking one also ends up with Turing machines and their limitations, which is basically the same conclusion as Solomonoff's dichotomy of completeness versus (un)computability.

For a more practical and purposeful answer, it makes much more sense to look at theories in the context of an application, or in terms of their usefulness. In the natural sciences a theory can be more or less inaccurate, but if a certain level of accuracy is pre-set, the complexity of a theory can be assigned a metric (e.g. the number of parameters). It should be noted, though, that the accuracy of a method is not the same as its simplicity or complexity; the question of simplicity is usually the wrong question for a given problem-solution pair.
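
As a rough illustration of that last point (the data, the polynomial family, and the 0.12 accuracy target are all invented for the example): once an accuracy level is pre-set, the number of parameters gives a complexity metric, and the two quantities stay on separate axes.

```python
# Sketch only: with a pre-set accuracy target, "complexity" can be scored by
# the number of parameters, here the degree of a polynomial fit.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=x.size)  # quadratic truth + noise

target_rmse = 0.12                          # pre-set accuracy level
for degree in range(0, 8):
    coeffs = np.polyfit(x, y, degree)       # degree + 1 parameters
    rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(f"degree={degree}  params={degree + 1}  rmse={rmse:.3f}")
    if rmse <= target_rmse:
        print("simplest model meeting the target:", degree + 1, "parameters")
        break
```

The loop reports the smallest polynomial that meets the target; a higher-degree fit would be more accurate still, but not simpler.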

So, to give a somewhat summarised answer: there is no free lunch. This is only partially meant as a joke, since Solomonoff's work also touches on that theorem. Finally, there are many more dimensions to a particular formulation of a theory than "economy of practicality" and "economy of expression". In the context of an algorithmic formulation there are many practical aspects, too: ease of implementation on a computer system factors into it, as does checkability or ease of verification.

cherub
    "How is simplicity or complexity defined? In any natural science, that is actually never an issue --" I shall go alert the Complex Systems Science program at my university, they will be surprised to learn. The computer science folks and Dr Chomsky might also appreciate being looped in, as will Peter Taylor and a host of mathematical ecologists... ;) – Alexis Mar 18 '21 at 16:36
  • @Alexis The key issue for statistical models in natural science is the accuracy between theory and nature. Then you might apply Occam's razor to "shear off" the unneeded complexity. Of course, the answer is not all-encompassing, so I might edit that part to make it less of a shortcut. I had the impression that the question is difficult enough to answer without making the answer even more difficult. The phrase "never an issue" is probably better replaced with "a secondary issue" or "an issue of a different dimension". – cherub Mar 18 '21 at 16:58
  • The no free lunch theorem seems to doubly apply here: 1. Even if a method is fixed, the two kinds of "economy" above seem to require a trade-off. 2. Sometimes a complex method (setting aside what "complex" means, since defining it ironically adds complexity to this topic!) is better, necessitating going against Occam's razor. – Single Malt Mar 18 '21 at 18:19