4

I'm asking this out of curiosity.

In the past I have thought of an heuristic as a "quick and dirty" rule not based on data analysis, as opposed to a solution which uses machine learning or statistical models.

For example imagine I have the following classification problem: are products listed in different e-commerce websites the same physical product?

One could simply define an initial heuristic that if product titles similarity is below an arbitrary level than products are considered different otherwise they are considered the same. This would be what I call an heuristic.

On the other hand, tagging data, training a model and cross validate to get best threshold to classify products is NOT an heuristic.

However if I read Wikipedia definition of Heuristic it says that it is a practical method that does not guarantee to be optimal or perfect. This seems very general and it seems that it could be extended to machine learning.

Can anyone help me understand the distinction a bit better?

Sven Hohenstein
  • 6,285
  • 25
  • 30
  • 39
Giacomo
  • 143
  • 1
  • 7
  • 1
    Machine learning predictive models *optimize* gradient parameters only in the loosest sense of the word since this process of *optimization* is typically in the context of extremely computationally complex, massive data based on iterative methods that divide, conquer, partition, search and sift through the information across multiple parallel CPUs and threads. Given those significant constraints, ML *optimization* algorithms are rarely, if ever, mathematically exhaustive-the true meaning of *optimize*. They are approximating and satisficing (in Herbert Simon's original sense) heuristics – Mike Hunter Aug 29 '17 at 14:18
  • 2
    Note that machine learning is not limited to massive data only, although it is the massive data setting that showcases nicely the benefits of machine learning over some other methods. – Richard Hardy Aug 29 '17 at 15:44
  • @RichardHardy, without massive data machine learning doesn't work well or at all. – Aksakal Aug 29 '17 at 15:45
  • "it is a **practical** method" - emphasis mine. Machine learning is not a practical method, it's driven by theory. – Firebug Aug 29 '17 at 16:04
  • 3
    Machine learning is an anthropomorphism. Even if we the process of modifying weights with data as "learning", the process is entirely dependent on the user input. Machines are not self-aware thus cannot discover things as is said in heuristic learning. In contrast, they are highly efficient at separating signal from noise. – AdamO Aug 29 '17 at 16:11
  • 1
    @Aksakal, wrong. While complex machine learning methods naturally require large data to reduce variance, simple methods work fine for small data as variance is not a problem. (This is not only my personal experience but also material covered in an introductory course to machine learning, e.g. the famous Andrew Ng's course on Coursera.) – Richard Hardy Aug 29 '17 at 16:24
  • @RichardHardy, what are the examples of simple methods? You don't mean regression and classifications that were before the massive data. Without massive data there's no ML, that's what all successful projects demonstrate. Show me a project where ML worked with little data – Aksakal Aug 29 '17 at 16:46
  • 1
    @Aksakal I think you are pinpointing too much ML. Most application areas of machine learning concern "small data" (that is, data that can fit into typical computer memory size). Genomics and neuroimaging applications go even further down in this scale, with several successful projects working with hundreds or even dozens of samples. ML is not only about image recognition and natural language processing. – Firebug Aug 29 '17 at 16:52
  • Thanks All! I will accept Tim's answer, which I think I could summarise as: ML algorithms are heuristics which give you some guarantees for optimality in certain scenarios. Also, thanks for mentioning Ng's course on Coursera, I will give it a go! – Giacomo Aug 30 '17 at 10:11
  • 1
    @giac_man, in this understanding nothing is not heuristic. Nothing guarantees you optimality in general situations. OLS regression does not guarantee optimality under general conditions etc. So, in this understanding the term heuristic is too wide, thus useless, because it means all modeling, pretty much. In practice, we know what's heuristic and what's not when modeling, we use the word in more narrow meaning – Aksakal Aug 30 '17 at 16:50
  • I understand there is a different between the use of the term in practice and the meaning of the term in theory. In practice (work environment for example) one should be careful (ab)using the word and probably find a different one to define what I explained in my question. Not sure which one though :) – Giacomo Aug 31 '17 at 11:29

3 Answers3

4

There are parts that are heuristic in machine learning, e.g. the choice of variables (inputs) and topology of the neural net. However, to call the whole thing heuristic would be wrong, since there's optimization involved.

For instance, if you said "we observed that customers who who by apples are twice likely to buy pears too, hence let's offer them pears in the online store" - this would be a heuristic. If instead you gathered all the data on customer behavior, and run a machine learning algo to come up with shopping suggestions that would not be a heuristic anymore.

However, as I wrote in the beginning, your decision to not include the current outside temperature into the variable (feature) list would be based on heuristic, most likely.

Aksakal
  • 55,939
  • 5
  • 90
  • 176
  • The optimization part by itself probably satisfies most definitions of 'algorithm', but I'm not sure that the use of (lots of) data distinguishes a heuristic from an algorithm. – Matt Krause Jan 21 '18 at 23:04
3

The problem with this kind of definition is that it is ambiguous and can be understood differently by different people and in different contexts. Wikipedia says that,

heuristic, is any approach to problem-solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals.

How do you know that the solution is optimal or perfect? When you are dealing with random phenomena, then you cannot get "perfect" results (i.e., always correct). What machine learning algorithms give you, is the "best you can get" results, given certain conditions are met. Moreover, each of the algorithms that are commonly used gives you some guarantees for optimality in certain scenarios (if they didn't, we wouldn't use them).

Heuristics have very similar, though more precise, meaning in computer science, tl;dr: they are algorithms that seek an approximate, opinionated solution rather than the exact one. In machine learning, there is usually no exact solutions, so it is not achievable by any algorithm.

Tim
  • 108,699
  • 20
  • 212
  • 390
  • 2
    This Wikipedia definition seems far afield from other definitions and my own intuitive understanding: *heur* means "to find". Heuristic is a process of teaching by self-discovery. – AdamO Aug 29 '17 at 16:02
  • @AdamO yea, and many other dictionaries define it as you suggested. I believe that the definition offered by Wikipedia is close to as it is understood by psychologists (e.g. works of Kahneman and Tversky). – Tim Aug 30 '17 at 10:14
  • The only definitions that matter in the context of this conversation are those that directly come from the discipline of computer science. If you want to use wikipedia you want the article entitled "Heuristic (Computer Science)". https://en.wikipedia.org/wiki/Heuristic_(computer_science) – Jimbo Jul 02 '21 at 09:54
  • @Jimbo the Wikipedia article is very vague and the CS definitions are more precise, but the idea is the same. Nonetheless, I edited for CS definitions. – Tim Jul 02 '21 at 10:07
2

It is surprisingly hard to formalize what is meant by 'algorithm'; Wikipedia has a nice summary here. Some definitions require an algorithm to provably produce correct output every single time (or with some bounded guarantee).

In that sense, the applications of machine learning are usually heuristic. There is no way to prove that a customer will definitely buy the items your recommender system suggests, or that the presence/absence of certain keywords distinguishes spam from valid email. Thus, a function like bool is_spam(std::string email_text, Model &mdl) is a heuristic.

On the other hand, the...procedure implemented in that function, or the ones that finds the support vectors of your SVM, updates the weights of your deep network, collates n-grams, etc. are often algorithms. You can prove, for example, that stochastic gradient descent will stop at a local minimum (e.g., like this). The heuristic part comes from using that local minimum to solve a practical problem.

Some classification "systems" also have heuristic pieces. For example, the training data might be smoothed or segmented in some way that happens to work well for typical inputs, but isn't provably optimal.

Matt Krause
  • 19,089
  • 3
  • 60
  • 101
  • "There is no way to prove that a customer will definitely buy the items your recommender system suggests" – Jimbo Jul 02 '21 at 09:51
  • Right. That's the point I was trying to make in the 3rd paragraph. SGD is itself an algorithm: it will provably find a local minima. The heuristic part, IMO, occurs when you try to map the weights SGD found onto the real world. In other words, I'd call model.train(data, labels), and results = model.classify(email) algorithmic. However, deciding whether to (e.g.) hide the email, not fire an alert, etc based on results->prob_is_spam is often heuristic: There's no objective threshold that definitely minimizes users' total annoyance at false positive and false negatives. – Matt Krause Jul 02 '21 at 21:23
  • There are probably more definitions than people (Yuri Gurevich alone has 3 entries on that list), so it's definitely debatable. – Matt Krause Jul 02 '21 at 21:44