It seems blending mixes the outputs of many models to produce a better result. Is there any resource that can help me learn more about it?
Indeed, that is how blending works: it assigns optimal weights to (or learns directly from) the outputs of other learners. After careful tuning, blended models usually achieve state-of-the-art performance on almost all datasets.
These weights are actually applied to the out-of-fold predictions of cross-validated models, to avoid assigning the highest weights to models that achieve perfect accuracy on the training set (such as random forests).
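To make the idea concrete, here is a minimal sketch of that scheme using scikit-learn. The dataset, base learners, and the logistic-regression blender are my own illustrative choices, not anything prescribed by the articles below:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

base_learners = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    LogisticRegression(max_iter=5000),
]

# Out-of-fold predictions: each training row is predicted by a model
# that never saw it, so a random forest's perfect training accuracy
# cannot inflate its weight in the blend.
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_learners
])

# Refit each base learner on the full training set to predict the test set.
test_preds = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_learners
])

# The blender learns its weights from the out-of-fold predictions only.
blender = LogisticRegression()
blender.fit(oof, y_train)

print("blend accuracy:", accuracy_score(y_test, blender.predict(test_preds)))
```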
As resources on the topic were quite scarce, I wrote these two articles:
Introduction to blending in Python (method- and implementation-oriented)
Why does blending work? (theoretical arguments for the success of this method)