In the context of data streams, concept drift describes the phenomenon of the underlying distribution of a variable changing over time, which often degrades the performance of statistical models trained on previously observed data.
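To make the definition concrete, here is a minimal sketch under toy assumptions: one feature x ~ U(0, 1), standard library only, and hypothetical helpers `make_stream`, `fit_threshold`, and `accuracy`. The labelling rule moves from `x > 0.5` to `x > 0.8`, so P(x) stays fixed while P(y | x) changes. A threshold classifier fitted to the earlier data stays accurate on held-out data from the old concept, but it should misclassify roughly the 30% of points that fall between the two boundaries once the drift occurs.

```python
import random

def make_stream(n, boundary, rng):
    """Draw n points x ~ U(0, 1), labelled 1 when x exceeds `boundary`."""
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > boundary)) for x in xs]

def accuracy(data, t):
    """Fraction of points whose label matches the rule `predict 1 if x > t`."""
    return sum(int(x > t) == y for x, y in data) / len(data)

def fit_threshold(data):
    """Choose, among the observed x values, the threshold with the best training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t, _ in data:
        acc = accuracy(data, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

rng = random.Random(0)
before = make_stream(500, boundary=0.5, rng=rng)  # concept in force when the model is trained
after = make_stream(500, boundary=0.8, rng=rng)   # same inputs, but the labelling rule has moved

t = fit_threshold(before[:250])         # train only on "previously observed" data
acc_before = accuracy(before[250:], t)  # held-out data from the old concept
acc_after = accuracy(after, t)          # data arriving after the drift
print(f"threshold={t:.3f}  before drift={acc_before:.2f}  after drift={acc_after:.2f}")
```

The model itself never changes; only the data-generating rule does, which is exactly why its accuracy falls.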
Questions tagged [concept-drift]
30 questions
8 votes, 1 answer
Difference between distribution shift and data shift, concept drift and model drift
Lately, I have been seeing these terms used interchangeably in several places:
Joaquin Quiñonero, Dataset Shift in Machine Learning (MIT Press, NIPS)
NeurIPS 2021 DistShift workshop
Model drift: Towards Data Science
Are there differences in the definitions?…

Carlos Mougan
5 votes, 1 answer
How does LightGBM deal with incremental learning (and concept drift)?
From some research, I found that it updates the leaves (it does not create new ones or remove old ones). Is that right? How does this happen?
Another question: when incremental learning is done on data with concept drift, is LightGBM good at dealing with this…

Jader Martins
4 votes, 2 answers
Lifetime of fraud detection models
Suppose we are building/testing a fraud detection model for a specific credit card or quick cash loan business. We have a lot of data to play with (say, the past 5 years), and after careful preprocessing, model selection, and parameter tuning, we build…

user6396
4 votes, 1 answer
Is there a way to adapt machine learning models knowing ex ante that distributions will shift?
I am currently working on a topic where I know that the distributions of the output and of the covariates will shift. I know for example that some covariates will at least follow the inflation rate. The goal is to predict the output over time, based…

LouisBBBB
3 votes, 0 answers
How can you determine whether there is concept drift or whether a model is affecting the distribution of the target class?
Assume that I am building a churn prediction model, and I collect observational data of customers who registered in the last 12-18 months. Assume that 50% of customers churned. Customers who are predicted to churn are receiving more favorable…

Jay Ekosanmi
3 votes, 1 answer
Are ensemble learning methods for data streams restricted to online or batch learning?
Recently I've been working on an online learning algorithm (using an RBF neural network) for classification. As I read papers in this area, I found that online learning has an issue called the concept drift problem, which my algorithm suffers from, and I have to find…

mkafiyan
3 votes, 1 answer
Machine learning or statistical models that account for time evolution and underlying system changes
I wonder if there are some algorithms that can account for underlying system dynamics over time.
One possible situation is the following: in ticket reporting data, a data point arrives when a problem is reported, and a ticket log is created…

hurrikale
2 votes, 0 answers
Find representative samples from a distribution estimated by KDE
I deployed a neural network model trained on a huge (time series) dataset. In production, I would like to monitor the newly received data and check for drift in the features using the K-S test. To run the test, I am required to provide a…

Coderji
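One way to draw a reference sample from a KDE, sketched here assuming a one-dimensional feature and a Gaussian kernel with a normal-reference ("rule of thumb") bandwidth, is to pick an observed value uniformly at random and add kernel noise. The reference sample can then be compared with incoming data using the two-sample K-S statistic. `kde_sample`, `ks_statistic`, and `ks_drift` are hypothetical helpers written with the standard library only; in practice `scipy.stats.gaussian_kde(...).resample(...)` and `scipy.stats.ks_2samp` cover the same steps. The drift decision uses the large-sample critical value, not an exact p-value.

```python
import math
import random
import statistics

def kde_sample(data, k, rng):
    """Draw k points from a Gaussian KDE fitted to `data` (pick a point, add N(0, h^2) noise)."""
    n = len(data)
    # Assumption: normal-reference bandwidth h = 1.06 * sigma * n^(-1/5).
    h = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(k)]

def ks_statistic(a, b):
    """Two-sample K-S statistic D = max |F_a(x) - F_b(x)| over the pooled sample."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples so ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_drift(reference, current, alpha=0.05):
    """Flag drift when D exceeds c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    d = ks_statistic(reference, current)
    return d, d > c_alpha * math.sqrt((n + m) / (n * m))

rng = random.Random(42)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical training-time values
reference = kde_sample(train_feature, 1000, rng)

stable = [rng.gauss(0.0, 1.0) for _ in range(500)]   # same distribution as training
shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]  # mean moved by one standard deviation

d_stable, drift_stable = ks_drift(reference, stable)
d_shifted, drift_shifted = ks_drift(reference, shifted)
print(f"stable: D={d_stable:.3f} drift={drift_stable}  shifted: D={d_shifted:.3f} drift={drift_shifted}")
```

Because the KDE adds kernel noise, the reference sample is slightly wider than the training data; with a small bandwidth this effect is minor next to a real shift.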
2 votes, 0 answers
Explanation(s) for unimodal distribution of prediction probability computed by Random Forest
I have a typical binary classification problem with a sample of ~700 instances where I fitted multiple classification models including logistic regression, SVM and Random Forest.
The instances are represented by legacy features, and the…

Nmws
2 votes, 0 answers
Disadvantages of "moving window ensemble" approach?
Assuming online/incremental training is not available for a particular algorithm, and assuming that you have a stream of training data that may or may not change over time (e.g., log data), what are the disadvantages of the following approach to defend…

dvas0004
2 votes, 1 answer
How do the terms non-stationarity, concept-drift and evolving data relate to each other?
I often see the terms non-stationarity, concept-drift and evolving data in the same context, as if they were interchangeable. Are they? Or is there some subtle nuance that I am missing?

au.re
2 votes, 1 answer
Concept drift detection
I'm working on a project that involves concept drift detection for a time series. Are there any well-known techniques, methods, or algorithms that are known to be effective for this sort of problem?
Currently, I am thinking of using a Kalman filter…

Glassjawed
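One widely cited family of drift detectors is sequential change detection. Below is a minimal sketch of the Page-Hinkley test, a swapped-in technique for illustration rather than the Kalman-filter approach the question mentions. It assumes the goal is to catch an increase in the mean of a univariate series (for example, a model's error rate), uses the incremental running-mean form, and treats the `delta` and `threshold` values as illustrative assumptions that would need tuning on real data.

```python
import random

class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # magnitude of change tolerated without accumulating evidence
        self.threshold = threshold  # alarm level (lambda): larger means fewer false alarms, slower detection
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # m_t = sum of (x_i - running mean - delta)
        self.cum_min = 0.0  # M_t = smallest m value seen so far (starting from m_0 = 0)

    def update(self, x):
        """Feed one value; return True when m_t - M_t exceeds the threshold."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold

rng = random.Random(7)
# Hypothetical series: the mean jumps from 0 to 3 at index 500.
stream = [rng.gauss(0.0, 1.0) for _ in range(500)] + [rng.gauss(3.0, 1.0) for _ in range(500)]

detector = PageHinkley(delta=0.005, threshold=100.0)
alarms = []
for i, x in enumerate(stream):
    if detector.update(x):
        alarms.append(i)
        detector.reset()  # start tracking the new concept from scratch
print("drift signalled at indices:", alarms)
```

Resetting after an alarm is one simple policy; a production system would typically also retrain or swap the model at that point.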
1 vote, 4 answers
Benchmark datasets for concept drift where important predictors (independent variables) change with time or over a stream of observations
I'm currently searching the web and literature for streaming classification datasets with concept drift. I've found a number of synthetic datasets where, over time, the important predictors change in their "predictive" nature.
For example here…

Andrew Cassidy
1 vote, 0 answers
Active learning to counter concept drift
I'll be doing my thesis soon on model drift detection and possible remedies in a production environment. I'll probably be making an intuitive (hopefully!) theoretical framework with various types of model drift, root causes and solutions. Later on…

Zestar75
1 vote, 0 answers
How to update Keras LSTM weights to avoid concept drift
I'm trying to update a Keras LSTM to avoid concept drift. For that, I'm following the approach proposed in this paper [1], in which they compute an anomaly score and use it to update the network weights. In the paper they use the L2 norm…

kevin