Questions tagged [down-sample]

Using aggregate data (e.g. monthly) when data on a finer scale (e.g. daily) is available.

In digital signal processing, downsampling is the process of resampling in a multi-rate digital signal processing system. Downsampling can be synonymous with compression or describe an entire process of bandwidth reduction (filtering) and sample-rate reduction. When the process is performed on a sequence of samples of a signal or other continuous function, it produces an approximation of the sequence that would have been obtained by sampling the signal at a lower rate (or density, as in the case of a photograph).

Based on the article "Downsampling (signal processing)" in Wikipedia.
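As a rough illustration of the two senses above (aggregating to a coarser scale, and filtering followed by sample-rate reduction), here is a minimal sketch using NumPy. The function names `downsample_mean` and `downsample_naive` are hypothetical and chosen only for this example. Block averaging is just one simple choice of low-pass filter, not the method used in any particular question below.

```python
import numpy as np

def downsample_mean(x, q):
    """Average consecutive blocks of q samples (a boxcar filter, then one value per block)."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // q  # an incomplete trailing block, if any, is dropped
    return x[: n_blocks * q].reshape(n_blocks, q).mean(axis=1)

def downsample_naive(x, q):
    """Keep every q-th sample with no filtering; high-frequency content may alias."""
    return np.asarray(x, dtype=float)[::q]

x = np.arange(12, dtype=float)        # stand-in for a finely sampled series
block_means = downsample_mean(x, 4)   # one averaged value per block of 4 samples
every_fourth = downsample_naive(x, 4) # samples 0, 4, 8 only
```

Averaging before discarding samples trades some resolution for protection against aliasing, whereas plain subsampling keeps the original values but can misrepresent fast variation.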

23 questions
4
votes
1 answer

Importance of a Data Point in Regression

I am doing a Gaussian process regression. This regression doesn't scale well, as its cost grows as $\mathcal{O}(n^3)$. I would like to know if there are any methods that can be used to determine the importance of a data point and its contribution to the…
user0193
4
votes
1 answer

Convert predicted probabilities after downsampling to actual probabilities in classification

If I use undersampling in case of an unbalanced binary target variable to train a model, the prediction method calculates probabilities under the assumption of a balanced data set. I discovered two formulas to convert these probabilities to actual…
tover
3
votes
1 answer

Correcting for extremely downsampled data: keras class_weight is hurting my model

I have an extremely imbalanced dataset (millions of times more negatives) for a binary classification NN model. I am aggressively downsampling solely for the purpose of making training time manageable (not to be confused with downsampling in order…
2
votes
0 answers

Numerical differentiation (derivative) and downsampling

I have some time course data for which I would like to obtain the first derivative. As it seems quite difficult to model, I do not intend to fit a function to it, but rather compute the first derivative numerically (taking the difference of each measure…
TheChymera
2
votes
0 answers

How to use cross validation when you have missing data & rare events?

I am trying to use repeated cross validation to test my classifier. Moreover, I want to use imputation due to missing values and downsampling due to unbalanced data (I have 88% of my data in the positive class and 12% in the negative class). My…
1
vote
0 answers

Forecasting monthly stock returns with daily data and down-sampling concerns

I have daily stock return data (log returns). I want to forecast returns for the next two months. I am creating forecasts with both univariate ARIMA and GARCH models with regressors. What are the dangers, and what assumptions am I making, if I downsample the…
1
vote
0 answers

While dealing with imbalanced classes, to what extent can we upsample a minority class?

I have training data with the following approximate distribution: negative events: 90,000; positive events: 5,000. Training a model would require oversampling the minority class (and might also require undersampling the majority class) as the…
1
vote
1 answer

Does downsampling decrease the entropy of the data?

Suppose we have an $n$-dimensional time series $X=\{x_1, x_2, \cdots, x_n\}$ and we resample it to $m$ dimensions, $\hat{X}=\{\hat{x}_1, \hat{x}_2, \cdots, \hat{x}_m\}$, where $m < n$. Can we say this downsampling operation always decreases the entropy, $H$, of the…
moh
1
vote
0 answers

Should I resample/downsample the patients in the control arms?

I have retrospectively collected clinical data on two sets of patients, one set with a diagnosis of tumor A (group A) and the other with tumor B (group B). There are 90 patients in group A and 1100 patients in group B (as disease A is…
1
vote
1 answer

Is Tomek Link undersampling the same as Edited Nearest Neighbours with 1 neighbour?

From what I've read I've understood that undersampling the majority class with Tomek Links or Edited Nearest Neighbours with 1 neighbour should yield the same result. However, I've tried it on this library I've been working with called…
1
vote
1 answer

Kappa and downsampling, selection of data set

I have an unbalanced data set and use Cohen's kappa and AUC as performance measures. Without downsampling the kappa value is around 0.85, with random downsampling it is 0.95, and with a homemade focused downsampling it is approximately 0.75. Which data…
Matthias
1
vote
0 answers

Does downsampling affect regression results?

How is linear regression affected by downsampling the explanatory variable? To be more precise, I would sort all the values of $x$, and then split them into a number of bins with an equal number of points in each bin (note that each bin may have a different…
max
1
vote
0 answers

Bias from stratified sampling

Due to a lack of significance and the large size of the dataset (which had binomial responses with 20,000 responses out of a sample of 15,000,000) my peer has used random sampling to reduce the amount of data and import into our modelling…
1
vote
0 answers

How to determine the correlation between data sets with the same period but different sample rates?

I am trying to determine the correlation between two sets of data points which span the same time period (20 minutes) but have different resolutions. The first set was recorded at 1-minute intervals, while the second was recorded at 2-second…
0
votes
0 answers

Downsampling by sample, not by class, with lots of missing values

In R, I'm trying to downsample to the lowest number of samples (a ratio would work as well) by class. All I see for downsampling is evening out class frequency, not sample frequency/sample rate. For example, I have a matrix with 2 classes, C1 and C2,…
John H