Questions tagged [data-augmentation]

Data augmentation is the practice of making slight modifications to the observed data with the goal of making models trained on that data more robust.

A common application is image recognition. The task is to recognize when photos contain particular objects (e.g. cats). There is no reason that a photo of a cat must always be taken from the same angle, so it is common to train the model on random rotations, flips, translations, and crops or rescalings of the original image.
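
As a rough illustration of the transformations described above, here is a minimal sketch using only the Python standard library. The function name `random_augment`, the toy 6x6 "photo", and the crop size are assumptions made for the example. Rotation is restricted to multiples of 90 degrees, and the random crop stands in for a small translation. Real pipelines would normally use an image library's own routines.

```python
import random

def random_augment(image, crop_size, rng):
    """Return a randomly flipped, rotated, and cropped copy of `image`.

    `image` is a list of rows, each row a list of pixel values.
    """
    out = [list(row) for row in image]
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]
    # Rotate clockwise by a random multiple of 90 degrees
    # (a simplification of arbitrary-angle rotation).
    for _ in range(rng.randrange(4)):
        out = [list(col) for col in zip(*out[::-1])]
    # Random crop: pick the top-left corner so the window fits in the image.
    height, width = len(out), len(out[0])
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in out[top:top + crop_size]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[10 * r + c for c in range(6)] for r in range(6)]  # toy 6x6 "photo"
augmented = [random_augment(image, 4, rng) for _ in range(8)]
```

Each call draws fresh random choices, so the eight entries in `augmented` are different 4x4 views of the same source image, while `image` itself is left unchanged.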

87 questions
23 votes, 2 answers

Data augmentation techniques for general datasets?

In many machine learning applications, so-called data augmentation methods have made it possible to build better models. For example, assume a training set of $100$ images of cats and dogs. By rotating, mirroring, adjusting contrast, etc. it is possible…
21 votes, 6 answers

Can a GAN be used for data augmentation?

Can a generative adversarial network (GAN) be used for data augmentation (i.e. to generate synthetic examples that are added to a dataset)? Would it have any impact on the performance of a model trained on the augmented dataset?
ErroriSalvo
20 votes, 3 answers

What are the mathematically rigorous data augmentation techniques?

Imagine you have a dataset of 1000 observations. To keep things intuitive, imagine they are (x,y) coordinates. They are temporally independent, which makes things easier. You wish you had about a million observations, but you only have 1000. How should…
19 votes, 3 answers

Data Augmentation strategies for Time Series Forecasting

I'm considering two strategies to do "data augmentation" on time-series forecasting. First, a little bit of background. A predictor $P$ to forecast the next step of a time-series $\lbrace A_i\rbrace$ is a function that typically depends on two…
castarco
19 votes, 3 answers

How to do data augmentation and train-validate split?

I am doing image classification using machine learning. Suppose I have some training data (images) and will split the data into training and validation sets. And I also want to augment the data (produce new images from the original ones) by random…
18 votes, 3 answers

Data augmentation on training set only?

Is it common practice to apply data augmentation to training set only, or to both training and test sets?
8 votes, 1 answer

Structure of Generative Adversarial Networks (GAN) for mapping a simulation model

There is a simulation model of a system that I want to map as a neural network to test if a better execution time can be achieved with similar accuracy. The simulation model receives real-valued measurement data of its environment and generates a…
8 votes, 1 answer

MCMC and data augmentation

I have been looking at an MCMC data augmentation question; the general form of the question is as follows: Suppose data gathered on a process suggests $X_{i} \sim \text{Pois}(\lambda)$ and a prior for the rate parameter is suggested as $\lambda \sim…
8 votes, 2 answers

Why is data augmentation classified as a type of regularization?

In deep learning papers, data augmentation is often presented as a type of regularization. For example, this is explored in Chiyuan Zhang and coauthors' presentation at ICLR 2017, Understanding deep learning requires rethinking generalization. Why is…
Gilly
6 votes, 1 answer

Does oversampling lead to more overfitting than class weights for really small classes?

Assume I have a couple of thousand hens that I want to classify into those that never lay an egg and those that will at some point in their life lay an egg. Assume that already works perfectly. Now there are a few hens who do lay eggs, but at some…
BigBadWolf
6 votes, 0 answers

Does the EM algorithm require us to know the joint (predictive) distribution of the latent variables $Z$ when $Z$ is two-dimensional?

In its general form, the E-step of the EM algorithm finds the expectation $$ Q(\theta|\theta') =\int \log[ p(Y,Z | \theta)] p(Z|Y,\theta') d Z$$ where $Y$ is the data, $Z$ the latent variables, $\theta'$ the current parameters, and $l(\theta|Y,Z) =…
6 votes, 1 answer

What are some techniques to augment tabular data?

As we know, we can perform data augmentation on image datasets. We can apply random rotations, shifts, shears and flips to images. Are there techniques to augment a small tabular dataset? I know there are sampling (oversampling, undersampling)…
6 votes, 3 answers

Data augmentation step in Krizhevsky et al. paper

In the paper Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012., section 4.1, the authors describe their data …
5 votes, 1 answer

In a parametric model, if I do not have enough data, can I estimate the parameter, and simulate data from the estimated model and estimate again?

Suppose I have a logistic regression model $Y_i=\mathbf{1}(X_i\beta>\epsilon_i)$ to estimate, where the distribution of $\epsilon_i$ is known, $X_i$ follows distribution $F_{\theta}$ with an unknown scalar parameter $\theta$. Suppose I only have 40…
T34driver
5 votes, 1 answer

Does online data augmentation make sense?

Data augmentation is popularly done online as that is how it is typically implemented and suggested in neural network frameworks like Keras and TensorFlow. I have also seen it described in e.g. the AlexNet paper. Online data augmentation implies…
fabiomaia