Questions tagged [data-augmentation]

Data augmentation is the practice of making slight modifications to the observed data with the goal of making models trained on that data more robust.

A common application is image recognition and computer vision, where the task is to recognize when photos contain objects (e.g. cats). There's no reason a photo of a cat must always be taken from the same angle, so it's common to train the model on random rotations, flips, translations, and crops/rescalings of the original images.
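
Some of the transformations described above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming images are NumPy arrays of shape H × W or H × W × C. The function name `augment` and its `pad` parameter are hypothetical. Rotations are restricted to multiples of 90 degrees and translation is done by zero-padding and cropping. Arbitrary-angle rotation and rescaling are omitted because they need interpolation, for example from SciPy or Pillow.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator, pad: int = 4) -> np.ndarray:
    """Return a randomly flipped, rotated and shifted copy of an H x W (x C) image.

    Hypothetical helper for illustration only: rotations are limited to
    multiples of 90 degrees (which swap H and W on non-square images), and
    translation is done by zero-padding and then cropping back to size.
    """
    out = image
    # Random horizontal flip (mirror left-right) with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Random rotation by 0, 90, 180 or 270 degrees in the image plane.
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k=k, axes=(0, 1))
    # Random translation: zero-pad `pad` pixels on each spatial side, then
    # crop a window of the (rotated) height/width at a random offset.
    h, w = out.shape[:2]
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (out.ndim - 2)
    padded = np.pad(out, pad_width, mode="constant")
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w].copy()

# Usage: draw a fresh random variant of the same image on every call.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))  # stand-in for a real photo
variants = [augment(img, rng) for _ in range(8)]
```

Because a new random transform is drawn on each call, the model sees a slightly different version of each image every epoch instead of a fixed, enlarged dataset.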

87 questions

2 answers

Data augmentation techniques for general datasets?

In many machine learning applications, so-called data augmentation methods have made it possible to build better models. For example, assume a training set of $100$ images of cats and dogs. By rotating, mirroring, adjusting contrast, etc. it is possible…

machine-learning predictive-models dataset independence data-augmentation

asked May 23 '15 at 11:52

mmh

6 answers

Can a GAN be used for data augmentation?

Can a generative adversarial network (GAN) be used for data augmentation (i.e. to generate synthetic examples that are added to a dataset)? Would it have any impact on the performance of a model trained on the augmented dataset?

machine-learning neural-networks gan data-augmentation

asked Apr 17 '18 at 19:39

ErroriSalvo

3 answers

What are the mathematically rigorous data augmentation techniques?

Imagine you have a dataset of 1000 observations. To keep things intuitive, imagine they are (x,y) coordinates. They are temporally independent, which makes things easier. You wish you had about a million observations, but you only have 1000. How should…

mathematical-statistics dataset data-augmentation

asked Mar 05 '20 at 04:26

Legit Stack

3 answers

Data Augmentation strategies for Time Series Forecasting

I'm considering two strategies for "data augmentation" in time-series forecasting. First, a little bit of background. A predictor $P$ to forecast the next step of a time-series $\lbrace A_i\rbrace$ is a function that typically depends on two…

time-series data-augmentation

asked Dec 30 '17 at 22:44

castarco

3 answers

How to do data augmentation and train-validate split?

I am doing image classification using machine learning. Suppose I have some training data (images) and will split the data into training and validation sets. And I also want to augment the data (produce new images from the original ones) by random…

machine-learning classification cross-validation dataset data-augmentation

asked Oct 05 '15 at 10:43

yangjie

3 answers

Data augmentation on training set only?

Is it common practice to apply data augmentation to the training set only, or to both the training and test sets?

machine-learning deep-learning regularization data-augmentation

asked Dec 29 '17 at 15:57

rodrigo-silveira

1 answer

Structure of Generative Adversarial Networks (GAN) for mapping a simulation model

There is a simulation model of a system that I want to map as a neural network to test if a better execution time can be achieved with similar accuracy. The simulation model receives real-valued measurement data of its environment and generates a…

machine-learning neural-networks generative-models data-augmentation

asked Dec 15 '19 at 11:06

Emma

1 answer

MCMC and data augmentation

I have been looking at an MCMC data augmentation question; the general form of the question is as follows: Suppose data gathered on a process suggests $X_{i} \sim \text{Pois}(\lambda)$ and a prior for the rate parameter is suggested as $\lambda \sim…

self-study markov-chain-montecarlo monte-carlo data-augmentation

asked Nov 13 '12 at 12:58

user9171

2 answers

Why is data augmentation classified as a type of regularization?

In deep learning papers, data augmentation is often presented as a type of regularization. For example, this is explored in the ICLR 2017 paper by Chiyuan Zhang and coauthors, "Understanding deep learning requires rethinking generalization." Why is…

neural-networks regularization data-augmentation

asked Jul 31 '17 at 11:59

Gilly

1 answer

Does oversampling lead to more overfitting than classweights for really small classes?

Assume I have a couple of thousand hens that I want to classify into those that never lay an egg and those that will lay an egg at some point in their life. Assume that already works perfectly. Now there are a few hens who do lay eggs, but at some…

machine-learning data-augmentation

asked Jun 18 '21 at 12:39

BigBadWolf

0 answers

Does EM algorithm require us to know the joint (predictive) distribution of the latent variables $Z$ when $Z$ is two-dimensional?

In its general form, the E-step of the EM algorithm finds the expectation $$ Q(\theta|\theta') =\int \log[ p(Y,Z | \theta)] p(Z|Y,\theta') d Z$$ where $Y$ is the data, $Z$ the latent variables, $\theta'$ the current parameters, and $l(\theta|Y,Z) =…

maximum-likelihood optimization expectation-maximization data-augmentation

asked Feb 21 '20 at 19:06

tomka

1 answer

What are some techniques to augment tabular data?

As we know, we can perform data augmentation on image datasets: we can apply random rotations, shifts, shear, and flips to images. Are there techniques to augment a small tabular dataset? I know there are sampling (oversampling, undersampling)…

sampling data-transformation dataset data-augmentation

asked Jan 25 '19 at 06:35

Abdullah Al Imran

3 answers

Data augmentation step in Krizhevsky et al. paper

In the paper by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012, section 4.1, the authors describe their data …

machine-learning neural-networks conv-neural-network image-processing data-augmentation

asked Oct 22 '15 at 14:49

apples-oranges

1 answer

In a parametric model, if I do not have enough data, can I estimate the parameter, and simulate data from the estimated model and estimate again?

Suppose I have a logistic regression model $Y_i=\mathbf{1}(X_i\beta>\epsilon_i)$ to estimate, where the distribution of $\epsilon_i$ is known, $X_i$ follows distribution $F_{\theta}$ with an unknown scalar parameter $\theta$. Suppose I only have 40…

logistic binary-data data-augmentation

asked Jul 22 '20 at 06:30

T34driver

1 answer

Does online data augmentation make sense?

Data augmentation is commonly done online, since that is how it is typically implemented and recommended in neural network frameworks like Keras and TensorFlow. I have also seen it described in, e.g., the AlexNet paper. Online data augmentation implies…

neural-networks gradient-descent data-augmentation

asked Mar 25 '19 at 15:32

fabiomaia
