A lot of the datasets presented to us at the company where I'm currently an intern are very large (many millions of rows, and gigabytes or even terabytes of data).
While running machine learning experiments, I often want to use cross-validated grid search to optimize the hyper-parameters of the models I train. On datasets of the size described above, this is very costly in terms of time.
This led me to wonder: would it be a valid approach to take a smaller random (or perhaps stratified?) subsample of the dataset for hyper-parameter tuning, and then use the resulting parameters to train the final model on a large portion of the dataset, or even the dataset as a whole? Something along the lines of the sketch below.
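
To make the idea concrete, here is a minimal sketch of the workflow I have in mind, using scikit-learn. The synthetic data, the choice of `RandomForestClassifier`, the 5% subsample size, and the parameter grid are all placeholders for illustration, not my actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the large dataset (placeholder only).
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)

# Take a small stratified subsample (here 5% of the rows) just for tuning.
X_tune, _, y_tune, _ = train_test_split(
    X, y, train_size=0.05, stratify=y, random_state=0
)

# Cross-validated grid search on the subsample only.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 30]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, n_jobs=-1
)
search.fit(X_tune, y_tune)

# Re-fit the final model on the full dataset with the tuned hyper-parameters.
final_model = RandomForestClassifier(random_state=0, **search.best_params_)
final_model.fit(X, y)
```

Is this a sound strategy, or are there pitfalls (e.g. hyper-parameters that don't transfer well from a small subsample to the full dataset) that I should be aware of?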