Suppose I have two random variables Y and X, where each observation of Y is a single value while the corresponding X is given as a whole distribution. I am trying to predict Y based on X, but I cannot put the distribution of X into a column because it is not a single value. I could estimate one or more summary statistics of the distribution and use only those, but I lose too much information this way. Are there options for such cases that let me include more (ideally all) of the information about the distribution in a standard tabular form, so that it can be used in standard statistical modeling?
As an example, suppose I am trying to predict the weather tomorrow (Y), and my X is a distribution of possible values obtained through simulations. How could I include as much information as possible about X while keeping the number of columns to a minimum, so as to avoid high-dimensional data?
Imagine that a simulation is run each day and produces a fixed number of samples for X, let's say 1000, which together represent a probabilistic forecast of Y for tomorrow. I am able to run such a simulation because I partly understand the process that generates Y. The simulation results (the whole distribution of X) are important, since they carry a lot of information about the range, shape, modes, etc. of the possible outcomes, and I postulate that a better prediction of Y can be obtained using information about the whole distribution rather than just one point estimate.
Note1: the distributions of X are arbitrary and do not follow any standard parametric family.
Note2: the distributions change over time and are not stationary.
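To make the "summary statistics as columns" idea concrete, here is a minimal sketch of the kind of thing I have in mind: each day's 1000 simulated values are compressed into a small, fixed set of empirical quantiles, giving one row per day. The function name `quantile_features`, the chosen probability levels, and the synthetic bimodal data are all hypothetical and only stand in for my real simulation output; I am not claiming this is the right representation, only illustrating the table layout.

```python
import numpy as np

def quantile_features(samples, probs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Compress each day's simulated sample into a fixed set of quantiles.

    samples: array of shape (n_days, n_samples), one row per daily simulation.
    Returns an array of shape (n_days, len(probs)), one column per quantile.
    """
    samples = np.asarray(samples, dtype=float)
    # np.quantile returns shape (len(probs), n_days); transpose to one row per day
    return np.quantile(samples, probs, axis=1).T

rng = np.random.default_rng(0)
n_days, n_samples = 30, 1000

# Placeholder for real simulation output (an assumption for illustration):
# a two-mode mixture whose location drifts from day to day.
centers = np.linspace(0.0, 5.0, n_days)[:, None]
mode = rng.integers(0, 2, size=(n_days, n_samples))
X_samples = centers + np.where(mode == 0, -2.0, 2.0) + rng.normal(size=(n_days, n_samples))

features = quantile_features(X_samples)
print(features.shape)  # one row per day, one column per quantile level
```

Using more quantile levels keeps more of the shape at the cost of more columns, which is exactly the trade-off I am asking about.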