Questions tagged [aggregation]

Aggregation refers to "lumping together" potentially inhomogeneous groups of data. The laws of total expectation and variance can be thought of as providing a way to calculate the mean and variance of an aggregated data set, if the variable being conditioned on ($Y$ in the Wikipedia articles) is the grouping variable being aggregated over.

When data are aggregated, the resulting distribution is the marginal distribution: the group-specific (conditional) distributions averaged over the grouping variable.
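As a rough illustration of the laws of total expectation and variance mentioned above, the sketch below recovers the mean and variance of an aggregated data set from per-group summaries alone. The function name `aggregate_mean_var` and the toy groups are hypothetical, and the sketch assumes population (not sample) variances with groups weighted by their sizes.

```python
from statistics import fmean, pvariance


def aggregate_mean_var(means, variances, sizes):
    """Combine per-group means and population variances into aggregate values.

    Hypothetical helper for illustration; weights are group sizes / total size.
    """
    total = sum(sizes)
    weights = [n / total for n in sizes]
    # Law of total expectation: E[X] = E[E[X | Y]]
    overall_mean = sum(w * m for w, m in zip(weights, means))
    # Law of total variance: Var(X) = E[Var(X | Y)] + Var(E[X | Y])
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return overall_mean, within + between


# Toy groups (made-up numbers, for illustration only)
groups = [[1.0, 2.0, 3.0], [10.0, 12.0], [5.0, 5.0, 6.0, 8.0]]
agg_mean, agg_var = aggregate_mean_var(
    [fmean(g) for g in groups],
    [pvariance(g) for g in groups],
    [len(g) for g in groups],
)

# Direct computation on the pooled data, for comparison
pooled = [x for g in groups for x in g]
print(agg_mean, fmean(pooled))
print(agg_var, pvariance(pooled))
```

Under these assumptions the two printed pairs should agree up to floating-point error. With sample variances (dividing by $n-1$) the decomposition needs a correction and would not match exactly.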

241 questions
17 votes, 1 answer

Quantiles from the combination of normal distributions

I have information on the distributions of anthropometric dimensions (like shoulder span) for children of different ages. For each age and dimension, I have the mean and standard deviation. (I also have eight quantiles, but I don't think I'll be able to…
15 votes, 4 answers

How to aggregate by minute data for a week into hourly means?

How would you get hourly means for multiple data columns, for a daily period, and show results for twelve "Hosts" in the same graph? That is, I'd like to graph what a 24-hour period looks like, for a week's worth of data. The eventual goal would be…
Scott Hoffman
14 votes, 6 answers

Fast ways in R to get the first row of a data frame grouped by an identifier

Sometimes I need to get only the first row of a data set grouped by an identifier, as when retrieving age and gender when there are multiple observations per individual. What's a fast (or the fastest) way to do this in R? I used aggregate() below…
lockedoff
14 votes, 1 answer

How do you choose a unit of analysis (level of aggregation) in a time series?

If you can measure a time series of observations at any level of precision in time, and your goal of the study is to identify a relationship between X and Y, is there any empirical justification for choosing a specific level of aggregation over…
Andy W
12 votes, 2 answers

What statistics are preserved under aggregation?

If we have a long, high resolution time series, with lots of noise, it often makes sense to aggregate the data to a lower resolution (say, daily to monthly values) to get a better understanding of what's going on, effectively removing some of the…
naught101
12 votes, 2 answers

Should I run separate regressions for every community, or can community simply be a controlling variable in an aggregated model?

I am running an OLS model with a continuous asset index variable as the DV. My data is aggregated from three similar communities in close geographic proximity to one another. Despite this, I thought it important to use community as a controlling…
11 votes, 6 answers

How to find summary statistics for all unique combinations of factors in a data.frame in R?

I want to calculate a summary of a variable in a data.frame for each unique combination of factors in the data.frame. Should I use plyr to do this? I am OK with using loops as opposed to apply(); so just finding out each unique combination would be…
humble Student
10 votes, 1 answer

How to combine regression models?

Say I have three data sets of size $n$ each: $y_1$ = heights of people from the US only; $y_2$ = heights of men from the whole world; $y_3$ = heights of women from the whole world. And I build a linear model for each with factors $x_i$, $i = 1,...,…
10 votes, 1 answer

Random Forest Probabilistic Prediction vs majority vote

scikit-learn seems to use probabilistic prediction instead of majority vote as the model aggregation technique, without an explanation as to why (1.9.2.1. Random Forests). Is there a clear explanation for why? Further, is there a good paper or…
user1745038
8 votes, 3 answers

Intraclass correlation and aggregation

Imagine that: You have a sample of 1000 teams each with 10 members. You measured team functioning by asking each team member how well they think their team is functioning using a reliable multi-item numeric scale. You want to describe the extent to…
7 votes, 0 answers

Accuracy of aggregate vs. disaggregate forecasting

I've found a few interesting articles online on this topic, but none that appear to be too cut and dried. My question is about coming up with an accurate predictive forecast based on forecasting individual component parts, then adding them up (or whatever…
user45867
7 votes, 0 answers

What techniques are there available for averaging misaligned multivariate time series?

I want to get an average time series for a set of multivariate (2-3 coordinates) time series. My aim is finding the usual pattern of several processes. I researched the literature a bit and I only reached this paper that showed a DTW based approach…
Jon Nagra
7 votes, 2 answers

Aggregation of Correlations Coefficients (Spearman)

In an analysis of survey data, I have to deal with multilevel/three-dimensional data. Now, I need to aggregate correlation coefficients found on the individual level (between individual rank-orders) and then compare these coefficients. The original…
BurninLeo
7 votes, 2 answers

What is the terminology for data aggregated via summed totals versus data aggregated via means?

The two types of data differ in that, if you decide to decrease the temporal (time) resolution of the first type of data, you take the mean of the lower resolutions. With the second, you take the sum over the lower resolutions. Here is a concrete…
josh
6 votes, 1 answer

How to make a combination (aggregation) of quantile forecast?

Framework. Fix $\alpha\in ]0,1[$. Imagine you have $n$ $\alpha$-quantile forecast methodologies that give you, at time $t$ for look ahead time $t+h$, an estimation of the quantile of wind power. Formally, for $i=1,\dots,n$, you know how to produce…