I'm looking at how temperature affects length. My length variable is the mean length calculated for each year, and it is derived from ~10,000 individual data points. Sampling effort was not the same in every year (e.g. 1998 n=300 vs 2001 n=2078).
A colleague suggested that, since my 20 length data points are in fact derived from ~10,000 data points and each year had a different sample size, I should consider using sample size as a weighting variable. I am not sure how best to implement this. I came across this post, which made sense and described exactly my case: "analyzing data in an aggregated form, such as the weight variable encodes how many original observations each row in the aggregated data represents". However, I am a bit confused, as the post goes on to use frequency as a weighting variable.
I am uncertain how to calculate a weighting variable for my data. And once it's calculated, can I use it in the weights argument of lm()?
Or does it make more sense to calculate a weighted arithmetic mean for each year, as in this post, and use that in place of the mean length I used before?
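To make the question concrete, here is a minimal sketch in R of what I understand the suggestion to be. It uses simulated data only: the data frames raw and agg, the column names (temp, mean_length, n) and every number are made up for illustration and are not my real data. It assumes temperature is a single value per year, and it uses the per-year sample size directly as the weights argument of lm().

```r
# Simulated stand-in for the real data (all values are hypothetical).
set.seed(1)
years        <- 1995:2014                       # 20 years
n_per_year   <- sample(300:2100, length(years), replace = TRUE)
temp_by_year <- rnorm(length(years), mean = 12, sd = 1.5)

# One row per measured individual; temperature is constant within a year.
raw <- data.frame(
  year = rep(years, times = n_per_year),
  temp = rep(temp_by_year, times = n_per_year)
)
raw$length <- 50 + 2 * raw$temp + rnorm(nrow(raw), sd = 8)

# Aggregated form: one row per year, with the mean length and the
# number of observations that mean is based on.
agg <- data.frame(
  year        = years,
  temp        = temp_by_year,
  n           = n_per_year,
  mean_length = as.vector(tapply(raw$length, raw$year, mean))
)

# Unweighted fit: every yearly mean counts equally.
fit_unweighted <- lm(mean_length ~ temp, data = agg)

# Weighted fit: each yearly mean is weighted by its sample size.
fit_weighted <- lm(mean_length ~ temp, data = agg, weights = n)

# Fit on the individual observations, for comparison.
fit_raw <- lm(length ~ temp, data = raw)

# Compare the coefficient estimates of the three fits.
print(rbind(unweighted = coef(fit_unweighted),
            weighted   = coef(fit_weighted),
            raw        = coef(fit_raw)))
```

Because the predictor here is constant within each year, the weighted fit on the 20 means gives the same coefficient estimates as the fit on the individual observations. The standard errors are not the same, though, since the aggregated fit only has 20 rows to estimate the residual variance from, so I am unsure which of these is the appropriate analysis.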