I am working on a word analysis project in which I track the relative frequency of a particular word over time in a corpus of film reviews. The corpus changes in size from year to year, so someone suggested that I run a weighted regression to account for this, since the variance of the word's relative frequency will be higher in years when the corpus is smaller.

I had assumed that a weighted regression was simply an ordinary regression with a weight attached to each observation, so that the relative frequency of the word in a given year is weighted by the size of the corpus in that year. When I looked it up online, though, weighted regression turned out to be a different beast entirely: apparently I need a standard deviation of Y for each year. Yet in this project I have only one observation per year, namely the total number of occurrences of the word divided by the total number of words in that year. What is there that can vary within a year?

Am I misunderstanding how weighted regression works? Or is weighted regression simply not suitable for my project?
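To make concrete what I originally had in mind, here is a minimal sketch of "a normal regression with a weight on each observation". It only illustrates my assumption that each year's weight is proportional to that year's corpus size; I am not claiming this is the correct way to do it. The function name `weighted_linear_fit` and all the numbers are made up for illustration and do not come from my actual corpus.

```python
def weighted_linear_fit(x, y, w):
    """Fit y = a + b*x by minimising sum(w_i * (y_i - a - b*x_i)**2).

    Returns (intercept, slope). Uses the closed-form solution for a
    single predictor, so no external libraries are needed.
    """
    total_w = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted covariance of (x, y) and weighted variance of x (unnormalised).
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up toy numbers, purely illustrative (NOT from my real corpus).
years = [2000, 2001, 2002, 2003, 2004]
word_counts = [12, 30, 45, 80, 150]                        # occurrences of the word
corpus_sizes = [10_000, 20_000, 40_000, 60_000, 100_000]   # total words per year

# One observation per year: occurrences divided by corpus size.
rel_freq = [c / n for c, n in zip(word_counts, corpus_sizes)]

# Assumption: weight each year by its corpus size.
intercept, slope = weighted_linear_fit(years, rel_freq, corpus_sizes)
print(f"estimated change in relative frequency per year: {slope:.3e}")
```

With equal weights this reduces to an ordinary least-squares fit, which is why I expected weighting to be a small modification and not something that requires a within-year standard deviation.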
I hope this is clear. This is really driving me nuts.