I have come across a usage of the t-statistic that I don't understand. It comes from a government panel. It is work related, and I can't provide the original reference, but I do want to try to understand better what is happening in the calculation.
Basically, we are trying to predict a condition using observations from a variety of sites. In January (for example) we have 30 years of temperature measurements from 8 locations within a valley. These are run through a principal components regression (PCR) to produce coefficients for an equation that predicts moisture in April. We also have additional years of moisture records without accompanying temperature readings.
The official forecast next year will then be determined by getting the new measurements on January 1, feeding them into the equation, and adding up all of the sub-products. The mean forecast in the training period is within a couple of percent of the mean observed value over the full 80-year data set. We do not necessarily expect the moisture to be normally distributed, nor the temperature readings. (These are not the actual natural phenomena in question.)
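To make the forecast step concrete, here is a minimal sketch of how I understand it, not the panel's actual code. The data are synthetic stand-ins, and the names `pcr_fit` and `forecast`, the choice of 2 retained components, and the use of NumPy's SVD are all my own assumptions for illustration.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal components regression keeping the first k components.

    Returns an intercept and one coefficient per original variable.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                          # center the predictors
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                             # (n_sites, k) loadings
    scores = Xc @ V                          # (n_years, k) component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                         # back to the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def forecast(intercept, beta, temps):
    """Intercept plus the sum of coefficient * measurement sub-products."""
    return intercept + float(np.dot(beta, temps))

# Synthetic stand-in data: 30 years, 8 sites sharing one valley-wide signal.
rng = np.random.default_rng(0)
valley = rng.normal(size=(30, 1))
X = 2.0 + 3.0 * valley + rng.normal(scale=0.5, size=(30, 8))
y = 50.0 - 4.0 * valley[:, 0] + rng.normal(scale=2.0, size=30)

intercept, beta = pcr_fit(X, y, k=2)         # k=2 is an arbitrary choice here
new_temps = X[-1]                            # pretend these are this year's readings
print(forecast(intercept, beta, new_temps))
```

Because the predictors are centered before fitting, the mean forecast over the training years matches the mean observed value of those same years exactly in this sketch.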
I am clear this far.
There is also a desire to have a confidence interval: We are 95% sure that the soil moisture will be less than or equal to value $Moisture_{95\%} $.
$Moisture_{95\%} $ is calculated as follows:
30 years of record went into generating the forecast equation, so there are 29 degrees of freedom. $df = 29$
As part of the PCR process, the cross-validated standard error for the forecast equation was generated. The $CVSE$ is 15-20% of the mean forecast.
We look up 29 and 0.05 in a table of t-statistics: $t_{29, 0.05} = 1.699$
We put in this year's temperature measurements to get $Moisture_{fcst}$
These are then combined as $Moisture_{95\%} = Moisture_{fcst} + CVSE \cdot t_{29, 0.05}$
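For reference, the steps above can be sketched in a few lines of standard-library Python. This is only my reading of the procedure: the forecast value and the CVSE below are made-up numbers (I picked a CVSE of 17.5% of the forecast, inside the 15-20% range mentioned), and the function names are hypothetical. The t quantile is computed by numerically integrating the t density rather than by table lookup, and the normal quantile is printed alongside for comparison.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, n=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / n
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Invert the CDF by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def upper_bound(moisture_fcst, cvse, df=29, level=0.95):
    """One-sided bound: forecast + CVSE * t quantile, as in the steps above."""
    return moisture_fcst + cvse * t_quantile(level, df)

t_crit = t_quantile(0.95, 29)          # should agree with the table value 1.699
z_crit = NormalDist().inv_cdf(0.95)    # normal quantile, for comparison

moisture_fcst = 100.0                  # hypothetical forecast
cvse = 0.175 * moisture_fcst           # hypothetical CVSE
print(round(t_crit, 3), round(z_crit, 3))
print(upper_bound(moisture_fcst, cvse))
```

With 29 degrees of freedom the t quantile is only slightly larger than the normal one, so the choice of df changes the bound much less than the choice of CVSE does.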
This whole approach almost makes sense to me, but there are two points I am unclear on:
Why is the df on the t-stat lookup equal to 29 (number of years in the training period-1)? I had expected that the df would be related to the # of principal components in the regression.
If $t_{\beta_{0}} = \frac{\beta-\beta_{0}}{\mathrm{s.e.}(\beta)}$ (Wikipedia), why is it OK to use the CVSE instead of the s.e., and to treat $\beta_{0}$ as a "non-random, known constant" although it is actually the output of a forecast?