
I have data sets of the returns of two indexes in the same market (each index is made up of a different set of stocks), with 496 observations for each. I want to test whether the means are statistically different. I believe the variances are different, so I think I have to check whether the variances are statistically different first. How would I do these things?

MånsT
James
  • This question didn't completely make sense to me. Please make sure it still asks what you want to know. Can you also provide more detail about your data, what you want to learn from it, and therefore what you want to know from us? Are you wondering how to do a t-test, whether t-tests are valid if the variances differ, how to adapt a t-test if the variances differ, or something else? – gung - Reinstate Monica Jul 31 '12 at 02:02
  • Do you need to know how to run a t-test in your software? (What software are you using?) Are you wondering what a t-test is / how it works? – gung - Reinstate Monica Jul 31 '12 at 03:14
  • If two data sets of returns have different variances and different means, is it valid to use a t-test, and if so, how? I'm using Excel. – James Jul 31 '12 at 03:18
  • My data are 496 observations of two sets of returns; the sets represent two indexes. I got the descriptive statistics, and the two have different means and variances. I want to know whether the difference between the means is statistically significant. Or should I check whether the difference in Sharpe ratios is statistically significant? The bottom line is that I want to check whether the outperformance is statistically significant. – James Jul 31 '12 at 03:41

2 Answers


The t test is primarily employed for paired data (e.g. N independent pairs of readings before and after some activity) or for N independent readings on two characteristics (indices). You don't have independent observations, since you have time series data that are most probably auto-correlated (within structure). The cross-correlation coefficient (among structure) also requires independent (within structure) draws, as there is a requirement for joint normality, which requires statistical independence of the draws. Again, time series data are by their very nature not usually independent. Please see G. Udny Yule's "Why Do We Sometimes Get Nonsense Correlations between Time-series?" (1926), an investigation of a form of spurious correlation, at http://en.wikipedia.org/wiki/Udny_Yule and http://empslocal.ex.ac.uk/people/staff/dbs202/cat/stats/corr.html for more. The best way to determine the relationship between two time series is to review "How to identify transfer functions in a time series regression forecasting model?".
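As a rough illustration of the independence concern, one might check a return series for serial correlation before trusting a t test. The sketch below is not from the answer itself: the helper names `autocorr` and `ljung_box_q`, the simulated AR(1) series, and the choice of 10 lags are all assumptions made for the example. It uses only the Python standard library.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at a given lag (lag >= 1)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag.

    Under the null of no autocorrelation, Q is approximately
    chi-square distributed with max_lag degrees of freedom.
    """
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# Hypothetical data: an AR(1) series with 496 observations (not real returns).
random.seed(0)
phi = 0.6
ar = [0.0]
for _ in range(495):
    ar.append(phi * ar[-1] + random.gauss(0, 1))

# Approximate 95% band for a single autocorrelation of white noise.
band = 1.96 / math.sqrt(len(ar))

print("lag-1 autocorrelation:", round(autocorr(ar, 1), 3))
print("white-noise band: +/-", round(band, 3))
print("Ljung-Box Q (10 lags):", round(ljung_box_q(ar, 10), 1))
# The 5% chi-square critical value with 10 degrees of freedom is about 18.31;
# a Q well above that suggests the observations are not independent.
```

In Excel, the same lag-1 check could be approximated with `CORREL` applied to the series and a copy of it shifted by one row.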

IrishStat
  • What do you mean by "auto-correlated (within structure)"? Why would two time series be correlated if they measure, for example, temperature in chamber 1 and temperature in chamber 2? – Herman Toothrot Sep 20 '17 at 09:36
  • Yes, within structure. If you had two series that you knew were unrelated, you might still want to test the "equivalence" of the two ARIMA models via the Chow test. – IrishStat Sep 20 '17 at 10:54
  • I have read the material you pointed to, but I am still not sure how to proceed. Here is my case: https://stats.stackexchange.com/questions/304072/compare-time-series-of-measured-properties-to-control-no-forecasting – Herman Toothrot Sep 21 '17 at 14:07
  • Please post the data in a csv file to your latest question. – IrishStat Sep 21 '17 at 15:07

IrishStat makes a good point. If your data are two time series, you have to take correlation into account when comparing means. But time series modeling is even more important: if the series are nonstationary because of a time-changing mean, it may not make sense to just compare the averages over the length of the series. How the indices change with time is more likely to be what you are interested in.
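One way to take serial correlation into account, assuming both series are stationary and observed on the same dates, is to test the mean of the difference series with a Newey-West (HAC) standard error. This is a sketch of a technique the answer does not name: the function `hac_t_stat`, the simulated returns, and the choice of 5 lags are all assumptions for illustration, and it does nothing to address a time-changing mean.

```python
import math
import random

def hac_t_stat(d, max_lag):
    """t-statistic for H0: mean(d) = 0 using a Newey-West (Bartlett kernel)
    estimate of the long-run variance. With max_lag = 0 this reduces to the
    ordinary t-statistic computed with a 1/n variance."""
    n = len(d)
    mean = sum(d) / n
    dev = [v - mean for v in d]

    def gamma(k):
        # Sample autocovariance at lag k (divisor n).
        return sum(dev[t] * dev[t - k] for t in range(k, n)) / n

    long_run_var = gamma(0)
    for k in range(1, max_lag + 1):
        weight = 1.0 - k / (max_lag + 1)  # Bartlett weights
        long_run_var += 2.0 * weight * gamma(k)

    std_err = math.sqrt(long_run_var / n)
    return mean / std_err

# Hypothetical daily returns for two indexes (simulated, not real data).
random.seed(1)
index_a = [random.gauss(0.0005, 0.010) for _ in range(496)]
index_b = [random.gauss(0.0003, 0.015) for _ in range(496)]

# Work on the period-by-period difference, since the dates are shared.
diff = [a - b for a, b in zip(index_a, index_b)]
t_stat = hac_t_stat(diff, max_lag=5)  # 5 lags is an assumed choice
print("HAC t-statistic for mean difference:", round(t_stat, 3))
```

Comparing `t_stat` with a standard normal critical value (about 1.96 at the 5% level) gives an approximate test; the result depends on the lag choice, so it is worth trying a few values.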

Michael R. Chernick
  • As you pointed out, we are often interested in testing WHEN the means differ or how the indices change over time. If two time series have the "same" mean AND at some point in time the means "differ", this phenomenon would be detected via Intervention Detection procedures as a level/step shift OR intercept change. A level/step shift variable is a sequence of zeroes followed by a sequence of ones. – IrishStat Jul 31 '12 at 13:28
  • @IrishStat Very true. My other point is that the Box-Jenkins procedure, which looks for a very slowly decaying autocorrelation function as an indication of a trend in the mean, identifies polynomial trends through first differencing or repeated first differencing. – Michael R. Chernick Jul 31 '12 at 13:38
  • OR a series that has a step/level shift. This caveat was ignored by Box and Jenkins, as they assumed that the underlying/observed series was free of unspecified deterministic structure such as Pulses, Level Shifts, Seasonal Pulses and/or Local Time Trends. They handled time trends using a suitably differenced model with a steady-state constant while implicitly ignoring time trends of the form 0,0,0,0,1,2,3,4,5 as an unspecified predictor series. Note that a step is the difference of a time trend while a pulse is the difference of a step. Conversely, a time trend is the first sum of a step. – IrishStat Jul 31 '12 at 14:32
  • @IrishStat To be fair, Box and Tiao did a lot of work on intervention analysis, and didn't it at least get some coverage in the later editions of the Box-Jenkins book? – Michael R. Chernick Jul 31 '12 at 16:07
  • You are quite correct. I should have said the seminal work was at fault; starting in 1979, following the work of G.C. Tiao and I. Chang, model identification became robustified (see I. Chang, "ARIMA model specification in the presence of outliers"). – IrishStat Jul 31 '12 at 16:20
  • More formally: Chang, I., and Tiao, G.C. (1983). "Estimation of Time Series Parameters in the Presence of Outliers," Technical Report #8, Statistics Research Center, Graduate School of Business, University of Chicago, Chicago. – IrishStat Jul 31 '12 at 18:58
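The relationships among pulse, step, and time-trend variables described in the comments above can be checked with a few lines of code. This is only an illustrative sketch: the series length, the shift position, and the helper name `first_diff` are assumptions, and the first difference treats the value before the series starts as zero so that lengths match.

```python
from itertools import accumulate

n, shift_at = 9, 4  # assumed length and position of the level shift

# Level/step shift: a sequence of zeroes followed by a sequence of ones.
step = [0] * shift_at + [1] * (n - shift_at)

# A time trend is the running (first) sum of a step.
trend = list(accumulate(step))

def first_diff(x):
    """First difference, treating the value before the series as 0."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

# A pulse is the difference of a step; a step is the difference of a trend.
pulse = first_diff(step)

print("step :", step)
print("trend:", trend)
print("pulse:", pulse)
print("diff(trend) == step:", first_diff(trend) == step)
```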