I have collected data on 5 things over 3 decades, and I want to see whether these five things changed over that period. For example, suppose A was 20 out of 80 in decade 1, 40 out of 120 in decade 2 and 50 out of 70 in decade 3. Now I want to check whether the change in A over the three decades was statistically significant. Do I use chi-square, ANOVA or anything else? I have already made trend analysis diagrams and equations, but I don't have a statistical test! :(
-
You might want to clarify: you have five items, with three (count) measurements taken over time? How are these five items related, if at all? (E.g., are they mutually exclusive and collectively exhaustive?) – Wayne Mar 20 '13 at 14:17
-
@Wayne: these are all types of news items, for instance the number of political, cultural, etc. news items. So I have 5 categories of different types of news and their values for 3 decades. – user22089 Mar 20 '13 at 17:35
-
So the types of news are exclusive, it sounds like. That is, no news article is both political and cultural, correct? But the types are not collectively exhaustive: there are other types of news that you don't count. And you have one count for each decade -- three counts total -- for each news type, right? – Wayne Mar 20 '13 at 18:15
-
@Wayne: yes, that's right. The types of news are exclusive. They are also collectively exhaustive, because every news item has been put into one of the 5 types depending on its dominant theme. So they are mutually exclusive and collectively exhaustive. And yes, I have three counts total for each news type. Now what statistical test do I use to study whether there is a significant change in each type of news over the three decades? – user22089 Mar 20 '13 at 18:44
1 Answer
I believe you'd use a Pearson's chi-squared contingency table test (of independence) on your 3x5 table. In R, chisq.test does what you want (put the data into a table first).
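As a rough illustration, here is a minimal stdlib-only Python sketch of the Pearson test of independence. The full 3x5 table isn't given in the question, so this assumes a 2x3 table built from the Type A example (Type A vs. not Type A, by decade); the function name is hypothetical. The p-value uses the closed-form chi-squared tail, which exists in this simple form only for an even number of degrees of freedom (that covers a 2x3 table, df = 2, and a 3x5 table, df = 8). In Python, scipy.stats.chi2_contingency does the same job.

```python
import math

def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value). Only even df is supported,
    because the p-value uses the closed-form chi-squared survival function.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column are independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2:
        raise ValueError("this sketch only handles even degrees of freedom")
    # Survival function for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    half = stat / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, df, p

# Rows: Type A, not Type A; columns: decades 1-3 (counts from the question).
table = [[20, 40, 50],
         [80 - 20, 120 - 40, 70 - 50]]
stat, df, p = chi2_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.2e}")
```

A significant result here only says the share of Type A is not the same in every decade; it says nothing about direction or trend.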
I suspect that the collectively exhaustive part might affect the answer, though. Perhaps now that we know what you have, someone more knowledgeable than I can chime in.
EDIT 3: Upon clarification, it sounds like you want to look at the percentage of Type A stories over the three decades. This kind of regression is discussed in another thread, and the bottom line is to fit a binomial GLM with a probit link function.
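A minimal sketch of that binomial GLM with a probit link, fitted by Fisher scoring (iteratively reweighted least squares) in plain Python. It assumes decades are coded 1, 2, 3 and uses only the Type A counts from the question; the function names are hypothetical. In R, the equivalent would be something like glm(cbind(a, n - a) ~ decade, family = binomial(link = "probit")). The quantity of interest is the Wald z (and its p-value) for the slope.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def probit_glm(x, successes, trials, max_iter=50, tol=1e-10):
    """Fit P(success) = Phi(b0 + b1*x) to grouped binomial data by Fisher scoring.

    Returns (b0, b1, se_b1), with se_b1 from the inverse Fisher information.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Accumulate X'WX (s00, s01, s11) and X'Wz (t0, t1) for the weighted LS step.
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            eta = b0 + b1 * xi
            mu, dmu = norm_cdf(eta), norm_pdf(eta)
            w = ni * dmu ** 2 / (mu * (1 - mu))   # IRLS working weight
            z = eta + (yi / ni - mu) / dmu         # IRLS working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 ** 2
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1, math.sqrt(s00 / det)

# Type A: 20/80, 40/120, 50/70, with decades coded 1, 2, 3.
b0, b1, se_b1 = probit_glm([1, 2, 3], [20, 40, 50], [80, 120, 70])
z = b1 / se_b1
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p-value
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, z = {z:.2f}, p = {p:.1e}")
```

A logit link (ordinary logistic regression) is an equally common choice and usually leads to the same conclusion; the probit link is used here only to match the answer.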
EDIT 2: I don't know which tests are more powerful than others, but if you did the independence test above and cannot reject independence, that's a sign that there is no significant trend. If you can reject independence, then the number of articles by type does change significantly through the decades, though that doesn't mean there is a trend to the changes.
Also, you say that you want to look at type A and see whether it has trended over the decades. Do you actually want to work with just the count of A, or with the percentage of A-type stories? That is, since your types cover all possible news, I'd assume all of the types would increase over time. The question might be: as a percentage of all news, is type A significantly less now than it was 30 years ago?
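To make the count-versus-percentage distinction concrete with the question's numbers: the count of Type A doubles from decade 1 to decade 2, but its share of all news rises much less, and in decade 3 the count grows only a little while the share jumps because total news fell. A trivial sketch:

```python
# Type A counts and total news items per decade, from the question's example.
type_a = [20, 40, 50]
totals = [80, 120, 70]

shares = [a / n for a, n in zip(type_a, totals)]
for decade, (count, share) in enumerate(zip(type_a, shares), start=1):
    print(f"decade {decade}: count = {count}, share = {share:.1%}")
```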
Last, do you need statistics at all? If you look at stories of type A and see 30, 70, 80, you can say that they have increased over time. Period. You only need statistical tests when uncertainty is involved.
EDIT: Rejecting independence, as described above, would only indicate that the counts by type are not independent of the decade. To actually show a trend, you'd use a linear regression. In the case of count data, I believe you'd want to use a generalized linear model (glm in R) with the Poisson family, whose default link is the log link. Look for the statistical significance (or not) of the slope coefficient.
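Here is a minimal sketch of that Poisson GLM (log link), fitted by Newton-Raphson in plain Python. It assumes the decades are coded 1, 2, 3 and uses the Type A counts from the question; the function name is hypothetical. In R this would be glm(count ~ decade, family = poisson).

```python
import math

def poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Fit log E[y] = b0 + b1*x by Newton-Raphson; returns (b0, b1, se_b1).

    With the Poisson family's canonical log link, Newton-Raphson and Fisher
    scoring (IRLS) coincide.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (g0, g1) and Fisher information matrix (i00, i01, i11).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 ** 2
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1, math.sqrt(i00 / det)

# Type A counts by decade (decades coded 1, 2, 3).
b0, b1, se_b1 = poisson_glm([1, 2, 3], [20, 40, 50])
z = b1 / se_b1
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p-value
print(f"slope (log scale) = {b1:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The slope is on the log scale, so exp(b1) (about 1.52 here) is the fitted multiplicative change in the count per decade. Because the decade totals differ (80, 120, 70), a raw-count model mixes "more A" with "more news overall"; a common adjustment is to add log(total) as an offset, which turns it into a model for the rate of A per news item.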
Though with only 3 data points, a "trend" is going to be hard to prove unless it is very large. (That is, you have low power.)
-
Thank you. But I can't run the test of independence because there are no different row and column values. There are just three values for each category, one for each decade. Now I want to check whether there is a statistically significant difference among these values. I think the chi-square goodness-of-fit test could be used, but I am not sure, because the total number of news items (N) in each decade is different. – user22089 Mar 20 '13 at 19:33
-
@user22089: The rows are your categories and the columns are your decades, are they not? – Wayne Mar 20 '13 at 19:46
-
But decades don't have values; they are just 1, 2, 3. For example, if I said that out of 100 people, 60 say yes and 40 say no, I would run a chi-square goodness-of-fit test with 60 and 40 as my observed frequencies and 50, 50 as my expected frequencies, to find out whether there is a significant difference between the yes and no responses. A test of independence is run if I ask how many males and females said yes and no, i.e. the association between gender and response. In my case I can't study the association between decade and type of news. I want to look at each type of news separately: did it change over the 3 decades? – user22089 Mar 20 '13 at 20:07
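A goodness-of-fit test can handle unequal decade totals: under "Type A's share never changed", each decade's expected count is simply proportional to that decade's total, rather than equal. A minimal stdlib-only sketch (function names hypothetical; the p-value helper only covers df = 1 and even df, which is all these examples need):

```python
import math

def chi2_sf(stat, df):
    """Chi-squared survival function P(X >= stat), for df == 1 or even df only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df % 2:
        raise ValueError("sketch handles df == 1 or even df only")
    half = stat / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def chi2_goodness_of_fit(observed, expected):
    """Pearson goodness-of-fit statistic, df = categories - 1, and its p-value."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# The yes/no example from the comment: 60 vs 40 against an expected 50/50.
yes_no = chi2_goodness_of_fit([60, 40], [50, 50])
print(yes_no)

# Type A across decades: expected counts proportional to each decade's total.
type_a = [20, 40, 50]
totals = [80, 120, 70]
expected = [sum(type_a) * n / sum(totals) for n in totals]
type_a_result = chi2_goodness_of_fit(type_a, expected)
print(type_a_result)
```

This version takes the total number of Type A items as fixed. The more conventional way to compare three proportions is a 2x3 test of independence (Type A vs. not Type A, by decade), which also uses the not-A counts.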
-
No, decades don't have values, but neither do your news types. You have a two-dimensional contingency table of counts of news articles by type and decade. I was proposing a test of the independence of news type and decade (i.e. has the mix of news types stayed the same across the decades). You're saying you want to do five tests, one for each type across decades. Sounds to me like you'd have to make some kind of multiple-comparison correction. – Wayne Mar 20 '13 at 20:40
-
If I want to do five tests, one for each type across decades, then what is a multiple-comparison correction? :( – user22089 Mar 20 '13 at 20:45
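Briefly, a multiple-comparison correction accounts for the fact that running five tests at the 5% level gives a much higher chance of at least one false positive (about 23% if the tests were independent, since 1 - 0.95^5 ≈ 0.226). The simplest is the Bonferroni correction: multiply each p-value by the number of tests (capped at 1), or equivalently compare each raw p-value to 0.05 / 5 = 0.01. A minimal sketch, with the five p-values being hypothetical placeholders rather than real results:

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# HYPOTHETICAL raw p-values for the five news types (placeholders, not real results).
raw_p = [0.004, 0.03, 0.2, 0.011, 0.5]
adjusted = bonferroni(raw_p)
significant = [p_adj < 0.05 for p_adj in adjusted]
print(adjusted)
print(significant)
```

Bonferroni is conservative; it is shown here because it is the easiest to apply by hand.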
-
@Wayne -- a good start, but a basic chi-square test of independence wouldn't answer the question of whether there was a trend per se (although with only three time points, plotting or determining trends is pretty limited.) – James Stanley Mar 20 '13 at 23:30
-
@James Stanley: Thank you. Suppose I have more than three time points, say eight, and have made trend analysis plots in a software package called Minitab and also got a trend analysis equation that tells me the average increase per unit of time. Now I just want to know whether the difference in values at each point in time is statistically significant or not. Isn't the chi-square goodness-of-fit test used to check exactly that? The only problem is that the total number is different in each decade, e.g. political news was 20 out of 80 in decade 1, 40 out of 120 in decade 2, 50 out of 70 in decade 3, and so on. – user22089 Mar 21 '13 at 03:48
-
@Wayne (re: EDIT): I did use linear regression and have a slope coefficient for a trend equation. How do I know its statistical significance from that? – user22089 Mar 21 '13 at 12:30
-
@user22089: That depends on your statistical package, which should indicate a p-value for whether the slope is statistically significantly different from zero. (That is, does the confidence interval of the slope coefficient include zero or not.) But with three data points, I'd be suspicious of any "trend". That's very little data. If you could get data by year, that'd give you 30 data points (per type) which would be a better number with which to calculate a trend. – Wayne Mar 21 '13 at 13:10
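If your package doesn't print it, the slope's p-value can be computed by hand: t = slope / SE(slope), compared against a t distribution with n - 2 degrees of freedom. A minimal stdlib-only sketch (function names hypothetical; the t-distribution tail is obtained by numerically integrating the density with Simpson's rule, since Python's standard library has no t CDF), applied to the three-decade Type A counts:

```python
import math

def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be
    even), then uses the symmetry of the distribution.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)

def slope_test(x, y):
    """OLS slope, its standard error, t-statistic and two-sided p-value."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(sse / df / sxx)
    t = slope / se
    return slope, se, t, t_two_sided_p(t, df)

# Type A counts from the question, decades coded 1, 2, 3.
slope, se, t, p = slope_test([1, 2, 3], [20, 40, 50])
print(f"slope = {slope:.2f}, SE = {se:.2f}, t = {t:.2f}, p = {p:.3f}")
```

This illustrates the warning about three data points: a slope of 15 extra stories per decade gives t ≈ 5.2, but with only one residual degree of freedom the two-sided p is about 0.12. Note also that a straight line on raw counts ignores the different decade totals.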
-
@Wayne: Thank you so much. I in fact have data collected over 8 decades. Through systematic sampling I have 240 data points, but I grouped them into decades and got eight points. I was just trying to understand which statistic to use through the smaller example. Now, for a particular type of news, the trend equation I got with eight points was Yt = 16.93 + 6.82*t. I have been told to read it as: the mean value of news features increased by 6.82 units in time t, which stands for a decade. Do you think that's any good? – user22089 Mar 21 '13 at 13:20
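Reading the fitted equation, and assuming t runs from 1 to 8 with one step per decade as described: the 6.82 is the slope, i.e. the change in the fitted value for each one-decade step. It describes the straight-line fit, not the observed change between any particular pair of decades.

```latex
\begin{aligned}
\hat{Y}_t &= 16.93 + 6.82\,t, \qquad t = 1, 2, \dots, 8 \\
\hat{Y}_{t+1} - \hat{Y}_t &= 6.82 \quad \text{(fitted change per decade)} \\
\hat{Y}_1 &= 16.93 + 6.82 \times 1 = 23.75 \\
\hat{Y}_8 &= 16.93 + 6.82 \times 8 = 71.49
\end{aligned}
```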
-
@Wayne: and I am using Minitab for this part of the study, and it didn't give a p-value! :( – user22089 Mar 21 '13 at 13:23
-
@Wayne: Type A was 20 out of 80 in decade 1, 40 out of 120 in decade 2 and 50 out of 70 in decade 3. Through percentages I could easily conclude whether A increased or decreased over time, but I am not sure that is going to be enough. So I plotted a trend analysis, got trend equations and discussed the average increase per decade. I am not sure this is enough either, and I have been asked to back up every result with a statistical test. Also, I made a 3x5 cross-tab, ran the test of independence and got a p-value of 0.000. So should I conclude that decades and types of news are associated? – user22089 Mar 21 '13 at 13:55
-
@Wayne: I used GLM in SPSS, with the type of news as the dependent variable and decade as the independent variable. I got an output table where the unstandardized coefficients B are 16.929 for the constant and 6.821 for Decade. I got a plot for the same data in another software package with the trend equation Yt = 16.93 + 6.82*t, so the two results agree. But when I entered decade as the independent variable, I coded it as 1, 2, 3, 4, 5, 6, 7, 8 for the eight decades (1 = 1931-39, 2 = 1941-49 and so on). What if the software treats them as real numbers rather than as decades? – user22089 Mar 23 '13 at 01:05
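On the coding question: entering decade as the numbers 1 to 8 is exactly what makes the model a straight-line trend, and because consecutive decades are equally spaced, any other equally spaced coding only rescales the slope while leaving its t-statistic (and so its p-value) unchanged. A minimal sketch using the three-decade Type A example; the midpoint years 1935, 1945, 1955 are an assumed alternative coding, and the function name is hypothetical:

```python
import math

def slope_and_t(x, y):
    """OLS slope and its t-statistic for a straight-line fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se

type_a = [20, 40, 50]
decade_code = [1, 2, 3]
# Assumed alternative coding: each decade's midpoint year.
midpoint_year = [1935, 1945, 1955]

slope_dec, t_dec = slope_and_t(decade_code, type_a)     # slope per decade
slope_year, t_year = slope_and_t(midpoint_year, type_a) # slope per year (one tenth as large)
print(slope_dec, t_dec)
print(slope_year, t_year)
```

If you do not want to assume a straight line at all, the alternative is to enter decade as a categorical factor (dummy variables), which compares the decades without imposing any ordering or spacing.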