
I have some data on the mean kinship values of a single population for a number of consecutive years. After plotting the data, I saw that the kinship coefficient decreases slightly until 1994/1995, after which it starts to increase.

years <- 1980:2019
kinship <- c(0.0178, 0.0182, 0.0175, 0.0173, 0.0177, 0.0174, 0.0171, 0.0177, 0.0174, 0.0177, 0.0169, 0.0173, 0.0176, 0.0181, 0.0191,
             0.0194, 0.0201, 0.0209, 0.0215, 0.0222, 0.0229, 0.0239, 0.0245, 0.0257, 0.0263, 0.0263, 0.0268, 0.0273, 0.0276, 0.028,
             0.0292, 0.0299, 0.031, 0.032, 0.032, 0.032, 0.0335, 0.0345, 0.0349, 0.0343)
data <- data.frame(years, kinship)
plot(years, kinship)

I am interested in whether kinship changes at a different rate between 1980 and 1994 than between 1995 and 2019, but I am struggling to choose a statistical test.

period1 <- data[1:15,]
period2 <- data[16:40,]

At first, I thought I could use a paired t-test, but I do not have the same number of observations in each period. I could shorten period2 so that it has as many observations as period1, but would a paired t-test then actually test for a difference between the slopes of the two periods? I know that I can also compare the slopes with linear regression, but I am not familiar with linear regression on dependent data. Or would it be better not to split the data, and instead add a variable 'Period' (with values 1 and 2) and then run a regression analysis or ANOVA?

I am just looking for some direction as I am not too familiar with statistics. Thanks in advance!

  • I am seeing a slight decrease until 1991 followed by a positive slope. – Rui Barradas Jun 10 '21 at 12:06
  • @RuiBarradas You are absolutely right, apologies. I have edited it. – pedigreeanalyst Jun 10 '21 at 13:38
  • If you know the breakpoint in advance (the current answer *estimates* the breakpoint), then [this post](https://stats.stackexchange.com/questions/61805/standard-error-of-slopes-in-piecewise-linear-regression-with-known-breakpoints) could be of interest (see Glen_b's answer in particular). – COOLSerdash Jun 10 '21 at 14:54

1 Answer


You could use a segmented regression model:

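library(segmented)
# baseline: a single straight line through all years
fit0 <- lm(kinship ~ years, data = data)
# segmented model: estimates one breakpoint in years and a slope on each side
fit1 <- segmented(fit0, seg.Z = ~ years)
summary(fit1)
# estimated breakpoint: 1991

# compare the single-line fit with the segmented fit
anova(fit0, fit1)
# strong significance

# overlay the segmented fit on the existing scatter plot from the question
curve(predict(fit1, newdata = data.frame(years = x)), add = TRUE)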

resulting plot

This analysis ignores the autocorrelation within your time series, but that should be fine given how strong the effect is.
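If you want to see how much residual autocorrelation is left, the following is a minimal sketch using base R only. It assumes the breakpoint is fixed at 1991 (the value reported above, treated here as known rather than estimated), so the model can be fit with plain `lm()`. The names `bp`, `fit_line`, `fit_bp`, `res` and `lb`, and the choice of `lag = 5`, are illustrative and not part of the original analysis.

```r
# Data from the question
years <- 1980:2019
kinship <- c(0.0178, 0.0182, 0.0175, 0.0173, 0.0177, 0.0174, 0.0171, 0.0177,
             0.0174, 0.0177, 0.0169, 0.0173, 0.0176, 0.0181, 0.0191, 0.0194,
             0.0201, 0.0209, 0.0215, 0.0222, 0.0229, 0.0239, 0.0245, 0.0257,
             0.0263, 0.0263, 0.0268, 0.0273, 0.0276, 0.028, 0.0292, 0.0299,
             0.031, 0.032, 0.032, 0.032, 0.0335, 0.0345, 0.0349, 0.0343)
data <- data.frame(years, kinship)

# Assumption: breakpoint fixed at 1991 instead of being estimated
bp <- 1991

# Single straight line, for comparison
fit_line <- lm(kinship ~ years, data = data)

# Continuous piecewise-linear model: the pmax() term is 0 up to the
# breakpoint and grows linearly afterwards, so its coefficient is the
# change in slope after the breakpoint
fit_bp <- lm(kinship ~ years + pmax(years - bp, 0), data = data)

# Row 3 is the test of "slope change = 0"
print(summary(fit_bp)$coefficients)

# F-test of the single line against the piecewise model
print(anova(fit_line, fit_bp))

# Residual autocorrelation: sample ACF and a Ljung-Box test
res <- residuals(fit_bp)
print(acf(res, plot = FALSE))
lb <- Box.test(res, lag = 5, type = "Ljung-Box")
print(lb)
```

Because the breakpoint was picked after looking at the data, the p-values from this fixed-breakpoint fit are likely somewhat optimistic. If the Ljung-Box test points to clear autocorrelation, the standard errors should be treated with extra caution as well.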

Roland