For a school project, we are asked to determine the expected pass rate of a driving test from yearly data covering 2010 to 2019. I am wondering whether there is a statistical justification for choosing between the data for one particular year and the data for several years combined.
I imagine I can begin by using hypothesis testing to check whether the pass rates for different years are significantly different. If they are not, the years might share the same underlying pass rate, and it would then be sensible to look at the years combined.
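To make the idea concrete, here is a rough sketch of the kind of test I have in mind: a chi-square test of homogeneity on the yearly pass/fail counts, written with only the Python standard library. The function names `chi2_sf` and `homogeneity_test` are my own, and the counts at the bottom are made-up placeholders, not figures from the actual data set.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    z = x / 2.0
    # Start from Q(1, z) = exp(-z) for even df, or Q(1/2, z) = erfc(sqrt(z)) for odd df,
    # then climb with the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def homogeneity_test(passes, totals):
    """Chi-square test of H0: every year has the same pass probability.

    passes[i] is the number of passes in year i, totals[i] the number of tests taken.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    pooled = sum(passes) / sum(totals)  # pass rate estimated under H0
    stat = 0.0
    for p, n in zip(passes, totals):
        # The pass and fail cells of one year combine into this single term.
        stat += (p - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
    df = len(passes) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical placeholder counts for three years (NOT taken from the real data).
example_passes = [470, 455, 462]
example_totals = [1000, 1000, 1000]
stat, df, p_value = homogeneity_test(example_passes, example_totals)
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```

One thing I am unsure about with this approach: with very large yearly counts, even a tiny difference in pass rate would come out as statistically significant, so a small p-value alone may not tell me whether pooling is unreasonable in practice.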
However, I feel there should be a better or more formal way to investigate this, but it has not been covered in my classes so far. I would be really grateful if anyone could shed some light on the issue.
To elaborate further: if I want to justify using the 2019 mean pass rate, rather than the mean rate from 2010 to 2019, as the expected pass rate for someone who will take the test this year, how might I formally do that in a statistical sense? Or is it always better to include more data?
Data-set: https://www.gov.uk/government/statistical-data-sets/car-driving-test-data-by-test-centre