
I am performing a robust z-score analysis to remove the temporal trend of a certain variable so that I can compare different years.

I use the robust version because the data contain outliers.

My data are, in a sense, a population.

I apply this method to each year separately. The number of data points varies from year to year: some years have 30 points, others 500.
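For concreteness, the per-year procedure described above could be sketched as follows. This is only an illustration under assumptions: the post does not say which robust z-score is used, so the sketch assumes the common median/MAD form with the 1.4826 consistency factor, and the names `robust_z_scores` and `data_by_year` and the sample values are hypothetical.

```python
from statistics import median

def robust_z_scores(values, scale=1.4826):
    """Robust z-scores: (x - median) / (scale * MAD).

    Assumption: `scale` = 1.4826 makes the MAD comparable to a standard
    deviation for normal data; the original post does not specify this.
    """
    med = median(values)
    # Median absolute deviation from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / (scale * mad) for v in values]

# Hypothetical data: one list of observations per year.
data_by_year = {
    2017: [1, 2, 3, 4, 100],   # contains one outlier
    2018: [10, 12, 11, 13, 9, 14, 50],
}

# Standardise each year against its own median and MAD.
z_by_year = {year: robust_z_scores(vals) for year, vals in data_by_year.items()}
```

Because each year is centred and scaled by its own median and MAD, the outlier has little influence on the scores of the other points in that year.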

What is the minimum size of each year “population” for the median to be a representative statistical measure of the population?

Seiji

1 Answer


The "ideal" population size is the sample size: In that case the data is a full census of the population, so there is no inference required, and the observed data is the population. The sample median is then the population median, since the sample is the population. The larger the divergence between the population size and the sample size, the greater the degree of inference required to understand the population measures.
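A small simulation can illustrate this point. It is a sketch under assumptions, not part of the original argument: the population is synthetic (500 log-normal values), samples are drawn without replacement, and `median_spread` is a hypothetical helper that measures how much the sample median varies across repeated draws.

```python
import random
from statistics import median, pstdev

def median_spread(population, n, draws=2000, seed=0):
    """Standard deviation of the sample median over repeated
    samples of size n drawn without replacement."""
    rng = random.Random(seed)
    medians = [median(rng.sample(population, n)) for _ in range(draws)]
    return pstdev(medians)

# Hypothetical skewed finite population of 500 values.
rng = random.Random(1)
population = [rng.lognormvariate(0, 1) for _ in range(500)]

# Spread of the sample median for a small sample, a larger sample,
# and a full census (n equal to the population size).
spreads = {n: median_spread(population, n) for n in (30, 100, 500)}
```

When `n` equals the population size, every draw is the whole population, so the sample median never varies and no inference is needed. As `n` shrinks relative to the population, the sample median varies more from draw to draw.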

Ben
  • I have another question. Suppose now that each of these sets of years represents an area. Let's say that each data point is the productivity, measured in papers produced in a year, of a Biochemistry teacher at a university. There are also teachers in other subjects, represented by other sets of years with their respective productivities. Suppose I want to compute z-scores not only for each of these years but also for each area. Do you think it's reasonable to compare z-scores of different areas with different population sizes? – Seiji Feb 27 '19 at 03:17
  • Say, for example, comparing a year with 200 Physics data points (the productivity of 200 professors) with a year with 10 Mathematics data points. – Seiji Feb 27 '19 at 03:17