
I am not a Statistician and I would like to ask for help in understanding how to model the Italian Covid-19 infection data.

In particular, in Italy, every day t the number of new detected infections I(t) is reported. Thus a daily time series I(t) is available for a number of months.

For instance: I(Jan 1, 2021) = 11,000, I(Jan 2, 2021) = 10,500, I(Jan 3, 2021) = 10,850, and so on.

Each I(t) value for a generic day t is the number of people who feel sick, go to the hospital to be tested, and test positive, plus the number of their close contacts (relatives/friends) who also test positive.

I(t) is not obtained by random sampling over the Italian population.

Also, I(t) is an underestimate of the real daily infections.

Assuming that the number of truly infected people on day t is represented by TI(t), is it possible to estimate TI(t) from I(t)?

Karolis Koncevičius
  • Please don't repost your previous question: once is enough. All you have to do is edit the original. You will get more out of this site by reviewing our [help] first. – whuber Jan 22 '21 at 21:15
  • The suggestion to either edit or repost the question was given by the StackExchange crm inside the light blue 'help' window displayed together with the text of the question. – Filippo Neri Jan 23 '21 at 03:51

1 Answer

how to estimate epidemic real infection data from partially observable new infections?

A better question would be "*Can* one estimate epidemic real infection data from partially observable new infections?", and the answer is "no, you cannot". Or at least, it is not possible without additional data or information.

The detected (confirmed) infections are the tip of the iceberg. For literal icebergs we know that roughly 10% is above the water. For the figurative covid-19 infections iceberg, however, we do not know what fraction is visible. In addition, that fraction might change from place to place and from time to time.

Statistics about detected infections are not useful for indicating the actual absolute number of infections, but they are useful for indicating relative changes in the number of infections as a function of time and space. These changes can be used to evaluate the success of interventions or to make more accurate forecasts about future hospitalisations.

Even in this respect (relative changes) the figures might be unreliable, because changes in the detected infections do not necessarily reflect changes in the actual number of infections. They can also be caused by changes in measurement methods or by changes in the number and composition of people who show up for testing.
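To make the point about relative changes concrete, here is a minimal sketch. It assumes a constant detection fraction `p`, so that I(t) = p · TI(t); this is an assumption for illustration only, and the values of `p` tried below are hypothetical. The reported counts are the example values from the question.

```python
# Illustration only: the detection fraction p is NOT known in practice,
# and it is assumed here to be the same on every day.
reported = [11000, 10500, 10850]  # I(t), example values from the question


def growth_ratios(series):
    """Return the day-over-day ratios series[t+1] / series[t]."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


def implied_true(series, p):
    """Implied true infections TI(t) = I(t) / p for a constant fraction p."""
    return [value / p for value in series]


ratios_reported = growth_ratios(reported)

# Hypothetical detection fractions: the absolute level of TI(t) changes a lot,
# while the day-over-day ratios stay the same.
for p in (0.1, 0.25, 0.5):
    true_series = implied_true(reported, p)
    print(p, [round(v) for v in true_series], growth_ratios(true_series))
```

Whatever constant `p` is chosen, the ratios match those of the reported series, while the implied absolute numbers differ by a factor of five across the three choices. If `p` drifts over time, this equality no longer holds.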

A way to deal with these issues is to use additional data that allow one to estimate the ratio between the reported infections and the actual infections. For instance, occasional random sampling (or at least some sampling with less bias) can help to estimate this ratio and can also tell how it changes from place to place and from time to time.

In some very special cases one might have a theoretical model in which secondary aspects of the data (e.g. the day-to-day variance or the rate of increase) tell something about the fraction. This is not the case for epidemiology, although one might make guesses about certain relationships (e.g. what happens to the relationship between reported and true cases when the PCR testing changes in some way). By contrast, astronomy has quite a lot of different methods to measure distance and to go from relative to absolute distance; some of these are based on theoretical considerations, others are semi-theoretical with some additional empirical figures.
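As a rough sketch of how such a ratio could be computed: suppose a random survey measured, over some time window, the fraction of sampled people who were newly infected, and suppose reported cases are available for the same window. The function name `ascertainment_ratio` and all the numbers below (survey size, positives, population, reported cases) are hypothetical and chosen only for illustration. The interval uses the Wilson score method for a binomial proportion and ignores test errors and any remaining sampling bias.

```python
import math


def ascertainment_ratio(reported_cases, survey_positive, survey_size,
                        population, z=1.96):
    """Estimate reported / actual infections from a random survey.

    Assumes the survey and the reported cases cover the same window and
    the same population. Returns (point, lower, upper).
    """
    if survey_positive <= 0:
        raise ValueError("need at least one positive in the survey")
    n = survey_size
    p_hat = survey_positive / n

    # Wilson score interval for the surveyed infection proportion.
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    p_low, p_high = centre - half, centre + half

    # More actual infections means a smaller ratio, so the bounds swap.
    point = reported_cases / (population * p_hat)
    lower = reported_cases / (population * p_high)
    upper = reported_cases / (population * p_low)
    return point, lower, upper


# Hypothetical inputs, not real data.
point, lower, upper = ascertainment_ratio(
    reported_cases=75_000,    # reported cases in the window
    survey_positive=100,      # newly infected people found in the survey
    survey_size=20_000,       # people sampled at random
    population=60_000_000,    # rough population size
)
print(f"ratio = {point:.3f} (approx. 95% interval {lower:.3f} to {upper:.3f})")

# Scaling one reported daily value by the estimated ratio.
estimated_true = 11_000 / point
```

Repeating such a survey in different regions and at different times would show whether the ratio is stable enough to be applied to the daily series.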

Sextus Empiricus