I am relatively new to statistics but am very keen to learn the best approaches with working examples as I prefer to learn this way. We are clearly living in an unprecedented time with COVID-19 and I felt this might be a suitable subject to apply myself to.
The problem
My end goal is to identify health boards that are outliers in terms of COVID-19 mortality rate among patients diagnosed with COVID-19 who occupy a bed the health board manages. I define an outlier as a health board whose COVID-19 mortality rate cannot be explained by chance variability alone.
Datasets
- The total number of patients in hospitals diagnosed with COVID-19 within each health board.
- The total number of deaths in patients diagnosed with COVID-19 in hospital within each health board.
- The total number of people in the tested population diagnosed with COVID-19 located in the health board.
Methodology
I have initially created a funnel plot, as suggested in "The Art of Statistics: Learning from Data" by David Spiegelhalter. I have plotted the mortality rate (COVID-19 deaths per patient diagnosed with COVID-19 in hospital) against the total number of patients diagnosed with COVID-19 in hospital. I have used 95% and 99.8% control limits, and several health boards fall outside them as "outliers". I suspect part of this is due to higher infection rates in particular health board areas, which is where the third dataset comes in. This is also where I need guidance...
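For concreteness, here is a minimal sketch of how funnel-plot control limits of this kind can be computed. It assumes a normal approximation to the binomial around the pooled mortality rate, which may not be exactly what the book uses. The function names (`funnel_limits`, `flag_boards`) and the numbers in `example` are made up for illustration and are not my real data.

```python
from math import sqrt
from statistics import NormalDist


def funnel_limits(p0, n, coverage):
    """Normal-approximation control limits for a proportion.

    p0 is the target (pooled) rate, n the number of patients,
    coverage the two-sided coverage, e.g. 0.95 or 0.998.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * sqrt(p0 * (1 - p0) / n)
    # A proportion cannot leave [0, 1], so clip the limits.
    return max(0.0, p0 - half_width), min(1.0, p0 + half_width)


def flag_boards(boards, coverage=0.998):
    """Return the boards whose observed rate lies outside the limits.

    boards maps a board name to (deaths, patients).
    """
    total_deaths = sum(d for d, _ in boards.values())
    total_patients = sum(n for _, n in boards.values())
    p0 = total_deaths / total_patients  # pooled mortality rate

    flagged = []
    for name, (deaths, patients) in boards.items():
        lower, upper = funnel_limits(p0, patients, coverage)
        rate = deaths / patients
        if rate < lower or rate > upper:
            flagged.append(name)
    return flagged


# Hypothetical numbers, purely illustrative: (deaths, patients)
example = {
    "Board A": (30, 200),
    "Board B": (12, 100),
    "Board C": (90, 300),
    "Board D": (60, 400),
}

if __name__ == "__main__":
    print(flag_boards(example, coverage=0.998))
```

The limits narrow as the number of patients grows, which is what gives the plot its funnel shape. For small health boards, exact binomial limits would be a reasonable alternative to the normal approximation.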
Question
I have performed multiple linear regression with the other two datasets (hospital patients diagnosed with COVID-19, and people in the tested population diagnosed with COVID-19) as independent variables and the total number of deaths in patients diagnosed with COVID-19 in hospital within each health board as the dependent variable. This shows a promising correlation with small p-values. Is using the residuals a suitable method for identifying significant outliers, or is this the wrong step for answering my question?
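To make the residual idea concrete, this is roughly the calculation I have in mind, written as a minimal pure-Python sketch. It fits deaths on the two predictors by ordinary least squares and divides each residual by the residual standard error. It ignores leverage, so these are not fully studentized residuals. The helper names (`solve`, `ols_residuals`) and all numbers are hypothetical, not my real data.

```python
from math import sqrt


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_residuals(patients, cases, deaths):
    """Fit deaths ~ b0 + b1*patients + b2*cases by least squares.

    Returns (coefficients, residuals scaled by the residual standard error).
    """
    rows = [[1.0, p, c] for p, c in zip(patients, cases)]
    k = 3  # intercept plus two predictors
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, deaths)) for i in range(k)]
    beta = solve(xtx, xty)

    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, deaths)]
    dof = len(deaths) - k  # residual degrees of freedom
    sigma = sqrt(sum(e * e for e in resid) / dof)
    return beta, [e / sigma for e in resid]


# Hypothetical numbers, one entry per health board
patients = [120, 340, 90, 560, 210, 410]
cases = [1500, 2100, 400, 3000, 2600, 1800]
deaths = [14, 41, 9, 70, 45, 50]

if __name__ == "__main__":
    beta, scaled = ols_residuals(patients, cases, deaths)
    for board, value in enumerate(scaled, start=1):
        print(f"board {board}: scaled residual {value:+.2f}")
```

A health board would then be treated as a candidate outlier if its scaled residual is large in absolute value, for example beyond about 2 or 3. Whether that threshold is defensible here is part of what I am asking.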