I'm trying to evaluate whether a measure is improving with time.
This broadly reflects my real life data...
df <- data.frame(ch = rep(1:10, each = 12), # care home id
                 month_id = rep(1:12, times = 10), # month using the system over the course of a year (1 = first month, 2 = second month, etc.)
                 totaladministrations = rbinom(n = 120, size = 1000, prob = 0.6), # administrations that were scheduled to have been given in the month
                 missed = rbinom(n = 120, size = 20, prob = 0.8), # administrations that weren't given in the month (these are bad!)
                 beds = rep(rbinom(n = 10, size = 60, prob = 0.6), each = 12), # number of beds in the care home
                 rating = rep(rbinom(n = 10, size = 4, prob = 0.5), each = 12)) # latest inspection rating (1. Inadequate, 2. Requires Improving, 3. Good, 4. Outstanding)
# Summary measures
df$missed_pct <- df$missed / df$totaladministrations * 100 # missed meds as a percentage of all scheduled administrations
df$missed_dm <- df$missed / 30.416 # missed meds daily mean for the month
# classifications
df$ch_size <- car::recode(df$beds, "lo:29 = 1; 30:36 = 2; 37:hi = 3", as.factor = TRUE)
> str(df)
'data.frame':   120 obs. of  9 variables:
 $ ch                  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ month_id            : int  1 2 3 4 5 6 7 8 9 10 ...
 $ totaladministrations: int  555 586 607 604 598 597 588 585 573 570 ...
 $ missed              : int  14 16 18 15 16 15 15 19 17 14 ...
 $ beds                : int  36 36 36 36 36 36 36 36 36 36 ...
 $ rating              : int  1 1 1 1 1 1 1 1 1 1 ...
 $ missed_pct          : num  2.52 2.73 2.97 2.48 2.68 ...
 $ missed_dm           : num  0.46 0.526 0.592 0.493 0.526 ...
 $ ch_size             : Factor w/ 2 levels "2","3": 1 1 1 1 1 1 1 1 1 1 ...
> head(df)
  ch month_id totaladministrations missed beds rating missed_pct missed_dm ch_size
1  1        1                  555     14   36      1   2.522523 0.4602841       2
2  1        2                  586     16   36      1   2.730375 0.5260389       2
3  1        3                  607     18   36      1   2.965404 0.5917938       2
4  1        4                  604     15   36      1   2.483444 0.4931615       2
5  1        5                  598     16   36      1   2.675585 0.5260389       2
6  1        6                  597     15   36      1   2.512563 0.4931615       2
> tail(df)
    ch month_id totaladministrations missed beds rating missed_pct missed_dm ch_size
115 10        7                  575     15   31      0   2.608696 0.4931615       2
116 10        8                  590     18   31      0   3.050847 0.5917938       2
117 10        9                  590     18   31      0   3.050847 0.5917938       2
118 10       10                  578     14   31      0   2.422145 0.4602841       2
119 10       11                  590     19   31      0   3.220339 0.6246712       2
120 10       12                  567     18   31      0   3.174603 0.5917938       2
Here, care home 1 was scheduled to give 555 drug administrations in month 1 and missed 14 of them (i.e. 14 failures and 541 successes). This care home has 36 beds, which puts it in size class 2 (larger than class 1, smaller than class 3), and it was rated 1 ('Inadequate') at the last inspection.
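The failure/success split can be derived directly from the two count columns. A minimal sketch, using the first three rows of the `head(df)` output; the column name `given` is my own and is not part of the data above:

```r
# First three months of care home 1, copied from the head(df) output
ex <- data.frame(ch = 1,
                 month_id = 1:3,
                 totaladministrations = c(555, 586, 607),
                 missed = c(14, 16, 18))

# 'given' is a hypothetical column name: scheduled administrations that were given
ex$given <- ex$totaladministrations - ex$missed

# Failures and successes side by side, one row per care-home month
counts <- cbind(failures = ex$missed, successes = ex$given)
counts
```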
`$beds` is a mostly constant figure. This is the number of beds a care home has registered with the authorities to provide care for. It does not reflect the daily occupancy rate (i.e. there may be fewer residents in the care home on a given day). It can change, but rarely does, so for the purposes of this question I'm happy to treat it as a fixed effect. `$beds` determines `$ch_size`.
`$rating` is given by the authority after an inspection, and is technically the 'most recent rating', where 1 = 'Inadequate', 2 = 'Requires Improving', 3 = 'Good' and 4 = 'Outstanding'. To me this is an ordinal categorical variable and can be treated as such in a model. Inspections are carried out at intervals ranging from every few months to every two years (the frequency is determined by the last rating, with lower-rated care homes inspected more frequently). I'm happy to treat this as a fixed effect for now to keep it simple.
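For illustration, one way to encode the rating as an ordinal variable in R is an ordered factor. This is only a sketch: the values in `rating` below are made up, and `rating_f` is a name I've chosen for the example.

```r
# Made-up ratings on the 1-4 scale described above
rating <- c(1, 3, 2, 4, 3)

# Ordered factor: the levels keep their rank (Inadequate < ... < Outstanding)
rating_f <- factor(rating,
                   levels = 1:4,
                   labels = c("Inadequate", "Requires Improving", "Good", "Outstanding"),
                   ordered = TRUE)

levels(rating_f)
rating_f[2] > rating_f[1]  # ordered factors support rank comparisons
```

Any code outside 1 to 4 would become `NA` under this encoding, so it is worth checking the raw values first.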
My real life data has about 700 care homes. The variance in sample data posted here probably doesn't reflect real life variance (because I'm newish to R and don't know how to do that) but hopefully it doesn't need to for this question as I'm interested in model construction rather than interpretation.
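For what it's worth, here is a rough sketch of how sample data with between-home variation and a time trend could be simulated in base R. Every number in it (the 3% baseline, the between-home spread `ch_sd`, the monthly `slope`) is an assumption picked purely for illustration and is not taken from my real data.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible

n_ch    <- 10  # care homes
n_month <- 12  # months per care home
sim <- data.frame(ch       = rep(seq_len(n_ch), each = n_month),
                  month_id = rep(seq_len(n_month), times = n_ch))

# Assumed values, chosen only for illustration
base_logit <- qlogis(0.03)  # 3% missed for a typical home in month 1
ch_sd      <- 0.5           # between-home standard deviation (logit scale)
slope      <- -0.05         # change per month (logit scale); negative = improving

# One random intercept per care home, shared by all of its months
ch_effect <- rnorm(n_ch, mean = 0, sd = ch_sd)
p_missed  <- plogis(base_logit + ch_effect[sim$ch] + slope * (sim$month_id - 1))

# Scheduled administrations, then missed ones drawn from them
sim$totaladministrations <- rbinom(nrow(sim), size = 1000, prob = 0.6)
sim$missed <- rbinom(nrow(sim), size = sim$totaladministrations, prob = p_missed)

head(sim)
```

Drawing `missed` from `totaladministrations` guarantees that a home can never miss more administrations than it had scheduled, which the independent draws in the sample data above do not enforce.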
I want to see whether missed medications improve (i.e. reduce) over time, and whether that trend is affected by the care home's size and/or rating.
I need help with:
- Which is the better response measure to use: `missed` (number of missed medications), `missed_pct` (ratio missed/total, expressed as a percentage), or `missed_dm` (daily mean)?
Bonus points for suggesting which models to use with that response.
Note: I have excluded individual resident data for this project because of resource constraints and I wanted to keep it simple and do what I can with the summary count data. There may well be underlying factors also affecting missed medications, such as the ratio between number of care givers and number of care receivers on a given shift ... but that's a question for another day / project.