Regression model for count data with "endogenously" right-censored data

Question

I have the following problem: I am trying to estimate a tolerance for negatives from a dataset of subjects. Note that negatives here is a count variable, i.e. it takes the values 0, 1, 2, 3, ...

For a given dose of negatives, the data show how many negatives each subject consumed. The issue is that the data are what I would call "endogenously right-censored", i.e. I only observe a subject's tolerance for negatives when they consume fewer negatives than the dose.

So

subject	time	dose	consumed	tolerance observed
1	sometimestamp	3	2	yes
1	sometimestamp	5	5	no
1	sometimestamp	2	0	yes
2	sometimestamp	3	3	no
3	sometimestamp	5	0	yes
4	sometimestamp	5	4	yes

What would be a good regression model to estimate and predict this tolerance? Is there a literature where these types of problems are discussed?

Why are there three rows for dose = 1? Does each row correspond to a subject? Also, what is the mechanism that determines the value of 'consumed'? Are they assigned a number to consume, or something else? — psboonstra, Sep 30 '21 at 12:50
Hey, sorry, the table was incorrect. Each row corresponds to a subject, but I can have multiple rows per subject (at different points in time). Regarding the mechanism: they get a dose and can then decide how much to consume, up to the dose, i.e. if dose=5, they can consume 0, 1, 2, 3, 4 or 5. Is that helpful? — clog14, Sep 30 '21 at 13:22
the study design is unclear to me. subject 1 was assigned 3 doses and they decided to consume 2 of the 3 dose levels. And then later they were assigned 5 doses and consumed all 5? And later again they were given 2 doses and chose to consume 0? Who is in charge of determining the value of dose and how many times a subject will participate? what causes a subject to decide to consume 0 dose levels or consume all assigned dose levels? Also, it looks like the last column (tolerance observed) is equal to no if the dose column equals the consumed column. is that the definition of tolerance observed? — psboonstra, Sep 30 '21 at 17:08
Hey, thanks. Maybe this was unclear: there is also context information available which explains why one person might have different tolerance levels. Participation is entirely up to the subject. Regarding the dose: the dose is actually random. Regarding what causes the subject to eat the doses: they get a reward and basically need to decide if the reward is worth the dose. Last question: yes. — clog14, Oct 01 '21 at 06:58

psboonstra · Accepted Answer · 2021-10-04T13:06:40.457

It seems that you are in a discrete failure time setting. Time is measured by the number of doses taken by a subject (time 1 = first dose; time 2 = second dose, and so on), and you are interested in the distribution of 'time to tolerance'. Formally, let $X$ be the number of doses consumed; I think you are interested in estimating the survival function $\Pr(X > k)$, $k=1,2,\ldots$. For a subject $i$ whose tolerance you observe, the likelihood contribution would be $\Pr(X = x_i)$.
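
One common way to write these quantities, assuming we introduce the discrete-time hazard $h_j = \Pr(X = j \mid X \ge j)$ (notation added here for illustration, not taken from the question), is:

$$\Pr(X > k) = \prod_{j=1}^{k} (1 - h_j), \qquad \Pr(X = x) = h_x \prod_{j=1}^{x-1} (1 - h_j).$$

A regression model can then let $h_j$ depend on covariates, for example through a logit link, which is the approach taken in discrete-time survival models.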

As you observe, some observations are censored: for these you only know that $x_i > n_i$, where $n_i$ is the subject's assigned maximum dose. If you are willing to assume that the censoring process is not informative, then a censored observation simply contributes $\Pr(X > n_i)$ to the likelihood when estimating the model.
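
As a rough illustration of how observed and censored contributions combine, here is a minimal Python sketch of a life-table (discrete Kaplan-Meier) estimate, which maximizes this censored likelihood when no covariates are used. It rests on several assumptions that are not in the question: the tolerance $T$ is taken to be the number of negatives a subject would consume with no cap (so $T$ can be 0), a row with consumed = dose is treated as censored with $T \ge$ dose (this matches $\Pr(X > n_i)$ if $X$ is read as the index of the first dose refused), censoring is non-informative, and repeated rows from the same subject are treated as independent. The function name `life_table` is hypothetical.

```python
from collections import Counter

# (dose, consumed) pairs copied from the table in the question.
# Assumption: rows are treated as independent, ignoring repeated subjects.
rows = [(3, 2), (5, 5), (2, 0), (3, 3), (5, 0), (5, 4)]


def life_table(rows):
    """Estimate Pr(T > k) for each level k with at least one row at risk."""
    events = Counter()   # rows whose tolerance was observed to equal k
    at_risk = Counter()  # rows known to have T >= k and informative at k
    for dose, consumed in rows:
        censored = consumed == dose
        if not censored:
            events[consumed] += 1
        # A censored row only tells us T >= dose, so it is at risk at
        # levels 0..dose-1; an observed row is at risk at 0..consumed.
        last_level = consumed - 1 if censored else consumed
        for k in range(last_level + 1):
            at_risk[k] += 1

    survival = {}
    s = 1.0
    for k in sorted(at_risk):
        hazard = events[k] / at_risk[k]  # estimate of Pr(T = k | T >= k)
        s *= 1.0 - hazard
        survival[k] = s                  # estimate of Pr(T > k)
    return survival


if __name__ == "__main__":
    for k, s in life_table(rows).items():
        print(f"Pr(T > {k}) ~ {s:.3f}")
```

With only six rows the estimates are purely illustrative. In a real analysis the hazards would be modelled with covariates, and the within-subject correlation would need to be handled.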

There are a few things I would be concerned about, however, when it comes time to estimate the model. First, it's not clear to me that the censoring mechanism is non-informative. Do subjects know in advance what the value of dose is and, if so, is there a chance they will 'persevere' to finish all of their assigned dose levels? Second, I would want to know more about why subjects appear in the data more than once. You would want to account for the presumed correlation between observations from the same subject, to be sure, but, beyond that, why do these subjects appear in the data multiple times in the first place? Does that offer information about their tolerance?

Here are some other relevant questions on SE and external links that might be helpful as you start to fit the model.

Prediction: Discrete-Time Event History (Survival) Model in R

https://data.princeton.edu/wws509/notes/c7s6

https://link.springer.com/content/pdf/10.1007%2F0-387-34232-X.pdf

https://www.rensvandeschoot.com/tutorials/discrete-time-survival/

hey - thanks a bunch. This answer actually helped me enormously to make some progress. — clog14, Oct 27 '21 at 12:06
