Where is the error term behind the following model:
$$h_i(t) = h_0(t) \exp \left ( \sum_{k = 1}^p \beta_k z_{ik} \right )$$
The distributional assumptions behind a relative risk model are hidden in the baseline hazard function $h_0(t)$. If you specify a form for this function, then you completely specify the distribution of your data.
For example, $h_0(t) = \phi \psi t^{\phi - 1}$ corresponds to the Weibull distribution.
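As a quick sanity check of that correspondence, one can simulate survival times from this Weibull hazard by inverse-transform sampling; the parameter values below are chosen purely for illustration.

```python
import numpy as np

# Weibull baseline hazard h0(t) = phi * psi * t**(phi - 1) implies
# cumulative hazard H0(t) = psi * t**phi and survival S(t) = exp(-psi * t**phi).
# Inverse-transform sampling: T = (-log(U) / psi)**(1/phi), U ~ Uniform(0, 1).
phi, psi = 1.5, 0.2                      # illustrative shape/scale (assumed)
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
t = (-np.log(u) / psi) ** (1 / phi)

# Empirical survival fractions should match the closed-form S(t).
for t0 in (1.0, 2.0, 4.0):
    empirical = (t > t0).mean()
    theoretical = np.exp(-psi * t0 ** phi)
    print(f"S({t0}) empirical={empirical:.3f} theoretical={theoretical:.3f}")
```

So specifying $h_0(t)$ really does pin down the whole distribution; there is no leftover slot for an additive error.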
There absolutely is an "error" in survival analysis.
You can define the "time to event" according to a probability model with some $$g(T) = b (X, t) + \epsilon(X,t)$$
where $g$ would usually be something like a log transform. Of course, requiring $\epsilon$ to be normal, identically distributed, or even stationary is a rather strong assumption that just doesn't play out in real life. But if we allow $\epsilon$ to be quite general, the Cox proportional hazards model is a special case of the above display. Is this an abuse of notation? Maybe. Note that we are not guaranteed any of the desirable properties, such as independence between the parameters. But if we think carefully about what an error is, it's not that it doesn't exist; it's just not a notation that helps facilitate scientific investigation.
This "fully parametric" approach can be very efficient when its assumptions hold. A fully parametric Weibull model is actually a lot like a linear regression model for survival data, where the scale parameter plays a role much like an error variance (dispersion parameter).
You could predict survival time for a given subject, subtract that from the observed survival time, and this "residual" can be flexibly modeled using semiparametric splines to describe the distribution and mean-variance relationship. More commonly, we use Schoenfeld residuals and their theoretical basis to assess the appropriateness of the proportional hazards assumption (the related martingale residuals are the difference between observed events and the expected cumulative hazard).
Theoretically, $S(T) \sim \mathrm{Uniform}(0,1)$; that is, under the quantile transform, the survival times are uniform, and their empirical process follows a Brownian bridge. So there is a relation between the probability model and a fundamentally random process. One could inspect diagnostic plots to assess the adequacy of $\hat{S}$ as an estimator of $S$.
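A small numerical sketch of this quantile-transform idea, using an Exponential(1) lifetime where $S(t)=e^{-t}$ is known exactly (the distribution is assumed only for illustration):

```python
import numpy as np

# If S is the true survival function of T, then S(T) ~ Uniform(0, 1).
# Sketch with an Exponential(1) lifetime, where S(t) = exp(-t).
rng = np.random.default_rng(1)
t = rng.exponential(size=50_000)
u = np.exp(-t)                      # quantile transform S(T)

# Kolmogorov-style check: max gap between the ECDF of u and the Uniform CDF.
u_sorted = np.sort(u)
ecdf = np.arange(1, u_sorted.size + 1) / u_sorted.size
max_gap = np.abs(ecdf - u_sorted).max()
print(f"max ECDF deviation from Uniform(0,1): {max_gap:.4f}")
```

The same check with $\hat{S}$ in place of the true $S$ is exactly the kind of diagnostic described above.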
Simple Linear Regression Model
\begin{equation} Y_i=B_0+B_1 X_i+\epsilon_i \end{equation}
where
$Y_i$ is the value of the response variable in the $i$th trial, and
$\epsilon_i$ is a random error term with mean $E[\epsilon_i]=0$ and variance $\sigma^2[\epsilon_i]=\sigma^2$, so that
\begin{equation} E[Y_i ]=B_0+B_1 X_i \end{equation}
Consider the simple linear regression model
\begin{equation} Y_i=B_0+B_1 X_i+\epsilon_i, \qquad Y_i \in \{0,1\} \end{equation}
where the outcome $Y_i$ is binary, taking on the value of either 0 or 1. The expected response $E[Y_i]$ has a special meaning in this case. Since $E[\epsilon_i]=0$, we have:
\begin{equation} E[Y_i ]=B_0+B_1 X_i \end{equation}
Consider $Y_i$ to be a Bernoulli random variable for which we can state the probability distribution as follows:
\begin{equation} P(Y_i=1)=\pi_i \end{equation} \begin{equation} P(Y_i=0)=1-\pi_i \end{equation}
Since $E[Y_i]=\pi_i$, it follows that
\begin{equation} E[Y_i ]=B_0+B_1 X_i= \pi_i \end{equation}
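The identity $E[Y_i]=B_0+B_1 X_i=\pi_i$ treats a probability as a linear function of $X$. A quick simulation (with made-up coefficients) shows why this is awkward: least-squares fitted "probabilities" can leave $[0,1]$.

```python
import numpy as np

# Linear probability model: fit E[Y] = B0 + B1*X by least squares on binary Y.
# Illustrative data (assumed): probability of "success" rises with x.
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=500)
p_true = 1 / (1 + np.exp(-2 * x))          # true Bernoulli probabilities
y = rng.binomial(1, p_true)

X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
pi_hat = b0 + b1 * x
print(f"fitted pi range: [{pi_hat.min():.2f}, {pi_hat.max():.2f}]")
# The fitted "probabilities" spill outside [0, 1], which is one motivation
# for the logistic model in the next section.
```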
Simple Logistic Regression Model
First, we require a formal statement of the simple logistic regression model. Recall that when the response variable is binary, taking on the values 1 and 0 with probabilities $\pi$ and $1-\pi$, respectively, $Y$ is a Bernoulli random variable with parameter $E[Y]=\pi$. We can state the simple logistic regression model in the following fashion:
The $Y_i$ are independent Bernoulli random variables with expected value $E[Y_i]=\pi_i$, where:
\begin{equation} E[Y_i ] =\pi_i= \frac{\exp(B_0+B_1 X_i)}{1+\exp(B_0+B_1 X_i)} \end{equation}
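A sketch of this model with made-up coefficients $B_0=-1$, $B_1=0.8$: the only randomness is the Bernoulli draw itself, so the empirical mean of $Y$ at a fixed $x$ recovers $\pi$ with no additive error in sight.

```python
import numpy as np

# In the logistic model the randomness lives in the Bernoulli draw itself:
# Y_i ~ Bernoulli(pi_i) with pi_i = exp(B0 + B1*x_i) / (1 + exp(B0 + B1*x_i)).
# Illustrative coefficients (assumed): B0 = -1, B1 = 0.8.
b0, b1 = -1.0, 0.8
rng = np.random.default_rng(3)

x0 = 1.5                                   # one fixed covariate value
pi0 = np.exp(b0 + b1 * x0) / (1 + np.exp(b0 + b1 * x0))
y = rng.binomial(1, pi0, size=200_000)     # many subjects at the same x

print(f"pi(x=1.5) = {pi0:.4f}, empirical mean of Y = {y.mean():.4f}")
# No epsilon is added: the spread of Y around pi0 is pi0*(1 - pi0),
# fully determined by pi0 rather than by a separate error term.
```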
Poisson Distribution
\begin{equation} f(Y)=\frac{\mu^Y \exp(-\mu)}{Y!} \end{equation}
$E[Y]=\mu$
$\sigma^2[Y]=\mu$
Poisson Regression Model
The Poisson regression model, like any nonlinear regression model, can be stated as follows:
\begin{equation} Y_i=E[Y_i ]+\epsilon_i, \qquad i=1,2,\ldots,n \end{equation}
The mean response for the $i$th case, denoted now by $\mu_i$ for simplicity, is assumed as always to be a function of the set of predictor variables $X_1,\ldots,X_{p-1}$. We use the notation $\mu(X_i,B)$ to denote the function that relates the mean response $\mu_i$ to $X_i$, the values of the predictor variables for case $i$, and $B$, the values of the regression coefficients. Some commonly used functions for Poisson regression are:
\begin{equation} \mu_i= \mu(X_i,B)=X_i'B \end{equation}
\begin{equation} \mu_i= \mu(X_i,B)=\exp(X_i'B) \end{equation}
\begin{equation} \mu_i= \mu(X_i,B)=\log_e(X_i'B) \end{equation}
Models of this form are called generalized linear models (GLMs).
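A minimal simulation of the log-link case $\mu_i=\exp(X_i'B)$, with illustrative coefficients (assumed, not from any dataset):

```python
import numpy as np

# Poisson regression with the log link: mu_i = exp(B0 + B1*x_i).
# Illustrative coefficients (assumed): B0 = 0.5, B1 = 0.3.
b0, b1 = 0.5, 0.3
rng = np.random.default_rng(4)
x = rng.uniform(0, 2, size=100_000)
mu = np.exp(b0 + b1 * x)
y = rng.poisson(mu)

# GLM signature: mean and variance move together (both equal mu),
# so there is no separate dispersion parameter to attach to an epsilon.
print(f"overall mean={y.mean():.3f}, mean of mu={mu.mean():.3f}")
```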
Survival analysis
Consider an AFT model with one predictor $X$. The model can be expressed on the log scale as: \begin{equation} \log(T)= a_0+a_1 X+\epsilon \end{equation}
where $\epsilon$ is a random error following some distribution. The distribution of $T$ determines the distribution of $\epsilon$:
- $T$ Exponential $\leftrightarrow$ $\epsilon$ extreme value
- $T$ Weibull $\leftrightarrow$ $\epsilon$ extreme value
- $T$ Log-logistic $\leftrightarrow$ $\epsilon$ logistic
- $T$ Lognormal $\leftrightarrow$ $\epsilon$ normal
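A minimal simulation of the AFT display, assuming made-up coefficients and a normal $\epsilon$ (the lognormal case in the pairing above):

```python
import numpy as np

# AFT sketch: log(T) = a0 + a1*x + eps. With eps ~ Normal, T is lognormal.
a0, a1, sigma = 1.0, -0.5, 0.4            # illustrative values (assumed)
rng = np.random.default_rng(5)
x = rng.uniform(0, 2, size=100_000)
eps = rng.normal(0, sigma, size=x.size)   # the error term is explicit here
t = np.exp(a0 + a1 * x + eps)

# E[log T | x] should equal a0 + a1*x; check near x = 1.
mask = np.abs(x - 1.0) < 0.05
print(f"mean log T near x=1: {np.log(t[mask]).mean():.3f} (target {a0 + a1:.3f})")
```

Unlike the Cox model below, the AFT formulation has an explicit $\epsilon$, which is why it reads like a linear regression on the log scale.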
In the Cox proportional hazards model, by contrast, the distributional assumptions are hidden in the baseline hazard function $h_0(t)$.
This answer is limited to frequentist statistics and statistical models without random effects.
In fact, statistical modeling amounts to finding the conditional distribution of the response variable given fixed values of the covariates, i.e., the distribution of $Y|X=x$. When writing a statistical model, following these three steps will keep you from mathematical mistakes.
Find the form of the distribution of $Y|X=x$.
List the parameters that determine the distribution.
Write down how the covariates determine those parameters through the unknown constant parameters.
Example 1: Subject = 5-16 year old boys (indexed by $i$), response variable $Y$ = height, Covariate $X$ = age.
E1-1: Distribution form: $Y_i|X_i \sim Normal$
E1-2: Parameters for normal: mean $\mu_i$ and variance $\sigma_i^2$
E1-3: Functions for parameters: $\mu_i = \mu_0+\beta X_i$ and $\sigma_i^2=\sigma^2$
It is the same as $Y_i = \mu_0 +\beta X_i +\epsilon_i$ and $\epsilon_i \sim N(0,\sigma^2)$
Example 2: Subject = men older than 65 years (indexed by $i$), response variable ($Y$) = dead or alive in the next full year, covariate ($X$) = age.
E2-1: Distribution form: $Y_i|X_i$ follows Bernoulli with parameter $\pi_i$, with $Y_i=1$ denoting death.
E2-2: Parameter for Bernoulli: $\pi_i$, the probability that the $i$-th person dies in the next year
E2-3: Function of parameters: $\pi_i = \frac{e^{\beta_0+\beta_1X_i}}{1+e^{\beta_0+\beta_1X_i}}$ or $log(\frac {\pi_i}{1-\pi_i}) = \beta_0+\beta_1X_i$
It is logistic regression, and there is no $\epsilon$ after $\beta_0+\beta_1X_i$.
Example 3: Subject = 10-minute intervals at a specific street from 6:00 am to 9:00 am (indexed by $i$), response variable $Y_i=$ # of cars that passed a specific place, covariate $X_i=\mathrm{int}((\text{beginning time} - \text{6:00, in minutes})/10)$
E3-1: Distribution: Poisson
E3-2: Parameter: $\lambda_i$
E3-3: Function of parameter: $\lambda_i = e^{\beta_0+\beta_1X_i}$ or $log(\lambda_i)=\beta_0+\beta_1X_i$
It is Poisson regression, and again there is no $\epsilon$ after $\beta_0+\beta_1X_i$.
OP's question:
Distribution: any probability distribution belonging to the proportional hazards family.
Parameters: they depend on the distribution, but we do not need to know them because of the proportional hazards assumption.
Function of parameters: $$h_i(t) = h_0(t) \exp \left ( \sum_{k = 1}^p \beta_k z_{ik} \right )$$ One needs to know that the hazard function determines the probability distribution of the survival time.
Obviously, there is no position for $\epsilon$.
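The three steps can be sketched numerically for the simplest proportional hazards family, a constant baseline hazard $h_0(t)=\lambda_0$ (chosen purely for illustration), under which $T_i$ is exponential with rate $\lambda_0 e^{\beta z_i}$:

```python
import numpy as np

# Proportional hazards with constant baseline hazard h0(t) = lam0:
# subject i's hazard is lam0 * exp(beta * z_i), so T_i is Exponential
# with that rate. The hazard alone pins down the survival distribution.
lam0, beta = 0.1, 0.7                     # illustrative values (assumed)
rng = np.random.default_rng(7)
z = rng.binomial(1, 0.5, size=200_000)    # binary covariate
rate = lam0 * np.exp(beta * z)
t = rng.exponential(1 / rate)

# Mean survival time should be 1/rate in each covariate group; no epsilon
# is added anywhere, the randomness is the draw from the distribution itself.
for g in (0, 1):
    theory = 1 / (lam0 * np.exp(beta * g))
    print(f"z={g}: mean T = {t[z == g].mean():.2f} (theory {theory:.2f})")
```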
If you still do not believe that there is no $\epsilon$ in the logistic, Poisson, and Cox proportional hazards models, consider the following two questions.
In the linear model, $\epsilon$ appears in the process of model establishment. In the final conclusions, we can and need to estimate the variance of $\epsilon$. We can also estimate $\epsilon$ itself by $Y-\hat Y$. We also know that $Var(\hat \beta) = (X'X)^{-1}\sigma^2$.
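These three linear-model facts can be checked directly in a few lines (the coefficients and error variance below are made up for the sketch):

```python
import numpy as np

# Sketch of the linear-model facts above: epsilon is estimated by Y - Y_hat,
# and Var(beta_hat) = (X'X)^{-1} sigma^2. Illustrative values (assumed).
beta_true = np.array([2.0, -1.0])
sigma = 0.5
rng = np.random.default_rng(6)

x = rng.uniform(0, 5, size=1_000)
X = np.column_stack([np.ones_like(x), x])
eps = rng.normal(0, sigma, size=x.size)
y = X @ beta_true + eps

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat                       # estimate of epsilon itself
s2 = resid @ resid / (x.size - 2)              # estimate of sigma^2
cov_beta = np.linalg.inv(X.T @ X) * s2         # (X'X)^{-1} sigma^2 plug-in
print(f"beta_hat={beta_hat.round(3)}, s={np.sqrt(s2):.3f}")
```

In the other three model families there is simply no analogous quantity to estimate, which is the point of the questions that follow.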
In the other three kinds of models, if you insist there is an $\epsilon$, why does it not appear in the process of model establishment? What is the effect of $\epsilon$ on the model? Did, and could, we estimate anything related to $\epsilon$?
So if you insist there is an $\epsilon$ in those three kinds of models, then $\epsilon$ acts like a ghost: when you want it, it appears; when you do not, it disappears. But in mathematical statistics, this kind of ghost is not allowed in a model.
You may ask why it is acceptable that the baseline hazard function $\lambda_0(t)$ also appears in the model specification yet disappears in the model-fitting process and final results. The reason is that in the process of model establishment, $\lambda_0(t)$ is cancelled under the proportional hazards assumption. And if you are really interested in $\lambda_0(t)$, you can get its estimate, unlike $\epsilon$, which cannot be estimated.
Why does the linear model $Y\sim N(X\beta, \sigma^2)$ have an alternative expression $Y=X\beta + \epsilon$ with $\epsilon \sim N(0,\sigma^2)$, while the other models have no alternative expression involving $\epsilon$?
(will continue)