Calculate spline terms of a logistic regression using published knots and formula

Question

I am trying to calculate the spline terms of a logistic regression in order to build a linear predictor (prediction formula) for the model "Lymph Node Involvement (Cores)".

The source (https://www.mskcc.org/nomograms/prostate/pre_op/coefficients) gives the formulas for calculating the spline terms sp1var and sp2var.

I tried to calculate sp1var and sp2var using the published knots in R:

var=10 # PSA = 10 for example
knot1= 0.2
knot2=4.7
knot3=7.2
knot4=96.53

sp1var <- max (var -knot1)^3 - max(var-knot3)^3 * ((knot4 - knot1)/(knot4-knot3)) + max(var - knot4)^3 * ((knot3 - knot1) / (knot4-knot3))


sp2var <- max (var -knot2)^3 - max(var-knot3)^3 * ((knot3 - knot2)/(knot4-knot3)) + max(var - knot4)^3 * ((knot3 - knot2) / (knot4-knot3))

However, if I calculate the probability (following "Make prediction equation from logistic regression coefficients"), I get the wrong result:

# define the Intercept
Intercept = -5.37368223


# define the Coefficients
cAGE = 0.00906354
cPSA = 0.21239809
cPSAs1 =-0.00132481
cPSAs2 = 0.00356913
cGLE = 3.03232465 #for gleason grade 5
cCLI = 0.71055042 #for clinical stage 3+
cPOS = 0.05499551 # no. of positive cores
cNEG = -0.11987793 # no. of negative cores

# define predictors
PSA= 10
age=50
npos=10 # no. of positive cores
nneg=10 # no. of negative cores

# calculate the probability
z = Intercept + age * cAGE + PSA * cPSA + sp1var * cPSAs1 + sp2var * cPSAs2 + cGLE + cCLI + npos *cPOS + nneg * cNEG

exp(z)/(1 + exp (z))

# result : 0.8962046

# expected: 0.39 (https://www.mskcc.org/nomograms/prostate/pre_op)

Am I misinterpreting the stated formulas?

Sextus Empiricus · Accepted Answer · 2020-10-12T14:55:54.920


Max function needs to include 0

You should use the max function like

max(var-knot1, 0)

instead of

max(var-knot1)
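A quick check in R shows what goes wrong without the 0: with a single argument, `max()` just returns that argument, so a negative difference is never truncated to zero. The sketch reuses the question's PSA value and `knot4`; the vector `psa` is made up for illustration. For a vector of PSA values, `pmax()` is the elementwise counterpart.

```r
# With one argument, max() returns it unchanged, so a negative
# (var - knot) still contributes to the spline term.
var <- 10
knot4 <- 96.53

max(var - knot4)       # -86.53: the knot4 term is NOT switched off
max(var - knot4, 0)    # 0: positive part, as the spline formula intends

# For several PSA values at once (illustrative values), pmax() truncates
# elementwise, whereas max() would collapse the vector to a single number.
psa <- c(2, 10, 120)
pmax(psa - knot4, 0)   # 0, 0, 23.47
```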

Typo in the function

You need to use the first line below instead of the second one (they differ in the numerator of the middle term: knot4 vs knot3).

sp2var <- max(var - knot2, 0)^3 - max(var - knot3, 0)^3 * ((knot4 - knot2) / (knot4 - knot3)) + max(var - knot4, 0)^3 * ((knot3 - knot2) / (knot4 - knot3))  # correct
sp2var <- max(var - knot2, 0)^3 - max(var - knot3, 0)^3 * ((knot3 - knot2) / (knot4 - knot3)) + max(var - knot4, 0)^3 * ((knot3 - knot2) / (knot4 - knot3))  # typo: knot3 instead of knot4

With both corrections applied, the result should match the online calculator.
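Putting both fixes together, here is a minimal sketch in R. It uses `pmax()` so it also works for a vector of PSA values, and `plogis()` (base R's logistic CDF) for the inverse logit. The knots and coefficients are the ones quoted in the question; the function names `pos`, `rcs_terms` and `lni_prob` are hypothetical, and the Gleason and clinical-stage terms are hard-coded for the same categories as in the question (grade 5, stage 3+).

```r
# Positive part: x where x >= 0, otherwise 0 (vectorised via pmax).
pos <- function(x) pmax(x, 0)

# Restricted cubic spline terms for PSA with four knots (knots from the question),
# using the corrected (knot4 - knot2) in the middle term of sp2.
rcs_terms <- function(var, knots = c(0.2, 4.7, 7.2, 96.53)) {
  k1 <- knots[1]; k2 <- knots[2]; k3 <- knots[3]; k4 <- knots[4]
  sp1 <- pos(var - k1)^3 -
    pos(var - k3)^3 * (k4 - k1) / (k4 - k3) +
    pos(var - k4)^3 * (k3 - k1) / (k4 - k3)
  sp2 <- pos(var - k2)^3 -
    pos(var - k3)^3 * (k4 - k2) / (k4 - k3) +
    pos(var - k4)^3 * (k3 - k2) / (k4 - k3)
  list(sp1 = sp1, sp2 = sp2)
}

# Predicted probability for the coefficients quoted in the question.
lni_prob <- function(psa, age, npos, nneg) {
  sp <- rcs_terms(psa)
  z <- -5.37368223 +
    0.00906354 * age +
    0.21239809 * psa +
    -0.00132481 * sp$sp1 +
    0.00356913 * sp$sp2 +
    3.03232465 +   # Gleason grade 5, as in the question
    0.71055042 +   # clinical stage 3+, as in the question
    0.05499551 * npos +
    -0.11987793 * nneg
  plogis(z)        # same as exp(z) / (1 + exp(z))
}

round(lni_prob(psa = 10, age = 50, npos = 10, nneg = 10), 2)  # 0.39
round(lni_prob(psa = 20, age = 50, npos = 10, nneg = 10), 2)  # 0.56
```

These match the 0.39 and 0.56 quoted from the online nomogram. Keeping the typo (knot3 - knot2) in sp2 instead pushes the PSA = 20 probability above 0.99, consistent with the value reported in the comments.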

Used this way, the maximum function effectively means

$$\max(x,0) = \begin{cases} x & \quad \text{if} \quad x\geq0 \\ 0 & \quad \text{if} \quad x<0 \end{cases}$$

and is a way to get these splines defined as a function of piecewise polynomials.
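Written with the positive part $(u)_+ = \max(u, 0)$, with $x$ the PSA value and $t_1, \dots, t_4$ standing for knot1 to knot4, the corrected terms read as follows (a transcription of the corrected R code into notation, not copied from the MSKCC page):

$$
\begin{aligned}
\text{sp1var} &= (x - t_1)_+^3 - (x - t_3)_+^3\,\frac{t_4 - t_1}{t_4 - t_3} + (x - t_4)_+^3\,\frac{t_3 - t_1}{t_4 - t_3},\\
\text{sp2var} &= (x - t_2)_+^3 - (x - t_3)_+^3\,\frac{t_4 - t_2}{t_4 - t_3} + (x - t_4)_+^3\,\frac{t_3 - t_2}{t_4 - t_3}.
\end{aligned}
$$

For $x \ge t_4$ the $x^3$ and $x^2$ contributions cancel, so each term is linear in $x$ beyond the last knot, which is what makes this a restricted (natural) cubic spline.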

edited Oct 12 '20 at 14:55

answered Oct 12 '20 at 07:40


Thank you very much, this is of great help. How could the difference in probability be explained? Could it be due to a different rounding convention? – captcoma Oct 12 '20 at 07:53

@captcoma Those coefficients have high precision, so a round-off error seems unlikely to me. It is possible that their online model uses slightly different coefficients, either because the coefficients are newer or older, or because of an error somewhere. – Sextus Empiricus Oct 12 '20 at 08:45
I adjusted max as described and it works for the example given above, so your post answered my question. However, with the same settings but PSA=20, I get 0.99 (expected 0.56). Is there anything else that I am missing? Could this be connected to the difference described above? – captcoma Oct 12 '20 at 11:43

@captcoma there was also an additional typo in your equation for sp2var – Sextus Empiricus Oct 12 '20 at 14:56