
I'm trying to find predictive risk factors. I have already found that young age at diagnosis is a risk factor (binary logistic regression), but now I want to know the exact age at which the risk is highest. Is there a way to find out at what age the risk is higher than at other ages?

I already tried making age groups and comparing them, but I want to know the exact age. In the end I want to arrive at a statement like:

children aged 0-2 years are more likely to develop ... than children aged > 2 years

Tip
  • If you are the same user as the "Tippi" who is proposing edits to this post, then please visit https://stats.stackexchange.com/help/merging-accounts to merge your accounts. That will enable you to edit the post directly. Thank you. – whuber Mar 01 '19 at 14:09

1 Answer


1/ If you treat age as a categorical variable (factor), then you can only identify the "age group" with the largest effect.

2/ If you want to identify the "exact age" with the largest effect, then you will need to treat age as a continuous variable AND allow for a non-linear relationship between age and your DV (very important!). For example, if you specify a quadratic relationship on the logit scale, logit(p) = b0 + b1*age + b2*age^2, you can find the optimum of the curve at age = -b1 / (2*b2), which is a maximum when b2 < 0.
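As a rough illustration of the quadratic approach, here is a minimal Python sketch. Everything in it is an assumption made for the example: the data are simulated (with a true peak at age 2), the names `fit_logistic`, `age`, `y` and `peak_age` are hypothetical, and the model is fitted with a small hand-written Newton-Raphson routine so that only NumPy is needed. In practice you would use your statistical package's logistic regression and include your other covariates.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)       # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)                      # weights of the Newton step
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # observed information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (an assumption for illustration): the true risk peaks at age 2.
rng = np.random.default_rng(0)
age = rng.uniform(0, 10, size=5000)
true_logit = -1.0 + 0.8 * age - 0.2 * age**2   # vertex at 0.8 / (2 * 0.2) = 2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Centre age before squaring; this keeps the two terms less correlated.
age_mean = age.mean()
age_c = age - age_mean
X = np.column_stack([np.ones_like(age_c), age_c, age_c**2])
b0, b1, b2 = fit_logistic(X, y)

# The fitted parabola has an interior maximum only if b2 < 0.
peak_age = age_mean - b1 / (2.0 * b2) if b2 < 0 else None
print("estimated age of highest risk:", peak_age)
```

If `b2` turns out positive, the parabola opens upward and the highest risk sits at one end of the observed age range rather than at the vertex. Also check that the estimated peak lies inside the range of ages you actually observed.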

Nicolas K
  • Your focus on modeling age as a continuous variable is wise. A restricted cubic spline can be a better choice than a quadratic, and most statistical packages make modeling with restricted cubic splines straightforward. Splines offer a flexibility that can capture more interesting non-linear behavior than quadratics can. Coefficients for quadratic terms can be difficult to interpret unless the continuous variable has first been centered to within the range of the observed values. Of course, a proper risk model will presumably include covariates besides age. – EdM Mar 01 '19 at 16:54
  • I agree with EdM's comment. You could possibly start with a model treating AGE as a categorical variable and then look at how the effect changes across the different age groups. This will already tell you how (non-)linear the relationship is. If the effect happens to be linear, then the answer to your initial question will be the most extreme observed age (the maximum or the minimum, depending on the sign of the effect). But check for potential outliers: the effects in the extreme age groups might be driven by only a few people. – Nicolas K Mar 01 '19 at 17:10
  • Binning a continuous predictor into categories is [seldom a good idea](https://stats.stackexchange.com/q/68834/28500), particularly to start. Arbitrary boundaries between bins can wreak havoc with interpretation. Continuous modeling with splines or polynomials is almost always a better choice. – EdM Mar 01 '19 at 17:38
  • In addition to splines and polynomials, I'd also suggest that the OP can consider non-parametric regression, for example as implemented in Stata: https://www.stata.com/new-in-stata/nonparametric-regression/ – Weiwen Ng Mar 04 '19 at 17:25
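The restricted cubic spline idea from the comments above can be sketched in the same way. This is only an illustration under stated assumptions: the data are simulated with a bump-shaped risk curve peaking at age 2, the helper names `rcs_basis` and `fit_logistic` are hypothetical, the basis follows the usual truncated-power form of a restricted cubic spline (linear beyond the outer knots), and the five knots are placed at the 5th, 27.5th, 50th, 72.5th and 95th percentiles of age. Dedicated spline routines in a statistical package would normally replace all of this.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated-power form), linear beyond the outer knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2                # keeps columns on the scale of x

    def cube(u):
        return np.maximum(u, 0.0) ** 3

    cols = [x]                                 # the linear term
    for j in range(len(t) - 2):
        col = (cube(x - t[j])
               - cube(x - t[-2]) * (t[-1] - t[j]) / d
               + cube(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(col / scale)
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (an assumption for illustration): a non-quadratic bump at age 2.
rng = np.random.default_rng(1)
age = rng.uniform(0, 10, size=8000)
true_logit = -2.0 + 2.5 * np.exp(-0.5 * ((age - 2.0) / 1.5) ** 2)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

knots = np.quantile(age, [0.05, 0.275, 0.5, 0.725, 0.95])
X = np.column_stack([np.ones(age.size), rcs_basis(age, knots)])
beta = fit_logistic(X, y)

# Evaluate the fitted log-odds on a fine grid and take the age where it is largest.
grid = np.linspace(age.min(), age.max(), 1001)
eta_grid = np.column_stack([np.ones(grid.size), rcs_basis(grid, knots)]) @ beta
peak_age = grid[np.argmax(eta_grid)]
print("estimated age of highest risk:", peak_age)
```

Because the spline has no closed-form vertex, the peak is located numerically on a grid. A bootstrap over the whole procedure would be one way to attach an uncertainty interval to the estimated peak age.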