Background/study system:
One of my MS students is studying the biomechanics of strand breakage in Spanish moss (an epiphyte, i.e., a plant that lives on other plants). Spanish moss has strands that can grow into larger clumps of strands ("festoons"), where new strands originate from renewal shoots at each node. Old strands often form a wire support from which new strands grow. The number of nodes depends on the age of the strand, but she has found that most plants have 7 to 13 nodes (mode = 10, counting from the proximal point where the strand joins the "wire" to the distal end of the strand).
When a single strand is pulled, it always breaks at a node (never at the inter-node), and this appears to be an important mechanism by which Spanish moss reproduces (vegetatively): a strand can break off and (if it lands in a good spot) can keep growing. For each strand, she has made cuts at the midpoint around each node and examined material properties (such as "yield strength" and "work to breakage"). We want to regress these material properties against nodal position to determine if they change along a strand.
Statistical problem:
Ultimately, I would like to set this up as a mixed-effects model with nodal position as a fixed effect and strand as a random effect.
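To make the intended model concrete, here is a minimal sketch in Python using `statsmodels`, fitted to simulated stand-in data. Everything about the data is an assumption made for illustration: the column names (`strand`, `node`, `yield_strength`), the linear trend, the effect sizes, and the choice of a random intercept only. It treats nodal position as a numeric predictor, which is exactly the choice I am asking about, so it should be read as one candidate specification rather than the right one.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)  # fixed seed so the simulation is repeatable

# Simulated stand-in data: every number below is invented for illustration.
true_slope = 0.5
rows = []
for strand in range(30):
    n_nodes = rng.integers(7, 14)          # 7 to 13 nodes per strand
    strand_offset = rng.normal(0.0, 2.0)   # strand-level shift (random intercept)
    for node in range(1, n_nodes + 1):     # node 1 = proximal end
        y = 10.0 + strand_offset + true_slope * node + rng.normal(0.0, 1.0)
        rows.append({"strand": strand, "node": node, "yield_strength": y})
data = pd.DataFrame(rows)

# Fixed effect: node (treated as numeric). Random intercept: strand.
model = smf.mixedlm("yield_strength ~ node", data, groups=data["strand"])
result = model.fit()

# Fixed-effect estimates; the fitted slope should be close to true_slope.
print(result.params[["Intercept", "node"]])
```

Wrapping the predictor as `C(node)` in the formula would instead treat nodal position as categorical, giving one coefficient per node level beyond the reference level.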
My question pertains to how the node variable should be treated in a regression analysis. Nodal position seems to be discrete (1,2,3...13) but it has no natural zero point, which would make it difficult to interpret the y-intercept. I have seen a lot of sources that discuss "truncated" variables and left-"censored" variables, but this seems to be a slightly different problem. It could also be argued that this is a form of an integer distribution without zero, but I cannot seem to find any discussion of such a probability distribution or how one would interpret regression results for such a distribution.
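On the intercept specifically, the small sketch below illustrates one possible recoding, not a resolution of the broader question: subtracting 1 from the nodal position so that 0 corresponds to the most proximal node. The measurements are made-up numbers for a single hypothetical 10-node strand, and `ols_line` is a helper written for this example. With an ordinary least-squares line, the recoding leaves the slope unchanged and turns the intercept into the fitted value at node 1.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical measurements for one 10-node strand (invented numbers).
node = list(range(1, 11))
strength = [12.1, 11.8, 11.2, 10.9, 10.1, 9.8, 9.5, 8.7, 8.4, 8.0]

raw_intercept, raw_slope = ols_line(node, strength)

# Recode so that 0 means "the most proximal node".
shifted = [k - 1 for k in node]
shifted_intercept, shifted_slope = ols_line(shifted, strength)

print("slopes:", raw_slope, shifted_slope)
print("intercepts:", raw_intercept, shifted_intercept)
```

The shifted intercept equals the raw intercept plus one slope, i.e., the predicted value at node 1, which is a position that actually exists on the strand.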
Alternatives:
My original thought was to treat this as an ordinal variable. My concern is that I am losing information and that the results will be harder to interpret because there will be a separate regression coefficient for each node. Adding random effects or accounting for a nonlinear response would make the results even more complicated.
I have also considered treating nodal position as a categorical variable with simple coding or something like Helmert coding, but with this approach, you are throwing away information about the order of the nodes and you again have lots of regression coefficients.
A compromise that has been suggested is defining some measure of top-middle-end, but this also comes with some loss of information and would require some sort of control for variation in strand length.
A previous study transformed the predictor by dividing the distance to the node (from the proximal end) by the total length of the strand. This simplifies the regression analysis and controls for strand length, but I am not sure it is a good idea biologically, since nodal properties at the same proportion may not be the same if the number of nodes per strand varies. I am also not sure it is a good idea statistically, since you are making a naturally discrete variable continuous and, if you transform it as described, you end up with a disproportionate number of data points at 100%.
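The pile-up at 100% can be seen in a small sketch of that transformation. The distances are invented, the strand labels and the `relative_positions` helper are hypothetical, and I am assuming that total strand length equals the distance to the most distal node, which is what produces a value of exactly 1.0 for every strand.

```python
# Hypothetical distances (cm) from the proximal end to each node, per strand.
strands = {
    "A": [2.0, 4.5, 7.0, 9.0, 11.5, 14.0, 16.0],                    # 7 nodes
    "B": [1.5, 3.0, 5.0, 6.5, 8.5, 10.0, 12.0, 13.5, 15.0, 17.0],   # 10 nodes
}


def relative_positions(distances):
    """Divide each node's distance by the strand length.

    Assumption: strand length = distance to the most distal node.
    """
    total = distances[-1]
    return [d / total for d in distances]


rel = {name: relative_positions(d) for name, d in strands.items()}

# Every strand contributes exactly one observation at 1.0 (i.e., 100%).
at_100 = sum(1 for positions in rel.values() if positions[-1] == 1.0)
print("strands with a point at 100%:", at_100, "of", len(strands))

# The same node index falls at different proportions on different strands.
print("node 4 on A:", rel["A"][3])
print("node 4 on B:", rel["B"][3])
```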