You could treat this as a recurrent-event situation, in which each individual "consumption" represents a separate event.* Or you could treat this as a count-based problem, modeling the number of events in each observation time period. The best choice depends on the nature of your data.
First, make sure that your data adequately represent the information you have. For example, I can't tell whether Customer 2 was last seen at time period 10, or whether a longer period of time has elapsed without any consumption since time period 10. Your data set needs to keep track of the total elapsed time for each individual, even times without consumption.
Then look hard at the patterns of cumulative consumption over time, which will tend to smooth out the variability among time periods. Do that for a large number of individual customers. What types of patterns do you see? Does cumulative consumption tend to plateau at long times? If so, then it might make sense to think about a finite "Customer Lifetime." Alternatively, does cumulative consumption tend to keep rising over time? In that case there might not be a well-defined "Customer Lifetime" from your perspective; it might make sense just to estimate a mean rate of activity instead.
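For instance, here is a minimal R sketch of that exploratory plot. The data frame `orders` and its columns `customer`, `period`, and `count` are hypothetical names, assumed to hold one row per customer per time period:

```r
## Cumulative consumption per customer over time, from the hypothetical
## `orders` data frame with per-period event counts in `count`.
library(dplyr)
library(ggplot2)

orders %>%
  group_by(customer) %>%
  arrange(period, .by_group = TRUE) %>%
  mutate(cum_consumption = cumsum(count)) %>%
  ggplot(aes(x = period, y = cum_consumption, group = customer)) +
  geom_step(alpha = 0.3) +
  labs(x = "Time period", y = "Cumulative consumption")
```

Plateaus show up as flat right-hand tails; staircases that keep climbing suggest modeling a rate instead.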
The way to proceed thereafter depends on the activity patterns that you see. For a continually, albeit randomly, rising cumulative consumption, a Poisson or negative binomial model might work for estimating rates, with customers treated as random effects. Each customer would then have a characteristic underlying activity rate, with a distribution of rates among customers. That's a pretty standard type of generalized linear model. You would model the counts per time period, potentially using time period itself as a predictor to see if rates are systematically changing over time.
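As a concrete illustration, here is a minimal sketch with lme4, again assuming the hypothetical `orders` data frame of per-period counts:

```r
## Counts per period with a random intercept per customer; `period` as a
## fixed effect probes for systematic changes in rate over time.
library(lme4)

fit_pois <- glmer(count ~ period + (1 | customer),
                  data = orders, family = poisson)

## Negative binomial analogue if the counts are overdispersed:
fit_nb <- glmer.nb(count ~ period + (1 | customer), data = orders)

summary(fit_pois)
```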
If such a model fits your data adequately, then for a new customer you could try to estimate the rate from that customer's initial behavior.
If cumulative consumption does tend to plateau in time, you could use a recurrent-events survival model that takes the multiple customers into account. This type of model would also have to incorporate the censoring in time of your observations. For example, if it's been only 10 time periods since Customer 2 entered your data set, that customer doesn't provide any information on behavior beyond 10 time periods. You don't know what her future consumption might be, so you can't just assume there would be no further activity. Survival models take that into account.
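One common recurrent-events formulation is the Andersen-Gill model in R's survival package. The sketch below assumes the data have been reshaped to counting-process form, with one row per at-risk interval per customer; the data frame `events` and its columns `tstart`, `tstop`, `event`, and the covariate `segment` are hypothetical names. The `cluster` term gives robust standard errors that respect the repeated events within customers:

```r
## Andersen-Gill recurrent-events Cox model; censoring is handled by the
## (tstart, tstop] at-risk intervals, so Customer 2's record simply stops
## contributing after her last observed period.
library(survival)

fit_ag <- coxph(Surv(tstart, tstop, event) ~ segment + cluster(customer),
                data = events)
summary(fit_ag)
```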
If you have information about the customers besides their order histories, your models could include that information as covariates to potentially improve predictions for individuals.
In response to comments:
> this (also) implies every customer in the training dataset has to be observed for the same n periods, right?
No. Individuals can be observed over different periods of time. You set a separate time = 0 reference for each individual, typically the time that the individual first entered your data set. Then, for initial assessment of your data, you plot cumulative event numbers for each individual as a function of time relative to that individual's starting time. Plots for some individuals will simply extend to longer times than others.
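The re-zeroing step is straightforward; a sketch, again assuming the hypothetical `orders` data frame holds calendar periods:

```r
## Each customer's first observed period becomes time 0:
library(dplyr)
orders <- orders %>%
  group_by(customer) %>%
  mutate(rel_period = period - min(period)) %>%
  ungroup()
```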
Whether you model this from a consumption-rate perspective or a "Customer Lifetime Value" perspective, you can use whatever information you have. For example, if you are estimating rates per time period in a mixed model, you use the information on the customers that you have for each time period. If you are modeling total counts, you can take the total observation time for an individual into account with an offset term in a regression model, as sketched below. A survival-analysis recurrent-events approach naturally takes the "censoring" at the last observation time for an individual into account.
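A sketch of the offset idea, assuming a hypothetical one-row-per-customer data frame `customers` with a total count `total`, the number of observed periods `obs_periods`, and a covariate `segment`:

```r
## Modeling totals while adjusting for unequal observation times:
## the log-offset makes the model one for counts per period.
library(MASS)
fit_tot <- glm.nb(total ~ segment + offset(log(obs_periods)),
                  data = customers)
```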
> in these recurrent-events survival models, why does cumulative consumption have to plateau with time?
It doesn't. You need to find out whether that's the case before you can decide how to model. If cumulative consumption keeps going up indefinitely for individuals, then there is no sign of a finite "customer lifetime" and you need to focus instead on the consumption rate and whether the rate has any patterns as a function of time. If there is a plateau in cumulative consumption, then there might be a finite "customer lifetime" that could be modeled, in line with your wish to estimate a "Customer Lifetime Value."
> I was having trouble understanding, from the literature I saw, how to incorporate this intensity dimension
The way you presented your situation, it seems that the "consumption" can be modeled as counts of events. For example, that could be modeling clicks on ads on a web site, with each click representing a unit of "consumption." Each "consumption" event is essentially the same, but an individual can have multiple such events within a time period.**
For a point process, the instantaneous underlying event rate is actually called the "intensity." From that perspective, count-based models inherently model intensity. How best to do that depends on the nature of your data: whether you should model different customers as having different but individually constant baseline intensities, or whether you need to model intensities as a function of time (including time as a predictor in your model, in a form suggested by your knowledge of the subject matter, or in a flexible form like a spline).
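A flexible-intensity version of the earlier count model, as a sketch: `ns()` from the splines package lets the rate vary smoothly with time relative to each customer's entry (using the hypothetical `rel_period` from above):

```r
## Intensity varying smoothly over time via a natural cubic spline,
## still with a customer-level random intercept for baseline differences.
library(lme4)
library(splines)

fit_spline <- glmer(count ~ ns(rel_period, df = 3) + (1 | customer),
                    data = orders, family = poisson)
```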
> where you suggest the Poisson or negative binomial model - could you maybe cite a reference here where this is discussed in a (somewhat?) similar context?
Once you know the terms to search for, finding references becomes a bit easier. That can help whether you are analyzing this from a survival/recurrent events perspective or from a point-process/count-based perspective.
For identifying references that might help, you can think about each of your repeated "consumption" events as being analogous to repeated asthma attacks, repeated hospital admissions, etc., in the medical literature. Or you can think about count data over time that are less event-based, like counts of a type of cell in the blood of patients at successive clinical visits, or counts of RNA molecules of different types within individuals over time. The choice depends again on the nature of your data.
As noted in a revised part of the answer above, if you are modeling counts per time period you could have a fairly standard generalized linear mixed model based on an underlying Poisson or negative binomial process. The standard lme4 package in R provides tools for both. There is a wealth of information readily available about how to use those tools.
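As a quick way to choose between the two within that framework, here is a rough overdispersion check, continuing from the glmer sketch above:

```r
## Pearson chi-square divided by residual d.f.; values well above 1
## suggest overdispersion and favor the negative binomial (glmer.nb).
overdisp <- sum(residuals(fit_pois, type = "pearson")^2) /
  df.residual(fit_pois)
overdisp
```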
A DuckDuckGo search on "recurrent event" recently turned up many freely available reviews. Yadav et al. provide an "Overview"; Thomsen et al. illustrate approaches on a particular data set; Reliawiki has nice illustrations of cumulative event plots; Amorim and Cai provide a tutorial emphasizing epidemiology; Rogers has a nice overview in a slide deck.
A search on "negative binomial point process mixed model" covers many aspects of your situation from the point-process/count perspective. The "mixed model" term allows for taking differences among individuals into account efficiently. The "negative binomial" term allows the variance of the counts to differ from the mean, rather than forcing them to be equal as a Poisson model does, something that often is needed in practice. That search turned up a paper on modeling CD4 cell counts over time in patients, one on modeling tree regrowth following fires, and one on RNA-seq counts over time in individuals.
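To make that variance point concrete: a Poisson model forces $\text{Var}(Y) = \mu$, while in the common NB2 parameterization the negative binomial allows $\text{Var}(Y) = \mu + \mu^2/\theta$, with the extra dispersion parameter $\theta$ absorbing the overdispersion.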
*For a recurrent-event survival-type approach, it might be simpler to use the times of individual events rather than grouping events into time-period bins as displayed here.
**If the events can differ in kind, then you have a multi-state recurrent-events scenario. If individual events have different magnitudes of "consumption," then I think things get more complicated if you can't easily fit them into a multi-state model (e.g., into "small," "medium," and "large" events). There is an R package PtProcess that's used for seismology, a field in which events differ continuously in magnitude, and it might be useful (although I have no experience with it).