I think a reasonable model that you could still estimate from this data is that the number of kids at a school is a percentage of the number of kids in the neighborhood. This percentage might have a trend, so including a linear trend for it is worth a try.
#kids ~ binom(n = #pop, p = inverse_logit(a_school + b_school * (#year)))
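To make the formula concrete, here is a minimal sketch of how it could be fitted for a single school by maximum likelihood, using plain Newton-Raphson steps. The data, the function names (`inv_logit`, `fit_binomial_trend`) and the centering of the year variable are my own assumptions for illustration; in practice you would probably use a GLM routine from a statistics package.

```python
import math

def inv_logit(x):
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_binomial_trend(years, kids, pop, n_iter=50, tol=1e-10):
    """Fit kids ~ Binomial(pop, inv_logit(a + b * (year - t0))) for one school.

    Returns (a, b, t0), where t0 is the mean year used for centering.
    """
    t0 = sum(years) / len(years)      # center years for numerical stability
    ts = [y - t0 for y in years]
    share = sum(kids) / sum(pop)      # start from the overall share
    a = math.log(share / (1.0 - share))
    b = 0.0
    for _ in range(n_iter):
        g_a = g_b = 0.0               # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0      # entries of the negative Hessian
        for t, k, n in zip(ts, kids, pop):
            p = inv_logit(a + b * t)
            r = k - n * p             # observed minus expected count
            w = n * p * (1.0 - p)     # binomial variance
            g_a += r
            g_b += r * t
            h_aa += w
            h_ab += w * t
            h_bb += w * t * t
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det   # Newton step, 2x2 solve
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b, t0

# Hypothetical data for one school: kids enrolled and kids in the neighborhood.
years = [2012, 2013, 2014, 2015, 2016]
pop = [400, 410, 420, 430, 440]
kids = [120, 127, 134, 142, 150]

a, b, t0 = fit_binomial_trend(years, kids, pop)

# Forecast for 2017, assuming the neighborhood population is known to be 450.
p_2017 = inv_logit(a + b * (2017 - t0))
expected_2017 = 450 * p_2017
print(f"a={a:.3f}, b={b:.3f}, expected kids in 2017: {expected_2017:.1f}")
```

Note that the forecast needs the neighborhood population for the forecast year as an input; the model only predicts the share.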
This is a time series model in that time is involved, but I wouldn't be looking at AR, MA or ARIMA for this, if that's your idea. Several things are not in this model. First, schools might have a maximum number of kids allowed each year. If there is such a system, it will of course influence the numbers and the predictions. Second, this simple model assumes each kid has an independent chance of picking this school, but the choices might not be independent, so you might see overdispersion. Third, the model assumes the school only gets kids from its own neighborhood, so kids who travel to school from farther away are not included. Whether these things will actually affect the quality of your predictions remains to be seen.
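One common way to check for overdispersion is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below assumes you already have fitted probabilities from the binomial model; the function name `pearson_dispersion` and all the numbers are hypothetical, for illustration only.

```python
def pearson_dispersion(kids, pop, p_hat, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    kids, pop and p_hat are per-observation counts, population sizes and
    fitted probabilities; n_params is the number of fitted coefficients.
    """
    chi2 = 0.0
    for k, n, p in zip(kids, pop, p_hat):
        expected = n * p
        variance = n * p * (1.0 - p)   # variance under the binomial model
        chi2 += (k - expected) ** 2 / variance
    return chi2 / (len(kids) - n_params)

# Hypothetical observations and fitted probabilities for one school.
kids = [120, 140, 118, 155, 131, 160]
pop = [400, 410, 420, 430, 440, 450]
p_hat = [0.30, 0.31, 0.32, 0.33, 0.34, 0.35]

# Two fitted coefficients: intercept a_school and trend b_school.
dispersion = pearson_dispersion(kids, pop, p_hat, n_params=2)
print(f"dispersion: {dispersion:.2f}")
```

A value near 1 is consistent with the binomial assumption; a value well above 1 suggests overdispersion, in which case a quasi-binomial or beta-binomial model may be a better choice.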
Now, if there are important structural things going on, like a neighborhood expansion or the opening or closing of a school nearby, they get absorbed (probably badly) by the trend term. So if you know something about this, you should definitely include it, perhaps with an indicator term. Let's say you know the neighborhood in 2017 is twice the size of the neighborhood in 2016. Then a good bet will be a doubling of the number of kids at the school, and no model, be it ARIMA, linear, a neural network or whatever, will see it coming based on the numbers up to that year.
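The doubling example can be sketched with the binomial model from earlier in this answer. Because the neighborhood population enters as the binomial `n`, a known doubling carries straight through to the expected count, and an indicator term shifts the share when a known event occurs. The coefficient values, the indicator coefficient `c` and the function name `expected_kids` are made up for illustration.

```python
import math

def expected_kids(pop, a, b, t, c=0.0, event=0):
    """Expected enrollment: pop * inv_logit(a + b * t + c * event).

    event is a 0/1 indicator for a known structural change, such as a
    nearby school opening; c is its (hypothetical) coefficient.
    """
    p = 1.0 / (1.0 + math.exp(-(a + b * t + c * event)))
    return pop * p

# Hypothetical coefficients, with t counted in years since the first year.
a, b = -0.85, 0.05
t_next = 5

pred_same = expected_kids(440, a, b, t_next)      # population unchanged
pred_doubled = expected_kids(880, a, b, t_next)   # population known to double

# A nearby school opens: a negative indicator coefficient lowers the share.
pred_competitor = expected_kids(440, a, b, t_next, c=-0.3, event=1)

print(f"same population:    {pred_same:.1f}")
print(f"doubled population: {pred_doubled:.1f}")
print(f"with new competitor: {pred_competitor:.1f}")
```

The trend term alone cannot produce the jump; it only appears because the known 2017 population is supplied as an input.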
Your question is about long format. The model you are describing is
#kids ~ norm(a + b_neighborhood * neighborhood + b_population * population, sigma)
I think the coefficient of the population should vary by school. Some schools get a large percentage of the kids in a neighborhood, some only a small percentage, so a single coefficient here is probably not optimal. Also, normal errors are not really appropriate for counts, although they could work fine.
The model you are describing, by the way, unless I'm missing something, is not a hierarchical model as commonly understood by statisticians. That term is used for a model with random effects, see https://en.wikipedia.org/wiki/Mixed_model. Including random effects in your model can be very beneficial as well, depending on the correlations between the schools.