In my case the “treatment” is not binary, so do I add the airbnb-effect through adding it like a control variable or do I have to multiply it with "treat" and "time" in the interaction term? (“beta3(treat x time x logairbnb)”)
Your measure of intensity can replace your treatment indicator. Multiply it with your post-treatment variable. Here is the basic specification using explicit variable notation:
$$
\text{Revenue}_{ct} = \alpha + \gamma \text{Intensity}_{ct} + \lambda \text{Post}_{t} + \delta \left(\text{Intensity}_{ct} \times \text{Post}_{t} \right) + X'_{ct}\beta + \epsilon_{ct},
$$
where $\text{Intensity}_{ct}$ denotes a treated city's dosage (i.e., Airbnb supply) and $\text{Post}_{t}$ is a post-treatment time dummy equal to 1 in all quarters after treatment commences, in both the treatment and control groups. Unless $\text{Intensity}_{ct}$ is time-invariant (i.e., varies across groups but not over time), you should add the coefficient on intensity to the coefficient on the interaction term (i.e., $\gamma + \delta$) to obtain your treatment effect.
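To see where $\gamma + \delta$ comes from, here is a short derivation. It differentiates the conditional mean of the specification above with respect to intensity, holding the covariates fixed:

$$
\frac{\partial E\left[\text{Revenue}_{ct}\right]}{\partial \text{Intensity}_{ct}} = \gamma + \delta \text{Post}_{t} =
\begin{cases}
\gamma & \text{if } \text{Post}_{t} = 0, \\
\gamma + \delta & \text{if } \text{Post}_{t} = 1.
\end{cases}
$$

So $\delta$ is the change in the slope on intensity once treatment begins, and $\gamma + \delta$ is the slope in the post-treatment quarters.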
I should note that only your treated cities should undergo some level of intensity, while your control group should represent the absence of intensity. For example, your intensity variable should equal 0 for Bremen and only take on positive values for Berlin. Thus, it can replace the binary treatment indicator. This is important, as your intensity variable should reflect reality as closely as possible. In your setting, you're assessing the dosage of Airbnb supply over time and across cities. Treatment can also represent a simple jump in intensity without any variation over time, in which case your variable would only be $c$-subscripted (e.g., $\text{Intensity}_{c}$). Either way, it should match the intensity experience in treated cities.
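Here is a minimal sketch of how the variables in the first specification could be coded. Everything in it is an assumption made for illustration: the eight quarters, the start quarter, the dosage values, and the coefficients used to generate a noise-free outcome. The small `ols` helper only exists so the example has no dependencies; in practice you would use your regression package of choice.

```python
# Toy two-city panel; every number below is made up for illustration.
cities = ["Berlin", "Bremen"]
start = 4  # hypothetical first post-treatment quarter (quarters run 0..7)

def intensity(city, t):
    # Dosage: zero for the control city, positive and time-varying for Berlin.
    return 0.0 if city == "Bremen" else 1.0 + 0.5 * t

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        best = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[best] = A[best], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

X, y = [], []
for city in cities:
    for t in range(8):
        dose = intensity(city, t)
        post = 1.0 if t >= start else 0.0
        # Columns: constant, Intensity, Post, Intensity x Post.
        X.append([1.0, dose, post, dose * post])
        # Noise-free outcome with known coefficients, so OLS should return them.
        y.append(10.0 + 2.0 * dose + 3.0 * post + 1.5 * dose * post)

alpha, gamma, lam, delta = ols(X, y)
post_effect = gamma + delta  # slope on intensity in the post-treatment quarters
```

Because the outcome is generated without noise, `delta` should come back as the interaction coefficient used to build it. Note that every Bremen row has a zero in both the intensity and interaction columns, which is what lets intensity stand in for the binary treatment indicator.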
I encourage you to peruse the following article by Green and colleagues (2014). They investigated the effects of new legislation liberalizing the closing times of local bars on traffic accidents in England and Wales. Their main analysis employed the classical difference-in-differences equation with a dichotomous treatment. Later, they replaced the main treatment indicator with a dose treatment, which was a measure of the number of extended licenses within the local jurisdiction (see Table 5, p. 196). The only difference between their study and yours is their dose was time-invariant (i.e., varied across jurisdictions but not over time), and thus their main effect for intensity (dose) would be dropped if estimated via fixed effects. It wouldn't matter if the intensity variable was absorbed though, as only the interaction term is of substantive interest.
(2) I think I have to control for seasonality. Do I have to add a term in the form of “treat_i * quarter_t” to account for the difference of these two cities?
Not necessarily. Do you observe any cyclical patterns in the raw data? Again, the foregoing paper is a great resource (see Table 1, column 4). In addition to assessing a dose treatment, you could multiply your city (group) effect with quarter dummies.
I would graphically inspect the trends in the raw data to see how your outcome is evolving through time. You could most certainly incorporate some categorical measure of "season" into your model with two or possibly four levels. Review the discussion section of this post. Some of the more experienced members offer some great insight into more complicated ways of modeling time.
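As a rough sketch of the city-by-season idea, the snippet below builds quarter-of-year dummies and multiplies them by a treated-city dummy. The column names are hypothetical, and I am assuming the sample starts in the first quarter of a year and spans 36 quarters.

```python
# Hypothetical panel index: 2 cities x 36 quarters (9 years).
cities = ["Berlin", "Bremen"]
n_quarters = 36

def season_columns(city, t):
    """Season dummies (Q2-Q4, with Q1 as the baseline) and their
    interactions with a treated-city dummy, for period index t = 0, 1, ..."""
    treat = 1 if city == "Berlin" else 0
    q = t % 4 + 1  # quarter of the year, assuming the sample starts in Q1
    season = {f"q{k}": int(q == k) for k in (2, 3, 4)}
    city_season = {f"treat_x_q{k}": treat * season[f"q{k}"] for k in (2, 3, 4)}
    return {**season, **city_season}

panel = [
    {"city": c, "t": t, **season_columns(c, t)}
    for c in cities
    for t in range(n_quarters)
]
```

The `q*` columns capture seasonality common to both cities, while the `treat_x_q*` columns let Berlin's seasonal pattern differ from Bremen's. If you later include a full set of period dummies, the common season dummies become redundant, since each one is a sum of period dummies.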
(3) If the two cities have different trends from start, how do I implement a city specific (quadratic) time trend to not violate the trend assumption?
How much of a divergence are you observing? I wouldn't recommend framing a pre-treatment difference in trend as a mere statistical problem that you must overcome. Your earlier comments indicate a stable inter-temporal evolution of the group trends. Maybe a statistical adjustment doesn't necessarily need to be a component of your main specification. That being said, it is worthwhile to see if your estimate of a treatment effect holds after the inclusion of city-specific linear (quadratic) time trends. In your case, this would amount to multiplying a city effect with a continuous linear and quadratic time trend variable. A linear trend might be more than enough, but that is for you to decide. Here is one specification:
$$
\text{Revenue}_{ct} = \alpha_{0c} + \alpha_{1c}t + \alpha_{2c}t^{2} + \gamma \text{Intensity}_{ct} + \lambda \text{Post}_{t} + \delta \left(\text{Intensity}_{ct}
\times \text{Post}_{t} \right) + X'_{ct}\beta + \epsilon_{ct},
$$
where $\alpha_{0c}$ represents city fixed effects. I should note that with two cities, including city fixed effects is equivalent to including a simple treatment indicator equal to 1 for Berlin, 0 otherwise. To add city-specific linear and quadratic time trends, multiply the city effect with linear and quadratic time trend variables, separately. Don't go crazy, though. I wouldn't advise estimating this in one big fat equation. Try the main specification without the time trends, then build upon the base model. The time trends can serve as a good robustness check down the road.
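With only two cities, the trend terms can be sketched as a treatment dummy multiplied by $t$ and $t^{2}$. The column names below are hypothetical, and I am assuming a time index running from 1 to 36.

```python
cities = ["Berlin", "Bremen"]
n_quarters = 36

def trend_columns(city, t):
    """Common and city-specific linear and quadratic trend regressors."""
    treat = 1 if city == "Berlin" else 0
    return {
        "treat": treat,                 # city effect (Berlin relative to Bremen)
        "t": t,                         # common linear trend
        "t_sq": t ** 2,                 # common quadratic trend
        "treat_x_t": treat * t,         # Berlin's deviation in linear trend
        "treat_x_t_sq": treat * t ** 2  # Berlin's deviation in quadratic trend
    }

panel = [
    {"city": c, **trend_columns(c, t)}
    for c in cities
    for t in range(1, n_quarters + 1)
]
```

Dropping `t_sq` and `treat_x_t_sq` gives the linear-only version, which is the natural first robustness check before moving to the quadratic terms.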
Now suppose you acquired data on multiple cities around the globe and the timing of treatment varied across jurisdictions. This equation can generalize to the following model, which is more closely aligned with what I am seeing in the paper you referenced:
$$
\text{Revenue}_{ct} = \alpha_{0c} + \alpha_{1c}t + \alpha_{2c}t^{2} + \lambda_{t} + \delta \text{Airbnb}_{ct} + X'_{ct}\beta + \epsilon_{ct},
$$
where you regress $\text{Revenue}_{ct}$ on a series of $C - 1$ dummies for cities (i.e., $\alpha_{0c}$), a series of $T - 1$ dummies for quarters (i.e., $\lambda_{t}$), and your intensity measure (i.e., $\text{Airbnb}_{ct}$). Two cities result in 1 city effect; 9 years (36 quarters) should result in 35 separate quarter effects. The city and quarter effects replace your main effects in the first specification, respectively. Your interaction term is now implicit in the coding of $\text{Airbnb}_{ct}$. To make this clear, $\text{Airbnb}_{ct} = \text{Intensity}_{ct} \times \text{After}_{t}$. You could instantiate this variable manually before tossing it into the model. I specified it explicitly to show how it is coded. Again, it still represents your interaction term from earlier. In essence, your measure of Airbnb supply should equal its precise dosage if it is a treated city and it is in the quarters after treatment goes into effect, 0 otherwise. I could have used the variable $\text{Post}_{t}$, but this equation is used more generally in settings where treatment may start and end at different times in different cities, and thus "post-treatment" isn't standardized across jurisdictions. Again, your intensity measure should reflect reality as closely as possible.
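Here is one way the coding of $\text{Airbnb}_{ct}$ could look when treatment windows differ across cities. The windows, the third city (Hamburg), and the dosage values are all invented for illustration.

```python
# Hypothetical treatment windows: (first, last) treated period, inclusive.
# None marks a city that is never treated.
windows = {"Berlin": (4, 7), "Hamburg": (6, 9), "Bremen": None}

def airbnb_dose(city, t, raw_intensity):
    """Return the dosage while the city is inside its treatment window,
    and 0 otherwise (untreated city or untreated period)."""
    window = windows[city]
    if window is None:
        return 0.0
    first, last = window
    return raw_intensity if first <= t <= last else 0.0

# Example: the same raw intensity of 2.5 in period 5 for each city.
doses = {city: airbnb_dose(city, 5, 2.5) for city in windows}
```

In period 5 only Berlin is inside its window, so it is the only city with a nonzero entry in `doses`. For a dichotomous treatment, the same function with `raw_intensity` fixed at 1 reproduces the usual treated-and-post dummy.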
It is rare to find a difference-in-differences application where all entities receive a dosage. In cases with a dichotomous treatment, the variable of interest should equal 1 if a city is treated and is in a post-treatment exposure epoch, 0 otherwise. In settings involving dosage, you should replace any city-quarter combination equal to unity with its appropriate dosage, 0 otherwise. Airbnb hit the market in 2014 in both cities, and all cities experienced some jump in intensity in the last quarter of 2018. In sum, I believe you need some explicit method of disambiguating the exposed from the unexposed. If Bremen, for example, experienced a low dosage post-shock, then you would be comparing cities with low versus high market penetration. Just be explicit about what variation you are trying to exploit. Do you know of any cities without any market penetration? This might not be a concern, but I would also consult with people in your specific field to gain further insight.
I was able to get my hands on the un-gated copy of a paper by Acemoglu and colleagues (2004) which assessed cross-state mobilization rates of men during World War II and its impact on female labor supply. Their "interaction" (see equation 8, p. 521) estimates whether states with higher mobilization rates (i.e., high versus low-mobilization states) during World War II saw a stronger rise in females' weeks worked from 1940 to 1950. Note: $m_{s}$ in equation 8 is a state's "mobilization rate" (i.e., continuous treatment); it varies across each state. They found higher mobilization rates were associated with an increase in female labor market participation. Peruse the top answer here for a more in-depth appraisal of this model.
As a final concern, I worry you have too few degrees of freedom to investigate more complex models. In a setting with 2 cities observed over 9 years, you only have 72 city-quarter observations. Including city-specific time trends is already very econometrically demanding, so don't overdo it. Moreover, you've said nothing thus far about the inclusion of covariates, so be careful as your ratio of observations-to-parameters is already scanty.
The use of difference-in-differences with dose treatments is becoming quite popular. In addition to the aforementioned paper, this article by Pedraja-Chaparro and colleagues (2015) would likely interest you. They use the classical difference-in-differences equation where treatment impacts all units at the same time. For other use cases of dose treatments, review page 18 of this dissertation or this working paper.
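To put a number on this, here is a back-of-the-envelope parameter count for the generalized model with city and quarter effects, before any covariates are added. I am assuming that, with a full set of quarter dummies, only $C - 1$ cities' linear and quadratic trends are separately identified.

```python
n_cities, n_quarters = 2, 36
n_obs = n_cities * n_quarters  # 72 city-quarter observations

params = {
    "intercept": 1,
    "city dummies": n_cities - 1,        # 1
    "quarter dummies": n_quarters - 1,   # 35
    "Airbnb dose": 1,
    # Assumption: one linear and one quadratic trend per non-reference city.
    "city-specific trends": 2 * (n_cities - 1),
}

n_params = sum(params.values())
residual_df = n_obs - n_params
```

Under these assumptions the model uses 40 parameters and leaves 32 residual degrees of freedom, and every covariate in $X_{ct}$ removes one more.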