In most difference-in-differences applications, the treatment is permanent. By "permanent" I mean that once the treatment 'turns on' (i.e., switches from 0 to 1) it stays on (i.e., no reversals) until the panel ends. Other times, the treatment is removed (i.e., reverses) after a finite exposure period and never switches back on. Less commonly, the treatment pulsates, switching 'on' and 'off' repeatedly over time. In each case, you may or may not have units that never become exposed to the treatment. In still other cases, units are already in the treatment condition when first observed and may or may not undergo a withdrawal phase; this often happens when the panel is unbalanced. In your setting, you may have a mixture of some, or all, of these conditions.
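To see which of these conditions appear in your own panel, it can help to label each unit's treatment path first. The following is only a minimal sketch: the function name, the label strings, and the toy panel are all hypothetical, and it assumes a binary treatment recorded for consecutive periods.

```python
from typing import Sequence


def classify_treatment_path(path: Sequence[int]) -> str:
    """Label one unit's observed 0/1 treatment path.

    The labels are illustrative names for the patterns described above,
    not terminology from any package.
    """
    if not path:
        raise ValueError("path must contain at least one period")
    # Count switches on (0 -> 1) and off (1 -> 0) between adjacent periods.
    ons = sum(1 for a, b in zip(path, path[1:]) if (a, b) == (0, 1))
    offs = sum(1 for a, b in zip(path, path[1:]) if (a, b) == (1, 0))
    if ons == 0 and offs == 0:
        return "always treated" if path[0] == 1 else "never treated"
    if offs == 0:
        return "permanent"                      # turns on once, stays on
    if ons == 0:
        return "starts treated, then withdrawn"
    if ons == 1 and offs == 1 and path[0] == 0:
        return "single reversal"                # on once, off once
    return "pulsating"                          # repeated on/off switching


# Hypothetical toy panel: one list of 0/1 treatment indicators per unit.
panel = {
    "u1": [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    "u2": [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "u3": [0] * 12,
    "u4": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
labels = {unit: classify_treatment_path(p) for unit, p in panel.items()}
```

A tabulation of `labels` gives a quick sense of how many units fall outside the permanent-treatment case that most estimators assume.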
In my opinion, it is often a safe choice to select controls with a similar treatment history before each exposure. Imagine a setting where country A and country B both introduce a national policy (i.e., treatment) at time $t$, say in the year 2000, to stimulate economic growth. At the end of 2002, country A's policy is withdrawn (i.e., treatment reversal) while country B's policy is permanent. Now suppose the policy is reintroduced in country A in 2006 but the exposure phase is transient, so country A eventually undergoes a second reversal. In this setting, country A has multiple sunrise/sunset phases. Note how, under a generalized difference-in-differences framework, country B serves as a counterfactual for country A before the second exposure. But suppose you observe heterogeneous effects in country B downstream, say after 2000 but before 2006. Is country B's post-period a suitable counterfactual for country A's pre-period before the onset of the second exposure? As derived in the papers you cite, generalized difference-in-differences estimates are weighted averages of individual treatment effects, and some of the weights may be negative. Newer econometric routines may impose uniform weights, but not all of them can accommodate reversals, at least not yet (e.g., Goodman-Bacon [2018], Callaway and Sant'Anna [2020]). Here are some of the issues we discover in practice when investigating the effects of intermittent treatments, whereby some units are allowed to move in and out of the treatment condition as time progresses.
First, there's the possibility of carryover effects, which is akin to a violation of the stable unit treatment value assumption (SUTVA) along the temporal dimension. In simpler terms, the potential outcome for a unit in the present period could be affected by a previous exposure; it is a form of temporal interference. Second, there's little guidance on how we should handle units undergoing a reversal near the end of the panel. In my hypothetical example, country A reverses its policy after the second exposure and never switches back into the treatment condition before the panel ends. Again, I don't think the estimators you cite can handle this right-censoring problem. The "relative periods" before the next exposure are unknown since country A never experiences a third intervention. How should we think about a counterfactual for the subset of countries experiencing a reversal, or for those that start as treated and subsequently 'turn off' (i.e., switch from 1 to 0)? Recent work by Imai et al. 2020 and Liu et al. 2021 seems promising. I will discuss each in turn.
Imai and colleagues propose a matching estimator (see the PanelMatch package in R) which allows multiple units to receive treatment at any point in time, and units can switch their treatment status multiple times over the course of the panel. They replicate the work of Acemoglu et al. 2019, who investigate the effects of democracy on economic growth. Below is a treatment variation plot using the data from Acemoglu and colleagues; the dataset comes pre-loaded with the PanelMatch package, so the plot can be reproduced with relative ease. I removed the country codes and years. What's most important is to see the variation across units and over time. Red tiles indicate periods where "treatment" is applied (i.e., democratic rule) to a given unit and blue tiles indicate "control" periods (i.e., autocratic rule). I imagine your data closely resembles this plot.

Here, we see a mixture of never-adopter units, always-adopter units, single-adopter units, and multi-adopter units. The estimator used by Imai and colleagues selects, for each treated observation, a matched set of control units that have an identical treatment history over the $L$ periods leading up to time $t$. The sets are further refined by adjusting for previous outcomes over those $L$ periods and other time-varying covariates, if any. This allows you to partially control for carryover effects. You can also specify the number of leads, $F$, that is, how many periods after the treatment the effect is measured. For example, $F = 0$ represents the contemporaneous effect while $F = 3$ represents the treatment effect on the outcome three time periods after the treatment is administered. Treated observations are those that remain under the treatment condition throughout the $F$ periods after the treatment is administered, whereas matched control units receive no treatment for at least those $F$ periods. Your matched sets only include observations from the same time period, which amounts to exact matching on time period. This may be important in your setting if you suspect time-specific confounders.
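The history-matching step can be sketched in a few lines. To be clear, this is not the PanelMatch implementation: it is a simplified illustration that assumes a balanced panel with a binary treatment, skips the refinement step, and does not enforce the requirement on the $F$ lead periods. The function name `matched_sets` and the toy panel are hypothetical.

```python
from typing import Dict, List, Tuple

Panel = Dict[str, List[int]]  # unit -> 0/1 treatment path over common periods


def matched_sets(panel: Panel, lags: int) -> Dict[Tuple[str, int], List[str]]:
    """For each switch into treatment, find controls with the same recent history.

    A unit is 'treated at t' when it moves from 0 at t-1 to 1 at t. Its
    matched set holds every other unit that is untreated at t and shares
    its treatment path over the `lags` periods before t.
    """
    if lags < 1:
        raise ValueError("lags must be at least 1")
    sets: Dict[Tuple[str, int], List[str]] = {}
    n_periods = len(next(iter(panel.values())))
    for unit, path in panel.items():
        # Switches earlier than period `lags` lack a full history and are skipped.
        for t in range(lags, n_periods):
            if not (path[t - 1] == 0 and path[t] == 1):
                continue
            history = path[t - lags:t]
            sets[(unit, t)] = [
                other for other, p in panel.items()
                if other != unit and p[t] == 0 and p[t - lags:t] == history
            ]
    return sets


# Hypothetical panel: A is treated twice, B permanently, C never, D once.
panel = {
    "A": [0, 0, 1, 1, 0, 0, 1, 0],
    "B": [0, 0, 1, 1, 1, 1, 1, 1],
    "C": [0, 0, 0, 0, 0, 0, 0, 0],
    "D": [0, 0, 1, 1, 0, 0, 0, 0],
}
sets_l3 = matched_sets(panel, lags=3)
sets_l2 = matched_sets(panel, lags=2)
```

With three lags, A's second exposure (period index 6) is matched only to D, which shares the earlier exposure and withdrawal. With two lags, the never-treated unit C also enters that matched set, which is exactly the situation where carryover from the first exposure could contaminate the comparison.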
The estimators proposed by Liu and colleagues can be found in the fect package in R (i.e., Fixed Effects Counterfactual Estimators). In particular, they describe three estimators which can handle the rather odd treatment patterns you're observing in your study; they call the recurring on/off treatment exposures a "general pattern treatment structure" in their paper, though I would argue that these patterns are far from "general" in most practical applications. Their approach treats observations under the treatment condition as missing, uses data under the control condition to build models, and imputes counterfactuals for the treated observations based upon the estimated models. The estimators result in a uniform weighting scheme and relax the strict exogeneity assumption. In simulations, their estimators perform better than conventional two-way fixed effects models when treatment effects are heterogeneous or unobserved time-varying confounders exist. Their package also produces a new dynamic treatment effects plot, along with several neat diagnostic tests, to help researchers gauge the validity of the identifying assumptions. In their working paper, they replicate studies with non-uniform exposures, including the intermittent treatment patterns specific to some of your units.
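The impute-then-compare logic described above can be illustrated with the simplest possible model: additive unit and period effects fitted on untreated cells only. This is a rough sketch under stated assumptions, not the fect implementation. It assumes a balanced panel, no carryover effects, and at least one untreated observation for every unit and every period; the function name `impute_att` and the simulated data are hypothetical, and no standard errors are computed.

```python
from typing import List


def impute_att(y: List[List[float]], d: List[List[int]],
               n_iter: int = 500) -> float:
    """Average effect on treated cells via imputed untreated outcomes.

    y[i][t] is the outcome and d[i][t] the 0/1 treatment for unit i, period t.
    """
    n, T = len(y), len(y[0])
    if any(all(d[i][t] == 1 for t in range(T)) for i in range(n)):
        raise ValueError("every unit needs at least one untreated period")
    if any(all(d[i][t] == 1 for i in range(n)) for t in range(T)):
        raise ValueError("every period needs at least one untreated unit")
    alpha = [0.0] * n  # unit effects
    gamma = [0.0] * T  # period effects
    # Backfitting: alternate between unit and period effects, using only
    # untreated (d == 0) cells, i.e. treated cells are handled as missing.
    for _ in range(n_iter):
        for i in range(n):
            res = [y[i][t] - gamma[t] for t in range(T) if d[i][t] == 0]
            alpha[i] = sum(res) / len(res)
        for t in range(T):
            res = [y[i][t] - alpha[i] for i in range(n) if d[i][t] == 0]
            gamma[t] = sum(res) / len(res)
    # Impute the untreated outcome for each treated cell; average the gaps.
    gaps = [y[i][t] - (alpha[i] + gamma[t])
            for i in range(n) for t in range(T) if d[i][t] == 1]
    if not gaps:
        raise ValueError("no treated observations")
    return sum(gaps) / len(gaps)


# Simulated, noise-free data with a constant effect of 2.0 built in.
d = [
    [0, 0, 1, 1, 0, 0],  # single reversal
    [0, 1, 0, 1, 0, 1],  # pulsating
    [0, 0, 0, 0, 0, 0],  # never treated
    [0, 0, 0, 1, 1, 1],  # permanent
]
unit_fx = [1.0, 2.0, 3.0, 4.0]
time_fx = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [[unit_fx[i] + time_fx[t] + 2.0 * d[i][t] for t in range(6)]
     for i in range(4)]
att = impute_att(y, d)
```

Because the simulated outcomes are exactly additive, `att` should recover the built-in effect of 2.0 up to numerical tolerance. Each treated cell receives equal weight in the final average, which is the uniform weighting property mentioned above. Note that periods after a withdrawal are used as untreated data here, so the sketch is only sensible when carryover effects are negligible.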
These two R packages seem most applicable to your study. Peruse the associated documentation to understand their functionality and how to apply them to your specific use case. New estimators may emerge in the near future, but I would argue that the routines proposed by Imai and colleagues and Liu and colleagues are the most amenable to pulsating treatment patterns.