TL;DR: For clustering time-series data, as a baseline, you can try using Dynamic Time Warping (DTW) as a distance metric for your favorite clustering algorithm, say $k$-means.
DTW is commonly used in applications like ASR (automatic speech recognition) and other sequence-alignment problems. I believe the Python package tslearn
implements DTW, though I have not used it myself.
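As a rough sketch of what this baseline might look like, here is how I would expect tslearn to be used, assuming its TimeSeriesKMeans estimator accepts a DTW metric option (again, I have not verified this API myself):

```python
# Sketch of DTW-based k-means clustering with tslearn (unverified API assumption:
# TimeSeriesKMeans with metric="dtw").
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

rng = np.random.default_rng(0)
# Toy data: 20 univariate random-walk series of length 50, shape (n_series, length, 1)
X = rng.normal(size=(20, 50, 1)).cumsum(axis=1)

km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0)
labels = km.fit_predict(X)  # cluster assignment per series
print(labels)
```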
What is DTW?
At a very high level, DTW is a measure of how "aligned" two time-series are, so it can capture features like "rise in pitch over time + decay in energy;" i.e. two sequences are "close" if you can stretch/compress a source sequence in the time dimension into the target sequence. To compute this, let's define $x, y$ as sequences of length $n, m$ respectively, and $\pi = [\pi_1, \pi_2, \dots, \pi_N]$, where $\pi_k = (i_k, j_k)$ is a pair of indices into $x$ and $y$ respectively. Intuitively, $\pi$ defines a "path" through sequences $x, y$ by pairing up indices. Then you minimize the following objective:
$$ \underset{\pi}{\min} \sqrt{\sum_{(i, j) \in \pi} d(x_i, y_j)^2} \quad\text{subject to}$$
$$ i_{k-1} \leq i_k \leq i_{k-1} + 1,\;\; j_{k-1} \leq j_k \leq j_{k-1} + 1 \quad\text{for}\quad (i_k, j_k) \in \pi, 1 < k \leq N;\\
i_1 = 1,\; j_1 = 1;\\
i_N = n,\; j_N = m.$$
where $d$ is some distance metric (say $\ell_1$ or $\ell_2$). The final two constraints are just boundary conditions: the path $\pi$ must start at the beginnings of $x, y$ and end at their ends. The top constraint specifies that the indices $i_k, j_k$ must be monotonically non-decreasing in $k$ and can advance by at most one step at a time, so no elements are skipped.
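To make the objective concrete, here is a minimal NumPy sketch of the standard $O(nm)$ dynamic program for DTW (the function name `dtw_distance` is my own); it uses squared differences as the local cost $d(x_i, y_j)^2$ and takes a square root at the end, matching the objective above:

```python
import numpy as np

def dtw_distance(x, y):
    """DTW between two 1-D sequences x and y, with squared-difference local cost."""
    n, m = len(x), len(y)
    # D[i, j] = cost of the best path aligning the first i elements of x
    # with the first j elements of y.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Each step advances in x, in y, or in both (monotonic, no skips).
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

# Two sequences with the same shape but different timing still end up "close".
print(dtw_distance([0, 0, 1, 2, 1, 0], [0, 1, 2, 2, 1, 0, 0]))
```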
This schematic from Wikipedia illustrates this quite succinctly:

We can imagine each location where a dashed line originates/terminates as an element of a discrete time-series; these correspond to the various $i_k, j_k$ defined in $\pi$.
What about the curse of dimensionality?
Your intuition that $k$-means runs into the curse of dimensionality is generally correct in practice, particularly when you're using Euclidean distance (see this previous answer for more details). But the curse of dimensionality is more nuanced than "more dimensions = bad": it affects different metrics in different ways. Very informally, DTW is less "susceptible" to these dimensionality problems than squared Euclidean distance, as it better preserves its "meaning" in high-dimensional spaces.
More concretely, using nearest-neighbors search as an illustrative example, under reasonable assumptions about your data-generating distribution, data points tend to become uniformly distant from one another in Euclidean distance as your dimensionality (or in the context of time-series, sequence length) increases; i.e., most points are "almost-nearest" neighbors. DTW does not suffer as severely from this problem as sequence length increases. A more mathematical way to gauge the severity of the curse of dimensionality is shown in Section 3.4 of this paper.
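As a rough numerical illustration of this concentration effect (not the measure from the paper; I'm just assuming i.i.d. uniform data as a toy example), you can watch the relative contrast between the nearest and farthest point shrink as the dimension grows:

```python
# Toy demonstration: under Euclidean distance, the gap between the nearest and
# farthest neighbor (relative to the nearest) shrinks as dimensionality increases.
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000, 10000):
    X = rng.uniform(size=(500, dim))          # i.i.d. uniform "data points"
    d = np.linalg.norm(X - X[0], axis=1)[1:]  # distances from one query point
    contrast = (d.max() - d.min()) / d.min()
    print(f"dim={dim:>6}  relative contrast={contrast:.3f}")
```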