At the moment your model is not clearly defined, so it is premature to seek the distribution of the count vector $X$. You are seeking to generalise from the multinomial distribution, which is the distribution of count values over categories for an underlying sequence of independent categorical random variables with a specified probability vector. In your generalisation you say you want to impose maximums on each of the counts, but you also want to keep the specified probabilities used in the multinomial distribution. (You also mis-specify the form of the probabilities; with your formula the elements of the probability vector don't add to one.)
In order to make your model well-defined you will need to fix the form of your probability vector so that its elements add to one. (Presumably you meant to have an additional element $p_0 = 1/(1+\sum_{j=1}^J \exp(v_j))$.) Let $\mathbf{k} = (k_0,...,k_J)$ be the imposed maximum counts across the categories, and let $\dot{k} \equiv \sum_j k_j$ be the total maximum count. (I will use lower case here since these are assumed to be fixed rather than random.) You then have a sequence of values $W_1,...,W_{\dot{k}}$ such that:
$$\mathbb{P}(W_i = j|W_1,...,W_{i-1}) = \frac{\exp(v_j) \cdot I_{i,j}}{I_{i,0} + \sum_{\ell=1}^J \exp(v_\ell) \cdot I_{i,\ell}},$$
where we take $v_0 \equiv 0$ for the baseline category, $X_j(i) \equiv \sum_{r=1}^{i} \mathbb{I}(W_r = j)$ are the category counts after the $i$th value is selected (with $X_j(0) = 0$), and the indicator $I_{i,j} \equiv \mathbb{I} ( X_j(i-1) < k_j )$ shows whether category $j$ is still below its maximum when the $i$th value is selected. Notably, after observing $\dot{k}$ values from this model, the category counts must necessarily be at the maximums, so $\mathbf{X}(\dot{k}) = \mathbf{k}$.
This is a complicated generalisation of the multinomial model, with the latter occurring in the special case where $\mathbf{k} = (\infty,...,\infty)$, so that there are no maximums on the category counts.
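To see the special case explicitly (reading the baseline category as having $v_0 = 0$, which is what the $I_{i,0}$ term in the denominator implies): when every $k_j = \infty$ each indicator equals one, so the conditional probability no longer depends on the previous values:

$$\mathbb{P}(W_i = j|W_1,...,W_{i-1}) = \frac{\exp(v_j)}{1 + \sum_{\ell=1}^J \exp(v_\ell)} = p_j.$$

The values $W_1,...,W_N$ are then independent categorical random variables with probability vector $\mathbf{p} = (p_0,...,p_J)$, and the count vector $\mathbf{X}(N)$ has the usual multinomial distribution with size $N$ and probability vector $\mathbf{p}$.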
Writing the distribution of $\mathbf{X}$: The count vector $\mathbf{X}$ is a function of the underlying categorical variables $W_1,...,W_N$. Denote this function by $T:\mathbf{W} \mapsto \mathbf{X}$. For any given $N \leqslant \dot{k}$ we define the set of category vectors of length $N$:
$$\mathscr{S}(\mathbf{x}, \mathbf{k}) \equiv
\Bigg\{ (w_1,...,w_N) \in \{ 0,...,J \}^N \Bigg| T(\mathbf{w}) = \mathbf{x} \leqslant \mathbf{k} \Bigg\}.$$
The set $\mathscr{S}(\mathbf{x}, \mathbf{k})$ contains all vectors $(w_1,...,w_N)$ that lead to an admissible count vector $\mathbf{x}$ (admissible in the sense that none of the counts are above the specified maximum). We then have:
$$\begin{equation} \begin{aligned}
\mathbb{P}(\mathbf{X} = \mathbf{x})
&= \mathbb{P}(\mathbf{W} \in \mathscr{S}(\mathbf{x}, \mathbf{k})) \\[6pt]
&= \sum_{\mathbf{w} \in \mathscr{S}(\mathbf{x}, \mathbf{k})} \mathbb{P}(\mathbf{W} = \mathbf{w}) \\[6pt]
&= \sum_{\mathbf{w} \in \mathscr{S}(\mathbf{x}, \mathbf{k})}
\prod_{i=1}^N \mathbb{P}(W_i = w_i|W_1 = w_1,...,W_{i-1} = w_{i-1}) \\[6pt]
&= \sum_{\mathbf{w} \in \mathscr{S}(\mathbf{x}, \mathbf{k})}
\prod_{i=1}^N \frac{\exp(v_{w_i}) \cdot I_{i,{w_i}}}{I_{i,0} + \sum_{\ell=1}^J \exp(v_\ell) \cdot I_{i,\ell}}. \\[6pt]
\end{aligned} \end{equation}$$
As you can see, this is a horrendously complicated expression. Unless $N$ and $J$ are both very small, the summation in this expression will get very complicated indeed. Technically this is a closed form expression, since it is a finite sum of terms involving elementary operations. However, the expression is complicated heavily by the fact that it involves a sum over a combinatorial set that is hard to construct, and the summand involves a product of terms that each depend on the counts from the previous elements of the category vector.
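For very small $N$ and $J$ the sum can still be evaluated by brute force, which at least gives something to check other methods against. The following is only a minimal sketch under my reading of the model: the function names `step_probs` and `exact_pmf` are my own, the baseline category is index 0 with `v[0] = 0`, and the indicator for each draw is based on the counts before that draw. The cost grows like $(J+1)^N$, so it is not usable beyond toy cases.

```python
import itertools
import math
from collections import defaultdict


def step_probs(v, k, counts):
    """Conditional category probabilities for the next value, given the
    current counts. A category at its maximum gets probability zero."""
    weights = [math.exp(v[j]) if counts[j] < k[j] else 0.0
               for j in range(len(v))]
    total = sum(weights)
    return [w / total for w in weights]


def exact_pmf(v, k, N):
    """Distribution of the count vector X(N), obtained by summing the
    sequence probabilities over every w in {0,...,J}^N."""
    if N > sum(k):
        raise ValueError("N cannot exceed the total maximum count")
    n_cats = len(v)
    pmf = defaultdict(float)
    for w in itertools.product(range(n_cats), repeat=N):
        counts = [0] * n_cats
        prob = 1.0
        for w_i in w:
            p = step_probs(v, k, counts)[w_i]
            if p == 0.0:          # sequence breaches a maximum: skip it
                prob = 0.0
                break
            prob *= p
            counts[w_i] += 1
        if prob > 0.0:
            pmf[tuple(counts)] += prob
    return dict(pmf)


# Illustrative (made-up) inputs: three categories, N = 3 values.
v = [0.0, 0.5, -0.3]   # v[0] = 0 is the baseline category
k = [1, 2, 2]          # maximum counts
for x, p in sorted(exact_pmf(v, k, 3).items()):
    print(x, round(p, 4))
```

Setting every maximum to `math.inf` should reproduce the ordinary multinomial probabilities, which is a useful sanity check on the code.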
I am unaware of any way this can be simplified further. In view of this, there does not appear to me to be any simple closed form expression for the distribution of the count vector $\mathbf{X}(N)$ (except for the trivial case where $N=\dot{k}$). You would therefore need to generate the distribution of the count vector by simulation from the model. It is unlikely that there can be any efficiency gain beyond simulating the underlying values in the model from the above categorical distributions.
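A simulation along these lines might look like the sketch below. Again this is only an illustration under my reading of the model: the names `simulate_counts` and `estimate_pmf` are my own, the baseline category is index 0 with `v[0] = 0`, and each value is drawn only from the categories that are still below their maximum.

```python
import math
import random
from collections import Counter


def simulate_counts(v, k, N, rng):
    """Draw W_1,...,W_N sequentially and return the count vector X(N)."""
    if N > sum(k):
        raise ValueError("N cannot exceed the total maximum count")
    counts = [0] * len(v)
    for _ in range(N):
        # Only categories below their maximum are eligible for this draw.
        open_cats = [j for j in range(len(v)) if counts[j] < k[j]]
        weights = [math.exp(v[j]) for j in open_cats]
        j = rng.choices(open_cats, weights=weights)[0]
        counts[j] += 1
    return tuple(counts)


def estimate_pmf(v, k, N, n_sims=100_000, seed=0):
    """Monte Carlo estimate of the distribution of X(N)."""
    rng = random.Random(seed)
    tally = Counter(simulate_counts(v, k, N, rng) for _ in range(n_sims))
    return {x: c / n_sims for x, c in tally.items()}


# Illustrative (made-up) inputs: three categories, N = 3 values.
v = [0.0, 0.5, -0.3]   # v[0] = 0 is the baseline category
k = [1, 2, 2]          # maximum counts
for x, p in sorted(estimate_pmf(v, k, 3, n_sims=20_000).items()):
    print(x, round(p, 4))
```

The estimated probabilities are subject to the usual Monte Carlo error, which shrinks at rate $1/\sqrt{n}$ in the number of simulations, so count vectors with small probability need a large number of simulations to be estimated well.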