I completely concur with Sycorax's comment that Adrian Raftery's 1988 Biometrika paper is the canonical reference on this topic.
- How to derive analytically the negative log-likelihood (and its
first-order conditions)?
The likelihood is the same whether $n$ is known or unknown:
$$L(n|y_1,\ldots,y_I)=\prod_{i=1}^I {n \choose y_i}p^{y_i}(1-p)^{n-y_i}
\propto \dfrac{(n!)^I(1-p)^{nI}}{\prod_{i=1}^I(n-y_i)!}$$
and the log-likelihood is the logarithm of the above
$$\ell(n|y_1,\ldots,y_I)=C+I\log n!-\sum_{i=1}^I \log (n-y_i)!+nI\log(1-p) $$
Maximum likelihood estimation of $n$ is covered in this earlier answer of mine and by Ben.
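As a complement to those answers, here is a minimal numerical sketch (not from the original references; it assumes Python with numpy/scipy, made-up counts `y`, and an arbitrary grid bound) that maximizes the profile log-likelihood over $n$, with $p$ replaced by its conditional MLE $\hat p(n)=\bar y/n$:

```python
import numpy as np
from scipy.special import gammaln

def profile_loglik(n, y):
    """Binomial log-likelihood in n, with p set to its conditional MLE ybar/n."""
    I, s = len(y), y.sum()
    p_hat = s / (I * n)
    # log C(n, y_i) computed through log-Gamma to avoid overflow
    log_binom = gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
    return log_binom.sum() + s * np.log(p_hat) + (I * n - s) * np.log1p(-p_hat)

y = np.array([16, 18, 22, 25, 27])     # hypothetical counts, for illustration only
ns = np.arange(y.max(), 1000)          # n cannot be smaller than max(y_i)
ll = np.array([profile_loglik(n, y) for n in ns])
print("profile MLE of n:", ns[ll.argmax()])
```

The profile log-likelihood is typically very flat to the right of its mode, which is precisely the instability of $\hat n$ that motivates Raftery's Bayesian treatment.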
- What is an uninformative prior for $n$ in this case (e.g., for $p$ one
can use a Uniform$(0,1)$)?
Note that the default prior on $p$ is Jeffreys' $\pi(p)\propto 1/\sqrt{p(1-p)}$ rather than the Uniform distribution. In his answer on the Bernoulli case, kjetil b halvorsen explains why a Uniform improper prior on $n$ leads to a posterior that decreases quite slowly (while remaining proper), and why another improper prior like $\pi(n)=1/n$ or $\pi(n)=1/(n+1)$ behaves more appropriately in the tails. This is connected with the fact that $n$, while being an integer, is a scale parameter in the Binomial distribution, in the sense that the random variable $Y\sim\mathcal B(n,p)$ is of order $\mathrm O(n)$. Scale parameters are usually modeled by priors like $\pi(n)=1/n$ (even though I refer you to my earlier answer as to why there is no such thing as a noninformative prior).
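For a feel of these tail differences, here is a small sketch (again with hypothetical counts, Python with scipy assumed): it integrates $p$ out in closed form under Jeffreys' Beta$(1/2,1/2)$ prior and compares, on a necessarily truncated grid, how much posterior mass falls far in the tail under a flat prior on $n$ versus $\pi(n)=1/n$:

```python
import numpy as np
from scipy.special import gammaln, betaln

def log_marginal(n, y):
    """log m(n): binomial likelihood with p integrated out under Beta(1/2, 1/2)."""
    I, s = len(y), y.sum()
    log_binom = (gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)).sum()
    return log_binom + betaln(s + 0.5, I * n - s + 0.5) - betaln(0.5, 0.5)

y = np.array([16, 18, 22, 25, 27])          # same hypothetical counts as above
ns = np.arange(y.max(), 50_000)             # truncation point is arbitrary
lm = np.array([log_marginal(n, y) for n in ns])

for name, log_prior in [("flat", np.zeros_like(lm)), ("1/n ", -np.log(ns))]:
    lp = lm + log_prior
    w = np.exp(lp - lp.max())
    w /= w.sum()
    print(name, "mass on n > 1000:", round(w[ns > 1000].sum(), 3))
```

On such a grid the flat prior leaves a sizeable share of the mass very far out, while $\pi(n)=1/n$ pulls the posterior back toward moderate values of $n$, in line with kjetil b halvorsen's observation.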
- Is there a conjugate prior for $n$?
Since the collection of $\mathcal B(n,p)$ distributions is not an exponential family when $n$ varies (its support depends on $n$), there is no conjugate prior family.
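To make the support issue explicit, write the pmf with its indicator:
$$f(y\mid n,p)={n \choose y}p^{y}(1-p)^{n-y}\,\mathbf 1_{\{0,1,\ldots,n\}}(y)$$
An exponential-family representation $h(y)\exp\{\eta(n,p)\,T(y)-A(n,p)\}$ would require a base measure $h(y)$ free of the parameters, which the indicator of $y\le n$ forbids.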
- What if the prior on $n$ is improper, i.e. a discrete prior on
$\{y_{\max},y_{\max}+1,\ldots\}\subset\mathbb N$? Is there a proper solution?
It depends on the improper prior. The answer by kjetil b halvorsen in the Bernoulli case shows that there exist improper priors leading to well-defined posterior distributions. There also exist improper priors leading to ill-defined posteriors for every sample size $I$: for instance, $\pi(n)\propto\exp\{\exp(n)\}$ should lead to a posterior with infinite mass.
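A quick numerical check of that last claim (same assumed setup as the sketches above): the log of the unnormalized posterior term, $\log\pi(n)+\log m(n)$, keeps drifting downward under $\pi(n)=1/n$ but explodes under $\pi(n)\propto\exp\{\exp(n)\}$, suggesting an infinite total mass:

```python
import numpy as np
from scipy.special import gammaln, betaln

y = np.array([16, 18, 22, 25, 27])      # hypothetical counts, as before
I, s = len(y), y.sum()

def log_marginal(n):
    """log m(n), with p integrated out under Jeffreys' Beta(1/2, 1/2) prior."""
    log_binom = (gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)).sum()
    return log_binom + betaln(s + 0.5, I * n - s + 0.5) - betaln(0.5, 0.5)

for n in [100, 200, 400, 600]:
    print(n,
          log_marginal(n) - np.log(n),          # pi(n) = 1/n: term keeps decreasing
          log_marginal(n) + np.exp(float(n)))   # pi(n) = e^{e^n}: term blows up
```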