Note that in a proper definition of $k$-means the distance $\Delta$ has to be the Euclidean distance, even though some literature nowadays defines it with an arbitrary distance. The reason is that only for the Euclidean distance (or equivalent distances) are the means actually the optimum centroids. You can try to solve the optimisation problem with other distances, but then the centroids would need to be defined differently, and the term $k$-means would no longer be justified. You could also solve the optimisation problem while insisting that the centroids are means (which is what your notation suggests), but in the non-Euclidean case this gives an overall suboptimal solution that could be improved by choosing better centroids, and it is therefore not a good method.
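To see this concretely, here is a small sketch on a made-up 1-D sample: the mean minimises the sum of squared (Euclidean) distances, but under the Manhattan ($L_1$) distance the median is at least as good a centroid, so the mean is no longer optimal.

```python
import numpy as np

# Hypothetical 1-D sample, just for illustration.
x = np.array([0.0, 1.0, 2.0, 10.0])

def total_cost(c, p):
    """Sum of |x_i - c|^p over the sample, for a candidate centroid c."""
    return np.sum(np.abs(x - c) ** p)

mean, median = x.mean(), np.median(x)

# Squared Euclidean cost: the mean is the optimum centroid.
assert total_cost(mean, 2) < total_cost(median, 2)

# Manhattan (L1) cost: the median beats the mean here,
# so using means as centroids would be suboptimal.
assert total_cost(median, 1) < total_cost(mean, 1)
```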
Regarding your question, there are three different issues here.
(2a) As there are only finitely many clusterings, one could in principle run through all of them and find the optimum of the objective function $W_P$, so an optimum always exists. It may not be unique, meaning that two different clusterings may end up with exactly the same value of $W_P$, although with continuous data this will hardly ever happen.
(2b) In practice, finding the optimum solution of (2a) may be computationally very hard, as the number of clusterings grows extremely fast with the number of observations. So normally (unless the dataset is very small) algorithms are used that are only guaranteed to find a local optimum, which is not necessarily the global one.
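The standard such algorithm is Lloyd's algorithm, which alternates between assigning points to the nearest centroid and recomputing centroids as means. A minimal sketch (data and initialisations are my own) shows how two initialisations converge to different local optima, only one of them global:

```python
import numpy as np

def lloyd(x, centroids, iters=100):
    """Plain Lloyd's algorithm: alternate assignment and mean update."""
    centroids = centroids.astype(float).copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(x[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each non-empty cluster's centroid as its mean.
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    cost = np.sum((x - centroids[labels]) ** 2)   # value of W_P at convergence
    return labels, cost

# Hypothetical data: the four corners of a flat rectangle.
x = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

# One initialisation converges to the global optimum (left pair vs right pair)...
_, w_good = lloyd(x, np.array([[0.0, 0.5], [4.0, 0.5]]))
# ...another is stuck in a strictly worse local optimum (top pair vs bottom pair).
_, w_bad = lloyd(x, np.array([[2.0, 0.0], [2.0, 1.0]]))
assert w_good < w_bad
```

This is why implementations typically rerun the algorithm from many random initialisations and keep the best result.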
(2c) The fact that the optimum of $W_P$ is mathematically guaranteed to exist on a finite dataset does not mean that the resulting clustering is "good" in any other respect. The optimisation of $W_P$ defines what, according to $k$-means, a good clustering is, and according to this definition there is always a "best" clustering. However, you may be interested in other aspects of clustering, such as separation between clusters, that are not directly taken into account in the definition of $W_P$, and in this respect the "best" $k$-means clustering may indeed not be good. This applies not only to very "messy" datasets, but also to datasets that are intuitively nicely clustered, but whose clusters have, for example, strongly different within-cluster variation, or are nonlinear. In such situations $k$-means is not an appropriate clustering method (at least if you want to find the nice clusters rather than the optimum of $W_P$).
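A quick sketch of the different-variances problem, on a made-up 1-D dataset (taking $W_P$ as the within-cluster sum of squared distances to the means): the intuitive two-cluster solution has a strictly larger $W_P$ than a clustering that cuts through the wide cluster, so the $W_P$-optimal clustering cannot be the intuitive one.

```python
import numpy as np

# Hypothetical data: a very tight cluster at 0 and a wide cluster on [10, 50].
tight = np.zeros(100)
wide = np.arange(10.0, 51.0, 5.0)        # 10, 15, ..., 50
x = np.concatenate([tight, wide])

def W(labels):
    """k-means objective: within-cluster sum of squared distances to the means."""
    return sum(np.sum((x[labels == j] - x[labels == j].mean()) ** 2)
               for j in np.unique(labels))

# "Intuitive" clustering: tight cluster vs wide cluster.
intuitive = np.r_[np.zeros(100, int), np.ones(9, int)]

# Counterintuitive clustering: the wide cluster's left tail is merged with the
# tight cluster, and the cluster boundary cuts through the wide cluster.
merged = np.r_[np.zeros(102, int), np.ones(7, int)]

# The intuitive clustering is therefore not the W_P optimum.
assert W(merged) < W(intuitive)
```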