When doing kNN you need to keep one thing in mind, namely that it's not a strictly mathematically derived algorithm, but rather a simple classifier / regressor based on one intuition - the underlying function doesn't change much when the arguments don't change much. Or in other words, the underlying function is locally near-constant. With this assumption, you can estimate the value of the underlying function at any given point by a (possibly weighted) mean of the values of the k nearest points.
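As a concrete illustration, here is a minimal numpy sketch of that idea (the function and variable names are my own, not from any particular library):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Plain kNN regression: average the targets of the k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    return y_train[nearest].mean()                   # unweighted mean of their target values
```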
Keeping this in mind, you can see that there is no clear imperative on what to do when there is no clear winner in the majority voting. You can either always use an odd k (which avoids ties only in binary classification), or use some injective weighting.
In the case of neighbours 3 to 5 being at the same distance from the point of interest, you can either use only the first 2, or use all 5. Again, keep in mind that kNN is not some algorithm derived from complex mathematical analysis, but just a simple intuition. It's up to you how you want to deal with those special cases.
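One way to make both special cases explicit in code (the tie-breaking rules below are just examples of possible choices, not the "correct" ones):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3, include_distance_ties=True):
    """kNN classification with one possible treatment of the special cases above."""
    distances = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(distances)
    if include_distance_ties:
        # keep every neighbour that is as close as the k-th one (may be more than k points)
        neighbours = order[distances[order] <= distances[order[k - 1]]]
    else:
        neighbours = order[:k]
    votes = Counter(y_train[neighbours]).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        # a draw in the voting: here I simply fall back to the single nearest neighbour,
        # but an odd k, weighted votes or a random choice would be just as legitimate
        return y_train[order[0]]
    return votes[0][0]
```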
When it comes to weighting, remember that you base your algorithm on the intuition that the function doesn't change much when the arguments don't change much. So you want to give bigger weights to points that are closer to the point of interest. A good weighting would be, for example, $\frac{1}{||x-y||^2}$, or any other function that is relatively big when the distance is small and relatively small when the distance between the points is big (so probably the inverse of some continuous metric function).
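For example, a weighted version of the earlier sketch using exactly that $\frac{1}{||x-y||^2}$ weighting (the small eps is only there to avoid a division by zero when a training point coincides with the query):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Distance-weighted kNN regression with 1 / ||x - y||^2 weights."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] ** 2 + eps)  # closer points get bigger weights
    return np.dot(weights, y_train[nearest]) / weights.sum()
```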
There has also been a nice paper by Samory Kpotufe and Abdeslam Boularias at NIPS this year touching on the issue of finding the right weighting. Their general intuition is that the underlying function varies differently in different directions (i.e., its different partial derivatives are of different magnitude), hence it would be wise to change the metric / weighting accordingly. They claim this trick generally improves the performance of kNN and kernel regression, and I think they even have some theoretical results to back up this claim (although I'm not sure what those theoretical results actually claim, since I didn't have time to go through the whole paper yet). The paper can be downloaded for free from their sites, or by Googling "Gradient Weights help Nonparametric Regressors". Their research is directed towards regression, but I guess it applies to classification to some extent as well.
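To make the intuition concrete, here is a very rough sketch of how per-coordinate gradient weights could enter the distance computation. This is only my reading of the general idea, not the authors' actual estimator; in particular, how to estimate the gradient weights properly is exactly what the paper is about:

```python
import numpy as np

def gradient_weighted_distances(X_train, x, grad_weights):
    """Distances in a rescaled space: coordinates along which the function varies more
    (larger estimated |partial derivative|) contribute more to the distance."""
    scaled_diff = (X_train - x) * grad_weights  # per-coordinate rescaling
    return np.linalg.norm(scaled_diff, axis=1)

# grad_weights would be estimates of the magnitudes of the partial derivatives of the
# underlying function, e.g. obtained from finite differences on a pilot kNN estimate.
```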
Now, you will probably want to know how you can find the right k, metric, weighting, action to perform when there are draws, and so on. The sad thing is that it's basically hard to arrive at the right hyperparameters by deep thinking alone; you will probably need to test different bunches of hyperparameters and see which ones work well on some validation set. If you have some computational resources and want to arrive at a good set of hyperparameters automatically, there is a recent idea (that I like very much) to use Gaussian processes for derivative-free optimization in that setting.
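The simplest version of that is a brute-force search over a small grid, judged on a held-out validation set. In the sketch below, predict(...) stands for a kNN predictor like the ones above (with a weighting argument added), and X_val / y_val are the validation inputs and targets:

```python
import itertools
import numpy as np

# predict(...) is assumed to be a kNN predictor like the sketches above;
# X_train, y_train, X_val, y_val are the training and validation data
ks = [1, 3, 5, 7, 9]
weightings = ["uniform", "inverse_squared"]

best = None
for k, weighting in itertools.product(ks, weightings):
    preds = np.array([predict(X_train, y_train, x, k=k, weighting=weighting)
                      for x in X_val])
    error = np.mean((preds - y_val) ** 2)  # validation mean squared error
    if best is None or error < best[0]:
        best = (error, k, weighting)

print("best validation error %.4f with k=%d, weighting=%s" % best)
```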
Let me elaborate - finding the set of hyperparameters that minimizes the error on the validation data can be viewed as an optimization problem. Unfortunately, in this setting we can't get the gradient of the function we try to optimize (which is what we usually want, so that we can perform gradient descent or some more advanced method). Gaussian processes can be used in this setting to find sets of hyperparameters that have a big chance of performing better than the best ones we have found up to that point. Hence you can iteratively run the algorithm with some set of hyperparameters, then ask the Gaussian process which ones would be best to try next, try those, and so on.
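Here is a very stripped-down sketch of that loop for a single hyperparameter (k), using scikit-learn's GaussianProcessRegressor and an expected-improvement criterion. validation_error(k) is a hypothetical function that evaluates kNN with the given k on the validation set (e.g., the grid loop above wrapped into a function of k alone); real implementations handle several hyperparameters, noise and the acquisition optimization much more carefully:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# validation_error(k) is assumed to return the validation error of kNN with that k
tried_k = [1.0, 15.0, 30.0]                          # a few initial evaluations
errors = [validation_error(int(round(k))) for k in tried_k]

candidates = np.linspace(1, 30, 200).reshape(-1, 1)  # candidate values of k
for _ in range(10):
    gp = GaussianProcessRegressor().fit(np.array(tried_k).reshape(-1, 1), errors)
    mean, std = gp.predict(candidates, return_std=True)
    best_so_far = min(errors)
    # expected improvement: how far below the current best each candidate is likely to land
    z = (best_so_far - mean) / (std + 1e-9)
    ei = (best_so_far - mean) * norm.cdf(z) + std * norm.pdf(z)
    k_next = float(candidates[np.argmax(ei)])
    tried_k.append(k_next)
    errors.append(validation_error(int(round(k_next))))

print("best k found:", int(round(tried_k[int(np.argmin(errors))])))
```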
For details, look for the paper "Practical Bayesian Optimization of Machine Learning Algorithms" by Jasper Snoek, Hugo Larochelle and Ryan P. Adams (also to be found on their websites or via Google).