According to the Wikipedia article on simulated annealing:
"For any given finite problem, the probability that the simulated annealing algorithm terminates with a global optimal solution approaches 1 as the annealing schedule is extended."
I looked up the paper, Simulated Annealing: A Proof of Convergence, but unfortunately I couldn't really see how it applied to my case; it seemed to discuss a specific type of problem. I tried the experiment myself on the Four Peaks problem, always starting from $T = 10^{11}$. As I increased the number of iterations, I increased the cooling rate accordingly (the cooling rate is a number $0<c<1$, and the next temperature is simply $c \cdot T$ at each iteration). However, even with this, SA always converged to an inferior local optimum. I used the standard Boltzmann acceptance probability.
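For concreteness, here is a minimal sketch of the kind of experiment described above, not the exact code that was run. It assumes the usual Four Peaks definition (leading 1s, trailing 0s, and a bonus of $n$ when both exceed a threshold), a single-bit-flip neighbourhood, and maximization with acceptance probability $\exp(\Delta / T)$ for a worsening move. The names `four_peaks` and `anneal`, the problem size, and the default parameters are all illustrative assumptions.

```python
import math
import random


def four_peaks(bits, t):
    """Four Peaks fitness (to be maximized) for a list of 0/1 values."""
    n = len(bits)
    head = 0  # number of leading 1s
    for b in bits:
        if b != 1:
            break
        head += 1
    tail = 0  # number of trailing 0s
    for b in reversed(bits):
        if b != 0:
            break
        tail += 1
    bonus = n if head > t and tail > t else 0
    return max(head, tail) + bonus


def anneal(n=40, t=4, t0=1e11, c=0.999, iters=20000, seed=0):
    """Simulated annealing with geometric cooling: T <- c * T each step.

    Returns (final fitness, best fitness seen). All defaults are assumptions.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = four_peaks(x, t)
    best = fx
    temp = t0
    for _ in range(iters):
        i = rng.randrange(n)
        x[i] ^= 1  # propose flipping one bit
        fy = four_peaks(x, t)
        delta = fy - fx  # positive means the move is an improvement
        # Boltzmann acceptance: always take improvements, otherwise
        # accept with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            fx = fy
            best = max(best, fx)
        else:
            x[i] ^= 1  # reject: undo the flip
        # Floor the temperature so it never underflows to zero.
        temp = max(temp * c, 1e-300)
    return fx, best


if __name__ == "__main__":
    # Extend the schedule: more iterations, cooling rate closer to 1.
    for iters, c in [(2_000, 0.99), (20_000, 0.999), (200_000, 0.9999)]:
        final, best = anneal(c=c, iters=iters)
        print(f"iters={iters:>7} c={c}: final={final} best={best}")
```

With these assumed settings the global optimum is $2n - t - 1 = 75$, while the two local optima (all 1s or all 0s) score $n = 40$, so the printed values show which kind of optimum a given run ended in.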
So my question is: is the Wikipedia quote correct? If so, how should I choose the cooling schedule?