As others have suggested, the problem with a model of this form is that having so many tunable parameters is a recipe for over-fitting, and unless the dataset is very large you are likely to get better generalisation performance from a simpler model. I would recommend using a single scale parameter $\gamma$ rather than having one for each datapoint, and using regularisation on the model weights $a_i$ (e.g. ridge regression).
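For concreteness, here is a minimal sketch (Python/numpy, not code from any particular paper) of such a model: Gaussian basis functions with a single shared $\gamma$, centers taken as a random subset of the training data, and ridge-regularised weights fitted in closed form. The data and parameter values are purely illustrative.

```python
import numpy as np

def rbf_design(X, centers, gamma):
    # Phi[i, j] = exp(-gamma * ||x_i - c_j||^2)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_ridge_rbf(X, y, centers, gamma, lam):
    # a = (Phi^T Phi + lam * I)^{-1} Phi^T y  (ridge / regularised least squares)
    Phi = rbf_design(X, centers, gamma)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

# illustrative usage with synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
centers = X[rng.choice(50, 10, replace=False)]   # random subset as centers
a = fit_ridge_rbf(X, y, centers, gamma=1.0, lam=1e-2)
y_pred = rbf_design(X, centers, 1.0) @ a
```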
To answer your question directly, it is possible to choose the scale parameters $\gamma_i$ and the centers $c_i$ by gradient descent optimisation of the regularised loss function, using essentially the back-propagation algorithm used in multi-layer perceptron neural networks. ISTR an early paper by Andrew Webb on this topic. However, this brings all the problems associated with MLPs, such as local minima and long training times, and loses the main advantage of Radial Basis Function networks, i.e. their efficient training procedure. I think Prof. Sheng Chen and co-workers at Southampton University have also been working on algorithms for this sort of thing (IIRC the papers were published in IEEE Transactions on Neural Networks). However, I suspect the over-fitting problem is the fatal flaw in this approach and I didn't find the experimental evaluation very convincing.
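As a rough illustration of what that joint optimisation looks like (not a reproduction of any of the papers above), the sketch below simply hands a regularised squared-error loss over the weights $a_i$, log-scales and centers to a generic optimiser, rather than hand-coding the back-propagated gradients; the principle, tuning all parameters by minimising one regularised training criterion, is the same.

```python
import numpy as np
from scipy.optimize import minimize

def unpack(theta, m, d):
    a = theta[:m]
    gammas = np.exp(theta[m:2 * m])          # log-parametrised so gamma_i > 0
    centers = theta[2 * m:].reshape(m, d)
    return a, gammas, centers

def loss(theta, X, y, m, lam):
    a, gammas, centers = unpack(theta, m, X.shape[1])
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-gammas[None, :] * d2)
    resid = Phi @ a - y
    return np.mean(resid ** 2) + lam * np.sum(a ** 2)   # ridge penalty on weights

# illustrative usage with synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0])
m = 5
theta0 = np.concatenate([np.zeros(m), np.zeros(m),
                         X[rng.choice(50, m, replace=False)].ravel()])
res = minimize(loss, theta0, args=(X, y, m, 1e-3), method="L-BFGS-B")
```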
The usual approach to tuning the regularisation (ridge) and kernel parameters is to minimise the leave-one-out cross-validation error (or GCV), which can be computed efficiently for this sort of model (including gradient information). However, if you have more than two or three parameters to tune this way you are very likely to over-fit the LOOCV model selection criterion, see
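For linear-in-the-parameters models of this kind the leave-one-out residuals have a closed form, $e^{(i)} = e_i / (1 - h_{ii})$ where $H$ is the hat matrix, so the criterion can be evaluated without retraining the model $n$ times. A sketch (Phi being the matrix of basis-function activations, as in the ridge-RBF sketch above):

```python
import numpy as np

def loo_press(Phi, y, lam):
    # hat matrix H = Phi (Phi^T Phi + lam I)^{-1} Phi^T
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    H = Phi @ np.linalg.solve(A, Phi.T)
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))   # virtual leave-one-out residuals
    return np.mean(loo_resid ** 2)           # PRESS-style LOO criterion
```

This is cheap enough to evaluate over a grid (or to minimise directly) in $\gamma$ and $\lambda$, but as the papers below discuss, minimising it over many hyper-parameters invites over-fitting the criterion itself.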
G. C. Cawley and N. L. C. Talbot, Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters, Journal of Machine Learning Research, volume 8, pages 841-861, April 2007 (www)
and
G. C. Cawley and N. L. C. Talbot, Over-fitting in model selection and subsequent selection bias in performance evaluation, Journal of Machine Learning Research, volume 11, pages 2079-2107, July 2010 (www)
For choosing the centers, there are a number of ways you can go about it; the most effective is usually to choose a subset of the training data, either randomly, by greedy optimisation of the training criterion, by greedily spanning the space of the basis functions (e.g. Fine and Scheinberg), or via the Nyström method. This problem has been well studied in the machine learning literature, and a thorough empirical comparison would be very useful. However, again, the key problem is that any time you make a choice about the model or optimise a parameter based on a statistic estimated from a finite sample of data (e.g. a training criterion, a cross-validation estimate or a Bayesian marginal likelihood), you invite over-fitting, and the more choices you make, the greater the risk.
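As an illustration of the "greedily span the space of the basis functions" option, here is a sketch along the lines of pivoted incomplete Cholesky decomposition of the kernel matrix (in the spirit of Fine and Scheinberg, though the details of their algorithm differ); the selected pivot points are then used as centers:

```python
import numpy as np

def greedy_centers(X, gamma, m):
    # pivoted incomplete Cholesky of the Gaussian kernel matrix; at each step
    # pick the point whose basis function is least well spanned by those chosen so far
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * d2)
    diag = K.diagonal().copy()               # residual diagonal of the kernel matrix
    L = np.zeros((n, m))
    pivots = []
    for j in range(m):
        i = int(np.argmax(diag))
        pivots.append(i)
        L[:, j] = (K[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(diag[i])
        diag -= L[:, j] ** 2
        diag[i] = 0.0                         # should already be ~0; numerical safety
    return X[pivots]
```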
The standard approach in RBF neural networks (I think the paper is by Moody and Darken) is to perform a cluster analysis of the data and place the centers on the cluster centers.
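A sketch of that approach (using scikit-learn's k-means for brevity, so this is illustrative rather than a reproduction of Moody and Darken):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_centers(X, m, seed=0):
    # cluster the inputs and place the RBF centers on the cluster centroids
    km = KMeans(n_clusters=m, n_init=10, random_state=seed).fit(X)
    return km.cluster_centers_
```

A common follow-on heuristic is then to set the scale parameter(s) from the spread of the resulting clusters, e.g. the average distance between centers.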
In short, there are a variety of ways this problem can be solved, but it is questionable whether any of them is a good way to approach a practical application.