Short answers:
1) As you said, the difference between the two lies only in the spatial structure.
2) Many people have worked on finding an equivalent mathematical formulation between the two, especially in the Bayesian framework. See, for example, the work of Rue or the paper of Lindgren: http://www.math.ntnu.no/inla/r-inla.org/papers/spde-jrssb-revised.pdf
3) I do not see how. If you use the spatial autoregressive model, you predict the same value for every point in an area, whereas with kriging the mean over the entire area is the same but the value of the process is not: you predict a different value for each point.
Long answers:
Because the two processes are defined by their type of spatial interaction/structure, they are profoundly different. In the spatial autoregressive model, two spatial units $Y_1$ and $Y_2$ are dependent if they are close in some sense. As an example, think of two states: what happens in state $Y_2$ can depend on what happens in $Y_1$ only if they share a border. In general, to specify a spatial autoregressive model you also have to specify a proximity (or neighborhood) matrix. From a computational point of view the spatial autoregressive model is convenient because the precision matrix (the inverse of the covariance matrix) of the observations is sparse, so the linear algebra needed for inference can be done efficiently. On the other hand, with this kind of model we cannot predict the value of the process at unobserved locations, because we cannot estimate the correlation between them and the observed ones. This kind of model (the auto-model) should be used only for processes that are spatially discrete, i.e. whose realizations can be observed only at specific locations, although in practice it is also used for continuous processes.
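To make the role of the neighborhood matrix concrete, here is a minimal sketch in Python. It assumes a proper CAR-type model with precision matrix $Q = \tau (D - \rho W)$, where $W$ is a rook-adjacency matrix on a small regular lattice and $D$ holds the number of neighbors of each area. The lattice, the function names and the values of `rho` and `tau` are illustrative choices of mine, not something taken from the question.

```python
import numpy as np

def grid_adjacency(nrows, ncols):
    """Rook-neighbour adjacency matrix W for an nrows x ncols lattice of areas."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:   # area to the right shares a border
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrows:   # area below shares a border
                W[i, i + ncols] = W[i + ncols, i] = 1.0
    return W

def car_precision(W, rho=0.9, tau=1.0):
    """Precision matrix Q = tau * (D - rho * W); this form is an assumption."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

W = grid_adjacency(5, 5)
Q = car_precision(W)
Sigma = np.linalg.inv(Q)  # implied covariance matrix

# Fraction of non-zero entries in each matrix
precision_fill = np.count_nonzero(Q) / Q.size
covariance_fill = np.count_nonzero(np.abs(Sigma) > 1e-12) / Sigma.size
print(f"non-zero fraction, precision:  {precision_fill:.3f}")
print(f"non-zero fraction, covariance: {covariance_fill:.3f}")
```

In this sketch only neighboring areas have a non-zero entry in $Q$, while the implied covariance matrix is dense. For large lattices one would store $Q$ in a sparse format and use a sparse Cholesky factorization rather than forming the inverse.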
In the kriging model we suppose that the dependence is continuous: the correlation between $Y_1$ and $Y_2$ generally depends on the distance between the observations, and the closer they are, the higher the correlation. With this model we can predict at a new location, since we know the correlation between the process at the new location and the observed process (it depends on the distance). On the other hand, in this case the covariance matrix is not sparse, unless you use a correlation function that goes to zero beyond a certain distance, so its inversion is computationally intensive.
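As an illustration of how the distance-based correlation gives a prediction at any new location, here is a minimal simple-kriging sketch in Python. It assumes a known constant mean and an exponential covariance $\sigma^2 \exp(-d/\phi)$ with known parameters. The coordinates, the observed values and the parameter values are made up for the example, and in practice the covariance parameters would have to be estimated.

```python
import numpy as np

def exp_cov(A, B, sigma2=1.0, range_=1.0):
    """Exponential covariance between the rows of A and the rows of B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / range_)

def simple_krige(coords, y, new_coords, mean=0.0, sigma2=1.0, range_=1.0):
    """Simple kriging predictor and variance (mean and covariance assumed known)."""
    K = exp_cov(coords, coords, sigma2, range_)      # dense n x n covariance
    k = exp_cov(coords, new_coords, sigma2, range_)  # observed vs. new locations
    weights = np.linalg.solve(K, k)                  # kriging weights, n x m
    pred = mean + weights.T @ (y - mean)
    var = sigma2 - np.sum(k * weights, axis=0)
    return pred, var

# Hypothetical observations at the corners of a unit square
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 0.5, 1.5])

# An observed location and two unobserved ones
new_coords = np.array([[0.0, 0.0], [0.5, 0.5], [0.9, 0.1]])
pred, var = simple_krige(coords, y, new_coords)
print(pred)
print(var)
```

Each new location gets its own prediction, and the kriging variance shrinks to zero at an observed location. The `np.linalg.solve` call on the dense $n \times n$ matrix is the step that becomes expensive when the number of observations grows.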
Because of the computational advantage of the spatial autoregressive model (or of auto-models in general), some people started to think about approximating a continuous process with a discrete one; the paper of Lindgren linked above is one of the best results in this field.
If you need a book in which the autoregressive model and kriging are well explained, I suggest Hierarchical Modeling and Analysis for Spatial Data (http://www.amazon.com/Hierarchical-Modeling-Monographs-Statistics-Probability/dp/158488410X).