Although not within the domain of geostatistics, for question #2 I would casually say that the most frequent weighting function used in my field (criminology) is a binary weighting scheme. That said, I have rarely seen a good theoretical or empirical argument for using one weighting scheme over another (or for how one defines a neighbor in a binary weighting scheme). Such a scheme may be typical simply because of historical preference and convenience.
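To make "binary weighting scheme" concrete, here is a minimal sketch that builds a 0/1 weight matrix for a regular lattice. It assumes rook contiguity (cells sharing an edge are neighbors), which is only one of several possible neighbor definitions; the function name `rook_binary_weights` is made up for this illustration.

```python
import numpy as np

def rook_binary_weights(nrows, ncols):
    """Binary spatial weights for an nrows x ncols lattice.

    Assumption: two cells are neighbors only if they share an edge
    (rook contiguity). W[i, j] = 1 for neighbors, 0 otherwise.
    """
    n = nrows * ncols
    W = np.zeros((n, n), dtype=int)
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c  # row-major index of cell (r, c)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[i, rr * ncols + cc] = 1
    return W

W = rook_binary_weights(3, 3)
print(W.sum(axis=1))  # number of neighbors for each cell
```

Switching to queen contiguity or to a distance threshold changes which entries are 1, and that choice is exactly the part that is rarely justified in practice.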
There is a distinction that should be drawn between data-driven approaches to constructing spatial weights and the theory-based approach to deriving spatial weights. You are currently performing the former, and in this approach you are implicitly treating the estimation of spatial weights as a problem of measurement error, so you should use techniques to validate your measurements (which is considerably complicated by the endogeneity of the spatial weights). Building a weighting scheme on some of the chance variation in the data and then using it in subsequent causal models is analogous to other fallacies related to inference and data snooping. Unfortunately, I have no good references of spatial weight models validated in any meaningful way besides the extent of the autocorrelation, which to be frank isn't all that convincing an empirical argument. Spatial dependence can be the result of causal processes (i.e., the value at one point in space affects the value at another point in space), or it can be the result of other measurement errors (i.e., the measured support of the data does not match the support of the processes that generate those phenomena).
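For reference, the "extent of the autocorrelation" is often summarized with a statistic such as Moran's I. The source does not name a particular statistic, so treating Moran's I as the measure is an assumption here, and the chain-shaped weight matrix and the toy values are invented for illustration.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x under spatial weight matrix W.

    I = (n / sum(W)) * (z' W z) / (z' z), where z = x - mean(x).
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical example: four sites on a line, each linked to its
# immediate neighbors with binary weights.
W_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

trend = morans_i([1, 2, 3, 4], W_chain)          # smooth trend, positive I
alternating = morans_i([1, -1, 1, -1], W_chain)  # checkerboard, negative I
print(trend, alternating)
```

The same data give a different I under a different W, which is part of why picking the weights that maximize the autocorrelation and then reporting that autocorrelation as validation is circular.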
By contrast, in the theory-based construction of spatial weights (or "model-driven" in Luc Anselin's terminology), one specifies the weight matrix a priori, before estimating a model. I did not read the Fauchald paper you cited, but from the abstract it appears the authors have plausible theoretical explanations for the observed patterns based on some optimal foraging strategy.
For readings I would suggest Luc Anselin's book, Spatial Econometrics: Methods and Models (1988); chapters 2 and 3 will be of most interest. Another work with a viewpoint similar to mine (although it will likely be of less interest) is an essay by Gary King, "Why context should not count". I would also suggest another paper, since the authors appear to have had goals similar to yours and defined the weights for a lattice system based on variogram estimates (Negreiros, 2010).