If you're fitting only nominal categorical predictors (and models of full order, i.e. with all interactions included), the link function is of essentially no consequence, in the sense that it doesn't alter the fit.
Here's an example using log and identity links with a Poisson GLM. First the data (y is the response, a count, and x1f and x2f hold the levels of the two factors):
      y  157 909 249 144 876 248  34 205  62  26 243  48
    x1f    1   1   1   1   1   1   2   2   2   2   2   2
    x2f    1   2   3   1   2   3   1   2   3   1   2   3
Here are the fitted values for the full model with interaction:
    fitted(glm(y ~ x1f+x2f+x1f:x2f, family=poisson(link="log")))
        1     2     3     4     5     6     7     8     9    10    11    12
    150.5 892.5 248.5 150.5 892.5 248.5  30.0 224.0  55.0  30.0 224.0  55.0

    fitted(glm(y ~ x1f+x2f+x1f:x2f, family=poisson(link="identity")))
        1     2     3     4     5     6     7     8     9    10    11    12
    150.5 892.5 248.5 150.5 892.5 248.5  30.0 224.0  55.0  30.0 224.0  55.0
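We see the fit didn't change even though the link function did: with a separate parameter for every cell, the fitted mean for each cell is simply that cell's sample mean, whichever link is used.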
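A minimal sketch to reproduce this comparison. It assumes x1f and x2f are stored as R factors (the original does not show how they were created), and it compares both sets of fitted values with the per-cell sample means computed with `ave()`:

```r
# Data from the example; storing x1f and x2f as factors is an assumption
y   <- c(157, 909, 249, 144, 876, 248, 34, 205, 62, 26, 243, 48)
x1f <- factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
x2f <- factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3))

# Sample mean of y within each (x1f, x2f) cell, repeated for each observation
cell_means <- ave(y, x1f, x2f)

fit_log <- glm(y ~ x1f + x2f + x1f:x2f, family = poisson(link = "log"))
fit_id  <- glm(y ~ x1f + x2f + x1f:x2f, family = poisson(link = "identity"))

# Both comparisons should return TRUE (agreement up to numerical tolerance)
all.equal(unname(fitted(fit_log)), cell_means, tolerance = 1e-6)
all.equal(unname(fitted(fit_id)),  cell_means, tolerance = 1e-6)
```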
If you're fitting categorical models which leave some interactions out (such as a main-effects-only model), then the link function can matter. Under some link functions those interactions may indeed be absent, leaving the smaller model suitable and more easily interpreted, but the same simpler, additive model then won't be suitable under other link functions.
Continuing the earlier example, omitting the interactions:
    fitted(glm(y ~ x1f+x2f, family=poisson(link="log")))
            1         2         3         4         5         6         7         8
    145.65183 900.94330 244.90487 145.65183 900.94330 244.90487  34.84817 215.55670
            9        10        11        12
     58.59513  34.84817 215.55670  58.59513

    fitted(glm(y ~ x1f+x2f, family=poisson(link="identity")))
            1         2         3         4         5         6         7         8
    238.72879 618.67616 268.07978 238.72879 618.67616 268.07978  21.90564 401.85300
            9        10        11        12
     51.25663  21.90564 401.85300  51.25663
Now we see the fitted values are indeed different. In this case, the log link gives a reasonable fit, but the identity link gives quite a poor fit.
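One way to put a number on "reasonable" versus "poor" is to compare the residual deviances of the two main-effects fits. This is only a sketch: it again assumes x1f and x2f are factors, and it uses the residual deviance as a rough summary of lack of fit rather than as a formal test.

```r
# Data from the example; storing x1f and x2f as factors is an assumption
y   <- c(157, 909, 249, 144, 876, 248, 34, 205, 62, 26, 243, 48)
x1f <- factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
x2f <- factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3))

fit_log <- glm(y ~ x1f + x2f, family = poisson(link = "log"))
fit_id  <- glm(y ~ x1f + x2f, family = poisson(link = "identity"))

# Residual deviance under each link; a smaller value means a closer fit
c(log = deviance(fit_log), identity = deviance(fit_id))

# Residual degrees of freedom: 12 observations minus 4 coefficients,
# the same for both fits
df.residual(fit_log)
```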
If you're fitting continuous predictors, then the choice of link may matter quite a bit, even ignoring the issue of interactions. One example would be with binomial GLMs: in many cases, the fits with a probit and a cloglog link can look quite different, even though both have $g$ taking $(0,1)$ to $\mathbb{R}$.
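To see the kind of difference meant here, the sketch below fits both links to a small grouped binomial data set. The numbers in `x`, `n` and `s` are made up purely for illustration and do not come from the original answer; how far apart the two fits end up will depend on the data.

```r
# Hypothetical grouped binomial data (invented for illustration only):
# s successes out of n trials at each value of a continuous predictor x
x <- 1:8
n <- rep(50, 8)
s <- c(1, 2, 4, 9, 17, 29, 42, 49)

fit_probit  <- glm(cbind(s, n - s) ~ x, family = binomial(link = "probit"))
fit_cloglog <- glm(cbind(s, n - s) ~ x, family = binomial(link = "cloglog"))

# Observed proportions next to the fitted probabilities under each link
round(cbind(x,
            observed = s / n,
            probit   = fitted(fit_probit),
            cloglog  = fitted(fit_cloglog)), 3)

# Residual deviances, as a rough summary of how well each link fits
c(probit = deviance(fit_probit), cloglog = deviance(fit_cloglog))
```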
How much it matters really depends on the specifics of the problem and on how much deviation in the fit you can tolerate.
In many cases ease of interpretation matters more than differences in fit (at least where those differences tend to be small). However, there is a trade-off between how easy the link function is to deal with and how interpretable the linear predictor is, and there is also the issue of potential lack of fit. If the curve relating the mean of your binomial variates to the predictor(s) isn't symmetric, choosing a more suitable link may give a much more interpretable model than expanding the model class would.
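The symmetry point can be made concrete by looking at the inverse links directly. The sketch below checks whether each curve is symmetric about the point where the probability is 0.5, i.e. whether $p(c+d) + p(c-d) = 1$; the helper names `inv_probit` and `inv_cloglog` are just illustrative.

```r
inv_probit  <- function(eta) pnorm(eta)            # inverse probit link
inv_cloglog <- function(eta) 1 - exp(-exp(eta))    # inverse cloglog link

# Centre of each curve: the value of eta at which the probability is 0.5
c_probit  <- 0
c_cloglog <- log(log(2))

d <- seq(0, 3, by = 0.5)   # distances either side of the centre

# For a symmetric curve these are zero at every distance d
asym_probit  <- inv_probit(c_probit + d)   + inv_probit(c_probit - d)   - 1
asym_cloglog <- inv_cloglog(c_cloglog + d) + inv_cloglog(c_cloglog - d) - 1

round(cbind(d, asym_probit, asym_cloglog), 4)
```

The probit column is zero throughout, while the cloglog column is not, so a cloglog link can follow an asymmetric response curve with the same linear predictor.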