Q1: Are there common or accepted methods for dealing with non-stationary environments in reinforcement learning in general?
Most basic RL agents are online, and online learning can usually deal with non-stationary problems. In addition, update rules for state-value and action-value estimators in control problems are usually written for non-stationary targets, because the targets already change as the policy improves. This is nothing complicated: using a fixed learning rate $\alpha$ in the value updates effectively computes an exponentially-weighted (recency-weighted) moving average of recent targets, as opposed to averaging over all history in an unweighted fashion.
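As a concrete sketch (in Python, with illustrative values for $\alpha$ and the discount factor $\gamma$), this is just the standard recency-weighted tabular Q-learning update:

```python
import numpy as np

# Illustrative constants, not tuned for any particular problem.
ALPHA = 0.1    # constant learning rate
GAMMA = 0.99   # discount factor

def q_update(Q, s, a, r, s_next, done):
    """One Q-learning step: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)).

    Because ALPHA is a constant rather than 1/N, the estimate is an
    exponentially-weighted average of past targets, so it keeps tracking
    a target that moves as the policy (or the environment) changes.
    """
    target = r if done else r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])
```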
However, this addresses longer-term non-stationarity, such as the problem changing between episodes, or over an even longer time scale. Your description looks more like you wish to change the reward structure based on actions the agent has taken, within a short timescale. That dynamic response to actions is better framed as a different, more complex MDP, not as "non-stationarity" within a simpler MDP.
An agent cannot learn changes to the environment that it has not yet sampled, so changing the reward structure will not prevent the agent from returning to previously-visited states. Unless you are using something like an RNN in the agent, it will have no "memory" of what happened earlier in the episode other than whatever is represented in the current state (arguably, using an RNN makes the hidden layer of the RNN part of the state). Across multiple episodes, a tabular Q-learning agent will simply learn that certain states have low value; it will not be able to learn that second or third visits to a state cause that effect, because it has no way to represent that knowledge. Nor will it be able to adjust to the change fast enough to learn online, mid-episode.
Q2: In my gridworld, I have the reward function changing when a state is visited. All I want my agent to learn is "Don't go back unless you really need to", but this makes the environment non-stationary.
If that is all you need the agent to learn, perhaps it can be encouraged by a suitable reward structure. Before you can do that, you need to understand for yourself what "really need to" implies, and how tight that logic has to be. You may be fine, though, simply assigning some penalty for visiting any location that the agent has already (or recently) visited.
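As a minimal sketch of that idea (assuming the gridworld exposes `reset`/`step` and the observation identifies the agent's cell; the wrapper class and penalty value are made up for illustration, not part of any library):

```python
# Hypothetical wrapper that subtracts a fixed penalty whenever the agent
# steps onto a cell it has already visited in the current episode. The
# penalty size would need tuning so backtracking is discouraged but still
# possible when it is genuinely worth it.
REVISIT_PENALTY = 0.5

class RevisitPenaltyEnv:
    def __init__(self, env):
        self.env = env
        self.visited = set()

    def reset(self):
        obs = self.env.reset()          # obs assumed to identify the agent's cell
        self.visited = {obs}
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if obs in self.visited:
            reward -= REVISIT_PENALTY   # shaping term for re-entering a cell
        self.visited.add(obs)
        return obs, reward, done, info
```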
Can/Should this very simple rule be incorporated in the MDP model, and how?
Yes, you should add the information about visited locations into the state. This will immediately make your state model more complex than a simple grid world, increasing the dimensionality of the problem, but it is unavoidable. Most real-world problems very quickly outgrow the toy examples used to teach RL concepts.
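For example (a sketch only, assuming an $8 \times 8$ grid with cells indexed $0$ to $63$; the helper functions are hypothetical), the state could become the pair (current position, bitmask of visited cells):

```python
GRID_CELLS = 64  # 8 x 8 grid, cells indexed 0..63

def initial_state(start_position):
    """State = (current position, bitmask with one bit per visited cell)."""
    return (start_position, 1 << start_position)

def next_state(state, new_position):
    """Move to new_position and mark it as visited."""
    _, visited_mask = state
    return (new_position, visited_mask | (1 << new_position))

def already_visited(state, position):
    _, visited_mask = state
    return bool(visited_mask & (1 << position))
```

With the visited map in the state, the "penalise revisits" rule becomes a function of the current state and action alone, so the problem is an ordinary (if much larger) MDP again.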
One alternative is to frame the problem as a Partially Observable Markov Decision Process (POMDP). In that case the "true" state would still include all the necessary history in order to calculate the rewards (and as this is a toy problem on a computer, you would still have to represent it somehow), but the agent can attempt to learn from restricted knowledge of the state, i.e. whatever you let it observe. In general this is a much harder approach than expanding the state representation, and I would not recommend it here. However, if you find the idea interesting, you could use your problem to explore POMDPs. Here is a recent paper (from Google's DeepMind team, 2015) that looks at two RL algorithms combined with RNNs to solve POMDPs.
Q3: I have been looking into Q-learning with experience replay as a solution to dealing with non-stationary environments, as it decorrelates successive updates. Is this the correct use of the method, or is it more about making learning more data-efficient?
Experience replay will not help with non-stationary environments. In fact it could make performance worse in them. However, as already stated, your problem is not really about a non-stationary environment, but about handling more complex state dynamics.
What you may need to look into, if the number of states grows large enough, is function approximation. For instance, if you want to handle any back-tracking and have a complex reward-modifying rule that tracks each visited location, then your state might change from a single location number to a map showing visited locations. So, for example, it might go from $64$ states for an $8 \times 8$ grid world to a map with $2^{64}$ possible combinations of visited squares. This is far too many to track in a value table, so you will typically use a neural network (or a convolutional neural network) to estimate state values instead.
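A sketch of what that could look like, assuming PyTorch and the bitmask state from the earlier snippet (architecture and sizes are illustrative, not tuned):

```python
import torch
import torch.nn as nn

GRID_SIZE = 8
N_ACTIONS = 4  # e.g. up, down, left, right

def encode_state(position, visited_mask):
    """Encode the augmented state as a 2-channel 8x8 'image':
    channel 0 marks the agent's cell, channel 1 marks visited cells."""
    x = torch.zeros(2, GRID_SIZE, GRID_SIZE)
    x[0, position // GRID_SIZE, position % GRID_SIZE] = 1.0
    for cell in range(GRID_SIZE * GRID_SIZE):
        if visited_mask & (1 << cell):
            x[1, cell // GRID_SIZE, cell % GRID_SIZE] = 1.0
    return x

class QNet(nn.Module):
    """Small convolutional network mapping the encoded state to one
    estimated action value per action, instead of a 2^64-row table."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * GRID_SIZE * GRID_SIZE, N_ACTIONS),
        )

    def forward(self, x):           # x: (batch, 2, 8, 8)
        return self.net(x)          # -> (batch, N_ACTIONS)
```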
With a function approximator, experience replay is very useful; without it, the learning process is likely to be unstable. The recent DQN approach for playing Atari games uses experience replay for this reason.
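For reference, a minimal replay buffer is nothing more than a store of past transitions sampled uniformly at random (a sketch, not the DQN authors' code):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) transitions and
    samples them uniformly, which breaks the correlation between
    successive updates when training a function approximator."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        return list(zip(*batch))    # -> [states, actions, rewards, next_states, dones]

    def __len__(self):
        return len(self.buffer)
```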