In the standard textbook RL setting we usually use the MDP framework, where we assume that the next state depends only on the current state (and action), i.e. it is conditionally independent of the whole history. Obviously, in real life this is not always a valid assumption and can often be the reason an RL algorithm fails in a specific environment. Yet, the majority of current RL research assumes the Markov property. Why is that?
EDIT: I am aware of higher-order MDPs as mentioned in the comments. My question was more about what is currently done in practice by state-of-the-art RL algorithms. For example, DDPG with non-image observations (i.e. low-level observations such as torques, accelerations, etc.) considers only the last observation, without any observation augmentation. DQN applied to Atari and its derivatives do stack several previous frames, but the main reason is to infer velocities and the movement of pixels (i.e. to make the image observations roughly equivalent to the low-level observations mentioned earlier). A rough sketch of what I mean by stacking/augmentation is given below.
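To be concrete, by frame stacking / observation augmentation I mean something like the following minimal sketch for a generic Gym-style environment. All names here are mine and purely illustrative, not taken from any particular DQN or DDPG implementation:

```python
from collections import deque
import numpy as np

class StackedObsWrapper:
    """Concatenate the last k observations into a single input for the agent."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with the initial observation so the stack is well defined.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(list(self.frames), axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(list(self.frames), axis=-1), reward, done, info
```

Here `k` is the manually chosen history length I am questioning below.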
Indeed, the trick of augmenting the observation with previous states is used sometimes, but still rather rarely. Also, the number of previous states considered is often very small and manually tuned. But apart from empirical testing, how do we know that using a large number, say 50, is not a better choice (putting aside the computational cost of feeding 50 images into a NN)? Furthermore, these models do not really account for the actions that were previously taken. I guess what I am trying to ask is: why are we not trying to use a more automated approach for determining these dependencies, for example something like an LSTM (apart from the fact that training such a model becomes more difficult)? A sketch of what I have in mind follows.
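Concretely, I am imagining something along these lines: instead of hand-picking how many past observations (and actions) to stack, a recurrent network summarizes the history into its hidden state. This is just a rough PyTorch sketch under my own assumptions; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """Q-network that conditions on the history of observations and previous actions."""

    def __init__(self, obs_dim, act_dim, hidden_size=128):
        super().__init__()
        # The LSTM input is the observation concatenated with the previous action.
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden_size, batch_first=True)
        self.q_head = nn.Linear(hidden_size, act_dim)

    def forward(self, obs_seq, prev_act_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim), prev_act_seq: (batch, time, act_dim)
        x = torch.cat([obs_seq, prev_act_seq], dim=-1)
        out, hidden = self.lstm(x, hidden)
        # Q-values at every time step; at acting time only the last step is used.
        return self.q_head(out), hidden
```

The point is that the effective "order" of the dependence on history would be learned rather than fixed to some small hand-tuned number of frames.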