I have a pdf, say $p(x)$. Now I apply some transformation $g$ (which may be linear or nonlinear) to the variable $x$, giving $y = g(x)$. Let the new pdf be called $p(y)$. A small change $dx$ in $x$ produces a corresponding change $dy$ in $y = g(x)$. Since the probability (the area under the curve) over these corresponding intervals has to be the same, $p(x)\,dx = p(y)\,dy$.
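As a quick numerical sanity check of $p(x)\,dx = p(y)\,dy$, here is a minimal sketch. It assumes $p(x)$ is the standard normal density and uses the increasing map $y = g(x) = e^x$ as an example (both are illustrative choices, not from the book). It compares the probability mass over an $x$ interval with the mass over the corresponding $y$ interval, and also shows that the density values at corresponding points differ.

```python
import math

def p_x(x):
    # Assumed example: standard normal density for p(x)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def g(x):
    # Hypothetical increasing transformation y = g(x) = exp(x)
    return math.exp(x)

def p_y(y):
    # From p(x) dx = p(y) dy with x = ln(y), so dx/dy = 1/y
    return p_x(math.log(y)) / y

def integrate(f, a, b, n=10_000):
    # Simple midpoint-rule integration of f over [a, b]
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

x0, x1 = 0.3, 0.8
mass_x = integrate(p_x, x0, x1)          # probability that x lies in [x0, x1]
mass_y = integrate(p_y, g(x0), g(x1))    # probability that y lies in [g(x0), g(x1)]
# Exact value of the x-interval mass via the normal CDF
exact = 0.5 * (math.erf(x1 / math.sqrt(2.0)) - math.erf(x0 / math.sqrt(2.0)))

print(f"mass over x interval: {mass_x:.6f}")
print(f"mass over y interval: {mass_y:.6f}")
print(f"exact:                {exact:.6f}")
# The densities themselves differ at corresponding points: p(y) = p(x) / y here
print(f"p_x(x0) = {p_x(x0):.4f}, p_y(g(x0)) = {p_y(g(x0)):.4f}")
```

The two masses agree, while $p_x(x_0)$ and $p_y(g(x_0))$ do not, which is the sense in which a density does not simply carry over point by point like an ordinary function.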
I was studying Bishop's Pattern Recognition and Machine Learning, and on page 18 it says that under a nonlinear change of variable, a probability density transforms differently from a simple function. I think it will also transform differently under a linear transformation.

Secondly, the book says $$p(y)=p(x)\left| \frac{dx}{dy} \right|$$ and I don't understand why the absolute value (the mod) is there, since $p(x)$, $p(y)$, $dx$ and $dy$ can't be negative.
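To see what the absolute value in $p(y)=p(x)\left| \frac{dx}{dy} \right|$ is doing, here is a small sketch using a decreasing map, where $\frac{dx}{dy}$ is negative. It assumes $p(x)$ is the standard normal density and uses the hypothetical linear map $y = g(x) = 1 - 2x$, so $x = (1 - y)/2$ and $\frac{dx}{dy} = -\tfrac{1}{2}$. It integrates the formula with and without the absolute value over a wide range of $y$, and compares the mass over an $x$ interval with the mass over its image (the image interval has its endpoints swapped because $g$ is decreasing).

```python
import math

def p_x(x):
    # Assumed example: standard normal density for p(x)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def g(x):
    # Hypothetical decreasing linear map y = 1 - 2x
    return 1.0 - 2.0 * x

def g_inv(y):
    # Inverse map x = (1 - y) / 2
    return (1.0 - y) / 2.0

DX_DY = -0.5  # derivative of the inverse map; negative because g is decreasing

def p_y_signed(y):
    # Transformation WITHOUT the absolute value
    return p_x(g_inv(y)) * DX_DY

def p_y(y):
    # Transformation WITH |dx/dy|, as in the book
    return p_x(g_inv(y)) * abs(DX_DY)

def integrate(f, a, b, n=20_000):
    # Simple midpoint-rule integration of f over [a, b]
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# y = 1 - 2x has mean 1 and standard deviation 2, so [-20, 22] holds essentially all the mass
total_signed = integrate(p_y_signed, -20.0, 22.0)
total_abs = integrate(p_y, -20.0, 22.0)

# x in [x0, x1] maps to y in [g(x1), g(x0)]: the order flips because g is decreasing
x0, x1 = 0.3, 0.8
mass_x = integrate(p_x, x0, x1)
mass_y = integrate(p_y, g(x1), g(x0))

print(f"total without abs: {total_signed:.6f}")
print(f"total with abs:    {total_abs:.6f}")
print(f"mass over x interval: {mass_x:.6f}, over y interval: {mass_y:.6f}")
```

Without the absolute value the "density" comes out negative and integrates to about $-1$; with it, the result is a valid density that integrates to about $1$ and assigns the same mass to corresponding intervals. The sign of $\frac{dx}{dy}$ only records that a positive $dx$ corresponds to a negative $dy$ when $g$ is decreasing.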