In the book *Pattern Recognition and Machine Learning* (formula 1.27), Bishop gives
$$p_y(y)=p_x(x) \left | \frac{d x}{d y} \right |=p_x(g(y)) \, | g'(y) |$$ where $x=g(y)$, and $p_x(x)$ is the pdf corresponding to $p_y(y)$ under the change of variable.
The book says this is because observations falling in the range $(x, x + \delta x)$ will, for small $\delta x$, be transformed into the range $(y, y + \delta y)$.
How is this derived formally?
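For concreteness, here is a quick Monte Carlo sanity check of (1.27) with one specific choice of $g$ (my own example, not from the book): take $X \sim \text{Uniform}(0,1)$ and $x = g(y) = e^{-y}$, so $Y = -\ln X$ and the formula predicts $p_y(y) = 1 \cdot |{-e^{-y}}| = e^{-y}$, an Exponential(1) density.

```python
import random, math

# Sanity check of PRML formula (1.27): p_y(y) = p_x(g(y)) |g'(y)|.
# Example choice (not from the book): X ~ Uniform(0,1), x = g(y) = exp(-y),
# so Y = -ln(X) and the formula predicts p_y(y) = exp(-y) for y > 0.

random.seed(0)
n = 200_000
# 1 - random() lies in (0, 1], so the log is always defined.
ys = [-math.log(1.0 - random.random()) for _ in range(n)]

# Compare the empirical density of Y on a few bins with exp(-y).
width = 0.5
for lo in (0.0, 0.5, 1.0, 2.0):
    hi = lo + width
    empirical = sum(lo <= y < hi for y in ys) / (n * width)
    predicted = math.exp(-(lo + hi) / 2)  # exp(-y) at the bin midpoint
    print(f"[{lo:.1f}, {hi:.1f})  empirical={empirical:.3f}  predicted={predicted:.3f}")
```

The empirical bin heights track $e^{-y}$ up to Monte Carlo noise, which is the content of the $\delta x \mapsto \delta y$ argument: probability mass in $(x, x+\delta x)$ lands in $(y, y+\delta y)$.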
Update from Dilip Sarwate
The result holds only if $g$ is a strictly monotone increasing or decreasing function.
A minor edit to L.V. Rao's answer:
$$P(Y\le y) = P(g(X)\le y)= \begin{cases} P(X\le g^{-1}(y)), & \text{if } g \text{ is monotonically increasing} \\ P(X\ge g^{-1}(y)), & \text{if } g \text{ is monotonically decreasing} \end{cases}$$
Therefore, if $g$ is monotonically increasing,
$$F_{Y}(y)=F_{X}(g^{-1}(y)),$$
$$f_{Y}(y)= f_{X}(g^{-1}(y))\cdot \frac{d}{dy}g^{-1}(y),$$
and if $g$ is monotonically decreasing (taking $X$ continuous),
$$F_{Y}(y)=1-F_{X}(g^{-1}(y)),$$
$$f_{Y}(y)=- f_{X}(g^{-1}(y))\cdot \frac{d}{dy}g^{-1}(y).$$
In the decreasing case $g^{-1}$ is also decreasing, so $\frac{d}{dy}g^{-1}(y)<0$ and the minus sign makes the density positive. Both cases combine into
$$\therefore f_{Y}(y) = f_{X}(g^{-1}(y)) \cdot \left | \frac{d}{dy}g^{-1}(y) \right |$$
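The decreasing branch can be checked numerically with a hypothetical example (mine, not from the answer above): $X \sim \text{Exponential}(1)$ and $Y = g(X) = e^{-X}$, which is strictly decreasing. Then $g^{-1}(y) = -\ln y$, and the formula gives $f_Y(y) = f_X(-\ln y)\cdot|{-1/y}| = y \cdot (1/y) = 1$ on $(0,1)$, i.e. $Y \sim \text{Uniform}(0,1)$.

```python
import math

# Check the decreasing case: X ~ Exponential(1), Y = g(X) = exp(-X).
# g^{-1}(y) = -ln y, and the derived formula predicts f_Y(y) = 1 on (0, 1).

def F_X(x):
    # CDF of Exponential(1)
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def F_Y(y):
    # Decreasing case from the derivation: F_Y(y) = 1 - F_X(g^{-1}(y))
    return 1.0 - F_X(-math.log(y))

def f_Y(y):
    # Density from the change-of-variables formula
    f_X_at_inv = math.exp(math.log(y))   # f_X(g^{-1}(y)) = exp(-(-ln y)) = y
    return f_X_at_inv * abs(-1.0 / y)    # |d/dy g^{-1}(y)| = 1/y

# Compare the formula against a finite-difference derivative of F_Y.
h = 1e-6
for y in (0.2, 0.5, 0.8):
    finite_diff = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
    print(f"y={y}  formula={f_Y(y):.6f}  dF_Y/dy≈{finite_diff:.6f}")
```

Note that without the minus sign (or equivalently the absolute value), $f_X(g^{-1}(y))\cdot \frac{d}{dy}g^{-1}(y)$ would be $-1$ here, which is why the decreasing case needs it.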