One heuristic way to look at this is to treat the probability density as a scaled probability over an "infinitesimally small" region starting at a point. For any infinitesimally small distances $\Delta_X > 0$ and $\Delta_Y > 0$ you have:
$$\begin{align}
\Delta_X \times f_X(x) &= \mathbb{P}(x \leqslant X \leqslant x + \Delta_X)
\quad \quad \quad \quad (1) \\[12pt]
\Delta_Y \times f_Y(y) &= \mathbb{P}(y \leqslant Y \leqslant y + \Delta_Y)
\quad \quad \quad \quad \ (2) \\[12pt]
\end{align}$$
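As a rough numerical illustration of equations $(1)$ and $(2)$, the sketch below compares $\Delta \times f(x)$ with the exact interval probability as $\Delta$ shrinks. The standard normal distribution, the point $x = 0.5$ and the helper names are assumptions chosen purely for the example.

```python
import math

def normal_pdf(x):
    """Density of the standard normal (assumed example distribution)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Distribution function of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def scaled_density_error(x, delta):
    """Relative gap between delta * f(x) and P(x <= X <= x + delta)."""
    exact = normal_cdf(x + delta) - normal_cdf(x)
    approx = delta * normal_pdf(x)
    return abs(exact - approx) / exact

# The relative gap should shrink roughly in proportion to delta.
for delta in (0.1, 0.01, 0.001):
    print(delta, scaled_density_error(0.5, delta))
```

The gap never reaches exactly zero for a finite $\Delta$, which is why the argument is phrased in terms of infinitesimals.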
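Now, suppose we consider a point $y$ where $g^{-1}$ is differentiable, and suppose $g$ is strictly increasing (so that applying $g^{-1}$ preserves the direction of inequalities and $\Delta_X > 0$). To facilitate our analysis, we will define the infinitesimal quantity $\Delta_X \equiv g^{-1}(y + \Delta_Y) - g^{-1}(y)$. We then have: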
$$\begin{align}
f_Y(y)
&= \frac{\mathbb{P}(y \leqslant Y \leqslant y + \Delta_Y)}{\Delta_Y}
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \text{from } (2) \\[6pt]
&= \frac{\mathbb{P}(y \leqslant g(X) \leqslant y + \Delta_Y)}{\Delta_Y} \\[6pt]
&= \frac{\mathbb{P}(g^{-1}(y) \leqslant X \leqslant g^{-1}(y + \Delta_Y))}{\Delta_Y} \\[6pt]
&= f_X(g^{-1}(y)) \times \frac{g^{-1}(y + \Delta_Y) - g^{-1}(y)}{\Delta_Y}
\quad \quad \quad \quad \text{from } (1) \\[8pt]
&= f_X(g^{-1}(y)) \times \frac{\Delta_X}{\Delta_Y} \\[12pt]
&= f_X(g^{-1}(y)) \times (g^{-1})'(y) \\[12pt]
\end{align}$$
(The step from the third to the fourth line follows from taking $x = g^{-1}(y)$, so that $x + \Delta_X = g^{-1}(y + \Delta_Y)$, and applying equation $(1)$ to express the probability as a scaled density.)
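To see the derivation in action, here is a minimal numerical sketch. It assumes, purely for illustration, that $X$ is standard normal and $g(x) = e^x$, so that $g^{-1}(y) = \ln y$ and $(g^{-1})'(y) = 1/y$. It compares the interval probability divided by $\Delta_Y$ with the closed-form product $f_X(g^{-1}(y)) \times (g^{-1})'(y)$. All function names are hypothetical.

```python
import math

def pdf_x(x):
    """Density of X, assumed standard normal for this example."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_x(x):
    """Distribution function of X."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g_inv(y):
    """Inverse of the assumed transformation g(x) = exp(x)."""
    return math.log(y)

def g_inv_prime(y):
    """Derivative of the inverse transformation."""
    return 1.0 / y

def density_from_formula(y):
    """f_X(g^{-1}(y)) * (g^{-1})'(y), the last line of the derivation."""
    return pdf_x(g_inv(y)) * g_inv_prime(y)

def density_from_probability(y, delta_y):
    """P(y <= Y <= y + delta_y) / delta_y, the first line of the derivation."""
    prob = cdf_x(g_inv(y + delta_y)) - cdf_x(g_inv(y))
    return prob / delta_y

y = 2.0
print(density_from_formula(y))
print(density_from_probability(y, 1e-5))
```

The two printed values should agree to several decimal places, with the remaining gap shrinking as `delta_y` is made smaller (until floating-point cancellation takes over).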
Alternatively, letting $\Delta_X$ be the free infinitesimal and defining $\Delta_Y \equiv g(x+\Delta_X) - g(x)$, we have:
$$\begin{align}
f_X(x)
&= \frac{\mathbb{P}(x \leqslant X \leqslant x + \Delta_X)}{\Delta_X}
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \text{from } (1) \\[6pt]
&= \frac{\mathbb{P}(g(x) \leqslant g(X) \leqslant g(x + \Delta_X))}{\Delta_X} \\[6pt]
&= \frac{\mathbb{P}(g(x) \leqslant Y \leqslant g(x + \Delta_X))}{\Delta_X} \\[6pt]
&= \frac{\mathbb{P}(g(x) \leqslant Y \leqslant g(x) + \Delta_Y)}{\Delta_Y} \times \frac{\Delta_Y}{\Delta_X} \\[6pt]
&= f_Y(g(x)) \times \frac{\Delta_Y}{\Delta_X}
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \ \ \text{from } (2) \\[8pt]
&= f_Y(g(x)) \times g'(x) \\[12pt]
\end{align}$$
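The reverse direction can be checked in the same spirit. The sketch below again assumes $X$ is standard normal and $g(x) = e^x$, in which case $Y$ has the standard log-normal density. It forms $\Delta_Y = g(x + \Delta_X) - g(x)$ for a small $\Delta_X$ and checks that $f_Y(g(x)) \times \Delta_Y / \Delta_X$ recovers $f_X(x)$. The names are hypothetical and the distributional choice is an assumption for the example only.

```python
import math

def pdf_x(x):
    """Density of X, assumed standard normal for this example."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pdf_y(y):
    """Standard log-normal density, i.e. the density of Y = exp(X)."""
    return math.exp(-0.5 * math.log(y) ** 2) / (y * math.sqrt(2.0 * math.pi))

def g(x):
    """Assumed increasing transformation."""
    return math.exp(x)

def recovered_pdf_x(x, delta_x):
    """f_Y(g(x)) * delta_y / delta_x, with delta_y induced by delta_x."""
    delta_y = g(x + delta_x) - g(x)
    return pdf_y(g(x)) * delta_y / delta_x

x = 0.3
print(pdf_x(x))
print(recovered_pdf_x(x, 1e-6))
```

Here the ratio `delta_y / delta_x` plays the role of $g'(x)$, so the two printed values should nearly coincide.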
Now, this argument can be tightened to give a formal demonstration of the result, but the heuristic version shows how the derivative term arises. It arises from the fact that the region $[y, y+\Delta_Y]$ for the transformed random variable $Y$ corresponds to the region $[g^{-1}(y), g^{-1}(y + \Delta_Y)]$ for the original random variable $X$. The derivative term is just the ratio of the length of the latter region to the length of the former region, when $\Delta_Y$ is small.