Quoting my other answer regarding feature engineering in general:
So while in many cases you could expect the algorithm to find the
solution on its own, you can alternatively simplify the problem
through feature engineering. Simple problems are easier and faster to
solve and need less complicated algorithms. Simple algorithms are
often more robust, their results are often more interpretable, and they
are more scalable (requiring less computational power, less training
time, etc.) and more portable. [...]
Moreover, don't believe everything the machine learning marketers tell
you. In most cases, the algorithms won't "learn by themselves". You
usually have limited time, resources, and computational power, and the
data is usually of limited size and noisy; none of this helps.
Yes, deep learning models can learn feature crosses (a.k.a. interaction terms) and similar features by themselves. However, by providing them yourself you simplify the problem to be solved, so you can expect the model to converge faster. The mere fact that neural networks can learn from nearly raw data does not mean we should abandon feature engineering altogether.
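As an illustration, here is a minimal sketch of what supplying feature crosses yourself can look like in practice. Scikit-learn's `PolynomialFeatures` is one common way to generate interaction terms; the library choice is mine, not part of the original argument:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0, 3.0]])  # one sample with features x1, x2, x3

# interaction_only=True generates the pairwise crosses x1*x2, x1*x3, x2*x3
# without the squared terms x1^2, x2^2, x3^2.
crosses = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(crosses.fit_transform(X))
# [[1. 2. 3. 2. 3. 6.]] -> original features followed by the three crosses
```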
A similar argument can be made about unsupervised learning algorithms: if we can learn without labels, why bother with them? We bother because learning from labeled data is easier and faster, needs simpler algorithms and less data, is easier to debug, and is less tricky to train, since by providing the labels you point the algorithm in the desired direction. The same applies to feature engineering.
A simple example is given in my other answer referenced above: learning the XOR function from data. With a feature cross it can be solved by a trivial model, while without one you would need a much more complicated model (e.g. a multi-layer network), as the sketch below shows.
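Here is a minimal sketch of that example, again assuming scikit-learn: a plain logistic regression (a linear model) cannot fit XOR on the raw inputs, but adding the x1*x2 cross makes the classes linearly separable and the problem trivially solvable:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR of the two inputs

# Raw features only: XOR is not linearly separable, so no linear model
# can classify more than 3 of the 4 points correctly.
raw = LogisticRegression().fit(X, y)
print(raw.score(X, y))  # at most 0.75; a linear boundary can't fix this

# Add the feature cross x1*x2: in the space (x1, x2, x1*x2) the classes
# become linearly separable, e.g. by the plane x1 + x2 - 2*x1*x2 = 0.5.
X_cross = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
crossed = LogisticRegression(C=1e5, max_iter=1000).fit(X_cross, y)
print(crossed.score(X_cross, y))  # 1.0
```

The separating plane that the crossed model finds is exactly the kind of nonlinearity a multi-layer network would have to discover for itself in its hidden layer.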