Last year, Ilya Sutskever and collaborators published a paper about a recurrent LSTM network that learns sequence-to-sequence mappings for machine translation. It is somewhat surprising that the authors used an LSTM instead of Hessian-Free optimization to train this network, since the first author was one of the innovators behind Hessian-Free methods for recurrent nets (citation).
Has anyone tried Hessian-Free optimization for learning sequence-to-sequence mappings for machine translation? If so, does it work? Is its performance inferior to the LSTM's in some way?